[ https://issues.apache.org/jira/browse/TIKA-1033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13504703#comment-13504703 ]

Michael McCandless commented on TIKA-1033:
------------------------------------------

Interesting: with PowerPoint 2007, when I double-click the embedded chart, it 
pops up a dialogue box saying "To edit this chart using the new features 
available in the 2007 Microsoft Office system, you must first convert it to the 
2007 Office system format.  Do you want to convert this chart to the new 
format?  [Convert] [Convert All] [Edit Existing]".  If I click [Edit Existing] 
it lets me edit the chart data in what looks like Excel, in "Compatibility 
Mode".

OK, I'll open a POI bug that references this issue.

Thanks Nick.
                
> Tika doesn't parse embedded OLE Chart/Graph objects
> ---------------------------------------------------
>
>                 Key: TIKA-1033
>                 URL: https://issues.apache.org/jira/browse/TIKA-1033
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>            Reporter: Michael McCandless
>            Priority: Minor
>         Attachments: emb.ppt
>
>
> I have an example ppt that embeds a chart, but Tika misidentifies it
> as an XLS document.
> The progID (oleShape.getProgID() in
> HSLFExtractor.handleSlideEmbeddedResources) is MSGraph.Chart.8, and
> we seem to detect it as Excel (application/vnd.ms-excel), but then
> ExcelExtractor hits this exception:
> {noformat}
> org.apache.poi.hssf.record.RecordFormatException: Unable to construct record instance
>       at org.apache.poi.hssf.record.RecordFactory$ReflectionConstructorRecordCreator.create(RecordFactory.java:65)
>       at org.apache.poi.hssf.record.RecordFactory.createSingleRecord(RecordFactory.java:301)
>       at org.apache.poi.hssf.record.RecordFactoryInputStream.readNextRecord(RecordFactoryInputStream.java:285)
>       at org.apache.poi.hssf.record.RecordFactoryInputStream.nextRecord(RecordFactoryInputStream.java:251)
>       at org.apache.poi.hssf.eventusermodel.HSSFEventFactory.genericProcessEvents(HSSFEventFactory.java:143)
>       at org.apache.poi.hssf.eventusermodel.HSSFEventFactory.processEvents(HSSFEventFactory.java:106)
>       at org.apache.tika.parser.microsoft.ExcelExtractor$TikaHSSFListener.processFile(ExcelExtractor.java:302)
>       at org.apache.tika.parser.microsoft.ExcelExtractor.parse(ExcelExtractor.java:147)
> {noformat}
> Since DelegatingParser silently suppresses all exceptions, running
> TikaCLI shows neither an exception nor any extracted text. If you run
> it with -z, however, it saves 1.xls, and parsing that file with
> TikaCLI hits the above exception.
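The report traces the misdetection to the embedded object's progID, MSGraph.Chart.8, which ends up routed to the Excel parser. As a minimal sketch, assuming nothing about how Tika's detection is actually wired, a hypothetical helper could keep MSGraph charts distinct from Excel workbooks by checking the progID prefix before a parser is chosen. The class name, the `classify` method and the `application/x-hypothetical-msgraph-chart` type are made up for illustration; only `application/vnd.ms-excel` and the MSGraph.Chart.8 progID come from the report.

```java
import java.util.Locale;

/**
 * Hypothetical helper (not Tika's detection code): maps an OLE progID, such as
 * the value returned by oleShape.getProgID(), to a media type string.
 */
public class ProgIdClassifier {

    // Standard media type for legacy Excel workbooks.
    static final String EXCEL = "application/vnd.ms-excel";
    // Hypothetical marker type for MS Graph charts; not a registered Tika type.
    static final String MSGRAPH_CHART = "application/x-hypothetical-msgraph-chart";
    // Fallback when the progID is missing or unrecognized.
    static final String UNKNOWN = "application/octet-stream";

    static String classify(String progId) {
        if (progId == null) {
            return UNKNOWN;
        }
        String p = progId.toLowerCase(Locale.ROOT);
        // Check the chart prefix explicitly so MSGraph.Chart.N is never
        // treated as an Excel workbook.
        if (p.startsWith("msgraph.chart")) {
            return MSGRAPH_CHART;
        }
        if (p.startsWith("excel.sheet")) {
            return EXCEL;
        }
        return UNKNOWN;
    }

    public static void main(String[] args) {
        for (String id : new String[] {"MSGraph.Chart.8", "Excel.Sheet.8", "Word.Document.8"}) {
            System.out.println(id + " -> " + classify(id));
        }
    }
}
```

With this split, a chart would go to a dedicated handler (or be skipped) instead of reaching ExcelExtractor; whether POI can actually read the MSGraph data is a separate question, which the comment above defers to a POI bug.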
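The report also notes that the failure is invisible because the exception is swallowed during embedded-document handling. A minimal sketch of the alternative, assuming a made-up `EmbeddedHandler` interface rather than Tika's real `Parser` or `DelegatingParser` API, is a delegator that still returns empty text on failure but records the exception so a caller (or a CLI) could report it. `IllegalStateException` stands in for POI's `RecordFormatException` so the example needs no external libraries.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/**
 * Hypothetical sketch: runs a handler on an embedded resource and records any
 * failure instead of silently discarding it. Not Tika's DelegatingParser.
 */
public class RecordingDelegator {

    /** Hypothetical stand-in for a parser invoked on one embedded resource. */
    interface EmbeddedHandler {
        String handle(String resourceName) throws Exception;
    }

    private final List<String> failures = new ArrayList<>();

    /** Returns the extracted text, or "" on failure, remembering why it failed. */
    String parseEmbedded(String resourceName, EmbeddedHandler handler) {
        try {
            return handler.handle(resourceName);
        } catch (Exception e) {
            // Keep a record the caller can surface rather than dropping the error.
            failures.add(resourceName + ": " + e.getClass().getSimpleName() + ": " + e.getMessage());
            return "";
        }
    }

    /** Read-only view of the recorded failures. */
    List<String> failures() {
        return Collections.unmodifiableList(failures);
    }

    public static void main(String[] args) {
        RecordingDelegator d = new RecordingDelegator();
        // Simulate the embedded 1.xls failing the way the report describes.
        String text = d.parseEmbedded("1.xls", name -> {
            throw new IllegalStateException("Unable to construct record instance");
        });
        System.out.println("text=[" + text + "]");
        System.out.println("failures=" + d.failures());
    }
}
```

Running `main` prints empty text followed by the recorded failure for 1.xls, so the problem is visible without having to rerun with -z and parse the saved file separately.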

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
