[ 
https://issues.apache.org/jira/browse/TIKA-795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeremy Anderson updated TIKA-795:
---------------------------------

    Description: 
POI-3.8-beta5-daily exposed a bug after poi.revision 1198658.  (POI bugzilla 
bug #52262 has already been opened for the root cause.)

The bug was discovered using daily builds of both TIKA and POI.  The root 
cause lies within POI: the return type of XSLFSlide.getMasterSheet() was 
accidentally changed.  TIKA is affected because it calls this method and 
assigns the result to an unused variable.

I've included a patch file which removes the unused variable.  An example 
multi-embedded Word document, used with a Tika-based RecursiveMetadataParser, 
is also included.

java.lang.NoSuchMethodError: org.apache.poi.xslf.usermodel.XSLFSlide.getMasterSheet()Lorg/apache/poi/xslf/usermodel/XSLFSlideMaster;
        at org.apache.tika.parser.microsoft.ooxml.XSLFPowerPointExtractorDecorator.buildXHTML(XSLFPowerPointExtractorDecorator.java:81)
        at org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor.getXHTML(AbstractOOXMLExtractor.java:110)
        at org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse(OOXMLExtractorFactory.java:97)
        at org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:69)
        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
        at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
        at org.apache.tika.parser.ParserDecorator.parse(ParserDecorator.java:91)
        at com.eastportanalytics.services.textextract.TikaTextExtractionService$RecursiveMetadataParser.parse(TikaTextExtractionService.java:364)
        at org.apache.tika.parser.DelegatingParser.parse(DelegatingParser.java:72)
        at org.apache.tika.extractor.ParsingEmbeddedDocumentExtractor.parseEmbedded(ParsingEmbeddedDocumentExtractor.java:109)
        at org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor.handleEmbeddedFile(AbstractOOXMLExtractor.java:228)
        at org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor.handleEmbeddedParts(AbstractOOXMLExtractor.java:148)
        at org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor.getXHTML(AbstractOOXMLExtractor.java:113)
        at org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse(OOXMLExtractorFactory.java:97)
        at org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:69)
        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
        at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
        at org.apache.tika.parser.ParserDecorator.parse(ParserDecorator.java:91)
        at com.eastportanalytics.services.textextract.TikaTextExtractionService$RecursiveMetadataParser.parse(TikaTextExtractionService.java:364)
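
For readers wondering why a changed return type surfaces as a 
NoSuchMethodError rather than a compile error: the JVM links a call by the 
method's name plus its full descriptor, and the descriptor includes the return 
type (the "()L...XSLFSlideMaster;" part of the error above).  A caller compiled 
against the old signature therefore looks for a method that no longer exists, 
even if the returned value is never used.  The sketch below illustrates this 
with an exact-descriptor lookup.  All class names in it (DescriptorDemo, Slide, 
SlideMaster, OtherSheet) are hypothetical stand-ins, not POI classes, and the 
new return type is an assumption made only for illustration.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

public class DescriptorDemo {
    // Hypothetical stand-ins for the old and new return types (not POI classes).
    static class SlideMaster {}
    static class OtherSheet {}

    static class Slide {
        // The "new" library version: same name and parameters, different return type.
        public OtherSheet getMasterSheet() {
            return new OtherSheet();
        }
    }

    // Returns true if Slide has a no-arg getMasterSheet() with exactly this
    // return type. The lookup matches on name and full method type, much like
    // JVM linkage matches on name and descriptor.
    static boolean canLink(Class<?> returnType) {
        try {
            MethodHandles.lookup().findVirtual(
                    Slide.class, "getMasterSheet", MethodType.methodType(returnType));
            return true;
        } catch (NoSuchMethodException e) {
            return false;
        } catch (IllegalAccessException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Signature that the current Slide class actually declares.
        System.out.println("returns OtherSheet:  " + canLink(OtherSheet.class));
        // Signature an old caller was compiled against; no longer present.
        System.out.println("returns SlideMaster: " + canLink(SlideMaster.class));
    }
}
```

This is also why removing the unused variable (and with it the call) on the 
caller side avoids the failure: once the call is gone, nothing references the 
old descriptor.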




  was:
POI-3.8-beta5-daily exposed a bug after poi.revision 1190347.  (POI bugzilla 
bug #52262 has already been opened for the root cause.)

The bug was discovered using daily builds of both TIKA and POI.  The root 
cause lies within POI: the return type of XSLFSlide.getMasterSheet() was 
accidentally changed.  TIKA is affected because it calls this method and 
assigns the result to an unused variable.

I've included a patch file which removes the unused variable.  An example 
multi-embedded Word document, used with a Tika-based RecursiveMetadataParser, 
is also included.

java.lang.NoSuchMethodError: org.apache.poi.xslf.usermodel.XSLFSlide.getMasterSheet()Lorg/apache/poi/xslf/usermodel/XSLFSlideMaster;
        at org.apache.tika.parser.microsoft.ooxml.XSLFPowerPointExtractorDecorator.buildXHTML(XSLFPowerPointExtractorDecorator.java:81)
        at org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor.getXHTML(AbstractOOXMLExtractor.java:110)
        at org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse(OOXMLExtractorFactory.java:97)
        at org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:69)
        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
        at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
        at org.apache.tika.parser.ParserDecorator.parse(ParserDecorator.java:91)
        at com.eastportanalytics.services.textextract.TikaTextExtractionService$RecursiveMetadataParser.parse(TikaTextExtractionService.java:364)
        at org.apache.tika.parser.DelegatingParser.parse(DelegatingParser.java:72)
        at org.apache.tika.extractor.ParsingEmbeddedDocumentExtractor.parseEmbedded(ParsingEmbeddedDocumentExtractor.java:109)
        at org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor.handleEmbeddedFile(AbstractOOXMLExtractor.java:228)
        at org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor.handleEmbeddedParts(AbstractOOXMLExtractor.java:148)
        at org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor.getXHTML(AbstractOOXMLExtractor.java:113)
        at org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse(OOXMLExtractorFactory.java:97)
        at org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:69)
        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
        at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
        at org.apache.tika.parser.ParserDecorator.parse(ParserDecorator.java:91)
        at com.eastportanalytics.services.textextract.TikaTextExtractionService$RecursiveMetadataParser.parse(TikaTextExtractionService.java:364)




    
> [PATCH] NoSuchMethod - XSLFPowerPointExtractorDecorator.buildXHTML POI - 
> XSLFSlide.getMasterSheet()
> ---------------------------------------------------------------------------------------------------
>
>                 Key: TIKA-795
>                 URL: https://issues.apache.org/jira/browse/TIKA-795
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 1.1
>            Reporter: Jeremy Anderson
>              Labels: patch, poi
>         Attachments: Patch_795_XSLF.patch, testWORD_embeded.docx
>
>

