https://issues.apache.org/bugzilla/show_bug.cgi?id=51921

             Bug #: 51921
           Summary: Get exception in text extraction with poi 3.7 jar
           Product: POI
           Version: 3.7
          Platform: PC
        OS/Version: Windows XP
            Status: NEW
          Severity: major
          Priority: P2
         Component: HDF
        AssignedTo: [email protected]
        ReportedBy: [email protected]
    Classification: Unclassified


I am currently using Apache Tika 0.9 (plus Tika 0.9's dependent jar files)
and the Apache POI 3.7 jar for text extraction.

I get exceptions when processing some Microsoft Office documents. I have
attached a zip file containing the documents. Please check them with the jars
mentioned above.

I get the following exception when I upload the '1_1.doc' document from the
attached zip file.

Caused by: org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.microsoft.OfficeParser@d8e54c
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:199)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:197)
    at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:135)
    at org.apache.tika.parser.ParsingReader$ParsingTask.run(ParsingReader.java:232)
    ... 1 more
Caused by: java.lang.NullPointerException
    at org.apache.poi.hwpf.sprm.CharacterSprmUncompressor.uncompressCHP(CharacterSprmUncompressor.java:39)
    at org.apache.poi.hwpf.model.CHPX.getCharacterProperties(CHPX.java:61)
    at org.apache.poi.hwpf.usermodel.CharacterRun.<init>(CharacterRun.java:98)
    at org.apache.poi.hwpf.usermodel.Range.getCharacterRun(Range.java:797)
    at org.apache.poi.hwpf.model.PicturesTable.getAllPictures(PicturesTable.java:191)
    at org.apache.tika.parser.microsoft.WordExtractor$PicturesSource.<init>(WordExtractor.java:430)
    at org.apache.tika.parser.microsoft.WordExtractor$PicturesSource.<init>(WordExtractor.java:420)
    at org.apache.tika.parser.microsoft.WordExtractor.parse(WordExtractor.java:75)
    at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:182)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:197)
    ... 4 more
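
For triage, the innermost cause (here the NullPointerException raised inside
POI) is usually more useful than the outer TikaException. The following is a
minimal sketch, using only the JDK, of walking a "Caused by" chain down to its
root. The class name RootCause is hypothetical and is not part of Tika or POI.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

// Hypothetical helper for illustration; not a Tika or POI class.
final class RootCause {
    private RootCause() {}

    /** Follows getCause() to the innermost throwable, stopping if the chain loops. */
    static Throwable of(Throwable t) {
        // Identity-based set so that equals() overrides cannot confuse the cycle check.
        Set<Throwable> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        Throwable current = t;
        while (current.getCause() != null && seen.add(current)) {
            current = current.getCause();
        }
        return current;
    }
}
```

A caller could catch the exception thrown during parsing and log
RootCause.of(e) alongside the document name.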



I get the following exception when I upload the 'Book1.xlsb' and
'MSPPT2007.thmx' documents from the attached zip file.

Caused by: org.apache.tika.exception.TikaException: TIKA-418: RuntimeException while getting content for thmx and xps file types
    at org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse(OOXMLExtractorFactory.java:86)
    at org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:67)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:197)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:197)
    at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:135)
    at org.apache.tika.parser.ParsingReader$ParsingTask.run(ParsingReader.java:232)
    ... 1 more
Caused by: java.lang.IllegalArgumentException: No supported documents found in the OOXML package (found application/vnd.ms-excel.sheet.binary.macroEnabled.main)
    at org.apache.poi.extractor.ExtractorFactory.createExtractor(ExtractorFactory.java:191)
    at org.apache.poi.extractor.ExtractorFactory.createExtractor(ExtractorFactory.java:152)
    at org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse(OOXMLExtractorFactory.java:65)
    ... 6 more
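
The message above names the content type that was found in the package. To see
which part content types a given file declares, the [Content_Types].xml entry
at the root of the zip package can be read directly. The following is a
minimal sketch using only the JDK; the class name PackageContentTypes is
hypothetical, and the code does not reproduce how POI itself selects the core
document.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical diagnostic helper for illustration; not a Tika or POI class.
final class PackageContentTypes {
    private PackageContentTypes() {}

    /** Returns the ContentType of every Override element in [Content_Types].xml. */
    static List<String> overrides(InputStream packageStream) throws Exception {
        List<String> types = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(packageStream)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (!"[Content_Types].xml".equals(entry.getName())) {
                    continue;
                }
                // Copy the entry into memory so the XML parser never touches the zip stream.
                ByteArrayOutputStream buf = new ByteArrayOutputStream();
                byte[] chunk = new byte[8192];
                int n;
                while ((n = zip.read(chunk)) != -1) {
                    buf.write(chunk, 0, n);
                }
                DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
                factory.setNamespaceAware(true);
                factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
                NodeList nodes = factory.newDocumentBuilder()
                        .parse(new ByteArrayInputStream(buf.toByteArray()))
                        .getElementsByTagNameNS("*", "Override");
                for (int i = 0; i < nodes.getLength(); i++) {
                    types.add(((Element) nodes.item(i)).getAttribute("ContentType"));
                }
                break;
            }
        }
        return types;
    }
}
```

Calling PackageContentTypes.overrides(new FileInputStream("Book1.xlsb")) would
list the declared part types, which can then be compared with the type reported
in the exception message.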

I get the following exception when I upload the 'MSPPT2007.xps' document from
the attached zip file.

Caused by: org.apache.tika.exception.TikaException: Error creating OOXML extractor
    at org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse(OOXMLExtractorFactory.java:90)
    at org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:67)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:197)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:197)
    at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:135)
    at org.apache.tika.parser.ParsingReader$ParsingTask.run(ParsingReader.java:232)
    ... 1 more
Caused by: java.lang.IllegalArgumentException: Invalid OOXML Package received - expected 1 core document, found 0
    at org.apache.poi.extractor.ExtractorFactory.createExtractor(ExtractorFactory.java:161)
    at org.apache.poi.extractor.ExtractorFactory.createExtractor(ExtractorFactory.java:152)
    at org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse(OOXMLExtractorFactory.java:65)
    ... 6 more


Please try to resolve this issue.

Thanks & Regards
Yatin Baraiya

--- Comment #1 from Yegor Kozlov <[email protected]> 2011-10-04 12:27:47 UTC ---
*** Bug 51920 has been marked as a duplicate of this bug. ***
