https://issues.apache.org/bugzilla/show_bug.cgi?id=51920
Bug #: 51920
Summary: Get exception in text extraction with poi 3.7 jar
Product: POI
Version: 3.7
Platform: PC
OS/Version: Windows XP
Status: RESOLVED
Severity: major
Priority: P2
Component: HDF
AssignedTo: [email protected]
ReportedBy: [email protected]
Classification: Unclassified
Yegor Kozlov <[email protected]> changed:
What       |Removed |Added
----------------------------------------------------------------------------
Status     |NEW     |RESOLVED
Resolution |        |DUPLICATE
I am currently using Apache Tika 0.9 (plus Tika 0.9's dependent jar files)
and the Apache POI 3.7 jar for text extraction.
I get exceptions when processing some Microsoft Office documents. I have attached
a zip file containing the documents. Please check them with the jars mentioned above.
I get the following exception when we upload the '1_1.doc' document from the
attached zip file.
Caused by: org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.microsoft.OfficeParser@d8e54c
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:199)
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:197)
at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:135)
at org.apache.tika.parser.ParsingReader$ParsingTask.run(ParsingReader.java:232)
... 1 more
Caused by: java.lang.NullPointerException
at org.apache.poi.hwpf.sprm.CharacterSprmUncompressor.uncompressCHP(CharacterSprmUncompressor.java:39)
at org.apache.poi.hwpf.model.CHPX.getCharacterProperties(CHPX.java:61)
at org.apache.poi.hwpf.usermodel.CharacterRun.<init>(CharacterRun.java:98)
at org.apache.poi.hwpf.usermodel.Range.getCharacterRun(Range.java:797)
at org.apache.poi.hwpf.model.PicturesTable.getAllPictures(PicturesTable.java:191)
at org.apache.tika.parser.microsoft.WordExtractor$PicturesSource.<init>(WordExtractor.java:430)
at org.apache.tika.parser.microsoft.WordExtractor$PicturesSource.<init>(WordExtractor.java:420)
at org.apache.tika.parser.microsoft.WordExtractor.parse(WordExtractor.java:75)
at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:182)
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:197)
... 4 more
I get the following exception when we upload the 'Book1.xlsb' and 'MSPPT2007.thmx'
documents from the attached zip file.
Caused by: org.apache.tika.exception.TikaException: TIKA-418: RuntimeException while getting content for thmx and xps file types
at org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse(OOXMLExtractorFactory.java:86)
at org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:67)
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:197)
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:197)
at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:135)
at org.apache.tika.parser.ParsingReader$ParsingTask.run(ParsingReader.java:232)
... 1 more
Caused by: java.lang.IllegalArgumentException: No supported documents found in the OOXML package (found application/vnd.ms-excel.sheet.binary.macroEnabled.main)
at org.apache.poi.extractor.ExtractorFactory.createExtractor(ExtractorFactory.java:191)
at org.apache.poi.extractor.ExtractorFactory.createExtractor(ExtractorFactory.java:152)
at org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse(OOXMLExtractorFactory.java:65)
... 6 more
I get the following exception when we upload the 'MSPPT2007.xps' document from the
attached zip file.
Caused by: org.apache.tika.exception.TikaException: Error creating OOXML extractor
at org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse(OOXMLExtractorFactory.java:90)
at org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:67)
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:197)
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:197)
at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:135)
at org.apache.tika.parser.ParsingReader$ParsingTask.run(ParsingReader.java:232)
... 1 more
Caused by: java.lang.IllegalArgumentException: Invalid OOXML Package received - expected 1 core document, found 0
at org.apache.poi.extractor.ExtractorFactory.createExtractor(ExtractorFactory.java:161)
at org.apache.poi.extractor.ExtractorFactory.createExtractor(ExtractorFactory.java:152)
at org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse(OOXMLExtractorFactory.java:65)
... 6 more
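For triage: in each trace above the real failure (a NullPointerException in one case, an IllegalArgumentException in the others) is nested under a TikaException wrapper. Below is a minimal sketch, using only the JDK, of unwrapping an exception to its innermost cause so the failures can be grouped by root cause. The class name RootCause is hypothetical, and a plain RuntimeException stands in for TikaException so the snippet runs without Tika or POI on the classpath.

```java
public class RootCause {

    /** Follows getCause() links until the innermost throwable is reached. */
    public static Throwable of(Throwable t) {
        Throwable current = t;
        while (current.getCause() != null) {
            current = current.getCause();
        }
        return current;
    }

    public static void main(String[] args) {
        // Stand-in for the first trace: a wrapper whose cause is an NPE.
        Exception wrapped = new RuntimeException(
                "Unexpected RuntimeException from parser",
                new NullPointerException());

        Throwable root = of(wrapped);
        System.out.println(root.getClass().getName()); // prints java.lang.NullPointerException
    }
}
```

In a real caller, the same helper could be applied in the catch block around the parse call to log the root cause class and message per document.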
Please try to resolve this issue.
Thanks & Regards
Yatin Baraiya
--- Comment #1 from Yegor Kozlov <[email protected]> 2011-10-04 12:27:47 UTC ---
*** This bug has been marked as a duplicate of bug 51921 ***