[ https://issues.apache.org/jira/browse/TIKA-3072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17077416#comment-17077416 ]
spadezhang commented on TIKA-3072:
----------------------------------
I tried this file with tika-1.24, and parsing failed with the following error:
{code}
Apache Tika was unable to parse the document at D:\download\0000431.xls.
The full exception stack trace is included below:
java.lang.OutOfMemoryError: Java heap space
    at java.util.Arrays.copyOf(Arrays.java:3308)
    at java.util.BitSet.ensureCapacity(BitSet.java:337)
    at java.util.BitSet.expandTo(BitSet.java:352)
    at java.util.BitSet.set(BitSet.java:447)
    at de.l3s.boilerpipe.sax.BoilerpipeHTMLContentHandler.characters(BoilerpipeHTMLContentHandler.java:267)
    at org.apache.tika.parser.html.BoilerpipeContentHandler.characters(BoilerpipeContentHandler.java:155)
    at org.apache.tika.sax.TeeContentHandler.characters(TeeContentHandler.java:102)
    at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
    at org.apache.tika.sax.SecureContentHandler.characters(SecureContentHandler.java:270)
    at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
    at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
    at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
    at org.apache.tika.sax.SafeContentHandler.access$001(SafeContentHandler.java:47)
    at org.apache.tika.sax.SafeContentHandler$1.write(SafeContentHandler.java:83)
    at org.apache.tika.sax.SafeContentHandler.filter(SafeContentHandler.java:141)
    at org.apache.tika.sax.SafeContentHandler.characters(SafeContentHandler.java:288)
    at org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:284)
    at org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:311)
    at org.apache.tika.parser.microsoft.TextCell.render(TextCell.java:34)
    at org.apache.tika.parser.microsoft.ExcelExtractor$TikaHSSFListener.processSheet(ExcelExtractor.java:646)
    at org.apache.tika.parser.microsoft.ExcelExtractor$TikaHSSFListener.internalProcessRecord(ExcelExtractor.java:416)
    at org.apache.tika.parser.microsoft.ExcelExtractor$TikaHSSFListener.processRecord(ExcelExtractor.java:367)
    at org.apache.poi.hssf.eventusermodel.FormatTrackingHSSFListener.processRecord(FormatTrackingHSSFListener.java:92)
    at org.apache.tika.parser.microsoft.ExcelExtractor$TikaHSSFListener$TikaFormatTrackingHSSFListener.processRecord(ExcelExtractor.java:689)
    at org.apache.poi.hssf.eventusermodel.HSSFRequest.processRecord(HSSFRequest.java:106)
    at org.apache.poi.hssf.eventusermodel.HSSFEventFactory.genericProcessEvents(HSSFEventFactory.java:172)
    at org.apache.poi.hssf.eventusermodel.HSSFEventFactory.processEvents(HSSFEventFactory.java:129)
    at org.apache.tika.parser.microsoft.ExcelExtractor$TikaHSSFListener.processFile(ExcelExtractor.java:343)
    at org.apache.tika.parser.microsoft.ExcelExtractor.parse(ExcelExtractor.java:172)
    at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:183)
    at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:131)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
{code}
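The top frames of the trace (BitSet.set, BitSet.expandTo, BitSet.ensureCapacity, Arrays.copyOf) show the heap running out while a java.util.BitSet was being grown. As a rough illustration only, and not a reproduction of this issue, the sketch below uses nothing but the JDK to show that a BitSet allocates storage in proportion to the highest index set, not to the number of bits set. The class name BitSetGrowthDemo and the method allocatedBitsAfterSet are hypothetical names made up for this example; I have not checked which index boilerpipe actually passes to BitSet.set for this file.

```java
import java.util.BitSet;

public class BitSetGrowthDemo {

    // Hypothetical helper: sets a single bit at the given index in a fresh
    // BitSet and returns how many bits of storage the BitSet now holds.
    public static int allocatedBitsAfterSet(int index) {
        BitSet bits = new BitSet();
        bits.set(index);      // grows the backing long[] to cover the index
        return bits.size();   // bits of space in use, always > index here
    }

    public static void main(String[] args) {
        int index = 1_000_000;
        BitSet bits = new BitSet();
        bits.set(index);
        // Only one bit is set, but the storage covers every index up to it.
        System.out.println("bits set:       " + bits.cardinality());
        System.out.println("bits allocated: " + bits.size());
    }
}
```

If the index handed to BitSet.set keeps increasing with the amount of text seen, the backing array keeps being copied into larger arrays, which would be consistent with the Arrays.copyOf frame at the top of the trace.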
> Seeing org.apache.tika.exception.TikaException: Unexpected RuntimeException
> for an XLS file
> -------------------------------------------------------------------------------------------
>
> Key: TIKA-3072
> URL: https://issues.apache.org/jira/browse/TIKA-3072
> Project: Tika
> Issue Type: Bug
> Reporter: Muhammad Yasir Khan
> Priority: Major
> Attachments: 0000431.xls
>
>
> [^0000431.xls]
> {code:java}
> org.apache.tika.exception.TikaException: Unexpected RuntimeException from
> org.apache.tika.parser.microsoft.OfficeParser@5d216317
> org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:282)
> org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
> org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
> org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:159)
> {code}
--
This message was sent by Atlassian Jira
(v8.3.4#803005)