[ 
https://issues.apache.org/jira/browse/TIKA-3072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17077416#comment-17077416
 ] 

spadezhang commented on TIKA-3072:
----------------------------------

I tried this file with Tika 1.24, and parsing failed with the following error:
{code}
Apache Tika was unable to parse the document at D:\download\0000431.xls.
The full exception stack trace is included below:
java.lang.OutOfMemoryError: Java heap space
    at java.util.Arrays.copyOf(Arrays.java:3308)
    at java.util.BitSet.ensureCapacity(BitSet.java:337)
    at java.util.BitSet.expandTo(BitSet.java:352)
    at java.util.BitSet.set(BitSet.java:447)
    at de.l3s.boilerpipe.sax.BoilerpipeHTMLContentHandler.characters(BoilerpipeHTMLContentHandler.java:267)
    at org.apache.tika.parser.html.BoilerpipeContentHandler.characters(BoilerpipeContentHandler.java:155)
    at org.apache.tika.sax.TeeContentHandler.characters(TeeContentHandler.java:102)
    at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
    at org.apache.tika.sax.SecureContentHandler.characters(SecureContentHandler.java:270)
    at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
    at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
    at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
    at org.apache.tika.sax.SafeContentHandler.access$001(SafeContentHandler.java:47)
    at org.apache.tika.sax.SafeContentHandler$1.write(SafeContentHandler.java:83)
    at org.apache.tika.sax.SafeContentHandler.filter(SafeContentHandler.java:141)
    at org.apache.tika.sax.SafeContentHandler.characters(SafeContentHandler.java:288)
    at org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:284)
    at org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:311)
    at org.apache.tika.parser.microsoft.TextCell.render(TextCell.java:34)
    at org.apache.tika.parser.microsoft.ExcelExtractor$TikaHSSFListener.processSheet(ExcelExtractor.java:646)
    at org.apache.tika.parser.microsoft.ExcelExtractor$TikaHSSFListener.internalProcessRecord(ExcelExtractor.java:416)
    at org.apache.tika.parser.microsoft.ExcelExtractor$TikaHSSFListener.processRecord(ExcelExtractor.java:367)
    at org.apache.poi.hssf.eventusermodel.FormatTrackingHSSFListener.processRecord(FormatTrackingHSSFListener.java:92)
    at org.apache.tika.parser.microsoft.ExcelExtractor$TikaHSSFListener$TikaFormatTrackingHSSFListener.processRecord(ExcelExtractor.java:689)
    at org.apache.poi.hssf.eventusermodel.HSSFRequest.processRecord(HSSFRequest.java:106)
    at org.apache.poi.hssf.eventusermodel.HSSFEventFactory.genericProcessEvents(HSSFEventFactory.java:172)
    at org.apache.poi.hssf.eventusermodel.HSSFEventFactory.processEvents(HSSFEventFactory.java:129)
    at org.apache.tika.parser.microsoft.ExcelExtractor$TikaHSSFListener.processFile(ExcelExtractor.java:343)
    at org.apache.tika.parser.microsoft.ExcelExtractor.parse(ExcelExtractor.java:172)
    at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:183)
    at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:131)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
{code}
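
The top frames of the trace (BitSet.set, expandTo, ensureCapacity, Arrays.copyOf) show the heap running out while a java.util.BitSet was growing. As a minimal sketch of that mechanism only, using nothing but the JDK: a BitSet's backing array has to cover the highest bit index that is set, and growing it copies the array. The class and method names below are hypothetical, chosen for illustration; this is not code from Tika or boilerpipe, and it does not reproduce the failure on 0000431.xls.

```java
import java.util.BitSet;

public class BitSetGrowth {
    /**
     * Hypothetical helper: returns how many bits of storage a fresh BitSet
     * holds after a single bit at the given index has been set.
     */
    static int allocatedBitsAfterSet(int index) {
        BitSet bits = new BitSet();   // starts with one 64-bit word
        bits.set(index);              // grows the backing array if index does not fit
        return bits.size();           // bits of storage actually in use
    }

    public static void main(String[] args) {
        // Storage scales with the highest index set, not with the number of bits set.
        for (int index : new int[] {10, 1_000_000, 100_000_000}) {
            int allocated = allocatedBitsAfterSet(index);
            System.out.printf("set(%d): about %d bytes of backing storage%n",
                    index, allocated / 8);
        }
    }
}
```

If the index passed to set() keeps climbing while a large sheet is processed, the storage climbs with it, which would be consistent with an OutOfMemoryError surfacing in Arrays.copyOf; whether that is what happens for this particular file is an assumption, not something the trace alone confirms.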

> Seeing org.apache.tika.exception.TikaException: Unexpected RuntimeException 
> for an XLS file
> -------------------------------------------------------------------------------------------
>
>                 Key: TIKA-3072
>                 URL: https://issues.apache.org/jira/browse/TIKA-3072
>             Project: Tika
>          Issue Type: Bug
>            Reporter: Muhammad Yasir Khan
>            Priority: Major
>         Attachments: 0000431.xls
>
>
> [^0000431.xls]
> {code:java}
> org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.microsoft.OfficeParser@5d216317
> org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:282)
> org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
> org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
> org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:159)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)