[ 
https://issues.apache.org/jira/browse/TIKA-521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13039719#comment-13039719
 ] 

Ken Krugler commented on TIKA-521:
----------------------------------

Tika CLI uses BoilerpipeContentHandler in its regular mode (markup not
included). In that mode the content handler essentially just dispatches to the
Boilerpipe package, so any memory issues would be in that third-party code base.
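
To illustrate the kind of delegation described here, the following is a minimal
sketch and not Tika's or Boilerpipe's actual code. DelegatingTextHandler and its
"extractor" function are hypothetical stand-ins, and the sketch assumes the
handler simply collects character data and hands it to the external library
once the document ends. Under that assumption, most of the memory is used
inside the delegate rather than in the handler itself.

```java
import java.io.StringReader;
import java.util.function.Function;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: it does no extraction work of its own and only
// forwards the collected text to a delegate (a stand-in for a third-party library).
public class DelegatingTextHandler extends DefaultHandler {
    private final StringBuilder buffer = new StringBuilder();
    private final Function<String, String> extractor; // assumed third-party entry point
    private String result = "";

    public DelegatingTextHandler(Function<String, String> extractor) {
        this.extractor = extractor;
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // Collect raw character data; element markup is ignored.
        buffer.append(ch, start, length);
    }

    @Override
    public void endDocument() {
        // Dispatch everything collected so far to the delegate in one call.
        result = extractor.apply(buffer.toString());
    }

    public String getResult() {
        return result;
    }

    public static void main(String[] args) throws Exception {
        // String::trim stands in for the real extraction logic.
        DelegatingTextHandler handler = new DelegatingTextHandler(String::trim);
        SAXParserFactory.newInstance().newSAXParser().parse(
                new InputSource(new StringReader("<doc><p> hello </p><p>world </p></doc>")),
                handler);
        System.out.println(handler.getResult());
    }
}
```

With a structure like this, profiling the delegate call in endDocument() would
be the natural place to start when looking for excessive memory use.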

> OutOfMemoryError Parsing XSLX File
> ----------------------------------
>
>                 Key: TIKA-521
>                 URL: https://issues.apache.org/jira/browse/TIKA-521
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 0.7, 0.8
>            Reporter: Stephen Duncan Jr
>            Assignee: Nick Burch
>             Fix For: 1.0
>
>         Attachments: Out of memory issue in 1.0.jpg, Out of memory issue in 
> 1.0.jpg, TikaExcelEventBasedExtraction.diff, memory-test.xlsx, tika-diff.txt, 
> tika-new-files.tar.bz2
>
>
> I have several XLSX files I'm trying to parse with Tika that are failing with 
> an OutOfMemoryError even when using a large heap size. For instance, the 
> attached 1.26MB Excel file fails using a 512MB heap.

