[ 
https://issues.apache.org/jira/browse/TIKA-734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13152405#comment-13152405
 ] 

Jukka Zitting commented on TIKA-734:
------------------------------------

Did you see the parse() method [1] that returns a java.io.Reader instead of a 
String? That should achieve the same thing you're doing.

Note however that only some of the parsers in Tika support such streaming. 
Others like the MS Office parser will in any case parse the entire input 
document or at least significant parts of it before starting to output any of 
the extracted content.

[1] http://tika.apache.org/1.0/api/org/apache/tika/Tika.html#parse(java.io.File)
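
For illustration, here is a minimal sketch of how the Reader returned by parse() could be consumed line by line while looking for a pattern, so that the extracted text is never collected into a single String. The class name, the findMatches helper, the "INV-\d+" pattern and the "big.xlsx" file name are made up for this example. A StringReader stands in for the Tika call so the sketch runs without Tika on the classpath. A pattern that spans line breaks would not be found this way.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ReaderPatternScan {

    // Reads the text one line at a time and collects every match of the pattern.
    // Only the current line and the matches found so far are kept in memory.
    static List<String> findMatches(Reader source, Pattern pattern) throws IOException {
        List<String> matches = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                Matcher m = pattern.matcher(line);
                while (m.find()) {
                    matches.add(m.group());
                }
            }
        }
        return matches;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in input (assumption for this sketch). With Tika available,
        // the Reader would instead come from the facade, for example:
        //   Reader source = new org.apache.tika.Tika().parse(new java.io.File("big.xlsx"));
        Reader source = new StringReader(
                "row 1: INV-1001\nrow 2: none\nrow 3: INV-1002 INV-1003\n");

        for (String hit : findMatches(source, Pattern.compile("INV-\\d+"))) {
            System.out.println(hit);
        }
    }
}
```

As noted above, this only limits the memory used for the extracted text. A parser that does not stream, such as the MS Office one, may still load most of the document before any output reaches the Reader.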
                
> Out of memory exception with Xlsx file less than 5 MB
> -----------------------------------------------------
>
>                 Key: TIKA-734
>                 URL: https://issues.apache.org/jira/browse/TIKA-734
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 0.7
>         Environment: Windows Vista, JUnit test cases running in RAD, JVM 
> heap memory - 500MB
>            Reporter: Anirban Mitra
>         Attachments: Sample BIG Excel 2007 File.xls
>
>
> I am trying to parse and extract a pattern from Xlsx files. I tried using a 5 
> MB file, and when I run my JUnit test cases they fail with an out-of-memory 
> (heap space) exception. Is there any resolution for this?


        
