[ 
https://issues.apache.org/jira/browse/TIKA-636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13080955#comment-13080955
 ] 

Nick Burch commented on TIKA-636:
---------------------------------

Please remember that Tika is a volunteer project.

If this bug matters to you, please help us work on it. As Maxim has 
pointed out, we'd need an event-based parser for DOCX files, much as we already 
have for XLSX. The existing POI usermodel code could likely be used for the other 
streams to make life easy, but the document.xml part will need to be SAX parsed.
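
To illustrate what SAX parsing the document.xml part could look like, here is a
minimal sketch that uses only the JDK (java.util.zip plus the built-in SAX
parser) rather than POI or Tika. The class name DocxTextSketch and the method
extractText are hypothetical and are not part of either library. It assumes the
main part lives at word/document.xml and uses the transitional WordprocessingML
namespace, and it collects text into a StringBuilder only to keep the example
short. A real parser would forward events to a ContentHandler instead.

```java
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch: stream word/document.xml through SAX instead of
// building an in-memory object model of the whole document.
public class DocxTextSketch {
    // Assumption: transitional OOXML namespace for WordprocessingML.
    private static final String W_NS =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    public static String extractText(String path) throws Exception {
        final StringBuilder out = new StringBuilder();
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        SAXParser parser = factory.newSAXParser();

        try (ZipFile zip = new ZipFile(path)) {
            // Assumption: the main document part is at the conventional location.
            ZipEntry entry = zip.getEntry("word/document.xml");
            if (entry == null) {
                throw new IllegalArgumentException("no word/document.xml in " + path);
            }
            try (InputStream in = zip.getInputStream(entry)) {
                parser.parse(in, new DefaultHandler() {
                    private boolean inText; // true while inside a w:t element

                    @Override
                    public void startElement(String uri, String localName,
                                             String qName, Attributes atts) {
                        if (W_NS.equals(uri) && "t".equals(localName)) {
                            inText = true;
                        }
                    }

                    @Override
                    public void endElement(String uri, String localName, String qName) {
                        if (!W_NS.equals(uri)) {
                            return;
                        }
                        if ("t".equals(localName)) {
                            inText = false;
                        } else if ("p".equals(localName)) {
                            out.append('\n'); // one line per paragraph
                        }
                    }

                    @Override
                    public void characters(char[] ch, int start, int length) {
                        if (inText) {
                            out.append(ch, start, length);
                        }
                    }
                });
            }
        }
        return out.toString();
    }
}
```

Because the handler only keeps a boolean flag between events, the parser's
working memory should not grow with the size of document.xml. Only the
extracted text itself is retained.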

> Taking very high heap space while parsing docx - Resulting in OOM in the app
> ----------------------------------------------------------------------------
>
>                 Key: TIKA-636
>                 URL: https://issues.apache.org/jira/browse/TIKA-636
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 0.9
>         Environment: Linux box
> JDK 1.6
>            Reporter: Jayesh K Rajpurohit
>
> I am using the Tika-core-0.9 jar together with the poi 3.2-Final and poi-3.7 
> jars to parse documents. While parsing a 3 MB docx it uses 500 MB of RAM, 
> which is too high and results in an OOM in the application.
> Is there a setting I can tweak to reduce the memory consumption?

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
