[ https://issues.apache.org/jira/browse/TIKA-636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13080955#comment-13080955 ]
Nick Burch commented on TIKA-636:
---------------------------------

Please remember - Tika is a volunteer project. If this bug matters to you, please help us with working on it.

As Maxim has pointed out, we'd need an event-based parser for DOCX files, much as we already have for XLSX. The existing POI usermodel code could likely be used for the other streams to make life easy, but the document.xml part will want to be SAX parsed.

> Taking very high heap space while parsing docx - Resulting in OOM in tha app
> ----------------------------------------------------------------------------
>
>                 Key: TIKA-636
>                 URL: https://issues.apache.org/jira/browse/TIKA-636
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 0.9
>         Environment: Linux box
>                      JDK 1.6
>            Reporter: Jayesh K Rajpurohit
>
> I am using the Tika-core-0.9 jar together with the poi 3.2-Final and poi-3.7 jars for
> parsing documents. While parsing a 3 MB docx it uses 500 MB of RAM,
> which is too high and results in an OOM in the application.
> Is there something I have to tweak to reduce the memory consumption?
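
For illustration only, here is a minimal sketch of what "SAX parsing the document.xml part" could look like using nothing but the JDK. It is not Tika's or POI's implementation. The class name DocxSaxTextSketch and the method extractText are hypothetical, and the handler is deliberately simplistic: it only copies the contents of w:t elements and emits a newline at the end of each w:p, ignoring tabs, breaks, tables, headers, footnotes and every other package part. The point is that text is pushed to a Writer as events arrive, so no in-memory tree of the whole document is built.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.Writer;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch, not part of Tika or POI.
final class DocxSaxTextSketch {

    // WordprocessingML main namespace used by word/document.xml.
    private static final String W_NS =
            "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Streams the text of word/document.xml from a .docx (zip) stream to 'out'.
    static void extractText(InputStream docx, Writer out) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        SAXParser parser = factory.newSAXParser();

        try (ZipInputStream zip = new ZipInputStream(docx)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if ("word/document.xml".equals(entry.getName())) {
                    // The zip stream reports end-of-input at the end of this entry.
                    parser.parse(new InputSource(zip), new TextHandler(out));
                    break; // the parser may have closed the stream; stop here
                }
            }
        }
        out.flush();
    }

    // Forwards character data inside w:t elements; newline after each w:p.
    private static final class TextHandler extends DefaultHandler {
        private final Writer out;
        private boolean inText;

        TextHandler(Writer out) {
            this.out = out;
        }

        @Override
        public void startElement(String uri, String localName, String qName,
                                 Attributes attributes) {
            if (W_NS.equals(uri) && "t".equals(localName)) {
                inText = true;
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName)
                throws SAXException {
            if (!W_NS.equals(uri)) {
                return;
            }
            if ("t".equals(localName)) {
                inText = false;
            } else if ("p".equals(localName)) {
                write(new char[] {'\n'}, 0, 1);
            }
        }

        @Override
        public void characters(char[] ch, int start, int length)
                throws SAXException {
            if (inText) {
                write(ch, start, length);
            }
        }

        private void write(char[] ch, int start, int length) throws SAXException {
            try {
                out.write(ch, start, length);
            } catch (IOException e) {
                throw new SAXException(e);
            }
        }
    }
}
```

A caller would pass a FileInputStream for the .docx and any Writer, for example an OutputStreamWriter around System.out. Memory use then depends mainly on the parser's buffers and the size of individual text runs rather than on the size of the whole document, which is the property the comment above is asking for.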