[jira] [Commented] (TIKA-636) Taking very high heap space while parsing docx - Resulting in OOM in tha app

Jayesh K Rajpurohit (JIRA) Fri, 08 Apr 2011 00:03:52 -0700

    [ https://issues.apache.org/jira/browse/TIKA-636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13017319#comment-13017319 ]


Jayesh K Rajpurohit commented on TIKA-636:
------------------------------------------

Thanks Maxim. Yes, the number of XMLBeans objects is taking its toll.
I tried parsing word/document.xml with local SAX parser code. It produced a 
String of about 3 MB for a 3 MB docx (it looks like some data was repeated), 
whereas Tika produces only about 100 KB for the same file. But the local SAX 
code took only about 3 MB of memory.
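
For anyone who wants to try the same comparison, here is a rough sketch of that kind of SAX pass over word/document.xml. It is not the code I actually ran and not how Tika or POI do it; the class and method names (DocxSaxText, extractTextFromXml, extractTextFromDocx) are made up for illustration, and it assumes Java 7 or later for try-with-resources. It only collects text from w:t elements and adds a newline per w:p, so tables, headers, footnotes and so on are ignored. Because SAX delivers events instead of building an object tree, memory should stay roughly in line with the size of the extracted text.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class, for illustration only.
public class DocxSaxText {

    // WordprocessingML main namespace used by word/document.xml.
    private static final String W_NS =
            "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    /** Appends character data found inside w:t, plus a newline at each w:p end. */
    private static final class TextHandler extends DefaultHandler {
        private final StringBuilder out = new StringBuilder();
        private boolean inText;

        @Override
        public void startElement(String uri, String localName, String qName,
                                 Attributes atts) {
            if (W_NS.equals(uri) && "t".equals(localName)) {
                inText = true;
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if (!W_NS.equals(uri)) {
                return;
            }
            if ("t".equals(localName)) {
                inText = false;
            } else if ("p".equals(localName)) {
                out.append('\n');
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            if (inText) {
                out.append(ch, start, length);
            }
        }

        String text() {
            return out.toString();
        }
    }

    /** Streams the given document.xml content through SAX and returns its text. */
    public static String extractTextFromXml(InputStream documentXml)
            throws IOException, SAXException, ParserConfigurationException {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true); // match on namespace URI, not on the "w:" prefix
        TextHandler handler = new TextHandler();
        factory.newSAXParser().parse(documentXml, handler);
        return handler.text();
    }

    /** Opens the docx as a zip and parses only the word/document.xml entry. */
    public static String extractTextFromDocx(String docxPath)
            throws IOException, SAXException, ParserConfigurationException {
        try (ZipFile zip = new ZipFile(docxPath)) {
            ZipEntry entry = zip.getEntry("word/document.xml");
            if (entry == null) {
                throw new IOException("word/document.xml not found in " + docxPath);
            }
            try (InputStream in = zip.getInputStream(entry)) {
                return extractTextFromXml(in);
            }
        }
    }
}
```

Calling DocxSaxText.extractTextFromDocx with the path to a .docx file returns the body text as one String. If the result is much larger than what Tika returns for the same file, it is worth checking which elements the repeated text comes from before drawing conclusions.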

So when can we expect this to be part of a Tika release?

Thanks!

> Taking very high heap space while parsing docx - Resulting in OOM in tha app
> ----------------------------------------------------------------------------
>
>                 Key: TIKA-636
>                 URL: https://issues.apache.org/jira/browse/TIKA-636
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 0.9
>         Environment: Linux box
> JDK 1.6
>            Reporter: Jayesh K Rajpurohit
>
> I am using the Tika-core-0.9 jar along with the poi 3.2-Final and poi-3.7 jars 
> for parsing documents. While parsing a 3 MB docx, it uses 500 MB of RAM, 
> which is too high and results in an OOM in the application.
> Do I need to tweak something somewhere to reduce the memory consumption?

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
