[jira] [Commented] (TIKA-636) Taking very high heap space while parsing docx - Resulting in OOM in tha app

2011-10-05 Thread Jukka Zitting (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13121157#comment-13121157
 ] 

Jukka Zitting commented on TIKA-636:


Do you still see this problem with Tika 0.10? If yes, please attach an example 
file that can be used to reproduce the issue.

> Taking very high heap space while parsing docx - Resulting in OOM in tha app
> 
>
> Key: TIKA-636
> URL: https://issues.apache.org/jira/browse/TIKA-636
> Project: Tika
> Issue Type: Bug
> Components: parser
> Affects Versions: 0.9
> Environment: Linux box, JDK 1.6
> Reporter: Jayesh K Rajpurohit
>
> I am using the Tika-core-0.9 jar together with the poi 3.2-Final and poi-3.7 
> jars for parsing documents. While parsing a 3 MB docx, it uses 500 MB of RAM, 
> which is too high and results in an OOM in the application.
> Is there something I can tweak to reduce the memory consumption?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (TIKA-636) Taking very high heap space while parsing docx - Resulting in OOM in tha app

2011-08-08 Thread Jukka Zitting (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13080988#comment-13080988
 ] 

Jukka Zitting commented on TIKA-636:


As a related point, see TIKA-416 for a solution that can be used to prevent an 
OOM caused by a parsing process from wreaking havoc in your JVM. Instead of 
reducing memory consumption, TIKA-416 sandboxes the parser to a separate JVM 
process where it can safely fail with OOM or other errors.
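
To illustrate the sandboxing idea described above, here is a minimal sketch 
that starts a second JVM with its own heap cap, so that an OOM in the parsing 
code ends only the child process. It uses plain java.lang.ProcessBuilder 
(Java 8+) and is not the TIKA-416 implementation. The class name 
ChildJvmSandboxSketch, the worker class passed in, and the heap and timeout 
values are all assumptions made for this example.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper for illustration only; not part of Tika. */
class ChildJvmSandboxSketch {

    /** Builds the command line for a child JVM with its own heap limit. */
    static List<String> buildCommand(String workerClass, int maxHeapMb, String inputPath) {
        String javaBin = System.getProperty("java.home")
                + File.separator + "bin" + File.separator + "java";
        List<String> cmd = new ArrayList<>();
        cmd.add(javaBin);
        cmd.add("-Xmx" + maxHeapMb + "m");               // heap cap applies to the child only
        cmd.add("-cp");
        cmd.add(System.getProperty("java.class.path"));  // reuse the parent's classpath
        cmd.add(workerClass);                            // assumed: a class with main() that parses one file
        cmd.add(inputPath);
        return cmd;
    }

    /**
     * Runs the command and reports whether the child exited with status 0
     * within the timeout. A child that fails (for example with an
     * OutOfMemoryError) or hangs does not take the calling JVM down with it.
     */
    static boolean runSandboxed(List<String> command, long timeoutSeconds)
            throws IOException, InterruptedException {
        Process child = new ProcessBuilder(command).inheritIO().start();
        if (!child.waitFor(timeoutSeconds, TimeUnit.SECONDS)) {
            child.destroyForcibly();                     // give up on a hung parse
            return false;
        }
        return child.exitValue() == 0;
    }
}
```

A caller would pass the name of its own worker class, for example 
runSandboxed(buildCommand("ParseWorker", 256, "sample.docx"), 60), and treat a 
false result as a failed parse of that one document.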






[jira] [Commented] (TIKA-636) Taking very high heap space while parsing docx - Resulting in OOM in tha app

2011-08-08 Thread Nick Burch (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13080955#comment-13080955
 ] 

Nick Burch commented on TIKA-636:

Please remember that Tika is a volunteer project.

If this bug matters to you, please help us work on it. As Maxim has 
pointed out, we'd need an event-based parser for DOCX files, much as we already 
have for XLSX. The existing POI usermodel code could likely be used for the 
other streams to make life easy, but the document.xml part will need to be SAX 
parsed.
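
As a rough illustration of what SAX parsing the document.xml part could look 
like, the sketch below streams word/document.xml out of the docx zip and 
collects the text of the w:t elements without building an object tree. It uses 
only JDK classes; the class name DocxSaxTextSketch is made up for this example, 
and this is not the parser that Tika or POI ships. It ignores tables, 
headers, footnotes and the other streams mentioned above.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical helper for illustration only; not part of Tika or POI. */
class DocxSaxTextSketch {

    private static final String W_NS =
            "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    /** Collects the text of w:t elements and adds a newline after each w:p. */
    static class TextHandler extends DefaultHandler {
        final StringBuilder text = new StringBuilder();
        private boolean inText;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            if (W_NS.equals(uri) && "t".equals(localName)) {
                inText = true;
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            if (inText) {
                text.append(ch, start, length);
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if (!W_NS.equals(uri)) {
                return;
            }
            if ("t".equals(localName)) {
                inText = false;
            } else if ("p".equals(localName)) {
                text.append('\n');   // one line per paragraph
            }
        }
    }

    /** Streams word/document.xml from the zip and returns its plain text. */
    static String extractText(InputStream docx)
            throws IOException, SAXException, ParserConfigurationException {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        // Reject DOCTYPE declarations, since the input may be untrusted.
        factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        try (ZipInputStream zip = new ZipInputStream(docx)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if ("word/document.xml".equals(entry.getName())) {
                    TextHandler handler = new TextHandler();
                    // The zip stream reports end-of-file at the end of this entry.
                    factory.newSAXParser().parse(new InputSource(zip), handler);
                    return handler.text.toString();
                }
            }
        }
        throw new IOException("word/document.xml not found");
    }
}
```

Here memory grows with the extracted text held in the StringBuilder rather 
than with the number of XML nodes; a real parser would more likely forward the 
events to a ContentHandler instead of buffering a String.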






[jira] [Commented] (TIKA-636) Taking very high heap space while parsing docx - Resulting in OOM in tha app

2011-08-08 Thread Nicholas Dodd (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13080944#comment-13080944
 ] 

Nicholas Dodd commented on TIKA-636:


I am really surprised this is not scheduled for the 1.0 release. We are also 
seeing 500 MB RAM usage for small docx files. This is simply not a shippable 
bug!






[jira] [Commented] (TIKA-636) Taking very high heap space while parsing docx - Resulting in OOM in tha app

2011-04-08 Thread Jayesh K Rajpurohit (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13017322#comment-13017322
 ] 

Jayesh K Rajpurohit commented on TIKA-636:

To clarify, what I meant is: when can we expect the fix for the OOM issue as 
part of a Tika release? Thanks.




[jira] [Commented] (TIKA-636) Taking very high heap space while parsing docx - Resulting in OOM in tha app

2011-04-08 Thread Jayesh K Rajpurohit (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13017319#comment-13017319
 ] 

Jayesh K Rajpurohit commented on TIKA-636:

Thanks Maxim. Yes, the number of xmlbeans objects is taking its toll.
I tried parsing word/document.xml with my own local SAX parser code. It 
produced a 3 MB String for a 3 MB docx (it looks like some data was repeated), 
whereas Tika produces only 100 KB for the same file. But my own code took only 
3 MB.

So when can we expect this as part of a Tika release?

Thanks!




[jira] [Commented] (TIKA-636) Taking very high heap space while parsing docx - Resulting in OOM in tha app

2011-04-07 Thread Maxim Valyanskiy (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13017311#comment-13017311
 ] 

Maxim Valyanskiy commented on TIKA-636:

This is a known problem in POI. AFAIK there is no event model for parsing docx, 
so we have to build the complete object (xmlbean) tree to process it.

