[ https://issues.apache.org/jira/browse/TIKA-3097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17139875#comment-17139875 ]
Tim Allison commented on TIKA-3097:
-----------------------------------

Not that I'm aware of. If you build your own (extending org.apache.tika.sax.ContentHandlerDecorator?), make sure to use a buffer, because there's no guarantee that {{characters(char[] ch, int start, int length)}} will be called on a logical unit of text.

> Out of memory while parsing docx
> --------------------------------
>
>                 Key: TIKA-3097
>                 URL: https://issues.apache.org/jira/browse/TIKA-3097
>             Project: Tika
>          Issue Type: Bug
>          Components: core, parser
>    Affects Versions: 1.24
>            Reporter: suchendra
>            Priority: Major
>         Attachments: Screenshot from 2020-05-07 08-14-25.png, samplefile.txt, test.docx
>
>
> I have written simple Scala code to extract the content from an uploaded
> docx file. The JVM goes OOM when Tika tries to parse the file. I configured
> the JVM heap to 1GB and also tried 2GB; the same issue occurs both with the
> jar and in my code.
> The file is attached for reference.
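
To illustrate the buffering advice in the comment above, here is a minimal sketch. It extends the JDK's org.xml.sax.helpers.DefaultHandler so that it compiles without Tika on the classpath; as the comment suggests, a Tika-specific version might extend org.apache.tika.sax.ContentHandlerDecorator instead. The class name BufferingParagraphHandler, the sink callback, and the choice of the "p" element as the "logical unit of text" are assumptions made for illustration and are not part of Tika.

```java
import java.util.function.Consumer;

import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

/**
 * Hypothetical handler: collects the text of each "p" element in a buffer
 * and hands the complete text to a callback when the element closes.
 */
class BufferingParagraphHandler extends DefaultHandler {

    // Accumulates text across however many characters() calls the parser makes.
    private final StringBuilder buffer = new StringBuilder();

    // Receives one complete paragraph at a time (assumed callback, not a Tika API).
    private final Consumer<String> sink;

    BufferingParagraphHandler(Consumer<String> sink) {
        this.sink = sink;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if ("p".equals(localName)) {
            // Start of a new logical unit: discard anything left over.
            buffer.setLength(0);
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // A single paragraph may arrive in several pieces, so only append here.
        buffer.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("p".equals(localName)) {
            // The paragraph is complete: emit it and release the buffered text.
            sink.accept(buffer.toString());
            buffer.setLength(0);
        }
    }
}
```

With this shape, only the text of the current paragraph is held by the handler at any time; what the sink does with each paragraph (write it out, index it, drop it) is up to the caller. Whether that helps with the memory use reported in this issue depends on where the memory is actually being consumed.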