[ https://issues.apache.org/jira/browse/TIKA-1482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14217218#comment-14217218 ]
Sean Zhao commented on TIKA-1482:
---------------------------------

Hello Nick,

Thank you very much for quick response. And here is the stack trace,

{quote}
org.apache.tika.exception.TikaException: Unexpected error in forked server process
	at org.apache.tika.fork.ForkParser.parse(ForkParser.java:158)
	at com.tika.TikaForkTest.batchExtractFile(TikaForkTest.java:76)
	at com.tika.TikaForkTest.main(TikaForkTest.java:29)
Caused by: java.lang.OutOfMemoryError: Java heap space
	at org.apache.pdfbox.io.RandomAccessBuffer.expandBuffer(RandomAccessBuffer.java:295)
	at org.apache.pdfbox.io.RandomAccessBuffer.write(RandomAccessBuffer.java:262)
	at org.apache.pdfbox.io.RandomAccessFileOutputStream.write(RandomAccessFileOutputStream.java:108)
	at org.apache.pdfbox.filter.FlateFilter.decompress(FlateFilter.java:132)
	at org.apache.pdfbox.filter.FlateFilter.decode(FlateFilter.java:102)
	at org.apache.pdfbox.cos.COSStream.doDecode(COSStream.java:314)
	at org.apache.pdfbox.cos.COSStream.doDecode(COSStream.java:262)
	at org.apache.pdfbox.cos.COSStream.getUnfilteredStream(COSStream.java:188)
	at org.apache.pdfbox.pdmodel.common.COSStreamArray.getUnfilteredStream(COSStreamArray.java:197)
	at org.apache.pdfbox.pdfparser.PDFStreamParser.<init>(PDFStreamParser.java:110)
	at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:251)
	at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:235)
	at org.apache.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:215)
	at org.apache.pdfbox.util.PDFTextStripper.processPage(PDFTextStripper.java:460)
	at org.apache.pdfbox.util.PDFTextStripper.processPages(PDFTextStripper.java:385)
	at org.apache.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:344)
	at org.apache.tika.parser.pdf.PDF2XHTML.process(PDF2XHTML.java:130)
	at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:159)
	at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:244)
	at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:244)
	at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:121)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:601)
	at org.apache.tika.fork.ForkServer.call(ForkServer.java:144)
	at org.apache.tika.fork.ForkServer.processRequests(ForkServer.java:124)
	at org.apache.tika.fork.ForkServer.main(ForkServer.java:69)
{quote}

Best,
Sean

> ForkParser throws exceptions when process some large pdf files
> --------------------------------------------------------------
>
>                 Key: TIKA-1482
>                 URL: https://issues.apache.org/jira/browse/TIKA-1482
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 1.6
>         Environment: Windows 7_x64 / JDK 1.7.0_17
>            Reporter: Sean Zhao
>            Priority: Critical
>             Fix For: 1.6
>
>         Attachments: SRCH-13412.pdf
>
>
> In Tika 1.6, ForkParser throws org.apache.tika.exception.TikaException ,
> message:Unexpected error in forked server process, when parsing some large
> pdf files. While tika 1.3 won't.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)