[ https://issues.apache.org/jira/browse/TIKA-1482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14217218#comment-14217218 ]

Sean Zhao commented on TIKA-1482:
---------------------------------

Hello Nick,
Thank you very much for the quick response. Here is the stack trace:
{quote}
org.apache.tika.exception.TikaException: Unexpected error in forked server process
        at org.apache.tika.fork.ForkParser.parse(ForkParser.java:158)
        at com.tika.TikaForkTest.batchExtractFile(TikaForkTest.java:76)
        at com.tika.TikaForkTest.main(TikaForkTest.java:29)
Caused by: java.lang.OutOfMemoryError: Java heap space
        at org.apache.pdfbox.io.RandomAccessBuffer.expandBuffer(RandomAccessBuffer.java:295)
        at org.apache.pdfbox.io.RandomAccessBuffer.write(RandomAccessBuffer.java:262)
        at org.apache.pdfbox.io.RandomAccessFileOutputStream.write(RandomAccessFileOutputStream.java:108)
        at org.apache.pdfbox.filter.FlateFilter.decompress(FlateFilter.java:132)
        at org.apache.pdfbox.filter.FlateFilter.decode(FlateFilter.java:102)
        at org.apache.pdfbox.cos.COSStream.doDecode(COSStream.java:314)
        at org.apache.pdfbox.cos.COSStream.doDecode(COSStream.java:262)
        at org.apache.pdfbox.cos.COSStream.getUnfilteredStream(COSStream.java:188)
        at org.apache.pdfbox.pdmodel.common.COSStreamArray.getUnfilteredStream(COSStreamArray.java:197)
        at org.apache.pdfbox.pdfparser.PDFStreamParser.<init>(PDFStreamParser.java:110)
        at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:251)
        at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:235)
        at org.apache.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:215)
        at org.apache.pdfbox.util.PDFTextStripper.processPage(PDFTextStripper.java:460)
        at org.apache.pdfbox.util.PDFTextStripper.processPages(PDFTextStripper.java:385)
        at org.apache.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:344)
        at org.apache.tika.parser.pdf.PDF2XHTML.process(PDF2XHTML.java:130)
        at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:159)
        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:244)
        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:244)
        at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:121)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:601)
        at org.apache.tika.fork.ForkServer.call(ForkServer.java:144)
        at org.apache.tika.fork.ForkServer.processRequests(ForkServer.java:124)
        at org.apache.tika.fork.ForkServer.main(ForkServer.java:69)
{quote}

Best,
Sean

> ForkParser throws exceptions when process some large pdf files
> --------------------------------------------------------------
>
>                 Key: TIKA-1482
>                 URL: https://issues.apache.org/jira/browse/TIKA-1482
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 1.6
>         Environment: Windows 7_x64 / JDK 1.7.0_17
>            Reporter: Sean Zhao
>            Priority: Critical
>             Fix For: 1.6
>
>         Attachments: SRCH-13412.pdf
>
>
> In Tika 1.6, ForkParser throws org.apache.tika.exception.TikaException with
> the message "Unexpected error in forked server process" when parsing some
> large PDF files. Tika 1.3 does not.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
