Sorry for cross-posting, but the Tika mailing list does not seem to be very
active.

I am trying to use the ForkParser. Unfortunately, I get "Lost connection to a
forked server process" for an (encrypted) PDF that I can extract in-process.
Extracting the document in-process takes approx. 40 s (!). Extracting other
(smaller) documents works with the ForkParser.

Memory should not be the problem, since the forked JVM gets a 2 GB heap:
forkParser.setJavaCommand("java -Xmx2048m -Xdebug");

When I run the unit test with the ForkParser, the test stops after about
10 seconds. The console output looks like this:
...
SLF4J: Found binding in [tika-in-memory://localhost/3]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type 
[ch.qos.logback.classic.util.ContextSelectorStaticBinder]
07:28:01.909 [main] INFO  o.apache.pdfbox.pdfparser.PDFParser - Document is 
encrypted
07:28:02.239 [main] DEBUG o.a.p.p.PDFObjectStreamParser - parsed=COSObject{706, 
0}
07:28:02.239 [main] DEBUG o.a.p.p.PDFObjectStreamParser - parsed=COSObject{707, 
0}
07:28:02.239 [main] DEBUG o.a.p.p.PDFObjectStreamParser - parsed=COSObject{708, 
0} ...
07:28:02.249 [main] DEBUG o.a.p.p.PDFObjectStreamParser - parsed=COSObject{752, 
0}
07:28:02.249 [main] DEBUG o.a.p.p.PDFObjectStreamParser - parsed=COSObject{753, 
0}
07:28:02.249 [main] DEBUG o.a.p.p.PDFObjectStreamParser - parsed=COSObject{754, 
0}
07:28:11.465 [main] ERROR ch.mysign.sky.indexing.IndexUtility - failed to 
extract text from input stream
org.apache.tika.exception.TikaException: Failed to communicate with a forked 
parser process. The process has most likely crashed due to some error like 
running out of memory. A new process will be started for the next parsing 
request.
        at org.apache.tika.fork.ForkParser.parse(ForkParser.java:142) 
~[tika-core.jar:1.7]
        at 
ch.mysign.sky.indexing.IndexUtility.extractTextFrom(IndexUtility.java:158) 
[target/:na]
        at 
ch.mysign.sky.indexing.IndexUtility.extractTextFrom(IndexUtility.java:84) 
[target/:na]
        at 
ch.mysign.sky.indexing.IndexUtility.extractTextFrom(IndexUtility.java:70) 
[target/:na]
        at 
ch.mysign.sky.indexing.IndexUtilityTest.diesesPdfAuslesenDauertEwig(IndexUtilityTest.java:193)
 [target/:na]
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
~[na:1.8.0_25]
        at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
~[na:1.8.0_25]
        at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 ~[na:1.8.0_25] ...
        at org.junit.runners.ParentRunner.run(ParentRunner.java:309) 
[selenium-server-standalone.jar:na]
        at 
org.eclipse.jdt.internal.junit4.runner.JUnit4TestReference.run(JUnit4TestReference.java:50)
 [.cp/:na]
        at 
org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38) 
[.cp/:na]
        at 
org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:459)
 [.cp/:na]
        at 
org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:675)
 [.cp/:na]
        at 
org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:382)
 [.cp/:na]
        at 
org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:192)
 [.cp/:na]
Caused by: java.io.IOException: Lost connection to a forked server process
        at org.apache.tika.fork.ForkClient.waitForResponse(ForkClient.java:191) 
~[tika-core.jar:1.7]
        at org.apache.tika.fork.ForkClient.call(ForkClient.java:125) 
~[tika-core.jar:1.7]
        at org.apache.tika.fork.ForkParser.parse(ForkParser.java:134) 
~[tika-core.jar:1.7]
        ... 38 common frames omitted

Am I running into a timeout here? What else can I investigate?
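
The log above shows roughly 9.5 seconds between the first PDFBox message
(07:28:01.909) and the error (07:28:11.465). One thing I could check is whether
the failure always occurs after the same elapsed time, which would hint at a
timeout rather than a crash. Below is a minimal sketch for that, using only the
JDK. The TimedCall class and the simulated failing call are hypothetical
stand-ins; in the real test the lambda would wrap the extraction call.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

/** Hypothetical helper: times a call and records whether it failed. */
final class TimedCall {

    /** Outcome of one timed call. */
    static final class Result {
        final long elapsedMillis;
        final Exception failure; // null if the call succeeded

        Result(long elapsedMillis, Exception failure) {
            this.elapsedMillis = elapsedMillis;
            this.failure = failure;
        }

        boolean failed() {
            return failure != null;
        }
    }

    /** Runs the call once and measures wall-clock time until it returns or throws. */
    static Result run(Callable<?> call) {
        long start = System.nanoTime();
        Exception failure = null;
        try {
            call.call();
        } catch (Exception e) {
            failure = e;
        }
        long elapsedMillis = (System.nanoTime() - start) / 1_000_000L;
        return new Result(elapsedMillis, failure);
    }

    public static void main(String[] args) {
        // Assumption: this simulated call stands in for the real extraction call.
        Callable<Void> simulated = () -> {
            Thread.sleep(50);
            throw new IOException("simulated failure");
        };
        // Repeat a few times to see whether the elapsed time is constant.
        for (int i = 0; i < 3; i++) {
            Result r = run(simulated);
            System.out.println("run " + i + ": failed=" + r.failed()
                    + ", elapsed=" + r.elapsedMillis + " ms");
        }
    }
}
```

If the elapsed time is nearly identical on every run, independent of the
machine load, that would support the timeout theory.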
