[jira] [Commented] (TIKA-2776) Tika server child restart
[ https://issues.apache.org/jira/browse/TIKA-2776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16705283#comment-16705283 ]

Hudson commented on TIKA-2776:
------------------------------

SUCCESS: Integrated in Jenkins build tika-branch-1x #133 (See [https://builds.apache.org/job/tika-branch-1x/133/])
TIKA-2776 -- improve documentation for -maxFiles (tallison: [https://github.com/apache/tika/commit/4141411773a321fe614167584d23e376c4dbcb3c])
* (edit) tika-server/src/main/java/org/apache/tika/server/TikaServerCli.java

> Tika server child restart
> -------------------------
>
>                 Key: TIKA-2776
>                 URL: https://issues.apache.org/jira/browse/TIKA-2776
>             Project: Tika
>          Issue Type: Bug
>            Reporter: Mario Bisonti
>            Assignee: Tim Allison
>            Priority: Blocker
>             Fix For: 2.0.0, 1.20
>
>         Attachments: Log.zip, MCF_JOB.png, log4j.xml, log4j_child.xml, log4j_child.xml, man_tika.zip, tikalogchild.log
>
>
> Hello.
> I use tika server standalone, started with the option:
> java -jar /opt/tika/tika-server-1.19.1.jar -spawnChild
> I use ManifoldCF and Solr to index files using tika server.
> Indexing crashes continuously because I get many errors such as:
> Tika down, retrying: Connection reset
> etc.
> I suspect that, when a process is restarted, the client crashes, as mentioned here:
> _If the child process is in the process of shutting down, and it gets a new
> request it will return 503 -- Service Unavailable. If the server times out on
> a file, the client will receive an IOException from the closed socket. Note
> that all other files that are being processed will end with an IOException
> from a closed socket when the child process shuts down; e.g. if you send
> three files to tika-server concurrently, and one of them causes a
> catastrophic problem requiring the child to shut down, you won't be able to
> tell which file caused the problems. In the future, we may implement a
> gentler shutdown than we currently have._
> as reported here https://wiki.apache.org/tika/TikaJAXRS
> How can I work around it?
> Thanks a lot
> Mario



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Commented] (TIKA-2776) Tika server child restart
[ https://issues.apache.org/jira/browse/TIKA-2776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16705281#comment-16705281 ]

Hudson commented on TIKA-2776:
------------------------------

SUCCESS: Integrated in Jenkins build Tika-trunk #1601 (See [https://builds.apache.org/job/Tika-trunk/1601/])
TIKA-2776 -- improve documentation for -maxFiles (tallison: [https://github.com/apache/tika/commit/a477d73ac56c169075b5c9ea66bf57be1f3dc672])
* (edit) tika-server/src/main/java/org/apache/tika/server/TikaServerCli.java
[jira] [Commented] (TIKA-2776) Tika server child restart
[ https://issues.apache.org/jira/browse/TIKA-2776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16705251#comment-16705251 ]

Hudson commented on TIKA-2776:
------------------------------

UNSTABLE: Integrated in Jenkins build tika-2.x-windows #354 (See [https://builds.apache.org/job/tika-2.x-windows/354/])
TIKA-2776 -- improve documentation for -maxFiles (tallison: rev a477d73ac56c169075b5c9ea66bf57be1f3dc672)
* (edit) tika-server/src/main/java/org/apache/tika/server/TikaServerCli.java
[jira] [Commented] (TIKA-2776) Tika server child restart
[ https://issues.apache.org/jira/browse/TIKA-2776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16704474#comment-16704474 ]

Mario Bisonti commented on TIKA-2776:
-------------------------------------

Hello Tim.
I observed the child being restarted:

2018-11-30 01:21:01 INFO TikaServerWatchDog:104 - About to restart the child process
2018-11-30 01:21:02 INFO TikaServerWatchDog:106 - Successfully restarted child process -- 11 restarts so far)
2018-11-30 05:39:09 WARN TikaServerWatchDog:253 - Received status from child: HIT_MAX
2018-11-30 05:39:10 INFO TikaServerWatchDog:104 - About to restart the child process
2018-11-30 05:39:12 INFO TikaServerWatchDog:106 - Successfully restarted child process -- 13 restarts so far)
2018-11-30 08:38:03 WARN TikaServerWatchDog:253 - Received status from child: HIT_MAX
2018-11-30 08:38:03 INFO TikaServerWatchDog:104 - About to restart the child process
2018-11-30 08:38:04 INFO TikaServerWatchDog:106 - Successfully restarted child process -- 15 restarts so far)

Is this related to the parameter:

_{{-maxFiles}}: restart the child process after it has processed {{maxFiles}}. If there is a slow building memory leak, this restart of the JVM should help._

I didn't set the parameter. What is the default value of maxFiles?

Thanks
Mario
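The watchdog entries quoted in this comment have a regular shape, so restarts and the status that preceded them can be tallied with a few lines of Python. This is only a sketch: the regular expressions are inferred from the log excerpts in this thread, not from Tika's source, and the name `summarize_watchdog` is made up for illustration.

```python
import re
from collections import Counter

# Patterns inferred from the log excerpts in this thread (an assumption,
# not taken from Tika's source code).
STATUS_RE = re.compile(r"Received status from child: (\w+)")
RESTART_RE = re.compile(
    r"Successfully restarted child process -- (\d+) restarts so far"
)


def summarize_watchdog(lines):
    """Return (status_counts, last_restart_counter_seen).

    status_counts maps each status reported by the child (e.g. HIT_MAX)
    to the number of times it appears; the second value is the restart
    counter from the most recent "Successfully restarted" line, or None.
    """
    statuses = Counter()
    last_restart_count = None
    for line in lines:
        status = STATUS_RE.search(line)
        if status:
            statuses[status.group(1)] += 1
            continue
        restart = RESTART_RE.search(line)
        if restart:
            last_restart_count = int(restart.group(1))
    return statuses, last_restart_count
```

Feeding the parent log to this function shows how many restarts were preceded by a HIT_MAX status, which helps separate restarts caused by a file limit from restarts caused by timeouts.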
[jira] [Commented] (TIKA-2776) Tika server child restart
[ https://issues.apache.org/jira/browse/TIKA-2776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16703173#comment-16703173 ]

Mario Bisonti commented on TIKA-2776:
-------------------------------------

Hi Tim.

3. Yes, you are right: the parent process never shut down; only the child restarted.

Your opinion/advice: Yes, I worked around the restart of the child by increasing the timeout. In the future I will ask whether ManifoldCF can handle the timeout case.

At the moment, I am trying to finish indexing the 70 documents; the job stops many times because of a Java exception in the agent that handles indexing for ManifoldCF.

You do not have to thank me; I thank you for the great support.
[jira] [Commented] (TIKA-2776) Tika server child restart
[ https://issues.apache.org/jira/browse/TIKA-2776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16701938#comment-16701938 ]

Tim Allison commented on TIKA-2776:
-----------------------------------

Thank you for the follow-up! To confirm/summarize...

1. I introduced a change in behavior (bug) into legacy server mode in 1.19 (maybe 1.18?) that causes tika-server to return 'not available' forever after an OOM. The legacy behavior was to ignore OOMs and _hope_ nothing too bad happened to your JVM. That said, the change of behavior I introduced is bad, very bad. I've fixed this in 1.20, which should be out in a few weeks.

2. tika-server in -spawnChild mode was restarting the child because you were getting timeouts. This caused problems with Manifold. You've bumped out the timeout to ~16 minutes, and you currently don't have any files that take longer than that...so all appears to work for now.

3. I _think_ we found that {{-spawnChild}} was behaving as it was designed to do. To confirm, we did not find that the parent process shut down, and we did find that the child restarted within a few seconds. Is this correct?

My opinion/advice: Depending on the nature of your documents, if you have large enough batches of crazy enough documents, you will eventually hit an infinite loop, and the child will time out and restart. So, for now, you've wallpapered over a problem by bumping out the timeout, but the timeouts will eventually happen. So, what can we do in Tika, what can Manifold do, what can you do to help avoid this eventuality?

Again, many, many thanks for your patience getting the logging up and running. I still need to improve our wiki on logging with tika-server (based on our interaction) even more.
[jira] [Commented] (TIKA-2776) Tika server child restart
[ https://issues.apache.org/jira/browse/TIKA-2776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16701864#comment-16701864 ]

Mario Bisonti commented on TIKA-2776:
-------------------------------------

Hello Tim.

tesseract is not installed.

It seems that with the parameters "-spawnChild -taskTimeoutMillis 100" the child no longer shuts down.

I will report back on how my issue evolves with the client that indexes many documents.
[jira] [Commented] (TIKA-2776) Tika server child restart
[ https://issues.apache.org/jira/browse/TIKA-2776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16699108#comment-16699108 ]

Tim Allison commented on TIKA-2776:
-----------------------------------

Three cheers for logging, and thank you for your patience in configuring those!

Yes, exactly! It looks like the child process restarted at 2018-11-26 13:18:26

{{2018-11-26 13:18:26 INFO MetadataResource:431 - meta (application/vnd.openxmlformats}}

and then processed more files successfully.

It can take a few seconds for the server to restart, and it looks in the {{manifoldcf.log}} like the initial connectivity dropped at 13:18:25, and then there are problems logged through the end of 13:18:26 with worker threads not able to reach the server. This is expected.

Are the clients (worker thread 88, 39, 8, 86, 87, 982, 99, 75, 12) able to sleep and retry after failed connectivity, or do they just try once and give up?

As a side note, if you add a header telling tika-server what the file name is, that filename will be included in the log message so you can figure out which file caused the timeout. See: https://wiki.apache.org/tika/TikaJAXRS ... in short, add the header to your request: {{"Content-Disposition: attachment; filename=foo.csv"}}

Some reasons for timeouts: the VM is overtaxed and processing is just slow, an infinite loop in a parser (these are rare but they can happen), or OCR, which can take minutes per document (do you have tesseract installed?).
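The two client-side suggestions in this comment, sleeping and retrying while the child restarts and sending a Content-Disposition header so the file name shows up in the server log, can be sketched in Python with the standard library. This is a hedged illustration, not ManifoldCF's code: `build_request` and `put_with_retry` are hypothetical names, and the retry count and delay are arbitrary defaults chosen for the example.

```python
import time
import urllib.error
import urllib.request


def build_request(url, data, filename):
    """Build a PUT request that tells tika-server the file name.

    The header value mirrors the one quoted in this thread; file names
    containing spaces or quotes would need proper quoting (not handled).
    """
    headers = {"Content-Disposition": f"attachment; filename={filename}"}
    return urllib.request.Request(url, data=data, headers=headers, method="PUT")


def put_with_retry(url, data, filename, attempts=5, delay=2.0,
                   opener=urllib.request.urlopen, sleep=time.sleep):
    """Send one file, sleeping and retrying while the child restarts.

    Retries on 503 responses and on connection-level failures such as
    "Connection reset"; any other HTTP error is raised immediately.
    opener and sleep are injectable so the logic can be tested offline.
    """
    last_error = None
    for attempt in range(attempts):
        try:
            with opener(build_request(url, data, filename)) as response:
                return response.read()
        except urllib.error.HTTPError as err:
            # HTTPError must be caught before URLError (it is a subclass).
            if err.code != 503:
                raise
            last_error = err
        except (urllib.error.URLError, ConnectionError) as err:
            last_error = err
        if attempt < attempts - 1:
            # Linear backoff: delay, 2 * delay, 3 * delay, ...
            sleep(delay * (attempt + 1))
    raise last_error
```

Since the restart is reported to take a few seconds, a handful of attempts a couple of seconds apart should normally outlast it. Note that a retry resubmits the same file, so a file that reliably triggers a timeout will keep failing and should be set aside after the last attempt.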
[jira] [Commented] (TIKA-2776) Tika server child restart
[ https://issues.apache.org/jira/browse/TIKA-2776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16698892#comment-16698892 ]

Mario Bisonti commented on TIKA-2776:
-------------------------------------

I have now tried starting tika with the command:

java -Dlog4j.configuration=file:/opt/tika/log4j.xml -jar /opt/tika/tika-server-1.20-20181114.215706-48.jar -JDlog4j.configuration=file:/opt/tika/log4j_child.xml --host=sengvivv01.vimar.net -spawnChild -taskTimeoutMillis 100

Could the -taskTimeoutMillis parameter be useful?

I see many errors in the tikalogchild.log:

ERROR ServerStatusWatcher:129 - Timeout task PARSE, millis elapsed 120253
.
.
ERROR ServerStatusWatcher:129 - Timeout task PARSE, millis elapsed 120253
.
.
ERROR ServerStatusWatcher:129 - Timeout task PARSE, millis elapsed 120335
.
.
etc.
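To see how long each task ran before the watcher gave up, the timeout entries from the child log can be extracted and converted to seconds. A minimal sketch, assuming the line format shown in the excerpts above; the pattern is inferred from those lines and `timeouts_in_seconds` is a name invented for this example.

```python
import re

# Pattern inferred from the tikalogchild.log excerpts in this thread
# (an assumption, not taken from Tika's source code).
TIMEOUT_RE = re.compile(r"Timeout task (\w+), millis elapsed (\d+)")


def timeouts_in_seconds(lines):
    """Return a list of (task_name, elapsed_seconds), one per timeout entry."""
    results = []
    for line in lines:
        match = TIMEOUT_RE.search(line)
        if match:
            results.append((match.group(1), int(match.group(2)) / 1000.0))
    return results
```

For the three entries quoted in this comment, the elapsed values work out to roughly 120 seconds each.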
[jira] [Commented] (TIKA-2776) Tika server child restart
[ https://issues.apache.org/jira/browse/TIKA-2776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16698868#comment-16698868 ]

Mario Bisonti commented on TIKA-2776:
-------------------------------------

Hello Tim.

I now have both logs for tika server: parent (tikalog.log) and child (tikalogchild.log).

In the last line of manifoldcf.log you can see that the job aborted at 13:18.

You can see a restart at that time in the tika log.

I attached the logs in man_tika.zip.

[^man_tika.zip]
[jira] [Commented] (TIKA-2776) Tika server child restart
[ https://issues.apache.org/jira/browse/TIKA-2776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16697364#comment-16697364 ]

Tim Allison commented on TIKA-2776:
-----------------------------------

Try adding that parameter immediately after {{java}}.
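Placement matters because the java launcher passes everything after the jar path to the application, so a {{-D}} option placed there reaches TikaServerCli's option parser instead of the JVM, which is what the UnrecognizedOptionException reported in this thread shows. The sketch below assembles the argument list in the order used by the working command elsewhere in this thread; `build_tika_command` is a hypothetical helper, not part of Tika.

```python
def build_tika_command(jar, parent_log_config, child_log_config, server_args):
    """Return an argv list for launching tika-server with separate log configs.

    JVM options (-D...) go before -jar and configure the parent process.
    Everything after the jar path is handed to tika-server itself,
    including the -JD... option that the thread uses for the child's
    log4j configuration.
    """
    return (
        [
            "java",
            f"-Dlog4j.configuration=file:{parent_log_config}",
            "-jar",
            jar,
            f"-JDlog4j.configuration=file:{child_log_config}",
        ]
        + list(server_args)
    )
```

The resulting list can be passed to `subprocess.run` or joined with spaces for a shell; in both cases the parent's {{-D}} option comes before {{-jar}}.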
[jira] [Commented] (TIKA-2776) Tika server child restart
[ https://issues.apache.org/jira/browse/TIKA-2776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16697249#comment-16697249 ]

Mario Bisonti commented on TIKA-2776:
-------------------------------------

Yes, some files were processed in the interval 13:24-13:34.

No files were processed afterwards because the job crashed, as you can see from MCF_JOB.png.

I tried to start tika server with the two log specifications, but it doesn't work.

I get this error:

administrator@sengvivv02:/opt/tika$ sudo -u tomcat java -jar /opt/tika/tika-server-1.19.1.jar -Dlog4j.configuration=file:/opt/tika/log4j.xml -JDlog4j.configuration=file:/opt/tika/log4j_child.xml --host=sengvivv02 -spawnChild
Nov 23, 2018 3:55:45 PM org.apache.tika.config.InitializableProblemHandler$3 handleInitializableProblem
WARNING: J2KImageReader not loaded. JPEG2000 files will not be processed.
See https://pdfbox.apache.org/2.0/dependencies.html#jai-image-io
for optional dependencies.
Nov 23, 2018 3:55:45 PM org.apache.tika.config.InitializableProblemHandler$3 handleInitializableProblem
WARNING: org.xerial's sqlite-jdbc is not loaded.
Please provide the jar on your classpath to parse sqlite files.
See tika-parsers/pom.xml for the correct version.
INFO Starting Apache Tika 1.19.1 server
org.apache.commons.cli.UnrecognizedOptionException: Unrecognized option: -Dlog4j.configuration=file:/opt/tika/log4j.xml
 at org.apache.commons.cli.Parser.processOption(Parser.java:363)
 at org.apache.commons.cli.Parser.parse(Parser.java:199)
 at org.apache.commons.cli.Parser.parse(Parser.java:85)
 at org.apache.tika.server.TikaServerCli.execute(TikaServerCli.java:133)
 at org.apache.tika.server.TikaServerCli.main(TikaServerCli.java:117)
ERROR Can't start: org.apache.commons.cli.UnrecognizedOptionException: Unrecognized option: -Dlog4j.configuration=file:/opt/tika/log4j.xml
 at org.apache.commons.cli.Parser.processOption(Parser.java:363)
 at org.apache.commons.cli.Parser.parse(Parser.java:199)
 at org.apache.commons.cli.Parser.parse(Parser.java:85)
 at org.apache.tika.server.TikaServerCli.execute(TikaServerCli.java:133)
 at org.apache.tika.server.TikaServerCli.main(TikaServerCli.java:117)
[jira] [Commented] (TIKA-2776) Tika server child restart
[ https://issues.apache.org/jira/browse/TIKA-2776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16697234#comment-16697234 ]

Mario Bisonti commented on TIKA-2776:
-------------------------------------

!MCF_JOB.png!
[jira] [Commented] (TIKA-2776) Tika server child restart
[ https://issues.apache.org/jira/browse/TIKA-2776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16697209#comment-16697209 ]

Tim Allison commented on TIKA-2776:
-----------------------------------

It would also be useful to add logging to the parent process. I think the same configuration can be used with the one critical change of writing to a different file... add {{-Dlog4j.configuration...}} early in the command line.
[jira] [Commented] (TIKA-2776) Tika server child restart
[ https://issues.apache.org/jira/browse/TIKA-2776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16697134#comment-16697134 ]

Tim Allison commented on TIKA-2776:
-----------------------------------

Ugh... it looks like append=false forces the logs to be overwritten, which means that we're missing the critical period: the child log has a gap between the end of tikalogchild.log.1 (12:57) and the beginning of tikalogchild.log (13:34). Let's change this line to "true":

{noformat}
{noformat}

From the MCF_client_log, tika was unavailable at 13:24, 13:26, 13:29, 13:31 and 13:34. Can you tell from the MCF logs if any files were processed between 13:24 and 13:34? Were any files processed after 13:34?
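To make the gap concrete, here is a small sketch that checks which of the client-reported outages fall inside the window not covered by either child log. The timestamps are the ones quoted in the comment above; the function name outages_in_gap is just an illustrative choice.

```python
from datetime import time

# Times taken from the comment above.
LAST_ENTRY_OLD_LOG = time(12, 57)   # end of tikalogchild.log.1
FIRST_ENTRY_NEW_LOG = time(13, 34)  # beginning of tikalogchild.log
CLIENT_OUTAGES = [
    time(13, 24), time(13, 26), time(13, 29), time(13, 31), time(13, 34),
]


def outages_in_gap(outages, gap_start, gap_end):
    """Return the outage times strictly inside the uncovered log window."""
    return [t for t in outages if gap_start < t < gap_end]


if __name__ == "__main__":
    hidden = outages_in_gap(
        CLIENT_OUTAGES, LAST_ENTRY_OLD_LOG, FIRST_ENTRY_NEW_LOG
    )
    print([t.strftime("%H:%M") for t in hidden])
```

Four of the five outages land in the window where the child log was overwritten, which is why appending rather than overwriting matters here.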
[jira] [Commented] (TIKA-2776) Tika server child restart
[ https://issues.apache.org/jira/browse/TIKA-2776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16695874#comment-16695874 ]

Mario Bisonti commented on TIKA-2776:
-------------------------------------

Hello Tim,

I am finally able to generate the log.

Today I started a processing run from my client to parse with tika. It started to process at 8:30 a.m. and at 13:34, as you can see in MCF_Client.log.

I see that Tika created the log tikalogchild.log.1 and wrote to it at 12:57, and then a new log, tikalogchild.log, at 13:34, I suppose when the child was restarted.

So, I suppose that the client crashed because of this restart?

I attach the three files in Log.zip.

Could you help me understand how to solve this issue?

I am using tika-server-1.20-20181114.215706-48.jar

Thanks a lot.

Mario [^Log.zip]
[jira] [Commented] (TIKA-2776) Tika server child restart
[ https://issues.apache.org/jira/browse/TIKA-2776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16694777#comment-16694777 ]

Tim Allison commented on TIKA-2776:
-----------------------------------

Sorry. I do have a file produced on the file system. I just attached the literal .xml log config file and the log that was written to my file system.

{noformat}
C:\data\tika_server>java -jar tika-server-1.19.1.jar -JDlog4j.configuration=file:log4j_child.xml -spawnChild
Nov 21, 2018 9:30:08 AM org.apache.tika.config.InitializableProblemHandler$3 handleInitializableProblem
WARNING: J2KImageReader not loaded. JPEG2000 files will not be processed.
See https://pdfbox.apache.org/2.0/dependencies.html#jai-image-io
for optional dependencies.
Nov 21, 2018 9:30:08 AM org.apache.tika.config.InitializableProblemHandler$3 handleInitializableProblem
WARNING: org.xerial's sqlite-jdbc is not loaded.
Please provide the jar on your classpath to parse sqlite files.
See tika-parsers/pom.xml for the correct version.
INFO Starting Apache Tika 1.19.1 server
Nov 21, 2018 9:30:09 AM org.apache.tika.config.InitializableProblemHandler$3 handleInitializableProblem
WARNING: J2KImageReader not loaded. JPEG2000 files will not be processed.
See https://pdfbox.apache.org/2.0/dependencies.html#jai-image-io
for optional dependencies.
Nov 21, 2018 9:30:09 AM org.apache.tika.config.InitializableProblemHandler$3 handleInitializableProblem
WARNING: org.xerial's sqlite-jdbc is not loaded.
Please provide the jar on your classpath to parse sqlite files.
See tika-parsers/pom.xml for the correct version.
2018-11-21 09:30:09 INFO TikaServerCli:115 - Starting Apache Tika 1.19.1 server
2018-11-21 09:30:09 INFO ServerImpl:85 - Setting the server's publish address to be http://localhost:9998/
2018-11-21 09:30:09 INFO log:193 - Logging initialized @1296ms to org.eclipse.jetty.util.log.Slf4jLog
2018-11-21 09:30:09 INFO Server:374 - jetty-9.4.z-SNAPSHOT; built: 2018-06-05T18:24:03.829Z; git: d5fc0523cfa96bfebfbda19606cad384d772f04c; jvm 1.8.0_192-b12
2018-11-21 09:30:10 INFO AbstractConnector:289 - Started ServerConnector@1af2d44a{HTTP/1.1,[http/1.1]}{localhost:9998}
2018-11-21 09:30:10 INFO Server:411 - Started @1811ms
2018-11-21 09:30:10 WARN ContextHandler:1572 - Empty contextPath
2018-11-21 09:30:10 INFO ContextHandler:851 - Started o.e.j.s.h.ContextHandler@342c38f8{/,null,AVAILABLE}
2018-11-21 09:30:10 INFO TikaServerCli:316 - Started Apache Tika server at http://localhost:9998/
2018-11-21 09:30:27 INFO RecursiveMetadataResource:429 - rmeta (autodetecting type)
{noformat}

I then curled a file a couple of times against the server:

{noformat}
curl.exe -T somePDF.pdf http://localhost:9998/rmeta
{noformat}
[jira] [Commented] (TIKA-2776) Tika server child restart
[ https://issues.apache.org/jira/browse/TIKA-2776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16694760#comment-16694760 ]

Mario Bisonti commented on TIKA-2776:
-------------------------------------

Hello Tim,

I get no log on the filesystem with this log4j_child.xml:

[XML configuration not preserved in this archive; see the attached log4j_child.xml]

I don't understand whether you get a log produced on the filesystem or not.
[jira] [Commented] (TIKA-2776) Tika server child restart
[ https://issues.apache.org/jira/browse/TIKA-2776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16694694#comment-16694694 ]

Tim Allison commented on TIKA-2776:
-----------------------------------

{quote}
it looks like the client only waited a quarter of a second. It can take 20-30 seconds to restart tika-server.
{quote}

Make that 1-3 seconds... from semi-manual integration tests outside of the Tika test framework.
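As a rough illustration of what these numbers mean for a client, the sketch below computes a growing series of retry delays that starts at the quarter second the client waited and keeps going until the total wait covers a 3-second restart. The doubling factor and the function name backoff_delays are assumptions chosen for the example, not ManifoldCF or Tika settings.

```python
from typing import List


def backoff_delays(initial: float, factor: float, min_total: float) -> List[float]:
    """Return growing delays (seconds) whose sum is at least min_total."""
    if initial <= 0 or factor < 1:
        raise ValueError("initial must be > 0 and factor must be >= 1")
    delays: List[float] = []
    total = 0.0
    delay = initial
    while total < min_total:
        delays.append(delay)
        total += delay
        delay *= factor
    return delays


if __name__ == "__main__":
    # Start at 0.25 s and double until at least 3 s have been waited overall.
    schedule = backoff_delays(0.25, 2.0, 3.0)
    print(schedule, sum(schedule))
```

With these inputs the client would retry four times and wait 3.75 seconds in total, instead of giving up after the first quarter second.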
[jira] [Commented] (TIKA-2776) Tika server child restart
[ https://issues.apache.org/jira/browse/TIKA-2776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16694671#comment-16694671 ]

Tim Allison commented on TIKA-2776:
-----------------------------------

If you're using 1.19.1, you've hit TIKA-2785. The temporary fix is to have the ConsoleAppender write to stderr:

{noformat}
{noformat}

I tested this with 1.19.1 on Windows, and I had success.

If you have an interest in testing the new mechanism, grab a nightly build of tika-server from, e.g. [here|https://builds.apache.org/job/tika-branch-1x/131/org.apache.tika$tika-server/artifact/org.apache.tika/tika-server/1.20-20181120.215531-52/tika-server-1.20-20181120.215531-52.jar], and you can use the log file as you had it configured. :D
[jira] [Commented] (TIKA-2776) Tika server child restart
[ https://issues.apache.org/jira/browse/TIKA-2776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16694508#comment-16694508 ]

Mario Bisonti commented on TIKA-2776:
-------------------------------------

I tried to start:

C:\Temp\Tika>java -jar tika-server-1.19.1.jar --host=myhostname -spawnchild -JDlog4j.configuration=file:"log4j_child.xml" -log info

where log4j_child.xml is:

[XML configuration not preserved in this archive; see the attached log4j_child.xml]

but it doesn't produce any log on the file system: C:/Temp/tika/tikalog_child.log wasn't created.

What's wrong?

Thanks a lot
Mario
[jira] [Commented] (TIKA-2776) Tika server child restart
[ https://issues.apache.org/jira/browse/TIKA-2776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16691675#comment-16691675 ]

Mario Bisonti commented on TIKA-2776:
-------------------------------------

Hello.

After 5 hours of processing files, the client calling Tika server reported this error:

WARN 2018-11-19T13:47:17,888 (Worker thread '56') - Service interruption reported for job 1533797717712 connection 'WinShare': Tika down, retrying: Connect to hostanmeubuntu:9998 [hostanmeubuntu/172.16.1.135] failed: Connection refused (Connection refused)
WARN 2018-11-19T13:47:18,006 (Worker thread '96') - Service interruption reported for job 1533797717712 connection 'WinShare': Tika down, retrying: Connect to hostanmeubuntu:9998 [hostanmeubuntu/172.16.1.135] failed: Connection refused (Connection refused)
WARN 2018-11-19T13:47:18,006 (Worker thread '20') - Service interruption reported for job 1533797717712 connection 'WinShare': Tika down, retrying: Connect to hostanmeubuntu:9998 [hostanmeubuntu/172.16.1.135] failed: Connection refused (Connection refused)
WARN 2018-11-19T13:47:18,071 (Worker thread '26') - Service interruption reported for job 1533797717712 connection 'WinShare': Tika down, retrying: Connect to hostanmeubuntu:9998 [hostanmeubuntu/172.16.1.135] failed: Connection refused (Connection refused)
WARN 2018-11-19T13:47:18,116 (Worker thread '27') - JCIFS: Possibly transient exception detected on attempt 1 while getting share security: All pipe instances are busy.

Perhaps it could be due to:

_If the server times out on a file, the client will receive an IOException from the closed socket. Note that all other files that are being processed will end with an IOException from a closed socket when the child process shuts down; e.g. if you send three files to tika-server concurrently, and one of them causes a catastrophic problem requiring the child to shut down, you won't be able to tell which file caused the problems. In the future, we may implement a gentler shutdown than we currently have._

Could a gentler shutdown perhaps solve the problem?

Thanks
Mario
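When reading a client log like the one above, it can help to separate "Connection refused" (nothing was accepting connections on the port, which is consistent with the server being down or restarting) from "Connection reset" (an established connection was dropped, as in the original report). The following is a minimal sketch that counts the two kinds; the regular expression is an assumption based only on the lines quoted here and may not match other ManifoldCF log layouts.

```python
import re
from collections import Counter

# Pattern inferred from the WARN lines quoted in this thread (an assumption).
TIKA_DOWN = re.compile(r"\(Worker thread '(\d+)'\).*?Tika down, retrying: (.+)$")

SAMPLE = [
    "WARN 2018-11-19T13:47:17,888 (Worker thread '56') - Service interruption "
    "reported for job 1533797717712 connection 'WinShare': Tika down, retrying: "
    "Connect to hostanmeubuntu:9998 [hostanmeubuntu/172.16.1.135] failed: "
    "Connection refused (Connection refused)",
    "WARN 2018-11-19T13:47:18,006 (Worker thread '96') - Service interruption "
    "reported for job 1533797717712 connection 'WinShare': Tika down, retrying: "
    "Connect to hostanmeubuntu:9998 [hostanmeubuntu/172.16.1.135] failed: "
    "Connection refused (Connection refused)",
    "WARN 2018-11-19T13:47:18,116 (Worker thread '27') - JCIFS: Possibly transient "
    "exception detected on attempt 1 while getting share security: "
    "All pipe instances are busy.",
]


def summarize(lines):
    """Count 'Tika down' warnings by failure kind: refused, reset or other."""
    counts = Counter()
    for line in lines:
        match = TIKA_DOWN.search(line)
        if not match:
            continue  # not a Tika outage line (e.g. the JCIFS warning)
        reason = match.group(2)
        if "Connection refused" in reason:
            counts["refused"] += 1
        elif "Connection reset" in reason:
            counts["reset"] += 1
        else:
            counts["other"] += 1
    return counts


if __name__ == "__main__":
    print(dict(summarize(SAMPLE)))
```

A run dominated by "refused" points at the server not listening at all, rather than at requests being cut off mid-parse.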
[jira] [Commented] (TIKA-2776) Tika server child restart
[ https://issues.apache.org/jira/browse/TIKA-2776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16688313#comment-16688313 ]

Mario Bisonti commented on TIKA-2776:
-------------------------------------

Hello Tim,

I added a delay to the client to avoid the 503 error, and on another Ubuntu host I am using the tika server snapshot:

java -jar /opt/tika/tika-server-1.20-20181114.215706-48.jar --host=hostnameubuntu -spawnChild

I will update you on whether the scanning works.

Thanks a lot!

Mario
[jira] [Commented] (TIKA-2776) Tika server child restart
[ https://issues.apache.org/jira/browse/TIKA-2776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16687264#comment-16687264 ]

Hudson commented on TIKA-2776:
------------------------------

SUCCESS: Integrated in Jenkins build tika-branch-1x #129 (See [https://builds.apache.org/job/tika-branch-1x/129/])
TIKA-2776 -- update CHANGES.txt (tallison: [https://github.com/apache/tika/commit/0a3a8be32659f1f0e5c29eb602d8caf149c347e5])
* (edit) CHANGES.txt
[jira] [Commented] (TIKA-2776) Tika server child restart
[ https://issues.apache.org/jira/browse/TIKA-2776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16687267#comment-16687267 ]

Hudson commented on TIKA-2776:
------------------------------

SUCCESS: Integrated in Jenkins build Tika-trunk #1596 (See [https://builds.apache.org/job/Tika-trunk/1596/])
TIKA-2776 -- update CHANGES.txt (tallison: [https://github.com/apache/tika/commit/c037fb71e1d5e9e549a12f77c620229465763abf])
* (edit) CHANGES.txt
[jira] [Commented] (TIKA-2776) Tika server child restart
[ https://issues.apache.org/jira/browse/TIKA-2776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16687242#comment-16687242 ]

Hudson commented on TIKA-2776:
------------------------------

UNSTABLE: Integrated in Jenkins build tika-2.x-windows #350 (See [https://builds.apache.org/job/tika-2.x-windows/350/])
TIKA-2776 -- tika-server in legacy mode should ignore oom. (tallison: rev 8d9061d281f7f46ecae8b902e27c49777ec43919)
* (edit) tika-server/src/main/java/org/apache/tika/server/ServerStatus.java
* (edit) tika-server/src/test/java/org/apache/tika/server/TikaServerIntegrationTest.java
* (edit) tika-server/src/main/java/org/apache/tika/server/TikaServerCli.java
TIKA-2776 -- update CHANGES.txt (tallison: rev 6710c52efe3cf8b220773ac9df85bb2f292d4415)
* (edit) CHANGES.txt
TIKA-2776 -- update CHANGES.txt (tallison: rev c037fb71e1d5e9e549a12f77c620229465763abf)
* (edit) CHANGES.txt
[jira] [Commented] (TIKA-2776) Tika server child restart
[ https://issues.apache.org/jira/browse/TIKA-2776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16687208#comment-16687208 ]

Hudson commented on TIKA-2776:
------------------------------

SUCCESS: Integrated in Jenkins build tika-branch-1x #128 (See [https://builds.apache.org/job/tika-branch-1x/128/])
TIKA-2776 -- tika-server in legacy mode should ignore oom. (tallison: [https://github.com/apache/tika/commit/22f57072dfc96da6ef27e053a09f9b648552f2f2])
* (edit) tika-server/src/main/java/org/apache/tika/server/TikaServerCli.java
* (edit) tika-server/src/main/java/org/apache/tika/server/ServerStatus.java
* (edit) tika-server/src/test/java/org/apache/tika/server/TikaServerIntegrationTest.java
TIKA-2776 -- update CHANGES.txt (tallison: [https://github.com/apache/tika/commit/9e2a9bbd6e6ec6dc1ec6ecf39b6b850318df3219])
* (edit) CHANGES.txt
[jira] [Commented] (TIKA-2776) Tika server child restart
[ https://issues.apache.org/jira/browse/TIKA-2776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16687197#comment-16687197 ]

Hudson commented on TIKA-2776:
------------------------------

SUCCESS: Integrated in Jenkins build Tika-trunk #1595 (See [https://builds.apache.org/job/Tika-trunk/1595/])
TIKA-2776 -- tika-server in legacy mode should ignore oom. (tallison: [https://github.com/apache/tika/commit/8d9061d281f7f46ecae8b902e27c49777ec43919])
* (edit) tika-server/src/test/java/org/apache/tika/server/TikaServerIntegrationTest.java
* (edit) tika-server/src/main/java/org/apache/tika/server/TikaServerCli.java
* (edit) tika-server/src/main/java/org/apache/tika/server/ServerStatus.java
TIKA-2776 -- update CHANGES.txt (tallison: [https://github.com/apache/tika/commit/6710c52efe3cf8b220773ac9df85bb2f292d4415])
* (edit) CHANGES.txt
[jira] [Commented] (TIKA-2776) Tika server child restart
[ https://issues.apache.org/jira/browse/TIKA-2776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16687047#comment-16687047 ]

Tim Allison commented on TIKA-2776:
-----------------------------------

Oops... I'm on mobile. On your point 2, whether you should notify the ManifoldCF project: yes. Once they are comfortable with the -spawnChild option, the client has to be able to handle tika-server being down outright, or returning 503 intermittently, while it restarts after a serious problem.

As for logging, let me know if you've gotten that working... that is critical.

> Tika server child restart
> -------------------------
>
> Key: TIKA-2776
> URL: https://issues.apache.org/jira/browse/TIKA-2776
> Project: Tika
> Issue Type: Bug
> Reporter: Mario Bisonti
> Priority: Major
> Attachments: log4j.xml, log4j_child.xml
[jira] [Commented] (TIKA-2776) Tika server child restart
[ https://issues.apache.org/jira/browse/TIKA-2776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16687043#comment-16687043 ]

Tim Allison commented on TIKA-2776:
-----------------------------------

>> 1. Problem wasn't with -spawnChild

OMG... legacy mode shouldn't check whether the server is shutting down. This is a bad bug. If you are running in legacy mode, once you hit an OOM the server now returns 503 indefinitely... ugh. At least you can use the new -spawnChild feature to get around the bug I introduced into legacy mode when I added -spawnChild... head in hands... will fix ASAP.

>> 2. Still seems to be running...

Great! Yes, that's the goal of -spawnChild: the server will restart itself on a serious error or timeout.

>> 3. On diff versions of java

I'm not aware of any significant differences, but I haven't looked yet.

> Tika server child restart
> -------------------------
>
> Key: TIKA-2776
> URL: https://issues.apache.org/jira/browse/TIKA-2776
> Project: Tika
> Issue Type: Bug
> Reporter: Mario Bisonti
> Priority: Major
> Attachments: log4j.xml, log4j_child.xml
[jira] [Commented] (TIKA-2776) Tika server child restart
[ https://issues.apache.org/jira/browse/TIKA-2776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16686689#comment-16686689 ]

Mario Bisonti commented on TIKA-2776:
-------------------------------------

Hallo Tim.

# The error "Caused by: java.lang.OutOfMemoryError: Java heap space" happened when I launched Tika with
java -jar /opt/tika/tika-server-1.19.1.jar
so WITHOUT the "-spawnChild" option.
# When you said
_The other thing you might want to do...if you aren't already...is add a {{waitForServer}} loop along the lines of what I did in TikaServerIntegrationTest...for when your client hits a 503._
do you mean that this code should go into the client that calls tika server, which in my case is ManifoldCF? If so, I will forward your suggestion to the ManifoldCF owner.
# I have now started tika server on my Windows host, to separate ManifoldCF and Solr from tika server, and the job has been running for 5 hours without a crash!

Note that on my Windows host I use:

java -version
java version "1.8.0_92"
Java(TM) SE Runtime Environment (build 1.8.0_92-b14)
Java HotSpot(TM) 64-Bit Server VM (build 25.92-b14, mixed mode)

whereas on the Ubuntu host, which runs ManifoldCF and Solr and where, before this split, I also ran the tika server with the job that stopped repeatedly, I use:

java -version
openjdk version "10.0.2" 2018-07-17
OpenJDK Runtime Environment (build 10.0.2+13-Ubuntu-1ubuntu0.18.04.3)
OpenJDK 64-Bit Server VM (build 10.0.2+13-Ubuntu-1ubuntu0.18.04.3, mixed mode)

Do you know of any issue with the Java version that tika server runs on?

Thanks a lot.
Mario

> Tika server child restart
> -------------------------
>
> Key: TIKA-2776
> URL: https://issues.apache.org/jira/browse/TIKA-2776
> Project: Tika
> Issue Type: Bug
> Reporter: Mario Bisonti
> Priority: Major
> Attachments: log4j.xml, log4j_child.xml
[jira] [Commented] (TIKA-2776) Tika server child restart
[ https://issues.apache.org/jira/browse/TIKA-2776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16686620#comment-16686620 ]

Tim Allison commented on TIKA-2776:
-----------------------------------

The other thing you might want to do...if you aren't already...is add a {{waitForServer}} loop along the lines of what I did in TikaServerIntegrationTest...for when your client hits a 503.

{noformat}
// Imports needed: java.time.Duration, java.time.Instant,
// javax.ws.rs.core.Response, org.apache.cxf.jaxrs.client.WebClient

// Assumed upper bound on the wait. The bound as posted reads 3 (ms), which
// would end the loop almost immediately.
private static final long MAX_WAIT_MS = 30000;

private void awaitServerStartup() throws Exception {
    Instant started = Instant.now();
    long elapsed = Duration.between(started, Instant.now()).toMillis();
    while (elapsed < MAX_WAIT_MS) {
        try {
            Response response = WebClient
                    .create(endPoint + "/tika")
                    .accept("text/plain")
                    .get();
            if (response.getStatus() == 200) {
                return; // server is up and answering
            }
        } catch (javax.ws.rs.ProcessingException e) {
            // server not reachable yet; fall through and try again
        }
        Thread.sleep(100);
        elapsed = Duration.between(started, Instant.now()).toMillis();
    }
}
{noformat}

> Tika server child restart
> -------------------------
>
> Key: TIKA-2776
> URL: https://issues.apache.org/jira/browse/TIKA-2776
> Project: Tika
> Issue Type: Bug
> Reporter: Mario Bisonti
> Priority: Major
> Attachments: log4j.xml, log4j_child.xml
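For a client such as ManifoldCF, the same idea can be applied to each request: treat a 503 or a dropped connection as temporary and try again after a pause. The following is only a minimal sketch under stated assumptions. The {{Attempt}} callback stands in for whatever performs one HTTP call and returns its status code, and the names {{RetryOn503}}, {{Attempt}} and {{callWithRetry}}, as well as the attempt count and pause, are hypothetical and not part of Tika or ManifoldCF.

```java
import java.io.IOException;

class RetryOn503 {

    /** One HTTP attempt (hypothetical): returns the status code, or throws if the socket fails. */
    @FunctionalInterface
    interface Attempt {
        int run() throws IOException;
    }

    /**
     * Runs the attempt up to maxAttempts times. A 503 or an IOException is
     * treated as "server is restarting" and retried after pauseMillis; any
     * other status is returned to the caller immediately.
     */
    static int callWithRetry(Attempt attempt, int maxAttempts, long pauseMillis)
            throws IOException, InterruptedException {
        IOException lastFailure = null;
        for (int i = 0; i < maxAttempts; i++) {
            try {
                int status = attempt.run();
                if (status != 503) {
                    return status;
                }
                lastFailure = null; // server answered, but is still shutting down
            } catch (IOException e) {
                lastFailure = e; // e.g. connection reset or refused during a restart
            }
            if (i < maxAttempts - 1) {
                Thread.sleep(pauseMillis);
            }
        }
        if (lastFailure != null) {
            throw lastFailure; // every attempt failed and the last one was a socket error
        }
        return 503; // the server kept answering 503 until the attempts ran out
    }
}
```

Note that a retry only hides the restart from the caller. It does not identify which of several concurrent files caused the child to shut down.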
[jira] [Commented] (TIKA-2776) Tika server child restart
[ https://issues.apache.org/jira/browse/TIKA-2776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16686597#comment-16686597 ]

Tim Allison commented on TIKA-2776:
-----------------------------------

bq. Caused by: java.lang.OutOfMemoryError: Java heap space

Victory!!! Try bumping up your -Xmx for the child process: {{-JXmx2g}} or similar.

This actually shows that {{-spawnChild}} is working...I think?

> Tika server child restart
> -------------------------
>
> Key: TIKA-2776
> URL: https://issues.apache.org/jira/browse/TIKA-2776
> Project: Tika
> Issue Type: Bug
> Reporter: Mario Bisonti
> Priority: Major
> Attachments: log4j.xml, log4j_child.xml
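The {{-J}} prefix is what addresses a JVM option to the child process rather than to the parent. As a rough illustration of that convention only (this is not Tika's actual implementation, and the class and method names are hypothetical), the sketch below shows how arguments such as {{-JXmx2g}} could be mapped to child JVM options.

```java
import java.util.ArrayList;
import java.util.List;

class ChildJvmArgs {

    /**
     * Collects the child JVM options implied by -J-prefixed arguments:
     * "-JXmx2g" becomes "-Xmx2g". Arguments without the prefix are skipped.
     */
    static List<String> fromServerArgs(String... serverArgs) {
        List<String> childOptions = new ArrayList<>();
        for (String arg : serverArgs) {
            if (arg.startsWith("-J") && arg.length() > 2) {
                childOptions.add("-" + arg.substring(2));
            }
        }
        return childOptions;
    }
}
```

Under this reading, {{-JXmx2g}} raises the heap limit of the child that does the parsing, while the parent process keeps its own JVM settings.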
[jira] [Commented] (TIKA-2776) Tika server child restart
[ https://issues.apache.org/jira/browse/TIKA-2776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16686522#comment-16686522 ]

Tim Allison commented on TIKA-2776:
-----------------------------------

bq. I am not so expert on logging.. be patient please

Ha! No problem at all!

bq. I configured log4j.xml and log4j_child.xml as in the attachment...

I _think_ I just fixed this in TIKA-2782. For now, you have to avoid {{debug="true"}}

> Tika server child restart
> -------------------------
>
> Key: TIKA-2776
> URL: https://issues.apache.org/jira/browse/TIKA-2776
> Project: Tika
> Issue Type: Bug
> Reporter: Mario Bisonti
> Priority: Major
> Attachments: log4j.xml, log4j_child.xml
[jira] [Commented] (TIKA-2776) Tika server child restart
[ https://issues.apache.org/jira/browse/TIKA-2776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16686336#comment-16686336 ] Mario Bisonti commented on TIKA-2776: - This is the errors that I see on my screen: ERROR Problem with writing the data, class org.apache.tika.server.resource.TikaResource$5, ContentType: text/plain WARN Interceptor for {http://resource.server.tika.apache.org/}MetadataResource has thrown exception, unwinding now org.apache.cxf.interceptor.Fault: Java heap space at org.apache.cxf.jaxrs.interceptor.JAXRSOutInterceptor.handleWriteException(JAXRSOutInterceptor.java:396) at org.apache.cxf.jaxrs.interceptor.JAXRSOutInterceptor.serializeMessage(JAXRSOutInterceptor.java:272) at org.apache.cxf.jaxrs.interceptor.JAXRSOutInterceptor.processResponse(JAXRSOutInterceptor.java:122) at org.apache.cxf.jaxrs.interceptor.JAXRSOutInterceptor.handleMessage(JAXRSOutInterceptor.java:84) at org.apache.cxf.phase.PhaseInterceptorChain.doIntercept(PhaseInterceptorChain.java:308) at org.apache.cxf.interceptor.OutgoingChainInterceptor.handleMessage(OutgoingChainInterceptor.java:90) at org.apache.cxf.phase.PhaseInterceptorChain.doIntercept(PhaseInterceptorChain.java:308) at org.apache.cxf.transport.ChainInitiationObserver.onMessage(ChainInitiationObserver.java:121) at org.apache.cxf.transport.http.AbstractHTTPDestination.invoke(AbstractHTTPDestination.java:267) at org.apache.cxf.transport.http_jetty.JettyHTTPDestination.doService(JettyHTTPDestination.java:247) at org.apache.cxf.transport.http_jetty.JettyHTTPHandler.handle(JettyHTTPHandler.java:79) at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132) at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:257) at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1317) at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:205) at 
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1219) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:144) at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:219) at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132) at org.eclipse.jetty.server.Server.handle(Server.java:531) at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:352) at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:260) at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:281) at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:102) at org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:118) at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:333) at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:310) at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:168) at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:126) at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:366) at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:762) at org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:680) at java.lang.Thread.run(Unknown Source) Caused by: java.lang.OutOfMemoryError: Java heap space at java.util.Arrays.copyOf(Unknown Source) at java.io.ByteArrayOutputStream.toByteArray(Unknown Source) at org.apache.poi.util.IOUtils.toByteArray(IOUtils.java:166) at org.apache.poi.util.IOUtils.toByteArray(IOUtils.java:120) at org.apache.poi.openxml4j.util.ZipArchiveFakeEntry.<init>(ZipArchiveFakeEntry.java:47) at org.apache.poi.openxml4j.util.ZipInputStreamZipEntrySource.<init>(ZipInputStreamZipEntrySource.java:51) at org.apache.poi.openxml4j.opc.ZipPackage.<init>(ZipPackage.java:106) 
at org.apache.poi.openxml4j.opc.OPCPackage.open(OPCPackage.java:298) at org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse(OOXMLExtractorFactory.java:85) at org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:110) at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280) at org.apache.tika.parser.ParserDecorator.parse(ParserDecorator.java:188) at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280) at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143) at
[jira] [Commented] (TIKA-2776) Tika server child restart
[ https://issues.apache.org/jira/browse/TIKA-2776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16686191#comment-16686191 ]

Mario Bisonti commented on TIKA-2776:
-------------------------------------

Hallo Tim.

I am not so expert on logging.. be patient please :-)

I configured log4j.xml and log4j_child.xml as in the attachments.

I started:

java -jar /opt/tika/tika-server-1.19.1.jar -Dlog4j.configuration=file:log4j.xml -JDlog4j.configuration=file:log4j_child.xml

but I obtain:

administrator@sengvivv02:/opt/tika$ java -jar /opt/tika/tika-server-1.19.1.jar -Dlog4j.configuration=file:log4j.xml -JDlog4j.configuration=file:log4j_child.xml
Nov 14, 2018 9:14:58 AM org.apache.tika.config.InitializableProblemHandler$3 handleInitializableProblem
WARNING: J2KImageReader not loaded. JPEG2000 files will not be processed.
See https://pdfbox.apache.org/2.0/dependencies.html#jai-image-io
for optional dependencies.
Nov 14, 2018 9:14:58 AM org.apache.tika.config.InitializableProblemHandler$3 handleInitializableProblem
WARNING: org.xerial's sqlite-jdbc is not loaded.
Please provide the jar on your classpath to parse sqlite files.
See tika-parsers/pom.xml for the correct version.
INFO Starting Apache Tika 1.19.1 server
org.apache.commons.cli.UnrecognizedOptionException: Unrecognized option: -Dlog4j.configuration=file:log4j.xml
 at org.apache.commons.cli.Parser.processOption(Parser.java:363)
 at org.apache.commons.cli.Parser.parse(Parser.java:199)
 at org.apache.commons.cli.Parser.parse(Parser.java:85)
 at org.apache.tika.server.TikaServerCli.execute(TikaServerCli.java:133)
 at org.apache.tika.server.TikaServerCli.main(TikaServerCli.java:117)
ERROR Can't start: org.apache.commons.cli.UnrecognizedOptionException: Unrecognized option: -Dlog4j.configuration=file:log4j.xml
 at org.apache.commons.cli.Parser.processOption(Parser.java:363)
 at org.apache.commons.cli.Parser.parse(Parser.java:199)
 at org.apache.commons.cli.Parser.parse(Parser.java:85)
 at org.apache.tika.server.TikaServerCli.execute(TikaServerCli.java:133)
 at org.apache.tika.server.TikaServerCli.main(TikaServerCli.java:117)

and there is no log in the directory.

What's wrong?

Thanks
Mario

[^log4j_child.xml] [^log4j.xml]

> Tika server child restart
> -------------------------
>
> Key: TIKA-2776
> URL: https://issues.apache.org/jira/browse/TIKA-2776
> Project: Tika
> Issue Type: Bug
> Reporter: Mario Bisonti
> Priority: Major
> Attachments: log4j.xml, log4j_child.xml
[jira] [Commented] (TIKA-2776) Tika server child restart
[ https://issues.apache.org/jira/browse/TIKA-2776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16685881#comment-16685881 ]

Tim Allison commented on TIKA-2776:
-----------------------------------

The limitation on STDOUT is a serious (but trivial-to-fix) bug: TIKA-2782

I wonder if that's causing you problems?

> Tika server child restart
> -------------------------
>
> Key: TIKA-2776
> URL: https://issues.apache.org/jira/browse/TIKA-2776
> Project: Tika
> Issue Type: Bug
> Reporter: Mario Bisonti
> Priority: Major
[jira] [Commented] (TIKA-2776) Tika server child restart
[ https://issues.apache.org/jira/browse/TIKA-2776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16685853#comment-16685853 ]

Tim Allison commented on TIKA-2776:
-----------------------------------

Yes, the above works, with the caveat that log appenders should not write to STDOUT in the child process. See: https://wiki.apache.org/tika/TikaJAXRS#Logging

> Tika server child restart
> -------------------------
>
> Key: TIKA-2776
> URL: https://issues.apache.org/jira/browse/TIKA-2776
> Project: Tika
> Issue Type: Bug
> Reporter: Mario Bisonti
> Priority: Major
[jira] [Commented] (TIKA-2776) Tika server child restart
[ https://issues.apache.org/jira/browse/TIKA-2776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16685641#comment-16685641 ]

Tim Allison commented on TIKA-2776:
-----------------------------------

I haven't actually tried the above yet... Will do so soon, though, and update our wiki.

> Tika server child restart
> -------------------------
>
> Key: TIKA-2776
> URL: https://issues.apache.org/jira/browse/TIKA-2776
> Project: Tika
> Issue Type: Bug
> Reporter: Mario Bisonti
> Priority: Major
[jira] [Commented] (TIKA-2776) Tika server child restart
[ https://issues.apache.org/jira/browse/TIKA-2776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16685640#comment-16685640 ]

Tim Allison commented on TIKA-2776:
-----------------------------------

bq. For me is very difficult to investigate why tika server child is restarted/crashed. Is there any way to log Tika server?

You should be able to use log4j for the parent process as you would expect: {{-Dlog4j.configuration=file:log4j.xml}}.

You can select between {{info}} and {{debug}} when you start the server: {{-log info}}

To configure logging in the child process, add {{-J}} to the beginning: {{-JDlog4j.configuration=file:log4j_child.xml}}.

> Tika server child restart
> -------------------------
>
> Key: TIKA-2776
> URL: https://issues.apache.org/jira/browse/TIKA-2776
> Project: Tika
> Issue Type: Bug
> Reporter: Mario Bisonti
> Priority: Major
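One detail that is easy to miss here: {{-Dlog4j.configuration=...}} is an option for the parent JVM, so it has to appear before {{-jar}}. Everything after the jar path is handed to tika-server's own argument parser, which takes server options such as {{-spawnChild}}, {{-log}} and the {{-J}}-prefixed ones. The sketch below assembles a command line in that order. The class and method names are made up for illustration, and the jar path is simply the one used in this issue.

```java
import java.util.ArrayList;
import java.util.List;

class TikaServerCommand {

    /** Builds: java <parent JVM options> -jar <jar> <server arguments>. */
    static List<String> build(String jarPath,
                              List<String> parentJvmOptions,
                              List<String> serverArgs) {
        List<String> command = new ArrayList<>();
        command.add("java");
        command.addAll(parentJvmOptions); // must precede -jar to reach the parent JVM
        command.add("-jar");
        command.add(jarPath);
        command.addAll(serverArgs);       // passed to the server's main method
        return command;
    }
}
```

Calling {{build}} with the parent option {{-Dlog4j.configuration=file:log4j.xml}} and the server arguments {{-spawnChild}}, {{-log info}} and {{-JDlog4j.configuration=file:log4j_child.xml}} gives a command of the form {{java -Dlog4j.configuration=file:log4j.xml -jar /opt/tika/tika-server-1.19.1.jar -spawnChild -log info -JDlog4j.configuration=file:log4j_child.xml}}.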
[jira] [Commented] (TIKA-2776) Tika server child restart
[ https://issues.apache.org/jira/browse/TIKA-2776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16684907#comment-16684907 ]

Mario Bisonti commented on TIKA-2776:
-------------------------------------

Hallo Tim.

From the ManifoldCF side I read the log:

org.apache.manifoldcf.core.interfaces.ManifoldCFException: Repeated service interruptions - failure processing document: The target server failed to respond
    at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:489) [mcf-pull-agent.jar:?]
Caused by: org.apache.http.NoHttpResponseException: The target server failed to respond
    at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:141) ~[httpclient-4.5.6.jar:4.5.6]
    at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:56) ~[httpclient-4.5.6.jar:4.5.6]
    at org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:259) ~[httpcore-4.4.10.jar:4.4.10]
    at org.apache.http.impl.DefaultBHttpClientConnection.receiveResponseHeader(DefaultBHttpClientConnection.java:163) ~[httpcore-4.4.10.jar:4.4.10]
    at org.apache.http.impl.conn.CPoolProxy.receiveResponseHeader(CPoolProxy.java:165) ~[httpclient-4.5.6.jar:4.5.6]
    at org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:273) ~[httpcore-4.4.10.jar:4.4.10]
    at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:125) ~[httpcore-4.4.10.jar:4.4.10]
    at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:272) ~[httpclient-4.5.6.jar:4.5.6]
    at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:185) ~[httpclient-4.5.6.jar:4.5.6]
    at org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:110) ~[httpclient-4.5.6.jar:4.5.6]
    at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185) ~[httpclient-4.5.6.jar:4.5.6]
    at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:118) ~[httpclient-4.5.6.jar:4.5.6]
    at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:56) ~[httpclient-4.5.6.jar:4.5.6]
    at org.apache.manifoldcf.agents.transformation.tikaservice.TikaExtractor.addOrReplaceDocumentWithException(TikaExtractor.java:608) ~[?:?]
    at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddEntryPoint.addOrReplaceDocumentWithException(IncrementalIngester.java:3226) ~[mcf-agents.jar:?]
    at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddFanout.sendDocument(IncrementalIngester.java:3077) ~[mcf-agents.jar:?]
    at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$MonitoredAddActivityWrapper.sendDocument(IncrementalIngester.java:3471) ~[mcf-agents.jar:?]
    at org.apache.manifoldcf.agents.transformation.documentfilter.DocumentFilter.addOrReplaceDocumentWithException(DocumentFilter.java:208) ~[?:?]
    at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddEntryPoint.addOrReplaceDocumentWithException(IncrementalIngester.java:3226) ~[mcf-agents.jar:?]
    at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddFanout.sendDocument(IncrementalIngester.java:3077) ~[mcf-agents.jar:?]
    at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineObjectWithVersions.addOrReplaceDocumentWithException(IncrementalIngester.java:2708) ~[mcf-agents.jar:?]
    at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.documentIngest(IncrementalIngester.java:756) ~[mcf-agents.jar:?]
    at org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1583) ~[mcf-pull-agent.jar:?]
    at org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1548) ~[mcf-pull-agent.jar:?]
    at org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.processDocuments(SharedDriveConnector.java:939) ~[?:?]
    at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399) ~[mcf-pull-agent.jar:?]
WARN 2018-11-13T09:50:58,546 (Worker thread '48') - Service interruption reported for job 1533797717712 connection 'WinShare': Tika down, retrying: Connect to localhost:9998 [localhost/127.0.0.1, localhost/0:0:0:0:0:0:0:1] failed: Connection refused (Connection refused)
WARN 2018-11-13T09:50:58,606 (Worker thread '34') - Service interruption reported for job 1533797717712 connection 'WinShare': Tika down, retrying: Connect to localhost:9998 [localhost/127.0.0.1, localhost/0:0:0:0:0:0:0:1] failed: Connection refused (Connection refused)
WARN 2018-11-13T09:50:58,947
[jira] [Commented] (TIKA-2776) Tika server child restart
[ https://issues.apache.org/jira/browse/TIKA-2776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16681992#comment-16681992 ]

Tim Allison commented on TIKA-2776:
-----------------------------------

How often is the child crashing? Can you tell what is causing the problems? Is the child actually starting successfully...are you getting any files parsed?

> Tika server child restart
> -------------------------
>
>                 Key: TIKA-2776
>                 URL: https://issues.apache.org/jira/browse/TIKA-2776
>             Project: Tika
>          Issue Type: Bug
>            Reporter: Mario Bisonti
>            Priority: Major
>
> Hallo.
> I use tika server standalone started with the option:
> java -jar /opt/tika/tika-server-1.19.1.jar -spawnChild
> I use ManifoldCF and Solr to index file using tika server.
> It happens that indexing is continuously crashed because I obtain many:
> Tika down, retrying: Connection reset
> etc.
> I suspect that, when a process is restarted, the client crash as mentioned
> here:
> _If the child process is in the process of shutting down, and it gets a new
> request it will return 503 -- Service Unavailable. If the server times out on
> a file, the client will receive an IOException from the closed socket. Note
> that all other files that are being processed will end with an IOException
> from a closed socket when the child process shuts down; e.g. if you send
> three files to tika-server concurrently, and one of them causes a
> catastrophic problem requiring the child to shut down, you won't be able to
> tell which file caused the problems. In the future, we may implement a
> gentler shutdown than we currently have._
> as reported here https://wiki.apache.org/tika/TikaJAXRS
> How could I workaround it ?
> Thanks a lot
> Mario

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
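
For illustration only: the quoted description says that while the child process is restarting, clients see an IOException from a closed socket (or a 503). One way a client might ride out that window is to treat an IOException as transient and retry with exponential backoff. The sketch below is not code from ManifoldCF or Tika; the class `TikaRetry`, the method `withRetry`, and its parameters are hypothetical names, and it assumes the caller turns a 503 response into an IOException before it reaches the helper.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

// Hypothetical helper, not part of Tika or ManifoldCF.
final class TikaRetry {

    private TikaRetry() {}

    /**
     * Runs the call, retrying on IOException with exponential backoff.
     * Assumption: an IOException (closed socket, connection refused)
     * means the server child is restarting and the request is safe to resend.
     */
    static <T> T withRetry(Callable<T> call, int maxAttempts, long initialDelayMs)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        long delayMs = initialDelayMs;
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.call();
            } catch (IOException e) {
                last = e;
                if (attempt < maxAttempts) {
                    // Give the restarted child time to come back up.
                    Thread.sleep(delayMs);
                    delayMs *= 2;
                }
            }
        }
        // Every attempt failed with an IOException; surface the last one.
        throw last;
    }
}
```

A caller would wrap the HTTP request in the `Callable`, for example `TikaRetry.withRetry(() -> sendToTika(file), 5, 500L)`, where `sendToTika` is again a hypothetical method. Note that this does not identify which file triggered the restart; as the quoted text says, concurrent requests all fail the same way.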