[jira] [Commented] (TIKA-2776) Tika server child restart

2018-11-30 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/TIKA-2776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16705283#comment-16705283
 ] 

Hudson commented on TIKA-2776:
--

SUCCESS: Integrated in Jenkins build tika-branch-1x #133 (See 
[https://builds.apache.org/job/tika-branch-1x/133/])
TIKA-2776 -- improve documentation for -maxFiles (tallison: 
[https://github.com/apache/tika/commit/4141411773a321fe614167584d23e376c4dbcb3c])
* (edit) tika-server/src/main/java/org/apache/tika/server/TikaServerCli.java


> Tika server child restart
> -
>
> Key: TIKA-2776
> URL: https://issues.apache.org/jira/browse/TIKA-2776
> Project: Tika
> Issue Type: Bug
> Reporter: Mario Bisonti
> Assignee: Tim Allison
> Priority: Blocker
> Fix For: 2.0.0, 1.20
>
> Attachments: Log.zip, MCF_JOB.png, log4j.xml, log4j_child.xml, 
> log4j_child.xml, man_tika.zip, tikalogchild.log
>
>
> Hello.
> I run tika-server standalone, started with:
> java -jar /opt/tika/tika-server-1.19.1.jar -spawnChild
> I use ManifoldCF and Solr to index files through tika-server.
> Indexing keeps crashing because I get many errors like:
> Tika down, retrying: Connection reset
> etc.
> I suspect that the client crashes when the child process is restarted, as 
> described here:
> _If the child process is in the process of shutting down, and it gets a new 
> request it will return 503 -- Service Unavailable. If the server times out on 
> a file, the client will receive an IOException from the closed socket. Note 
> that all other files that are being processed will end with an IOException 
> from a closed socket when the child process shuts down; e.g. if you send 
> three files to tika-server concurrently, and one of them causes a 
> catastrophic problem requiring the child to shut down, you won't be able to 
> tell which file caused the problems. In the future, we may implement a 
> gentler shutdown than we currently have._
> (quoted from https://wiki.apache.org/tika/TikaJAXRS)
> How can I work around it?
> Thanks a lot
> Mario



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (TIKA-2776) Tika server child restart

2018-11-30 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/TIKA-2776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16705281#comment-16705281
 ] 

Hudson commented on TIKA-2776:
--

SUCCESS: Integrated in Jenkins build Tika-trunk #1601 (See 
[https://builds.apache.org/job/Tika-trunk/1601/])
TIKA-2776 -- improve documentation for -maxFiles (tallison: 
[https://github.com/apache/tika/commit/a477d73ac56c169075b5c9ea66bf57be1f3dc672])
* (edit) tika-server/src/main/java/org/apache/tika/server/TikaServerCli.java




[jira] [Commented] (TIKA-2776) Tika server child restart

2018-11-30 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/TIKA-2776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16705251#comment-16705251
 ] 

Hudson commented on TIKA-2776:
--

UNSTABLE: Integrated in Jenkins build tika-2.x-windows #354 (See 
[https://builds.apache.org/job/tika-2.x-windows/354/])
TIKA-2776 -- improve documentation for -maxFiles (tallison: rev 
a477d73ac56c169075b5c9ea66bf57be1f3dc672)
* (edit) tika-server/src/main/java/org/apache/tika/server/TikaServerCli.java




[jira] [Commented] (TIKA-2776) Tika server child restart

2018-11-30 Thread Mario Bisonti (JIRA)


[ 
https://issues.apache.org/jira/browse/TIKA-2776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16704474#comment-16704474
 ] 

Mario Bisonti commented on TIKA-2776:
-

Hello Tim.

The child process was restarted:

2018-11-30 01:21:01 INFO TikaServerWatchDog:104 - About to restart the child process
2018-11-30 01:21:02 INFO TikaServerWatchDog:106 - Successfully restarted child process -- 11 restarts so far)
2018-11-30 05:39:09 WARN TikaServerWatchDog:253 - Received status from child: HIT_MAX
2018-11-30 05:39:10 INFO TikaServerWatchDog:104 - About to restart the child process
2018-11-30 05:39:12 INFO TikaServerWatchDog:106 - Successfully restarted child process -- 13 restarts so far)
2018-11-30 08:38:03 WARN TikaServerWatchDog:253 - Received status from child: HIT_MAX
2018-11-30 08:38:03 INFO TikaServerWatchDog:104 - About to restart the child process
2018-11-30 08:38:04 INFO TikaServerWatchDog:106 - Successfully restarted child process -- 15 restarts so far)

 

Is this related to the parameter:

_{{-maxFiles}}: restart the child process after it has processed {{maxFiles}}. 
If there is a slow building memory leak, this restart of the JVM should help._

I didn't set that parameter.

What is the default value of maxFiles?

Thanks

Mario

 



[jira] [Commented] (TIKA-2776) Tika server child restart

2018-11-29 Thread Mario Bisonti (JIRA)


[ 
https://issues.apache.org/jira/browse/TIKA-2776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16703173#comment-16703173
 ] 

Mario Bisonti commented on TIKA-2776:
-

Hi Tim.

3. Yes, you are right: the parent process never shut down; only the child 
restarted.

Your opinion/advice:

Yes, I worked around the child restarts by increasing the timeout. In the 
future I will ask whether ManifoldCF can handle the timeout case.

At the moment, I am trying to finish indexing the 70 documents; the job stops 
many times because of a Java exception in the agent that handles indexing for 
ManifoldCF.

You do not have to thank me; I thank you for the great support.

 

 



[jira] [Commented] (TIKA-2776) Tika server child restart

2018-11-28 Thread Tim Allison (JIRA)


[ 
https://issues.apache.org/jira/browse/TIKA-2776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16701938#comment-16701938
 ] 

Tim Allison commented on TIKA-2776:
---

Thank you for the follow up!

To confirm/summarize...
1. I introduced a change in behavior (bug) into legacy server mode in 1.19 
(maybe 1.18?) that causes tika-server to return 'not available' forever after 
an OOM.  The legacy behavior was to ignore OOMs and _hope_ nothing too bad 
happened to your JVM.  That said, the change of behavior I introduced is bad, 
very bad.  I've fixed this in 1.20, which should be out in a few weeks.
2. tika-server in -spawnChild mode was restarting the child because you were 
getting timeouts.  This caused problems with Manifold.  You've bumped out the 
timeout to ~16 minutes, and you currently don't have any files that take longer 
than that...so all appears to work for now.
3. I _think_ we found that {{-spawnChild}} was behaving as designed.  To 
confirm: we did not find that the parent process shut down, and we did find 
that the child restarted within a few seconds.  Is this correct?
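
For reference, the "~16 minutes" in item 2 maps onto a millisecond value for {{-taskTimeoutMillis}} roughly as in the sketch below. The helper names are hypothetical and not part of Tika; only the two flag names come from this thread.

```python
def minutes_to_millis(minutes: float) -> int:
    """Convert a timeout given in minutes to whole milliseconds."""
    return int(round(minutes * 60 * 1000))


def task_timeout_args(minutes: float) -> list:
    """Build the tika-server arguments for a given task timeout.

    Hypothetical helper: it only formats the flags mentioned in this thread.
    """
    return ["-spawnChild", "-taskTimeoutMillis", str(minutes_to_millis(minutes))]


print(task_timeout_args(16))
# prints ['-spawnChild', '-taskTimeoutMillis', '960000']
```

A longer timeout only postpones the restart for files that never finish parsing, which is the point made in the advice below.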

My opinion/advice:
Depending on the nature of your documents, if you have large enough batches of 
crazy enough documents, you will eventually hit an infinite loop, and the child 
will time out and restart.  So, for now, you've papered over a problem by 
bumping out the timeout, but the timeouts will eventually happen.  So, what can 
we do in Tika, what can Manifold do, and what can you do to help avoid this 
eventuality?

Again, many, many thanks for your patience getting the logging up and running.  
I still need to improve our wiki on logging with tika-server (based on our 
interaction) even more.  




[jira] [Commented] (TIKA-2776) Tika server child restart

2018-11-28 Thread Mario Bisonti (JIRA)


[ 
https://issues.apache.org/jira/browse/TIKA-2776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16701864#comment-16701864
 ] 

Mario Bisonti commented on TIKA-2776:
-

Hello Tim,

Tesseract is not installed.

It seems that with the parameters "-spawnChild -taskTimeoutMillis 100" the 
child no longer shuts down.

I will keep you updated on my issue with the client that indexes many 
documents.



[jira] [Commented] (TIKA-2776) Tika server child restart

2018-11-26 Thread Tim Allison (JIRA)


[ 
https://issues.apache.org/jira/browse/TIKA-2776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16699108#comment-16699108
 ] 

Tim Allison commented on TIKA-2776:
---

Three cheers for logging, and thank you for your patience in configuring those!

Yes, exactly!  It looks like the child process restarted at 2018-11-26 13:18:26 
(see {{2018-11-26 13:18:26 INFO  MetadataResource:431 - meta 
(application/vnd.openxmlformats}}) and then processed more files successfully.  
It can take a few seconds for the server to restart, and {{manifoldcf.log}} 
suggests that the initial connectivity dropped at 13:18:25, with problems 
logged through the end of 13:18:26 as worker threads could not reach the 
server.  This is expected.  Are the clients (worker threads 88, 39, 8, 86, 87, 
982, 99, 75, 12) able to sleep and retry after failed connectivity, or do they 
just try once and give up?
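
To make the retry question concrete, here is a minimal sketch of a client that sleeps and retries when the socket is closed or the server answers 503 during a child restart. It is not ManifoldCF's code: {{call_with_retry}}, its parameters, and the delay values are hypothetical and only illustrate the idea.

```python
import time

# Status codes worth retrying; 503 is what tika-server returns while the
# child process is shutting down, according to the wiki text quoted in the issue.
RETRYABLE_STATUS = {503}


def call_with_retry(send, attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call send() until it succeeds or the attempts run out.

    send is any zero-argument callable that returns (status, body) and
    raises OSError (for example ConnectionResetError) on a closed socket.
    The wait doubles after every failed attempt.
    """
    last_error = None
    for attempt in range(attempts):
        try:
            status, body = send()
            if status not in RETRYABLE_STATUS:
                return status, body
            last_error = RuntimeError("server returned %d" % status)
        except OSError as exc:
            last_error = exc
        if attempt < attempts - 1:
            sleep(base_delay * (2 ** attempt))
    raise last_error
```

With the default values the client waits 1, 2, 4 and 8 seconds between attempts, which should cover a restart that takes a few seconds. A file that itself triggers the timeout will still fail on every attempt, so the caller needs a way to skip it eventually.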

As a side note, if you add a header telling tika-server what the file name is, 
that filename will be included in the log message so you can figure out which 
file caused the timeout.  

See: https://wiki.apache.org/tika/TikaJAXRS ... in short, add the header to 
your request:
{{"Content-Disposition: attachment; filename=foo.csv"}}
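
As an illustration, a request carrying that header could be built as in the sketch below. The URL (host, port 9998 and the {{/meta}} path) and the helper name are assumptions for the example; only the header itself comes from the comment above. The request is built but not sent.

```python
import urllib.request


def build_tika_request(data: bytes, filename: str,
                       url: str = "http://localhost:9998/meta"):
    """Build (but do not send) a PUT request that tells the server the file name.

    The default URL is an assumption for this sketch; point it at your own
    tika-server instance and endpoint.
    """
    headers = {"Content-Disposition": "attachment; filename=%s" % filename}
    return urllib.request.Request(url, data=data, headers=headers, method="PUT")


req = build_tika_request(b"a,b\n1,2\n", "foo.csv")
# urllib stores header names capitalized as "Content-disposition".
print(req.get_header("Content-disposition"))
# prints attachment; filename=foo.csv

# To actually send it (requires a running server):
# with urllib.request.urlopen(req) as response:
#     print(response.read())
```

File names containing spaces or non-ASCII characters would need quoting or encoding, which this sketch leaves out.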

Some reasons for timeouts: the VM is overtaxed and processing is just slow, 
there is an infinite loop in a parser (these are rare but they can happen), or 
OCR is taking minutes per document (do you have tesseract installed?).





[jira] [Commented] (TIKA-2776) Tika server child restart

2018-11-26 Thread Mario Bisonti (JIRA)


[ 
https://issues.apache.org/jira/browse/TIKA-2776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16698892#comment-16698892
 ] 

Mario Bisonti commented on TIKA-2776:
-

I have now started Tika with the command:
 java -Dlog4j.configuration=file:/opt/tika/log4j.xml -jar 
/opt/tika/tika-server-1.20-20181114.215706-48.jar 
-JDlog4j.configuration=file:/opt/tika/log4j_child.xml 
--host=sengvivv01.vimar.net -spawnChild -taskTimeoutMillis 100

Could the -taskTimeoutMillis parameter be useful?

I see many errors in tikalogchild.log:

ERROR ServerStatusWatcher:129 - Timeout task PARSE, millis elapsed 120253
...
ERROR ServerStatusWatcher:129 - Timeout task PARSE, millis elapsed 120253
...
ERROR ServerStatusWatcher:129 - Timeout task PARSE, millis elapsed 120335
...

etc.

 



[jira] [Commented] (TIKA-2776) Tika server child restart

2018-11-26 Thread Mario Bisonti (JIRA)


[ 
https://issues.apache.org/jira/browse/TIKA-2776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16698868#comment-16698868
 ] 

Mario Bisonti commented on TIKA-2776:
-

Hello Tim.

I now have both logs for tika-server: parent (tikalog.log) and child 
(tikalogchild.log).

The last line of manifoldcf.log shows that the job aborted at 13:18.

The Tika log shows a restart at that time.

I attached the logs in man_tika.zip.

 

[^man_tika.zip]



[jira] [Commented] (TIKA-2776) Tika server child restart

2018-11-23 Thread Tim Allison (JIRA)


[ 
https://issues.apache.org/jira/browse/TIKA-2776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16697364#comment-16697364
 ] 

Tim Allison commented on TIKA-2776:
---

Try adding that parameter immediately after {{java}}.
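
The reason is the {{java}} launcher's argument order: {{java [JVM options] -jar <jar> [application arguments]}}. Anything placed after the jar is handed to the application, which is why the server's option parser rejected {{-Dlog4j.configuration=...}} as an unrecognized option. A small sketch of assembling the command in the right order follows; the helper name is hypothetical, and the paths and flags are the ones used in this thread.

```python
def build_tika_command(jar, jvm_options, server_args):
    """Assemble argv as: java [JVM options] -jar <jar> [server arguments]."""
    return ["java", *jvm_options, "-jar", jar, *server_args]


cmd = build_tika_command(
    "/opt/tika/tika-server-1.19.1.jar",
    # Read by the parent JVM, so it must come before -jar.
    ["-Dlog4j.configuration=file:/opt/tika/log4j.xml"],
    # Parsed by tika-server itself, so these come after the jar.
    ["-JDlog4j.configuration=file:/opt/tika/log4j_child.xml", "-spawnChild"],
)
print(" ".join(cmd))
```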



[jira] [Commented] (TIKA-2776) Tika server child restart

2018-11-23 Thread Mario Bisonti (JIRA)


[ 
https://issues.apache.org/jira/browse/TIKA-2776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16697249#comment-16697249
 ] 

Mario Bisonti commented on TIKA-2776:
-

Yes, some files were processed in the interval 13:24-13:34.

No files were processed after that because the job crashed, as you can see in 
MCF_JOB.png.

 

I tried to start tika-server with the two log configurations, but it doesn't work.

I get this error:

administrator@sengvivv02:/opt/tika$ sudo -u tomcat java -jar 
/opt/tika/tika-server-1.19.1.jar -Dlog4j.configuration=file:/opt/tika/log4j.xml 
-JDlog4j.configuration=file:/opt/tika/log4j_child.xml --host=sengvivv02 
-spawnChild
Nov 23, 2018 3:55:45 PM org.apache.tika.config.InitializableProblemHandler$3 
handleInitializableProblem
WARNING: J2KImageReader not loaded. JPEG2000 files will not be processed.
See https://pdfbox.apache.org/2.0/dependencies.html#jai-image-io
for optional dependencies.

Nov 23, 2018 3:55:45 PM org.apache.tika.config.InitializableProblemHandler$3 
handleInitializableProblem
WARNING: org.xerial's sqlite-jdbc is not loaded.
Please provide the jar on your classpath to parse sqlite files.
See tika-parsers/pom.xml for the correct version.
INFO Starting Apache Tika 1.19.1 server
org.apache.commons.cli.UnrecognizedOptionException: Unrecognized option: 
-Dlog4j.configuration=file:/opt/tika/log4j.xml
 at org.apache.commons.cli.Parser.processOption(Parser.java:363)
 at org.apache.commons.cli.Parser.parse(Parser.java:199)
 at org.apache.commons.cli.Parser.parse(Parser.java:85)
 at org.apache.tika.server.TikaServerCli.execute(TikaServerCli.java:133)
 at org.apache.tika.server.TikaServerCli.main(TikaServerCli.java:117)
ERROR Can't start:
org.apache.commons.cli.UnrecognizedOptionException: Unrecognized option: 
-Dlog4j.configuration=file:/opt/tika/log4j.xml
 at org.apache.commons.cli.Parser.processOption(Parser.java:363)
 at org.apache.commons.cli.Parser.parse(Parser.java:199)
 at org.apache.commons.cli.Parser.parse(Parser.java:85)
 at org.apache.tika.server.TikaServerCli.execute(TikaServerCli.java:133)
 at org.apache.tika.server.TikaServerCli.main(TikaServerCli.java:117)



[jira] [Commented] (TIKA-2776) Tika server child restart

2018-11-23 Thread Mario Bisonti (JIRA)


[ 
https://issues.apache.org/jira/browse/TIKA-2776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16697234#comment-16697234
 ] 

Mario Bisonti commented on TIKA-2776:
-

!MCF_JOB.png!



[jira] [Commented] (TIKA-2776) Tika server child restart

2018-11-23 Thread Tim Allison (JIRA)


[ 
https://issues.apache.org/jira/browse/TIKA-2776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16697209#comment-16697209
 ] 

Tim Allison commented on TIKA-2776:
---

It would also be useful to add logging to the parent process. I think the same 
configuration can be used, with the one critical change of writing to a 
different file... add {{-Dlog4j.configuration...}} early in the command line.



[jira] [Commented] (TIKA-2776) Tika server child restart

2018-11-23 Thread Tim Allison (JIRA)


[ 
https://issues.apache.org/jira/browse/TIKA-2776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16697134#comment-16697134
 ] 

Tim Allison commented on TIKA-2776:
---

Ugh...looks like the append=false forces the logs to overwrite, which means 
that we're missing the critical points: the child log is missing between the 
end of tikalogchild.log.1 (12:57) and the beginning of tikalogchild.log (13:34).

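Let's change this line to "true":
{noformat}
[config line not preserved in this archive copy; from context, the append setting in the log4j config]
{noformat}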

From the MCF_client_log, tika was unavailable at 13:24, 13:26, 13:29, 13:31 
and 13:34.  Can you tell from the MCF logs if any files were processed between 
13:24 and 13:34?  Were any files processed after 13:34?
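
To make the timeline check concrete, here is a small Python sketch that compares the outage times read from the MCF client log with the window that has no child log output (12:57 to 13:34). The names are illustrative only, the times are the ones quoted above, and treating both ends of the window as inclusive is an assumption.

```python
from datetime import time

# Window with no child log output: end of tikalogchild.log.1
# to the beginning of tikalogchild.log (times taken from the comment above).
GAP_START = time(12, 57)
GAP_END = time(13, 34)

# Times at which the MCF client log reported Tika as unavailable.
OUTAGES = [time(13, 24), time(13, 26), time(13, 29), time(13, 31), time(13, 34)]


def in_log_gap(moment, start=GAP_START, end=GAP_END):
    """Return True if `moment` falls inside the unlogged window (inclusive)."""
    return start <= moment <= end


# Outages that happened while nothing was being kept in the child log.
covered = [t for t in OUTAGES if in_log_gap(t)]
print(f"{len(covered)} of {len(OUTAGES)} outages fall inside the log gap")
```

With these times, every reported outage lands inside the unlogged window, which is why the overwritten log hides the part of the run that matters.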





[jira] [Commented] (TIKA-2776) Tika server child restart

2018-11-22 Thread Mario Bisonti (JIRA)


[ 
https://issues.apache.org/jira/browse/TIKA-2776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16695874#comment-16695874
 ] 

Mario Bisonti commented on TIKA-2776:
-

Hello Tim, I am finally able to generate the log.

Today I started a processing run from my client to parse with Tika.

It started to process at 8:30 a.m. and at 13:34, as you can see in MCF_Client.log.

I see that Tika created the log tikalogchild.log.1 and wrote to it at 12:57, 
and then a new log, tikalogchild.log, at 13:34, I suppose when the child was 
restarted.

So I suppose the client crashed because of this restart?

I attach the three files in Log.zip.

Could you help me understand how to solve this issue?

I am using tika-server-1.20-20181114.215706-48.jar

Thanks a lot.

Mario

[^Log.zip]



[jira] [Commented] (TIKA-2776) Tika server child restart

2018-11-21 Thread Tim Allison (JIRA)


[ 
https://issues.apache.org/jira/browse/TIKA-2776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16694777#comment-16694777
 ] 

Tim Allison commented on TIKA-2776:
---

Sorry.  I do have a file produced on the file system.  I just attached the 
literal .xml log config file and the log that was written to my file system.


{noformat}

C:\data\tika_server>java -jar tika-server-1.19.1.jar 
-JDlog4j.configuration=file:log4j_child.xml -spawnChild

Nov 21, 2018 9:30:08 AM org.apache.tika.config.InitializableProblemHandler$3 
handleInitializableProblem
WARNING: J2KImageReader not loaded. JPEG2000 files will not be processed.
See https://pdfbox.apache.org/2.0/dependencies.html#jai-image-io
for optional dependencies.

Nov 21, 2018 9:30:08 AM org.apache.tika.config.InitializableProblemHandler$3 
handleInitializableProblem
WARNING: org.xerial's sqlite-jdbc is not loaded.
Please provide the jar on your classpath to parse sqlite files.
See tika-parsers/pom.xml for the correct version.
INFO Starting Apache Tika 1.19.1 server
Nov 21, 2018 9:30:09 AM org.apache.tika.config.InitializableProblemHandler$3 
handleInitializableProblem
WARNING: J2KImageReader not loaded. JPEG2000 files will not be processed.
See https://pdfbox.apache.org/2.0/dependencies.html#jai-image-io
for optional dependencies.

Nov 21, 2018 9:30:09 AM org.apache.tika.config.InitializableProblemHandler$3 
handleInitializableProblem
WARNING: org.xerial's sqlite-jdbc is not loaded.
Please provide the jar on your classpath to parse sqlite files.
See tika-parsers/pom.xml for the correct version.
2018-11-21 09:30:09 INFO TikaServerCli:115 - Starting Apache Tika 1.19.1 server
2018-11-21 09:30:09 INFO ServerImpl:85 - Setting the server's publish address 
to be http://localhost:9998/
2018-11-21 09:30:09 INFO log:193 - Logging initialized @1296ms to 
org.eclipse.jetty.util.log.Slf4jLog
2018-11-21 09:30:09 INFO Server:374 - jetty-9.4.z-SNAPSHOT; built: 
2018-06-05T18:24:03.829Z; git: d5fc0523cfa96bfebfbda19606cad384d772f04c; jvm 
1.8.0_192-b12
2018-11-21 09:30:10 INFO AbstractConnector:289 - Started 
ServerConnector@1af2d44a{HTTP/1.1,[http/1.1]}{localhost:9998}
2018-11-21 09:30:10 INFO Server:411 - Started @1811ms
2018-11-21 09:30:10 WARN ContextHandler:1572 - Empty contextPath
2018-11-21 09:30:10 INFO ContextHandler:851 - Started 
o.e.j.s.h.ContextHandler@342c38f8{/,null,AVAILABLE}
2018-11-21 09:30:10 INFO TikaServerCli:316 - Started Apache Tika server at 
http://localhost:9998/
2018-11-21 09:30:27 INFO RecursiveMetadataResource:429 - rmeta (autodetecting 
type)
{noformat}

I then curled a file a couple of times against the server:

{noformat}
curl.exe -T somePDF.pdf http://localhost:9998/rmeta
{noformat}
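
For clients that are not shell scripts, the same request can be sketched in Python with only the standard library. This is a minimal sketch, not the client used in this issue: the function names, the default base URL (taken from the log output above) and the 60-second timeout are all assumptions for illustration. Like `curl -T`, it sends the file body as an HTTP PUT to `/rmeta`.

```python
import urllib.request


def build_rmeta_request(payload, base_url="http://localhost:9998"):
    """Build (without sending) a PUT of `payload` to the /rmeta endpoint."""
    return urllib.request.Request(
        base_url.rstrip("/") + "/rmeta",
        data=payload,
        method="PUT",
    )


def send(request, timeout=60):
    """Send the request and return (HTTP status, response body as text)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status, response.read().decode("utf-8")
```

Reading `somePDF.pdf` in binary mode, passing its bytes to `build_rmeta_request` and handing the result to `send` should correspond to the curl command above, provided a server is listening on that address.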



[jira] [Commented] (TIKA-2776) Tika server child restart

2018-11-21 Thread Mario Bisonti (JIRA)


[ 
https://issues.apache.org/jira/browse/TIKA-2776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16694760#comment-16694760
 ] 

Mario Bisonti commented on TIKA-2776:
-

Hello.

Tim, I get no log on the filesystem with this log4j_child.xml:

[log4j_child.xml contents not preserved in this archive copy]

I don't understand whether you have a log produced on the filesystem or not.



[jira] [Commented] (TIKA-2776) Tika server child restart

2018-11-21 Thread Tim Allison (JIRA)


[ 
https://issues.apache.org/jira/browse/TIKA-2776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16694694#comment-16694694
 ] 

Tim Allison commented on TIKA-2776:
---

{quote} it looks like the client only waited a quarter of a second. It can take 
20-30 seconds to restart tika-server.
{quote}
Make that 1-3 seconds...from semi-manual integration tests outside of the Tika 
test framework.
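
Given a restart time of a few seconds, a client that retries for longer than a quarter of a second should ride out most child restarts. The following Python sketch shows one way to do that with exponential backoff. It is not the ManifoldCF client; `call_with_retry`, its parameters and the default delays are hypothetical, and the choice to treat only HTTP 503 and dropped or refused connections as transient is an assumption based on the behavior described in this issue.

```python
import time
import urllib.error


def call_with_retry(operation, attempts=5, initial_delay=1.0, factor=2.0,
                    sleep=time.sleep):
    """Call operation(); retry on HTTP 503 or a refused/reset connection.

    Waits initial_delay seconds after the first failure and multiplies the
    wait by `factor` after each further failure. The last error is re-raised
    once `attempts` calls have failed.
    """
    delay = initial_delay
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except urllib.error.HTTPError as err:
            # Only 503 (child shutting down) is treated as transient here.
            if err.code != 503 or attempt == attempts:
                raise
        except (urllib.error.URLError, ConnectionError):
            # Connection refused or reset while the child is restarting.
            if attempt == attempts:
                raise
        sleep(delay)
        delay *= factor
```

With the defaults, the waits between the five attempts are 1, 2, 4 and 8 seconds, which comfortably covers a 1-3 second restart. Note that a retry resubmits the same file, so a file that itself crashes the child will keep doing so until the attempts run out.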



[jira] [Commented] (TIKA-2776) Tika server child restart

2018-11-21 Thread Tim Allison (JIRA)


[ 
https://issues.apache.org/jira/browse/TIKA-2776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16694671#comment-16694671
 ] 

Tim Allison commented on TIKA-2776:
---

If you're using 1.19.1, you've hit TIKA-2785.  The temporary fix is to have the 
ConsoleAppender write to stderr:

{noformat}
[log4j ConsoleAppender configuration (XML) not preserved in this archive copy]
{noformat}

I tested this with 1.19.1 on Windows, and I had success.  If you have an 
interest in testing the new mechanism, grab a nightly build of tika-server 
from, e.g. 
[here|https://builds.apache.org/job/tika-branch-1x/131/org.apache.tika$tika-server/artifact/org.apache.tika/tika-server/1.20-20181120.215531-52/tika-server-1.20-20181120.215531-52.jar],
 and you can use the log file as you had it configured. :D



[jira] [Commented] (TIKA-2776) Tika server child restart

2018-11-21 Thread Mario Bisonti (JIRA)


[ 
https://issues.apache.org/jira/browse/TIKA-2776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16694508#comment-16694508
 ] 

Mario Bisonti commented on TIKA-2776:
-

I tried to start:
C:\Temp\Tika>java -jar tika-server-1.19.1.jar --host=myhostname -spawnchild 
-JDlog4j.configuration=file:"log4j_child.xml" -log info

where log4j_child.xml is:

[log4j_child.xml contents not preserved in this archive copy]

but it doesn't produce any log on the file system: C:/Temp/tika/tikalog_child.log 
wasn't created.

What's wrong?

Thanks a lot

Mario

 



[jira] [Commented] (TIKA-2776) Tika server child restart

2018-11-19 Thread Mario Bisonti (JIRA)


[ 
https://issues.apache.org/jira/browse/TIKA-2776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16691675#comment-16691675
 ] 

Mario Bisonti commented on TIKA-2776:
-

Hello.

After 5 hours of processing files, the client calling Tika server reported 
the following error:

WARN 2018-11-19T13:47:17,888 (Worker thread '56') - Service interruption 
reported for job 1533797717712 connection 'WinShare': Tika down, retrying: 
Connect to hostanmeubuntu:9998 [hostanmeubuntu/172.16.1.135] failed: Connection 
refused (Connection refused)
 WARN 2018-11-19T13:47:18,006 (Worker thread '96') - Service interruption 
reported for job 1533797717712 connection 'WinShare': Tika down, retrying: 
Connect to hostanmeubuntu:9998 [hostanmeubuntu/172.16.1.135] failed: Connection 
refused (Connection refused)
 WARN 2018-11-19T13:47:18,006 (Worker thread '20') - Service interruption 
reported for job 1533797717712 connection 'WinShare': Tika down, retrying: 
Connect to hostanmeubuntu:9998 [hostanmeubuntu/172.16.1.135] failed: Connection 
refused (Connection refused)
 WARN 2018-11-19T13:47:18,071 (Worker thread '26') - Service interruption 
reported for job 1533797717712 connection 'WinShare': Tika down, retrying: 
Connect to hostanmeubuntu:9998 [hostanmeubuntu/172.16.1.135] failed: Connection 
refused (Connection refused)
 WARN 2018-11-19T13:47:18,116 (Worker thread '27') - JCIFS: Possibly transient 
exception detected on attempt 1 while getting share security: All pipe 
instances are busy.

 

 

Perhaps it is due to this:
_If the server times out on a file, the client will receive an IOException from 
the closed socket. Note that all other files that are being processed will end 
with an IOException from a closed socket when the child process shuts down; 
e.g. if you send three files to tika-server concurrently, and one of them 
causes a catastrophic problem requiring the child to shut down, you won't be 
able to tell which file caused the problems. In the future, we may implement a 
gentler shutdown than we currently have._

 

Could a gentler shutdown perhaps solve the problem?

 

Thanks

Mario

 



[jira] [Commented] (TIKA-2776) Tika server child restart

2018-11-15 Thread Mario Bisonti (JIRA)


[ 
https://issues.apache.org/jira/browse/TIKA-2776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16688313#comment-16688313
 ] 

Mario Bisonti commented on TIKA-2776:
-

Hello Tim,

I added a delay to the client to avoid the 503 error, and I am now running 
the tika server snapshot on another Ubuntu host:
 java -jar /opt/tika/tika-server-1.20-20181114.215706-48.jar 
--host=hostnameubuntu -spawnChild

I will update you if the scanning works.

Thanks a lot!

Mario



[jira] [Commented] (TIKA-2776) Tika server child restart

2018-11-14 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/TIKA-2776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16687264#comment-16687264
 ] 

Hudson commented on TIKA-2776:
--

SUCCESS: Integrated in Jenkins build tika-branch-1x #129 (See 
[https://builds.apache.org/job/tika-branch-1x/129/])
TIKA-2776 -- update CHANGES.txt (tallison: 
[https://github.com/apache/tika/commit/0a3a8be32659f1f0e5c29eb602d8caf149c347e5])
* (edit) CHANGES.txt




[jira] [Commented] (TIKA-2776) Tika server child restart

2018-11-14 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/TIKA-2776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16687267#comment-16687267
 ] 

Hudson commented on TIKA-2776:
--

SUCCESS: Integrated in Jenkins build Tika-trunk #1596 (See 
[https://builds.apache.org/job/Tika-trunk/1596/])
TIKA-2776 -- update CHANGES.txt (tallison: 
[https://github.com/apache/tika/commit/c037fb71e1d5e9e549a12f77c620229465763abf])
* (edit) CHANGES.txt




[jira] [Commented] (TIKA-2776) Tika server child restart

2018-11-14 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/TIKA-2776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16687242#comment-16687242
 ] 

Hudson commented on TIKA-2776:
--

UNSTABLE: Integrated in Jenkins build tika-2.x-windows #350 (See 
[https://builds.apache.org/job/tika-2.x-windows/350/])
TIKA-2776 -- tika-server in legacy mode should ignore oom. (tallison: rev 
8d9061d281f7f46ecae8b902e27c49777ec43919)
* (edit) tika-server/src/main/java/org/apache/tika/server/ServerStatus.java
* (edit) 
tika-server/src/test/java/org/apache/tika/server/TikaServerIntegrationTest.java
* (edit) tika-server/src/main/java/org/apache/tika/server/TikaServerCli.java
TIKA-2776 -- update CHANGES.txt (tallison: rev 
6710c52efe3cf8b220773ac9df85bb2f292d4415)
* (edit) CHANGES.txt
TIKA-2776 -- update CHANGES.txt (tallison: rev 
c037fb71e1d5e9e549a12f77c620229465763abf)
* (edit) CHANGES.txt




[jira] [Commented] (TIKA-2776) Tika server child restart

2018-11-14 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/TIKA-2776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16687208#comment-16687208
 ] 

Hudson commented on TIKA-2776:
--

SUCCESS: Integrated in Jenkins build tika-branch-1x #128 (See 
[https://builds.apache.org/job/tika-branch-1x/128/])
TIKA-2776 -- tika-server in legacy mode should ignore oom. (tallison: 
[https://github.com/apache/tika/commit/22f57072dfc96da6ef27e053a09f9b648552f2f2])
* (edit) tika-server/src/main/java/org/apache/tika/server/TikaServerCli.java
* (edit) tika-server/src/main/java/org/apache/tika/server/ServerStatus.java
* (edit) 
tika-server/src/test/java/org/apache/tika/server/TikaServerIntegrationTest.java
TIKA-2776 -- update CHANGES.txt (tallison: 
[https://github.com/apache/tika/commit/9e2a9bbd6e6ec6dc1ec6ecf39b6b850318df3219])
* (edit) CHANGES.txt


> Tika server child restart
> -
>
> Key: TIKA-2776
> URL: https://issues.apache.org/jira/browse/TIKA-2776
> Project: Tika
>  Issue Type: Bug
>Reporter: Mario Bisonti
>Assignee: Tim Allison
>Priority: Blocker
> Fix For: 2.0.0, 1.20
>
> Attachments: log4j.xml, log4j_child.xml
>
>


[jira] [Commented] (TIKA-2776) Tika server child restart

2018-11-14 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/TIKA-2776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16687197#comment-16687197
 ] 

Hudson commented on TIKA-2776:
--

SUCCESS: Integrated in Jenkins build Tika-trunk #1595 (See 
[https://builds.apache.org/job/Tika-trunk/1595/])
TIKA-2776 -- tika-server in legacy mode should ignore oom. (tallison: 
[https://github.com/apache/tika/commit/8d9061d281f7f46ecae8b902e27c49777ec43919])
* (edit) 
tika-server/src/test/java/org/apache/tika/server/TikaServerIntegrationTest.java
* (edit) tika-server/src/main/java/org/apache/tika/server/TikaServerCli.java
* (edit) tika-server/src/main/java/org/apache/tika/server/ServerStatus.java
TIKA-2776 -- update CHANGES.txt (tallison: 
[https://github.com/apache/tika/commit/6710c52efe3cf8b220773ac9df85bb2f292d4415])
* (edit) CHANGES.txt


> Tika server child restart
> -
>
> Key: TIKA-2776
> URL: https://issues.apache.org/jira/browse/TIKA-2776
> Project: Tika
>  Issue Type: Bug
>Reporter: Mario Bisonti
>Assignee: Tim Allison
>Priority: Blocker
> Fix For: 2.0.0, 1.20
>
> Attachments: log4j.xml, log4j_child.xml
>
>


[jira] [Commented] (TIKA-2776) Tika server child restart

2018-11-14 Thread Tim Allison (JIRA)


[ 
https://issues.apache.org/jira/browse/TIKA-2776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16687047#comment-16687047
 ] 

Tim Allison commented on TIKA-2776:
---

Oops, I'm on mobile. On point 2, whether you should notify the ManifoldCF 
project: yes. Once they are comfortable with the -spawnChild option, the 
client has to be able to handle tika-server being briefly down, or returning 
503 intermittently, while it restarts after a serious problem.

As for logging, let me know if you’ve gotten that working; that is critical.

> Tika server child restart
> -
>
> Key: TIKA-2776
> URL: https://issues.apache.org/jira/browse/TIKA-2776
> Project: Tika
>  Issue Type: Bug
>Reporter: Mario Bisonti
>Priority: Major
> Attachments: log4j.xml, log4j_child.xml
>
>


[jira] [Commented] (TIKA-2776) Tika server child restart

2018-11-14 Thread Tim Allison (JIRA)


[ 
https://issues.apache.org/jira/browse/TIKA-2776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16687043#comment-16687043
 ] 

Tim Allison commented on TIKA-2776:
---

>> 1. Problem wasn’t with -spawnChild

OMG...legacy mode shouldn’t check whether the server is shutting down.  This 
is a bad bug.  In legacy mode, once you hit an OOM, the server now returns 
503 indefinitely...ugh. At least you can use the new -spawnChild feature to 
get around the bug I introduced into legacy mode when I added 
-spawnChild...head in hands...will fix ASAP.

>> 2. Still seems to be running...

Great! Yes, that’s the goal of -spawnChild: the server will restart itself on 
a serious error or timeout.

>> 3. On diff versions of java

I’m not aware of any significant differences, but I haven’t looked yet.

> Tika server child restart
> -
>
> Key: TIKA-2776
> URL: https://issues.apache.org/jira/browse/TIKA-2776
> Project: Tika
>  Issue Type: Bug
>Reporter: Mario Bisonti
>Priority: Major
> Attachments: log4j.xml, log4j_child.xml
>
>


[jira] [Commented] (TIKA-2776) Tika server child restart

2018-11-14 Thread Mario Bisonti (JIRA)


[ 
https://issues.apache.org/jira/browse/TIKA-2776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16686689#comment-16686689
 ] 

Mario Bisonti commented on TIKA-2776:
-

Hello Tim.
 # The error "Caused by: java.lang.OutOfMemoryError: Java heap space" happened 
when I ran Tika with
 java -jar /opt/tika/tika-server-1.19.1.jar
 that is, WITHOUT the "-spawnChild" option.
 # When you said _The other thing you might want to do...if you aren't 
already...is add a {{waitForServer}} loop along the lines of what I did in 
TikaServerIntegrationTest...for when your client hits a 503._ 
do you mean that this code should go in the client that calls tika server, 
in my case ManifoldCF?
If so, I will forward your suggestion to the ManifoldCF maintainers.
 # I have now started tika server on my Windows host, to separate 
ManifoldCF/Solr from the Tika server, and the job has been running for 5 
hours without a crash!
Note that on my Windows host I use:
java -version
java version "1.8.0_92"
Java(TM) SE Runtime Environment (build 1.8.0_92-b14)
Java HotSpot(TM) 64-Bit Server VM (build 25.92-b14, mixed mode)

whereas on the Ubuntu host, which runs ManifoldCF and Solr and where, before 
this split, I also ran the tika server and the job stopped repeatedly, I use:
java -version
openjdk version "10.0.2" 2018-07-17
OpenJDK Runtime Environment (build 10.0.2+13-Ubuntu-1ubuntu0.18.04.3)
OpenJDK 64-Bit Server VM (build 10.0.2+13-Ubuntu-1ubuntu0.18.04.3, mixed mode)

Do you know of any issue related to the Java version that tika server runs on?

Thanks a lot.

Mario

> Tika server child restart
> -
>
> Key: TIKA-2776
> URL: https://issues.apache.org/jira/browse/TIKA-2776
> Project: Tika
>  Issue Type: Bug
>Reporter: Mario Bisonti
>Priority: Major
> Attachments: log4j.xml, log4j_child.xml
>
>


[jira] [Commented] (TIKA-2776) Tika server child restart

2018-11-14 Thread Tim Allison (JIRA)


[ 
https://issues.apache.org/jira/browse/TIKA-2776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16686620#comment-16686620
 ] 

Tim Allison commented on TIKA-2776:
---

The other thing you might want to do...if you aren't already...is add a 
{{waitForServer}} loop along the lines of what I did in 
TikaServerIntegrationTest...for when your client hits a 503.

{noformat}
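// Requires: java.time.Duration, java.time.Instant,
// javax.ws.rs.core.Response, org.apache.cxf.jaxrs.client.WebClient.
// MAX_WAIT_MS is an assumed budget (30 seconds); tune it for your setup.
private static final long MAX_WAIT_MS = 30_000;

// Polls the /tika endpoint until it answers 200 or the budget runs out.
// "endPoint" is the server's base URL, defined elsewhere in the class.
private void awaitServerStartup() throws Exception {
    Instant started = Instant.now();
    long elapsed = Duration.between(started, Instant.now()).toMillis();
    while (elapsed < MAX_WAIT_MS) {
        try {
            Response response = WebClient
                    .create(endPoint + "/tika")
                    .accept("text/plain")
                    .get();
            if (response.getStatus() == 200) {
                return; // server is up
            }
        } catch (javax.ws.rs.ProcessingException e) {
            // Connection refused while the server restarts: keep polling.
        }
        Thread.sleep(100);
        elapsed = Duration.between(started, Instant.now()).toMillis();
    }
    // Falls through silently if the server never came up in time.
}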
{noformat}
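
For a client such as ManifoldCF, the same idea could be applied per request 
rather than only at startup. The following is a minimal sketch, not code from 
Tika or ManifoldCF: the class name TikaRetry, the method callWithRetry and its 
parameters are all hypothetical, and the actual HTTP call is left to the 
caller as a Callable that returns the status code. It retries on 503 and on 
IOException (for example "Connection reset" while the child restarts).

```java
import java.io.IOException;
import java.util.concurrent.Callable;

/** Hypothetical client-side helper; not part of Tika or ManifoldCF. */
public final class TikaRetry {

    private TikaRetry() {}

    /**
     * Runs the request until it returns a status other than 503, retrying on
     * 503 and on IOException.
     *
     * @param request     performs one HTTP call and returns its status code
     * @param maxAttempts total number of attempts, at least 1
     * @param pauseMillis pause between attempts, in milliseconds
     * @return the first non-503 status, or 503 if every attempt returned 503
     */
    public static int callWithRetry(Callable<Integer> request,
                                    int maxAttempts,
                                    long pauseMillis) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        IOException lastError = null;
        int lastStatus = 503;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                lastStatus = request.call();
                lastError = null;
                if (lastStatus != 503) {
                    return lastStatus; // server answered normally
                }
            } catch (IOException e) {
                lastError = e; // socket closed or reset: try again
            }
            if (attempt < maxAttempts) {
                Thread.sleep(pauseMillis);
            }
        }
        if (lastError != null) {
            throw lastError; // the final attempt still failed
        }
        return lastStatus; // still 503 after all attempts
    }
}
```

Keeping the HTTP call behind a Callable means the retry policy can be tried 
out without a running server, and the client keeps full control over which 
HTTP library it uses.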

> Tika server child restart
> -
>
> Key: TIKA-2776
> URL: https://issues.apache.org/jira/browse/TIKA-2776
> Project: Tika
>  Issue Type: Bug
>Reporter: Mario Bisonti
>Priority: Major
> Attachments: log4j.xml, log4j_child.xml
>
>


[jira] [Commented] (TIKA-2776) Tika server child restart

2018-11-14 Thread Tim Allison (JIRA)


[ 
https://issues.apache.org/jira/browse/TIKA-2776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16686597#comment-16686597
 ] 

Tim Allison commented on TIKA-2776:
---

bq. Caused by: java.lang.OutOfMemoryError: Java heap space

Victory!!!  Try bumping up your -Xmx for the child process: {{-JXmx2g}} or 
similar.

This actually shows that {{-spawnChild}} is working...I think?
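
The {{-J}} prefix in {{-JXmx2g}} marks an argument that is meant for the 
child JVM rather than for the parent. As an illustration only (this is a 
sketch of the convention, not Tika's actual code, and the class and method 
names are hypothetical), a launcher could derive the child's JVM options 
like this:

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of the "-J" prefix convention. */
public final class ChildArgs {

    private ChildArgs() {}

    /** Returns the JVM options for the child: each "-Jxyz" becomes "-xyz". */
    public static List<String> childJvmArgs(String[] args) {
        List<String> jvmArgs = new ArrayList<>();
        for (String arg : args) {
            // Only arguments with something after "-J" are forwarded.
            if (arg.startsWith("-J") && arg.length() > 2) {
                jvmArgs.add("-" + arg.substring(2));
            }
        }
        return jvmArgs;
    }
}
```

Under this convention {{-JXmx2g}} reaches the child as {{-Xmx2g}}, while 
options without the prefix, such as {{-spawnChild}}, stay with the parent.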

> Tika server child restart
> -
>
> Key: TIKA-2776
> URL: https://issues.apache.org/jira/browse/TIKA-2776
> Project: Tika
>  Issue Type: Bug
>Reporter: Mario Bisonti
>Priority: Major
> Attachments: log4j.xml, log4j_child.xml
>
>


[jira] [Commented] (TIKA-2776) Tika server child restart

2018-11-14 Thread Tim Allison (JIRA)


[ 
https://issues.apache.org/jira/browse/TIKA-2776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16686522#comment-16686522
 ] 

Tim Allison commented on TIKA-2776:
---

bq. I am not so expert on logging.. be patient please 

Ha!  No problem at all! 

bq. I configured log4j.xml and log4j_child.xml as in the attachment

I _think_ I just fixed this in TIKA-2782.  For now, you have to avoid 
{{debug="true"}}.




> Tika server child restart
> -
>
> Key: TIKA-2776
> URL: https://issues.apache.org/jira/browse/TIKA-2776
> Project: Tika
>  Issue Type: Bug
>Reporter: Mario Bisonti
>Priority: Major
> Attachments: log4j.xml, log4j_child.xml
>
>


[jira] [Commented] (TIKA-2776) Tika server child restart

2018-11-14 Thread Mario Bisonti (JIRA)


[ 
https://issues.apache.org/jira/browse/TIKA-2776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16686336#comment-16686336
 ] 

Mario Bisonti commented on TIKA-2776:
-

These are the errors that I see on my screen:
ERROR Problem with writing the data, class 
org.apache.tika.server.resource.TikaResource$5, ContentType: text/plain
WARN  Interceptor for {http://resource.server.tika.apache.org/}MetadataResource 
has thrown exception, unwinding now
org.apache.cxf.interceptor.Fault: Java heap space
at 
org.apache.cxf.jaxrs.interceptor.JAXRSOutInterceptor.handleWriteException(JAXRSOutInterceptor.java:396)
at 
org.apache.cxf.jaxrs.interceptor.JAXRSOutInterceptor.serializeMessage(JAXRSOutInterceptor.java:272)
at 
org.apache.cxf.jaxrs.interceptor.JAXRSOutInterceptor.processResponse(JAXRSOutInterceptor.java:122)
at 
org.apache.cxf.jaxrs.interceptor.JAXRSOutInterceptor.handleMessage(JAXRSOutInterceptor.java:84)
at 
org.apache.cxf.phase.PhaseInterceptorChain.doIntercept(PhaseInterceptorChain.java:308)
at 
org.apache.cxf.interceptor.OutgoingChainInterceptor.handleMessage(OutgoingChainInterceptor.java:90)
at 
org.apache.cxf.phase.PhaseInterceptorChain.doIntercept(PhaseInterceptorChain.java:308)
at 
org.apache.cxf.transport.ChainInitiationObserver.onMessage(ChainInitiationObserver.java:121)
at 
org.apache.cxf.transport.http.AbstractHTTPDestination.invoke(AbstractHTTPDestination.java:267)
at 
org.apache.cxf.transport.http_jetty.JettyHTTPDestination.doService(JettyHTTPDestination.java:247)
at 
org.apache.cxf.transport.http_jetty.JettyHTTPHandler.handle(JettyHTTPHandler.java:79)
at 
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
at 
org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:257)
at 
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1317)
at 
org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:205)
at 
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1219)
at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:144)
at 
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:219)
at 
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
at org.eclipse.jetty.server.Server.handle(Server.java:531)
at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:352)
at 
org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:260)
at 
org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:281)
at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:102)
at org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:118)
at 
org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:333)
at 
org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:310)
at 
org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:168)
at 
org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:126)
at 
org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:366)
at 
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:762)
at 
org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:680)
at java.lang.Thread.run(Unknown Source)
Caused by: java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Unknown Source)
at java.io.ByteArrayOutputStream.toByteArray(Unknown Source)
at org.apache.poi.util.IOUtils.toByteArray(IOUtils.java:166)
at org.apache.poi.util.IOUtils.toByteArray(IOUtils.java:120)
at 
org.apache.poi.openxml4j.util.ZipArchiveFakeEntry.<init>(ZipArchiveFakeEntry.java:47)
at 
org.apache.poi.openxml4j.util.ZipInputStreamZipEntrySource.<init>(ZipInputStreamZipEntrySource.java:51)
at org.apache.poi.openxml4j.opc.ZipPackage.<init>(ZipPackage.java:106)
at org.apache.poi.openxml4j.opc.OPCPackage.open(OPCPackage.java:298)
at 
org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse(OOXMLExtractorFactory.java:85)
at 
org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:110)
at 
org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
at 
org.apache.tika.parser.ParserDecorator.parse(ParserDecorator.java:188)
at 
org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
at 
org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
at 

[jira] [Commented] (TIKA-2776) Tika server child restart

2018-11-14 Thread Mario Bisonti (JIRA)


[ 
https://issues.apache.org/jira/browse/TIKA-2776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16686191#comment-16686191
 ] 

Mario Bisonti commented on TIKA-2776:
-

Hello Tim.
I am not an expert on logging, so please be patient :-)

I configured log4j.xml and log4j_child.xml as in the attachments.
I started:
java -jar /opt/tika/tika-server-1.19.1.jar -Dlog4j.configuration=file:log4j.xml 
-JDlog4j.configuration=file:log4j_child.xml

but I get:
administrator@sengvivv02:/opt/tika$ java -jar /opt/tika/tika-server-1.19.1.jar 
-Dlog4j.configuration=file:log4j.xml -JDlog4j.configuration=file:log4j_child.xml
Nov 14, 2018 9:14:58 AM org.apache.tika.config.InitializableProblemHandler$3 
handleInitializableProblem
WARNING: J2KImageReader not loaded. JPEG2000 files will not be processed.
See https://pdfbox.apache.org/2.0/dependencies.html#jai-image-io
for optional dependencies.

Nov 14, 2018 9:14:58 AM org.apache.tika.config.InitializableProblemHandler$3 
handleInitializableProblem
WARNING: org.xerial's sqlite-jdbc is not loaded.
Please provide the jar on your classpath to parse sqlite files.
See tika-parsers/pom.xml for the correct version.
INFO  Starting Apache Tika 1.19.1 server
org.apache.commons.cli.UnrecognizedOptionException: Unrecognized option: 
-Dlog4j.configuration=file:log4j.xml
at org.apache.commons.cli.Parser.processOption(Parser.java:363)
at org.apache.commons.cli.Parser.parse(Parser.java:199)
at org.apache.commons.cli.Parser.parse(Parser.java:85)
at org.apache.tika.server.TikaServerCli.execute(TikaServerCli.java:133)
at org.apache.tika.server.TikaServerCli.main(TikaServerCli.java:117)
ERROR Can't start:
org.apache.commons.cli.UnrecognizedOptionException: Unrecognized option: 
-Dlog4j.configuration=file:log4j.xml
at org.apache.commons.cli.Parser.processOption(Parser.java:363)
at org.apache.commons.cli.Parser.parse(Parser.java:199)
at org.apache.commons.cli.Parser.parse(Parser.java:85)
at org.apache.tika.server.TikaServerCli.execute(TikaServerCli.java:133)
at org.apache.tika.server.TikaServerCli.main(TikaServerCli.java:117)

and no log file appears in the directory.
What's wrong?

Thanks
Mario


 [^log4j_child.xml]  [^log4j.xml] 
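
The UnrecognizedOptionException in the output above comes from the argument 
order: everything after the jar path is handed to TikaServerCli as a program 
argument, so a -D system property for the parent JVM has to appear before 
-jar. Below is a minimal sketch that assembles the command in that order. 
The class and method names are hypothetical, and including -spawnChild is an 
assumption about the intended setup; the -JD form for the child follows the 
command shown above.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that orders the launch arguments. */
public final class TikaLaunchCommand {

    private TikaLaunchCommand() {}

    public static List<String> build(String jarPath,
                                     String parentLog4j,
                                     String childLog4j) {
        List<String> cmd = new ArrayList<>();
        cmd.add("java");
        // JVM options for the parent go BEFORE -jar.
        cmd.add("-Dlog4j.configuration=file:" + parentLog4j);
        cmd.add("-jar");
        cmd.add(jarPath);
        // Program arguments for tika-server go AFTER the jar path.
        cmd.add("-spawnChild"); // assumption: child mode is wanted
        cmd.add("-JDlog4j.configuration=file:" + childLog4j);
        return cmd;
    }
}
```

The resulting list can be passed to java.lang.ProcessBuilder, or joined with 
spaces to get the equivalent shell command line.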

> Tika server child restart
> -
>
> Key: TIKA-2776
> URL: https://issues.apache.org/jira/browse/TIKA-2776
> Project: Tika
>  Issue Type: Bug
>Reporter: Mario Bisonti
>Priority: Major
> Attachments: log4j.xml, log4j_child.xml
>
>


[jira] [Commented] (TIKA-2776) Tika server child restart

2018-11-13 Thread Tim Allison (JIRA)


[ 
https://issues.apache.org/jira/browse/TIKA-2776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16685881#comment-16685881
 ] 

Tim Allison commented on TIKA-2776:
---

The limitation on STDOUT is a serious (but trivial-to-fix) bug: TIKA-2782

 

I wonder if that's causing you problems?

> Tika server child restart
> -
>
> Key: TIKA-2776
> URL: https://issues.apache.org/jira/browse/TIKA-2776
> Project: Tika
>  Issue Type: Bug
>Reporter: Mario Bisonti
>Priority: Major
>


[jira] [Commented] (TIKA-2776) Tika server child restart

2018-11-13 Thread Tim Allison (JIRA)


[ 
https://issues.apache.org/jira/browse/TIKA-2776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16685853#comment-16685853
 ] 

Tim Allison commented on TIKA-2776:
---

Yes, the above works, with the caveat that log appenders in the child process 
should not write to STDOUT. See: https://wiki.apache.org/tika/TikaJAXRS#Logging

> Tika server child restart
> -
>
> Key: TIKA-2776
> URL: https://issues.apache.org/jira/browse/TIKA-2776
> Project: Tika
>  Issue Type: Bug
>Reporter: Mario Bisonti
>Priority: Major
>


[jira] [Commented] (TIKA-2776) Tika server child restart

2018-11-13 Thread Tim Allison (JIRA)


[ 
https://issues.apache.org/jira/browse/TIKA-2776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16685641#comment-16685641
 ] 

Tim Allison commented on TIKA-2776:
---

I haven't actually tried the above yet... Will do so soon, though, and update 
our wiki.



[jira] [Commented] (TIKA-2776) Tika server child restart

2018-11-13 Thread Tim Allison (JIRA)


[ 
https://issues.apache.org/jira/browse/TIKA-2776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16685640#comment-16685640
 ] 

Tim Allison commented on TIKA-2776:
---

bq. For me is very difficult to investigate why tika server child is 
restarted/crashed. Is there any way to log Tika server?

You should be able to configure log4j for the parent process as you would 
expect: {{-Dlog4j.configuration=file:log4j.xml}}.  You can select between the 
{{info}} and {{debug}} log levels when you start the server: {{-log info}}.

To configure logging in the child process, add {{-J}} to the beginning of the 
argument: {{-JDlog4j.configuration=file:log4j_child.xml}}.
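
Putting the two together, a start-up command might be assembled as in the 
sketch below. This is untested; the jar path is the one from the issue 
description, and the variable names are only for illustration.

```shell
# Hypothetical locations; adjust to your installation.
TIKA_JAR=/opt/tika/tika-server-1.19.1.jar
PARENT_LOG_CFG=log4j.xml
CHILD_LOG_CFG=log4j_child.xml

# Parent JVM: ordinary -D system property, placed before -jar.
# Child JVM: the same property with the -J prefix, passed as a tika-server
# argument after the jar.
CMD="java -Dlog4j.configuration=file:${PARENT_LOG_CFG} -jar ${TIKA_JAR} -spawnChild -log info -JDlog4j.configuration=file:${CHILD_LOG_CFG}"

# Print the command for review; run it with: exec $CMD
echo "$CMD"
```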



[jira] [Commented] (TIKA-2776) Tika server child restart

2018-11-13 Thread Mario Bisonti (JIRA)


[ 
https://issues.apache.org/jira/browse/TIKA-2776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16684907#comment-16684907
 ] 

Mario Bisonti commented on TIKA-2776:
-

Hello Tim.
On the ManifoldCF side, I see this in the log:
org.apache.manifoldcf.core.interfaces.ManifoldCFException: Repeated service interruptions - failure processing document: The target server failed to respond
    at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:489) [mcf-pull-agent.jar:?]
Caused by: org.apache.http.NoHttpResponseException: The target server failed to respond
    at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:141) ~[httpclient-4.5.6.jar:4.5.6]
    at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:56) ~[httpclient-4.5.6.jar:4.5.6]
    at org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:259) ~[httpcore-4.4.10.jar:4.4.10]
    at org.apache.http.impl.DefaultBHttpClientConnection.receiveResponseHeader(DefaultBHttpClientConnection.java:163) ~[httpcore-4.4.10.jar:4.4.10]
    at org.apache.http.impl.conn.CPoolProxy.receiveResponseHeader(CPoolProxy.java:165) ~[httpclient-4.5.6.jar:4.5.6]
    at org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:273) ~[httpcore-4.4.10.jar:4.4.10]
    at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:125) ~[httpcore-4.4.10.jar:4.4.10]
    at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:272) ~[httpclient-4.5.6.jar:4.5.6]
    at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:185) ~[httpclient-4.5.6.jar:4.5.6]
    at org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:110) ~[httpclient-4.5.6.jar:4.5.6]
    at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185) ~[httpclient-4.5.6.jar:4.5.6]
    at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:118) ~[httpclient-4.5.6.jar:4.5.6]
    at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:56) ~[httpclient-4.5.6.jar:4.5.6]
    at org.apache.manifoldcf.agents.transformation.tikaservice.TikaExtractor.addOrReplaceDocumentWithException(TikaExtractor.java:608) ~[?:?]
    at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddEntryPoint.addOrReplaceDocumentWithException(IncrementalIngester.java:3226) ~[mcf-agents.jar:?]
    at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddFanout.sendDocument(IncrementalIngester.java:3077) ~[mcf-agents.jar:?]
    at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$MonitoredAddActivityWrapper.sendDocument(IncrementalIngester.java:3471) ~[mcf-agents.jar:?]
    at org.apache.manifoldcf.agents.transformation.documentfilter.DocumentFilter.addOrReplaceDocumentWithException(DocumentFilter.java:208) ~[?:?]
    at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddEntryPoint.addOrReplaceDocumentWithException(IncrementalIngester.java:3226) ~[mcf-agents.jar:?]
    at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddFanout.sendDocument(IncrementalIngester.java:3077) ~[mcf-agents.jar:?]
    at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineObjectWithVersions.addOrReplaceDocumentWithException(IncrementalIngester.java:2708) ~[mcf-agents.jar:?]
    at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.documentIngest(IncrementalIngester.java:756) ~[mcf-agents.jar:?]
    at org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1583) ~[mcf-pull-agent.jar:?]
    at org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1548) ~[mcf-pull-agent.jar:?]
    at org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.processDocuments(SharedDriveConnector.java:939) ~[?:?]
    at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399) ~[mcf-pull-agent.jar:?]
 WARN 2018-11-13T09:50:58,546 (Worker thread '48') - Service interruption reported for job 1533797717712 connection 'WinShare': Tika down, retrying: Connect to localhost:9998 [localhost/127.0.0.1, localhost/0:0:0:0:0:0:0:1] failed: Connection refused (Connection refused)
 WARN 2018-11-13T09:50:58,606 (Worker thread '34') - Service interruption reported for job 1533797717712 connection 'WinShare': Tika down, retrying: Connect to localhost:9998 [localhost/127.0.0.1, localhost/0:0:0:0:0:0:0:1] failed: Connection refused (Connection refused)
 WARN 2018-11-13T09:50:58,947 

[jira] [Commented] (TIKA-2776) Tika server child restart

2018-11-09 Thread Tim Allison (JIRA)


[ 
https://issues.apache.org/jira/browse/TIKA-2776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16681992#comment-16681992
 ] 

Tim Allison commented on TIKA-2776:
---

How often is the child crashing?  Can you tell what is causing the problems?  
Is the child actually starting successfully...are you getting any files parsed?
