Tino Schöllhorn updated TIKA-4422:
----------------------------------
Issue Type: Improvement (was: Bug)
> Availability problem with TikaServer 3.1.0
> ------------------------------------------
>
> Key: TIKA-4422
> URL: https://issues.apache.org/jira/browse/TIKA-4422
> Project: Tika
> Issue Type: Improvement
> Components: tika-server
> Affects Versions: 3.1.0
> Environment: Java21
> Ubuntu22
>
> Reporter: Tino Schöllhorn
> Priority: Major
>
> Hi,
> we have a problem running TikaServer. We use Tika 3.1.0 on Ubuntu with
> Java 21.
> Previously we used Tika 2.4.x, where we did not observe this problem.
> We run a *lot* of text-extraction requests. After a few hours (8-10h) Tika is
> no longer able to restart its worker processes.
> Tika runs under systemd, and journalctl shows the following output:
>
> {noformat}
> May 28 04:39:39 dss-index java[350084]: INFO [pool-2-thread-1] 04:39:39,752
> org.apache.tika.server.core.TikaServerWatchDog forked process exited with
> exit value 3
> May 28 04:39:40 dss-index java[376963]: May 28, 2025 4:39:40 AM
> org.apache.cxf.endpoint.ServerImpl initDestination
> May 28 04:39:40 dss-index java[376963]: INFO: Setting the server's publish
> address to be http://localhost:9998/
> May 28 05:35:32 dss-index java[350084]: INFO [pool-2-thread-1] 05:35:32,896
> org.apache.tika.server.core.TikaServerWatchDog forked process exited with
> exit value 2
> May 28 05:35:34 dss-index java[377213]: May 28, 2025 5:35:34 AM
> org.apache.cxf.endpoint.ServerImpl initDestination
> May 28 05:35:34 dss-index java[377213]: INFO: Setting the server's publish
> address to be http://localhost:9998/
> {noformat}
> After these messages, TikaServer no longer responds to requests. Restarting
> the Tika parent process is the only thing that helps.
> The messages are emitted at TikaServerWatchDog:161, but I do not understand
> what is going wrong here. The exit values may be OS error codes; perror
> gives the following output for them:
> {noformat}
> OS error code 2: No such file or directory
> OS error code 3: No such process
> {noformat}
> It is still unclear to me what is happening. Below you'll find the tika.config.
> As far as I understand the situation, this seems to be a bug that was
> introduced sometime between versions 2.4.x and 3.1.0.
> I hope someone has an idea of what is going on and how this can be remedied.
> Tino
> – tika.config.start
> {code:xml}
> <?xml version="1.0" encoding="UTF-8"?>
> <properties>
>   <parsers>
>     <parser class="org.apache.tika.parser.DefaultParser">
>     </parser>
>   </parsers>
>   <server>
>     <params>
>       <port>9998</port>
>       <host>localhost</host>
>       <digest>sha256</digest>
>       <digestMarkLimit>1000000</digestMarkLimit>
>       <id></id>
>       <cors>NONE</cors>
>       <logLevel>info</logLevel>
>       <returnStackTrace>false</returnStackTrace>
>       <noFork>false</noFork>
>       <taskTimeoutMillis>300000</taskTimeoutMillis>
>       <maxForkedStartupMillis>120000</maxForkedStartupMillis>
>       <maxRestarts>-1</maxRestarts>
>       <maxFiles>25000</maxFiles>
>       <javaPath>java</javaPath>
>       <forkedJvmArgs>
>         <arg>-Xms4g</arg>
>         <arg>-Xmx4g</arg>
>         <arg>-Dlog4j.configurationFile=tika-forked-log4j2.xml</arg>
>       </forkedJvmArgs>
>       <enableUnsecureFeatures>false</enableUnsecureFeatures>
>       <endpoints>
>         <endpoint>status</endpoint>
>         <endpoint>tika</endpoint>
>         <endpoint>rmeta</endpoint>
>         <endpoint>language</endpoint>
>       </endpoints>
>     </params>
>   </server>
> </properties>
> {code}
> – tika.config.stop
>
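To see which exit values the forked process reports over a longer period, and in what order, the journal output could be scanned for the TikaServerWatchDog message quoted in the report. The following is only a minimal sketch: the class name `WatchDogExitValues` and the method `exitValues` are hypothetical helpers rather than part of Tika, and the sketch assumes that each journal entry is on a single line (the log excerpt in the report is wrapped by the mail client).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Tika: collects the exit values that
// TikaServerWatchDog logs when a forked process exits.
public class WatchDogExitValues {

    // Message text taken from the journal excerpt in the report.
    private static final Pattern EXIT = Pattern.compile(
            "TikaServerWatchDog forked process exited with exit value (-?\\d+)");

    /** Returns the logged exit values in the order they appear. */
    public static List<Integer> exitValues(List<String> lines) {
        List<Integer> values = new ArrayList<>();
        for (String line : lines) {
            Matcher m = EXIT.matcher(line);
            while (m.find()) {
                values.add(Integer.parseInt(m.group(1)));
            }
        }
        return values;
    }

    public static void main(String[] args) {
        // Sample lines rebuilt from the excerpt, unwrapped to one entry per line.
        List<String> sample = List.of(
                "May 28 04:39:39 dss-index java[350084]: INFO [pool-2-thread-1] 04:39:39,752 "
                        + "org.apache.tika.server.core.TikaServerWatchDog forked process exited with exit value 3",
                "May 28 04:39:40 dss-index java[376963]: INFO: Setting the server's publish "
                        + "address to be http://localhost:9998/",
                "May 28 05:35:32 dss-index java[350084]: INFO [pool-2-thread-1] 05:35:32,896 "
                        + "org.apache.tika.server.core.TikaServerWatchDog forked process exited with exit value 2");
        System.out.println(exitValues(sample)); // prints [3, 2]
    }
}
```

Running this against the full journal for the service would show whether exit values 2 and 3 also occur during the hours in which restarts still succeed, or only shortly before the server stops responding.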