[ https://issues.apache.org/jira/browse/TIKA-2725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16605919#comment-16605919 ]
Tim Allison commented on TIKA-2725:
-----------------------------------
{quote}With this approach, it is probably the only way ...
What is tika-server's typical environment? Stand-alone, distributed ... like
replicas in a cluster?
Is there a time limit for recovery? How do we know which point to restart
processing from?
Do we mark documents that have already been processed?
For example, if tika-server were running on Docker Swarm/K8s, the orchestrator
would restart a failed replica itself ...
{quote}
> Make tika-server robust against ooms/infinite loops/memory leaks
> ----------------------------------------------------------------
>
> Key: TIKA-2725
> URL: https://issues.apache.org/jira/browse/TIKA-2725
> Project: Tika
> Issue Type: Task
> Reporter: Tim Allison
> Assignee: Tim Allison
> Priority: Major
>
> Currently, tika-server is vulnerable to OOMs, infinite loops and memory
> leaks. I see two ways of making it robust:
> 1) use the ForkParser
> 2) have tika-server spawn a child process that actually runs the server, and
> put a watcher thread in the child that will kill the child on OOM, on timeout,
> or after x files. The parent process can then restart the child if it dies.
> I somewhat prefer 2) so that we don't have to pass the InputStream twice. I
> propose 2), and I propose making it optional in Tika 1.x and the default in
> Tika 2.x. We could also add a status ping from parent to child in case the
> child gets stuck in a stop-the-world GC pause (h/t [~bleskes]).
> Other options/recommendations?
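A minimal sketch of option 2), assuming a plain JVM setup and one in-flight parse per child at a time. The parent relaunches the child whenever it exits with a non-zero code. The child runs a watcher thread that halts the JVM when a parse exceeds a timeout or after a fixed number of files. The names (`ServerWatchdog`, `ChildWatcher`, `superviseChild`) and the exit-code convention are hypothetical and not part of tika-server. OOMs are left to HotSpot's `-XX:+ExitOnOutOfMemoryError` flag on the child command line, and the parent-to-child status ping is not shown.

```java
import java.io.IOException;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch of option 2): names and exit codes are illustrative,
// not part of tika-server.
public class ServerWatchdog {

    // Assumed convention: exit code 0 means "stop for good"; anything else
    // (crash, OOM exit, watcher halt) means "restart me".
    static final int EXIT_CLEAN_SHUTDOWN = 0;
    static final int EXIT_RESTART_REQUESTED = 2;

    /** Parent side: relaunch the child server until it shuts down cleanly. */
    public static void superviseChild(List<String> childCommand)
            throws IOException, InterruptedException {
        while (true) {
            Process child = new ProcessBuilder(childCommand).inheritIO().start();
            int exitCode = child.waitFor();
            if (exitCode == EXIT_CLEAN_SHUTDOWN) {
                return;
            }
            System.err.println("child exited with " + exitCode + "; restarting");
        }
    }

    /** Child side: decides when the child JVM should kill itself. */
    public static final class ChildWatcher {
        private final long taskTimeoutMillis;
        private final int maxFiles;
        private final AtomicInteger filesProcessed = new AtomicInteger();
        // Start time of the in-flight parse, or -1 when the child is idle.
        private final AtomicLong taskStartMillis = new AtomicLong(-1);

        public ChildWatcher(long taskTimeoutMillis, int maxFiles) {
            this.taskTimeoutMillis = taskTimeoutMillis;
            this.maxFiles = maxFiles;
        }

        public void startTask(long nowMillis) {
            taskStartMillis.set(nowMillis);
        }

        public void finishTask() {
            taskStartMillis.set(-1);
            filesProcessed.incrementAndGet();
        }

        /** True once a parse has run too long or enough files have been handled. */
        public boolean shouldExit(long nowMillis) {
            long start = taskStartMillis.get();
            boolean timedOut = start >= 0 && nowMillis - start > taskTimeoutMillis;
            return timedOut || filesProcessed.get() >= maxFiles;
        }

        /** Polls shouldExit on a daemon thread and halts the JVM when it fires. */
        public Thread startDaemon(long checkEveryMillis) {
            Thread watcher = new Thread(() -> {
                try {
                    while (!shouldExit(System.currentTimeMillis())) {
                        Thread.sleep(checkEveryMillis);
                    }
                    // halt() skips shutdown hooks, which could hang in a wedged JVM.
                    Runtime.getRuntime().halt(EXIT_RESTART_REQUESTED);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }, "child-watcher");
            watcher.setDaemon(true);
            watcher.start();
            return watcher;
        }
    }
}
```

In the parent, something like `ServerWatchdog.superviseChild(List.of("java", "-XX:+ExitOnOutOfMemoryError", "-jar", "tika-server.jar"))` would keep a child running (the jar path is a placeholder). In the child, the request handler would call `startTask` before each parse and `finishTask` in a `finally` block after it. Because the watcher is itself a JVM thread, a long stop-the-world GC pause freezes it too, which is why a ping from the parent is still worth having.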
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)