The answer will differ between Beam runners, and even for a given runner it
will probably differ between batch and streaming execution.

On Fri, Sep 22, 2017 at 5:01 AM, Allison, Timothy B. <talli...@mitre.org>
wrote:

> @Eugene: What's the best way to have Beam help us with these issues, or do
> these come for free with the Beam framework?
>
> 1) a process-level timeout (because you can't actually kill a thread in
> Java)
>

While some runners might do this, many runners process many items in
parallel on different threads. If this is necessary, the user code that
invokes Tika should do it itself (e.g. delegate processing to a new
worker thread and kill the process if the worker thread exceeds some
timeout).
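
As a rough sketch of that worker-thread idea (plain java.util.concurrent, no
Beam or Tika API involved): the class name TimeoutGuard, the method
callWithTimeout and the onTimeout hook are all hypothetical names made up for
illustration. In a real pipeline the hook would presumably be something like
() -> Runtime.getRuntime().halt(1), so that the whole process dies and the
runner restarts it.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper, not part of Beam or Tika.
class TimeoutGuard {

    /**
     * Runs the task on a separate daemon worker thread and waits at most
     * timeoutMillis for it. On timeout, runs onTimeout (e.g. halt the JVM)
     * and rethrows the TimeoutException.
     */
    static <T> T callWithTimeout(Callable<T> task, long timeoutMillis,
                                 Runnable onTimeout) throws Exception {
        ExecutorService worker = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "parse-worker");
            t.setDaemon(true); // a stuck worker must not keep the JVM alive
            return t;
        });
        try {
            Future<T> future = worker.submit(task);
            try {
                return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
            } catch (TimeoutException e) {
                // Interrupting is only a request; a truly stuck parser may
                // ignore it, which is why the hook can kill the process.
                future.cancel(true);
                onTimeout.run();
                throw e;
            } catch (ExecutionException e) {
                // Surface the task's own exception to the caller.
                Throwable cause = e.getCause();
                if (cause instanceof Exception) {
                    throw (Exception) cause;
                }
                throw e;
            }
        } finally {
            worker.shutdownNow();
        }
    }
}
```

Passing the kill action in as a Runnable is only there so the sketch can be
exercised without actually terminating the JVM.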


> 2) a process-level restart on OOM
>

I believe all current runners restart processes on any crash.


> 3) avoid trying to reprocess a badly behaving document
>

There's no obvious way to do this. If an exception is thrown while
processing a document, you can catch the exception and skip the document.
However, if processing the document causes the process to crash, then the
document will be retried.
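
To illustrate the catch-and-skip case, here is a minimal sketch in plain
Java. SkipOnError and processAll are hypothetical names, and the parser
argument just stands in for whatever calls Tika; in a Beam pipeline the same
try/catch would presumably sit inside the DoFn's processing method.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical helper, not part of Beam or Tika.
class SkipOnError {

    /**
     * Applies parser to each document. Documents whose parsing throws an
     * Exception are added to skipped instead of failing the whole batch.
     */
    static <I, O> List<O> processAll(List<I> docs, Function<I, O> parser,
                                     List<I> skipped) {
        List<O> results = new ArrayList<>();
        for (I doc : docs) {
            try {
                results.add(parser.apply(doc));
            } catch (Exception e) {
                // Deliberately catches Exception only, not Error: an
                // OutOfMemoryError or a hard crash still takes the process
                // down, and the document is then retried.
                skipped.add(doc);
            }
        }
        return results;
    }
}
```

Note that this only helps with documents that fail by throwing; it does
nothing for the crash case described above.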
