The answer will be different for different Beam runners, and even for a single runner it will probably differ between batch and streaming mode.
On Fri, Sep 22, 2017 at 5:01 AM, Allison, Timothy B. <talli...@mitre.org> wrote:

> @Eugene: What's the best way to have Beam help us with these issues, or do
> these come for free with the Beam framework?
>
> 1) a process-level timeout (because you can't actually kill a thread in
> Java)

While some runners might do this, many runners process many items in parallel on different threads. If this is necessary, the user code that calls Tika should do it itself (e.g. delegate processing to a new worker thread and kill the process if the worker thread exceeds some timeout).

> 2) a process-level restart on OOM

I believe all current runners restart processes on any crash.

> 3) avoid trying to reprocess a badly behaving document

There's no obvious way to do this. If an exception is thrown while processing a document, you can catch the exception and skip the document. However, if processing the document causes the process to crash, then the document will be retried.
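The worker-thread approach suggested for point 1, combined with the catch-and-skip approach from point 3, might look roughly like the Java sketch below. It is only an illustration under stated assumptions: `GuardedParse`, `runGuarded` and the `onHang` callback are hypothetical names, not part of the Beam or Tika APIs, and the `Callable` stands in for whatever code actually invokes Tika on one document (for example from a DoFn's `@ProcessElement` method).

```java
import java.util.Optional;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper for illustration; not part of Beam or Tika.
public class GuardedParse {

    /**
     * Runs one document's work (e.g. a Tika parse) on a separate daemon thread
     * and waits at most timeoutMillis for it.
     * - completes normally: returns the result
     * - throws: returns Optional.empty() so the caller can skip the document
     * - hangs: calls onHang (in production, e.g. halting the JVM) and returns empty
     */
    public static <T> Optional<T> runGuarded(Callable<T> task, long timeoutMillis, Runnable onHang) {
        ExecutorService worker = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "guarded-parse");
            t.setDaemon(true); // a stuck worker thread must not keep the JVM alive
            return t;
        });
        Future<T> future = worker.submit(task);
        try {
            return Optional.ofNullable(future.get(timeoutMillis, TimeUnit.MILLISECONDS));
        } catch (ExecutionException e) {
            // The task threw; real code would log e.getCause() and skip the document.
            return Optional.empty();
        } catch (TimeoutException e) {
            // cancel(true) only sends an interrupt; code that ignores interrupts keeps
            // running, which is why killing the whole process is the fallback.
            future.cancel(true);
            onHang.run();
            return Optional.empty();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the caller's interrupt status
            return Optional.empty();
        } finally {
            worker.shutdownNow();
        }
    }

    public static void main(String[] args) {
        // Assumed production choice for onHang: stop the JVM and rely on the runner
        // restarting the worker process.
        Runnable haltJvm = () -> Runtime.getRuntime().halt(1);

        Optional<String> ok = runGuarded(() -> "parsed text", 1_000, haltJvm);
        Optional<String> bad = runGuarded(() -> {
            throw new IllegalStateException("corrupt file");
        }, 1_000, haltJvm);

        System.out.println(ok.orElse("<skipped>"));  // parsed text
        System.out.println(bad.orElse("<skipped>")); // <skipped>
    }
}
```

Note the trade-off this sketch does not remove: halting the JVM on a hang makes the runner restart the process, and, as described under point 3, the offending document will then be retried rather than skipped.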