[
https://issues.apache.org/jira/browse/STORM-2282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15820222#comment-15820222
]
Arun Mahadevan commented on STORM-2282:
---------------------------------------
bq. It's ok to keep retrying errors that are considered retry-worthy. The
non-retry-worthy ones are the ones we must fail fast and move out, as they will
cause a jam and prevent good tuples from flowing. There is also no good way to
recover from them.
bq. One approach to do this...
bq. For non-retry-worthy error conditions (like bad data), any processing
element in the pipeline can throw a specific exception. This can then be
handled by the runtime to send that tuple to a configurable dead letter queue.
Ideally this needs a new kind of fail() notification in the spout to avoid
re-emit; alternatively, we can send the spout an ACK instead of a FAIL to avoid
retry. The DeadLetterQ bolt's metrics will capture these failures. It is better
not to have each spout/bolt explicitly deal with the dead letter queue ... that
would complicate the topology definition, as every spout and bolt would need to
be configured and wired up.
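For illustration, a minimal sketch of the quoted idea, assuming a hypothetical
FailedProcessingException that the runtime would recognize; neither the
exception type nor the dead-letter routing exists in Storm today:
{code:java}
// Hypothetical marker exception for non-retry-worthy failures (bad data).
// A processing element throws it; the runtime would catch it, ack (not fail)
// the anchored input so the spout does not replay it, and route the value to
// the configured dead letter queue.
public class FailedProcessingException extends RuntimeException {
    private final Object badValue;

    public FailedProcessingException(String message, Object badValue) {
        super(message);
        this.badValue = badValue;
    }

    public Object getBadValue() {
        return badValue;
    }
}

// Usage inside any processing stage (bolt or stream operation), for example:
//   if (!isParseable(value)) {
//       throw new FailedProcessingException("unparseable record", value);
//   }
{code}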
bq. For retry-worthy errors (timeouts, destination unavailable, etc.)... The
existing retry mechanism can kick in. However, in today's core API, there is
one pain point for spout writers. Each spout needs to implement the logic to
track in-flight tuples and attempt a retry on fail(). The implementation is
moderately complicated since ACKs/FAILs can come in any order. All the spouts
have to do the same thing but end up doing it slightly differently. Some have
retry limits, some don't. This retry logic should ideally be lifted out of the
spout and handled in the API. The new API is a good opportunity to address this
issue.
Yes, the idea is good if we can expose the right API for users and also keep
the implementation simple. At a high level there could be a global API at the
StreamBuilder level or a more granular one at a specific stage in the pipeline.
Something like,
{code:java}
// Option 1: globally at the stream builder
streamBuilder.setRetryPolicy(...);
Stream<T> errors = streamBuilder.deadLetterQueue();

// Option 2: at the stream api level, per stage
Stream<T>[] streams = stream.map(...).branchErrors();
Stream<T> success = streams[0];
Stream<T> errors = streams[1];
{code}
We also need to see how this will work with Storm's current timeout-based
replay mechanism and so on. We can discuss further and come up with the right
approach.
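For context on the spout-side pain point quoted above, here is roughly the
ack/fail bookkeeping every spout has to hand-roll today. This is a sketch for
illustration only; MAX_RETRIES and the pending/retryCounts maps are
assumptions, not an existing API:
{code:java}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

import org.apache.storm.spout.SpoutOutputCollector;
import org.apache.storm.topology.base.BaseRichSpout;
import org.apache.storm.tuple.Values;

// Sketch of the in-flight tracking and bounded retry that each spout currently
// implements itself, and that could instead be lifted into the streams api.
public abstract class RetryingSpoutSketch extends BaseRichSpout {
    private static final int MAX_RETRIES = 3;
    private final Map<Object, Values> pending = new ConcurrentHashMap<>();
    private final Map<Object, Integer> retryCounts = new ConcurrentHashMap<>();
    protected SpoutOutputCollector collector;

    @Override
    public void ack(Object msgId) {
        pending.remove(msgId);
        retryCounts.remove(msgId);
    }

    @Override
    public void fail(Object msgId) {
        int attempts = retryCounts.merge(msgId, 1, Integer::sum);
        Values tuple = pending.get(msgId);
        if (tuple != null && attempts <= MAX_RETRIES) {
            collector.emit(tuple, msgId);   // replay with the same msgId
        } else {
            pending.remove(msgId);          // give up; could forward to a DLQ instead
            retryCounts.remove(msgId);
        }
    }
}
{code}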
> Streams api - provide options for handling errors
> -------------------------------------------------
>
> Key: STORM-2282
> URL: https://issues.apache.org/jira/browse/STORM-2282
> Project: Apache Storm
> Issue Type: Sub-task
> Reporter: Arun Mahadevan
>
> Adding relevant discussions from PR 1693 below.
> Allow users to be explicit about how to handle errors. I don't know if any
> API out there does it... so this would be unique to Storm.
> In short:
> Broadly speaking, there are two kinds of tuple processing errors that users
> need to be concerned about:
> 1- Retry-worthy errors - For instance, failure to deliver to a destination
> service due to connection issues.
> 2- Not worth retrying - For instance, parsing errors due to bad data in the
> tuple. These problems can jam up the pipeline if they are retried repeatedly.
> Such tuples can be sent to a configurable dead letter queue.
> ---
> > 1- Retry-worthy errors
> Right now there's no explicit `fail` API. When a stage in the stream completes
> processing (and possibly emits results), the underlying tuples are acked
> automatically. The only way the spout will re-emit is after the message
> timeout. It may be good to have a fail-fast API, but I am not sure how it
> would help. The replayed tuple could fail again in processing. Instead, the
> processing logic itself can have some retry logic (say, retry 3 times), then
> forward the tuple to an error stream and ack it.
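> For illustration, a minimal sketch of the "retry a few times, then forward to
> an error stream and ack" idea; the "errors" stream and the deliver() helper
> are placeholders, not an existing API:
> {code:java}
> // Inside a BaseBasicBolt: retry transient failures a bounded number of times,
> // then route the tuple to an "errors" stream instead of failing it, so the
> // spout never has to replay it. The "errors" stream would also need to be
> // declared in declareOutputFields.
> private static final int MAX_ATTEMPTS = 3;
>
> @Override
> public void execute(Tuple input, BasicOutputCollector collector) {
>     for (int attempt = 1; ; attempt++) {
>         try {
>             deliver(input);   // hypothetical call to the destination service
>             collector.emit(new Values(input.getValue(0)));
>             return;           // success; the basic bolt acks automatically
>         } catch (Exception e) {
>             if (attempt >= MAX_ATTEMPTS) {
>                 collector.emit("errors", new Values(input.getValue(0), e.getMessage()));
>                 return;       // give up, but still ack so there is no replay
>             }
>         }
>     }
> }
> {code}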
> > 2- Not worth retrying
> This can be handled via branch logic, e.g. send valid values to stream1 and
> bad values to stream2.
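> For illustration, a minimal sketch of such branching with the `branch`
> operation proposed in the streams PR; `isValid`, `parse`, `store` and
> `DeadLetterQueueBolt` are placeholders:
> {code:java}
> // Split the stream with two mutually exclusive predicates: branches[0] gets
> // parseable values (stream1), branches[1] gets the bad data (stream2).
> Stream<String>[] branches = stream.branch(
>         s -> isValid(s),
>         s -> !isValid(s));
> Stream<String> valid = branches[0];
> Stream<String> invalid = branches[1];
>
> valid.map(s -> parse(s)).forEach(v -> store(v));   // normal processing path
> invalid.to(new DeadLetterQueueBolt());             // e.g. persist bad tuples for inspection
> {code}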
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)