[
https://issues.apache.org/jira/browse/STORM-2282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15820222#comment-15820222
]
Arun Mahadevan commented on STORM-2282:
---------------------------------------
bq. It's ok to keep retrying errors that are considered retry-worthy. The
non-retry-worthy ones are the ones we must fail fast and move out, as they will
cause a jam and prevent good tuples from flowing. There is also no good way to
recover from them.
bq. One approach to do this...
bq. For non-retry-worthy error conditions (like bad data), any processing
element in the pipeline can throw a specific exception. This can then be
handled by the runtime to send that tuple to a configurable dead letter queue.
Ideally this needs a new kind of fail() notification in the spout to avoid
re-emit; alternatively, we can send the spout an ACK instead of a FAIL to avoid
retry. The DeadLetterQ bolt's metrics will capture these failures. It is better
not to have each spout/bolt explicitly deal with the dead letter queue ... that
would complicate the topology definition, as every spout and bolt would need to
be configured and wired up.
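For illustration, a minimal sketch of the quoted idea, assuming a hypothetical
FailedProcessingException that the runtime would recognize; neither the
exception type nor the dead-letter routing exists in Storm today:
{code:java}
// Hypothetical marker exception for non-retry-worthy failures (bad data).
// A processing element throws it; the runtime would catch it, ack (not fail)
// the anchored input so the spout does not replay it, and route the value to
// the configured dead letter queue.
public class FailedProcessingException extends RuntimeException {
    private final Object badValue;

    public FailedProcessingException(String message, Object badValue) {
        super(message);
        this.badValue = badValue;
    }

    public Object getBadValue() {
        return badValue;
    }
}

// Usage inside any processing stage (bolt or stream operation), for example:
//   if (!isParseable(value)) {
//       throw new FailedProcessingException("unparseable record", value);
//   }
{code}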
bq. For retry-worthy errors (timeouts, destination unavailable, etc.)... The
existing retry mechanism can kick in. However, in today's core API, there is
one pain point for spout writers. Each spout needs to implement the logic to
track in-flight tuples and attempt a retry on fail(). The implementation is
moderately complicated since ACKs/FAILs can come in any order. All the spouts
have to do the same thing but end up doing it slightly differently. Some have
retry limits, some don't. This retry logic should ideally be lifted out of the
spout and handled in the API. The new API is a good opportunity to address this
issue.
Yes, the idea is good if we can expose the right API for users and also keep
the implementation simple. At a high level there could be a global API at the
StreamBuilder level or a more granular one at a specific stage in the pipeline.
Something like,
{code:java}
// Option 1: globally at the stream builder
streamBuilder.setRetryPolicy(...);
Stream<T> errors = streamBuilder.deadLetterQueue();

// Option 2: at the stream api level, per stage
Stream<T>[] streams = stream.map(...).branchErrors();
Stream<T> success = streams[0];
Stream<T> errors = streams[1];
{code}
We also need to see how this will work with Storm's current timeout-based
replay mechanism and so on. We can discuss further and come up with the right
approach.
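For context on the spout-side pain point quoted above, here is roughly the
ack/fail bookkeeping every spout has to hand-roll today. This is a sketch for
illustration only; MAX_RETRIES and the pending/retryCounts maps are
assumptions, not an existing API:
{code:java}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

import org.apache.storm.spout.SpoutOutputCollector;
import org.apache.storm.topology.base.BaseRichSpout;
import org.apache.storm.tuple.Values;

// Sketch of the in-flight tracking and bounded retry that each spout currently
// implements itself, and that could instead be lifted into the streams api.
public abstract class RetryingSpoutSketch extends BaseRichSpout {
    private static final int MAX_RETRIES = 3;
    private final Map<Object, Values> pending = new ConcurrentHashMap<>();
    private final Map<Object, Integer> retryCounts = new ConcurrentHashMap<>();
    protected SpoutOutputCollector collector;

    @Override
    public void ack(Object msgId) {
        pending.remove(msgId);
        retryCounts.remove(msgId);
    }

    @Override
    public void fail(Object msgId) {
        int attempts = retryCounts.merge(msgId, 1, Integer::sum);
        Values tuple = pending.get(msgId);
        if (tuple != null && attempts <= MAX_RETRIES) {
            collector.emit(tuple, msgId);   // replay with the same msgId
        } else {
            pending.remove(msgId);          // give up; could forward to a DLQ instead
            retryCounts.remove(msgId);
        }
    }
}
{code}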
> Streams api - provide options for handling errors
> -------------------------------------------------
>
> Key: STORM-2282
> URL: https://issues.apache.org/jira/browse/STORM-2282
> Project: Apache Storm
> Issue Type: Sub-task
> Reporter: Arun Mahadevan
>
> Adding relevant discussions from PR 1693 below.
> Allow users to be explicit about how to handle errors. I don't know if any
> API out there does it... so this would be unique to Storm.
> In short:
> Broadly speaking, there are two kinds of tuple processing errors that users
> need to be concerned about:
> 1- Retry-worthy errors - For instance, failure to deliver to a destination
> service due to connection issues.
> 2- Not worth retrying - For instance, parsing errors due to bad data in the
> tuple. These problems can jam up the pipeline if they are retried repeatedly.
> Such tuples can be sent to a configurable dead letter queue.
> ---
> > 1- Retry-worthy errors
> Right now there's no explicit `fail` API. When a stage in the stream completes
> processing (and possibly emits results), the underlying tuples are acked
> automatically. The only way the spout will re-emit is after the message
> timeout. It may be good to have a fail-fast API, but I am not sure how it
> would help. The replayed tuple could fail again in processing. Instead, the
> processing logic itself can have some retry logic (say, retry 3 times), then
> forward the tuple to an error stream and ack it.
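> For illustration, a minimal sketch of the "retry a few times, then forward to
> an error stream and ack" idea; the "errors" stream and the deliver() helper
> are placeholders, not an existing API:
> {code:java}
> // Inside a BaseBasicBolt: retry transient failures a bounded number of times,
> // then route the tuple to an "errors" stream instead of failing it, so the
> // spout never has to replay it. The "errors" stream would also need to be
> // declared in declareOutputFields.
> private static final int MAX_ATTEMPTS = 3;
>
> @Override
> public void execute(Tuple input, BasicOutputCollector collector) {
>     for (int attempt = 1; ; attempt++) {
>         try {
>             deliver(input);   // hypothetical call to the destination service
>             collector.emit(new Values(input.getValue(0)));
>             return;           // success; the basic bolt acks automatically
>         } catch (Exception e) {
>             if (attempt >= MAX_ATTEMPTS) {
>                 collector.emit("errors", new Values(input.getValue(0), e.getMessage()));
>                 return;       // give up, but still ack so there is no replay
>             }
>         }
>     }
> }
> {code}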
> > 2- Not worth retrying
> This can be handled via branch logic, e.g. send valid values to stream1 and
> bad values to stream2.
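> For illustration, a minimal sketch of such branching with the `branch`
> operation proposed in the streams PR; `isValid`, `parse`, `store` and
> `DeadLetterQueueBolt` are placeholders:
> {code:java}
> // Split the stream with two mutually exclusive predicates: branches[0] gets
> // parseable values (stream1), branches[1] gets the bad data (stream2).
> Stream<String>[] branches = stream.branch(
>         s -> isValid(s),
>         s -> !isValid(s));
> Stream<String> valid = branches[0];
> Stream<String> invalid = branches[1];
>
> valid.map(s -> parse(s)).forEach(v -> store(v));   // normal processing path
> invalid.to(new DeadLetterQueueBolt());             // e.g. persist bad tuples for inspection
> {code}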
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)