I've seen a few Beam users mention the need to handle errors in their
transforms by using a try/catch and routing to different outputs based on
whether an exception was thrown. This was particularly nicely written up in
a post by Vallery Lancey:

https://medium.com/@vallerylancey/error-handling-elements-in-apache-beam-pipelines-fffdea91af2a

I'd love to see this pattern better supported directly in the Beam API,
because it currently requires the user to implement a full DoFn even for
the simplest cases.

I propose we support for a MapElements-like transform that allows the user
to specify a set of exceptions to catch and route to a failure output.
Something like:

MapElements
.via(myFunctionThatThrows)
.withSuccessTag(successTag)
.withFailureTag(failureTag, JsonParsingException.class)

which would output a PCollectionTuple with both the successful outcomes of
the map operation and also a collection of the inputs that threw
JsonParsingException.

To make this more concrete, I put together a proof of concept PR:
https://github.com/apache/beam/pull/6518  I'd appreciate feedback about
whether this seems like a worthwhile addition and a feasible approach.

Reply via email to