It would be ideal to have some higher-level way of wrapping a PTransform to handle errors inside, but that indeed seems like a substantially trickier thing to implement.
On Fri, Oct 5, 2018 at 3:38 PM Reuven Lax <[email protected]> wrote: > Cool! I've left a few comments. > > This also makes me think whether we can implement this on ParDo as well, > though that might be a bit trickier since it involves hooking into > DoFnInvoker. > > Reuven > > On Fri, Oct 5, 2018 at 10:33 AM Jeff Klukas <[email protected]> wrote: > >> I've posted a full PR for the Java exception handling API that's ready >> for review: https://github.com/apache/beam/pull/6586 >> >> It implements new WithErrors nested classes on MapElements, >> FlatMapElements, Filter, AsJsons, and ParseJsons. >> >> On Wed, Oct 3, 2018 at 7:55 PM Jeff Klukas <[email protected]> wrote: >> >>> Jira issues for adding exception handling in Java and Python SDKs: >>> >>> https://issues.apache.org/jira/browse/BEAM-5638 >>> https://issues.apache.org/jira/browse/BEAM-5639 >>> >>> I'll plan to have a complete PR for the Java SDK put together in the >>> next few days. >>> >>> On Wed, Oct 3, 2018 at 1:29 PM Jeff Klukas <[email protected]> wrote: >>> >>>> I don't personally have experience with the Python SDK, so am not >>>> immediately in a position to comment on how feasible it would be to >>>> introduce a similar change there. I'll plan to write up two separate issues >>>> for adding exception handling in the Java and Python SDKs. >>>> >>>> On Wed, Oct 3, 2018 at 12:17 PM Thomas Weise <[email protected]> wrote: >>>> >>>>> +1 for the proposal as well as the suggestion to offer it in other >>>>> SDKs, where applicable >>>>> >>>>> On Wed, Oct 3, 2018 at 8:58 AM Chamikara Jayalath < >>>>> [email protected]> wrote: >>>>> >>>>>> Sounds like a very good addition. I'd say this can be a single PR >>>>>> since changes are related. Please open a JIRA for tracking. >>>>>> >>>>>> Have you though about introducing a similar change to Python SDK ? >>>>>> (doesn't have to be the same PR). >>>>>> >>>>>> - Cham >>>>>> >>>>>> On Wed, Oct 3, 2018 at 8:31 AM Jeff Klukas <[email protected]> >>>>>> wrote: >>>>>> >>>>>>> If this looks good for MapElements, I agree that it makes sense to >>>>>>> extend to FlatMapElements and Filter and to keep the API consistent >>>>>>> between >>>>>>> them. >>>>>>> >>>>>>> Do you have suggestions on how to submit changes with that wider >>>>>>> scope? Would one PR altering MapElements, FlatMapElements, Filter, >>>>>>> ParseJsons, and AsJsons be too large to reasonably review? Should I >>>>>>> open an >>>>>>> overall JIRA ticket to track and break this into smaller PRs? >>>>>>> >>>>>>> On Wed, Oct 3, 2018 at 10:31 AM Reuven Lax <[email protected]> wrote: >>>>>>> >>>>>>>> Sounds cool. Why not support this on other transforms as well? >>>>>>>> (FlatMapElements, Filter, etc.) >>>>>>>> >>>>>>>> Reuven >>>>>>>> >>>>>>>> On Tue, Oct 2, 2018 at 4:49 PM Jeff Klukas <[email protected]> >>>>>>>> wrote: >>>>>>>> >>>>>>>>> I've seen a few Beam users mention the need to handle errors in >>>>>>>>> their transforms by using a try/catch and routing to different outputs >>>>>>>>> based on whether an exception was thrown. This was particularly nicely >>>>>>>>> written up in a post by Vallery Lancey: >>>>>>>>> >>>>>>>>> >>>>>>>>> https://medium.com/@vallerylancey/error-handling-elements-in-apache-beam-pipelines-fffdea91af2a >>>>>>>>> >>>>>>>>> I'd love to see this pattern better supported directly in the Beam >>>>>>>>> API, because it currently requires the user to implement a full DoFn >>>>>>>>> even >>>>>>>>> for the simplest cases. >>>>>>>>> >>>>>>>>> I propose we support for a MapElements-like transform that allows >>>>>>>>> the user to specify a set of exceptions to catch and route to a >>>>>>>>> failure >>>>>>>>> output. Something like: >>>>>>>>> >>>>>>>>> MapElements >>>>>>>>> .via(myFunctionThatThrows) >>>>>>>>> .withSuccessTag(successTag) >>>>>>>>> .withFailureTag(failureTag, JsonParsingException.class) >>>>>>>>> >>>>>>>>> which would output a PCollectionTuple with both the successful >>>>>>>>> outcomes of the map operation and also a collection of the inputs that >>>>>>>>> threw JsonParsingException. >>>>>>>>> >>>>>>>>> To make this more concrete, I put together a proof of concept PR: >>>>>>>>> https://github.com/apache/beam/pull/6518 I'd appreciate feedback >>>>>>>>> about whether this seems like a worthwhile addition and a feasible >>>>>>>>> approach. >>>>>>>>> >>>>>>>>
