Hey Hong,

Keep in mind that Flink 2.0 is also under discussion and breaking changes
could be introduced -- let's just make sure there is real value in a cleaner
exception hierarchy (which I believe there is).

Cheers,
Panagiotis

On Sat, Jun 10, 2023 at 4:22 AM Teoh, Hong <lian...@amazon.co.uk.invalid>
wrote:

> Thanks for the engagement on the thread! Sorry for the late reply, was off
> on holidays for a bit.
>
> @Paul
>
> Thanks for linking the historical discussion. Yes, I would agree that using
> the classloader to determine whether an exception type came from the user
> classloader rather than the system classloader would be helpful.
>
> In my opinion, we should enhance this further by also introducing a good
> exception hierarchy depending on where the USER code was called. However, I
> also note that this might be a breaking change for some, because they might
> rely on the current exception type for job management. We could address
> this by wrapping the existing exception rather than replacing it.
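
A minimal Java sketch of the wrapping idea above, under stated assumptions: `UserCodeException`, `UserCall`, `runUserCode` and `findCause` are hypothetical names used only for illustration and are not existing Flink APIs. The original exception is kept as the cause, so tooling that inspects the cause chain could still find the type it relies on. Code that catches the original type directly would still see a different top-level type.

```java
import java.util.Optional;

// Hypothetical sketch: none of these types exist in Flink.
final class WrappingSketch {

    /** Hypothetical marker type for failures raised while running user-provided code. */
    static class UserCodeException extends RuntimeException {
        UserCodeException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    /** A piece of user-provided code that may throw any exception. */
    interface UserCall<T> {
        T call() throws Exception;
    }

    /** Runs user code and wraps (rather than replaces) whatever it throws. */
    static <T> T runUserCode(String description, UserCall<T> call) {
        try {
            return call.call();
        } catch (Exception e) {
            // The original exception stays reachable through getCause().
            throw new UserCodeException("User code failed during " + description, e);
        }
    }

    /** Walks the cause chain looking for the first exception of the given type. */
    static <E extends Throwable> Optional<E> findCause(Throwable failure, Class<E> type) {
        for (Throwable cur = failure; cur != null; cur = cur.getCause()) {
            if (type.isInstance(cur)) {
                return Optional.of(type.cast(cur));
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        try {
            runUserCode("deserialization", () -> {
                throw new IllegalStateException("corrupt record");
            });
        } catch (UserCodeException e) {
            boolean found = findCause(e, IllegalStateException.class).isPresent();
            System.out.println(e.getMessage() + " (original type reachable: " + found + ")");
        }
    }
}
```
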
>
> @Panagiotis
> I agree with all your points. This proposal is in synergy with Pluggable
> Failure Enrichers.
>
> Regards,
> Hong
>
> > On 6 Jun 2023, at 06:50, Panagiotis Garefalakis <pga...@apache.org> wrote:
> >
> > Thanks for bringing this up Hong!
> >
> > Classifying exceptions was also the main driving factor behind pluggable
> > failure enrichers <https://issues.apache.org/jira/browse/FLINK-31508>.
> > However, we could do a much better job maintaining a hierarchy of System
> > and User exceptions thus making the classification logic more
> > straightforward.
> >
> >   - Defining better system/user exceptions with some kind of hierarchy is
> >   definitely a step forward (and refactoring the existing ones)
> >   - Classloader filtering could definitely be used for discovering errors
> >   originating from user-defined code, see doc
> >   <https://docs.google.com/document/d/1pcHg9F3GoDDeVD5GIIo2wO67Hmjgy0-hRDeuFnrMgT4/edit#heading=h.ato31xdnm7nk>
> >   - Eventually we could also release a simple failure enricher using the
> >   above improvements to automatically classify errors on the JM's
> >   exceptions endpoint
> >
> > Cheers,
> > Panagiotis
> >
> > On Wed, May 31, 2023 at 9:12 PM Paul Lam <paullin3...@gmail.com> wrote:
> >
> >> Hi Hong,
> >>
> >> Thanks for starting the discussion! I believe the exception
> >> classification between user exceptions and system exceptions has been
> >> long-awaited.
> >>
> >> It's worth mentioning that years ago there was a related discussion [1],
> >> FYI.
> >>
> >> I’m in favor of the heuristic approach of classifying exceptions by the
> >> classloader they come from. In addition, we could introduce extra
> >> configuration options to allow manual exception classification based on
> >> the package name of exceptions.
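
The heuristic described above could be sketched in Java roughly as follows. Everything here is an assumption for illustration: `ClassLoaderHeuristicSketch`, `Origin`, `classify` and the list of package prefixes are hypothetical names, not Flink APIs or configuration options. The sketch only looks at the class of each exception in the cause chain, so a JDK exception such as a NullPointerException thrown from user code would not be recognised without an extra rule.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch: not an existing Flink classifier.
final class ClassLoaderHeuristicSketch {

    enum Origin { USER, SYSTEM }

    /** Stand-in for an exception class that ships in the user jar. */
    static class DemoUserException extends RuntimeException {
        DemoUserException(String message) {
            super(message);
        }
    }

    /**
     * Returns USER if any exception in the cause chain was defined by the user
     * classloader or lives in one of the manually configured package prefixes.
     */
    static Origin classify(
            Throwable failure, ClassLoader userClassLoader, List<String> userPackagePrefixes) {
        for (Throwable cur = failure; cur != null; cur = cur.getCause()) {
            Class<?> type = cur.getClass();
            // getClassLoader() is null for classes loaded by the bootstrap loader.
            if (userClassLoader != null && type.getClassLoader() == userClassLoader) {
                return Origin.USER;
            }
            for (String prefix : userPackagePrefixes) {
                if (type.getName().startsWith(prefix)) {
                    return Origin.USER;
                }
            }
        }
        return Origin.SYSTEM;
    }

    public static void main(String[] args) {
        // In this demo the application classloader plays the role of the user classloader.
        ClassLoader userLoader = DemoUserException.class.getClassLoader();
        Throwable failure = new RuntimeException("task failed", new DemoUserException("bad input"));
        System.out.println(classify(failure, userLoader, Collections.<String>emptyList()));
        System.out.println(
                classify(new IllegalStateException("x"), userLoader, Arrays.asList("com.example.")));
    }
}
```
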
> >>
> >> [1] https://lists.apache.org/thread/gms4nysnb3o4v2k6421m5hsq0g7gtr81
> >>
> >> Best,
> >> Paul Lam
> >>
> >>> On 25 May 2023, at 23:07, Teoh, Hong <lian...@amazon.co.uk.INVALID> wrote:
> >>>
> >>> Hi all,
> >>>
> >>> This discussion thread is to gauge community opinion and gather feedback
> >>> on implementing a better exception hierarchy in Flink to identify
> >>> exceptions that come from running “User job code” and exceptions coming
> >>> from “Flink engine code”.
> >>>
> >>> Problem:
> >>> Flink provides a distributed processing engine (SYSTEM) to run a data
> >>> streaming job (USER). There are many places in the code where the engine
> >>> runs “user job provided java classes”, such as
> >>> serialization/deserialization, configuration objects, credential
> >>> loading, and running the setup() method on certain Operators.
> >>> Sometimes when evaluating a stack trace, it might be hard to
> >>> automatically determine if an exception is arising out of a Flink engine
> >>> problem, or a problem associated with a particular job.
> >>>
> >>> Proposed way forward:
> >>> - It would be good to have an exception hierarchy maintained by Flink
> >>> that separates out the exceptions arising from running “USER provided
> >>> classes”. That way, we can improve our ability to automatically classify
> >>> and mitigate these exceptions.
> >>> - We could also separate out the places where an exception originates
> >>> based on function - FlinkSerializationException,
> >>> FlinkConfigurationException, etc. (we already have a similar concept
> >>> with IncompatibleKeysException)
> >>> - This has synergy with FLIP-304: Pluggable Failure Enrichers (since it
> >>> would simplify the logic in the USER/SYSTEM classifier there) [1].
> >>> - In addition, this has been discussed before in the context of updating
> >>> the exception thrown by serialisers to be a Flink-specific serialisation
> >>> exception instead of IllegalStateException [2]
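
One possible shape for the hierarchy proposed in the list above, as a hedged Java sketch. `FlinkSerializationException` and `FlinkConfigurationException` are the names floated in this thread. `Origin`, `FlinkClassifiedException`, `FlinkUserCodeException`, `FlinkSystemException` and `classify` are hypothetical names invented for illustration. The sketch also assumes that serialization and configuration failures are attributed to user code, which the thread does not settle.

```java
// Hypothetical sketch: these classes are not part of Flink today.
final class ExceptionHierarchySketch {

    enum Origin { USER, SYSTEM, UNKNOWN }

    /** Hypothetical common base type that records where the failure originated. */
    static class FlinkClassifiedException extends RuntimeException {
        private final Origin origin;

        FlinkClassifiedException(Origin origin, String message, Throwable cause) {
            super(message, cause);
            this.origin = origin;
        }

        Origin getOrigin() {
            return origin;
        }
    }

    /** Failure raised while the engine was running user-provided classes. */
    static class FlinkUserCodeException extends FlinkClassifiedException {
        FlinkUserCodeException(String message, Throwable cause) {
            super(Origin.USER, message, cause);
        }
    }

    /** Function-specific subtype; attributing it to USER is an assumption. */
    static class FlinkSerializationException extends FlinkUserCodeException {
        FlinkSerializationException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    /** Function-specific subtype; attributing it to USER is an assumption. */
    static class FlinkConfigurationException extends FlinkUserCodeException {
        FlinkConfigurationException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    /** Failure raised by the engine itself. */
    static class FlinkSystemException extends FlinkClassifiedException {
        FlinkSystemException(String message, Throwable cause) {
            super(Origin.SYSTEM, message, cause);
        }
    }

    /** With a typed hierarchy, a USER/SYSTEM classifier reduces to an instanceof check. */
    static Origin classify(Throwable failure) {
        Throwable cur = failure;
        // The depth bound guards against accidental cycles in the cause chain.
        for (int depth = 0; cur != null && depth < 64; depth++) {
            if (cur instanceof FlinkClassifiedException) {
                return ((FlinkClassifiedException) cur).getOrigin();
            }
            cur = cur.getCause();
        }
        return Origin.UNKNOWN;
    }

    public static void main(String[] args) {
        Throwable failure = new RuntimeException("task failed",
                new FlinkSerializationException("bad record", new IllegalStateException("boom")));
        System.out.println(classify(failure));
    }
}
```
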
> >>>
> >>>
> >>> Any thoughts on the above?
> >>>
> >>> Regards,
> >>> Hong
> >>>
> >>>
> >>> [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-304%3A+Pluggable+Failure+Enrichers
> >>> [2] https://lists.apache.org/thread/0o859h1vdx6mwv0fqvmybpn574692jtg
> >>>
> >>>
> >>
> >>
>
>
