Hi all,

This discussion thread is to gauge community opinion and gather feedback on 
implementing a better exception hierarchy in Flink to identify exceptions that 
come from running “User job code” and exceptions coming from “Flink engine 
code”.

Problem:
Flink provides a distributed processing engine (SYSTEM) to run a data streaming 
job (USER). There are many places in code where the engine runs “user job 
provided java classes”, such as serialization/deserialization, configuration 
objects, credential loading, running setup() method on certain Operators.
Sometimes when evaluating a stack trace, it might be hard to automatically 
determine if an exception is arising out of a Flink engine problem, or a 
problem associated to a particular job.

Proposed way forward:
- It would be good to have an exception hierarchy maintained by Flink that 
separates out the exceptions arising from running “USER provided classes”. That 
way, we can improve our ability to automatically classify and mitigate these 
exceptions.
- We could also include separating out the places where exception originates 
based on function - FlinkSerializationException, FlinkConfigurationException.. 
etc. (we already have a similar concept with IncompatibleKeysException)
- This has synergy with FLIP-304: Pluggable Failure Enrichers (since it would 
simplify the logic in the USER/SYSTEM classifier there) [1].
- In addition, this has been discussed before in the context of updating the 
exception thrown by serialisers to be a Flink-specific serialisation exception 
instead of IllegalStateException [2]


Any thoughts on the above?

Regards,
Hong


[1] 
https://cwiki.apache.org/confluence/display/FLINK/FLIP-304%3A+Pluggable+Failure+Enrichers
[2] https://lists.apache.org/thread/0o859h1vdx6mwv0fqvmybpn574692jtg


Reply via email to