[ https://issues.apache.org/jira/browse/FLINK-20833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17267128#comment-17267128 ]
Robert Metzger commented on FLINK-20833:
----------------------------------------

1) See my comment in the PR: I wasn't aware of the "numRestarts" metric. Maybe it adds more confusion to count the restarts and the failures in two metrics?!
4) Good question. Maybe add it into the Deployment / Advanced section?
https://ci.apache.org/projects/flink/flink-docs-master/deployment/advanced/index.html

> Expose pluggable interface for exception analysis and metrics reporting in Execution Graph
> ------------------------------------------------------------------------------------------
>
>                 Key: FLINK-20833
>                 URL: https://issues.apache.org/jira/browse/FLINK-20833
>             Project: Flink
>          Issue Type: Improvement
>          Components: Runtime / Coordination
>    Affects Versions: 1.12.0
>            Reporter: Zhenqiu Huang
>            Assignee: Zhenqiu Huang
>            Priority: Minor
>              Labels: pull-request-available
>
> Platform users of Apache Flink usually want to classify the failure reasons
> of Flink jobs (for example, user code, networking, dependencies, etc.) and
> emit metrics for the analyzed results, so that the platform can provide an
> accurate value for system reliability by distinguishing failures caused by
> user logic from system issues.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)