[ https://issues.apache.org/jira/browse/FLINK-14206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Zhu Zhu updated FLINK-14206:
----------------------------
    Affects Version/s: 1.10.0

> Let fullRestart metric count fine grained restarts as well
> ----------------------------------------------------------
>
>                 Key: FLINK-14206
>                 URL: https://issues.apache.org/jira/browse/FLINK-14206
>             Project: Flink
>          Issue Type: Improvement
>          Components: Runtime / Coordination
>    Affects Versions: 1.9.0, 1.10.0
>            Reporter: Zhu Zhu
>            Assignee: Zhu Zhu
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.9.1
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> With fine grained recovery introduced in 1.9.0, the {{fullRestart}} metric
> only counts how many times the entire graph has been restarted, not including
> the number of fine grained failure restarts.
> As many users leverage this metric for failure detection, monitoring, and
> alerting, I'd propose to make it also count fine grained restarts.
> The concrete proposal is:
> - Add a counter {{numberOfRestartsCounter}} in {{ExecutionGraph}} to count
> all restarts. The counter is not to be registered to metric groups.
> - Let {{fullRestart}} query the value of the counter, instead of
> {{ExecutionGraph#globalModVersion}}
> - Increment {{numberOfRestartsCounter}} in
> {{ExecutionGraph#incrementGlobalModVersion()}}
> - Increment {{numberOfRestartsCounter}} in
> {{AdaptedRestartPipelinedRegionStrategyNG#restartTasks(...)}}, to ensure that
> the fine grained recovery really happens



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
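
The proposal in the issue description can be illustrated with a minimal, self-contained sketch. It is not Flink's actual code: {{RestartCountingGraph}} and {{RegionRestartStrategySketch}} are hypothetical stand-ins for {{ExecutionGraph}} and {{AdaptedRestartPipelinedRegionStrategyNG}}, the method {{onFineGrainedRestart()}} is invented for illustration, and an {{AtomicLong}} is assumed in place of whatever counter type the real change uses.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical, simplified stand-in for ExecutionGraph (not the real Flink class). */
class RestartCountingGraph {
    // Incremented only on global (full graph) restarts.
    private long globalModVersion = 0L;

    // Counts all restarts, global and fine grained. Per the proposal it is kept
    // internal and is not registered to any metric group itself.
    private final AtomicLong numberOfRestartsCounter = new AtomicLong();

    /** A global restart bumps both the mod version and the restart counter. */
    void incrementGlobalModVersion() {
        globalModVersion++;
        numberOfRestartsCounter.incrementAndGet();
    }

    /** Hypothetical hook: called once a fine grained restart is actually carried out. */
    void onFineGrainedRestart() {
        numberOfRestartsCounter.incrementAndGet();
    }

    long getGlobalModVersion() {
        return globalModVersion;
    }

    /** Value the fullRestart metric would report under the proposal. */
    long getNumberOfRestarts() {
        return numberOfRestartsCounter.get();
    }
}

/** Hypothetical stand-in for the pipelined region restart strategy. */
class RegionRestartStrategySketch {
    private final RestartCountingGraph graph;

    RegionRestartStrategySketch(RestartCountingGraph graph) {
        this.graph = graph;
    }

    /** Counts the restart at the point where the region's tasks are restarted. */
    void restartTasks() {
        graph.onFineGrainedRestart();
        // Rescheduling of the affected tasks is omitted in this sketch.
    }
}
```

In this sketch, a gauge backing {{fullRestart}} would read {{getNumberOfRestarts()}} rather than {{getGlobalModVersion()}}, so a job with one global restart and two region restarts would report three restarts while its mod version stays at one.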