[jira] [Updated] (FLINK-14206) Let fullRestart metric count fine grained restarts as well

2019-09-28 Thread Jark Wu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-14206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jark Wu updated FLINK-14206:

Fix Version/s: 1.9.2  (was: 1.9.1)

> Let fullRestart metric count fine grained restarts as well
> ----------------------------------------------------------
>
> Key: FLINK-14206
> URL: https://issues.apache.org/jira/browse/FLINK-14206
> Project: Flink
> Issue Type: Improvement
> Components: Runtime / Coordination
> Affects Versions: 1.9.0, 1.10.0
> Reporter: Zhu Zhu
> Assignee: Zhu Zhu
> Priority: Major
> Labels: pull-request-available
> Fix For: 1.10.0, 1.9.2
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> With fine-grained recovery introduced in 1.9.0, the {{fullRestart}} metric
> only counts how many times the entire graph has been restarted; it does not
> include fine-grained failure restarts.
> As many users rely on this metric for failure detection, monitoring, and
> alerting, I propose making it count fine-grained restarts as well.
> The concrete proposal is:
> - Add a counter {{numberOfRestartsCounter}} in {{ExecutionGraph}} to count
> all restarts. The counter is not to be registered to metric groups.
> - Let {{fullRestart}} query the value of this counter instead of
> {{ExecutionGraph#globalModVersion}}.
> - Increment {{numberOfRestartsCounter}} in
> {{ExecutionGraph#incrementGlobalModVersion()}}.
> - Increment {{numberOfRestartsCounter}} in
> {{AdaptedRestartPipelinedRegionStrategyNG#restartTasks(...)}}, so that a
> restart is counted only when fine-grained recovery actually happens.
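
The proposal quoted above might be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual Flink patch: SketchExecutionGraph, onFineGrainedRestart() and fullRestartsGauge(...) are hypothetical stand-ins for ExecutionGraph, the increment inside AdaptedRestartPipelinedRegionStrategyNG#restartTasks(...), and the fullRestart gauge, and a plain AtomicLong stands in for whatever counter type the real change uses.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

public class RestartCounterSketch {

    /** Hypothetical, heavily simplified stand-in for Flink's ExecutionGraph. */
    public static final class SketchExecutionGraph {
        private long globalModVersion = 1L;
        // Counts all restarts; deliberately not registered to any metric group.
        private final AtomicLong numberOfRestartsCounter = new AtomicLong();

        /** Global (full) restart path: bumps the version and the counter. */
        public long incrementGlobalModVersion() {
            numberOfRestartsCounter.incrementAndGet();
            return ++globalModVersion;
        }

        /** Assumed hook, called only once a region restart is actually carried out. */
        public void onFineGrainedRestart() {
            numberOfRestartsCounter.incrementAndGet();
        }

        public long getNumberOfRestarts() {
            return numberOfRestartsCounter.get();
        }
    }

    /** Stand-in for the fullRestart gauge: reads the counter, not globalModVersion. */
    public static LongSupplier fullRestartsGauge(SketchExecutionGraph graph) {
        return graph::getNumberOfRestarts;
    }

    public static void main(String[] args) {
        SketchExecutionGraph graph = new SketchExecutionGraph();
        LongSupplier fullRestarts = fullRestartsGauge(graph);

        graph.incrementGlobalModVersion(); // one full restart
        graph.onFineGrainedRestart();      // two fine-grained restarts
        graph.onFineGrainedRestart();

        System.out.println(fullRestarts.getAsLong()); // prints 3
    }
}
```

In this sketch the gauge no longer depends on globalModVersion, so fine-grained restarts that never touch the global version still show up in the reported value.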



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (FLINK-14206) Let fullRestart metric count fine grained restarts as well

2019-09-26 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-14206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu updated FLINK-14206:

Fix Version/s: 1.10.0

> Let fullRestart metric count fine grained restarts as well
> ----------------------------------------------------------
>
> Key: FLINK-14206
> URL: https://issues.apache.org/jira/browse/FLINK-14206
> Project: Flink
> Issue Type: Improvement
> Components: Runtime / Coordination
> Affects Versions: 1.9.0, 1.10.0
> Reporter: Zhu Zhu
> Assignee: Zhu Zhu
> Priority: Major
> Labels: pull-request-available
> Fix For: 1.10.0, 1.9.1
>
> Time Spent: 10m
> Remaining Estimate: 0h


[jira] [Updated] (FLINK-14206) Let fullRestart metric count fine grained restarts as well

2019-09-26 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-14206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu updated FLINK-14206:

Affects Version/s: 1.10.0

> Let fullRestart metric count fine grained restarts as well
> ----------------------------------------------------------
>
> Key: FLINK-14206
> URL: https://issues.apache.org/jira/browse/FLINK-14206
> Project: Flink
> Issue Type: Improvement
> Components: Runtime / Coordination
> Affects Versions: 1.9.0, 1.10.0
> Reporter: Zhu Zhu
> Assignee: Zhu Zhu
> Priority: Major
> Labels: pull-request-available
> Fix For: 1.9.1
>
> Time Spent: 10m
> Remaining Estimate: 0h


[jira] [Updated] (FLINK-14206) Let fullRestart metric count fine grained restarts as well

2019-09-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-14206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated FLINK-14206:
---
Labels: pull-request-available  (was: )

> Let fullRestart metric count fine grained restarts as well
> ----------------------------------------------------------
>
> Key: FLINK-14206
> URL: https://issues.apache.org/jira/browse/FLINK-14206
> Project: Flink
> Issue Type: Improvement
> Components: Runtime / Coordination
> Affects Versions: 1.9.0
> Reporter: Zhu Zhu
> Assignee: Zhu Zhu
> Priority: Major
> Labels: pull-request-available
> Fix For: 1.9.1


[jira] [Updated] (FLINK-14206) Let fullRestart metric count fine grained restarts as well

2019-09-26 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-14206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu updated FLINK-14206:

Description: 
With fine-grained recovery introduced in 1.9.0, the {{fullRestart}} metric only
counts how many times the entire graph has been restarted; it does not include
fine-grained failure restarts.

As many users rely on this metric for failure detection, monitoring, and
alerting, I propose making it count fine-grained restarts as well.

The concrete proposal is:
- Add a counter {{numberOfRestartsCounter}} in {{ExecutionGraph}} to count all
restarts. The counter is not to be registered to metric groups.
- Let {{fullRestart}} query the value of this counter instead of
{{ExecutionGraph#globalModVersion}}.
- Increment {{numberOfRestartsCounter}} in
{{ExecutionGraph#incrementGlobalModVersion()}}.
- Increment {{numberOfRestartsCounter}} in
{{AdaptedRestartPipelinedRegionStrategyNG#restartTasks(...)}}, so that a
restart is counted only when fine-grained recovery actually happens.


  was:
With fine-grained recovery introduced in 1.9.0, the {{fullRestart}} metric only
counts how many times the entire graph has been restarted; it does not include
fine-grained failure restarts.

As many users rely on this metric for failure detection, monitoring, and
alerting, I propose making it count fine-grained failure restarts as well.

The concrete proposal is:
1. Add a counter {{numberOfRestartCounter}} in {{ExecutionGraph}} to count all
restarts. The counter is not to be registered to metric groups.
2. Let {{fullRestart}} query the value of this counter instead of
{{ExecutionGraph#globalModVersion}}.
3. Increment {{numberOfRestartCounter}} in {{ExecutionGraph#failGlobal}}.
4. Increment {{numberOfRestartCounter}} in
{{ExecutionGraph#notifyExecutionChange}} where the failover strategy is
notified, or perhaps in {{AdaptedRestartPipelinedRegionStrategyNG}} so that
only failovers that really happened are counted.



> Let fullRestart metric count fine grained restarts as well
> ----------------------------------------------------------
>
> Key: FLINK-14206
> URL: https://issues.apache.org/jira/browse/FLINK-14206
> Project: Flink
> Issue Type: Improvement
> Components: Runtime / Coordination
> Affects Versions: 1.9.0
> Reporter: Zhu Zhu
> Assignee: Zhu Zhu
> Priority: Major
> Fix For: 1.9.1


[jira] [Updated] (FLINK-14206) Let fullRestart metric count fine grained restarts as well

2019-09-25 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-14206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu updated FLINK-14206:

Summary: Let fullRestart metric count fine grained restarts as well  (was: 
Make fullRestart metric to count fine grained restarts as well)

> Let fullRestart metric count fine grained restarts as well
> ----------------------------------------------------------
>
> Key: FLINK-14206
> URL: https://issues.apache.org/jira/browse/FLINK-14206
> Project: Flink
> Issue Type: Improvement
> Components: Runtime / Coordination
> Affects Versions: 1.9.0
> Reporter: Zhu Zhu
> Priority: Major
> Fix For: 1.9.1
>
>
> With fine-grained recovery introduced in 1.9.0, the {{fullRestart}} metric
> only counts how many times the entire graph has been restarted; it does not
> include fine-grained failure restarts.
> As many users rely on this metric for failure detection, monitoring, and
> alerting, I propose making it count fine-grained failure restarts as well.
> The concrete proposal is:
> 1. Add a counter {{numberOfRestartCounter}} in {{ExecutionGraph}} to count all
> restarts. The counter is not to be registered to metric groups.
> 2. Let {{fullRestart}} query the value of this counter instead of
> {{ExecutionGraph#globalModVersion}}.
> 3. Increment {{numberOfRestartCounter}} in {{ExecutionGraph#failGlobal}}.
> 4. Increment {{numberOfRestartCounter}} in
> {{ExecutionGraph#notifyExecutionChange}} where the failover strategy is
> notified, or perhaps in {{AdaptedRestartPipelinedRegionStrategyNG}} so that
> only failovers that really happened are counted.


