Zhenzhong Xu created FLINK-7894:
-----------------------------------

             Summary: Improve metrics around fine-grained recovery and associated checkpointing behaviors
                 Key: FLINK-7894
                 URL: https://issues.apache.org/jira/browse/FLINK-7894
             Project: Flink
          Issue Type: Improvement
    Affects Versions: 1.3.2, 1.4.0
            Reporter: Zhenzhong Xu

Currently, the only metric around fine-grained recovery is "task_failures". This is a very high-level metric; it would be nice to have the following improvements:

* Allow slicing and dicing by which tasks were restarted.
* Recovery duration.
* Checkpoint behaviors associated with recovery: cancellations, failures, etc.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)