[ https://issues.apache.org/jira/browse/FLINK-7200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16088602#comment-16088602 ]

Chesnay Schepler commented on FLINK-7200:
-----------------------------------------

You can configure the components contained in the metric name using scope formats, as described in the metrics documentation:
https://ci.apache.org/projects/flink/flink-docs-release-1.4/monitoring/metrics.html#scope

All your suggestions can be accomplished with this feature.

> Make metrics more Datadog friendly
> ----------------------------------
>
>                 Key: FLINK-7200
>                 URL: https://issues.apache.org/jira/browse/FLINK-7200
>             Project: Flink
>          Issue Type: Improvement
>          Components: Metrics
>    Affects Versions: 1.3.1
>            Reporter: Robert Batts
>            Priority: Minor
>
> The current output of the Datadog Reporter is a little unfriendly to the
> platform they are going to from a metrics name perspective. Take for example
> the metric used reporting with the Datadog Kafka integration.
> kafka.consumer_lag=0000 [topic:xxxx, consumer_group: yyyy, partition: 0000]
> Through the use of tags (in this case topic, consumer_group, and partition)
> you can create graphs in Datadog filtered to a specific topic and
> consumer_group and then averaged on each partition. This allows you to
> visualize something like a heatmap for lag on each partition for a consumer.
> So what am I suggesting for Flink? Currently, I think the tags for Datadog
> are in a great place. Tags like job_id and subtask_id would be great for
> filtering and grouping. But, the metric name is currently too specific to a
> taskmanager and subtask. Currently, the metrics look something like this:
> flink_w04.taskmanager.4f378aff5730.TwitterExample.ExtractHashtags.7.numRecordsOut
> {host}.taskmanager.{tm_id}.{job_name}.{operator_name}.{subtask_index}.{metric_name}
> What I am suggesting is something more like this:
> taskmanager.TwitterExample.ExtractHashtags.numRecordsOut
> taskmanager.{job_name}.{operator_name}.{metric_name}
> (or even taskmanager.{metric_name}, but that would be a lot of tags on a
> single metric)
> By doing this someone could create a graph on the numRecordsOut for an entire
> task's metric with a single metric in Datadog rather than combining the
> metric for every subtask_index using the tm_id metric (that could change if a
> tm_id dropped out of the cluster.) Additionally, given the current set of
> tags being output to Datadog there is a ton of grouping and filtering that
> will be available if everything was on a simplified metric.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
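
To illustrate the scope-format suggestion from the comment: the operator scope is controlled by the metrics.scope.operator key in flink-conf.yaml, whose documented default is <host>.taskmanager.<tm_id>.<job_name>.<operator_name>.<subtask_index>. Setting it to taskmanager.<job_name>.<operator_name> should give the shorter names requested in the issue. The Python below is only a rough sketch of how such a format expands into a metric name. The expand_scope helper is hypothetical and is not Flink code, and the variable values are taken from the example in the issue.

```python
import re

# Default operator scope format (config key: metrics.scope.operator).
DEFAULT_OPERATOR_SCOPE = (
    "<host>.taskmanager.<tm_id>.<job_name>.<operator_name>.<subtask_index>"
)
# Shortened format matching the naming proposed in the issue.
SHORT_OPERATOR_SCOPE = "taskmanager.<job_name>.<operator_name>"


def expand_scope(scope_format, variables, metric_name):
    """Hypothetical helper: fill in <var> placeholders, then append the metric name."""
    scope = re.sub(r"<(\w+)>", lambda m: variables[m.group(1)], scope_format)
    return scope + "." + metric_name


# Values taken from the example metric in the issue description.
variables = {
    "host": "flink_w04",
    "tm_id": "4f378aff5730",
    "job_name": "TwitterExample",
    "operator_name": "ExtractHashtags",
    "subtask_index": "7",
}

print(expand_scope(DEFAULT_OPERATOR_SCOPE, variables, "numRecordsOut"))
print(expand_scope(SHORT_OPERATOR_SCOPE, variables, "numRecordsOut"))
```

With the shortened format every subtask reports under the same name, so telling subtasks apart would rely on the tags (for example subtask_index) rather than on the metric name.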