Frederic Hemery created FLINK-24756:
---------------------------------------

             Summary: Flink metric identifiers contain group variables.
                 Key: FLINK-24756
                 URL: https://issues.apache.org/jira/browse/FLINK-24756
             Project: Flink
          Issue Type: Improvement
          Components: Runtime / Metrics
            Reporter: Frederic Hemery


Metric identifiers are built by concatenating the metric identifier of the 
closest {{ComponentMetricGroup}} (which is configurable) with the whole 
hierarchy of groups that have been added below it.
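
A minimal sketch of this construction, written as a plain Python model rather than against Flink's classes. The names {{MetricGroup}}, {{add_group}} and {{metric_identifier}} are illustrative assumptions, and the component scope is taken from the Kafka example in the tables of this issue. It shows a key/value group landing in both the identifier and the variables:

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class MetricGroup:
    """Toy stand-in for a metric group (an assumption, not Flink's real class)."""

    scope: list[str]
    variables: dict[str, str] = field(default_factory=dict)

    def add_group(self, key: str, value: str) -> MetricGroup:
        # A key/value group extends the scope with BOTH the key and the value,
        # and also records the pair as a variable (later exported as a tag).
        return MetricGroup(self.scope + [key, value], {**self.variables, key: value})

    def metric_identifier(self, name: str) -> str:
        # The identifier is the full scope plus the metric name, dot-separated.
        return ".".join(self.scope + [name])


# Assumed component scope, matching the metric names shown in this issue.
reader = MetricGroup(["flink", "operator", "KafkaSourceReader"])
partition = reader.add_group("topic", "resources").add_group("partition", "0")

identifier = partition.metric_identifier("committedOffset")
tags = [f"{key}:{value}" for key, value in partition.variables.items()]

print(identifier)
print(tags)
```

In this model the topic and partition appear twice: once as segments of the identifier and once as tags.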

This poses a challenge in a monitoring system like Datadog, where it is tricky 
to aggregate across metric identifiers. Datadog instead relies on a single 
metric identifier with different tags to distinguish between time series.

 

Using Flink Datadog integration, we get:
||Metric Name||Tags||
|flink.operator.KafkaSourceReader.topic.resources.partition.0.committedOffset|[topic:resources,partition:0]|
|flink.operator.KafkaSourceReader.topic.resources.partition.1.committedOffset|[topic:resources,partition:1]|
|flink.operator.KafkaSourceReader.topic.resources.partition.2.committedOffset|[topic:resources,partition:2]|
|flink.operator.KafkaSourceReader.topic.resources.partition.3.committedOffset|[topic:resources,partition:3]|
|flink.operator.KafkaSourceReader.topic.resources.partition.4.committedOffset|[topic:resources,partition:4]|
|flink.operator.KafkaSourceReader.topic.resources.partition.5.committedOffset|[topic:resources,partition:5]|
|flink.operator.KafkaSourceReader.topic.resources.partition.6.committedOffset|[topic:resources,partition:6]|
|...|...|

Instead, the native way to represent metrics in Datadog would be:
||Metric Name||Tags||
|flink.operator.KafkaSourceReader.committedOffset|[topic:resources,partition:0]|
|flink.operator.KafkaSourceReader.committedOffset|[topic:resources,partition:1]|
|flink.operator.KafkaSourceReader.committedOffset|[topic:resources,partition:2]|
|flink.operator.KafkaSourceReader.committedOffset|[topic:resources,partition:3]|
|flink.operator.KafkaSourceReader.committedOffset|[topic:resources,partition:4]|
|flink.operator.KafkaSourceReader.committedOffset|[topic:resources,partition:5]|
|flink.operator.KafkaSourceReader.committedOffset|[topic:resources,partition:6]|
|...|...|
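
To make the difference between the two tables concrete, here is a small Python sketch that groups the rows by metric name, roughly the way a tag-based backend would. The helper {{series_per_metric_name}} is a hypothetical name, and the data is just the seven partitions listed above:

```python
from collections import defaultdict

PARTITIONS = range(7)  # partitions 0..6, as in the tables above


def series_per_metric_name(points):
    """Group (metric name, tags) pairs by metric name."""
    grouped = defaultdict(list)
    for name, tags in points:
        grouped[name].append(tags)
    return dict(grouped)


# Current output: the topic and partition are baked into every metric name.
current = [
    (
        f"flink.operator.KafkaSourceReader.topic.resources.partition.{p}.committedOffset",
        ("topic:resources", f"partition:{p}"),
    )
    for p in PARTITIONS
]

# Native representation: one metric name, with tags telling the series apart.
native = [
    (
        "flink.operator.KafkaSourceReader.committedOffset",
        ("topic:resources", f"partition:{p}"),
    )
    for p in PARTITIONS
]

current_by_name = series_per_metric_name(current)
native_by_name = series_per_metric_name(native)

print(len(current_by_name))  # 7 distinct metric names, one series each
print(len(native_by_name))   # 1 metric name holding all 7 series
```

With the current naming, a query over all partitions has to enumerate seven metric names. With the native naming, it is one metric filtered or grouped by the {{partition}} tag.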

For the same reason, the 
[Datadog Docs|https://docs.datadoghq.com/integrations/flink/#metric-collection] 
recommend configuring the {{ComponentMetricGroup}} scope templates with all 
the scopes removed.

 

The metric identifier is built from the scopes, and the tags are built from the 
variables. The issue seems to come from groups being part of both the scopes 
and the user variables. For user-reported metrics, we can override this 
behavior by creating a custom metric group, but there is no way to override it 
for metrics reported by Flink itself (in particular [native 
RocksDB|https://github.com/apache/flink/blob/664fdaeaccf910c587f3478dd80bb327b441e85a/flink-state-backends/flink-statebackend-rocksdb/src/main/java/org/apache/flink/contrib/streaming/state/RocksDBNativeMetricMonitor.java#L78-L80]
 metrics and 
[Kafka|https://github.com/apache/flink/blob/99c2a415e9eeefafacf70762b6f54070f7911ceb/flink-connectors/flink-connector-kafka/src/main/java/org/apache/flink/streaming/connectors/kafka/internals/AbstractFetcher.java#L501-L506]
 metrics).
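
The custom-group workaround for user-reported metrics can be pictured with the sketch below. It is a plain Python model with hypothetical names ({{TagOnlyGroup}}, {{add_group}}, {{metric_identifier}}), not Flink's API, and it only illustrates the idea of keeping key/value groups out of the scope:

```python
class TagOnlyGroup:
    """Hypothetical group: key/value pairs become variables, never scope segments."""

    def __init__(self, scope, variables=None):
        self.scope = list(scope)
        self.variables = dict(variables or {})

    def add_group(self, key, value):
        # Record the pair as a variable only; the scope is left unchanged.
        return TagOnlyGroup(self.scope, {**self.variables, key: value})

    def metric_identifier(self, name):
        # Only the component scope and the metric name reach the identifier.
        return ".".join(self.scope + [name])


# Assumed component scope, matching the metric names shown in this issue.
reader = TagOnlyGroup(["flink", "operator", "KafkaSourceReader"])
partition = reader.add_group("topic", "resources").add_group("partition", "0")

identifier = partition.metric_identifier("committedOffset")
variables = partition.variables

print(identifier)
print(variables)
```

A group like this has to be created by the code that registers the metric, which is why it does not help for metrics that Flink registers internally.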

 

I couldn't think of a simple, clean, and backward-compatible way to make such 
a change, though, so I'm looking for feedback.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
