[ https://issues.apache.org/jira/browse/FLINK-1502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15151004#comment-15151004 ]
Jamie Grier commented on FLINK-1502:
------------------------------------

I understand [~eastcirclek]'s points about using the InstanceID. This is a unique ID that is automatically generated (I believe). As such, if you use it to namespace the metrics, you will see new metric names whenever new TaskManagers are created. Over time this means the total number of metrics will grow and grow.

From my experience it would be better to have a "logical" ID for each TaskManager in the cluster, literally like (1, 2, 3, 4, etc.), and use this value to namespace the metrics. This will provide better continuity over time as TaskManagers come up and go down. However, I don't know if this concept actually exists inside Flink at the moment. Does it?

I would suggest we use logical IDs/indexes for TaskManager-level metrics, as well as task-level metrics, etc., as opposed to UUIDs. So rather than:

taskmanager.<TASK_MANAGER_UUID_1>.gc_time
taskmanager.<TASK_MANAGER_UUID_2>.gc_time

and

task.<TASK_UUID_1>.flatMap.messagesReceived
task.<TASK_UUID_2>.flatMap.messagesReceived

I would suggest something like

cluster.<CLUSTER_NAME>.taskmanager.1.gc_time
cluster.<CLUSTER_NAME>.taskmanager.2.gc_time

and

cluster.<CLUSTER_NAME>.task.1.flatMap.messagesReceived
cluster.<CLUSTER_NAME>.task.2.flatMap.messagesReceived

I hope that makes sense. The main point is to use logical IDs wherever possible, especially for things that change; otherwise there will be a lack of continuity in the metrics.

Also, I don't know that we actually have the CLUSTER_NAME concept right now either, but we might need it. This would be unique for any given YarnSession if running on YARN, for example. Basically we just need some way to group a set of TaskManagers uniquely. I guess this could also be done by using the UUID of the JobManager.

Comments?

> Expose metrics to graphite, ganglia and JMX.
> --------------------------------------------
>
> Key: FLINK-1502
> URL: https://issues.apache.org/jira/browse/FLINK-1502
> Project: Flink
> Issue Type: Sub-task
> Components: JobManager, TaskManager
> Affects Versions: 0.9
> Reporter: Robert Metzger
> Assignee: Dongwon Kim
> Priority: Minor
> Fix For: pre-apache
>
>
> The metrics library allows to expose collected metrics easily to other
> systems such as graphite, ganglia or Java's JVM (VisualVM).

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)