[ https://issues.apache.org/jira/browse/HDFS-2510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13204308#comment-13204308 ]

Todd Lipcon commented on HDFS-2510:
-----------------------------------

Sorry, missed the comment above:
{quote}
Similarly, I couldn't think of anything useful an operator could get from this. It also doesn't help the situation that currently all DN metrics are per-DN-daemon, not per BP offer service. Thus, it's not obvious how to get meaningful DN-side metrics for just a single namespace.
{quote}

I think a useful metric which could be exposed is {{max(time since last successful communication)}}. This would help diagnose if one of the racks gets partitioned off from one of the NNs, for example -- all of the DNs in that rack would start to rise in this metric.

That said, the ones you've implemented here are fine and the most crucial, so +1 to the current patch and we can discuss adding some more DN-side metrics separately.

> Add HA-related metrics
> ----------------------
>
>                 Key: HDFS-2510
>                 URL: https://issues.apache.org/jira/browse/HDFS-2510
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: data-node, ha, name-node
>    Affects Versions: HA branch (HDFS-1623)
>            Reporter: Aaron T. Myers
>            Assignee: Aaron T. Myers
>         Attachments: HDFS-2510-HDFS-1623.patch, HDFS-2510.HDFS-1623.patch
>
>
> Off the top of my head, I can think of:
> NN metrics:
> * A binary metric for active or standby
> * The size of the pending DN message queues
> * A timestamp for when the standby NN last read from shared edit log
> * The difference between highest generation stamp seen from the shared edit
> log and the highest generation stamp seen from any DN
> It would probably also be useful to have a DN metric which somehow describes
> which active/standby NNs it's talking to, e.g. "times since last communicated
> with standby/active NNs."
> I'm sure there are others as well. Comments strongly encouraged.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
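
For illustration only, here is one possible reading of the {{max(time since last successful communication)}} metric suggested in the comment above: each DN remembers when it last successfully talked to each NN and reports the largest elapsed time. This is a minimal sketch, not code from the patch. The class {{NnContactTracker}}, its method names, and the use of plain string NN identifiers are all hypothetical, and timestamps are passed in explicitly so the logic does not depend on any Hadoop API.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Hypothetical helper (not part of Hadoop): tracks, on one DN, the time of
 * the last successful communication with each NN it talks to.
 */
public class NnContactTracker {
    // NN identifier -> timestamp (ms) of the last successful communication
    private final Map<String, Long> lastSuccessMs = new ConcurrentHashMap<>();

    /** Record a successful exchange (e.g. a heartbeat) with the given NN. */
    public void recordSuccess(String namenodeId, long nowMs) {
        lastSuccessMs.put(namenodeId, nowMs);
    }

    /**
     * Largest elapsed time since last success across all known NNs.
     * Returns 0 if no NN has been contacted yet (an assumption of this sketch).
     */
    public long maxMillisSinceLastSuccess(long nowMs) {
        long max = 0L;
        for (long last : lastSuccessMs.values()) {
            max = Math.max(max, nowMs - last);
        }
        return max;
    }
}
```

Under this reading, a DN that loses contact with only one NN would still show a steadily growing value, which is the rack-partition symptom described in the comment.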