[ 
https://issues.apache.org/jira/browse/HDFS-17237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDFS-17237:
----------------------------------
    Labels: pull-request-available  (was: )

> Remove IPCLoggerChannel Metrics when the logger is closed
> ---------------------------------------------------------
>
>                 Key: HDFS-17237
>                 URL: https://issues.apache.org/jira/browse/HDFS-17237
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>            Reporter: Stephen O'Donnell
>            Assignee: Stephen O'Donnell
>            Priority: Major
>              Labels: pull-request-available
>
> When an IPCLoggerChannel is created (it is used to read from and write to 
> the Journal nodes), it also creates a metrics object. When the namenodes 
> fail over, the IPC loggers are all closed and reopened in read mode on the new 
> SBNN, or the read mode is closed on the SBNN and re-opened in write mode. The 
> closing frees the resources and discards the original IPCLoggerChannel object 
> and causes a new one to be created by the caller.
> If a Journal node was down and added back to the cluster with the same 
> hostname, but a different IP, when the failover happens, you end up with 4 
> metrics objects for the JNs:
> 1. One for each of the original 3 IPs
> 2. One for the new IP
> The old stale metric will remain forever and will no longer be updated, 
> leading to confusing results in any tools that use the metrics for monitoring.
> This change ensures we un-register the metrics when the logger channel is 
> closed and a new metrics object gets created when the new channel is created.
> I have added a small test to prove this, but also reproduced the original 
> issue on a docker cluster and validated it is resolved with this change in 
> place.
> For info, the logger metrics look like:
> {code}
> {
>   "name" : "Hadoop:service=NameNode,name=IPCLoggerChannel-192.168.32.8-8485",
>   "modelerType" : "IPCLoggerChannel-192.168.32.8-8485",
>   "tag.Context" : "dfs",
>   "tag.IsOutOfSync" : "false",
>   "tag.Hostname" : "957e3e66f10b",
>   "QueuedEditsSize" : 0,
>   "LagTimeMillis" : 0,
>   "CurrentLagTxns" : 0
> }
> {code}
> Note that the name includes the IP, rather than the hostname.
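
The register-on-create, un-register-on-close lifecycle described above can be
sketched as follows. This is a minimal illustration only, not the actual patch:
MetricsRegistrySketch and LoggerChannelSketch are hypothetical names, and the
in-memory map is an assumed stand-in for the real Hadoop metrics system. Only
the "IPCLoggerChannel-<ip>-<port>" naming is taken from the metrics output
shown in the description.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical stand-in for a metrics system; this is NOT Hadoop's API. */
final class MetricsRegistrySketch {
    private static final Map<String, Object> SOURCES = new ConcurrentHashMap<>();

    /** Registers a metrics source under a unique name. */
    static void register(String name, Object source) {
        if (SOURCES.putIfAbsent(name, source) != null) {
            throw new IllegalStateException("Metrics source already registered: " + name);
        }
    }

    /** Removes a metrics source; a no-op if the name is unknown. */
    static void unregister(String name) {
        SOURCES.remove(name);
    }

    /** Returns a snapshot of the currently registered source names. */
    static Set<String> registeredNames() {
        return new HashSet<>(SOURCES.keySet());
    }
}

/** Hypothetical logger channel owning one metrics source named by IP and port. */
final class LoggerChannelSketch implements AutoCloseable {
    private final String metricsName;
    private boolean closed;

    LoggerChannelSketch(String ip, int port) {
        // Naming mirrors the "IPCLoggerChannel-<ip>-<port>" form in the description.
        this.metricsName = "IPCLoggerChannel-" + ip + "-" + port;
        MetricsRegistrySketch.register(metricsName, this);
    }

    String metricsName() {
        return metricsName;
    }

    /** Un-registers the metrics source so no stale entry outlives the channel. */
    @Override
    public synchronized void close() {
        if (!closed) {
            closed = true;
            MetricsRegistrySketch.unregister(metricsName);
        }
    }
}
```

In this sketch, closing a channel for 192.168.32.8 and opening one for a new IP
leaves only the new name registered. Without the unregister call in close(),
the old name would stay in the registry indefinitely, which is the stale-metric
behaviour the issue describes.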



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
