[jira] [Updated] (HDFS-17237) Remove IPCLoggerChannel Metrics when the logger is closed
[ https://issues.apache.org/jira/browse/HDFS-17237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shilun Fan updated HDFS-17237:
------------------------------
    Hadoop Flags: Reviewed
    Target Version/s: 3.4.0, 3.3.7
    Affects Version/s: 3.4.0, 3.3.7

> Remove IPCLoggerChannel Metrics when the logger is closed
> ---------------------------------------------------------
>
>                 Key: HDFS-17237
>                 URL: https://issues.apache.org/jira/browse/HDFS-17237
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>    Affects Versions: 3.4.0, 3.3.7
>            Reporter: Stephen O'Donnell
>            Assignee: Stephen O'Donnell
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 3.4.0, 3.3.7
>
> When an IPCLoggerChannel is created (which is used to read from and write to
> the Journal nodes) it also creates a metrics object. When the namenodes
> fail over, the IPC loggers are all closed and reopened in read mode on the new
> SBNN, or the read mode is closed on the SBNN and reopened in write mode. The
> close frees the resources and discards the original IPCLoggerChannel object,
> and the caller then creates a new one.
> If a Journal node was down and was added back to the cluster with the same
> hostname but a different IP, then when the failover happens you end up with 4
> metrics objects for the JNs:
> 1. One for each of the original 3 IPs
> 2. One for the new IP
> The old stale metric will remain forever and will no longer be updated,
> leading to confusing results in any tools that use the metrics for monitoring.
> This change ensures we un-register the metrics when the logger channel is
> closed, and a new metrics object gets created when the new channel is created.
> I have added a small test to prove this, but also reproduced the original
> issue on a docker cluster and validated it is resolved with this change in
> place.
> For info, the logger metrics look like:
> {code}
> {
>   "name" : "Hadoop:service=NameNode,name=IPCLoggerChannel-192.168.32.8-8485",
>   "modelerType" : "IPCLoggerChannel-192.168.32.8-8485",
>   "tag.Context" : "dfs",
>   "tag.IsOutOfSync" : "false",
>   "tag.Hostname" : "957e3e66f10b",
>   "QueuedEditsSize" : 0,
>   "LagTimeMillis" : 0,
>   "CurrentLagTxns" : 0
> }
> {code}
> Note the name includes the IP, rather than the hostname.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
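The fix described above is an instance of the register-on-create / unregister-on-close pattern. As a minimal sketch of that pattern (this is not the Hadoop source; MetricsRegistry and ChannelMetrics below are hypothetical stand-ins for Hadoop's metrics system and IPCLoggerChannelMetrics):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical registry standing in for the metrics system: one entry per
// named metrics source.
class MetricsRegistry {
    private final Map<String, Object> sources = new ConcurrentHashMap<>();

    void register(String name, Object source) { sources.put(name, source); }
    void unregister(String name) { sources.remove(name); }
    int size() { return sources.size(); }
}

// Hypothetical per-channel metrics object. It registers itself on creation;
// the fix is the close() step, which removes the entry again.
class ChannelMetrics implements AutoCloseable {
    private final MetricsRegistry registry;
    private final String name;

    ChannelMetrics(MetricsRegistry registry, String ip, int port) {
        this.registry = registry;
        // The metric name embeds the IP, as in
        // "IPCLoggerChannel-192.168.32.8-8485" above, so a channel recreated
        // against a new IP registers under a different name.
        this.name = "IPCLoggerChannel-" + ip + "-" + port;
        registry.register(name, this);
    }

    // Without this unregister step, the old entry would remain in the
    // registry forever after the channel is closed - the stale-metric
    // behaviour this issue describes.
    @Override
    public void close() { registry.unregister(name); }
}
```

With this in place, closing the channel for the old IP and opening one for the new IP leaves exactly one registered metrics object per live Journal node, rather than accumulating stale entries across failovers.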
[ https://issues.apache.org/jira/browse/HDFS-17237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stephen O'Donnell updated HDFS-17237:
-------------------------------------
    Fix Version/s: 3.4.0, 3.3.7
[ https://issues.apache.org/jira/browse/HDFS-17237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HDFS-17237:
----------------------------------
    Labels: pull-request-available  (was: )
[ https://issues.apache.org/jira/browse/HDFS-17237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stephen O'Donnell updated HDFS-17237:
-------------------------------------
    Summary: Remove IPCLoggerChannel Metrics when the logger is closed  (was: Remove IPCLogger Metrics when the logger is closed)