[jira] [Updated] (HDFS-17237) Remove IPCLoggerChannel Metrics when the logger is closed

2024-01-27 Thread Shilun Fan (Jira)


 [ https://issues.apache.org/jira/browse/HDFS-17237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shilun Fan updated HDFS-17237:
--
 Hadoop Flags: Reviewed
 Target Version/s: 3.4.0, 3.3.7
Affects Version/s: 3.4.0, 3.3.7

> Remove IPCLoggerChannel Metrics when the logger is closed
> -
>
> Key: HDFS-17237
> URL: https://issues.apache.org/jira/browse/HDFS-17237
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 3.4.0, 3.3.7
>Reporter: Stephen O'Donnell
>Assignee: Stephen O'Donnell
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.7
>
>
> When an IPCLoggerChannel is created (it is used to read from and write to
> the Journal nodes), it also creates a metrics object. When the NameNodes
> fail over, the IPC loggers are all closed and re-opened in read mode on the
> new SBNN, or the read-mode loggers on the SBNN are closed and re-opened in
> write mode. Closing frees the resources, discards the original
> IPCLoggerChannel object, and causes a new one to be created by the caller.
> If a Journal node was down and is added back to the cluster with the same
> hostname but a different IP, then when the failover happens you end up with
> 4 metrics objects for the JNs:
> 1. One for each of the original 3 IPs
> 2. One for the new IP
> The old stale metric will remain forever and will no longer be updated,
> leading to confusing results in any tools that use the metrics for
> monitoring.
> This change ensures we un-register the metrics when the logger channel is
> closed; a new metrics object gets created when the new channel is created.
> I have added a small test to prove this, but also reproduced the original
> issue on a Docker cluster and validated it is resolved with this change in
> place.
> For info, the logger metrics look like:
> {code}
> {
>   "name" : "Hadoop:service=NameNode,name=IPCLoggerChannel-192.168.32.8-8485",
>   "modelerType" : "IPCLoggerChannel-192.168.32.8-8485",
>   "tag.Context" : "dfs",
>   "tag.IsOutOfSync" : "false",
>   "tag.Hostname" : "957e3e66f10b",
>   "QueuedEditsSize" : 0,
>   "LagTimeMillis" : 0,
>   "CurrentLagTxns" : 0
> }
> {code}
> Note that the name includes the IP rather than the hostname.
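
A minimal sketch of the idea (the class, field, and description strings below are illustrative assumptions, not the actual Hadoop patch): the channel registers a metrics source under a name derived from the JournalNode's resolved IP and port, and un-registers that same name on close via the Hadoop metrics2 API.

{code}
// Illustrative sketch only - names here are assumptions, not the real
// IPCLoggerChannel / IPCLoggerChannelMetrics implementation.
import java.net.InetSocketAddress;

import org.apache.hadoop.metrics2.MetricsCollector;
import org.apache.hadoop.metrics2.MetricsSource;
import org.apache.hadoop.metrics2.lib.DefaultMetricsSystem;

class LoggerChannelMetricsSketch implements MetricsSource, AutoCloseable {
  private final String sourceName;

  LoggerChannelMetricsSketch(InetSocketAddress jnAddr) {
    // The source name is built from the resolved IP and port, e.g.
    // "IPCLoggerChannel-192.168.32.8-8485", which is why a JN that comes
    // back with a new IP produces a brand-new metrics object.
    this.sourceName = "IPCLoggerChannel-"
        + jnAddr.getAddress().getHostAddress() + "-" + jnAddr.getPort();
    DefaultMetricsSystem.instance().register(sourceName,
        "Metrics for one JournalNode channel (sketch)", this);
  }

  @Override
  public void getMetrics(MetricsCollector collector, boolean all) {
    // A real source would publish QueuedEditsSize, LagTimeMillis, etc.
    collector.addRecord(sourceName).setContext("dfs");
  }

  @Override
  public void close() {
    // Un-register under the same name so the stale per-IP MBean disappears
    // instead of lingering after a failover recreates the channels.
    DefaultMetricsSystem.instance().unregisterSource(sourceName);
  }
}
{code}

Registering and un-registering under the same name means that after a failover only the channels that are currently open have a metrics source, so the stale per-IP entry from the replaced JournalNode no longer lingers.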






[jira] [Updated] (HDFS-17237) Remove IPCLoggerChannel Metrics when the logger is closed

2023-10-24 Thread Stephen O'Donnell (Jira)


 [ https://issues.apache.org/jira/browse/HDFS-17237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stephen O'Donnell updated HDFS-17237:
-
Fix Version/s: 3.4.0, 3.3.7

> Remove IPCLoggerChannel Metrics when the logger is closed
> -
>
> Key: HDFS-17237
> URL: https://issues.apache.org/jira/browse/HDFS-17237
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Reporter: Stephen O'Donnell
>Assignee: Stephen O'Donnell
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.7
>






[jira] [Updated] (HDFS-17237) Remove IPCLoggerChannel Metrics when the logger is closed

2023-10-23 Thread ASF GitHub Bot (Jira)


 [ https://issues.apache.org/jira/browse/HDFS-17237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HDFS-17237:
--
Labels: pull-request-available  (was: )

> Remove IPCLoggerChannel Metrics when the logger is closed
> -
>
> Key: HDFS-17237
> URL: https://issues.apache.org/jira/browse/HDFS-17237
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Reporter: Stephen O'Donnell
>Assignee: Stephen O'Donnell
>Priority: Major
>  Labels: pull-request-available
>






[jira] [Updated] (HDFS-17237) Remove IPCLoggerChannel Metrics when the logger is closed

2023-10-23 Thread Stephen O'Donnell (Jira)


 [ https://issues.apache.org/jira/browse/HDFS-17237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stephen O'Donnell updated HDFS-17237:
-
Summary: Remove IPCLoggerChannel Metrics when the logger is closed  (was: 
Remove IPCLogger Metrics when the logger is closed)

> Remove IPCLoggerChannel Metrics when the logger is closed
> -
>
> Key: HDFS-17237
> URL: https://issues.apache.org/jira/browse/HDFS-17237
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Reporter: Stephen O'Donnell
>Assignee: Stephen O'Donnell
>Priority: Major
>



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org