[jira] [Comment Edited] (HDFS-14969) Fix HDFS client unnecessary failover log printing
[ https://issues.apache.org/jira/browse/HDFS-14969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16972962#comment-16972962 ]

Xudong Cao edited comment on HDFS-14969 at 11/13/19 2:34 AM:
-------------------------------------------------------------

cc [~xkrogen] [~vagarychen] [~shv] [~weichiu]

I feel it's not good to remove the entire log. The more appropriate way is to update the logic to be aware of how many NNs are configured. We may need to add a new method to the FailoverProxyProvider interface, such as getProxiesCount(), and implement it in all subclasses. Then we can compare the current failover count with the total number of NNs in RetryInvocationHandler to determine whether to print the failover log.

What do you think? However, once HDFS-14963 is merged, I feel that this problem will be greatly alleviated.

was (Author: xudongcao):
cc [~xkrogen] [~vagarychen] [~shv] [~weichiu]

I feel it's not good to remove the entire log. The more appropriate way is to update the logic to be aware of how many NNs are configured. We may need to add a new method to the FailoverProxyProvider interface such as getProxiesCount(), and then implement it in all subclasses.

What do you think? However, after the HDFS-14963 is merged in the future, I feel that this problem will be greatly alleviated.
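The proposal in the comment above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual Hadoop code or the eventual patch: `ProxyProviderSketch` is a hypothetical stand-in for FailoverProxyProvider, `getProxiesCount()` is the method name suggested in the comment, and `shouldLogFailover` is a made-up helper showing the kind of comparison RetryInvocationHandler might perform.

```java
import java.util.Arrays;
import java.util.List;

public class FailoverLogSketch {

    /** Hypothetical stand-in for FailoverProxyProvider; not the real Hadoop interface. */
    interface ProxyProviderSketch {
        /** Method name taken from the proposal: how many NameNode proxies are configured. */
        int getProxiesCount();
    }

    /** Builds a provider whose proxy count is the number of configured NameNodes. */
    static ProxyProviderSketch providerFor(List<String> nameNodes) {
        return nameNodes::size;
    }

    /**
     * Assumed rule: stay silent while there are still untried NameNodes, and
     * only log once every other configured NameNode has already been tried.
     *
     * @param failoversSoFar number of failovers already performed for this call
     */
    static boolean shouldLogFailover(int failoversSoFar, ProxyProviderSketch provider) {
        return failoversSoFar >= provider.getProxiesCount() - 1;
    }

    public static void main(String[] args) {
        ProxyProviderSketch provider = providerFor(Arrays.asList("nn1", "nn2", "nn3"));
        for (int failovers = 0; failovers < 4; failovers++) {
            System.out.println("failoversSoFar=" + failovers
                + " log=" + shouldLogFailover(failovers, provider));
        }
    }
}
```

Under this rule, a client with three NNs stays silent for the failovers 1st to 2nd and 2nd to 3rd, and starts logging only if the 3rd NN also rejects the call. With two NNs the rule reduces to skipping the first exception only.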
> Fix HDFS client unnecessary failover log printing
> -------------------------------------------------
>
>                 Key: HDFS-14969
>                 URL: https://issues.apache.org/jira/browse/HDFS-14969
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: hdfs-client
>    Affects Versions: 3.1.3
>            Reporter: Xudong Cao
>            Assignee: Xudong Cao
>            Priority: Minor
>
> In multi-NameNodes scenario, suppose there are 3 NNs and the 3rd is ANN, and
> then a client starts rpc with the 1st NN, it will be silent when failover
> from the 1st NN to the 2nd NN, but when failover from the 2nd NN to the 3rd
> NN, it prints some unnecessary logs, in some scenarios, these logs will be
> very numerous:
> {code:java}
> 2019-11-07 11:35:41,577 INFO retry.RetryInvocationHandler: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException): Operation category READ is not supported in state standby. Visit https://s.apache.org/sbnn-error
>     at org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.checkOperation(StandbyState.java:98)
>     at org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.checkOperation(NameNode.java:2052)
>     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkOperation(FSNamesystem.java:1459)
>     ...{code}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (HDFS-14969) Fix HDFS client unnecessary failover log printing
[ https://issues.apache.org/jira/browse/HDFS-14969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16970371#comment-16970371 ]

Erik Krogen edited comment on HDFS-14969 at 11/8/19 4:27 PM:
-------------------------------------------------------------

+1 on this. It has been an issue ever since the multiple SbNN feature was introduced in HDFS-6440. As we've started deploying this feature, we've been getting complaints from users -- any time their job fails, they think it is an infrastructure failure because they find these logs 😓

There is hard-coded logic right now to skip printing the exception if it's the first StandbyException encountered, due to the assumption that there are only two NNs, so under a normal scenario you should only see at most one StandbyException. We should either remove this log entirely (downgrade to DEBUG), or update the logic to be aware of how many NNs are configured.

was (Author: xkrogen):
+1 on this. It has been an issue ever since the multiple SbNN feature was introduced in HDFS-6440. As we've started moving towards this, we've been getting complaints from users -- any time their job fails, they think it is an infrastructure failure because they find these logs 😓

There is hard-coded logic right now to skip printing the exception if it's the first StandbyException encountered, due to the assumption that there are only two NNs, so under a normal scenario you should only see at most one StandbyException. We should either remove this log entirely (downgrade to DEBUG), or update the logic to be aware of how many NNs are configured.
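To make the two-NN assumption concrete, here is a small simulation of the behavior described above. It is a simplified model, not the real RetryInvocationHandler logic: the class and method names are hypothetical, and it assumes every standby NN tried before the active one answers with a StandbyException.

```java
public class TwoNameNodeAssumptionSketch {

    /**
     * Counts how many StandbyExceptions would be logged if only the first one
     * is skipped (the behavior described in the comment).
     *
     * @param activeIndex zero-based position of the active NN in the client's
     *                    probing order; every NN before it is a standby
     */
    static int countLoggedStandbyExceptions(int activeIndex) {
        int logged = 0;
        for (int failovers = 0; failovers < activeIndex; failovers++) {
            // Each standby tried before the active NN rejects the call.
            boolean firstStandbyException = (failovers == 0);
            if (!firstStandbyException) {
                logged++; // only the first exception is suppressed
            }
        }
        return logged;
    }

    public static void main(String[] args) {
        System.out.println("2 NNs, 2nd active: "
            + countLoggedStandbyExceptions(1) + " logged");
        System.out.println("3 NNs, 3rd active: "
            + countLoggedStandbyExceptions(2) + " logged");
    }
}
```

In this model a two-NN cluster never logs during a normal failover, while a three-NN cluster whose active NN is probed last logs one StandbyException per call, which matches the noise reported in the issue.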
[jira] [Comment Edited] (HDFS-14969) Fix HDFS client unnecessary failover log printing
[ https://issues.apache.org/jira/browse/HDFS-14969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16970371#comment-16970371 ]

Erik Krogen edited comment on HDFS-14969 at 11/8/19 4:27 PM:
-------------------------------------------------------------

+1 on this. It has been an issue ever since the multiple SbNN feature was introduced in HDFS-6440. As we've started deploying this feature, we've been getting complaints from users -- any time their job fails, they think it is an infrastructure failure because they find these logs 😓

There is hard-coded logic right now to skip printing the exception if it's the first StandbyException encountered, due to the assumption that there are only two NNs, so under a normal scenario you would only see at most one StandbyException. We should either remove this log entirely (downgrade to DEBUG), or update the logic to be aware of how many NNs are configured.

was (Author: xkrogen):
+1 on this. It has been an issue ever since the multiple SbNN feature was introduced in HDFS-6440. As we've started deploying this feature, we've been getting complaints from users -- any time their job fails, they think it is an infrastructure failure because they find these logs 😓

There is hard-coded logic right now to skip printing the exception if it's the first StandbyException encountered, due to the assumption that there are only two NNs, so under a normal scenario you should only see at most one StandbyException. We should either remove this log entirely (downgrade to DEBUG), or update the logic to be aware of how many NNs are configured.