[jira] [Comment Edited] (HDFS-14969) Fix HDFS client unnecessary failover log printing

2019-11-12 Thread Xudong Cao (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16972962#comment-16972962
 ] 

Xudong Cao edited comment on HDFS-14969 at 11/13/19 2:34 AM:
-

cc [~xkrogen] [~vagarychen] [~shv] [~weichiu] I don't think we should remove 
the log entirely. A better approach is to make the logic aware of how many NNs 
are configured. We may need to add a new method to the FailoverProxyProvider 
interface, such as getProxiesCount(), and implement it in all subclasses. Then 
we can compare the current failover count with the total number of NNs in 
RetryInvocationHandler to decide whether to print the failover log. What do you 
think?

That said, once HDFS-14963 is merged, I expect this problem to be greatly 
alleviated.
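
A minimal sketch of the comparison proposed above, for discussion only. It is 
not the actual Hadoop code: CountingProxyProvider, FixedListProvider and 
shouldLogFailover are hypothetical names, and the threshold (stay silent for 
the first N-1 failovers when N NNs are configured) is an assumption about how 
the failover count and the NN count would be compared.

```java
import java.util.Arrays;
import java.util.List;

final class FailoverLogSketch {

  /** Hypothetical stand-in for a provider exposing getProxiesCount(); not the real Hadoop interface. */
  interface CountingProxyProvider {
    int getProxiesCount();
  }

  /** Hypothetical provider backed by a fixed list of NameNode names. */
  static final class FixedListProvider implements CountingProxyProvider {
    private final List<String> nameNodes;

    FixedListProvider(List<String> nameNodes) {
      this.nameNodes = nameNodes;
    }

    @Override
    public int getProxiesCount() {
      return nameNodes.size();
    }
  }

  /**
   * Assumed rule: with N configured NNs, a client can meet up to N-1 standby
   * NNs before reaching the active one, so only log once the number of
   * failovers already performed reaches N-1.
   */
  static boolean shouldLogFailover(int failoversSoFar, int proxiesCount) {
    return failoversSoFar >= proxiesCount - 1;
  }

  public static void main(String[] args) {
    CountingProxyProvider provider =
        new FixedListProvider(Arrays.asList("nn1", "nn2", "nn3"));
    for (int failovers = 0; failovers < 4; failovers++) {
      System.out.println("failovers=" + failovers + " log="
          + shouldLogFailover(failovers, provider.getProxiesCount()));
    }
  }
}
```

With two NNs this reduces to the existing behaviour of skipping only the first 
StandbyException; with three NNs the first two failovers stay silent.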


was (Author: xudongcao):
cc [~xkrogen] [~vagarychen] [~shv] [~weichiu] I feel it's not good to remove 
the entire log. The more appropriate way is to update the logic to be aware of 
how many NNs are configured. We may need to add a new method to the 
FailoverProxyProvider interface, such as getProxiesCount(), and then implement 
it in all subclasses. What do you think?

However, once HDFS-14963 is merged, I feel that this problem will be greatly 
alleviated.

> Fix HDFS client unnecessary failover log printing
> -
>
> Key: HDFS-14969
> URL: https://issues.apache.org/jira/browse/HDFS-14969
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 3.1.3
>Reporter: Xudong Cao
>Assignee: Xudong Cao
>Priority: Minor
>
> In a multi-NameNode scenario, suppose there are 3 NNs and the 3rd is the ANN. 
> When a client starts an RPC with the 1st NN, it is silent when failing over 
> from the 1st NN to the 2nd NN, but when failing over from the 2nd NN to the 
> 3rd NN it prints unnecessary logs. In some scenarios these logs can be very 
> numerous:
> {code:java}
> 2019-11-07 11:35:41,577 INFO retry.RetryInvocationHandler: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException): Operation category READ is not supported in state standby. Visit https://s.apache.org/sbnn-error
>  at org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.checkOperation(StandbyState.java:98)
>  at org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.checkOperation(NameNode.java:2052)
>  at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkOperation(FSNamesystem.java:1459)
>  ...{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-14969) Fix HDFS client unnecessary failover log printing

2019-11-08 Thread Erik Krogen (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16970371#comment-16970371
 ] 

Erik Krogen edited comment on HDFS-14969 at 11/8/19 4:27 PM:
-

+1 on this. It has been an issue ever since the multiple SbNN feature was 
introduced in HDFS-6440. As we've started deploying this feature, we've been 
getting complaints from users -- any time their job fails, they think it is an 
infrastructure failure because they find these logs 😓 There is hard-coded 
logic right now that skips printing the exception if it is the first 
StandbyException encountered. It assumes there are only two NNs, so that under 
a normal scenario you should see at most one StandbyException. We should either 
remove this log entirely (downgrade to DEBUG), or update the logic to be aware 
of how many NNs are configured.
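
To illustrate why the two-NN assumption produces extra log lines, here is a 
small simulation. It is a sketch under assumptions, not the 
RetryInvocationHandler code: the class and method names are hypothetical, and 
the rule is modelled simply as "only the first failover is silent".

```java
final class FirstStandbyOnlySketch {

  /** Assumed model of the hard-coded rule: only the very first failover is silent. */
  static boolean logsUnderTwoNnAssumption(int failoversSoFar) {
    return failoversSoFar > 0;
  }

  /**
   * Counts the log lines emitted while a client walks from the NN at index 0
   * to the active NN at activeIndex, failing over once per standby NN.
   */
  static int countLoggedFailovers(int activeIndex) {
    int logged = 0;
    for (int failovers = 0; failovers < activeIndex; failovers++) {
      if (logsUnderTwoNnAssumption(failovers)) {
        logged++;
      }
    }
    return logged;
  }

  public static void main(String[] args) {
    // Two NNs, second one active: the single failover is silent.
    System.out.println("2 NNs: " + countLoggedFailovers(1) + " logged");
    // Three NNs, third one active: the second failover is logged.
    System.out.println("3 NNs: " + countLoggedFailovers(2) + " logged");
  }
}
```

In this model every additional standby NN beyond the first adds one logged 
exception per call that has to walk the full NN list.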


was (Author: xkrogen):
+1 on this. It has been an issue ever since the multiple SbNN feature was 
introduced in HDFS-6440. As we've started moving towards this, we've been 
getting complaints from users -- any time their job fails, they think it is an 
infrastructure failure because they find these logs 😓 There is hard-coded logic 
right now to skip printing the exception if it's the first StandbyException 
encountered, due to the assumption that there are only two NNs, so under a 
normal scenario you should only see at most one StandbyException. We should 
either remove this log entirely (downgrade to DEBUG), or update the logic to be 
aware of how many NNs are configured.




[jira] [Comment Edited] (HDFS-14969) Fix HDFS client unnecessary failover log printing

2019-11-08 Thread Erik Krogen (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16970371#comment-16970371
 ] 

Erik Krogen edited comment on HDFS-14969 at 11/8/19 4:27 PM:
-

+1 on this. It has been an issue ever since the multiple SbNN feature was 
introduced in HDFS-6440. As we've started deploying this feature, we've been 
getting complaints from users -- any time their job fails, they think it is an 
infrastructure failure because they find these logs 😓 There is hard-coded 
logic right now that skips printing the exception if it is the first 
StandbyException encountered. It assumes there are only two NNs, so that under 
a normal scenario you would see at most one StandbyException. We should either 
remove this log entirely (downgrade to DEBUG), or update the logic to be aware 
of how many NNs are configured.


was (Author: xkrogen):
+1 on this. It has been an issue ever since the multiple SbNN feature was 
introduced in HDFS-6440. As we've started deploying this feature, we've been 
getting complaints from users -- any time their job fails, they think it is an 
infrastructure failure because they find these logs 😓 There is hard-coded logic 
right now to skip printing the exception if it's the first StandbyException 
encountered, due to the assumption that there are only two NNs, so under a 
normal scenario you should only see at most one StandbyException. We should 
either remove this log entirely (downgrade to DEBUG), or update the logic to be 
aware of how many NNs are configured.
