[ https://issues.apache.org/jira/browse/HBASE-23881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17046977#comment-17046977 ]
Josh Elser edited comment on HBASE-23881 at 2/27/20 9:16 PM:
-------------------------------------------------------------

{noformat}
2020-02-27 16:03:51,668 INFO [Time-limited test] example.TestShadeSaslAuthenticationProvider$3(243): Caught exception
org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after attempts=4, exceptions:
2020-02-27T21:03:51.026Z, RpcRetryingCaller{globalStartTime=1582837371011, pause=100, maxAttempts=4}, org.apache.hadoop.hbase.MasterNotRunningException: org.apache.hadoop.hbase.ipc.CallTimeoutException: Call to mizar.local/192.168.2.28:56690 failed on local exception: org.apache.hadoop.hbase.ipc.CallTimeoutException: Call[id=0,methodName=IsMasterRunning], waitTime=60011, rpcTimeout=60000
2020-02-27T21:03:51.138Z, RpcRetryingCaller{globalStartTime=1582837371011, pause=100, maxAttempts=4}, org.apache.hadoop.hbase.MasterNotRunningException: java.io.IOException: Call to mizar.local/192.168.2.28:56690 failed on local exception: java.io.IOException: Connection reset by peer
2020-02-27T21:03:51.347Z, RpcRetryingCaller{globalStartTime=1582837371011, pause=100, maxAttempts=4}, org.apache.hadoop.hbase.MasterNotRunningException: java.io.IOException: Call to mizar.local/192.168.2.28:56690 failed on local exception: java.io.IOException: Connection reset by peer
2020-02-27T21:03:51.656Z, RpcRetryingCaller{globalStartTime=1582837371011, pause=100, maxAttempts=4}, org.apache.hadoop.hbase.MasterNotRunningException: java.io.IOException: Call to mizar.local/192.168.2.28:56690 failed on local exception: java.io.IOException: Connection reset by peer
{noformat}

So, when you run this test on branch-2 we see a different set of exceptions. This looks like the client is seeing the exception properly. Granted, it comes back as {{MasterNotRunningException}} instead of {{InvalidToken}}, but still the test fails as I'd expect.

On Master, the server is definitely throwing an exception up the netty server callstack, but the client never gets it.
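One way to check which exception the client actually received is to walk the cause chain of whatever the test caught and look for a given type. The following is only a minimal sketch using JDK classes: {{CauseChain}} and {{findCause}} are hypothetical names, not HBase APIs, and the exceptions in {{main}} are stand-ins for the HBase wrappers in the log above.

```java
import java.io.IOException;
import java.util.Optional;

// Hypothetical debugging helper; not part of HBase.
final class CauseChain {
    private CauseChain() {}

    /**
     * Returns the first throwable in the cause chain of t (including t itself)
     * that is an instance of the given type, or empty if there is none.
     * The depth bound guards against a cyclic cause chain.
     */
    static <T extends Throwable> Optional<T> findCause(Throwable t, Class<T> type) {
        int depth = 0;
        for (Throwable cur = t; cur != null && depth < 64; cur = cur.getCause(), depth++) {
            if (type.isInstance(cur)) {
                return Optional.of(type.cast(cur));
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Stand-ins: a generic wrapper around the low-level I/O failure.
        IOException reset = new IOException("Connection reset by peer");
        Exception wrapped = new RuntimeException("stand-in for the retry wrapper", reset);

        System.out.println(findCause(wrapped, IOException.class).isPresent());           // true
        System.out.println(findCause(wrapped, IllegalStateException.class).isPresent()); // false
    }
}
```

In the test, the same lookup could be pointed at the caught exception and the type the test expects, which would show whether the expected type is buried in the chain or missing entirely.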
I'm still trying to work out how Netty is supposed to propagate the exception. The operationComplete callback in NettyRpcConnection#saslNegotiate reports that the call was successful when it definitely should not be. That makes me a little worried that we have an authentication problem on master with Netty (where Netty is the only RPC option).

Not sure if you have any tips you could give me to help me debug this more, [~zhangduo] :)

> TestShadeSaslAuthenticationProvider failures
> --------------------------------------------
>
> Key: HBASE-23881
> URL: https://issues.apache.org/jira/browse/HBASE-23881
> Project: HBase
> Issue Type: Bug
> Components: test
> Affects Versions: 3.0.0, 2.3.0
> Reporter: Bharath Vissapragada
> Assignee: Josh Elser
> Priority: Major
>
> TestShadeSaslAuthenticationProvider now fails deterministically with the following exception:
> {noformat}
> java.lang.Exception: Unexpected exception, expected<org.apache.hadoop.hbase.DoNotRetryIOException> but was<java.io.IOException>
> at org.apache.hadoop.hbase.security.provider.example.TestShadeSaslAuthenticationProvider.testNegativeAuthentication(TestShadeSaslAuthenticationProvider.java:233)
> {noformat}
> The test now fails in a different place than it did before HBASE-18095 was merged, because the RPCs are also a part of connection setup. We might need to rewrite the test.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)