[ https://issues.apache.org/jira/browse/HBASE-26022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17368577#comment-17368577 ]
zhuobin zheng commented on HBASE-26022:
---------------------------------------

In the *master branch*, RpcClient seems to generate the server principal dynamically every time before it creates the saslClient, so the problem does not occur there. However, branch-1 appears to have the same problem. I will try to fix it later.

> DNS jitter causes hbase client to get stuck
> -------------------------------------------
>
> Key: HBASE-26022
> URL: https://issues.apache.org/jira/browse/HBASE-26022
> Project: HBase
> Issue Type: Bug
> Affects Versions: 1.2.0
> Reporter: zhuobin zheng
> Assignee: zhuobin zheng
> Priority: Major
>
> In our production HBase cluster, we occasionally encounter the errors below, which leave the HBase client stuck for a long time. After that, HBase requests to this machine fail forever.
> {code:java}
> WARN org.apache.hadoop.security.UserGroupInformation:
> PriviledgedActionException as:${user@realm} (auth:KERBEROS)
> cause:javax.security.sasl.SaslException: GSS initiate failed [Caused by
> GSSException: No valid credentials provided (Mechanism level: Server not
> found in Kerberos database (7) - LOOKING_UP_SERVER)]
> WARN org.apache.hadoop.security.UserGroupInformation:
> PriviledgedActionException as:${user@realm} (auth:KERBEROS)
> cause:java.io.IOException: Couldn't setup connection for ${user@realm} to
> hbase/${ip}@realm
> {code}
> The main problem is that the real server principal we created in the KDC is hbase/*${hostname}*@realm, so hbase/*${ip}*@realm can never be found in the KDC.
> When RpcClientImpl#Connection is constructed, its serverPrincipal field is generated from InetAddress.getCanonicalHostName() and never changes afterwards. getCanonicalHostName() returns the IP address when it fails to resolve the host name.
> Therefore, if DNS jitters while an RpcClientImpl#Connection is being constructed, that connection will never be able to set up SASL. I also don't see any logic in the SASL failure code path that abandons the connection.
> I can think of two solutions to this problem:
> # Abandon the connection when SASL fails.
> So the next request will create a new connection, which generates a fresh server principal.
> # Refresh the serverPrincipal field when SASL fails, so the next retry uses a new server principal.
>
> HBase Version: 1.2.0-cdh5.14.4
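A minimal sketch of the second idea (which, per the comment, is roughly what master already does by regenerating the principal before each SASL client is created), assuming a `service/_HOST@REALM` principal pattern. The class `ServerPrincipalSketch`, its `Connection`, `principalForSasl`, and the injectable `resolver` are hypothetical names used only for illustration; they are not HBase's `RpcClientImpl` API. The `_HOST` substitution is similar in spirit to Hadoop's `SecurityUtil.getServerPrincipal`, but this is not that implementation. The sketch also detects the IP fallback of `getCanonicalHostName()` instead of building `hbase/${ip}@realm`.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.Locale;
import java.util.Optional;
import java.util.function.Function;

/**
 * Hypothetical sketch (not HBase code): resolve the Kerberos server principal
 * right before each SASL attempt instead of caching it once per connection.
 */
public class ServerPrincipalSketch {

    /** Substitutes the lower-cased host into a "service/_HOST@REALM" pattern. */
    static String expandPrincipal(String pattern, String host) {
        return pattern.replace("_HOST", host.toLowerCase(Locale.ROOT));
    }

    /** getCanonicalHostName() returns the literal IP when it cannot resolve a host name. */
    static boolean isIpFallback(InetAddress addr, String canonicalHost) {
        return canonicalHost.equals(addr.getHostAddress());
    }

    static final class Connection {
        private final InetAddress serverAddr;
        private final String principalPattern;
        // Hypothetical hook; real code would call serverAddr.getCanonicalHostName().
        private final Function<InetAddress, String> resolver;

        Connection(InetAddress serverAddr, String principalPattern,
                   Function<InetAddress, String> resolver) {
            this.serverAddr = serverAddr;
            this.principalPattern = principalPattern;
            this.resolver = resolver;
        }

        /** Re-resolved on every call; empty means DNS only gave us an IP, so retry later. */
        Optional<String> principalForSasl() {
            String host = resolver.apply(serverAddr);
            if (isIpFallback(serverAddr, host)) {
                return Optional.empty();
            }
            return Optional.of(expandPrincipal(principalPattern, host));
        }
    }

    public static void main(String[] args) throws UnknownHostException {
        // getByAddress() builds the address without performing any DNS lookup.
        InetAddress addr = InetAddress.getByAddress(new byte[] {10, 0, 0, 1});
        // Simulated DNS jitter: the first lookup falls back to the IP, the second succeeds.
        String[] answers = {"10.0.0.1", "RS1.example.com"};
        int[] call = {0};
        Connection conn = new Connection(addr, "hbase/_HOST@EXAMPLE.COM",
                a -> answers[Math.min(call[0]++, answers.length - 1)]);

        System.out.println(conn.principalForSasl()); // Optional.empty
        System.out.println(conn.principalForSasl()); // Optional[hbase/rs1.example.com@EXAMPLE.COM]
    }
}
```

Returning an empty result instead of a principal built from the IP lets the caller fail fast and retry (or, following the first idea, drop the connection) rather than send a request the KDC is guaranteed to reject with `LOOKING_UP_SERVER`.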