[ https://issues.apache.org/jira/browse/ZOOKEEPER-4275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
ASF GitHub Bot updated ZOOKEEPER-4275: -------------------------------------- Labels: pull-request-available (was: ) > Slowness in sasl login or subject.doAs() causes zk client to falsely assume > that the server did not respond, closes connection and goes to unnecessary > retries > -------------------------------------------------------------------------------------------------------------------------------------------------------------- > > Key: ZOOKEEPER-4275 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4275 > Project: ZooKeeper > Issue Type: Bug > Components: java client > Affects Versions: 3.5.9 > Reporter: Ravi Kishore Valeti > Priority: Minor > Labels: pull-request-available > Fix For: 3.5.10 > > Time Spent: 10m > Remaining Estimate: 0h > > Zookeeper client does sasl auth (login and subject.doAs())as a preset before > attempting a connection to server. > If there is a delay in sasl auth (possibly due to slow Kerberos > communication), ZK client falsely assumes that the zk server did not respond > and runs in to unnecessary multiple retries. > Client configuration: > "zookeeper.session.timeout" = "3000" > "zookeeper.recovery.retry" = "1" > "zookeeper.recovery.retry.intervalmill" = "500" > This configuration translates to > connect timeout as 1000ms > Read Timeout as 2000ms > Example: There was a 3 second delay in logging in the user as seen from the > logs below. The connection attempt was made later. However, zk client did not > wait for server response but logged a timeout (3 seconds > 1 sec connect > timeout), closed the connection and went to retries. Since there was a > consistent delay at Kerberos master, we had seen this retries go as long as > 10 mins causing requests to timeout/fail. > Logs: > 3/23/21 4:15:*32.389* AM jute.maxbuffer value is xxxxx Bytes > 3/23/21 4:15:*35.395* AM Client successfully logged in. > 3/23/21 4:15:35.396 AM TGT refresh sleeping until: Wed Mar 24 00:34:31 GMT > 2021 > 3/23/21 4:15:35.396 AM TGT refresh thread started. > 3/23/21 4:15:35.396 AM Client will use GSSAPI as SASL mechanism. > 3/23/21 4:15:35.396 AM TGT expires: xxx Mar xx 04:15:35 GMT > 2021 > 3/23/21 4:15:35.396 AM TGT valid starting at: xxx Mar xx 04:15:35 GMT > 2021 > 3/23/21 4:15:*35.397* AM *Opening socket connection* to server xxxxx:2181. > Will attempt to SASL-authenticate using Login Context section 'Client' > 3/23/21 4:15:*35.397* AM *Client session timed out, have not heard from > server in* *3008ms* for sessionid 0x0 > 3/23/21 4:15:35.397 AM Client session timed out, have not heard from server > in 3008ms for sessionid 0x0, closing socket connection and attempting > reconnect > 3/23/21 4:15:35.498 AM TGT renewal thread has been interrupted and will exit. > 3/23/21 4:15:38.503 AM Client successfully logged in. > 3/23/21 4:15:38.503 AM TGT expires: xxx Mar xx 04:15:38 GMT > 2021 > 3/23/21 4:15:38.503 AM Client will use GSSAPI as SASL mechanism. > 3/23/21 4:15:38.503 AM TGT valid starting at: xxx Mar xx 04:15:38 GMT > 2021 > 3/23/21 4:15:38.503 AM TGT refresh thread started. > 3/23/21 4:15:38.503 AM TGT refresh sleeping until: Wed Mar 24 00:10:10 GMT > 2021 > 3/23/21 4:15:38.506 AM Opening socket connection to server xxxxx:2181. Will > attempt to SASL-authenticate using Login Context section 'Client' > 3/23/21 4:15:38.506 AM Client session timed out, have not heard from server > in 3009ms for sessionid 0x0, closing socket connection and attempting > reconnect > 3/23/21 4:15:38.506 AM Client session timed out, have not heard from server > in 3009ms for sessionid 0x0 > 3/23/21 4:15:38.606 AM TGT renewal thread has been interrupted and will exit. > 3/23/21 4:15:41.610 AM Client successfully logged in. > 3/23/21 4:15:41.611 AM TGT refresh sleeping until: xxx Mar xx 23:42:03 GMT > 2021 > 3/23/21 4:15:41.611 AM Client will use GSSAPI as SASL mechanism. > 3/23/21 4:15:41.611 AM TGT valid starting at: xxx Mar xx 04:15:41 GMT > 2021 > 3/23/21 4:15:41.611 AM TGT expires: xxx Mar xx 04:15:41 GMT > 2021 > 3/23/21 4:15:41.611 AM TGT refresh thread started. > 3/23/21 4:15:41.612 AM Opening socket connection to server xxxxx:2181. Will > attempt to SASL-authenticate using Login Context section 'Client' > 3/23/21 4:15:41.613 AM Client session timed out, have not heard from server > in 3006ms for sessionid 0x0 > 3/23/21 4:15:41.613 AM Client session timed out, have not heard from server > in 3006ms for sessionid 0x0, closing socket connection and attempting > reconnect -- This message was sent by Atlassian Jira (v8.3.4#803005)