[
https://issues.apache.org/jira/browse/ZOOKEEPER-4946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17987313#comment-17987313
]
zhanglu153 commented on ZOOKEEPER-4946:
---------------------------------------
The org.apache.zookeeper.client.ZooKeeperSaslClient#shutdown method shuts down
the Login thread through the following shutdown method:
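{code:java}
public void shutdown() {
    if ((t != null) && (t.isAlive())) {
        // Sets the interrupt flag; it only wakes the thread if it is blocked in sleep/wait/join.
        t.interrupt();
        try {
            // Waits with no timeout for the Login thread to exit.
            t.join();
        } catch (InterruptedException e) {
            LOG.warn("error while waiting for Login thread to shutdown: " + e);
        }
    }
} {code}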
After t.interrupt() is called, the interrupt only takes effect if the Login
thread is blocked in Thread.sleep (or another interruptible call). If it is
executing other code, nothing checks the interrupt flag, so the thread stays
stuck in an endless loop. As a result, t.join() blocks the sendThread
indefinitely.
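A minimal, self-contained sketch of the underlying Java behaviour, assuming Java 9+ (for Thread.onSpinWait): Thread.interrupt() only sets the target's interrupt flag, so a thread that spins without sleeping or checking the flag keeps running. The class and method names (InterruptDemo, stillAliveAfterInterrupt) are hypothetical, and the spinning loop merely stands in for the Login thread's reLogin retry loop. The demo uses a bounded join(timeout) so that it terminates instead of hanging the way t.join() does.

{code:java}
import java.util.concurrent.atomic.AtomicBoolean;

public class InterruptDemo {

    /**
     * Starts a thread that busy-loops without sleeping or checking its interrupt
     * flag, interrupts it, and reports whether it is still alive after a bounded join.
     */
    public static boolean stillAliveAfterInterrupt() throws InterruptedException {
        AtomicBoolean stop = new AtomicBoolean(false);
        Thread worker = new Thread(() -> {
            // Hypothetical stand-in for a failing retry loop: no sleep(), no isInterrupted() check.
            while (!stop.get()) {
                Thread.onSpinWait();
            }
        }, "spinning-worker");
        worker.setDaemon(true); // never keep the JVM alive because of this demo thread
        worker.start();

        worker.interrupt();     // only sets the interrupt flag; nothing in the loop reads it
        worker.join(500);       // bounded wait; a plain join() here would block forever
        boolean alive = worker.isAlive();

        stop.set(true);         // cooperative stop so the demo cleans up after itself
        worker.join();
        return alive;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("alive after interrupt + join(500): " + stillAliveAfterInterrupt());
    }
}
{code}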
For example, when Kerberos is failing, every reLogin call throws a
LoginException. Once retry has dropped to 0, reLogin is called again and again,
with no sleep in between, until it succeeds. If t.interrupt() is called during
this period, the Login thread is never actually interrupted, and t.join()
blocks the sendThread.
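{code:java}
// Fragment of the Login thread's run() loop; the final break exits the enclosing loop.
try {
    int retry = 1;
    while (retry >= 0) {
        try {
            reLogin();
            break;
        } catch (LoginException le) {
            if (retry > 0) {
                --retry;
                // sleep for 10 seconds.
                try {
                    sleepBeforeRetryFailedRefresh();
                } catch (InterruptedException e) {
                    LOG.error("Interrupted during login retry after LoginException:", le);
                    throw le;
                }
            } else {
                // retry is already 0 and is never decremented again, so the while loop
                // calls reLogin() again immediately, without sleeping or checking the
                // interrupt flag, until reLogin() succeeds.
                LOG.error("Could not refresh TGT for principal: {}.", principal, le);
            }
        }
    }
} catch (LoginException le) {
    LOG.error("Failed to refresh TGT: refresh thread exiting now.", le);
    break;
}{code}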
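One possible way to make such a loop responsive to shutdown, sketched here as an assumption rather than the project's actual fix: check the interrupt flag on every iteration and always back off between attempts, so that t.interrupt() either wakes the thread from sleep or is seen at the top of the loop. The LoginAttempt interface, the refreshWithRetry method and the backoffMillis parameter are hypothetical stand-ins for reLogin() and sleepBeforeRetryFailedRefresh().

{code:java}
import javax.security.auth.login.LoginException;

public class InterruptibleRelogin {

    /** Hypothetical stand-in for Login#reLogin(). */
    @FunctionalInterface
    public interface LoginAttempt {
        void run() throws LoginException;
    }

    /**
     * Retries the login until it succeeds or the current thread is interrupted.
     * Returns true on success, false if the thread was interrupted first.
     */
    public static boolean refreshWithRetry(LoginAttempt reLogin, long backoffMillis) {
        while (!Thread.currentThread().isInterrupted()) {
            try {
                reLogin.run();
                return true;                            // refreshed successfully
            } catch (LoginException le) {
                try {
                    // Always back off, even after many failures, so the loop never
                    // spins and interrupt() can wake it from sleep().
                    Thread.sleep(backoffMillis);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt(); // preserve the flag for callers
                    return false;
                }
            }
        }
        return false;                                   // interrupted between attempts
    }

    public static void main(String[] args) throws InterruptedException {
        // A login that always fails, as during a Kerberos outage.
        LoginAttempt alwaysFails = () -> { throw new LoginException("KDC unreachable"); };
        Thread login = new Thread(() -> refreshWithRetry(alwaysFails, 100), "login-thread");
        login.start();
        Thread.sleep(250);
        login.interrupt();
        login.join(2000);                               // returns promptly instead of hanging
        System.out.println("login thread alive: " + login.isAlive());
    }
}
{code}

Unlike the original fragment, this sketch drops the retry counter and the logging; it simply keeps retrying with a fixed backoff until the login succeeds or the thread is interrupted.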
> Login thread failed to shutdown successfully, causing SendThread to be blocked
> ------------------------------------------------------------------------------
>
> Key: ZOOKEEPER-4946
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4946
> Project: ZooKeeper
> Issue Type: Bug
> Components: server
> Affects Versions: 3.5.10, 3.6.4, 3.7.2, 3.8.4, 3.9.3
> Reporter: zhanglu153
> Priority: Critical
>
> sendThread calls the org.apache.zookeeper.client.ZooKeeperSaslClient#shutdown
> method to close the zooKeeperSaslClient, which shuts down the Login thread.
> Although t.interrupt() is called, the run method of the Login thread does not
> detect the interrupt. For example, once the Login thread enters the reLogin
> retry loop, it may never be interrupted.
> As a result, t.join() stays blocked, so the sendThread is blocked as well and
> cannot finish. In older versions of ZooKeeper, such as 3.5 and 3.6, the blocked
> sendThread can prevent many pending ZooKeeper requests from being released,
> possibly leading to deadlocks. In newer versions, such as 3.8 and 3.9, the
> blocked sendThread threads are leaked instead.