[ 
https://issues.apache.org/jira/browse/CURATOR-205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kezhu Wang closed CURATOR-205.
------------------------------
    Resolution: Duplicate

> Repeated InterruptedExceptions during mutex acquire leads to LeaderSelector 
> deadlock
> ------------------------------------------------------------------------------------
>
>                 Key: CURATOR-205
>                 URL: https://issues.apache.org/jira/browse/CURATOR-205
>             Project: Apache Curator
>          Issue Type: Bug
>          Components: Recipes
>    Affects Versions: 2.8.0
>            Reporter: Stephen Ingram
>            Priority: Major
>
> When an InterruptedException is thrown in internalLockLoop, which is called 
> during mutex.acquire, internalLockLoop sets a "doDelete" flag that tells its 
> finally clause to delete the lock path that we were trying to create.
> However, in the pathInForeground function of DeleteBuilderImpl, a _second_ 
> InterruptedException may occur before zookeeper can delete the specified 
> path.  The RetryLoop machinery contained in the function will only retry if 
> it is a Retryable Exception, an equivalence class which does not include 
> InterruptedExceptions.  
> The second InterruptedException then causes pathInForeground to exit 
> without deleting the path, leading to a deadlock where no one can acquire 
> the mutex.
> In my test, I am certain that both of these InterruptedExceptions are due to 
> repeated fluctuation in the ConnectionStateManager's connection state.  When 
> the state ceases to fluctuate, no leader can be selected due to the 
> persistence of the node we failed to delete.
> I was able to address this bug with a solution similar to CURATOR-45:  if the 
> pathInForeground function is interrupted with an InterruptedException, I 
> schedule a BackgroundCallback to attempt pathInForeground again.  This task 
> is able to delete the path once the connection is stable, and the mutex is 
> then acquired by the new leader.
> I have a repro and a fix if needed.
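
The failure mode and the proposed fix described above can be illustrated with a minimal, self-contained sketch. It does not use Curator at all: FakeStore, naiveCleanup and deleteWithBackgroundRetry are hypothetical names invented for this illustration, and the 10 ms retry delay is an arbitrary assumption. It only shows how a cleanup delete that is interrupted and not retried leaves the node behind, and how rescheduling the delete in the background eventually removes it.

```java
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class InterruptedDeleteSketch {

    /** Hypothetical stand-in for the ZooKeeper node tree; not a Curator class. */
    static final class FakeStore {
        final Set<String> nodes = ConcurrentHashMap.newKeySet();
        final AtomicInteger interruptsLeft;

        FakeStore(int interrupts) {
            this.interruptsLeft = new AtomicInteger(interrupts);
        }

        /** Throws InterruptedException for the first `interrupts` calls, then deletes. */
        void delete(String path) throws InterruptedException {
            if (interruptsLeft.getAndDecrement() > 0) {
                throw new InterruptedException("simulated connection state change");
            }
            nodes.remove(path);
        }
    }

    /** Failure mode: the cleanup delete is interrupted and never retried. */
    static boolean naiveCleanup(FakeStore store, String path) {
        try {
            store.delete(path);
            return true;
        } catch (InterruptedException e) {
            // Real code would usually restore the interrupt flag here;
            // omitted so the demo can keep running on the same thread.
            return false; // the lock node is leaked
        }
    }

    /** Sketch of the fix: on interruption, schedule another delete attempt. */
    static void deleteWithBackgroundRetry(FakeStore store, String path,
                                          ScheduledExecutorService executor,
                                          CompletableFuture<Void> done) {
        try {
            store.delete(path);
            done.complete(null);
        } catch (InterruptedException e) {
            executor.schedule(
                () -> deleteWithBackgroundRetry(store, path, executor, done),
                10, TimeUnit.MILLISECONDS);
        }
    }

    public static void main(String[] args) throws Exception {
        FakeStore leaked = new FakeStore(1);
        leaked.nodes.add("/lock/node");
        naiveCleanup(leaked, "/lock/node");
        System.out.println("naive cleanup leaves node: "
            + leaked.nodes.contains("/lock/node"));

        FakeStore fixed = new FakeStore(2);
        fixed.nodes.add("/lock/node");
        ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
        CompletableFuture<Void> done = new CompletableFuture<>();
        deleteWithBackgroundRetry(fixed, "/lock/node", executor, done);
        done.get(5, TimeUnit.SECONDS); // wait until a retry succeeds
        executor.shutdown();
        System.out.println("retrying cleanup leaves node: "
            + fixed.nodes.contains("/lock/node"));
    }
}
```

In this sketch the first scenario leaves the node in place, which corresponds to the stuck lock path in the report, while the second keeps rescheduling until the simulated interruptions stop. Whether a real fix should retry indefinitely or give up after a bound is a design choice this sketch does not address.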



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
