[ https://issues.apache.org/jira/browse/CURATOR-518?focusedWorklogId=849096&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-849096 ]
ASF GitHub Bot logged work on CURATOR-518: ------------------------------------------ Author: ASF GitHub Bot Created on: 04/Mar/23 10:07 Start Date: 04/Mar/23 10:07 Worklog Time Spent: 10m Work Description: tisonkun commented on code in PR #446: URL: https://github.com/apache/curator/pull/446#discussion_r1125432578 ########## curator-recipes/src/main/java/org/apache/curator/framework/recipes/leader/LeaderSelector.java: ########## @@ -379,10 +423,31 @@ public boolean hasLeadership() */ public synchronized void interruptLeadership() { - Future<?> task = ourTask.get(); - if ( task != null ) + // Correctness with requeue: + // * requeue, interrupt and ourTask are guarded by synchronized(this), so requeue is not + // able to execute interleaving. + // * hasLeadership is true, so we are not cancelling before task execution. + // * auto-requeue is always executed before task termination and has no interruption point. + // * it is harmless to interrupt nearly completed task(e.g. lost or release leadership). + if ( ourTask != null && hasLeadership ) { - task.cancel(true); + ourTask.cancel(true); + } + } + + /** + * Cancel ongoing election and leadership. + * + * <p>Subsequent {@link #requeue()} will succeed if {@link #autoRequeue()} is not enabled. + */ + private synchronized void cancelElection() { + if ( ourTask != null ) { + ourTask.cancel(true); + isQueued = false; Review Comment: ```suggestion clearIsQueued(); ``` Keep consistent. ########## curator-recipes/src/main/java/org/apache/curator/framework/recipes/leader/LeaderSelector.java: ########## @@ -237,13 +242,15 @@ private synchronized boolean internalRequeue() if ( !isQueued && (state.get() == State.STARTED) ) { isQueued = true; - Future<Void> task = executorService.submit(new Callable<Void>() + Future<?> oldTask = ourTask; + ourTask = executorService.submit(new Callable<Void>() { @Override public Void call() throws Exception { try { + waitTaskDone(oldTask, Duration.ofMillis(500), Duration.ofSeconds(5)); Review Comment: It seems this is a best-effort waiting call and we don't have happens-before relation to ensure that the previous task must be done or cancelled. Filter only `hasLeadership` will proceed `interruptLeadership` seems enough for fixing 518? Issue Time Tracking ------------------- Worklog Id: (was: 849096) Time Spent: 0.5h (was: 20m) > Curator. LeaderSelector. Two successive calls to interruptLeadership() will > break autoRequeue. > ---------------------------------------------------------------------------------------------- > > Key: CURATOR-518 > URL: https://issues.apache.org/jira/browse/CURATOR-518 > Project: Apache Curator > Issue Type: Improvement > Components: Recipes > Affects Versions: 4.0.1, 4.2.0 > Environment: Windows 8, JRE 1.8.0_181 > Reporter: Bulatov Oleg > Priority: Major > Labels: newbie > Time Spent: 0.5h > Remaining Estimate: 0h > > h1. Curator. LeaderSelector. Two successive calls to interruptLeadership() > will break autoRequeue > If we set autoRequeue to TRUE. But during execution interruptLeadership() > will be called from another thread before internalRequeue() completed its > work. Then it will break recursive call to internalRequeue(), so that client > will not ask for leadership and get stuck. > We can solve this problem if we check hasLeadership() before calling > interruptLeadership(). But it is strange that such check curator library does > not do internally. -- This message was sent by Atlassian Jira (v8.20.10#820010)