Pierre Salagnac created SOLR-17107:
--------------------------------------

             Summary: Leader election is unpredictable if two threads concurrently join 
the election of the same replica
                 Key: SOLR-17107
                 URL: https://issues.apache.org/jira/browse/SOLR-17107
             Project: Solr
          Issue Type: Bug
      Security Level: Public (Default Security Level. Issues are Public)
          Components: SolrCloud
    Affects Versions: 9.3, 8.11
            Reporter: Pierre Salagnac


There is a race condition in leader election if two threads concurrently run the 
election for the same replica. This is not about how leader election is 
distributed across multiple Solr nodes, but about how multiple threads in a 
single Solr node conflict with each other.
 
Overall, when two threads (on the same server) concurrently join the leader 
election for the same replica, the outcome is unpredictable. It may end with two 
nodes each thinking they are the leader, or with no leader at all.
 
h2. How to reproduce


I identified two scenarios, though there may be more:
 
*1. ZooKeeper session expires while an election is already in progress.*
When we re-create the ZooKeeper session, we re-register all the cores and join 
elections for all of them. If an election is already in progress, or is 
triggered for any other reason, two threads on the same Solr node can end up 
running leader election for the same core.
 
*2. Command REJOINLEADERELECTION is received twice concurrently for the same 
core.*
This scenario is much easier to reproduce with an external client. It occurs 
for us because we have customizations that use this command.
 
h2. Full analysis

There are at least two issues in the current code.

*1. We blindly delete ZK nodes that were created by other threads*

Right after creating our ephemeral sequential ZK node to join the election 
queue, we check whether there are other ZK nodes for the same session ID (that 
is, the same Solr server). When such nodes are found, we simply delete them, 
but we don't stop the election for either thread. Both threads are then likely 
to think they won the election.

In addition, if two threads join the election concurrently, each of them may 
delete the other thread's sequential node. In the end, no node remains in the 
queue, so a node that joins the election later will not see that there may 
already be a leader.
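To make the interleaving concrete, here is a minimal, single-threaded Java sketch that replays one unlucky ordering of the two threads. It does not use Solr's {{LeaderElector}} or the ZooKeeper client: {{ElectionQueue}}, {{createSequentialNode}} and {{deleteOtherNodesOfSameSession}} are hypothetical stand-ins for the election queue and for the "delete other nodes of the same session, then carry on" step described above.

```java
import java.util.Map;
import java.util.TreeMap;

/** Replays the race on a hypothetical in-memory model of the election queue (not Solr or ZK code). */
public class ElectionRaceSketch {

    /** Ephemeral sequential nodes: sequence number -> owning session ID. */
    static final class ElectionQueue {
        private final TreeMap<Integer, String> nodes = new TreeMap<>();
        private int nextSeq = 0;

        int createSequentialNode(String sessionId) {
            int seq = nextSeq++;
            nodes.put(seq, sessionId);
            return seq;
        }

        /** The problematic step: remove every other node of the same session, then carry on. */
        void deleteOtherNodesOfSameSession(String sessionId, int mySeq) {
            nodes.entrySet().removeIf(e -> e.getValue().equals(sessionId) && e.getKey() != mySeq);
        }

        Map<Integer, String> snapshot() {
            return new TreeMap<>(nodes);
        }
    }

    /** One unlucky interleaving of two threads running on the same Solr node. */
    static Map<Integer, String> runInterleaving() {
        ElectionQueue queue = new ElectionQueue();
        String session = "session-1"; // hypothetical session ID shared by both threads
        int seqA = queue.createSequentialNode(session); // thread A joins
        int seqB = queue.createSequentialNode(session); // thread B joins before A checks
        queue.deleteOtherNodesOfSameSession(session, seqA); // A deletes B's node
        queue.deleteOtherNodesOfSameSession(session, seqB); // B deletes A's node
        // Neither thread aborted its election, yet nothing is left in the queue.
        return queue.snapshot();
    }

    public static void main(String[] args) {
        System.out.println("Queue after both checks: " + runInterleaving());
    }
}
```

Running it prints an empty queue, even though neither thread aborted its election.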

The fix for this issue would be for one of the two threads to abort the 
election without deleting the other thread's node.
Only the thread with the smallest sequence number in the queue should continue 
the election process.
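A minimal sketch of that rule, again on a hypothetical in-memory queue mapping sequence numbers to session IDs (not Solr's actual data structures or ZK calls): after creating its node, a thread looks up the smallest sequence number owned by its own session. It continues only if that node is its own; otherwise it aborts after deleting only its own node. The regular election among nodes of different sessions would then proceed as usual and is not modeled here.

```java
import java.util.Map;
import java.util.TreeMap;

/** Sketch of "smallest sequence number of the session wins"; the queue is a hypothetical stand-in. */
public class SmallestSeqWinsSketch {

    enum Decision { CONTINUE, ABORT }

    /**
     * @param queue     election queue: sequence number -> owning session ID
     * @param sessionId session ID shared by all threads of this Solr node
     * @param mySeq     sequence number of the node this thread just created
     */
    static Decision checkSameSession(TreeMap<Integer, String> queue, String sessionId, int mySeq) {
        Integer smallest = null;
        for (Map.Entry<Integer, String> e : queue.entrySet()) { // ascending sequence order
            if (e.getValue().equals(sessionId)) {
                smallest = e.getKey();
                break;
            }
        }
        if (smallest != null && smallest == mySeq) {
            return Decision.CONTINUE; // our node is the oldest one of this session
        }
        queue.remove(mySeq); // abort: delete our own node only, never another thread's
        return Decision.ABORT;
    }

    public static void main(String[] args) {
        TreeMap<Integer, String> queue = new TreeMap<>();
        queue.put(0, "session-1"); // created by thread A
        queue.put(1, "session-1"); // created by thread B
        System.out.println("B: " + checkSameSession(queue, "session-1", 1));
        System.out.println("A: " + checkSameSession(queue, "session-1", 0));
        System.out.println("Queue: " + queue);
    }
}
```

Whichever order the two threads run the check in, the thread holding sequence 0 continues, the other aborts, and exactly one node is left in the queue.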

*2. Mutability around {{LeaderElector}} and contexts*

Another issue is that any thread can change the context of {{LeaderElector}} 
instances. This can be done either by invoking {{setup()}} (mostly after ZK 
session expiration) or {{retryElection()}}.
When we change the context, the old one is closed, but we don't take into 
account the exact state of the election if another thread is currently joining 
with the old context.
I am not sure exactly what the fix for this would be.
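The hazard can be illustrated with a small, hypothetical Java model: {{Context}}, {{Elector}}, {{replaceContext}} and {{mayActAsLeader}} are invented for this sketch and are not Solr classes or methods. A joining thread keeps a reference to the context it started with; if another thread swaps the context in the meantime (as {{setup()}} or {{retryElection()}} do), holding the current context in an {{AtomicReference}} at least lets the first thread detect that its context is stale. This is one possible direction, not a proposed fix.

```java
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical model of a context being swapped under a thread that is still joining. */
public class ContextSwapSketch {

    /** Stand-in for an election context; not a Solr class. */
    static final class Context {
        final String name;
        volatile boolean closed;

        Context(String name) { this.name = name; }

        void close() { closed = true; }
    }

    /** Stand-in for an elector whose current context can be replaced by any thread. */
    static final class Elector {
        private final AtomicReference<Context> current;

        Elector(Context initial) { current = new AtomicReference<>(initial); }

        /** Like setup()/retryElection() in the description: install a new context, close the old one. */
        void replaceContext(Context next) {
            Context old = current.getAndSet(next);
            old.close();
        }

        /** A joining thread calls this before acting on its election result. */
        boolean mayActAsLeader(Context joinedWith) {
            return current.get() == joinedWith && !joinedWith.closed;
        }
    }

    public static void main(String[] args) {
        Context first = new Context("first");
        Elector elector = new Elector(first);
        Context joinedWith = first;                    // thread A starts joining with this context
        elector.replaceContext(new Context("second")); // thread B replaces it meanwhile
        System.out.println("Thread A may act as leader: " + elector.mayActAsLeader(joinedWith));
    }
}
```

Running it prints {{Thread A may act as leader: false}}. The check is still check-then-act: the context could be swapped right after {{mayActAsLeader}} returns, so a real fix would need the check and the leader actions to be serialized, for example under a common lock.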



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
