Patson Luk created SOLR-16257:
---------------------------------

             Summary: Possible Race condition in ZkStateReader between fields 
watchedCollectionStates and collectionWatches
                 Key: SOLR-16257
                 URL: https://issues.apache.org/jira/browse/SOLR-16257
             Project: Solr
          Issue Type: Bug
      Security Level: Public (Default Security Level. Issues are Public)
          Components: SolrCloud
    Affects Versions: 8.8, main (10.0)
            Reporter: Patson Luk


We have observed certain weird behavior with our Solr that the cluster state 
for certain collections seem to be "stuck" in certain stale states. While 
inspecting the current code logic, it's found that certain race condition can 
arise and lead to inconsistent states of `collectionWatches` and 
`watchedCollectionStates`

 

By looking at the current design, it appears that those 2 fields:
 * {{collectionWatches -}} for "watching" ZK updates on collections (via 
notification). Such map has the collection name (ie org) as the key and value 
is {{CollectionWatch}} which holds another set of {{{}DocCollectionWatcher{}}}s 
in this case (ie {{{}ConcurrentHashMap<String, 
CollectionWatch<DocCollectionWatcher>> collectionWatches{}}})
 * {{watchedCollectionStates}} - as {{{}ConcurrentHashMap<String, 
DocCollection>{}}}, which key is collection name and value is the 
{{DocCollection}} value triggered by previous watch event handled by 
{{ZkStateReader$StateWatcher}} (either by the fetch on first watcher 
registration or by notification from ZK on state changes)

On the design level, such 2 fields should be "in-sync" , ie if a collection is 
no longer tracked in {{collectionWatches}} then it should not have any entry in 
{{watchedCollectionStates}} neither.

 

However such guarantee is not a strong one as we can see there are various code 
pieces that try to remove entries from {{watchedCollectionStates}} if somehow 
the collection is no longer in {{collectionWatches}} for example in 
[here|https://github.com/fullstorydev/lucene-solr/blob/b8a2e1574d71094fcf81e6d693634b9ced8f4956/solr/solrj/src/java/org/apache/solr/common/cloud/ZkStateReader.java#L2137]
 , in particular this appears to address certain race condition with 
{{{}unregisterCore{}}}. The code that removes registered DocCollectionWatcher 
also tries to ensure both maps are in sync as in 
[here|https://github.com/apache/solr/blob/56842452026fb5f61fbe439a2f86adc08297c69c/solr/solrj/src/java/org/apache/solr/common/cloud/ZkStateReader.java#L1916]

*The core of the issue appears to be that there's an assumption when the last 
{{DocCollectionWatcher}} is removed from the {{{}CollectionWatch{}}}, both the 
{{watchedCollectionStates}} and {{collectionWatches}} should be purged of the 
watched collection.* Hence the {{clusterState}} should have the 
{{LazyCollectionRef}} instead, which {{DocCollection get(boolean allowCached)}} 
invocation should return the correct {{DocCollection}} state on 
{{{}allowCached=false{}}}. Unfortunately, there could still be possible race 
condition as far as there are 2 separate maps. One possible race condition is 
demonstrated in later section

h3. PR proposal
 An idea is to Merge {{DocCollectionWater}} into  {{CollectionWatch}} so we 
would not run into race condition that the collection key appears in one but 
not the other. Please see the PR here (link to be updated) and would love to 
gather some feedbacks and thoughts!


 
h3. Steps to reproduce a race condition
Spin up 2 nodes, only one node should serve the test collection, the other node 
should be made the overseer/leader of the cluster. Debug on the overseer node

 # From the IDE, ensure all breakpoints suspend "thread" only, not "all" 
(that's for intelliJ)
 # All breakpoints are in {{ZkStateReader}}. Set breakpoint at line 
[here|https://github.com/apache/solr/blob/56842452026fb5f61fbe439a2f86adc08297c69c/solr/solrj/src/java/org/apache/solr/common/cloud/ZkStateReader.java#L1965],
 put condition {{Thread.currentThread().getName().startsWith("zkCallback")}} . 
This pauses when zk notification comes in and right before it checks whether 
the collection is already in {{watchedCollectionStates}} in method 
{{updateWatchedCollection}}
 # Set breakpoint at 
[here|https://github.com/apache/solr/blob/56842452026fb5f61fbe439a2f86adc08297c69c/solr/solrj/src/java/org/apache/solr/common/cloud/ZkStateReader.java#L1829],
 which throws {{TimeoutException}} after the latch timeout, for example from 
call of  (inside {{{}ZkStateWatcher#waitForState{}}})
 # now start debugging the overseer node
 # Issue a split shard request to the overseer. For example 
http://localhost:8983/solr/admin/collections?action=SPLITSHARD&collection=test&shard=shard1
 # Eventually a thread will stop the first breakpoint within 
{{updateWatchedCollection}}, the thread name should be something like 
{{zkCallback-127-thread-2}}
 # Might have to wait for 320 secs (timeout on 
{{CollectionHandlingUtils.waitForNewShard}}), another thread should stop at 2nd 
breakpoint throwing {{TimeoutException}}, the thread name should be something 
similar to {{OverseerThreadFactory...}}
 # Add breakpoint at 
{{removeDocCollectionWatcher}}[here|https://github.com/apache/solr/blob/56842452026fb5f61fbe439a2f86adc08297c69c/solr/solrj/src/java/org/apache/solr/common/cloud/ZkStateReader.java#L1925],
 that removes the entry from {{watchedCollectionStates}} but not yet purge the 
{{collectionWatches}} 
 # Resume this {{OverseerThreadFactory}} thread that's going to throw 
{{TimeoutException}}, it should stop the new breakpoint in 
{{removeDocCollectionWatcher}}.
 # Switch back to thread {{zkCallback...}}, resume that thread, it should find 
that the {{watchedCollectionStates}} is empty, hence trying to insert a 
{{CollectionRef}} into it
 # Switch back to {{OverseerThreadFactory}} thread, step a few steps, it will 
purge the {{collectionWatches}} but the {{watchedCollectionStates}} will still 
have one entry in it.
 # If we inspect the {{zkStateReader.clusterState}} at this point, we will 
notice that the collection will have a non lazy CollectionRef for the test 
collection, while the {{collectionWatches}} is empty but 
{{watchedCollectionStates}} would still have the collection state in there

 



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org

Reply via email to