[ 
https://issues.apache.org/jira/browse/SOLR-16257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patson Luk updated SOLR-16257:
------------------------------
    Description: 
We have observed certain weird behavior with our Solr that the cluster state 
for certain collections seem to be "stuck" in certain stale states. While 
inspecting the current code logic, it's found that certain race condition can 
arise and lead to inconsistent states of `collectionWatches` and 
`watchedCollectionStates`

 

By looking at the current design, it appears that those 2 fields:
 * {{collectionWatches -}} for "watching" ZK updates on collections (via 
notification). Such map has the collection name (ie org) as the key and value 
is {{CollectionWatch}} which holds another set of {{{}DocCollectionWatcher{}}}s 
in this case (ie {{{}ConcurrentHashMap<String, 
CollectionWatch<DocCollectionWatcher>> collectionWatches{}}})
 * {{watchedCollectionStates}} - as {{{}ConcurrentHashMap<String, 
DocCollection>{}}}, which key is collection name and value is the 
{{DocCollection}} value triggered by previous watch event handled by 
{{ZkStateReader$StateWatcher}} (either by the fetch on first watcher 
registration or by notification from ZK on state changes)

On the design level, such 2 fields should be "in-sync" , ie if a collection is 
no longer tracked in {{collectionWatches}} then it should not have any entry in 
{{watchedCollectionStates}} neither.

 

However such guarantee is not a strong one as we can see there are various code 
pieces that try to remove entries from {{watchedCollectionStates}} if somehow 
the collection is no longer in {{collectionWatches}} for example in 
[here|https://github.com/fullstorydev/lucene-solr/blob/b8a2e1574d71094fcf81e6d693634b9ced8f4956/solr/solrj/src/java/org/apache/solr/common/cloud/ZkStateReader.java#L2137]
 , in particular this appears to address certain race condition with 
{{{}unregisterCore{}}}. The code that removes registered DocCollectionWatcher 
also tries to ensure both maps are in sync as in 
[here|https://github.com/apache/solr/blob/56842452026fb5f61fbe439a2f86adc08297c69c/solr/solrj/src/java/org/apache/solr/common/cloud/ZkStateReader.java#L1916]

*The core of the issue appears to be that there's an assumption when the last 
{{DocCollectionWatcher}} is removed from the {{{}CollectionWatch{}}}, both the 
{{watchedCollectionStates}} and {{collectionWatches}} should be purged of the 
watched collection.* Hence the {{clusterState}} should have the 
{{LazyCollectionRef}} instead, which {{DocCollection get(boolean allowCached)}} 
invocation should return the correct {{DocCollection}} state on 
{{{}allowCached=false{}}}. Unfortunately, there could still be possible race 
condition as far as there are 2 separate maps. One possible race condition is 
demonstrated in later section

h3. PR proposal
 An idea is to Merge {{DocCollectionWater}} into  {{CollectionWatch}} so we 
would not run into race condition that the collection key appears in one but 
not the other. Please see the PR here https://github.com/apache/solr/pull/909 
and would love to gather some feedbacks and thoughts!


 
h3. Steps to reproduce a race condition
Spin up 2 nodes, only one node should serve the test collection, the other node 
should be made the overseer/leader of the cluster. Debug on the overseer node

 # From the IDE, ensure all breakpoints suspend "thread" only, not "all" 
(that's for intelliJ)
 # All breakpoints are in {{ZkStateReader}}. Set breakpoint at line 
[here|https://github.com/apache/solr/blob/56842452026fb5f61fbe439a2f86adc08297c69c/solr/solrj/src/java/org/apache/solr/common/cloud/ZkStateReader.java#L1965],
 put condition {{Thread.currentThread().getName().startsWith("zkCallback")}} . 
This pauses when zk notification comes in and right before it checks whether 
the collection is already in {{watchedCollectionStates}} in method 
{{updateWatchedCollection}}
 # Set breakpoint at 
[here|https://github.com/apache/solr/blob/56842452026fb5f61fbe439a2f86adc08297c69c/solr/solrj/src/java/org/apache/solr/common/cloud/ZkStateReader.java#L1829],
 which throws {{TimeoutException}} after the latch timeout, for example from 
call of  (inside {{{}ZkStateWatcher#waitForState{}}})
 # now start debugging the overseer node
 # Issue a split shard request to the overseer. For example 
http://localhost:8983/solr/admin/collections?action=SPLITSHARD&collection=test&shard=shard1
 # Eventually a thread will stop the first breakpoint within 
{{updateWatchedCollection}}, the thread name should be something like 
{{zkCallback-127-thread-2}}
 # Might have to wait for 320 secs (timeout on 
{{CollectionHandlingUtils.waitForNewShard}}), another thread should stop at 2nd 
breakpoint throwing {{TimeoutException}}, the thread name should be something 
similar to {{OverseerThreadFactory...}}
 # Add breakpoint at 
{{removeDocCollectionWatcher}}[here|https://github.com/apache/solr/blob/56842452026fb5f61fbe439a2f86adc08297c69c/solr/solrj/src/java/org/apache/solr/common/cloud/ZkStateReader.java#L1925],
 that removes the entry from {{watchedCollectionStates}} but not yet purge the 
{{collectionWatches}} 
 # Resume this {{OverseerThreadFactory}} thread that's going to throw 
{{TimeoutException}}, it should stop the new breakpoint in 
{{removeDocCollectionWatcher}}.
 # Switch back to thread {{zkCallback...}}, resume that thread, it should find 
that the {{watchedCollectionStates}} is empty, hence trying to insert a 
{{CollectionRef}} into it
 # Switch back to {{OverseerThreadFactory}} thread, step a few steps, it will 
purge the {{collectionWatches}} but the {{watchedCollectionStates}} will still 
have one entry in it.
 # If we inspect the {{zkStateReader.clusterState}} at this point, we will 
notice that the collection will have a non lazy CollectionRef for the test 
collection, while the {{collectionWatches}} is empty but 
{{watchedCollectionStates}} would still have the collection state in there

 

  was:
We have observed certain weird behavior with our Solr that the cluster state 
for certain collections seem to be "stuck" in certain stale states. While 
inspecting the current code logic, it's found that certain race condition can 
arise and lead to inconsistent states of `collectionWatches` and 
`watchedCollectionStates`

 

By looking at the current design, it appears that those 2 fields:
 * {{collectionWatches -}} for "watching" ZK updates on collections (via 
notification). Such map has the collection name (ie org) as the key and value 
is {{CollectionWatch}} which holds another set of {{{}DocCollectionWatcher{}}}s 
in this case (ie {{{}ConcurrentHashMap<String, 
CollectionWatch<DocCollectionWatcher>> collectionWatches{}}})
 * {{watchedCollectionStates}} - as {{{}ConcurrentHashMap<String, 
DocCollection>{}}}, which key is collection name and value is the 
{{DocCollection}} value triggered by previous watch event handled by 
{{ZkStateReader$StateWatcher}} (either by the fetch on first watcher 
registration or by notification from ZK on state changes)

On the design level, such 2 fields should be "in-sync" , ie if a collection is 
no longer tracked in {{collectionWatches}} then it should not have any entry in 
{{watchedCollectionStates}} neither.

 

However such guarantee is not a strong one as we can see there are various code 
pieces that try to remove entries from {{watchedCollectionStates}} if somehow 
the collection is no longer in {{collectionWatches}} for example in 
[here|https://github.com/fullstorydev/lucene-solr/blob/b8a2e1574d71094fcf81e6d693634b9ced8f4956/solr/solrj/src/java/org/apache/solr/common/cloud/ZkStateReader.java#L2137]
 , in particular this appears to address certain race condition with 
{{{}unregisterCore{}}}. The code that removes registered DocCollectionWatcher 
also tries to ensure both maps are in sync as in 
[here|https://github.com/apache/solr/blob/56842452026fb5f61fbe439a2f86adc08297c69c/solr/solrj/src/java/org/apache/solr/common/cloud/ZkStateReader.java#L1916]

*The core of the issue appears to be that there's an assumption when the last 
{{DocCollectionWatcher}} is removed from the {{{}CollectionWatch{}}}, both the 
{{watchedCollectionStates}} and {{collectionWatches}} should be purged of the 
watched collection.* Hence the {{clusterState}} should have the 
{{LazyCollectionRef}} instead, which {{DocCollection get(boolean allowCached)}} 
invocation should return the correct {{DocCollection}} state on 
{{{}allowCached=false{}}}. Unfortunately, there could still be possible race 
condition as far as there are 2 separate maps. One possible race condition is 
demonstrated in later section

h3. PR proposal
 An idea is to Merge {{DocCollectionWater}} into  {{CollectionWatch}} so we 
would not run into race condition that the collection key appears in one but 
not the other. Please see the PR here (link to be updated) and would love to 
gather some feedbacks and thoughts!


 
h3. Steps to reproduce a race condition
Spin up 2 nodes, only one node should serve the test collection, the other node 
should be made the overseer/leader of the cluster. Debug on the overseer node

 # From the IDE, ensure all breakpoints suspend "thread" only, not "all" 
(that's for intelliJ)
 # All breakpoints are in {{ZkStateReader}}. Set breakpoint at line 
[here|https://github.com/apache/solr/blob/56842452026fb5f61fbe439a2f86adc08297c69c/solr/solrj/src/java/org/apache/solr/common/cloud/ZkStateReader.java#L1965],
 put condition {{Thread.currentThread().getName().startsWith("zkCallback")}} . 
This pauses when zk notification comes in and right before it checks whether 
the collection is already in {{watchedCollectionStates}} in method 
{{updateWatchedCollection}}
 # Set breakpoint at 
[here|https://github.com/apache/solr/blob/56842452026fb5f61fbe439a2f86adc08297c69c/solr/solrj/src/java/org/apache/solr/common/cloud/ZkStateReader.java#L1829],
 which throws {{TimeoutException}} after the latch timeout, for example from 
call of  (inside {{{}ZkStateWatcher#waitForState{}}})
 # now start debugging the overseer node
 # Issue a split shard request to the overseer. For example 
http://localhost:8983/solr/admin/collections?action=SPLITSHARD&collection=test&shard=shard1
 # Eventually a thread will stop the first breakpoint within 
{{updateWatchedCollection}}, the thread name should be something like 
{{zkCallback-127-thread-2}}
 # Might have to wait for 320 secs (timeout on 
{{CollectionHandlingUtils.waitForNewShard}}), another thread should stop at 2nd 
breakpoint throwing {{TimeoutException}}, the thread name should be something 
similar to {{OverseerThreadFactory...}}
 # Add breakpoint at 
{{removeDocCollectionWatcher}}[here|https://github.com/apache/solr/blob/56842452026fb5f61fbe439a2f86adc08297c69c/solr/solrj/src/java/org/apache/solr/common/cloud/ZkStateReader.java#L1925],
 that removes the entry from {{watchedCollectionStates}} but not yet purge the 
{{collectionWatches}} 
 # Resume this {{OverseerThreadFactory}} thread that's going to throw 
{{TimeoutException}}, it should stop the new breakpoint in 
{{removeDocCollectionWatcher}}.
 # Switch back to thread {{zkCallback...}}, resume that thread, it should find 
that the {{watchedCollectionStates}} is empty, hence trying to insert a 
{{CollectionRef}} into it
 # Switch back to {{OverseerThreadFactory}} thread, step a few steps, it will 
purge the {{collectionWatches}} but the {{watchedCollectionStates}} will still 
have one entry in it.
 # If we inspect the {{zkStateReader.clusterState}} at this point, we will 
notice that the collection will have a non lazy CollectionRef for the test 
collection, while the {{collectionWatches}} is empty but 
{{watchedCollectionStates}} would still have the collection state in there

 


> Possible Race condition in ZkStateReader between fields 
> watchedCollectionStates and collectionWatches
> -----------------------------------------------------------------------------------------------------
>
>                 Key: SOLR-16257
>                 URL: https://issues.apache.org/jira/browse/SOLR-16257
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: SolrCloud
>    Affects Versions: 8.8, main (10.0)
>            Reporter: Patson Luk
>            Priority: Major
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> We have observed certain weird behavior with our Solr that the cluster state 
> for certain collections seem to be "stuck" in certain stale states. While 
> inspecting the current code logic, it's found that certain race condition can 
> arise and lead to inconsistent states of `collectionWatches` and 
> `watchedCollectionStates`
>  
> By looking at the current design, it appears that those 2 fields:
>  * {{collectionWatches -}} for "watching" ZK updates on collections (via 
> notification). Such map has the collection name (ie org) as the key and value 
> is {{CollectionWatch}} which holds another set of 
> {{{}DocCollectionWatcher{}}}s in this case (ie {{{}ConcurrentHashMap<String, 
> CollectionWatch<DocCollectionWatcher>> collectionWatches{}}})
>  * {{watchedCollectionStates}} - as {{{}ConcurrentHashMap<String, 
> DocCollection>{}}}, which key is collection name and value is the 
> {{DocCollection}} value triggered by previous watch event handled by 
> {{ZkStateReader$StateWatcher}} (either by the fetch on first watcher 
> registration or by notification from ZK on state changes)
> On the design level, such 2 fields should be "in-sync" , ie if a collection 
> is no longer tracked in {{collectionWatches}} then it should not have any 
> entry in {{watchedCollectionStates}} neither.
>  
> However such guarantee is not a strong one as we can see there are various 
> code pieces that try to remove entries from {{watchedCollectionStates}} if 
> somehow the collection is no longer in {{collectionWatches}} for example in 
> [here|https://github.com/fullstorydev/lucene-solr/blob/b8a2e1574d71094fcf81e6d693634b9ced8f4956/solr/solrj/src/java/org/apache/solr/common/cloud/ZkStateReader.java#L2137]
>  , in particular this appears to address certain race condition with 
> {{{}unregisterCore{}}}. The code that removes registered DocCollectionWatcher 
> also tries to ensure both maps are in sync as in 
> [here|https://github.com/apache/solr/blob/56842452026fb5f61fbe439a2f86adc08297c69c/solr/solrj/src/java/org/apache/solr/common/cloud/ZkStateReader.java#L1916]
> *The core of the issue appears to be that there's an assumption when the last 
> {{DocCollectionWatcher}} is removed from the {{{}CollectionWatch{}}}, both 
> the {{watchedCollectionStates}} and {{collectionWatches}} should be purged of 
> the watched collection.* Hence the {{clusterState}} should have the 
> {{LazyCollectionRef}} instead, which {{DocCollection get(boolean 
> allowCached)}} invocation should return the correct {{DocCollection}} state 
> on {{{}allowCached=false{}}}. Unfortunately, there could still be possible 
> race condition as far as there are 2 separate maps. One possible race 
> condition is demonstrated in later section
> h3. PR proposal
>  An idea is to Merge {{DocCollectionWater}} into  {{CollectionWatch}} so we 
> would not run into race condition that the collection key appears in one but 
> not the other. Please see the PR here https://github.com/apache/solr/pull/909 
> and would love to gather some feedbacks and thoughts!
>  
> h3. Steps to reproduce a race condition
> Spin up 2 nodes, only one node should serve the test collection, the other 
> node should be made the overseer/leader of the cluster. Debug on the overseer 
> node
>  # From the IDE, ensure all breakpoints suspend "thread" only, not "all" 
> (that's for intelliJ)
>  # All breakpoints are in {{ZkStateReader}}. Set breakpoint at line 
> [here|https://github.com/apache/solr/blob/56842452026fb5f61fbe439a2f86adc08297c69c/solr/solrj/src/java/org/apache/solr/common/cloud/ZkStateReader.java#L1965],
>  put condition {{Thread.currentThread().getName().startsWith("zkCallback")}} 
> . This pauses when zk notification comes in and right before it checks 
> whether the collection is already in {{watchedCollectionStates}} in method 
> {{updateWatchedCollection}}
>  # Set breakpoint at 
> [here|https://github.com/apache/solr/blob/56842452026fb5f61fbe439a2f86adc08297c69c/solr/solrj/src/java/org/apache/solr/common/cloud/ZkStateReader.java#L1829],
>  which throws {{TimeoutException}} after the latch timeout, for example from 
> call of  (inside {{{}ZkStateWatcher#waitForState{}}})
>  # now start debugging the overseer node
>  # Issue a split shard request to the overseer. For example 
> http://localhost:8983/solr/admin/collections?action=SPLITSHARD&collection=test&shard=shard1
>  # Eventually a thread will stop the first breakpoint within 
> {{updateWatchedCollection}}, the thread name should be something like 
> {{zkCallback-127-thread-2}}
>  # Might have to wait for 320 secs (timeout on 
> {{CollectionHandlingUtils.waitForNewShard}}), another thread should stop at 
> 2nd breakpoint throwing {{TimeoutException}}, the thread name should be 
> something similar to {{OverseerThreadFactory...}}
>  # Add breakpoint at 
> {{removeDocCollectionWatcher}}[here|https://github.com/apache/solr/blob/56842452026fb5f61fbe439a2f86adc08297c69c/solr/solrj/src/java/org/apache/solr/common/cloud/ZkStateReader.java#L1925],
>  that removes the entry from {{watchedCollectionStates}} but not yet purge 
> the {{collectionWatches}} 
>  # Resume this {{OverseerThreadFactory}} thread that's going to throw 
> {{TimeoutException}}, it should stop the new breakpoint in 
> {{removeDocCollectionWatcher}}.
>  # Switch back to thread {{zkCallback...}}, resume that thread, it should 
> find that the {{watchedCollectionStates}} is empty, hence trying to insert a 
> {{CollectionRef}} into it
>  # Switch back to {{OverseerThreadFactory}} thread, step a few steps, it will 
> purge the {{collectionWatches}} but the {{watchedCollectionStates}} will 
> still have one entry in it.
>  # If we inspect the {{zkStateReader.clusterState}} at this point, we will 
> notice that the collection will have a non lazy CollectionRef for the test 
> collection, while the {{collectionWatches}} is empty but 
> {{watchedCollectionStates}} would still have the collection state in there
>  



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org

Reply via email to