[ 
https://issues.apache.org/jira/browse/SOLR-13490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16850933#comment-16850933
 ] 

Gus Heck commented on SOLR-13490:
---------------------------------

I tend to like #3 because it simplifies the API such that it's only watching 
one thing at a time which is simpler. Your hypothetical watcher sounds like it 
wants to watch two things (live nodes and state) and I like the solution to not 
conflate the two items. The watcher can watch one or the other and query when 
it sees a change, or it can watch both and keep a state machine if it prefers. 
The watching code will know which strategy is best/most efficient for whatever 
it does (or get it wrong, but not require changes to zkStateReader when that's 
discovered). Passing back the ZkStateReader is likely redundant for code that 
has set a watch (via a reference zkStateReader it already holds)


> waitForState/registerCollectionStateWatcher can see stale liveNodes data due 
> to (Zk) Watcher race condition
> -----------------------------------------------------------------------------------------------------------
>
>                 Key: SOLR-13490
>                 URL: https://issues.apache.org/jira/browse/SOLR-13490
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: Hoss Man
>            Priority: Major
>         Attachments: SOLR-13490.patch
>
>
> I was investigating some failures in 
> {{TestCloudSearcherWarming.testRepFactor1LeaderStartup}} which lead me to the 
> hunch that {{waitForState}} wasn't ensuring that the predicates registered 
> would always be called if/when a node was shutdown.
> Digging into it a bit more, I found that the root cause seems to be the way 
> the {{CollectionStateWatcher}} / {{CollectionStatePredicate}} APIs pass in 
> *both* the {{DocCollection}}, and the "current" {{liveNodes}} - but are only 
> _triggered_ by the {{StateWatcher}} on the {{state.json}} (which is used to 
> rebuild the {{DocCollection}}) - when the {{CollectionStateWatcher}} / 
> {{CollectionStatePredicate}} are called, they get the "fresh" 
> {{DocCollection}} but they get the _cached_ {{ZkStateReader.liveNodes}}
> Meanwhile, the {{LiveNodeWatcher}} only calls {{refreshLiveNodes()}} only 
> updates {{ZkStateReader.liveNodes}} and triggers any {{LiveNodesListener}} - 
> it does *NOT* invoke any {{CollectionStateWatcher}} that may have replicas 
> hosted on any of changed nodes.
> Since there is no garunteed order that Watchers will be triggered, this means 
> there is a race condition where the following can happen...
>  * client1 has a ZkStateReader with cached {{liveNodes=[N1, N2, N3]}}
>  * client1 registers a {{CollectionStateWatcher}} "watcherZ" that cares if 
> "replicaX" of collectionA is on a "down" node
>  * client2 causes shutdown of node N1 which is hosting replicaX
>  * client1's zkStateReader gets a WatchedEvent for state.json of collectionA
>  ** DocCollection for collectionA is rebuilt
>  ** watcherZ is fired w/cached {{liveNodes=[N1, N2, N3]}} and the new 
> DocCollection
>  *** watcherZ sees that replicaX is on N1, but thinks N1 is live
>  *** watcherZ says "everything ok, not the event i was waiting for" and 
> doesn't take any action
>  * client1's zkStateReader gets a WatchedEvent for LIVE_NODES_ZKNODE
>  ** zkStateReader.liveNodes is rebuilt
> ...at no point in this sequence (or after this) will watcherZ be notified 
> fired with the updated liveNodes (unless/until another {{state.json}} change 
> is made for collectionA.
> ----
> While this is definitely be problematic in _tests_ that deal with node 
> lifecyle and use things like {{SolrCloudTestCase.waitForState(..., 
> SolrCloudTestCase.clusterShape(...))}} to check for the expected 
> shards/replicas, a cursory search of how/where 
> {{ZkStateReader.waitForState(...)}} and 
> {{ZkStateReader.registerCollectionStateWatcher(...)}} are used in solr-core 
> suggests that could also lead to bad behavior in situations like reacting to 
> shard leader loss, waiting for all leaders of SYSTEM_COLL to come online for 
> upgrade, running PrepRecoveryOp, etc... (anywhere that liveNodes is used by 
> the watcher/predicate)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

Reply via email to