[ https://issues.apache.org/jira/browse/SOLR-14928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17250308#comment-17250308 ]
Ishan Chattopadhyaya commented on SOLR-14928:
---------------------------------------------

bq. Change cluster state updates so that each (Collection API) command execution does the update directly in Zookeeper using optimistic locking (Compare and Swap on the state.json Zookeeper files).

IIUC, the idea is for every node to do a compare-and-set (CAS) on the state.json to update the state of the replicas it has. This approach will result in a spinlock when lots of nodes that host the same collection recover at the same time. Imagine there's a collection with 2000+ replicas, scattered across many nodes. Restarting all those nodes will result in a lot of contention and failed updates during the CAS. This spinlock is extremely inefficient.

Here's a quick comparison that I performed for this approach vs. SOLR-15052:
https://github.com/chatman/experiments/blob/main/src/main/java/StateListVsCASSpinlock.java

{code}
Time to update (CAS): 94584.337722ms
Time to update (States List): 203.532139ms
{code}

^ This was as a result of 2048 shards, updated all at once (using multiple threads, trying to simulate the behaviour that will result in multiple nodes recovering at once).

Please let me know if I'm missing something.

> Remove Overseer ClusterStateUpdater
> -----------------------------------
>
>                 Key: SOLR-14928
>                 URL: https://issues.apache.org/jira/browse/SOLR-14928
>             Project: Solr
>          Issue Type: Sub-task
>      Security Level: Public(Default Security Level. Issues are Public)
>          Components: SolrCloud
>            Reporter: Ilan Ginzburg
>            Assignee: Ilan Ginzburg
>            Priority: Major
>              Labels: cluster, collection-api, overseer
>
> Remove the Overseer {{ClusterStateUpdater}} thread and associated Zookeeper
> queue at {{<_chroot_>/overseer/queue}}.
> Change cluster state updates so that each (Collection API) command execution
> does the update directly in Zookeeper using optimistic locking (Compare and
> Swap on the {{state.json}} Zookeeper files).
> Following this change cluster state updates would still be happening only
> from the Overseer node (that's where Collection API commands are executing),
> but the code will be ready for distribution once such commands can be
> executed by any node (other work done in the context of parent task
> SOLR-14927).
> See the [Cluster State
> Updater|https://docs.google.com/document/d/1u4QHsIHuIxlglIW6hekYlXGNOP0HjLGVX5N6inkj6Ok/edit#heading=h.ymtfm3p518c]
> section in the Removing Overseer doc.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
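
To make the contention concern raised in the comment above more concrete, here is a minimal in-memory Java sketch of a CAS retry loop. It is not the linked StateListVsCASSpinlock.java experiment and it does not talk to Zookeeper: an {{AtomicReference}} stands in for the {{state.json}} znode and its version, and the names {{CasContentionSketch}}, {{State}} and {{runCas}} are hypothetical, chosen only for illustration. Every writer must re-read and rewrite the whole shared state whenever another writer got in first, which is where the failed attempts come from.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical illustration only; not Solr code and not the linked experiment.
class CasContentionSketch {

    /** Immutable snapshot standing in for one version of state.json. */
    static final class State {
        final int version;
        final Map<String, String> replicaStates;

        State(int version, Map<String, String> replicaStates) {
            this.version = version;
            this.replicaStates = replicaStates;
        }
    }

    /**
     * Marks each replica "active" with an optimistic read-modify-write loop.
     * Returns {replicas recorded, final version, failed CAS attempts}.
     */
    static long[] runCas(int replicas, int threads) {
        AtomicReference<State> stateJson =
                new AtomicReference<>(new State(0, new HashMap<>()));
        AtomicLong failedAttempts = new AtomicLong();
        ExecutorService pool = Executors.newFixedThreadPool(threads);

        for (int i = 0; i < replicas; i++) {
            final String replica = "replica" + i;
            pool.execute(() -> {
                while (true) {
                    // Read the current snapshot (analogous to data + version).
                    State current = stateJson.get();
                    // Rebuild the whole document with this replica's change.
                    Map<String, String> copy = new HashMap<>(current.replicaStates);
                    copy.put(replica, "active");
                    State next = new State(current.version + 1, copy);
                    // The write succeeds only if nobody else wrote in between.
                    if (stateJson.compareAndSet(current, next)) {
                        return;
                    }
                    failedAttempts.incrementAndGet(); // lost the race, retry
                }
            });
        }

        pool.shutdown();
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("updates did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }

        State last = stateJson.get();
        return new long[] {last.replicaStates.size(), last.version, failedAttempts.get()};
    }

    public static void main(String[] args) {
        long[] r = runCas(2048, 16);
        System.out.println("replicas recorded=" + r[0]
                + " version=" + r[1] + " failed CAS attempts=" + r[2]);
    }
}
```

The number of failed attempts varies from run to run and depends on thread count and scheduling, so this sketch says nothing about the timings quoted in the comment. With a real Zookeeper each failed attempt would also cost a network round trip, which the in-memory stand-in does not model.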