I'm hearing two technical concerns with disabling the Overseer: (A) For many-replica collections, replica state changes don't scale well when a single state.json holds the state of every replica. SolrCloud already has a solution to _that_ problem: PRS (per-replica states). Salesforce's SolrCloud fork essentially removed replica state, aside from examining live nodes at runtime, so I don't think Salesforce sees this problem today either, right? (Not that it matters to the community, but I want to make sure I understand the scope of this problem.)
(B) For many-replica collections, creating, deleting, or moving many replicas at once doesn't scale well. The example given was deleting a collection, but I'm skeptical that's a good example, since my reading of its code is that it deletes replicas in sequence, not in parallel. But let's imagine it did, or let's imagine creating a many-replica collection. PRS doesn't address this, as it only separates out the replica's state enum, not the replica's very existence or location.
Near-term proposed solution: the "Cmd" implementations generally use a ShardHandler, and its concurrency could be capped to something reasonable. Other cluster-wide replica rebalancing would likewise need to be throttled to a manageable number of operations at once; you can't move all replicas at once (for big collections, anyway). Users have their own solutions for this, just not within SolrCloud's out-of-the-box code. I don't mean to push back against a redesign of cluster state, but that isn't short term, and I think there are reasonable short-term solutions.
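To make the throttling idea concrete, here is a minimal sketch of capping how many per-replica operations are in flight at once. It assumes a plain `java.util.concurrent.Semaphore` stands in for whatever limit a ShardHandler would actually get; `ThrottledReplicaOps`, `runAll`, and `maxConcurrent` are hypothetical names for illustration, not Solr APIs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

/**
 * Hypothetical sketch (not Solr's ShardHandler API): run a per-replica
 * operation (create/delete/move) for many replicas while never letting
 * more than maxConcurrent of them run at the same time.
 */
class ThrottledReplicaOps {
    private final Semaphore permits;
    private final AtomicInteger inFlight = new AtomicInteger();
    private final AtomicInteger peakInFlight = new AtomicInteger();

    ThrottledReplicaOps(int maxConcurrent) {
        this.permits = new Semaphore(maxConcurrent);
    }

    /** Applies op to every replica, blocking submission while the cap is reached. */
    void runAll(List<String> replicas, Consumer<String> op) {
        ExecutorService pool = Executors.newCachedThreadPool();
        try {
            List<Future<?>> futures = new ArrayList<>();
            for (String replica : replicas) {
                permits.acquire(); // wait here until a slot frees up
                futures.add(pool.submit(() -> {
                    int now = inFlight.incrementAndGet();
                    peakInFlight.accumulateAndGet(now, Math::max); // record the highest concurrency seen
                    try {
                        op.accept(replica);
                    } finally {
                        inFlight.decrementAndGet();
                        permits.release(); // let the next replica operation start
                    }
                }));
            }
            for (Future<?> f : futures) {
                f.get(); // wait for completion and surface any failure
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException(e);
        } catch (ExecutionException e) {
            throw new RuntimeException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    /** Highest number of operations observed running at once. */
    int peakInFlight() {
        return peakInFlight.get();
    }
}
```

For example, `new ThrottledReplicaOps(3).runAll(replicaNames, name -> deleteReplica(name))` (with `deleteReplica` a hypothetical call) would process the whole list but keep at most three deletions outstanding. The cap trades total wall-clock time for a bounded burst of cluster-state updates, which is the point of throttling here.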
