I'm hearing two technical concerns with disabling the Overseer:

(A) For many-replica collections, replica state changes don't scale well to
a single state.json that holds the state of all replicas.  SolrCloud has a
solution to _that_ problem today -- PRS.  Salesforce's SolrCloud fork
basically removed replica state altogether, apart from examining live
nodes at runtime, so I don't think Salesforce sees this problem today
either, right?  (Not that it matters to the community, but I want to ensure
I'm understanding the scope of this problem.)

(B) For many-replica collections, creation/deletion/moving of many replicas
at once doesn't scale well.  The example given was deleting a collection,
but I'm skeptical that's a good example, since my reading of its code is
that it deletes replicas in sequence, not in parallel.  But let's imagine it
did, or let's imagine many-replica collection creation.  PRS doesn't address
this, as it only separates out the replica's state enum, not the replica's
very existence or location.  Proposed near-term solution: the "Cmd"
implementations generally use a ShardHandler, and its concurrency could be
capped to something reasonable.  Other cluster replica rebalancing would
likewise need to be throttled to a manageable number of moves at once;
you can't move all replicas at once (for big collections, anyway).  Users
have solutions for this today, just not within SolrCloud's OOTB code.
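
To make the capping idea in (B) concrete, here is a minimal Java sketch.
It is an assumption-laden illustration, not Solr code: the class name
CappedReplicaDispatcher is hypothetical, each Runnable stands in for one
per-replica request, and it does not use the real ShardHandler API.  It
only shows the general technique of bounding in-flight requests with a
semaphore.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical helper (not part of Solr): caps how many replica requests run at once. */
public class CappedReplicaDispatcher {
  private final Semaphore permits;
  private final AtomicInteger inFlight = new AtomicInteger();
  private final AtomicInteger peakInFlight = new AtomicInteger();

  public CappedReplicaDispatcher(int maxInFlight) {
    if (maxInFlight < 1) {
      throw new IllegalArgumentException("maxInFlight must be >= 1");
    }
    this.permits = new Semaphore(maxInFlight);
  }

  /** Runs every request, waits for all of them, and returns the peak observed concurrency. */
  public int runAll(List<Runnable> replicaRequests) {
    ExecutorService pool = Executors.newCachedThreadPool();
    List<Future<?>> futures = new ArrayList<>();
    try {
      for (Runnable request : replicaRequests) {
        permits.acquire(); // the submitting thread blocks here once the cap is reached
        futures.add(pool.submit(() -> {
          try {
            int now = inFlight.incrementAndGet();
            peakInFlight.accumulateAndGet(now, Math::max);
            request.run();
          } finally {
            inFlight.decrementAndGet();
            permits.release(); // free the slot only after the request has finished
          }
        }));
      }
      for (Future<?> future : futures) {
        future.get(); // wait for completion and surface any failure
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("interrupted while dispatching", e);
    } catch (ExecutionException e) {
      throw new IllegalStateException("a replica request failed", e.getCause());
    } finally {
      pool.shutdown();
    }
    return peakInFlight.get();
  }
}
```

With a cap of, say, 10, creating a 1000-replica collection would still
issue 1000 requests, but never more than 10 concurrently.  The same shape
would apply to throttled rebalancing: queue the moves and let only a
bounded number proceed at a time.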

I don't mean to push against a redesign of cluster state, but that's not
something for the short term.  I think there are reasonable short-term
solutions.
