Hey Ilan, have you tried PRS in your implementation yet, and if so,
what has your experience been? I believe PRS currently publishes
DOWNNODE messages to the overseer, but the overseer treats them as
essentially a no-op, so they have very little impact. We are running a
cluster with many collections/shards, and PRS has been a huge
improvement for us in processing nodes going down/up.

The idea of ephemeral nodes seems interesting, but it may add some
risk around ZooKeeper session expiration and re-establishing replica
state.
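
To make that concern concrete, here is a toy sketch of the window I have
in mind. It is plain in-memory Python, not the ZooKeeper client API and
not Solr code; FakeZk, publish_all and the replica paths are made up for
illustration only.

```python
from dataclasses import dataclass, field

DOWN = "DOWN"


@dataclass
class FakeZk:
    """Toy in-memory stand-in for ZooKeeper; NOT the real client API."""
    # path -> (owning session id, state payload)
    ephemerals: dict = field(default_factory=dict)
    next_session: int = 1

    def open_session(self) -> int:
        sid = self.next_session
        self.next_session += 1
        return sid

    def put_ephemeral(self, sid: int, path: str, state: str) -> None:
        self.ephemerals[path] = (sid, state)

    def expire_session(self, sid: int) -> None:
        # ZooKeeper deletes every ephemeral node owned by a closed or
        # expired session; this models that behaviour.
        self.ephemerals = {
            p: v for p, v in self.ephemerals.items() if v[0] != sid
        }

    def replica_state(self, path: str) -> str:
        # Rule from the proposal: no node means the replica is DOWN.
        entry = self.ephemerals.get(path)
        return entry[1] if entry else DOWN


def publish_all(zk: FakeZk, sid: int, local_states: dict) -> None:
    """Republish every locally known replica state under a session."""
    for path, state in local_states.items():
        zk.put_ephemeral(sid, path, state)


zk = FakeZk()
# Hypothetical replica paths and states held by one node.
local = {
    "/coll1/shard1/replica_n1": "ACTIVE",
    "/coll1/shard2/replica_n3": "RECOVERING",
}

sid = zk.open_session()
publish_all(zk, sid, local)
before = {p: zk.replica_state(p) for p in local}

# Session expires while the node itself is still up and serving.
zk.expire_session(sid)
during = {p: zk.replica_state(p) for p in local}

# The node reconnects with a new session and must republish everything.
sid2 = zk.open_session()
publish_all(zk, sid2, local)
after = {p: zk.replica_state(p) for p in local}

print(before, during, after, sep="\n")
```

Between the expiration and the republish, every replica on a node that is
still up reads as DOWN, and the recovery cost is one write per replica on
that node. How much that matters in practice would depend on how often
sessions expire and how many replicas a node hosts.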

Justin

On Tue, Sep 26, 2023 at 6:14 PM Ilan Ginzburg <ilans...@gmail.com> wrote:
>
> *TL;DR; a way to track replica state using EPHEMERAL nodes that disappear
> automatically when a node goes down.*
>
> Hi,
>
> When running a cluster with many collections and replicas per node,
> processing of DOWNNODE messages takes more time.
> In a public cloud setup, the node that went down can come back quickly
> before that processing is finished. When that happens, replicas are marked
> DOWN by DOWNNODE while they are marked ACTIVE by the starting node, and
> depending on how the two operations interleave, some replicas then stay DOWN
> forever (that is, until the node is restarted).
> We had to put in place K8s init containers to add a delay before nodes
> restart. This delays rolling restarts, deployments and node crash recovery,
> so it is not a desirable long-term solution.
>
> What do you think of a change that avoids the need for a DOWNNODE message
> altogether:
> - Each replica state is captured as an *EPHEMERAL* node in Zookeeper
> - No such node implicitly means the replica state is DOWN
> - If the node is present, it contains an encoding of the actual state (DOWN,
> ACTIVE, RECOVERING, RECOVERY_FAILED)
> - When a node goes down (or when its ZK session expires) all its replica
> state nodes automatically vanish.
>
> This change is similar to the Per Replica State implementation (starting
> point
> <https://github.com/apache/solr/blob/main/solr/solrj-zookeeper/src/java/org/apache/solr/common/cloud/PerReplicaStatesOps.java#L99C17-L99C17>
> in the code) but different:
> - EPHEMERAL rather than PERSISTENT Zookeeper nodes
> - No duplicate replica state nodes (and no node version to pick the right
> one)
> - DOWNNODE not needed (if all collections are tracked in that way).
> - Need to republish all replica states after Zookeeper session expiration
> since they will disappear
>
> What do you think? Especially Noble and Ishan, the authors of PRS.
> I have no detailed design and no code, just sharing an idea to solve
> a real issue we're facing.
>
> Ilan
