Re: Failed to wait for initial partition map exchange

Alexey Goncharuk Thu, 14 Jul 2016 14:03:12 -0700

This is a cross-post from the user list.

We have run into this issue many times before, and many users have
complained about the whole cluster freezing. We can protect a cluster from
such a situation simply by dropping non-responsive nodes from the cluster.
Of course, we still need to get to the bottom of the root cause, and
killing nodes may cause some data loss in the cluster, but I think that is
better than restarting the whole cluster from scratch.


To summarize, I suggest that we 'kill' non-responsive nodes, i.e. remove
them from the topology, after some timeout in the exchange future.
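
To make the idea concrete, here is a rough sketch of the bookkeeping I have
in mind. It is not the actual exchange future code: the class name
ExchangeSketch, its methods, and the string node IDs are all hypothetical
and only illustrate "wait for every node, and after a timeout stop waiting
for the ones that never answered".

```java
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical sketch only; this is not Ignite's exchange future. */
class ExchangeSketch {
    /** Nodes whose partition map message we are still waiting for. */
    private final Set<String> pending;

    /** Point in time (ms) after which silent nodes are dropped. */
    private final long deadlineMs;

    ExchangeSketch(Set<String> nodeIds, long startMs, long timeoutMs) {
        this.pending = new LinkedHashSet<>(nodeIds);
        this.deadlineMs = startMs + timeoutMs;
    }

    /** Called when a node's message arrives. */
    void onResponse(String nodeId) {
        pending.remove(nodeId);
    }

    /** True once there is nobody left to wait for. */
    boolean isDone() {
        return pending.isEmpty();
    }

    /**
     * Before the deadline this returns an empty set. Once the deadline has
     * been reached it returns the nodes that never responded and stops
     * waiting for them, so the exchange can complete without them.
     */
    Set<String> dropUnresponsive(long nowMs) {
        if (nowMs < deadlineMs)
            return new LinkedHashSet<>();

        Set<String> dropped = new LinkedHashSet<>(pending);
        pending.clear();

        return dropped;
    }
}
```

The caller would then have to actually remove the returned nodes from the
topology; how that is done, and what the timeout should be, is exactly what
I would like to discuss.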

Thoughts?
