On Thu, Mar 6, 2014 at 6:30 AM, Ravikumar Govindarajan <[email protected]> wrote:
> I came to know about the zk.session.timeout variable just now, while
> reading more about this problem.
>
> This will only trigger a dead-node notification after the configured
> timeout is exceeded. Setting it to 3-4 mins should be fine for OOMs and
> rolling restarts.

Well, it works that way for OOMs and for when the process drops hard (think `kill -9`). However, when a shard server is shut down it currently ends its session in ZooKeeper, thus triggering a layout change.

> The only extra thing I am looking for is to divert search calls to a
> read-only shard instance during this 3-4 minute window, to avoid
> mini-outages.

Yes, and I think that the controllers will automatically spread the queries across those servers that are online. The BlurClient class already takes a list of connection strings and treats all connections as equals. For example, its current use is to provide the client with all of the controllers' connection strings. Internally, if any one of the controllers goes down or has a network issue, another controller is automatically retried without the user having to do anything. There is back-off, ping, and pooling logic in the BlurClientManager that the BlurClient utilizes.

Aaron

> --
> Ravi
>
> On Thu, Mar 6, 2014 at 3:34 PM, Ravikumar Govindarajan <
> [email protected]> wrote:
>
> > What do you think of giving some extra leeway for shard-server failover
> > cases?
> >
> > Ex: Whenever a shard-server process gets killed, the controller node
> > does not immediately update the layout, but rather marks it as a
> > suspect.
> >
> > When we have a read-only backup of the shard, searches can continue
> > unhindered. Indexing during this time can be diverted to a queue, which
> > will store and retry ops when the shard server comes online again.
> >
> > After a configured number of attempts (or amount of time), if the
> > shard server does not come up, then one controller server can
> > authoritatively mark it as down and update the layout.
> >
> > --
> > Ravi
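The retry behavior Aaron describes (a list of controller connection strings, all treated as equals, with the next one tried automatically on failure) can be sketched as below. This is an illustrative, self-contained sketch, not Blur's actual BlurClient/BlurClientManager code; the `FailoverClient` and `Call` names are hypothetical.

```java
import java.util.List;

/**
 * Hypothetical sketch of controller failover: try each connection in
 * order until one succeeds, mirroring the behavior described in the
 * thread. Real BlurClientManager also adds back-off, ping, and pooling.
 */
public class FailoverClient {

    /** A call that can be executed against one controller connection. */
    public interface Call<T> {
        T run(String connection) throws Exception;
    }

    /** Execute the call against the first healthy controller in the list. */
    public static <T> T execute(List<String> connections, Call<T> call) throws Exception {
        Exception last = null;
        for (String conn : connections) {   // all connections treated as equals
            try {
                return call.run(conn);      // first controller that answers wins
            } catch (Exception e) {
                last = e;                   // remember the failure, try the next one
            }
        }
        throw last;                         // every controller failed
    }
}
```

The point of the design is that the caller hands over the whole list once and never has to react to an individual controller going down.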
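Ravi's "suspect before dead" proposal can also be sketched: when a shard server's session drops, a controller marks it suspect and only declares it dead (and updates the layout) after a configured grace period, during which searches could continue against a read-only replica. This is a minimal sketch of the proposed idea, not anything implemented in Blur; the `SuspectTracker` name and its methods are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch of the proposed failover leeway: a shard server
 * whose session drops becomes a "suspect" and is only treated as dead
 * after graceMillis, giving it time to come back (e.g. rolling restart).
 */
public class SuspectTracker {
    private final long graceMillis;
    private final Map<String, Long> suspectSince = new HashMap<>();

    public SuspectTracker(long graceMillis) {
        this.graceMillis = graceMillis;
    }

    /** Session lost: start the grace clock instead of updating the layout. */
    public void markSuspect(String server, long nowMillis) {
        suspectSince.putIfAbsent(server, nowMillis);
    }

    /** Server re-registered within the grace period: clear the suspicion. */
    public void markAlive(String server) {
        suspectSince.remove(server);
    }

    /** True only once the grace period has elapsed; only now would a
     *  controller authoritatively mark the server down. */
    public boolean isDead(String server, long nowMillis) {
        Long since = suspectSince.get(server);
        return since != null && (nowMillis - since) >= graceMillis;
    }
}
```

During the suspect window, index operations would be queued and replayed once the server returns, as the quoted email suggests.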
