Re: Shard takeover behavior

Aaron McCurry Mon, 17 Feb 2014 06:17:28 -0800

On Mon, Feb 17, 2014 at 1:12 AM, Ravikumar Govindarajan <
[email protected]> wrote:

> Thanks Aaron for the quick reply...
>
> Is the shard-unavailability auto-detected by ZK? Then I obviously have to
> take into account GC pauses also, before configuring it.
>

Yes you do.  There is a sample GC config in the blur-env.sh that I would
recommend.  Although it is not directly related to GC pauses, Blur has a
GC back pressure feature.  If at any point after a GC has occurred more of
the heap is in use than is allowed (75% of the heap by default), a back
pressure event is triggered.  This cancels all current queries with a
BlurException of type BACK_PRESSURE, and the controller automatically
retries the queries that were cancelled.  In practice the end result is a
slowdown in the cluster: the query that would have caused the OOM will not
finish, but other queries get through the system in between back pressure
events.  This is meant to prevent a cascading failure of the cluster when
an OOM exception is caused by a query.
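The back pressure rule above (check heap after GC, cancel running queries past a threshold, let the controller retry) can be sketched roughly as follows. This is a hypothetical illustration, not Blur's code: `BackPressureMonitor`, `RunningQuery`, `Controller.executeWithRetry`, and `BackPressureException` (standing in for a BlurException of type BACK_PRESSURE) are invented names. In a real JVM the post-GC heap figures could be read from `MemoryPoolMXBean.getCollectionUsage()`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Supplier;

// Hypothetical stand-in for a BlurException of type BACK_PRESSURE.
class BackPressureException extends RuntimeException {
    BackPressureException() { super("BACK_PRESSURE"); }
}

// A query that can be cancelled while it runs.
class RunningQuery {
    private final AtomicBoolean cancelled = new AtomicBoolean(false);
    void cancel() { cancelled.set(true); }
    boolean isCancelled() { return cancelled.get(); }
}

// Shard-server side (hypothetical): after each GC, compare post-GC heap usage
// to the threshold and cancel every running query if it is exceeded.
class BackPressureMonitor {
    private final double threshold; // fraction of max heap, e.g. 0.75
    private final List<RunningQuery> running = new ArrayList<>();

    BackPressureMonitor(double threshold) { this.threshold = threshold; }

    synchronized void register(RunningQuery q) { running.add(q); }

    // Returns true if a back pressure event was triggered.
    synchronized boolean afterGc(long usedAfterGc, long maxHeap) {
        if ((double) usedAfterGc / maxHeap <= threshold) {
            return false; // heap is within limits, no back pressure event
        }
        for (RunningQuery q : running) {
            q.cancel(); // each cancelled query reports BACK_PRESSURE to the controller
        }
        running.clear();
        return true;
    }
}

// Controller side (hypothetical): retry queries that were cancelled by back pressure.
class Controller {
    static <T> T executeWithRetry(Supplier<T> query, int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        BackPressureException last = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            try {
                return query.get();
            } catch (BackPressureException e) {
                last = e; // cancelled by back pressure; try again
            }
        }
        throw last; // still being cancelled after maxAttempts
    }
}
```

The retry loop is deliberately bare; a production controller would probably also wait between attempts so the shard server has time to recover heap.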

>
> Serving same-shard from multiple servers is possible? Wow, that would be
> killer feature ...
>

Since every update to the index via Thrift mutates is committed and synced
to HDFS, it is possible to have other shards follow the lead shard (the
writer).  This won't help the write failover event, but it would allow
reads to continue uninterrupted during a failure.
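A rough sketch of the "follow the lead shard" idea: one writer commits every mutate to shared storage, and read-only followers reopen whenever a newer commit appears, so they keep serving the last committed view even if the writer goes away. Everything here is hypothetical: `CommitStore` is an in-memory stand-in for the shard's index on HDFS, and `LeadShard`/`FollowerShard` are invented names, not Blur classes.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for the shard's index on HDFS: each commit is an
// immutable snapshot, and readers only ever see committed generations.
class CommitStore {
    private final List<List<String>> generations = new ArrayList<>();

    synchronized void commit(List<String> docs) {
        generations.add(Collections.unmodifiableList(new ArrayList<>(docs)));
    }

    synchronized long latestGeneration() { return generations.size(); }

    synchronized List<String> snapshot(long generation) {
        return generations.get((int) generation - 1);
    }
}

// The lead shard is the only writer; every mutate is committed before returning.
class LeadShard {
    private final CommitStore store;
    private final List<String> docs = new ArrayList<>();

    LeadShard(CommitStore store) { this.store = store; }

    void mutate(String doc) {
        docs.add(doc);
        store.commit(docs);
    }
}

// A follower never writes; it reopens its view when a newer commit exists.
class FollowerShard {
    private final CommitStore store;
    private long openedGeneration = 0;
    private List<String> view = Collections.emptyList();

    FollowerShard(CommitStore store) { this.store = store; }

    // Returns true if a newer commit was found and the view was reopened.
    boolean refresh() {
        long latest = store.latestGeneration();
        if (latest <= openedGeneration) {
            return false;
        }
        view = store.snapshot(latest);
        openedGeneration = latest;
        return true;
    }

    boolean contains(String doc) { return view.contains(doc); }
}
```

If the lead shard fails, a follower simply stops seeing new generations and keeps answering reads from its last view; writes still have to wait for a new writer to take over.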

Aaron

>
>
>
> On Fri, Feb 14, 2014 at 9:44 PM, Aaron McCurry <[email protected]> wrote:
>
> > On Fri, Feb 14, 2014 at 10:11 AM, Ravikumar Govindarajan <
> > [email protected]> wrote:
> >
> > > I would like to know the internals of what happens during a layout
> > > change.
> > >
> > > Shards will get re-assigned to different shard-servers
> > >
> > > Taking the case of a single shard during re-assignment, will there be 2
> > > shard-servers serving the same shard for a brief period of time? Will there
> > > be a window, where shard is temporarily unavailable etc..
> > >
> >
> > Currently there will be a brief amount of time (a few seconds for most
> > medium to large indexes) where the shard is unavailable.  In the future we
> > will likely allow the same shard to be served from multiple servers at the
> > same time, but currently that is not the case.
> >
> > Aaron
> >
> >
> > >
> > > Any help is much appreciated
> > >
> > > --
> > > Ravi
> > >
> >
>
