Re: Shard takeover behavior

Aaron McCurry Wed, 05 Mar 2014 03:05:25 -0800

You are correct on all counts.


On Wed, Mar 5, 2014 at 2:07 AM, Ravikumar Govindarajan <
[email protected]> wrote:

> Aaron,
>
> I find that shard-servers are registered as ephemeral nodes in ZK.
>
> Does this mean that, when a shard-server process gets killed {OOM-killer
> etc...}, the layout gets re-built?
>

One more feature worth pointing out: Blur has a GC watcher that checks
the heap level after each GC. If the heap isn't below a configured level,
all the running queries get interrupted.  This exists to try to keep the
process from being OOM-killed.
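
To make the idea concrete, here is a minimal sketch of a heap check that
interrupts registered query threads. It is not Blur's actual GC watcher:
the class name, the ratio parameter, and the register/unregister methods
are all hypothetical, and the wiring that calls checkAfterGc() after each
collection (for example from a GC notification listener) is left out.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch only; not taken from the Blur code base. */
final class HeapPressureWatcher {
    // Assumed configuration value: fraction of max heap that may stay in use after a GC.
    private final double maxUsedRatio;
    // Threads currently executing queries; each query registers itself while it runs.
    private final Set<Thread> runningQueries = ConcurrentHashMap.newKeySet();

    HeapPressureWatcher(double maxUsedRatio) {
        this.maxUsedRatio = maxUsedRatio;
    }

    void register(Thread queryThread) {
        runningQueries.add(queryThread);
    }

    void unregister(Thread queryThread) {
        runningQueries.remove(queryThread);
    }

    /** True when the used fraction of the heap is above the configured ratio. */
    static boolean overLimit(long usedBytes, long maxBytes, double ratio) {
        // maxBytes can be -1 when the JVM reports no defined maximum.
        return maxBytes > 0 && (double) usedBytes / maxBytes > ratio;
    }

    /** Meant to be called after a GC; returns how many query threads were interrupted. */
    int checkAfterGc(long usedBytes, long maxBytes) {
        if (!overLimit(usedBytes, maxBytes, maxUsedRatio)) {
            return 0;
        }
        int interrupted = 0;
        for (Thread t : runningQueries) {
            t.interrupt();
            interrupted++;
        }
        return interrupted;
    }

    /** Convenience overload that reads the current heap usage from the JVM. */
    int checkAfterGc() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        return checkAfterGc(heap.getUsed(), heap.getMax());
    }
}
```

A query would call register() before it starts and unregister() in a
finally block, and it has to check its interrupt flag for the
interruption to take effect.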


>
> Rolling-upgrade or restarts will be performed in shard-servers one by one.
> Will the layout get re-built here also? [In this case, layout calculation
> is performed multiple times in a very short span of time]
>

Depending on the size of your cluster, both in servers and in data, you
might want to kill and restart more than one shard server at a time.  In
the future we would like to allow indexes to be migrated to other servers
before a node is killed, which would allow for a smoother transition.
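
To illustrate why each server that drops out of ZK triggers a fresh layout
calculation, here is a minimal sketch that derives a shard-to-server
assignment purely from the set of live servers. It assumes a naive
round-robin over sorted names; Blur's real layout strategy may differ (a
production strategy would normally try to move as few shards as possible),
and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch only; not Blur's layout implementation. */
final class ShardLayoutSketch {
    private ShardLayoutSketch() {}

    /**
     * Assigns each shard to one live server, round-robin over sorted names.
     * Sorting makes the result depend only on the inputs, so every node that
     * sees the same live-server list computes the same layout.
     */
    static Map<String, String> layout(Collection<String> shards, Collection<String> liveServers) {
        List<String> servers = new ArrayList<>(liveServers);
        List<String> sortedShards = new ArrayList<>(shards);
        Collections.sort(servers);
        Collections.sort(sortedShards);

        Map<String, String> assignment = new TreeMap<>();
        if (servers.isEmpty()) {
            return assignment; // nothing can be served without live servers
        }
        for (int i = 0; i < sortedShards.size(); i++) {
            assignment.put(sortedShards.get(i), servers.get(i % servers.size()));
        }
        return assignment;
    }
}
```

With two live servers, four shards alternate between them; once one
server's entry disappears, recomputing with the remaining server assigns
all four shards to it. During a one-by-one rolling restart this
calculation would run once per departure and once per return.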

Aaron


>
> --
> Ravi
>
>
> On Mon, Feb 17, 2014 at 9:20 PM, Ravikumar Govindarajan <
> [email protected]> wrote:
>
> >> This will cancel all current queries with a
> >> BlurException of type BACK_PRESSURE, the controller will automatically
> >> retry the queries that were cancelled
> >
> >
> > This is a really nice feature and most handy in production too, as a
> > protection against both intentional and unintentional bad queries
> >
> >
> >> Since every update to the index via thrift mutates is committed and
> >> synced to HDFS, it is possible to have other shards follow the lead
> >> shard (the writer)
> >
> > I think I am back to the data-locality we previously discussed with your
> > "lead shard" remark. It is now possible to pin data-nodes to a file
> > during writes in Hadoop. https://issues.apache.org/jira/browse/HDFS-2576
> >
> > We can always pin a shard to 3 shard-servers [assuming shard-servers
> > run on datanodes] and persist this in ZK. Failures can be easily handled
> > using this info
> >
> > The big advantage of this method is that, even in case of failures,
> > short-circuit reads are always utilized. I know you are not a big fan of
> > it, but is it still worthwhile to pursue?
> >
> > One downside to this approach is that cluster balance might get skewed
> > and would need to be properly handled in HDFS
> >
> > --
> > Ravi
> >
>
