Re: moving & pending ranges & repair

Jaakko Sun, 22 Nov 2009 19:47:11 -0800

> We sleep long enough when gossiping pending ranges before starting to
> move data that we're safe from micropartitions.


Hmm... I don't think we sleep at all at that time. Once we get load
info, we gossip pending ranges and start to move data immediately. In
most cases gossip is slow enough for other nodes to see
'bootstrapping' and 'normal' states simultaneously, so pending ranges
exist only for the duration of handling gossip state information.


> adding an explicit
> check for the coordinating [moving, in our case] node to ask the
> other nodes "do you have the pending ranges for this move" before
> proceeding would be nice to foolproof things.  But if you're going to
> do that then using gossip for the move at all is silly.

IMHO coordinating a move and gossiping a state are not redundant
operations, but serve different purposes. The former is for making sure
that the move does not break things (all nodes *affected by range
changes* stay put for the duration of the maneuvering), and the latter
is for letting *all* cluster nodes know where to direct data in
case there is a write during the move. Now it might of course be that
gossip is enough, but I think we'll need some level of coordination
by the time we're doing automated load balancing at the latest. Without
coordination, the cluster might easily end up with unnecessary movement.
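
To make the coordination idea concrete, here is a rough sketch of the
"do you have the pending ranges for this move" check that the moving
node could run before streaming any data. Everything in it (Node,
has_pending_range, can_start_move, the tuple used as a token range) is
a made-up name for illustration, not existing Cassandra code, and a
real check would go over the network rather than a local method call.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical cluster node; tracks pending ranges it learned via gossip."""
    name: str
    pending_ranges: set = field(default_factory=set)

    def has_pending_range(self, token_range):
        # Stand-in for a remote "do you have this pending range?" request.
        return token_range in self.pending_ranges


def nodes_missing_range(affected_nodes, token_range):
    """Names of affected nodes that have not yet seen the pending range."""
    return [n.name for n in affected_nodes
            if not n.has_pending_range(token_range)]


def can_start_move(affected_nodes, token_range):
    """The moving node proceeds only if every affected node knows the range."""
    return not nodes_missing_range(affected_nodes, token_range)


if __name__ == "__main__":
    move = (100, 200)          # assumed (start_token, end_token) pair
    a = Node("a", {move})      # gossip has already reached a
    b = Node("b")              # gossip has not reached b yet
    print(can_start_move([a, b], move))
    b.pending_ranges.add(move)  # gossip catches up
    print(can_start_move([a, b], move))
```

Note that only the nodes affected by the range change are asked; the
rest of the cluster still learns about the move through gossip alone.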

-Jaakko
