> We sleep long enough when gossiping pending ranges before starting to
> move data that we're safe from micropartitions.
Hmm... I don't think we sleep at all at that time. Once we get load info, we gossip pending ranges and start moving data immediately. In most cases gossip is slow enough that other nodes see the 'bootstrapping' and 'normal' states simultaneously, so pending ranges exist only for the duration of handling gossip state information.

> adding an explicit
> check for the coordinating [moving, in our case] node to ask the
> other nodes "do you have the pending ranges for this move" before
> proceeding would be nice to foolproof things. But if you're going to
> do that then using gossip for the move at all is silly.

IMHO coordinating a move and gossiping a state are not redundant operations; they serve different purposes. The former makes sure that the move does not break things (all nodes *affected by range changes* stay put for the duration of the maneuvering), and the latter lets *all* cluster nodes know where to direct data in case there is a write during the move.

Now it might of course be that gossip is enough, but I think we'll need some level of coordination at the latest when we're doing automated load balancing. Without coordination, the cluster could easily end up moving data unnecessarily.

-Jaakko