On Wed, Sep 5, 2012 at 12:38 PM, Jonathan Hsieh <[email protected]> wrote:
> I generally think in pictures, so I've mapped out the single Assignment
> control flow as found in trunk yesterday in terms of threads and network
> communications (each of which can possibly fail). It is a process that has
> 18 or so network communications, 3 processes, and about 8 threads
> coordinating (excluding meta writes)
>
Did you attach your picture, Jon?

> We've also talked about defining design and code invariants -- here's the
> one that I've gotten so far: (We can pull up more from discussion)
>
> * ZK state should transient (treat it like memory). If deleted, hbase should
> be able to recover and essentially be in the same state (a few exceptions --
> enabled/disable state)
>

Yes. We should post these invariants somewhere? In dev section of refguide?

> A few questions I have from this exercise:
>
> 1) Why do we have ZK asynchronously update the HM? (why not do it
> synchronously?)

IIRC, it was faster.

> 2) Why do we have the RS update ZK as it opens -- why not have the HM manage
> all ZK comms and not have the RS talk directly to ZK in this process? Then
> ZK is just for failover and less so for coordination.

IIRC, the notion was that we could keep an eye on the regionserver's
progress opening a region. An RS could take a long time opening, and as
long as it was tickling zk by resetting state, the master would not take
control of the region away from the RS. Inversely, if the RS froze
mid-open, it'd know it had lost control if, when it tried to set state,
the sequence id had moved on from what it thought it was.

> 3) Clients who issue assign calls are partially asynchronous and partially
> synchronous. Why not go all the way?

No reason. The thought was async meant less friction. The work was just
never done to async it all.

> 4) Why are there multiple error conventions -- abort, FAILED_OPEN, throwing
> exception, (and cases where we "return" silently without notification)?

I would have to look at the particular instance, but high level I'd say
it's a case of:

1. On the one hand, your classic myopic patch-centric view
2. While on the other, you can't throw an exception out to the master if
   the rpc open has been successfully handed off and the rpc has
   completed... there needs to be another means of flagging the error.

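To make the answer to 2) above concrete, here is a minimal sketch of the
"sequence id moved on" check. It is not the HBase code: VersionedNode and
AssignSketch are hypothetical names, and the znode is simulated in memory
so the example runs on its own. The real ZooKeeper client behaves
similarly, in that setData(path, data, version) fails with
KeeperException.BadVersionException when the expected version is stale.

```java
/** Hypothetical in-memory stand-in for a znode: data plus a version bumped on every write. */
final class VersionedNode {
    /** Immutable snapshot of the node's data and version. */
    static final class Stat {
        final String data;
        final int version;
        Stat(String data, int version) { this.data = data; this.version = version; }
    }

    /** Thrown when the caller's expected version no longer matches the node. */
    static final class BadVersionException extends Exception {
        BadVersionException(String msg) { super(msg); }
    }

    private Stat current;

    VersionedNode(String initialData) { this.current = new Stat(initialData, 0); }

    synchronized Stat read() { return current; }

    /** Writes only if nobody has written since expectedVersion; returns the new version. */
    synchronized int setData(String data, int expectedVersion) throws BadVersionException {
        if (current.version != expectedVersion) {
            throw new BadVersionException(
                "expected version " + expectedVersion + " but node is at " + current.version);
        }
        current = new Stat(data, current.version + 1);
        return current.version;
    }
}

final class AssignSketch {
    public static void main(String[] args) throws Exception {
        VersionedNode znode = new VersionedNode("OPENING");

        // RS "tickles" the node while it is still opening; each write bumps the version.
        int rsVersion = znode.read().version;
        rsVersion = znode.setData("OPENING", rsVersion);

        // RS freezes; the master gives up on it and rewrites the node.
        znode.setData("OFFLINE", znode.read().version);

        // RS wakes up and tries to finish with its stale version.
        try {
            znode.setData("OPENED", rsVersion);
        } catch (VersionedNode.BadVersionException e) {
            System.out.println("RS lost the region: " + e.getMessage());
        }
    }
}
```

The point of the sketch is only that the RS never has to be told it lost
the region; its own next write tells it.
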
> 5) How do we handle timeout situations -- IMO it makes sense to have a
> rollback or fail forward policy for different places on the timeline.

Yes. There are a couple of flavors of this in the code base at present.
Could do w/ a revisit for sure.

> 6) Can we use cancellation instead of checking for
> enabling/disabled/disabling/shutdown/stopping all over the place? (let's say
> these cluster ops would cancel the assign and then win by blocking assigns).

The enabling, etc., checks are done on assign to make sure we don't go
ahead if table state has changed since the order to assign was given. To
me, cancel seems like something else: the open or close has gone out
already and we want to stop it happening. They seem like different things
to me.

> 7) In memory state has different but similarly named states in the HM, ZK,
> and in the RS's. And there are the transition events could be missed.

Yes. This is a problem. My peeve is that we cannot trust what RegionState
says, and even if we could, its states are not 'clean'; e.g. OFFLINE is
both the BEGIN state for opening a region and a catchall parking state
that we put regions into when not sure what else to do w/ them.

> 8) Is having multiple processes "responsible for acting" necessary? (why
> not have the HM open and then update meta)?
>

It could be good having the master do all meta edits. Would be good to see
what advantage it would bring us before going about making the change.

I can provide more history and provenance if needed, np.

St.Ack
