On Wed, Sep 5, 2012 at 12:38 PM, Jonathan Hsieh <[email protected]> wrote:
> I generally think in pictures, so I've mapped out the single Assignment
> control flow as found in trunk yesterday in terms of threads and network
> communications (each of which can possibly fail).  It is a process that has
> 18 or so network communications, 3 processes, and about 8 threads
> coordinating (excluding meta writes)
>

Did you attach your picture, Jon?


> We've also talked about defining design and code invariants -- here's the
> one that I've gotten so far:  (We can pull up more from discussion)
>
> * ZK state should be transient (treat it like memory). If deleted, hbase should
> be able to recover and essentially be in the same state (a few exceptions --
> enabled/disable state)
>

Yes.

Should we post these invariants somewhere?  In the dev section of the refguide?


> A few questions I have from this exercise:
>
> 1) Why do we have ZK asynchronously update the HM?  (why not do it
> synchronously?)

IIRC, it was faster.


> 2) Why do we have the RS update ZK as it opens -- why not have the HM manage
> all ZK comms and not have the RS talk directly to ZK in this process?  Then
> ZK is just for failover and less so for coordination.

IIRC, the notion was that we could keep an eye on the regionserver's
progress opening a region.  The RS could take a long time opening and,
as long as it was tickling zk by resetting state, the master would not
take control of the region away from the RS.  Conversely, if the RS
froze mid-open, it would learn it had lost control when it next tried
to set state and found the sequence id had moved on from what it
thought it was.
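
To make the idea concrete, here is a minimal sketch of that kind of
versioned "tickle".  It is not the HBase or ZooKeeper code: VersionedNode
is a hypothetical in-memory stand-in for a znode, its version plays the
role of the "sequence id", and the state names are illustrative only.

```java
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical in-memory stand-in for a versioned znode; not the ZooKeeper API. */
final class VersionedNode {
    /** Immutable snapshot: the state plus a version that bumps on every write. */
    static final class Snapshot {
        final String state;
        final int version;
        Snapshot(String state, int version) {
            this.state = state;
            this.version = version;
        }
    }

    private final AtomicReference<Snapshot> current;

    VersionedNode(String initialState) {
        current = new AtomicReference<>(new Snapshot(initialState, 0));
    }

    Snapshot read() {
        return current.get();
    }

    /**
     * Writes newState only if the node is still at expectedVersion.
     * Returns the new version, or -1 if someone else has written since.
     */
    int setIfVersion(String newState, int expectedVersion) {
        Snapshot seen = current.get();
        if (seen.version != expectedVersion) {
            return -1;
        }
        Snapshot next = new Snapshot(newState, expectedVersion + 1);
        return current.compareAndSet(seen, next) ? next.version : -1;
    }
}

final class OpenRegionDemo {
    public static void main(String[] args) {
        VersionedNode node = new VersionedNode("OPENING");

        // RS "tickles" the node: same state, but the version advances.
        int rsVersion = node.setIfVersion("OPENING", node.read().version);

        // Assumed scenario: the master gives up on the RS and takes the node back.
        node.setIfVersion("OFFLINE", node.read().version);

        // RS wakes up and tries to finish with the version it last saw.
        int result = node.setIfVersion("OPENED", rsVersion);
        System.out.println(result == -1
                ? "RS lost control of the region"
                : "RS still owns the region");
    }
}
```

The point of the conditional write is that a frozen RS cannot clobber
whatever the master did in the meantime; its stale version makes the
write fail and tells it to back off.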

> 3) Clients who issue assign calls are partially asynchronous and partially
> synchronous.  Why not go all the way?

No reason.  The thought was async meant less friction.  The work was
just never done to async it all.

> 4) Why are there multiple error conventions -- abort, FAILED_OPEN, throwing
> exception, (and cases where we "return" silently without notification)?

I would have to look at the particular instance but at a high level
I'd say it's a case of:

1. On the one hand, your classic myopic patch-centric view
2. While on the other, you can't throw an exception out to the master
if the rpc open has been successfully handed off and the rpc has
completed... there needs to be another means of flagging the error.
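
A rough sketch of point 2, under stated assumptions: AsyncOpenSketch and
its regionStatus map are hypothetical stand-ins, not HBase classes.  The
"rpc" returns once the open is queued, so a later failure can only be
reported out of band, here by recording FAILED_OPEN.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

final class AsyncOpenSketch {
    /** Hypothetical status board standing in for whatever the master watches. */
    final Map<String, String> regionStatus = new ConcurrentHashMap<>();
    private final ExecutorService openers = Executors.newSingleThreadExecutor();

    /** The "rpc": returns as soon as the open is queued, before the work runs. */
    void requestOpen(String region, Runnable openWork) {
        regionStatus.put(region, "OPENING");
        openers.submit(() -> {
            try {
                openWork.run();
                regionStatus.put(region, "OPENED");
            } catch (RuntimeException e) {
                // The caller returned long ago, so there is nobody to throw to;
                // flag the failure where the master can see it instead.
                regionStatus.put(region, "FAILED_OPEN");
            }
        });
    }

    /** Lets queued opens finish; only needed so the demo can inspect results. */
    void shutdownAndWait() {
        openers.shutdown();
        try {
            openers.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        AsyncOpenSketch rs = new AsyncOpenSketch();
        rs.requestOpen("region-a", () -> { throw new IllegalStateException("open failed"); });
        rs.requestOpen("region-b", () -> { });
        rs.shutdownAndWait();
        System.out.println(rs.regionStatus);
    }
}
```

Before the hand-off, throwing is still the natural way to fail; after
it, some state like the above is needed, which is one way two
conventions end up side by side.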

> 5) How do we handle timeout situations -- IMO it makes sense to have a
> rollback or fail forward policy for different places on the timeline.

Yes.  There are a couple of flavors of this in the code base at
present.  Could do w/ a revisit for sure.

> 6) Can we use cancellation instead of checking for
> enabling/disabled/disabling/shutdown/stopping all over the place? (let's say
> these cluster ops would cancel the assign and then win by blocking assigns).

The enabling, etc., checks are done on assign to make sure we don't go
ahead if table state has changed since the order to assign was given.

To me cancel seems like something else; the open or close has gone out
already and we want to stop it happening.

They seem like different things to me.

> 7) In memory state has different but similarly named states in the HM, ZK,
> and in the RS's.  And there are transition events that could be missed.

Yes.  This is a problem.

My peeve is the one where we cannot trust what RegionState says and,
even if we could, its states are not 'clean'; e.g. OFFLINE is both
the BEGIN state for the open of a region and a catchall parking state
that we put regions into when not sure what else to do w/ them.

> 8) Is having multiple processes "responsible for acting" necessary?  (why
> not have the HM open and then update meta)?
>

It could be good having master do all meta edits.  Would be good to
see what advantage it would bring us before going about making the
change.

I can provide more history and provenance if needed, np.

St.Ack