[
https://issues.apache.org/jira/browse/HBASE-2485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12865273#action_12865273
]
stack commented on HBASE-2485:
------------------------------
Doc is great. Here's a few comments.
+ I think you should start your proposal w/ some high-level intents: e.g. only
messages from Master to RS over RPC are of import, are "commands"; messages
from RS to Master are just informational (load, split). Or: the intent is to
move the regions-in-transition out of Master into zk so the in-transition
state weathers a master restart.
+ Startup could be tricky. Here we are hoisting all regions in .META. up into
the unassigned in zk. I was wondering about the case where the copy from
.META. to zk/UNASSIGNED is only partially done say because master crashes.
What happens? Maybe it'll be OK? If the meta startcode does not match that
of a running regionserver, then the region has not yet been assigned so add it
to zk/UNASSIGNED.
+ In Close Region RS Flow, did we agree the closing state is of no use? There
is nothing the master can really do if closing is taking forever?
+ Up in zk, unfortunately, znodes will have to be named using the region's
encoded name. That will make it a little tough to follow region flow. Perhaps
the fix is to make the encoded name of a region more prevalent in logs.
+ We said opening was nice to have rather than necessary?
+ I wonder if you need a new message from Master to RS where you can ask the RS
what regions it has deployed? Be best if we didn't need it. We shouldn't need
it I suppose.
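
The startup rule in the second point above can be sketched roughly as below.
This is only an illustration in Python, not HBase code: regions_to_unassign,
meta_rows and live_servers are hypothetical names, and plain dicts stand in
for .META. and for the set of running regionservers. Treating a region with
no recorded location as unassigned is my assumption. Since the check reads
only .META. and the live servers, running it again after a master crash
partway through the copy gives the same answer, which suggests a partial copy
may be OK as long as adding a znode that already exists is harmless.

```python
def regions_to_unassign(meta_rows, live_servers):
    """Return the encoded names of regions that belong under zk/UNASSIGNED.

    meta_rows:    dict of encoded region name -> (server, startcode) as
                  recorded in .META., or None if no location is recorded
                  (hypothetical stand-in for a .META. scan).
    live_servers: dict of server -> startcode of the running regionserver
                  (hypothetical stand-in for the master's server list).
    """
    unassigned = set()
    for encoded_name, location in meta_rows.items():
        if location is None:
            # Assumption: no recorded location means never assigned.
            unassigned.add(encoded_name)
            continue
        server, startcode = location
        # A startcode that does not match a running regionserver means the
        # region has not been assigned to the current server instance.
        if live_servers.get(server) != startcode:
            unassigned.add(encoded_name)
    return unassigned


meta = {"a1": ("rs1", 100), "b2": ("rs2", 200), "c3": None}
live = {"rs1": 100, "rs2": 250}  # rs2 restarted, so its startcode changed
first_pass = regions_to_unassign(meta, live)
# Simulate a new master redoing the scan after a crash: same result.
second_pass = regions_to_unassign(meta, live)
```

Because the result is a set keyed by encoded name, redoing the copy from
scratch does not create duplicates.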
> Persist Master in-memory state so on restart or failover, new instance can
> pick up where the old left off
> ---------------------------------------------------------------------------------------------------------
>
> Key: HBASE-2485
> URL: https://issues.apache.org/jira/browse/HBASE-2485
> Project: Hadoop HBase
> Issue Type: Bug
> Reporter: stack
> Assignee: Karthik Ranganathan
> Fix For: 0.20.5
>
> Attachments: HBase-State-Transitions.docx
>
>
> Today there was some good stuff up on IRC on how transitions won't always
> make it across Master failovers in multi-master deploy because transitions
> are kept in in-memory structure up in the Master and so on master crash, the
> new master will be missing state on startup (Todd was main promulgator of
> this observation and of the opinion that while master rewrite is scheduled
> for 0.21, some part needs to be done for 0.20.5). A few suggestions were
> made: transitions should be file-backed somehow, etc. Let this issue be
> about the subset we want to do for 0.20.5.
> Of the in-memory state queues, there is at least the master tasks queue --
> process region opens, closes, regionserver crashes, etc. -- where tasks must
> be done in order and IIRC, tasks are fairly idempotent (at least in the
> server crash case, it's multi-step and we'll put the crash event back on the
> queue if we cannot do all steps in the one go). Perhaps this queue could be
> done using the new queue facility in zk 3.3.0 (I haven't looked to check if
> possible, just suggesting). Another suggestion was a file to which we'd
> append queue items, requeueing, and marking the file with task complete, etc.
> On Master restart or fail-over, we'd replay the queue log.
> There is also the Map of regions-in-transition. Yesterday we learned that
> there is a bug where server shutdown processing does not iterate the Map of
> regions-in-transition. This Map may hold regions that are in "opening" or
> "opened" state but haven't yet had the fact added to .META. by master.
> Meantime the hosting server can crash. Regions that were opening will stay
> in the regions-in-transition and those in opened-but-not-yet-added-to-meta
> will go ahead and add a crashed server to .META. (Currently
> regions-in-transition does not record server the region opening/open is
> happening on so it doesn't have enough info to be processed as part of server
> shutdown).
> Regions-in-transition also needs to be persistent. On startup,
> regions-in-transition can get kinda hectic on a big cluster. Ordering is not
> so important here I believe. A directory in zk might work (For 1M regions in
> a big cluster, that'd be about 2M creates and 2M deletes during startup --
> that's too much?). Or we could write a WAL-like log again of region
> transitions (We'd have to develop a little vocabulary) that got reread by a
> new master.
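
The WAL-like log of region transitions mentioned in the description could
look something like the sketch below. Everything here is hypothetical: the
state names (OPENING, OPENED, IN_META) are a made-up vocabulary, a Python list
of JSON lines stands in for the log file, and append_transition and replay are
illustrative names. Each record carries the hosting server, since the
description notes that regions-in-transition currently lacks that info. On
replay, the last record per region wins and finished transitions drop out.

```python
import json

# Hypothetical vocabulary: a record in a terminal state means the transition
# is complete and the region no longer counts as in transition.
TERMINAL_STATES = {"IN_META"}


def append_transition(log_lines, region, state, server):
    """Append one transition record to the log (a list standing in for a file)."""
    record = {"region": region, "state": state, "server": server}
    log_lines.append(json.dumps(record))


def replay(log_lines):
    """Rebuild the regions-in-transition map by rereading the log in order."""
    in_transition = {}
    for line in log_lines:
        record = json.loads(line)
        if record["state"] in TERMINAL_STATES:
            # Transition finished; forget the region if we were tracking it.
            in_transition.pop(record["region"], None)
        else:
            # Later records overwrite earlier ones for the same region.
            in_transition[record["region"]] = (record["state"], record["server"])
    return in_transition


log = []
append_transition(log, "a1", "OPENING", "rs1")
append_transition(log, "b2", "OPENING", "rs2")
append_transition(log, "a1", "OPENED", "rs1")
append_transition(log, "b2", "OPENED", "rs2")
append_transition(log, "b2", "IN_META", "rs2")
recovered = replay(log)  # what a new master would see on startup
```

With the server recorded per region, shutdown processing for a crashed
regionserver could pick out the entries in the recovered map that name it.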