[ 
https://issues.apache.org/jira/browse/HBASE-11059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14027087#comment-14027087
 ] 

Jeffrey Zhong edited comment on HBASE-11059 at 6/10/14 9:54 PM:
----------------------------------------------------------------

I have reviewed patch v2.1 at a high level. It looks great in general. I have 
the following comments:

1) Rolling restart when hbase.assignment.usezk is switched from OFF to ON. 
In this scenario, a rolling restart does not seem to work. A region server will 
try to update the ZK node during region opening and will fail. Basically, no RS 
with the new config can open any region. Logically, this is equivalent to 
stopping everything and then restarting. It's also unclear to me where a new 
master gets its RIT info when it starts.

2) When hbase.assignment.usezk is ON, a region is set to pending_open in RIT 
before the master sends out the region open RPC. If the master then restarts, 
it isn't clear to me how the AM processes the leftover RIT. The code seems to 
process only the RITs recorded in the ZK assignment node.

3) When a region is opened, after the master receives the region transition 
response from the RS, could we use checkAndPut to update META? That would make 
sure the region location is the same as the RS that sent the region transition 
request, and so prevent a potential double assignment.

4) Inside postOpenDeployTasks, when reportRegionTransition returns false, we 
don't offline the region in the RS. If this happens, we have opened a region 
that wasn't supposed to be opened, and the opening has side effects, such as 
removing recovered.edits files when the recovery mode is recover-edits mode.

5) A rare race condition: an RS fails to open a region and reports FAILED_OPEN 
to the master without first changing its internal in-memory state. If the 
master re-assigns the region to the same RS, the region open RPC will simply 
return OPENED and the region will stay unassigned forever.
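
To illustrate comment 3, here is a minimal sketch of the compare-and-set idea 
in plain Java. It is not HBase code: the class MetaLocationGuard, its methods, 
and the "server|STATE" value encoding are hypothetical, and an in-memory 
ConcurrentMap stands in for the META row. ConcurrentMap.replace(key, expected, 
updated) plays the role that checkAndPut would play against META.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical in-memory model of the META location column; not HBase code.
class MetaLocationGuard {
    // region name -> "server|STATE", standing in for one META row per region
    private final ConcurrentMap<String, String> meta = new ConcurrentHashMap<>();

    // Master side: record the target server before the open RPC is sent.
    void recordPendingOpen(String region, String server) {
        meta.put(region, server + "|PENDING_OPEN");
    }

    // Master side: accept an OPENED report only if META still names the
    // reporting server as the pending target. replace() is an atomic
    // compare-and-set, which is the role checkAndPut would play on META.
    boolean confirmOpened(String region, String reportingServer) {
        return meta.replace(region,
                reportingServer + "|PENDING_OPEN",
                reportingServer + "|OPEN");
    }

    public static void main(String[] args) {
        MetaLocationGuard guard = new MetaLocationGuard();
        guard.recordPendingOpen("region-a", "rs1");
        // Report from a server that META does not point to: rejected.
        System.out.println(guard.confirmOpened("region-a", "rs2")); // false
        // Report from the recorded target: accepted.
        System.out.println(guard.confirmOpened("region-a", "rs1")); // true
    }
}
```

With a check like this, a late report from a server that META no longer points 
to is rejected instead of overwriting the recorded location.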
  
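For comment 5, one possible ordering is sketched below, again in plain Java 
with hypothetical names (FailedOpenOrdering, handleOpenRpc, failOpen). It only 
models the RS-side bookkeeping, and it assumes the fix is to clear the 
in-memory entry before FAILED_OPEN is reported, so that an immediate re-assign 
to the same RS is treated as a fresh open.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical model of the region server's in-memory region tracking.
class FailedOpenOrdering {
    // Regions this server believes it is opening or already serving.
    private final Set<String> localRegions = ConcurrentHashMap.newKeySet();

    // Open RPC handler: a region already tracked locally is answered with
    // OPENED and is not opened again, as described in comment 5.
    String handleOpenRpc(String region) {
        return localRegions.add(region) ? "OPENING" : "OPENED";
    }

    // Failure path: forget the region locally first, then report to master.
    // Reporting first would leave a window in which a re-assign to this
    // server still finds the stale entry and gets OPENED back.
    void failOpen(String region, Runnable reportFailedOpen) {
        localRegions.remove(region);
        reportFailedOpen.run();
    }

    public static void main(String[] args) {
        FailedOpenOrdering rs = new FailedOpenOrdering();
        System.out.println(rs.handleOpenRpc("region-a")); // OPENING
        // The callback simulates the master re-assigning to this same server
        // as soon as it hears FAILED_OPEN.
        rs.failOpen("region-a",
                () -> System.out.println(rs.handleOpenRpc("region-a"))); // OPENING
    }
}
```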


was (Author: jeffreyz):
I have reviewed patch v2.1 at a high level. It looks great in general. I have 
the following comments:

1) Rolling restart when hbase.assignment.usezk is switched from ON to OFF. 
In this scenario, a rolling restart does not seem to work. A region server will 
try to update the ZK node during region opening and will fail. Basically, no RS 
with the new config can open any region. Logically, this is equivalent to 
stopping everything and then restarting. It's also unclear to me where a new 
master gets its RIT info when it starts.

2) When hbase.assignment.usezk is ON, a region is set to pending_open in RIT 
before the master sends out the region open RPC. If the master then restarts, 
it isn't clear to me how the AM processes the leftover RIT. The code seems to 
process only the RITs recorded in the ZK assignment node.

3) When a region is opened, after the master receives the region transition 
response from the RS, could we use checkAndPut to update META? That would make 
sure the region location is the same as the RS that sent the region transition 
request, and so prevent a potential double assignment.

4) Inside postOpenDeployTasks, when reportRegionTransition returns false, we 
don't offline the region in the RS. If this happens, we have opened a region 
that wasn't supposed to be opened, and the opening has side effects, such as 
removing recovered.edits files when the recovery mode is recover-edits mode.

5) A rare race condition: an RS fails to open a region and reports FAILED_OPEN 
to the master without first changing its internal in-memory state. If the 
master re-assigns the region to the same RS, the region open RPC will simply 
return OPENED and the region will stay unassigned forever.
  

> ZK-less region assignment
> -------------------------
>
>                 Key: HBASE-11059
>                 URL: https://issues.apache.org/jira/browse/HBASE-11059
>             Project: HBase
>          Issue Type: Improvement
>          Components: master, Region Assignment
>            Reporter: Jimmy Xiang
>            Assignee: Jimmy Xiang
>             Fix For: 0.99.0
>
>         Attachments: hbase-11059.patch, hbase-11059_v2.1.patch, 
> hbase-11059_v2.2.patch, hbase-11059_v2.patch, zk-less_am.pdf, 
> zk-less_assignment.png
>
>
> It seems that most people don't like region assignment with ZK (HBASE-5487), 
> which causes many uncertainties. This jira is to support ZK-less region 
> assignment. We need to make sure this patch doesn't break backward 
> compatibility/rolling upgrade.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
