[ https://issues.apache.org/jira/browse/HBASE-11059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14027087#comment-14027087 ]
Jeffrey Zhong edited comment on HBASE-11059 at 6/10/14 9:54 PM:
----------------------------------------------------------------

I have reviewed patch v2.1 at a high level. It looks great in general. I have the following comments:

1) Rolling restart when turning hbase.assignment.usezk from OFF to ON. In this scenario, rolling restart does not seem to work. A region server will try to update the ZK node during region opening and will fail. Basically, no RS with the new config can open any region. Logically it's equivalent to stopping everything and then restarting. It's also unclear to me where a new master gets RIT info when it starts.

2) Suppose hbase.assignment.usezk is ON, a region's RIT state is set to pending_open before the region open RPC is sent out, and the master then restarts. How AM processes the leftover RIT during restart isn't clear to me. The code seems to process only RITs recorded in the ZK assignment node.

3) When a region is opened, after the master receives the region transition response from the RS, could we use checkAndPut to update META, so that we verify the region location is the same as the RS that sent the region transition request? That would prevent potential double assignment.

4) Inside postOpenDeployTasks, when reportRegionTransition returns false we don't offline the region in the RS. If this happens, we have opened a region that wasn't supposed to be opened, and the opening has side effects such as removing recovered.edits files when recovery is in recovered edits mode.

5) A rare race condition: an RS fails to open a region and reports FAILED_OPEN to the master without first changing its internal in-memory state. If the master re-assigns the region to the same RS, the region open RPC will simply return OPENED and the region will stay unassigned forever.

was (Author: jeffreyz):
Identical to the comment above, except that comment 1) began: "Rolling restart when turn hbase.assignment.usezk from ON to OFF."

> ZK-less region assignment
> -------------------------
>
>                 Key: HBASE-11059
>                 URL: https://issues.apache.org/jira/browse/HBASE-11059
>             Project: HBase
>          Issue Type: Improvement
>          Components: master, Region Assignment
>            Reporter: Jimmy Xiang
>            Assignee: Jimmy Xiang
>             Fix For: 0.99.0
>
>         Attachments: hbase-11059.patch, hbase-11059_v2.1.patch,
> hbase-11059_v2.2.patch, hbase-11059_v2.patch, zk-less_am.pdf,
> zk-less_assignment.png
>
>
> It seems that most people don't like region assignment with ZK (HBASE-5487),
> which causes many uncertainties. This jira is to support ZK-less region
> assignment.
> We need to make sure this patch doesn't break backward
> compatibility/rolling upgrade.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
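
To illustrate comment 3) above, here is a minimal sketch of the conditional-update idea. It is not HBase code and not taken from the patch. The class MetaLocationSketch, its methods, and the "server:STATE" value encoding are all hypothetical, and a ConcurrentMap stands in for the META table. The atomic ConcurrentMap.replace(key, expected, new) call plays the role that checkAndPut would presumably play against META: the OPENED report is applied only if the stored location still names the server that sent it.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Hypothetical in-memory stand-in for the META location column; not HBase code. */
class MetaLocationSketch {
    // region name -> "server:STATE" (encoding assumed for illustration only)
    private final ConcurrentMap<String, String> location = new ConcurrentHashMap<>();

    /** Records that the master asked the given server to open the region. */
    void markPendingOpen(String region, String server) {
        location.put(region, server + ":PENDING_OPEN");
    }

    /**
     * Applies an OPENED report only if the stored location still names the
     * reporting server. Returns false for a stale or unexpected report.
     */
    boolean reportOpened(String region, String server) {
        // Atomic compare-and-set: succeeds only when the current value equals the expected one.
        return location.replace(region, server + ":PENDING_OPEN", server + ":OPEN");
    }

    /** Returns the stored "server:STATE" value, or null if the region is unknown. */
    String get(String region) {
        return location.get(region);
    }
}
```

With this shape, a report from a server that the master no longer expects is rejected instead of silently overwriting the location, which is the double-assignment guard the comment asks about.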