[
https://issues.apache.org/jira/browse/HBASE-12464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14218987#comment-14218987
]
Hadoop QA commented on HBASE-12464:
-----------------------------------
{color:red}-1 overall{color}. Here are the results of testing the latest
attachment
http://issues.apache.org/jira/secure/attachment/12682560/HBASE-12464.v2-2.0.patch
against master branch at commit b6dd9b441fb279bbd7b6c48d809166b2b0235514.
ATTACHMENT ID: 12682560
{color:green}+1 @author{color}. The patch does not contain any @author
tags.
{color:red}-1 tests included{color}. The patch doesn't appear to include
any new or modified tests.
Please justify why no new tests are needed for this
patch.
Also please list what manual steps were performed to
verify this patch.
{color:green}+1 javac{color}. The applied patch does not increase the
total number of javac compiler warnings.
{color:red}-1 javadoc{color}. The javadoc tool appears to have generated 1
warning message.
{color:green}+1 checkstyle{color}. The applied patch does not increase the
total number of checkstyle errors
{color:green}+1 findbugs{color}. The patch does not introduce any new
Findbugs (version 2.0.3) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase
the total number of release audit warnings.
{color:green}+1 lineLengths{color}. The patch does not introduce lines
longer than 100
{color:green}+1 site{color}. The mvn site goal succeeds with this patch.
{color:red}-1 core tests{color}. The patch failed these unit tests:
org.apache.hadoop.hbase.client.TestHCM
Test results:
https://builds.apache.org/job/PreCommit-HBASE-Build/11758//testReport/
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/11758//artifact/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/11758//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/11758//artifact/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/11758//artifact/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/11758//artifact/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/11758//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/11758//artifact/patchprocess/newPatchFindbugsWarningshbase-rest.html
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/11758//artifact/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/11758//artifact/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/11758//artifact/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/11758//artifact/patchprocess/newPatchFindbugsWarningshbase-annotations.html
Checkstyle Errors:
https://builds.apache.org/job/PreCommit-HBASE-Build/11758//artifact/patchprocess/checkstyle-aggregate.html
Javadoc warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/11758//artifact/patchprocess/patchJavadocWarnings.txt
Console output:
https://builds.apache.org/job/PreCommit-HBASE-Build/11758//console
This message is automatically generated.
> meta table region assignment stuck in the FAILED_OPEN state due to region
> server not fully ready to serve
> ---------------------------------------------------------------------------------------------------------
>
> Key: HBASE-12464
> URL: https://issues.apache.org/jira/browse/HBASE-12464
> Project: HBase
> Issue Type: Bug
> Components: Region Assignment
> Affects Versions: 1.0.0, 2.0.0, 0.99.1
> Reporter: Stephen Yuan Jiang
> Assignee: Stephen Yuan Jiang
> Fix For: 2.0.0
>
> Attachments: HBASE-12464.v1-1.0.patch, HBASE-12464.v1-2.0.patch,
> HBASE-12464.v2-2.0.patch
>
> Original Estimate: 24h
> Time Spent: 7.4h
> Remaining Estimate: 1h
>
> meta table region assignment can reach the 'FAILED_OPEN' state, which
> makes the region unavailable until the target region server shuts down or
> an operator resolves it manually. This is an undesirable state for a meta
> table region.
> Here is the sequence by which this can happen (the code is in
> AssignmentManager#assign()):
> Step 1: Master detects that a region server (RS1) hosting a meta table
> region is down and changes the meta region state from 'online' to 'offline'.
> Step 2: In a loop (with a configurable maximumAttempts count; the default is
> 10 and the minimum is 1), AssignmentManager tries to find a RS to host the
> meta table region. If no RS is available, it is meant to loop forever by
> resetting the loop count (BUG#1 in this logic - a small bug: the reset does
> not work when maximumAttempts is 1).
> {code}
> if (region.isMetaRegion()) {
>   try {
>     Thread.sleep(this.sleepTimeBeforeRetryingMetaAssignment);
>     // ==> BUG: if maximumAttempts is 1, then the loop will end.
>     if (i == maximumAttempts) i = 1;
>     continue;
>   } catch (InterruptedException e) {
>     ...
>   }
> {code}
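
To make BUG#1 concrete, here is a minimal, self-contained sketch of the loop counter behaviour. It is not the AssignmentManager code: the class name and the resetValue and serverAvailableAtPass parameters are made up for illustration, and the loop shape "for (int i = 1; i <= maximumAttempts; i++)" is an assumption inferred from the snippet above. Resetting to 0 instead of 1 is one possible correction, not necessarily what the attached patch does.

```java
// Hypothetical stand-in for the retry loop in AssignmentManager#assign().
final class MetaRetryLoopSketch {

    /**
     * Simulates the "no region server available yet" branch of the loop.
     *
     * @param maximumAttempts       configured attempt count (minimum 1)
     * @param resetValue            value written to the counter when it hits the limit
     * @param serverAvailableAtPass loop pass (1-based) on which a server first shows up
     * @return true if the loop was still running when a server became available
     */
    static boolean findsServer(int maximumAttempts, int resetValue, int serverAvailableAtPass) {
        int pass = 0;
        for (int i = 1; i <= maximumAttempts; i++) {
            pass++;
            if (pass >= serverAvailableAtPass) {
                return true; // a server appeared while we were still looping
            }
            // The real code sleeps here before retrying; omitted in this sketch.
            if (i == maximumAttempts) {
                i = resetValue; // the for-loop increment runs right after this
            }
        }
        return false; // loop ended before any server became available
    }
}
```

With maximumAttempts = 1, resetting to 1 is followed by the loop increment, so the counter becomes 2 and the loop exits after a single pass. Resetting to 0 brings the counter back to 1, so the loop keeps going until a server appears. For any maximumAttempts greater than 1, both reset values keep the loop alive.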
> Step 3: Once a new RS (RS2) is found, inside the same loop as Step 2,
> AssignmentManager tries to assign the meta region to RS2 ((OFFLINE, RS1) =>
> (PENDING_OPEN, RS2)). If opening the region on RS2 fails for some reason
> (e.g. the target RS2 is not ready to serve - ServerNotRunningYetException),
> AssignmentManager changes the state from (PENDING_OPEN, RS2) to
> (FAILED_OPEN, RS2). Then it retries (and may even pick a different RS),
> up to maximumAttempts times. Once maximumAttempts is reached, the meta
> region stays in the 'FAILED_OPEN' state until either (1) RS2 shuts down,
> which triggers region assignment again, or (2) it is reassigned by an
> operator via HBase Shell.
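
The state walk in Step 3 can be sketched as below. This is a simplified model under stated assumptions, not the HBase implementation: the enum, the class name, and the failuresBeforeReady parameter (how many open attempts fail before the target server is ready) are hypothetical, and server selection is left out.

```java
// Hypothetical model of the Step 3 retry behaviour; not HBase code.
final class AssignStateSketch {

    enum State { OFFLINE, PENDING_OPEN, FAILED_OPEN, OPEN }

    /**
     * @param maximumAttempts     configured attempt count (minimum 1)
     * @param failuresBeforeReady number of open attempts that fail, e.g. because
     *                            the target server is not running yet
     * @return the state the region is left in when the loop ends
     */
    static State assign(int maximumAttempts, int failuresBeforeReady) {
        State state = State.OFFLINE;
        for (int i = 1; i <= maximumAttempts; i++) {
            state = State.PENDING_OPEN;          // open request sent to the target server
            boolean opened = i > failuresBeforeReady;
            if (opened) {
                return State.OPEN;
            }
            state = State.FAILED_OPEN;           // open failed; retry if attempts remain
        }
        return state;                            // attempts exhausted
    }
}
```

In this model, assign(10, 3) ends in OPEN, while assign(10, 10) ends in FAILED_OPEN and nothing in the loop moves the region out of that state afterwards.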
> According to the documentation ( http://hbase.apache.org/book/regions.arch.html ),
> this is by design - "17. For regions in FAILED_OPEN or FAILED_CLOSE states,
> the master tries to close them again when they are reassigned by an operator
> via HBase Shell.".
> However, this is a bad design, especially for the meta table region (it is
> arguable that the design is good for regular tables - for this ticket, I am
> focusing on fixing the meta region availability issue).
> I propose 2 possible fixes:
> Fix#1 (band-aid change): in Step 3, just like Step 2, if the region is a meta
> table region, reset the loop count so that the loop is never left with the
> meta table region in the FAILED_OPEN state.
> Fix#2 (more involved): if a region is in the FAILED_OPEN state, we should
> provide a way to automatically trigger AssignmentManager::assign() after a
> short period of time (leaving any region in FAILED_OPEN or other states like
> 'FAILED_CLOSE' is undesirable; there should be some way to retry and
> auto-heal the region).
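
A rough sketch of the Fix#1 idea, under the same simplifying assumptions as the description: the class, enum, and parameter names are hypothetical, and the counter is reset to 0 (so the loop increment brings it back to 1), which is an illustrative choice and may differ from the attached patch.

```java
// Hypothetical illustration of Fix#1; not the attached patch.
final class MetaFailedOpenRetrySketch {

    enum State { OFFLINE, FAILED_OPEN, OPEN }

    /**
     * @param isMetaRegion        whether the region being assigned is a meta region
     * @param maximumAttempts     configured attempt count (minimum 1)
     * @param failuresBeforeReady number of open attempts that fail before one succeeds
     * @return the state the region is left in when the loop ends
     */
    static State assign(boolean isMetaRegion, int maximumAttempts, int failuresBeforeReady) {
        State state = State.OFFLINE;
        int opensTried = 0;
        for (int i = 1; i <= maximumAttempts; i++) {
            opensTried++;
            if (opensTried > failuresBeforeReady) {
                return State.OPEN;
            }
            state = State.FAILED_OPEN;
            // Fix#1: never give up on a meta region. A real implementation
            // would sleep before retrying; that is omitted here.
            if (isMetaRegion && i == maximumAttempts) {
                i = 0; // the loop increment brings this back to 1
            }
        }
        return state;
    }
}
```

Here a regular region with more failures than attempts still ends in FAILED_OPEN, while a meta region keeps retrying until the open succeeds, including when maximumAttempts is 1.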
> I think at least for 1.0.0, Fix#1 is good enough. We can open a task-type
> JIRA for Fix#2 in a future release.
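
For the Fix#2 direction, one possible shape is a delayed re-trigger built on java.util.concurrent.ScheduledExecutorService. This is only a sketch of the idea: the class name, the tryOpen callback, and the retry delay are assumptions made for illustration, and it does not reflect how AssignmentManager is actually structured.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

// Hypothetical auto-heal helper: re-runs a failed open after a short delay.
final class FailedOpenAutoRetrySketch {

    private final ScheduledExecutorService scheduler =
        Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "failed-open-retry");
            t.setDaemon(true); // do not keep the JVM alive just for retries
            return t;
        });
    private final CountDownLatch opened = new CountDownLatch(1);
    private final long retryDelayMillis;

    FailedOpenAutoRetrySketch(long retryDelayMillis) {
        this.retryDelayMillis = retryDelayMillis;
    }

    /** Tries to open once; on failure, schedules another try after the delay. */
    void assign(BooleanSupplier tryOpen) {
        if (tryOpen.getAsBoolean()) {
            opened.countDown();
            scheduler.shutdown(); // nothing left to retry
        } else {
            // Region is in FAILED_OPEN here; come back to it later.
            scheduler.schedule(() -> assign(tryOpen), retryDelayMillis, TimeUnit.MILLISECONDS);
        }
    }

    /** Waits until the region has opened or the timeout elapses. */
    boolean awaitOpen(long timeoutMillis) {
        try {
            return opened.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

A caller would pass the open attempt as the callback, for example new FailedOpenAutoRetrySketch(1000).assign(() -> openRegion()), where openRegion() is a placeholder for whatever performs the open and reports success. A production version would likely also want a cap or backoff on the delay and a way to cancel pending retries.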
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)