[ 
https://issues.apache.org/jira/browse/HBASE-21288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16646180#comment-16646180
 ] 

Hadoop QA commented on HBASE-21288:
-----------------------------------

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 20s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green}  0m  1s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:orange}-0{color} | {color:orange} test4tests {color} | {color:orange}  0m  0s{color} | {color:orange} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} branch-2.0 Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  5m 20s{color} | {color:green} branch-2.0 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 58s{color} | {color:green} branch-2.0 passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 21s{color} | {color:green} branch-2.0 passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  4m  4s{color} | {color:green} branch has no errors when building our shaded downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 28s{color} | {color:green} branch-2.0 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 38s{color} | {color:green} branch-2.0 passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  2m 39s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 42s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m 42s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  1m 12s{color} | {color:red} hbase-server: The patch generated 1 new + 162 unchanged - 0 fixed = 163 total (was 162) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m  0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  4m  2s{color} | {color:green} patch has no errors when building our shaded downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green}  9m 21s{color} | {color:green} Patch does not cause any errors with Hadoop 2.6.5 2.7.4 or 3.0.0. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 25s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 32s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}154m 38s{color} | {color:red} hbase-server in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 35s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}193m 50s{color} | {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hbase.client.TestMultiParallel |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:6f01af0 |
| JIRA Issue | HBASE-21288 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12943353/HBASE-21288.branch-2.0.001.patch |
| Optional Tests |  dupname  asflicense  javac  javadoc  unit  findbugs  shadedjars  hadoopcheck  hbaseanti  checkstyle  compile  |
| uname | Linux f8ac776d2a85 3.13.0-143-generic #192-Ubuntu SMP Tue Feb 27 10:45:36 UTC 2018 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh |
| git revision | branch-2.0 / ed80fc5d6c |
| maven | version: Apache Maven 3.5.4 (1edded0938998edf8bf061f1ceb3cfdeccf443fe; 2018-06-17T18:33:14Z) |
| Default Java | 1.8.0_181 |
| findbugs | v3.1.0-RC3 |
| checkstyle | https://builds.apache.org/job/PreCommit-HBASE-Build/14643/artifact/patchprocess/diff-checkstyle-hbase-server.txt |
| unit | https://builds.apache.org/job/PreCommit-HBASE-Build/14643/artifact/patchprocess/patch-unit-hbase-server.txt |
| Test Results | https://builds.apache.org/job/PreCommit-HBASE-Build/14643/testReport/ |
| Max. process+thread count | 4094 (vs. ulimit of 10000) |
| modules | C: hbase-server U: hbase-server |
| Console output | https://builds.apache.org/job/PreCommit-HBASE-Build/14643/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |


This message was automatically generated.



> HostingServer in UnassignProcedure is not accurate
> --------------------------------------------------
>
>                 Key: HBASE-21288
>                 URL: https://issues.apache.org/jira/browse/HBASE-21288
>             Project: HBase
>          Issue Type: Sub-task
>          Components: amv2, Balancer
>    Affects Versions: 2.1.0, 2.0.2
>            Reporter: Allan Yang
>            Assignee: Allan Yang
>            Priority: Major
>         Attachments: HBASE-21288.branch-2.0.001.patch
>
>
> We hit a case where a region shows status OPEN on an already dead server in 
> the meta table (it is hard to trace how this happened), meaning the region is 
> actually not online. The balancer then scheduled a MoveRegionProcedure (MRP) 
> for this region, which created a mess:
> The balancer 'thought' the region was on the online server that has the same 
> address (but a different startcode). So it scheduled an MRP from this online 
> server to another, but the UnassignProcedure dispatched the unassign call to 
> the dead server according to the region state, then found that server dead 
> and scheduled a ServerCrashProcedure (SCP) for it. But since the 
> UnassignProcedure's hostingServer is not accurate, the SCP can't interrupt it.
> So, in the end, the SCP can't finish because the UnassignProcedure holds the 
> region's lock, and the UnassignProcedure can't finish because no one wakes 
> it, so both are stuck.
> Here is the log. Notice that the server of the UnassignProcedure is 
> 'hb-uf6oyi699w8h700f0-003.hbase.rds. ,16020,1539153278584' but it was 
> dispatched to 'hb-uf6oyi699w8h700f0-003.hbase.rds. ,16020,1539076734964':
> {code}
> 2018-10-10 14:34:50,011 INFO  [PEWorker-4] 
> assignment.RegionTransitionProcedure(252): Dispatch pid=13, ppid=12, 
> state=RUNNABLE:REGION_TRANSITION_DISPATCH, hasLock=true; UnassignProcedure 
> table=hbase:acl, region=267335c85766c62479fb4a5f18a1e95f, 
> server=hb-uf6oyi699w8h700f0-003.hbase.rds. ,16020,1539153278584; rit=CLOSING, 
> location=hb-uf6oyi699w8h700f0-003.hbase.rds. ,16020,1539076734964
> 2018-10-10 14:34:50,011 WARN  [PEWorker-4] 
> assignment.RegionTransitionProcedure(230): Remote call failed 
> hb-uf6oyi699w8h700f0-003.hbase.rds. ,16020,1539076734964; pid=13, ppid=12, 
> state=RUNNABLE:REGION_TRANSITION_DISPATCH, hasLock=true; UnassignProcedure 
> table=hbase:acl, region=267335c85766c62479fb4a5f18a1e95f, 
> server=hb-uf6oyi699w8h700f0-003.hbase.rds. ,16020,1539153278584; rit=CLOSING, 
> location=hb-uf6oyi699w8h700f0-003.hbase.rds. ,16020,1539076734964; 
> exception=NoServerDispatchException
> org.apache.hadoop.hbase.procedure2.NoServerDispatchException: 
> hb-uf6oyi699w8h700f0-003.hbase.rds. ,16020,1539076734964; pid=13, ppid=12, 
> state=RUNNABLE:REGION_TRANSITION_DISPATCH, hasLock=true; UnassignProcedure 
> table=hbase:acl, region=267335c85766c62479fb4a5f18a1e95f, 
> server=hb-uf6oyi699w8h700f0-003.hbase.rds. ,16020,1539153278584
> //Then a SCP was scheduled
> 2018-10-10 14:34:50,012 WARN  [PEWorker-4] master.ServerManager(635): 
> Expiration of hb-uf6oyi699w8h700f0-003.hbase.rds. ,16020,1539076734964 but 
> server not online
> 2018-10-10 14:34:50,012 INFO  [PEWorker-4] master.ServerManager(615): 
> Processing expiration of hb-uf6oyi699w8h700f0-003.hbase.rds. 
> ,16020,1539076734964 on hb-uf6oyi699w8h700f0-001.hbase.rds. 
> ,16000,1539088156164
> 2018-10-10 14:34:50,017 DEBUG [PEWorker-4] 
> procedure2.ProcedureExecutor(1089): Stored pid=14, 
> state=RUNNABLE:SERVER_CRASH_START, hasLock=false; ServerCrashProcedure 
> server=hb-uf6oyi699w8h700f0-003.hbase.rds. ,16020,1539076734964, 
> splitWal=true, meta=false
> //The SCP did not interrupt the UnassignProcedure but scheduled a new 
> AssignProcedure for this region
> 2018-10-10 14:34:50,043 DEBUG [PEWorker-6] 
> procedure.ServerCrashProcedure(250): Done splitting WALs pid=14, 
> state=RUNNABLE:SERVER_CRASH_SPLIT_LOGS, hasLock=true; ServerCrashProcedure 
> server=hb-uf6oyi699w8h700f0-003.hbase.rds. ,16020,1539076734964, 
> splitWal=true, meta=false
> 2018-10-10 14:34:50,054 INFO  [PEWorker-8] 
> procedure2.ProcedureExecutor(1691): Initialized subprocedures=[{pid=15, 
> ppid=14, state=RUNNABLE:REGION_TRANSITION_QUEUE, hasLock=false; 
> AssignProcedure table=hbase:acl, region=267335c85766c62479fb4a5f18a1e95f}, 
> {pid=16, ppid=14, state=RUNNABLE:REGION_TRANSITION_QUEUE, hasLock=false; 
> AssignProcedure table=hbase:req_intercept_rule, 
> region=460481706415d776b3742f428a6f579b}, {pid=17, ppid=14, 
> state=RUNNABLE:REGION_TRANSITION_QUEUE, hasLock=false; AssignProcedure 
> table=hbase:namespace, region=ec7a965e7302840120a5d8289947c40b}]
> {code}
> Here I also added a safety fence in the balancer: if such regions are found, 
> balancing is skipped to be safe. It should do no harm.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
