[jira] [Commented] (HBASE-21259) [amv2] Revived deadservers; recreated serverstatenode

Hadoop QA (JIRA) Wed, 10 Oct 2018 16:21:28 -0700


    [ 
https://issues.apache.org/jira/browse/HBASE-21259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16645701#comment-16645701
 ]


Hadoop QA commented on HBASE-21259:
-----------------------------------

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  
0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red}  0m  5s{color} 
| {color:red} HBASE-21259 does not apply to branch-2.1. Rebase required? Wrong 
Branch? See https://yetus.apache.org/documentation/0.8.0/precommit-patchnames 
for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | HBASE-21259 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12943323/HBASE-21259.branch-2.1.004.patch
 |
| Console output | 
https://builds.apache.org/job/PreCommit-HBASE-Build/14637/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |


This message was automatically generated.



> [amv2] Revived deadservers; recreated serverstatenode
> -----------------------------------------------------
>
>                 Key: HBASE-21259
>                 URL: https://issues.apache.org/jira/browse/HBASE-21259
>             Project: HBase
>          Issue Type: Bug
>          Components: amv2
>    Affects Versions: 2.1.0
>            Reporter: stack
>            Assignee: stack
>            Priority: Critical
>             Fix For: 2.2.0, 2.1.1, 2.0.3
>
>         Attachments: HBASE-21259.branch-2.1.001.patch, 
> HBASE-21259.branch-2.1.002.patch, HBASE-21259.branch-2.1.003.patch, 
> HBASE-21259.branch-2.1.004.patch
>
>
> On startup, I see servers being revived; i.e. their serverstatenode is 
> getting marked online even though its just been processed by 
> ServerCrashProcedure. It looks like this (in a patched server that reports on 
> whenever a serverstatenode is created):
> {code}
> 2018-09-29 03:45:40,963 INFO 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Finished pid=3982597, 
> state=SUCCESS; ServerCrashProcedure 
> server=vb1442.halxg.cloudera.com,22101,1536675314426, splitWal=true, 
> meta=false in 1.0130sec
> ...
> 2018-09-29 03:45:43,733 INFO 
> org.apache.hadoop.hbase.master.assignment.RegionStates: CREATING! 
> vb1442.halxg.cloudera.com,22101,1536675314426
> java.lang.RuntimeException: WHERE AM I?
>         at 
> org.apache.hadoop.hbase.master.assignment.RegionStates.getOrCreateServer(RegionStates.java:1116)
>         at 
> org.apache.hadoop.hbase.master.assignment.RegionStates.addRegionToServer(RegionStates.java:1143)
>         at 
> org.apache.hadoop.hbase.master.assignment.AssignmentManager.markRegionAsClosing(AssignmentManager.java:1464)
>         at 
> org.apache.hadoop.hbase.master.assignment.UnassignProcedure.updateTransition(UnassignProcedure.java:200)
>         at 
> org.apache.hadoop.hbase.master.assignment.RegionTransitionProcedure.execute(RegionTransitionProcedure.java:369)
>         at 
> org.apache.hadoop.hbase.master.assignment.RegionTransitionProcedure.execute(RegionTransitionProcedure.java:97)
>         at 
> org.apache.hadoop.hbase.procedure2.Procedure.doExecute(Procedure.java:953)
>         at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1716)
>         at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1494)
>         at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$900(ProcedureExecutor.java:75)
>         at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:2022)
> {code}
> See how we've just finished a SCP which will have removed the 
> serverstatenode... but then we come across an unassign that references the 
> server that was just processed. The unassign will attempt to update the 
> serverstatenode and therein we create one if one not present. We shouldn't be 
> creating one.
> I think I see this a lot because I am scheduling unassigns with hbck2. The 
> servers crash and then come up with SCPs doing cleanup of old server and 
> unassign procedures in the procedure executor queue to be processed still.... 
>  but could happen at any time on cluster should an unassign happen get 
> scheduled near an SCP.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (HBASE-21259) [amv2] Revived deadservers; recreated serverstatenode

Reply via email to