[ 
https://issues.apache.org/jira/browse/HBASE-21017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16592508#comment-16592508
 ] 

Duo Zhang commented on HBASE-21017:
-----------------------------------

OK, I think I have found the race here:
{noformat}
2018-08-24 12:03:19,255 INFO  [RS-EventLoopGroup-8-9] 
ipc.ServerRpcConnection(556): Connection from 67.195.81.136:48580, 
version=3.0.0-SNAPSHOT, sasl=false, ugi=jenkins (auth:SIMPLE), 
service=ClientService
2018-08-24 12:03:19,297 INFO  
[RpcServer.default.FPBQ.Fifo.handler=4,queue=0,port=45229] 
master.MasterRpcServices(579): Client=jenkins//67.195.81.136 assign 
testEnableTableWithNoRegionServers,,1535112163487.37ec5bc06522d2e4e51a73fb48d03962.
2018-08-24 12:03:19,521 DEBUG 
[RpcServer.default.FPBQ.Fifo.handler=4,queue=0,port=45229] 
procedure2.ProcedureExecutor(1004): Stored pid=29, 
state=RUNNABLE:REGION_STATE_TRANSITION_GET_ASSIGN_CANDIDATE, hasLock=false; 
TransitRegionStateProcedure table=testEnableTableWithNoRegionServers, 
region=37ec5bc06522d2e4e51a73fb48d03962, ASSIGN
2018-08-24 12:03:19,521 INFO  [PEWorker-8] 
procedure.MasterProcedureScheduler(689): pid=29, 
state=RUNNABLE:REGION_STATE_TRANSITION_GET_ASSIGN_CANDIDATE, hasLock=false; 
TransitRegionStateProcedure table=testEnableTableWithNoRegionServers, 
region=37ec5bc06522d2e4e51a73fb48d03962, ASSIGN checking lock on 
37ec5bc06522d2e4e51a73fb48d03962
2018-08-24 12:03:19,572 DEBUG 
[RpcServer.default.FPBQ.Fifo.handler=4,queue=0,port=45229] 
procedure.ProcedureSyncWait(188): waitFor pid=29
2018-08-24 12:03:19,655 INFO  [PEWorker-8] 
assignment.TransitRegionStateProcedure(155): Setting lastHost as the region 
location asf916.gq1.ygridcore.net,34328,1535112151646
2018-08-24 12:03:19,655 INFO  [PEWorker-8] 
assignment.TransitRegionStateProcedure(159): Starting pid=29, 
state=RUNNABLE:REGION_STATE_TRANSITION_GET_ASSIGN_CANDIDATE, hasLock=true; 
TransitRegionStateProcedure table=testEnableTableWithNoRegionServers, 
region=37ec5bc06522d2e4e51a73fb48d03962, ASSIGN; rit=OPEN, 
location=asf916.gq1.ygridcore.net,34328,1535112151646; forceNewPlan=false, 
retain=true
2018-08-24 12:03:19,826 INFO  [PEWorker-9] assignment.RegionStateStore(199): 
pid=29 updating hbase:meta row=37ec5bc06522d2e4e51a73fb48d03962, 
regionState=OPENING, regionLocation=asf916.gq1.ygridcore.net,39326,1535112180502
java.lang.Exception
        at 
org.apache.hadoop.hbase.master.assignment.RegionStateStore.updateUserRegionLocation(RegionStateStore.java:199)
        at 
org.apache.hadoop.hbase.master.assignment.RegionStateStore.updateRegionLocation(RegionStateStore.java:138)
        at 
org.apache.hadoop.hbase.master.assignment.AssignmentManager.transitStateAndUpdate(AssignmentManager.java:1423)
        at 
org.apache.hadoop.hbase.master.assignment.AssignmentManager.regionOpening(AssignmentManager.java:1435)
        at 
org.apache.hadoop.hbase.master.assignment.TransitRegionStateProcedure.openRegion(TransitRegionStateProcedure.java:176)
        at 
org.apache.hadoop.hbase.master.assignment.TransitRegionStateProcedure.executeFromState(TransitRegionStateProcedure.java:311)
        at 
org.apache.hadoop.hbase.master.assignment.TransitRegionStateProcedure.executeFromState(TransitRegionStateProcedure.java:96)
        at 
org.apache.hadoop.hbase.procedure2.StateMachineProcedure.execute(StateMachineProcedure.java:189)
        at 
org.apache.hadoop.hbase.master.assignment.TransitRegionStateProcedure.execute(TransitRegionStateProcedure.java:283)
        at 
org.apache.hadoop.hbase.master.assignment.TransitRegionStateProcedure.execute(TransitRegionStateProcedure.java:96)
        at 
org.apache.hadoop.hbase.procedure2.Procedure.doExecute(Procedure.java:873)
        at 
org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1577)
        at 
org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1365)
        at 
org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$900(ProcedureExecutor.java:77)
        at 
org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1877)
2018-08-24 12:03:19,832 INFO  [PEWorker-9] procedure2.ProcedureExecutor(1612): 
Initialized subprocedures=[{pid=30, ppid=29, state=RUNNABLE, hasLock=false; 
org.apache.hadoop.hbase.master.assignment.OpenRegionProcedure}]
2018-08-24 12:03:19,932 INFO  
[RpcServer.default.FPBQ.Fifo.handler=3,queue=0,port=45229] 
assignment.RegionStateStore(199): pid=29 updating hbase:meta 
row=37ec5bc06522d2e4e51a73fb48d03962, regionState=OPEN, openSeqNum=8, 
regionLocation=asf916.gq1.ygridcore.net,39326,1535112180502
java.lang.Exception
        at 
org.apache.hadoop.hbase.master.assignment.RegionStateStore.updateUserRegionLocation(RegionStateStore.java:199)
        at 
org.apache.hadoop.hbase.master.assignment.RegionStateStore.updateRegionLocation(RegionStateStore.java:138)
        at 
org.apache.hadoop.hbase.master.assignment.AssignmentManager.transitStateAndUpdate(AssignmentManager.java:1423)
        at 
org.apache.hadoop.hbase.master.assignment.AssignmentManager.regionOpened(AssignmentManager.java:1471)
        at 
org.apache.hadoop.hbase.master.assignment.TransitRegionStateProcedure.reportTransitionOpened(TransitRegionStateProcedure.java:361)
        at 
org.apache.hadoop.hbase.master.assignment.TransitRegionStateProcedure.reportTransition(TransitRegionStateProcedure.java:402)
        at 
org.apache.hadoop.hbase.master.assignment.AssignmentManager.reportTransition(AssignmentManager.java:899)
        at 
org.apache.hadoop.hbase.master.assignment.AssignmentManager.checkOnlineRegionsReport(AssignmentManager.java:1060)
        at 
org.apache.hadoop.hbase.master.assignment.AssignmentManager.reportOnlineRegions(AssignmentManager.java:998)
        at 
org.apache.hadoop.hbase.master.MasterRpcServices.regionServerReport(MasterRpcServices.java:483)
        at 
org.apache.hadoop.hbase.shaded.protobuf.generated.RegionServerStatusProtos$RegionServerStatusService$2.callBlockingMethod(RegionServerStatusProtos.java:15170)
        at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:409)
        at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:130)
        at 
org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:324)
        at 
org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:304)
2018-08-24 12:03:20,145 WARN  
[RpcServer.priority.FPBQ.Fifo.handler=4,queue=0,port=39326] 
regionserver.RSRpcServices(2006): Received OPEN for the 
region:testEnableTableWithNoRegionServers,,1535112163487.37ec5bc06522d2e4e51a73fb48d03962.,
 which is already online
{noformat}

1. We update the state to OPENING and schedule the OpenRegionProcedure.
2. A regionServerReport arrives and we change the state to OPEN.
3. The OpenRegionProcedure is executed and sends the open request to the RS.
4. The RS ignores the request because the region is already online.
5. Since the state has already been changed to OPEN, we will no longer wake 
the event.
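
To make the interleaving concrete, here is a minimal single-threaded sketch in Java that replays the five steps. It is not the actual HBase code: the class and method names (OpenRaceSketch, Master, RegionServer, run) are hypothetical, and it assumes that the report path only flips the state while the wake-up happens only on an explicit OPENED transition that finds the region still in OPENING.

```java
/** Deterministic replay of the race; every name here is hypothetical, not HBase API. */
final class OpenRaceSketch {
    enum State { OFFLINE, OPENING, OPEN }

    /** Stand-in for the master's view of one region and the procedure waiting on it. */
    static final class Master {
        State state = State.OFFLINE;
        boolean procedureWoken = false;

        // Step 1: persist OPENING; the open request itself is sent later.
        void markOpening() {
            state = State.OPENING;
        }

        // Step 2: a periodic report lists the region as online.
        // Assumption: this path changes the state but does not wake the procedure.
        void onRegionServerReport() {
            if (state == State.OPENING) {
                state = State.OPEN;
            }
        }

        // Step 5: the explicit OPENED transition wakes the procedure,
        // but only while the region is still in OPENING.
        void onOpenedTransition() {
            if (state == State.OPENING) {
                state = State.OPEN;
                procedureWoken = true;
            }
        }
    }

    /** Stand-in for the region server hosting the region. */
    static final class RegionServer {
        boolean online;

        // Steps 3 and 4: returns true only if the open request was acted on.
        boolean handleOpen() {
            if (online) {
                return false; // "Received OPEN for the region ... which is already online"
            }
            online = true;
            return true;
        }
    }

    /** Returns whether the waiting procedure was woken for the given interleaving. */
    static boolean run(boolean reportBeforeOpenRequest) {
        Master master = new Master();
        RegionServer rs = new RegionServer();
        rs.online = reportBeforeOpenRequest; // racy case: region is already online on the RS

        master.markOpening();
        if (reportBeforeOpenRequest) {
            master.onRegionServerReport();
        }
        if (rs.handleOpen()) {
            master.onOpenedTransition();
        }
        return master.procedureWoken;
    }

    public static void main(String[] args) {
        System.out.println("normal order, procedure woken: " + run(false));
        System.out.println("report first, procedure woken: " + run(true));
    }
}
```

In this sketch run(false) wakes the procedure, while run(true) leaves it suspended, which matches what we see for pid=29 in the log above.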

> Revisit the expected states for open/close
> ------------------------------------------
>
>                 Key: HBASE-21017
>                 URL: https://issues.apache.org/jira/browse/HBASE-21017
>             Project: HBase
>          Issue Type: Sub-task
>          Components: amv2
>            Reporter: Duo Zhang
>            Assignee: Duo Zhang
>            Priority: Major
>             Fix For: 3.0.0, 2.2.0
>
>         Attachments: HBASE-21017-debug.patch, HBASE-21017.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
