[jira] [Commented] (HBASE-20552) HBase RegionServer was shutdown due to UnexpectedStateException
[ https://issues.apache.org/jira/browse/HBASE-20552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17233520#comment-17233520 ] Truong Duc Kien commented on HBASE-20552:

Possible related issue (already fixed): https://issues.apache.org/jira/browse/HBASE-21421

> HBase RegionServer was shutdown due to UnexpectedStateException
> ---
>
> Key: HBASE-20552
> URL: https://issues.apache.org/jira/browse/HBASE-20552
> Project: HBase
> Issue Type: Bug
> Affects Versions: 2.0.0
> Reporter: Romil Choksi
> Assignee: Umesh Agashe
> Priority: Critical
> Attachments: 102143-master-ctr-e138-1518143905142-279227-01-03.hwx.site.log, 102143-master-ctr-e138-1518143905142-279227-01-05.hwx.site.log, 102143-regionserver-ctr-e138-1518143905142-279227-01-02.hwx.site.log, 102143-regionserver-ctr-e138-1518143905142-279227-01-07.hwx.site.log, 102143-regionserver-ctr-e138-1518143905142-279227-01-08.hwx.site.log
>
>
> This was observed during cluster testing (source code sync'ed with hbase-2.0, built May 2nd):
> {code}
> 2018-05-02 05:44:10,089 ERROR [RpcServer.default.FPBQ.Fifo.handler=28,queue=1,port=2] master.MasterRpcServices: Region server ctr-e138-1518143905142-279227-01-02.hwx.site,16020,1525239334474 reported a fatal error:
> * ABORTING region server ctr-e138-1518143905142-279227-01-02.hwx.site,16020,1525239334474: org.apache.hadoop.hbase.YouAreDeadException: rit=OPEN, location=ctr-e138-1518143905142-279227-01-07.hwx.site,16020,1525239609353, table=test_hbase_ha_load_test_tool_hbase, region=94f6ca283dbb4445b2bcdc321b734d28reported OPEN on server=ctr-e138-1518143905142-279227-01-02.hwx.site,16020,1525239334474 but state has otherwise.
>   at org.apache.hadoop.hbase.master.assignment.AssignmentManager.checkOnlineRegionsReport(AssignmentManager.java:1065)
>   at org.apache.hadoop.hbase.master.assignment.AssignmentManager.reportOnlineRegions(AssignmentManager.java:987)
>   at org.apache.hadoop.hbase.master.MasterRpcServices.regionServerReport(MasterRpcServices.java:459)
>   at org.apache.hadoop.hbase.shaded.protobuf.generated.RegionServerStatusProtos$RegionServerStatusService$2.callBlockingMethod(RegionServerStatusProtos.java:13118)
>   at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:409)
>   at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:131)
>   at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:324)
>   at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:304)
> Caused by: org.apache.hadoop.hbase.exceptions.UnexpectedStateException: rit=OPEN, location=ctr-e138-1518143905142-279227-01-07.hwx.site,16020,1525239609353, table=test_hbase_ha_load_test_tool_hbase, region=94f6ca283dbb4445b2bcdc321b734d28reported OPEN on server=ctr-e138-1518143905142-279227-01-02.hwx.site,16020,1525239334474 but state has otherwise.
>   at org.apache.hadoop.hbase.master.assignment.AssignmentManager.checkOnlineRegionsReport(AssignmentManager.java:1037)
>   ... 7 more
> *
> Cause:
> org.apache.hadoop.hbase.YouAreDeadException: org.apache.hadoop.hbase.YouAreDeadException: rit=OPEN, location=ctr-e138-1518143905142-279227-01-07.hwx.site,16020,1525239609353, table=test_hbase_ha_load_test_tool_hbase, region=94f6ca283dbb4445b2bcdc321b734d28reported OPEN on server=ctr-e138-1518143905142-279227-01-02.hwx.site,16020,1525239334474 but state has otherwise.
>   at org.apache.hadoop.hbase.master.assignment.AssignmentManager.checkOnlineRegionsReport(AssignmentManager.java:1065)
>   at org.apache.hadoop.hbase.master.assignment.AssignmentManager.reportOnlineRegions(AssignmentManager.java:987)
>   at org.apache.hadoop.hbase.master.MasterRpcServices.regionServerReport(MasterRpcServices.java:459)
>   at org.apache.hadoop.hbase.shaded.protobuf.generated.RegionServerStatusProtos$RegionServerStatusService$2.callBlockingMethod(RegionServerStatusProtos.java:13118)
>   at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:409)
>   at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:131)
>   at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:324)
>   at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:304)
> Caused by: org.apache.hadoop.hbase.exceptions.UnexpectedStateException: rit=OPEN, location=ctr-e138-1518143905142-279227-01-07.hwx.site,16020,1525239609353, table=test_hbase_ha_load_test_tool_hbase, region=94f6ca283dbb4445b2bcdc321b734d28reported OPEN on server=ctr-e138-1518143905142-279227-01-02.hwx.site,16020,1525239334474 but sta
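The stack trace may be easier to read alongside a heavily simplified sketch of the master-side check it names. The trace shows an UnexpectedStateException raised inside checkOnlineRegionsReport that surfaces as the cause of a YouAreDeadException, after which the RegionServer aborts. The sketch models only that much. The master compares each region a RegionServer reports as online against its own in-memory state. If the master believes the region is OPEN on a different server, the report is rejected. The class names, the string server names, and the choice to treat OPENING like OPEN are illustrative assumptions. This is not HBase's AssignmentManager API.

```java
import java.util.List;
import java.util.Map;

// Hypothetical, heavily simplified model of the check in the stack trace
// (AssignmentManager.checkOnlineRegionsReport). Names are illustrative, not HBase's API.
final class OnlineReportCheck {

  enum State { OFFLINE, OPENING, OPEN, CLOSING, CLOSED }

  /** The master's in-memory belief about one region. */
  static final class RegionStateNode {
    final State state;
    final String location; // "host,port,startcode"

    RegionStateNode(State state, String location) {
      this.state = state;
      this.location = location;
    }
  }

  /** Raised when a report contradicts the master's view. */
  static final class UnexpectedStateException extends Exception {
    UnexpectedStateException(String msg) {
      super(msg);
    }
  }

  /** Returned to the reporting server; in the logs above, the RegionServer aborts on it. */
  static final class YouAreDeadException extends Exception {
    YouAreDeadException(String msg, Throwable cause) {
      super(msg, cause);
    }
  }

  /**
   * Checks each region the server reports as online against the master's view.
   * Regions the master does not know about are simply skipped in this sketch.
   */
  static void checkOnlineRegionsReport(Map<String, RegionStateNode> masterView,
      String reportingServer, List<String> reportedRegions) throws YouAreDeadException {
    try {
      for (String region : reportedRegions) {
        RegionStateNode node = masterView.get(region);
        if (node == null) {
          continue;
        }
        // Assumption: OPENING is treated like OPEN for this comparison.
        boolean openOrOpening = node.state == State.OPEN || node.state == State.OPENING;
        if (openOrOpening && !node.location.equals(reportingServer)) {
          throw new UnexpectedStateException("rit=" + node.state + ", location=" + node.location
              + ", region=" + region + " reported OPEN on server=" + reportingServer
              + " but state has otherwise.");
        }
      }
    } catch (UnexpectedStateException e) {
      // Mirrors the "Caused by" chain in the trace: the mismatch is wrapped and returned.
      throw new YouAreDeadException(e.getMessage(), e);
    }
  }
}
```

Suppose the master believes the region is OPEN on r007. A report of that region from r002 then throws YouAreDeadException, while the identical report from r007 passes.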
[jira] [Commented] (HBASE-20552) HBase RegionServer was shutdown due to UnexpectedStateException
[ https://issues.apache.org/jira/browse/HBASE-20552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17155669#comment-17155669 ] Josh Elser commented on HBASE-20552:

[~shenshengli], what version are you on, by chance? I haven't seen this recently.
[jira] [Commented] (HBASE-20552) HBase RegionServer was shutdown due to UnexpectedStateException
[ https://issues.apache.org/jira/browse/HBASE-20552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17147512#comment-17147512 ] shenshengli commented on HBASE-20552:

I reproduced the problem in my own environment, with more than 10,000 regions on each RS. Raising 'hbase.regionserver.msginterval' from 3 s to 30 s greatly reduces the risk of the problem. Conversely, lowering it from 3 s to below 1 s makes the problem almost certain to occur.
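A back-of-envelope model helps explain why the report interval matters. 'hbase.regionserver.msginterval' is the interval, in milliseconds, between a RegionServer's reports to the master, and it defaults to 3000. Assume, hypothetically, that the master's view of a region is briefly inconsistent for a window of W ms, and that report timing is unrelated to that window. Then the chance that at least one report lands inside the window is min(1, W / interval). This is an illustrative model, not HBase code, and W is not known from this thread.

```java
// Back-of-envelope model, NOT HBase code. Assumptions: the master's view of a region is
// inconsistent for windowMs, and the RegionServer reports every intervalMs
// (hbase.regionserver.msginterval) at a phase unrelated to that window.
final class ReportIntervalModel {

  /** Probability that at least one periodic report falls inside the inconsistent window. */
  static double probabilityReportHitsWindow(long windowMs, long intervalMs) {
    if (windowMs < 0 || intervalMs <= 0) {
      throw new IllegalArgumentException("need windowMs >= 0 and intervalMs > 0");
    }
    // A window shorter than the interval contains a report only if the report's phase
    // lands inside it (probability windowMs / intervalMs); a longer window always does.
    return Math.min(1.0, (double) windowMs / intervalMs);
  }
}
```

Take a hypothetical W of 1.5 s:
- a 3 s interval gives 0.5;
- a 30 s interval gives 0.05;
- any interval of 1.5 s or less gives 1.0.

That matches the direction of the observation above, though not necessarily its magnitude.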
[jira] [Commented] (HBASE-20552) HBase RegionServer was shutdown due to UnexpectedStateException
[ https://issues.apache.org/jira/browse/HBASE-20552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16482748#comment-16482748 ] Josh Elser commented on HBASE-20552:

Just to give you all an update: Ted and I have both made some internal changes to HBase to get more insight into this if it happens again. I ran through a dozen or so test scenarios at the end of last week, none of which showed this again. I'm apt to close this one as CannotRepro for now. I can try to bring this extra debugging out to Apache if y'all think it would be beneficial.
[jira] [Commented] (HBASE-20552) HBase RegionServer was shutdown due to UnexpectedStateException
[ https://issues.apache.org/jira/browse/HBASE-20552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16480777#comment-16480777 ] Josh Elser commented on HBASE-20552:

Ok! Thanks for the info, Umesh. Glad we both worked towards the same conclusion. I feel a little bit better knowing that at least we think we did the right thing in HBase.

[~stack], my understanding is that HDFS is 3.1.0ish. I'm not sure, but your thinking does seem reasonable. I have a system up trying to get a live environment (so I can poke the pv2 WAL), but that's also been unsuccessful for me.
[jira] [Commented] (HBASE-20552) HBase RegionServer was shutdown due to UnexpectedStateException
[ https://issues.apache.org/jira/browse/HBASE-20552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16479955#comment-16479955 ] stack commented on HBASE-20552:

Long shot: is something up with your HDFS over there, [~elserj] and crew, where lease recovery is dropping the end of the WAL? Doing some bad math on file length? It does it for two different logs here: the Master's procedure WAL and the WAL of the RegionServer hosting hbase:meta. Some interesting version of HDFS? Thanks.
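Stack's hypothesis is that lease recovery settles on a file length shorter than what was written, so the tail of a WAL is silently lost on replay. The sketch below illustrates that failure mode with a generic length-prefixed log. It is not HBase's WAL or procedure-WAL format, and the entries are made up. A reader that treats a short read as end-of-log returns only the entries before the cut and raises no error.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Generic length-prefixed log for illustration only; NOT HBase's WAL format.
final class TruncatedLogDemo {

  /** Serializes each entry as [int length][UTF-8 bytes]. */
  static byte[] write(List<String> entries) throws IOException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (DataOutputStream out = new DataOutputStream(bytes)) {
      for (String e : entries) {
        byte[] payload = e.getBytes(StandardCharsets.UTF_8);
        out.writeInt(payload.length);
        out.write(payload);
      }
    }
    return bytes.toByteArray();
  }

  /** Replays complete entries; a short read is treated as the end of the log. */
  static List<String> replay(byte[] log) throws IOException {
    List<String> entries = new ArrayList<>();
    try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(log))) {
      while (true) {
        int len;
        try {
          len = in.readInt();
        } catch (EOFException eof) {
          break; // clean end, or a cut inside the length prefix
        }
        byte[] payload = new byte[len];
        try {
          in.readFully(payload);
        } catch (EOFException eof) {
          break; // partial trailing entry: silently dropped
        }
        entries.add(new String(payload, StandardCharsets.UTF_8));
      }
    }
    return entries;
  }

  /** Simulates recovery settling on a shorter length than was actually written. */
  static byte[] truncate(byte[] log, int recoveredLength) {
    return Arrays.copyOf(log, Math.min(recoveredLength, log.length));
  }
}
```

Writing three entries and cutting the file a few bytes into the second one replays as just the first entry. Nothing signals that two edits are gone.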
[jira] [Commented] (HBASE-20552) HBase RegionServer was shutdown due to UnexpectedStateException
[ https://issues.apache.org/jira/browse/HBASE-20552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16479831#comment-16479831 ] Umesh Agashe commented on HBASE-20552:

[~elserj], I don't have a repro. I thought I had one, but it was due to a bug that was inadvertently introduced in a recent commit and fixed in an addendum (HBASE-20564).

So far I have found 2 instances of missing edits around the same time. First, in the master proc WAL, where 003 is not able to read pids 468 onwards. Second, in the meta region: pid=475 on 005 started with:
{code:java}
2018-05-02 05:39:45,811 INFO [PEWorker-6] assignment.AssignProcedure: Starting pid=475, ppid=471, state=RUNNABLE:REGION_TRANSITION_QUEUE; AssignProcedure table=test_hbase_ha_load_test_tool_hbase, region=94f6ca283dbb4445b2bcdc321b734d28; rit=OFFLINE, location=ctr-e138-1518143905142-279227-01-07.hwx.site,16020,1525238558502; forceNewPlan=false, retain=true
{code}
After this, it was updated twice on 005:
{code:java}
2018-05-02 05:39:45,983 INFO [PEWorker-1] assignment.RegionStateStore: pid=475 updating hbase:meta row=94f6ca283dbb4445b2bcdc321b734d28, regionState=OPENING
2018-05-02 05:39:46,580 INFO [PEWorker-1] assignment.RegionStateStore: pid=475 updating hbase:meta row=94f6ca283dbb4445b2bcdc321b734d28, regionState=OPEN, openSeqNum=13401, regionLocation=ctr-e138-1518143905142-279227-01-02.hwx.site,16020,1525239334474
{code}
But when 003 read and printed meta, it had:
{code:java}
2018-05-02 05:44:08,236 INFO [master/ctr-e138-1518143905142-279227-01-03:2] assignment.RegionStateStore: Load hbase:meta entry region=94f6ca283dbb4445b2bcdc321b734d28, regionState=OPEN, lastHost=ctr-e138-1518143905142-279227-01-02.hwx.site,16020,1525239334474, regionLocation=ctr-e138-1518143905142-279227-01-07.hwx.site,16020,1525238558502
{code}
The location server, including its timestamp, matches the one from when pid=471 started: "location=ctr-e138-1518143905142-279227-01-07.hwx.site,16020,1525238558502". So 2 updates from pid=471 to meta are missing.
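The consequence Umesh describes can be modeled as last-writer-wins on a single hbase:meta row: a master that loads meta sees only whichever update was persisted last. The sketch below is hypothetical. The class names and the short server names are assumptions, and so is the idea that the OPENING update carries a location. If the two meta updates shown above (OPENING, then OPEN on 002) are lost, the loaded row still points at 007. A later OPEN report from 002 then no longer agrees with it.

```java
import java.util.List;

// Hypothetical model of a single hbase:meta row, NOT HBase code: a master loading
// meta sees only the last update that was actually persisted (last writer wins).
final class MetaRowReplay {

  static final class MetaUpdate {
    final String state;    // e.g. "OPENING", "OPEN"
    final String location; // "host,port,startcode"

    MetaUpdate(String state, String location) {
      this.state = state;
      this.location = location;
    }
  }

  /** Returns the row a newly active master would load from the persisted updates. */
  static MetaUpdate load(List<MetaUpdate> persisted) {
    if (persisted.isEmpty()) {
      throw new IllegalArgumentException("no persisted updates");
    }
    return persisted.get(persisted.size() - 1);
  }

  /** True if an OPEN report from reportingServer agrees with the loaded row. */
  static boolean reportAgrees(MetaUpdate loaded, String reportingServer) {
    boolean openOrOpening = "OPEN".equals(loaded.state) || "OPENING".equals(loaded.state);
    return !openOrOpening || loaded.location.equals(reportingServer);
  }
}
```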
[jira] [Commented] (HBASE-20552) HBase RegionServer was shutdown due to UnexpectedStateException
[ https://issues.apache.org/jira/browse/HBASE-20552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16479767#comment-16479767 ] Josh Elser commented on HBASE-20552:
Trying to help pick this one up too.
{quote}On M005, pid=471 is SCP for R007 which also hosts meta. Meta is re-assigned with pid=472 to R002 which is followed by other region assignments{quote}
I'm coming to think this is our problem, too. pid=471 is an SCP for r007 from pv2-004.log which finished at 05:39:47,288 on m005. When m003 takes over and reads the tracker from pv2-002.log, the largest pid we have is pid=467. My hunch (which I need to back up with code) is that because m003 never sees the completed SCP, it thinks that r002 is holding this region (overriding what meta says, maybe?), claiming it to be on r007 instead.
The following is the "largest" proc from the pv2-004 log that m003 reads:
{noformat}
2018-05-02 05:43:33,876 DEBUG [master/ctr-e138-1518143905142-279227-01-03:2] procedure2.ProcedureExecutor: Completed pid=465, state=SUCCESS; MoveRegionProcedure hri=94f6ca283dbb4445b2bcdc321b734d28, source=ctr-e138-1518143905142-279227-01-02.hwx.site,16020,1525239334474, destination=ctr-e138-1518143905142-279227-01-07.hwx.site,16020,1525238558502
{noformat}
Then m003 initializes RegionStateStore, saying:
{noformat}
2018-05-02 05:44:08,236 INFO [master/ctr-e138-1518143905142-279227-01-03:2] assignment.RegionStateStore: Load hbase:meta entry region=94f6ca283dbb4445b2bcdc321b734d28, regionState=OPEN, lastHost=ctr-e138-1518143905142-279227-01-02.hwx.site,16020,1525239334474, regionLocation=ctr-e138-1518143905142-279227-01-07.hwx.site,16020,1525238558502
{noformat}
This makes me wonder whether Umesh's findings about pid=507 (SCP for r007 putting the region back on r007) are related... Did you get anywhere on a repro, [~uagashe]? I have some nodes running through the internal scenario which has triggered this before.
Might try my hand at repro'ing in an IT, but unsure how hard that will be ;)
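Josh's hypothesis (a new master that never replays the completed SCP ends up with a stale region location) can be illustrated with a deliberately simplified toy model. This is not HBase's ProcedureWALFormatReader, RegionStateStore, or the real procedure-WAL format; the class `ToyProcWalReplay` and its methods are hypothetical. The pid=465 entry mirrors the MoveRegionProcedure log line quoted above (region moved from r002 to r007). The pid=471 entry is an assumption: the discussion suggests, but does not confirm, that the lost SCP updates would have recorded the region on r002.

```java
import java.util.List;

// Toy model only: a hypothetical stand-in for procedure-WAL replay, not HBase code.
final class ToyProcWalReplay {

    // One logged update: the procedure id and the server it recorded for the region.
    static final class Update {
        final long pid;
        final String location;

        Update(long pid, String location) {
            this.pid = pid;
            this.location = location;
        }
    }

    // Replays updates in log order (assumed sorted by pid) and ignores anything
    // newer than maxPidRead, mimicking a reader that never saw those entries.
    static String replay(List<Update> log, long maxPidRead) {
        String location = null;
        for (Update u : log) {
            if (u.pid > maxPidRead) {
                break;
            }
            location = u.location;
        }
        return location;
    }

    public static void main(String[] args) {
        List<Update> log = List.of(
                new Update(465, "r007"),  // MoveRegionProcedure r002 -> r007 (from the log)
                new Update(471, "r002")); // ASSUMPTION: update lost from the SCP for r007

        System.out.println("full replay:       " + replay(log, 471)); // r002
        System.out.println("replay to pid 467: " + replay(log, 467)); // r007
    }
}
```

In this toy, the master that stops at pid=467 believes the region is on r007, while the server that actually holds it (r002 under the stated assumption) would report it as OPEN, which is the same kind of disagreement the `UnexpectedStateException` in this issue complains about.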
[jira] [Commented] (HBASE-20552) HBase RegionServer was shutdown due to UnexpectedStateException
[ https://issues.apache.org/jira/browse/HBASE-20552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16478047#comment-16478047 ] Ted Yu commented on HBASE-20552: The tests were run using branch-2.0 code.
[jira] [Commented] (HBASE-20552) HBase RegionServer was shutdown due to UnexpectedStateException
[ https://issues.apache.org/jira/browse/HBASE-20552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16478044#comment-16478044 ] Umesh Agashe commented on HBASE-20552: [~yuzhih...@gmail.com], just want to confirm: did you see this on branch-2.0 or master?
[jira] [Commented] (HBASE-20552) HBase RegionServer was shutdown due to UnexpectedStateException
[ https://issues.apache.org/jira/browse/HBASE-20552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16473180#comment-16473180 ] Ted Yu commented on HBASE-20552:
W.r.t. the warning from ProcWAL, I saw the following in a successful run (from another test run):
{code}
2018-05-09 01:39:58,463 INFO [master/ctr-e138-1518143905142-296213-01-03:2] wal.ProcedureWALFormatReader: Rebuilding tracker for hdfs://mycluster/apps/hbase/data/MasterProcWALs/pv2-0001.log
2018-05-09 01:39:58,550 WARN [master/ctr-e138-1518143905142-296213-01-03:2] wal.ProcedureWALFormatReader: Nothing left to decode. Exiting with missing EOF, log=hdfs://mycluster/apps/hbase/data/MasterProcWALs/pv2-0001.log
2018-05-09 01:39:58,659 DEBUG [master/ctr-e138-1518143905142-296213-01-03:2] procedure2.ProcedureExecutor: Completed pid=40, state=SUCCESS; ServerCrashProcedure server=ctr-e138-1518143905142-296213-01-03.hwx.site,16020,1525829363193, splitWal=true, meta=false
{code}
I am not sure whether the 'Nothing left to decode' warning is related to the cause of this issue (unexpected state).
[jira] [Commented] (HBASE-20552) HBase RegionServer was shutdown due to UnexpectedStateException
[ https://issues.apache.org/jira/browse/HBASE-20552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16472885#comment-16472885 ] Umesh Agashe commented on HBASE-20552: I think it's a real problem in the code. Working on a repro and the patch.
[jira] [Commented] (HBASE-20552) HBase RegionServer was shutdown due to UnexpectedStateException
[ https://issues.apache.org/jira/browse/HBASE-20552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16472867#comment-16472867 ] stack commented on HBASE-20552: Go [~uagashe]! Usually we'll complain if we fail to read procedures from the Master WAL. Any evidence of us skipping Procedure steps? (Seems like a good one!!)
[jira] [Commented] (HBASE-20552) HBase RegionServer was shutdown due to UnexpectedStateException
[ https://issues.apache.org/jira/browse/HBASE-20552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16472482#comment-16472482 ] Umesh Agashe commented on HBASE-20552:
Further, M003 starts SCP with pid=507 for R007:
{code:java}
2018-05-02 05:44:08,413 INFO [PEWorker-6] procedure.ServerCrashProcedure: Start pid=507, state=RUNNABLE:SERVER_CRASH_START; ServerCrashProcedure server=ctr-e138-1518143905142-279227-01-07.hwx.site,16020,1525238558502, splitWal=true, meta=false
{code}
This starts AssignProcedure with pid=508 for region 94f6ca283dbb4445b2bcdc321b734d28:
{code:java}
2018-05-02 05:44:08,480 INFO [PEWorker-6] assignment.AssignProcedure: Starting pid=508, ppid=507, state=RUNNABLE:REGION_TRANSITION_QUEUE; AssignProcedure table=test_hbase_ha_load_test_tool_hbase, region=94f6ca283dbb4445b2bcdc321b734d28; rit=OFFLINE, location=ctr-e138-1518143905142-279227-01-07.hwx.site,16020,1525238558502; forceNewPlan=false, retain=true
2018-05-02 05:44:08,659 INFO [PEWorker-11] assignment.RegionStateStore: pid=508 updating hbase:meta row=94f6ca283dbb4445b2bcdc321b734d28, regionState=OPENING, regionLocation=ctr-e138-1518143905142-279227-01-07.hwx.site,16020,1525239609353
2018-05-02 05:44:08,727 INFO [PEWorker-11] assignment.RegionTransitionProcedure: Dispatch pid=508, ppid=507, state=RUNNABLE:REGION_TRANSITION_DISPATCH; AssignProcedure table=test_hbase_ha_load_test_tool_hbase, region=94f6ca283dbb4445b2bcdc321b734d28; rit=OPENING, location=ctr-e138-1518143905142-279227-01-07.hwx.site,16020,1525239609353
...
2018-05-02 05:44:09,213 DEBUG [RpcServer.default.FPBQ.Fifo.handler=28,queue=1,port=2] assignment.RegionTransitionProcedure: Received report OPENED seqId=13402, pid=508, ppid=507, state=RUNNABLE:REGION_TRANSITION_DISPATCH; AssignProcedure table=test_hbase_ha_load_test_tool_hbase, region=94f6ca283dbb4445b2bcdc321b734d28; rit=OPENING, location=ctr-e138-1518143905142-279227-01-07.hwx.site,16020,1525239609353
2018-05-02 05:44:09,213 DEBUG [PEWorker-12] assignment.RegionTransitionProcedure: Finishing pid=508, ppid=507, state=RUNNABLE:REGION_TRANSITION_FINISH; AssignProcedure table=test_hbase_ha_load_test_tool_hbase, region=94f6ca283dbb4445b2bcdc321b734d28; rit=OPENING, location=ctr-e138-1518143905142-279227-01-07.hwx.site,16020,1525239609353
2018-05-02 05:44:09,214 INFO [PEWorker-12] assignment.RegionStateStore: pid=508 updating hbase:meta row=94f6ca283dbb4445b2bcdc321b734d28, regionState=OPEN, openSeqNum=13402, regionLocation=ctr-e138-1518143905142-279227-01-07.hwx.site,16020,1525239609353
2018-05-02 05:44:09,258 INFO [PEWorker-12] procedure2.ProcedureExecutor: Finished subprocedure(s) of pid=507, state=RUNNABLE:SERVER_CRASH_HANDLE_RIT2; ServerCrashProcedure server=ctr-e138-1518143905142-279227-01-07.hwx.site,16020,1525238558502, splitWal=true, meta=false; resume parent processing.
2018-05-02 05:44:09,258 INFO [PEWorker-12] procedure2.ProcedureExecutor: Finished pid=508, ppid=507, state=SUCCESS; AssignProcedure table=test_hbase_ha_load_test_tool_hbase, region=94f6ca283dbb4445b2bcdc321b734d28 in 764msec
2018-05-02 05:44:09,273 INFO [PEWorker-14] procedure2.ProcedureExecutor: Finished pid=507, state=SUCCESS; ServerCrashProcedure server=ctr-e138-1518143905142-279227-01-07.hwx.site,16020,1525238558502, splitWal=true, meta=false in 975msec
{code}
The strange thing is that the SCP for R007 is assigning the region back to R007!
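One detail in these log lines may help explain the region going "back" to R007. The server names have the form host,port,startcode: the crashed server handled by pid=507 has start code 1525238558502, while pid=508 opens the region on start code 1525239609353. They share host and port but differ in start code, so they appear to be two different RegionServer process instances on the same node (presumably R007 after a restart), and an assignment with retain=true seems to have kept the region on that host:port. The class below is a hypothetical helper written only for illustration, not HBase's `ServerName` API; it parses the two strings copied from the log and compares them by address and by full identity.

```java
// Hypothetical helper for illustration; HBase has its own ServerName class.
final class ServerIdentity {
    final String host;
    final int port;
    final long startCode;

    private ServerIdentity(String host, int port, long startCode) {
        this.host = host;
        this.port = port;
        this.startCode = startCode;
    }

    // Parses the "host,port,startcode" form used in the master log lines.
    static ServerIdentity parse(String s) {
        String[] parts = s.split(",");
        if (parts.length != 3) {
            throw new IllegalArgumentException("expected host,port,startcode: " + s);
        }
        return new ServerIdentity(parts[0], Integer.parseInt(parts[1]), Long.parseLong(parts[2]));
    }

    // Same node and RPC port, regardless of which process instance.
    boolean sameAddress(ServerIdentity other) {
        return host.equals(other.host) && port == other.port;
    }

    // Same process instance: address and start code both match.
    boolean sameInstance(ServerIdentity other) {
        return sameAddress(other) && startCode == other.startCode;
    }

    public static void main(String[] args) {
        ServerIdentity crashed = parse("ctr-e138-1518143905142-279227-01-07.hwx.site,16020,1525238558502");
        ServerIdentity target = parse("ctr-e138-1518143905142-279227-01-07.hwx.site,16020,1525239609353");
        System.out.println("same host:port: " + crashed.sameAddress(target));  // true
        System.out.println("same instance:  " + crashed.sameInstance(target)); // false
    }
}
```

Seen this way, the SCP did not reopen the region on the dead process; it reopened it on a newer process at the same address, which is only a problem if some other server (here R002) already has the region open.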
[jira] [Commented] (HBASE-20552) HBase RegionServer was shutdown due to UnexpectedStateException
[ https://issues.apache.org/jira/browse/HBASE-20552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16472456#comment-16472456 ] Umesh Agashe commented on HBASE-20552: --
bq. Log for server 0002 was attached already.
Thanks! And also for 007?
[jira] [Commented] (HBASE-20552) HBase RegionServer was shutdown due to UnexpectedStateException
[ https://issues.apache.org/jira/browse/HBASE-20552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16472438#comment-16472438 ] Umesh Agashe commented on HBASE-20552: --
bq. Was there any region on 0008 you're interested in?
670f6b815d2acac905130e5440d59304
1d954f21d711345a9587d995cecea136
91f73e76bbe7bc8a61b1b1299d34c6ab
[jira] [Commented] (HBASE-20552) HBase RegionServer was shutdown due to UnexpectedStateException
[ https://issues.apache.org/jira/browse/HBASE-20552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16472422#comment-16472422 ] Ted Yu commented on HBASE-20552:
Log for server 0002 was attached already.
Was there any region on 0008 you're interested in?
[jira] [Commented] (HBASE-20552) HBase RegionServer was shutdown due to UnexpectedStateException
[ https://issues.apache.org/jira/browse/HBASE-20552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16472409#comment-16472409 ] Umesh Agashe commented on HBASE-20552: --
Usually the following warnings can be ignored, but these messages, followed by "Completed pid=", look like trouble. When M003 became active at around 2018-05-02 05:43:33, there were a few warnings while reading the master procedure WAL:
{code:java}
2018-05-02 05:43:33,529 WARN [master/ctr-e138-1518143905142-279227-01-03:2] wal.WALProcedureStore: Unable to read tracker for hdfs://mycluster/apps/hbase/data/MasterProcWALs/pv2-0004.log - Invalid Trailer version. got 8 expected 1
2018-05-02 05:43:33,638 DEBUG [master/ctr-e138-1518143905142-279227-01-03:2] wal.WALProcedureStore: Roll new state log: 5
2018-05-02 05:43:33,655 INFO [master/ctr-e138-1518143905142-279227-01-03:2] procedure2.ProcedureExecutor: Recovered WALProcedureStore lease in 219msec
2018-05-02 05:43:33,681 INFO [master/ctr-e138-1518143905142-279227-01-03:2] wal.ProcedureWALFormatReader: Rebuilding tracker for hdfs://mycluster/apps/hbase/data/MasterProcWALs/pv2-0004.log
2018-05-02 05:43:33,816 WARN [master/ctr-e138-1518143905142-279227-01-03:2] wal.ProcedureWALFormatReader: Nothing left to decode. Exiting with missing EOF, log=hdfs://mycluster/apps/hbase/data/MasterProcWALs/pv2-0004.log
2018-05-02 05:43:33,875 DEBUG [master/ctr-e138-1518143905142-279227-01-03:2] procedure2.ProcedureExecutor: Completed pid=467, state=SUCCESS; MoveRegionProcedure hri=4c37ee7a4e1210e481debdc2933fc4d2, source=ctr-e138-1518143905142-279227-01-02.hwx.site,16020,1525239334474, destination=ctr-e138-1518143905142-279227-01-03.hwx.site,16020,1525239425826
2018-05-02 05:43:33,876 DEBUG [master/ctr-e138-1518143905142-279227-01-03:2] procedure2.ProcedureExecutor: Completed pid=465, state=SUCCESS; MoveRegionProcedure hri=94f6ca283dbb4445b2bcdc321b734d28, source=ctr-e138-1518143905142-279227-01-02.hwx.site,16020,1525239334474, destination=ctr-e138-1518143905142-279227-01-07.hwx.site,16020,1525238558502
2018-05-02 05:43:33,876 DEBUG [master/ctr-e138-1518143905142-279227-01-03:2] procedure2.ProcedureExecutor: Completed pid=462, state=SUCCESS; MoveRegionProcedure hri=a8ff96226d546f0ea151823ae73e5a1b, source=ctr-e138-1518143905142-279227-01-02.hwx.site,16020,1525239334474, destination=ctr-e138-1518143905142-279227-01-08.hwx.site,16020,1525238658606
{code}
During startup, M003 has no log messages for procedures with ids 468 to 504, even though they were run and completed on M005. This is unusual. RecoverMetaProcedure on M003 starts with id 505, which is correct.
Orthogonal to the above observation, we also have a meta update issue. On M005, pid=471 is the SCP for R007, which also hosts meta.
Meta is re-assigned to R002 with pid=472, which is followed by other region assignments:
{code:java}
pid=478 e75a388bc2011feed75bdc1a0e99a9a9 regionLocation=ctr-e138-1518143905142-279227-01-02.hwx.site
pid=474 670f6b815d2acac905130e5440d59304 regionLocation=ctr-e138-1518143905142-279227-01-08.hwx.site
pid=479 c963eb77dbdc6dbab886dbe4eebba5ad regionLocation=ctr-e138-1518143905142-279227-01-06.hwx.site
pid=481 b5180eee96b616afdf79578309c66a11 regionLocation=ctr-e138-1518143905142-279227-01-02.hwx.site
pid=486 8dc6fd2022c2fdf8c065fbd16cadaaca regionLocation=ctr-e138-1518143905142-279227-01-03.hwx.site
pid=480 f3db9f9879ed03f488dcb89bea834237 regionLocation=ctr-e138-1518143905142-279227-01-02.hwx.site
pid=484 c078deb2474e9c19b85b5fdb9efaa47d regionLocation=ctr-e138-1518143905142-279227-01-06.hwx.site
pid=475 94f6ca283dbb4445b2bcdc321b734d28 regionLocation=ctr-e138-1518143905142-279227-01-02.hwx.site
pid=483 1d954f21d711345a9587d995cecea136 regionLocation=ctr-e138-1518143905142-279227-01-08.hwx.site
pid=476 1595f38ee901be7c67b997fe2fc95951 regionLocation=ctr-e138-1518143905142-279227-01-06.hwx.site
pid=482 a6e0d7561c4f19e78f94d37462588281 regionLocation=ctr-e138-1518143905142-279227-01-06.hwx.site
pid=485 91f73e76bbe7bc8a61b1b1299d34c6ab regionLocation=ctr-e138-1518143905142-279227-01-08.hwx.site
pid=477 a0620fc83de532a37f6a9bb8f99cc6c4 regionLocation=ctr-e138-1518143905142-279227-01-03.hwx.site
{code}
From the logs, all the procedures finished successfully without skipping steps, yet meta doesn't seem to be updated for 4 of these assignments. When M003 logs all regions from meta at startup, the locations of the following 4 regions don't match the target locations in the above procedures:
{code:java}
670f6b815d2acac905130e5440d59304 ctr-e138-1518143905142-279227-01-08.hwx.site lastHost=ctr-e138-1518143905142-279227-01-07.hwx.site regionLocation=ctr-e138-1518143905142-279227-01-07.hwx.site
94f6ca283dbb4445b2bcdc321b734
[jira] [Commented] (HBASE-20552) HBase RegionServer was shutdown due to UnexpectedStateException
[ https://issues.apache.org/jira/browse/HBASE-20552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16469544#comment-16469544 ] Umesh Agashe commented on HBASE-20552: --
Thanks for attaching the logs. I need to go through them to see if this is similar to what we have seen so far...
[jira] [Commented] (HBASE-20552) HBase RegionServer was shutdown due to UnexpectedStateException
[ https://issues.apache.org/jira/browse/HBASE-20552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16469380#comment-16469380 ] stack commented on HBASE-20552: ---
Sometimes in a procedure we'll look at the current state of things and determine that we can 'pass' on a step because it looks like everything has been done already. We have to be careful when we do this. There is an outstanding grey area identified by [~uagashe] where we should be updating hbase:meta even though it looks like we don't have to... A more fundamental problem was addressed, but this may be a new case of it. Will look in the logs.
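The risk stack describes, skipping a step because the state already looks done, can be illustrated with a toy model. This is a hypothetical sketch, not HBase's procedure code: `FinishStepModel`, its methods, and the two maps standing in for the master's in-memory region states and for hbase:meta are all assumptions. It shows one way a skip decided from in-memory state alone can leave a stale hbase:meta row of the kind reported in this issue.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical model (not HBase code): region -> server in master memory,
 * and region -> server as persisted in a stand-in for hbase:meta.
 */
final class FinishStepModel {
    final Map<String, String> inMemory = new HashMap<>();
    final Map<String, String> meta = new HashMap<>();

    /** Risky: "pass" on the meta write when in-memory state already looks done. */
    void finishSkippingIfDone(String region, String server) {
        if (server.equals(inMemory.get(region))) {
            return; // looks done, but meta may still hold an older location
        }
        inMemory.put(region, server);
        meta.put(region, server);
    }

    /** Safer: always persist; rewriting the same location is harmless. */
    void finishAlwaysPersist(String region, String server) {
        inMemory.put(region, server);
        meta.put(region, server);
    }

    public static void main(String[] args) {
        FinishStepModel m = new FinishStepModel();
        String region = "94f6ca283dbb4445b2bcdc321b734d28";
        // Simulate memory updated to R002 before the meta write happened.
        m.inMemory.put(region, "R002");
        m.meta.put(region, "R007");
        m.finishSkippingIfDone(region, "R002");
        System.out.println(m.meta.get(region)); // R007: meta left stale
        m.finishAlwaysPersist(region, "R002");
        System.out.println(m.meta.get(region)); // R002
    }
}
```

The design point is only that the persisted copy, not the in-memory view, should decide whether a meta write can be skipped; whether HBase's actual procedures do this is outside what the sketch shows.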
[jira] [Commented] (HBASE-20552) HBase RegionServer was shutdown due to UnexpectedStateException
[ https://issues.apache.org/jira/browse/HBASE-20552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16469240#comment-16469240 ] Ted Yu commented on HBASE-20552:
From master-ctr-e138-1518143905142-279227-01-03.hwx.site.log:
{code}
2018-05-02 05:44:08,236 INFO [master/ctr-e138-1518143905142-279227-01-03:2] assignment.RegionStateStore: Load hbase:meta entry region=94f6ca283dbb4445b2bcdc321b734d28, regionState=OPEN, lastHost=ctr-e138-1518143905142-279227-01-02.hwx.site,16020,1525239334474, regionLocation=ctr-e138-1518143905142-279227-01-07.hwx.site,16020,1525238558502
{code}
It seems master 0005 might not have persisted the assignment to server 0002 in hbase:meta; the server shown above is 0007. So when server 0002 reported in with region 94f6ca283dbb4445b2bcdc321b734d28, the report was rejected.
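Ted's reading, together with Umesh's comparison of procedure targets against hbase:meta, can be expressed as two small checks. The sketch below is a simplified model inferred from the exception message, not the real `AssignmentManager.checkOnlineRegionsReport` logic: `MetaConsistency`, its methods, and the R002/R007/R008 labels standing in for full server names are hypothetical. The sample data comes from this thread (the pid=474 and pid=475 targets versus the locations M003 loaded from meta).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical, simplified model of the master-side consistency checks. */
final class MetaConsistency {
    /** Stand-in for UnexpectedStateException in this model. */
    static final class UnexpectedState extends Exception {
        UnexpectedState(String msg) {
            super(msg);
        }
    }

    /** Rejects an OPEN report when the master's recorded location names another server. */
    static void checkReport(Map<String, String> masterView, String reporter, String region)
            throws UnexpectedState {
        String recorded = masterView.get(region);
        if (recorded != null && !recorded.equals(reporter)) {
            throw new UnexpectedState("region=" + region + " reported OPEN on server="
                    + reporter + " but recorded location=" + recorded);
        }
    }

    /** Regions (sorted) whose meta location differs from their assign target. */
    static List<String> staleMetaRows(Map<String, String> targets, Map<String, String> meta) {
        List<String> stale = new ArrayList<>();
        for (Map.Entry<String, String> e : new TreeMap<>(targets).entrySet()) {
            if (!e.getValue().equals(meta.get(e.getKey()))) {
                stale.add(e.getKey());
            }
        }
        return stale;
    }

    public static void main(String[] args) {
        Map<String, String> targets = new TreeMap<>();
        targets.put("670f6b815d2acac905130e5440d59304", "R008"); // pid=474 target
        targets.put("94f6ca283dbb4445b2bcdc321b734d28", "R002"); // pid=475 target
        Map<String, String> meta = new TreeMap<>();
        meta.put("670f6b815d2acac905130e5440d59304", "R007"); // loaded by M003
        meta.put("94f6ca283dbb4445b2bcdc321b734d28", "R007"); // loaded by M003
        System.out.println(staleMetaRows(targets, meta));
        try {
            checkReport(meta, "R002", "94f6ca283dbb4445b2bcdc321b734d28");
        } catch (UnexpectedState e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Running `main` lists both regions as stale and then prints the rejection for the R002 report, mirroring the sequence discussed above under the model's assumptions.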
[jira] [Commented] (HBASE-20552) HBase RegionServer was shutdown due to UnexpectedStateException
[ https://issues.apache.org/jira/browse/HBASE-20552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16469217#comment-16469217 ]

Ted Yu commented on HBASE-20552:

bq. Did all recover after the RS ABORTED?

The reported incident happened during a nightly run. We didn't have a chance to fully evaluate the health of master 03 before the cluster was gone.

From what I can tell, pid 465 doesn't seem to be a parent procedure. There was no pid=466 in the log of master 03. pid=467 was for a different region:
{code}
2018-05-02 05:43:33,875 DEBUG [master/ctr-e138-1518143905142-279227-01-03:2] procedure2.ProcedureExecutor: Completed pid=467, state=SUCCESS; MoveRegionProcedure hri=4c37ee7a4e1210e481debdc2933fc4d2, source=ctr-e138-1518143905142-279227-01-02.hwx.site,16020,1525239334474, destination=ctr-e138-1518143905142-279227-01-03.hwx.site,16020,1525239425826
{code}

Master logs have been attached.
[jira] [Commented] (HBASE-20552) HBase RegionServer was shutdown due to UnexpectedStateException
[ https://issues.apache.org/jira/browse/HBASE-20552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16469192#comment-16469192 ]

stack commented on HBASE-20552:

Repeat: "Did all recover after the RS ABORTED?"

bq. Pid 475 was executed on master 05. I didn't find it mentioned in log of server 03.

Right. It looks like a new assign that came out of the startup of the new master after it read the content of hbase:meta. There should now be two assigns for the same region: the one that was a subprocedure of pid=465 (?466?467?) and the new one, pid=475. The completion of 467 (?) would let 465 mark itself successful.

High-level, there is not enough to go on in the posted snippets. I'm just trying to teach you how to fish. I will not be able to answer what happened unless you post the full logs from both masters. Thanks.
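The error in the description is the master comparing its own record for a region (OPEN on server 07) with what a region server reports (OPEN on server 02); two assigns that both succeed could leave exactly that disagreement. A heavily simplified, hypothetical model of such a comparison is sketched here, loosely named after `AssignmentManager.checkOnlineRegionsReport` from the stack trace. The `RegionState` record, the local `UnexpectedStateException` class and the server names are assumptions for illustration, not HBase's actual code or API (Java 16+ for records).

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class OnlineRegionsReportCheck {

    /** Hypothetical, simplified stand-in for the master's in-memory state of one region. */
    record RegionState(String state, String location) {}

    /** Local stand-in for org.apache.hadoop.hbase.exceptions.UnexpectedStateException. */
    static class UnexpectedStateException extends Exception {
        UnexpectedStateException(String msg) {
            super(msg);
        }
    }

    private final Map<String, RegionState> masterView = new HashMap<>();

    /** Records that the master believes the region is OPEN on the given server. */
    void recordOpen(String region, String server) {
        masterView.put(region, new RegionState("OPEN", server));
    }

    /**
     * Compares the regions a server reports as online with the master's view. A region the
     * master holds as OPEN on a different server is a case this sketch cannot reconcile, so it throws.
     */
    void checkOnlineRegionsReport(String reportingServer, List<String> onlineRegions)
            throws UnexpectedStateException {
        for (String region : onlineRegions) {
            RegionState rs = masterView.get(region);
            if (rs == null) {
                continue; // regions unknown to the master are ignored in this sketch
            }
            if (rs.state().equals("OPEN") && !rs.location().equals(reportingServer)) {
                throw new UnexpectedStateException("rit=" + rs.state() + ", location=" + rs.location()
                        + ", region=" + region + " reported OPEN on server=" + reportingServer
                        + " but state has otherwise.");
            }
        }
    }

    public static void main(String[] args) {
        OnlineRegionsReportCheck master = new OnlineRegionsReportCheck();
        String region = "94f6ca283dbb4445b2bcdc321b734d28";
        master.recordOpen(region, "server-07"); // the master's last recorded location
        try {
            master.checkOnlineRegionsReport("server-07", List.of(region)); // agrees: no error
            master.checkOnlineRegionsReport("server-02", List.of(region)); // disagrees: throws
        } catch (UnexpectedStateException e) {
            System.out.println("conflict: " + e.getMessage());
        }
    }
}
```

In the trace above, the real master wraps the `UnexpectedStateException` in a `YouAreDeadException`, which is what makes the reporting region server abort; this sketch only detects the disagreement.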
[jira] [Commented] (HBASE-20552) HBase RegionServer was shutdown due to UnexpectedStateException
[ https://issues.apache.org/jira/browse/HBASE-20552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16469176#comment-16469176 ]

Ted Yu commented on HBASE-20552:

bq. It seems to have succeeded in getting the region assigned (to 0002 it seems).

The move procedure would assign to server 07.

bq. This is pid=475

Pid 475 was executed on master 05. I didn't find it mentioned in the log of server 03.

bq. Now there are two assigns for the region

Assignment of the region to server 0002 was done by master 05 (M1). Assignment to 0007 was done by M2.

bq. What happens to pid=475? It succeeds?

From the master 05 log, we can see that both 465 and 475 succeeded:
{code}
2018-05-02 05:38:59,773 INFO [PEWorker-9] procedure2.ProcedureExecutor: Finished pid=465, state=SUCCESS; MoveRegionProcedure hri=94f6ca283dbb4445b2bcdc321b734d28, source=ctr-e138-1518143905142-279227-01-02.hwx.site,16020,1525239334474, destination=ctr-e138-1518143905142-279227-01-07.hwx.site,16020,1525238558502 in 748msec
...
2018-05-02 05:39:46,700 INFO [PEWorker-1] procedure2.ProcedureExecutor: Finished pid=475, ppid=471, state=SUCCESS; AssignProcedure table=test_hbase_ha_load_test_tool_hbase, region=94f6ca283dbb4445b2bcdc321b734d28 in 976msec
{code}

In master 03, pid=465 was mentioned only once (shown in the description). pid=475 didn't appear.

bq. What is pid=507?

It was crash processing:
{code}
2018-05-02 05:44:08,409 DEBUG [master/ctr-e138-1518143905142-279227-01-03:2] procedure2.ProcedureExecutor: Stored pid=507, state=RUNNABLE:SERVER_CRASH_START;
{code}
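Following a region across two masters' logs, as done in this thread, comes down to pulling the pid, ppid, state, procedure type and region out of `procedure2.ProcedureExecutor` lines. A minimal Java sketch, assuming the line shapes are exactly those of the excerpts quoted here: the regex is inferred from these few lines, not from any HBase log-format specification, and may miss other variants. `ProcedureLogScan` and `ProcEvent` are hypothetical names.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ProcedureLogScan {

    /** One parsed ProcedureExecutor event; ppid, type and region are null when absent. */
    record ProcEvent(String action, long pid, Long ppid, String state, String type, String region) {}

    // Assumed line shape, inferred only from the excerpts quoted in this thread.
    private static final Pattern EVENT = Pattern.compile(
            "procedure2\\.ProcedureExecutor: (\\w+) pid=(\\d+)(?:, ppid=(\\d+))?, state=([A-Z_:]+);"
            + "(?:\\s+(\\w+Procedure))?"
            + "(?:.*?(?:hri|region)=([0-9a-f]{32}))?");

    /** Returns the parsed event, or empty if the line is not a ProcedureExecutor pid line. */
    static Optional<ProcEvent> parse(String line) {
        Matcher m = EVENT.matcher(line);
        if (!m.find()) {
            return Optional.empty();
        }
        Long ppid = m.group(3) == null ? null : Long.parseLong(m.group(3));
        return Optional.of(new ProcEvent(m.group(1), Long.parseLong(m.group(2)), ppid,
                m.group(4), m.group(5), m.group(6)));
    }

    public static void main(String[] args) {
        // Excerpts quoted in this thread (master 05 and master 03 logs).
        String[] lines = {
            "2018-05-02 05:38:59,773 INFO [PEWorker-9] procedure2.ProcedureExecutor: Finished pid=465,"
                + " state=SUCCESS; MoveRegionProcedure hri=94f6ca283dbb4445b2bcdc321b734d28,"
                + " source=ctr-e138-1518143905142-279227-01-02.hwx.site,16020,1525239334474,"
                + " destination=ctr-e138-1518143905142-279227-01-07.hwx.site,16020,1525238558502 in 748msec",
            "2018-05-02 05:39:46,700 INFO [PEWorker-1] procedure2.ProcedureExecutor: Finished pid=475,"
                + " ppid=471, state=SUCCESS; AssignProcedure table=test_hbase_ha_load_test_tool_hbase,"
                + " region=94f6ca283dbb4445b2bcdc321b734d28 in 976msec",
            "2018-05-02 05:44:08,409 DEBUG [master/ctr-e138-1518143905142-279227-01-03:2]"
                + " procedure2.ProcedureExecutor: Stored pid=507, state=RUNNABLE:SERVER_CRASH_START;",
        };
        for (String line : lines) {
            parse(line).ifPresent(System.out::println);
        }
    }
}
```

Run over both masters' full logs and filtered on `region()` or `pid()`, this gives the per-master view used above (for example, that pid=475 appears only in the master 05 log).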
[jira] [Commented] (HBASE-20552) HBase RegionServer was shutdown due to UnexpectedStateException
[ https://issues.apache.org/jira/browse/HBASE-20552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16469113#comment-16469113 ]

stack commented on HBASE-20552:

Thanks for the report. We throw UnexpectedStateException when we meet a condition we do not know how to handle. I expect there are a few of these lurking in AMv2.

Did all recover after the RS ABORTED?

The pid=465 on M2 is the original move done back up on M1, being replayed and thinking it is done. It seems to have succeeded in getting the region assigned (to 0002 it seems).

A move is composed of an unassign followed by an assign. The unassign seems to have completed (a sub-procedure of pid=465), but what happened to the assign that was a sub-procedure of pid=465?

It looks like we create a new assign when processing the crashed server. This is pid=475. Now there are two assigns for the region. Only one should prevail: the second should give up when it notices the other assign (there is a lock on the region while an assign runs, so only one can run at a time). What happens to pid=475? Does it succeed? Or did it give up because it saw that the assign sub-procedure of pid=465 had succeeded?

What is pid=507? Bulk assign? Or crash processing?

We need to figure out what state went without an update (an update of hbase:meta), or what procedure ran presuming a state that wasn't true for some reason.

Looks like a good one. Thanks.
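The invariant described in the comment above (a move is an unassign followed by an assign; a per-region lock lets only one assign run at a time; a later assign gives up if the region is already open) can be modeled in a few lines. This is a hypothetical in-memory sketch, not AMv2's `MoveRegionProcedure` or `AssignProcedure`, which are persisted procedures that also update hbase:meta; `RegionAssignSketch`, `RegionNode` and the server names are illustrative assumptions.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

public class RegionAssignSketch {

    enum State { CLOSED, OPEN }

    /** Hypothetical per-region record; every read and write happens under its own lock. */
    static final class RegionNode {
        final ReentrantLock lock = new ReentrantLock();
        State state = State.CLOSED;
        String location; // null while CLOSED
    }

    private final Map<String, RegionNode> regions = new ConcurrentHashMap<>();

    private RegionNode node(String region) {
        return regions.computeIfAbsent(region, r -> new RegionNode());
    }

    /** Opens the region on target; returns false (gives up) if another assign already opened it. */
    boolean assign(String region, String target) {
        RegionNode n = node(region);
        n.lock.lock();
        try {
            if (n.state == State.OPEN) {
                return false;
            }
            n.state = State.OPEN;
            n.location = target;
            return true;
        } finally {
            n.lock.unlock();
        }
    }

    /** Closes the region wherever it is open. */
    void unassign(String region) {
        RegionNode n = node(region);
        n.lock.lock();
        try {
            n.state = State.CLOSED;
            n.location = null;
        } finally {
            n.lock.unlock();
        }
    }

    String location(String region) {
        RegionNode n = node(region);
        n.lock.lock();
        try {
            return n.location;
        } finally {
            n.lock.unlock();
        }
    }

    public static void main(String[] args) {
        RegionAssignSketch am = new RegionAssignSketch();
        String region = "94f6ca283dbb4445b2bcdc321b734d28";
        am.assign(region, "server-02");

        // Move to server-07 = unassign, then assign. The two steps take the lock separately.
        am.unassign(region);
        // A competing assign (e.g. one created by crash processing) runs between the two steps.
        boolean competingOpened = am.assign(region, "server-02");
        // The move's own assign sees OPEN and gives up, so only one assign prevails.
        boolean moveOpened = am.assign(region, "server-07");

        System.out.println(competingOpened + " " + moveOpened + " " + am.location(region));
        // prints "true false server-02"
    }
}
```

Because the move's unassign and assign take the lock separately, a competing assign can slip in between them; the lock plus the OPEN check make the later assign give up instead of opening the region a second time. The error in the description shows the master recording the region at 07 while 02 still reported it OPEN, i.e. this invariant did not hold in the incident.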