[ https://issues.apache.org/jira/browse/HBASE-13317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jerry He updated HBASE-13317:
-----------------------------
Attachment: HBASE-13317-master-v5.patch
HBASE-13317-0.98-v5.patch
Attached 0.98-v5, which only has nit changes from 0.98-v4.
Attached master-v5. On the master branch the master also runs a region server, so
defaultMinToStart = 2; the asserted number of region servers in the test case is changed accordingly.
> Region server reportForDuty stuck looping if there is a master change
> ---------------------------------------------------------------------
>
> Key: HBASE-13317
> URL: https://issues.apache.org/jira/browse/HBASE-13317
> Project: HBase
> Issue Type: Bug
> Components: regionserver
> Affects Versions: 1.0.0, 2.0.0, 0.98.12
> Reporter: Jerry He
> Assignee: Jerry He
> Fix For: 2.0.0, 1.0.1, 1.1.0, 0.98.13
>
> Attachments: HBASE-13317-0.98-v2.patch, HBASE-13317-0.98-v3.patch,
> HBASE-13317-0.98-v4.patch, HBASE-13317-0.98-v5.patch, HBASE-13317-0.98.patch,
> HBASE-13317-master-v5.patch
>
>
> During cluster startup, region server reportForDuty gets stuck looping if
> there is a master change.
> {noformat}
> 2015-03-22 11:15:16,186 INFO [regionserver60020] regionserver.HRegionServer: reportForDuty to master=bigaperf274,60000,1427045883965 with port=60020, startcode=1427048115174
> 2015-03-22 11:15:16,272 WARN [regionserver60020] regionserver.HRegionServer: error telling master we are up
> com.google.protobuf.ServiceException: java.net.ConnectException: Connection refused
> at org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1678)
> at org.apache.hadoop.hbase.ipc.RpcClient$BlockingRpcChannelImplementation.callBlockingMethod(RpcClient.java:1719)
> at org.apache.hadoop.hbase.protobuf.generated.RegionServerStatusProtos$RegionServerStatusService$BlockingStub.regionServerStartup(RegionServerStatusProtos.java:8277)
> at org.apache.hadoop.hbase.regionserver.HRegionServer.reportForDuty(HRegionServer.java:2137)
> at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:896)
> at java.lang.Thread.run(Thread.java:745)
> 2015-03-22 11:15:16,274 WARN [regionserver60020] regionserver.HRegionServer: reportForDuty failed; sleeping and then retrying.
> 2015-03-22 11:15:19,274 INFO [regionserver60020] regionserver.HRegionServer: reportForDuty to master=bigaperf273,60000,1427048108439 with port=60020, startcode=1427048115174
> 2015-03-22 11:15:19,275 WARN [regionserver60020] regionserver.HRegionServer: error telling master we are up
> com.google.protobuf.ServiceException: java.net.ConnectException: Connection refused
> at org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1678)
> at org.apache.hadoop.hbase.ipc.RpcClient$BlockingRpcChannelImplementation.callBlockingMethod(RpcClient.java:1719)
> at org.apache.hadoop.hbase.protobuf.generated.RegionServerStatusProtos$RegionServerStatusService$BlockingStub.regionServerStartup(RegionServerStatusProtos.java:8277)
> at org.apache.hadoop.hbase.regionserver.HRegionServer.reportForDuty(HRegionServer.java:2137)
> at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:896)
> at java.lang.Thread.run(Thread.java:745)
> 2015-03-22 11:15:19,276 WARN [regionserver60020] regionserver.HRegionServer: reportForDuty failed; sleeping and then retrying.
> 2015-03-22 11:15:22,276 INFO [regionserver60020] regionserver.HRegionServer: reportForDuty to master=bigaperf273,60000,1427048108439 with port=60020, startcode=1427048115174
> 2015-03-22 11:15:22,296 DEBUG [regionserver60020] regionserver.HRegionServer: Master is not running yet
> 2015-03-22 11:15:22,296 WARN [regionserver60020] regionserver.HRegionServer: reportForDuty failed; sleeping and then retrying.
> 2015-03-22 11:15:25,296 INFO [regionserver60020] regionserver.HRegionServer: reportForDuty to master=bigaperf273,60000,1427048108439 with port=60020, startcode=1427048115174
> 2015-03-22 11:15:25,299 DEBUG [regionserver60020] regionserver.HRegionServer: Master is not running yet
> 2015-03-22 11:15:25,299 WARN [regionserver60020] regionserver.HRegionServer: reportForDuty failed; sleeping and then retrying.
> 2015-03-22 11:15:28,299 INFO [regionserver60020] regionserver.HRegionServer: reportForDuty to master=bigaperf273,60000,1427048108439 with port=60020, startcode=1427048115174
> 2015-03-22 11:15:28,302 DEBUG [regionserver60020] regionserver.HRegionServer: Master is not running yet
> 2015-03-22 11:15:28,302 WARN [regionserver60020] regionserver.HRegionServer: reportForDuty failed; sleeping and then retrying.
> {noformat}
> What happened is that the region server first got
> master=bigaperf274,60000,1427045883965. Before it was able to report
> successfully, the master changed to bigaperf273,60000,1427048108439.
> We were supposed to open a new connection to the new master, but we never
> did; instead we kept looping and trying the old address forever.
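The expected behavior can be illustrated with a small, self-contained sketch. This is not the HBASE-13317 patch and does not use HBase classes: `MasterConnection`, `Reporter`, and `reportForDuty(Supplier, Reporter, int, boolean)` are hypothetical stand-ins for the region server's cached master stub, the `regionServerStartup` RPC, and the retry loop. The assumption is that the current master address can be re-read on every attempt; the sketch rebuilds the connection whenever that address differs from the cached one, and the `refreshOnMasterChange = false` path mimics the reported bug of reusing the stale connection.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

/** Hypothetical sketch of a reportForDuty-style retry loop; not HBase code. */
final class ReportForDutySketch {

    /** Hypothetical stand-in for an RPC connection/stub bound to one master address. */
    static final class MasterConnection {
        final String address;
        MasterConnection(String address) { this.address = address; }
    }

    /** Hypothetical stand-in for the startup RPC; true means the master accepted the report. */
    interface Reporter {
        boolean report(MasterConnection conn, int attempt);
    }

    /**
     * Retries until a report succeeds or maxAttempts is reached and returns the
     * address actually contacted on each attempt. With refreshOnMasterChange=false
     * the first connection is reused forever, mimicking the bug in this issue.
     */
    static List<String> reportForDuty(Supplier<String> currentMaster, Reporter reporter,
                                      int maxAttempts, boolean refreshOnMasterChange) {
        List<String> contacted = new ArrayList<>();
        MasterConnection conn = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            String master = currentMaster.get(); // re-read the master address each attempt
            if (conn == null || (refreshOnMasterChange && !conn.address.equals(master))) {
                conn = new MasterConnection(master); // drop the stale connection, open a new one
            }
            contacted.add(conn.address);
            if (reporter.report(conn, attempt)) {
                return contacted;
            }
            // Real code would sleep here before retrying (about 3 s in the log above).
        }
        return contacted;
    }

    /** Replays the scenario from the log: the master changes after the first attempt. */
    static List<String> simulate(boolean refreshOnMasterChange) {
        final String oldMaster = "bigaperf274,60000,1427045883965";
        final String newMaster = "bigaperf273,60000,1427048108439";
        int[] calls = {0};
        Supplier<String> tracker = () -> calls[0]++ == 0 ? oldMaster : newMaster;
        // The old master never accepts; the new master accepts from attempt 3 on.
        Reporter reporter = (conn, attempt) -> conn.address.equals(newMaster) && attempt >= 3;
        return reportForDuty(tracker, reporter, 10, refreshOnMasterChange);
    }

    public static void main(String[] args) {
        System.out.println("refresh:  " + simulate(true));
        System.out.println("no refresh: " + simulate(false));
    }
}
```

With refresh enabled, the loop contacts the old master once and then reports to the new master; without it, all ten attempts go to the old address, which is the "stuck looping" symptom described above.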
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)