[ https://issues.apache.org/jira/browse/HBASE-4341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jieshan Bean updated HBASE-4341: -------------------------------- Attachment: HBASE-4341-Branch.patch > HRS#closeAllRegions should take care of HRS#onlineRegions's weak consistency > ---------------------------------------------------------------------------- > > Key: HBASE-4341 > URL: https://issues.apache.org/jira/browse/HBASE-4341 > Project: HBase > Issue Type: Bug > Components: regionserver > Affects Versions: 0.90.4 > Reporter: Jieshan Bean > Assignee: Jieshan Bean > Fix For: 0.90.5 > > Attachments: HBASE-4341-Branch.patch > > > This's the reason of why did "https://builds.apache.org/job/hbase-0.90/282" > get failure . In this test, one case was timeout and cause the whole test > process got killed. > [logs] > Here's the related logs(From > org.apache.hadoop.hbase.mapreduce.TestTableMapReduce-output.txt): > {noformat} > 2011-08-31 10:09:01,089 INFO > [RegionServer:0;vesta.apache.org,52257,1314785332968.leaseChecker] > regionserver.Leases(124): > RegionServer:0;vesta.apache.org,52257,1314785332968.leaseChecker closing > leases > 2011-08-31 10:09:01,089 INFO > [RegionServer:0;vesta.apache.org,52257,1314785332968.leaseChecker] > regionserver.Leases(131): > RegionServer:0;vesta.apache.org,52257,1314785332968.leaseChecker closed leases > 2011-08-31 10:09:01,403 INFO > [RegionServer:0;vesta.apache.org,52257,1314785332968] > regionserver.HRegionServer(709): Waiting on 1 regions to close > 2011-08-31 10:09:01,403 DEBUG > [RegionServer:0;vesta.apache.org,52257,1314785332968] > regionserver.HRegionServer(713): > {74a7a8befdf9561dc1d90c4241afeac7=mrtest,uuu,1314785328546.74a7a8befdf9561dc1d90c4241afeac7.} > 2011-08-31 10:09:01,697 INFO [Master:0;vesta.apache.org:50036] > master.ServerManager(465): Waiting on regionserver(s) to go down > vesta.apache.org,52257,1314785332968 > 2011-08-31 10:09:02,697 INFO [Master:0;vesta.apache.org:50036] > master.ServerManager(465): Waiting on regionserver(s) to go down > vesta.apache.org,52257,1314785332968 > 2011-08-31 10:09:03,008 INFO [vesta.apache.org:50036.timeoutMonitor] > hbase.Chore(79): vesta.apache.org:50036.timeoutMonitor exiting > 2011-08-31 10:09:03,697 INFO [Master:0;vesta.apache.org:50036] > master.ServerManager(465): Waiting on regionserver(s) to go down > vesta.apache.org,52257,1314785332968 > 2011-08-31 10:09:04,697 INFO [Master:0;vesta.apache.org:50036] > master.ServerManager(465): Waiting on regionserver(s) to go down > vesta.apache.org,52257,1314785332968 > 2011-08-31 10:09:05,698 INFO [Master:0;vesta.apache.org:50036] > master.ServerManager(465): Waiting on regionserver(s) to go down > vesta.apache.org,52257,1314785332968 > 2011-08-31 10:09:06,698 INFO [Master:0;vesta.apache.org:50036] > master.ServerManager(465): Waiting on regionserver(s) to go down > vesta.apache.org,52257,1314785332968 > 2011-08-31 10:09:07,698 INFO [Master:0;vesta.apache.org:50036] > master.ServerManager(465): Waiting on regionserver(s) to go down > vesta.apache.org,52257,1314785332968 > {noformat} > [Analysis] > One region was opened during the RS's stopping. > This is method of "HRS#closeAllRegions": > {noformat} > protected void closeAllRegions(final boolean abort) { > closeUserRegions(abort); > ------------------------- > if (meta != null) closeRegion(meta.getRegionInfo(), abort, false); > if (root != null) closeRegion(root.getRegionInfo(), abort, false); > } > {noformat} > HRS#onlineRegions is a ConcurrentHashMap. So walk down this map may not get > all the data if some entries are been added during the traverse. Once one > region was missed, it can't be closed anymore. And this regionserver will not > be stopped normally. Then the following logs occurred: > {noformat} > 2011-08-31 10:09:01,403 INFO > [RegionServer:0;vesta.apache.org,52257,1314785332968] > regionserver.HRegionServer(709): Waiting on 1 regions to close > 2011-08-31 10:09:01,403 DEBUG > [RegionServer:0;vesta.apache.org,52257,1314785332968] > regionserver.HRegionServer(713): > {74a7a8befdf9561dc1d90c4241afeac7=mrtest,uuu,1314785328546.74a7a8befdf9561dc1d90c4241afeac7.} > 2011-08-31 10:09:01,697 INFO [Master:0;vesta.apache.org:50036] > master.ServerManager(465): Waiting on regionserver(s) to go down > vesta.apache.org,52257,1314785332968 > {noformat} -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira