[ 
https://issues.apache.org/jira/browse/HBASE-4341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13098642#comment-13098642
 ] 

stack commented on HBASE-4341:
------------------------------

The above analysis makes sense to me.  You have a patch Jieshan?

> HRS#closeAllRegions should take care of HRS#onlineRegions's weak consistency
> ----------------------------------------------------------------------------
>
>                 Key: HBASE-4341
>                 URL: https://issues.apache.org/jira/browse/HBASE-4341
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 0.90.4
>            Reporter: Jieshan Bean
>            Assignee: Jieshan Bean
>             Fix For: 0.90.5
>
>
> This is why the build "https://builds.apache.org/job/hbase-0.90/282" failed. 
> In this test run, one case timed out, which caused the whole test process 
> to be killed.
> [Logs]
> Here are the related logs (from 
> org.apache.hadoop.hbase.mapreduce.TestTableMapReduce-output.txt):
> {noformat}
> 2011-08-31 10:09:01,089 INFO  
> [RegionServer:0;vesta.apache.org,52257,1314785332968.leaseChecker] 
> regionserver.Leases(124): 
> RegionServer:0;vesta.apache.org,52257,1314785332968.leaseChecker closing 
> leases
> 2011-08-31 10:09:01,089 INFO  
> [RegionServer:0;vesta.apache.org,52257,1314785332968.leaseChecker] 
> regionserver.Leases(131): 
> RegionServer:0;vesta.apache.org,52257,1314785332968.leaseChecker closed leases
> 2011-08-31 10:09:01,403 INFO  
> [RegionServer:0;vesta.apache.org,52257,1314785332968] 
> regionserver.HRegionServer(709): Waiting on 1 regions to close
> 2011-08-31 10:09:01,403 DEBUG 
> [RegionServer:0;vesta.apache.org,52257,1314785332968] 
> regionserver.HRegionServer(713): 
> {74a7a8befdf9561dc1d90c4241afeac7=mrtest,uuu,1314785328546.74a7a8befdf9561dc1d90c4241afeac7.}
> 2011-08-31 10:09:01,697 INFO  [Master:0;vesta.apache.org:50036] 
> master.ServerManager(465): Waiting on regionserver(s) to go down 
> vesta.apache.org,52257,1314785332968
> 2011-08-31 10:09:02,697 INFO  [Master:0;vesta.apache.org:50036] 
> master.ServerManager(465): Waiting on regionserver(s) to go down 
> vesta.apache.org,52257,1314785332968
> 2011-08-31 10:09:03,008 INFO  [vesta.apache.org:50036.timeoutMonitor] 
> hbase.Chore(79): vesta.apache.org:50036.timeoutMonitor exiting
> 2011-08-31 10:09:03,697 INFO  [Master:0;vesta.apache.org:50036] 
> master.ServerManager(465): Waiting on regionserver(s) to go down 
> vesta.apache.org,52257,1314785332968
> 2011-08-31 10:09:04,697 INFO  [Master:0;vesta.apache.org:50036] 
> master.ServerManager(465): Waiting on regionserver(s) to go down 
> vesta.apache.org,52257,1314785332968
> 2011-08-31 10:09:05,698 INFO  [Master:0;vesta.apache.org:50036] 
> master.ServerManager(465): Waiting on regionserver(s) to go down 
> vesta.apache.org,52257,1314785332968
> 2011-08-31 10:09:06,698 INFO  [Master:0;vesta.apache.org:50036] 
> master.ServerManager(465): Waiting on regionserver(s) to go down 
> vesta.apache.org,52257,1314785332968
> 2011-08-31 10:09:07,698 INFO  [Master:0;vesta.apache.org:50036] 
> master.ServerManager(465): Waiting on regionserver(s) to go down 
> vesta.apache.org,52257,1314785332968
> {noformat}
> [Analysis]
> One region was opened while the RS was stopping. 
> This is the method "HRS#closeAllRegions":
> {noformat}
>   protected void closeAllRegions(final boolean abort) {
>     closeUserRegions(abort);
>     -------------------------
>     if (meta != null) closeRegion(meta.getRegionInfo(), abort, false);
>     if (root != null) closeRegion(root.getRegionInfo(), abort, false);
>   }
> {noformat}
> HRS#onlineRegions is a ConcurrentHashMap, whose iterators are weakly 
> consistent: a traversal of the map may miss entries that are added while 
> the traversal is in progress. Once a region is missed this way, it is never 
> closed, so the regionserver cannot stop normally. The following logs then 
> appear:
> {noformat}
> 2011-08-31 10:09:01,403 INFO  
> [RegionServer:0;vesta.apache.org,52257,1314785332968] 
> regionserver.HRegionServer(709): Waiting on 1 regions to close
> 2011-08-31 10:09:01,403 DEBUG 
> [RegionServer:0;vesta.apache.org,52257,1314785332968] 
> regionserver.HRegionServer(713): 
> {74a7a8befdf9561dc1d90c4241afeac7=mrtest,uuu,1314785328546.74a7a8befdf9561dc1d90c4241afeac7.}
> 2011-08-31 10:09:01,697 INFO  [Master:0;vesta.apache.org:50036] 
> master.ServerManager(465): Waiting on regionserver(s) to go down 
> vesta.apache.org,52257,1314785332968
> {noformat}
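
The weak-consistency point in the analysis above can be reproduced outside HBase. The following is only a minimal sketch, not the patch for this issue: the class WeakConsistencyDemo, its methods, and the String-valued map are hypothetical stand-ins for HRS#onlineRegions and the real close logic. It performs one traversal that "closes" every region it sees while another region is added mid-traversal, and then shows one possible mitigation, which is to repeat the traversal until the map is empty.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class WeakConsistencyDemo {
    // Hypothetical stand-in for HRS#onlineRegions: region name -> description.
    static final Map<String, String> onlineRegions = new ConcurrentHashMap<>();

    // One traversal that removes ("closes") every region the iterator yields.
    // If lateRegion is non-null, it is added during the traversal to simulate
    // a region being opened while the server is shutting down.
    static int closeOnePass(String lateRegion) {
        int closed = 0;
        boolean opened = false;
        for (String name : onlineRegions.keySet()) {
            if (!opened && lateRegion != null) {
                onlineRegions.put(lateRegion, "opened during shutdown");
                opened = true;
            }
            // Removing while iterating is safe on ConcurrentHashMap:
            // its iterators never throw ConcurrentModificationException.
            if (onlineRegions.remove(name) != null) {
                closed++;
            }
        }
        return closed;
    }

    // Assumed mitigation: keep traversing until nothing is left, so an entry
    // missed by one weakly consistent traversal is picked up by the next.
    static int closeUntilEmpty(String lateRegion) {
        int passes = 0;
        String pending = lateRegion;
        while (!onlineRegions.isEmpty()) {
            closeOnePass(pending);
            pending = null; // the late region is only opened once
            passes++;
        }
        return passes;
    }

    public static void main(String[] args) {
        onlineRegions.put("region-a", "user region");
        onlineRegions.put("region-b", "user region");

        // A single traversal may or may not see the late region; the iterator
        // gives no guarantee either way.
        closeOnePass("region-late");
        System.out.println("left after one pass: " + onlineRegions.keySet());

        int passes = closeUntilEmpty(null);
        System.out.println("extra passes needed: " + passes);
        System.out.println("left after retry loop: " + onlineRegions.keySet());
    }
}
```

Whether the single traversal misses the late region depends on where the new key lands relative to the iterator's position, so the first printed line can differ between runs and JVMs. The retry loop only terminates here because no further regions are opened after the first pass; a real server would presumably also have to refuse new region opens once it starts stopping.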

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
