[ 
https://issues.apache.org/jira/browse/HBASE-6375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aditya Kishore updated HBASE-6375:
----------------------------------

    Attachment: HBASE-6375_94.patch

Patch for 0.94 branch
                
> Master may be using a stale list of region servers for creating assignment 
> plan during startup
> ----------------------------------------------------------------------------------------------
>
>                 Key: HBASE-6375
>                 URL: https://issues.apache.org/jira/browse/HBASE-6375
>             Project: HBase
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 0.90.6, 0.92.1, 0.94.0, 0.96.0
>         Environment: All
>            Reporter: Aditya Kishore
>            Assignee: Aditya Kishore
>             Fix For: 0.96.0
>
>         Attachments: HBASE-6375_94.patch, HBASE-6375_trunk.patch
>
>
> While investigating an Out of Memory issue, I made an interesting observation: 
> the master tried to assign all regions to a single region server even though 
> 7 others had already registered with it.
> As the cluster had MSLAB enabled, this resulted in an OOM on the RS when it 
> tried to open all of them.
> *From master's log (edited for brevity):*
> {quote}
> 55,468 Waiting on regionserver(s) to checkin
> 56,968 Waiting on regionserver(s) to checkin
> 58,468 Waiting on regionserver(s) to checkin
> 59,968 Waiting on regionserver(s) to checkin
> 01,242 Registering server=srv109.datacenter,60020,1338673920529,regionCount=0,userLoad=false
> 01,469 Waiting on regionserver(s) count to settle; currently=1
> 02,969 Finished waiting for regionserver count to settle; count=1,sleptFor=46500
> 02,969 Exiting wait on regionserver(s) to checkin; count=1, stopped=false,count of regions out on cluster=0
> 03,010 Processing region \-ROOT\-,,0.70236052 in state M_ZK_REGION_OFFLINE
> 03,220 \-ROOT\- assigned=0, rit=true, location=srv109.datacenter:60020
> 03,221 Processing region .META.,,1.1028785192 in state M_ZK_REGION_OFFLINE
> 03,336 Detected completed assignment of META, notifying catalog tracker
> 03,350 .META. assigned=0, rit=true, location=srv109.datacenter:60020
> 03,350 Master startup proceeding: cluster startup
> 04,006 Registering server=srv111.datacenter,60020,1338673923399,regionCount=0,userLoad=false
> 04,012 Registering server=srv113.datacenter,60020,1338673923532,regionCount=0,userLoad=false
> 04,269 Registering server=srv115.datacenter,60020,1338673923471,regionCount=0,userLoad=false
> 04,363 Registering server=srv117.datacenter,60020,1338673923928,regionCount=0,userLoad=false
> 04,599 Registering server=srv127.datacenter,60020,1338673924067,regionCount=0,userLoad=false
> 04,606 Registering server=srv119.datacenter,60020,1338673923953,regionCount=0,userLoad=false
> 04,804 Registering server=srv129.datacenter,60020,1338673924339,regionCount=0,userLoad=false
> 05,126 Bulk assigning 1252 region(s) across 1 server(s), retainAssignment=true
> 05,546 hd109.datacenter,60020,1338673920529 unassigned znodes=207 of
> {quote}
> *A peek at AssignmentManager code offer some explanation:*
> {code}
>   public void assignAllUserRegions() throws IOException, InterruptedException {
>     // Get all available servers
>     List<HServerInfo> servers = serverManager.getOnlineServersList();
>     // Scan META for all user regions, skipping any disabled tables
>     Map<HRegionInfo, HServerAddress> allRegions =
>       MetaReader.fullScan(catalogTracker, this.zkTable.getDisabledTables(), true);
>     if (allRegions == null || allRegions.isEmpty()) return;
>     // Determine what type of assignment to do on startup
>     boolean retainAssignment = master.getConfiguration().
>       getBoolean("hbase.master.startup.retainassign", true);
>     Map<HServerInfo, List<HRegionInfo>> bulkPlan = null;
>     if (retainAssignment) {
>       // Reuse existing assignment info
>       bulkPlan = LoadBalancer.retainAssignment(allRegions, servers);
>     } else {
>       // assign regions in round-robin fashion
>       bulkPlan = LoadBalancer.roundRobinAssignment(
>         new ArrayList<HRegionInfo>(allRegions.keySet()), servers);
>     }
>     LOG.info("Bulk assigning " + allRegions.size() + " region(s) across " +
>       servers.size() + " server(s), retainAssignment=" + retainAssignment);
>     ...
> {code}
> In assignAllUserRegions(), listed above, the AM fetches the server list from 
> ServerManager long before it actually uses it to create the assignment plan.
> In between, it performs a full scan of META to build the assignment map of 
> regions. So even if additional RSes have registered in the meantime (as 
> happened in this case), the AM still holds the old list of just one server.
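The race described above can be sketched in isolation. This is a hypothetical stand-in, not HBase code: the server names and list types are illustrative, and the "META scan" is simulated by the registrations that happen between the early snapshot and its use.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of the stale-snapshot race (not HBase internals):
// a server list captured before a long-running step goes stale if more
// servers register while that step runs.
public class StaleServerList {
    public static void main(String[] args) {
        List<String> online = new ArrayList<>();
        online.add("srv109");                    // only one RS has checked in so far

        // Buggy ordering: snapshot the list first, long before it is used
        List<String> planServers = new ArrayList<>(online);

        // ...the long META full scan would run here; meanwhile 7 more RSes register...
        for (String s : new String[]{"srv111", "srv113", "srv115", "srv117",
                                     "srv119", "srv127", "srv129"}) {
            online.add(s);
        }

        // The bulk plan is then built against the stale one-server snapshot
        System.out.println("plan built for " + planServers.size()
            + " server(s); actually online: " + online.size());

        // Fixed ordering: fetch the list only at the point of first use
        List<String> fixedServers = new ArrayList<>(online);
        System.out.println("after fix, plan would span "
            + fixedServers.size() + " server(s)");
    }
}
```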
> This code snippet is from 0.90.6 but the same issue exists in 0.92, 0.94 and 
> trunk. Since MSLAB is enabled by default from 0.92 onwards, any large cluster 
> can hit this issue upon cluster start-up when the following sequence holds 
> true:
> # The Master starts long before the RSes (by default, "long" ~= 4.5 seconds).
> # All the RSes start together but one wins the race of registering with the 
> Master by a few seconds.
> I am attaching a patch for trunk which moves the code that fetches the RS 
> list from the beginning of the function to the point where it is first used.
> Apart from this change, one other HBase setting that now becomes important is 
> "hbase.master.wait.on.regionservers.mintostart", due to MSLAB being enabled 
> by default.
> Large clusters which keep MSLAB enabled must now raise 
> "hbase.master.wait.on.regionservers.mintostart" from its default of 1 to a 
> number that ensures the master waits for a quorum of RSes sufficient to open 
> all the regions among themselves. I'll create a separate JIRA for the 
> documentation change.
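For reference, the mintostart setting discussed above would be raised in hbase-site.xml along these lines; the value 7 is purely illustrative (e.g. most of the 8-node cluster from the log), not a recommendation from the patch:

```xml
<!-- hbase-site.xml: make the master wait for enough RSes to check in
     before building the startup assignment plan (default is 1) -->
<property>
  <name>hbase.master.wait.on.regionservers.mintostart</name>
  <value>7</value>
</property>
```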

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

