[ https://issues.apache.org/jira/browse/HBASE-6375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13413376#comment-13413376 ]
Lars Hofhansl commented on HBASE-6375:
--------------------------------------

@Aditya: Did you create the issue you mentioned above?

> Master may be using a stale list of region servers for creating assignment plan during startup
> ----------------------------------------------------------------------------------------------
>
>                 Key: HBASE-6375
>                 URL: https://issues.apache.org/jira/browse/HBASE-6375
>             Project: HBase
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 0.90.6, 0.92.1, 0.94.0, 0.96.0
>         Environment: All
>            Reporter: Aditya Kishore
>            Assignee: Aditya Kishore
>             Fix For: 0.96.0, 0.94.1
>
>         Attachments: HBASE-6375_94.patch, HBASE-6375_trunk.patch
>
>
> While investigating an out-of-memory issue, I made an interesting observation: the master tried to assign all regions to a single region server even though 7 others had already registered with it.
> As the cluster had MSLAB enabled, this resulted in an OOM on the RS when it tried to open all of them.
> *From master's log (edited for brevity):*
> {quote}
> 55,468 Waiting on regionserver(s) to checkin
> 56,968 Waiting on regionserver(s) to checkin
> 58,468 Waiting on regionserver(s) to checkin
> 59,968 Waiting on regionserver(s) to checkin
> 01,242 Registering server=srv109.datacenter,60020,1338673920529, regionCount=0, userLoad=false
> 01,469 Waiting on regionserver(s) count to settle; currently=1
> 02,969 Finished waiting for regionserver count to settle; count=1, sleptFor=46500
> 02,969 Exiting wait on regionserver(s) to checkin; count=1, stopped=false, count of regions out on cluster=0
> 03,010 Processing region \-ROOT\-,,0.70236052 in state M_ZK_REGION_OFFLINE
> 03,220 \-ROOT\- assigned=0, rit=true, location=srv109.datacenter:60020
> 03,221 Processing region .META.,,1.1028785192 in state M_ZK_REGION_OFFLINE
> 03,336 Detected completed assignment of META, notifying catalog tracker
> 03,350 .META. assigned=0, rit=true, location=srv109.datacenter:60020
> 03,350 Master startup proceeding: cluster startup
> 04,006 Registering server=srv111.datacenter,60020,1338673923399, regionCount=0, userLoad=false
> 04,012 Registering server=srv113.datacenter,60020,1338673923532, regionCount=0, userLoad=false
> 04,269 Registering server=srv115.datacenter,60020,1338673923471, regionCount=0, userLoad=false
> 04,363 Registering server=srv117.datacenter,60020,1338673923928, regionCount=0, userLoad=false
> 04,599 Registering server=srv127.datacenter,60020,1338673924067, regionCount=0, userLoad=false
> 04,606 Registering server=srv119.datacenter,60020,1338673923953, regionCount=0, userLoad=false
> 04,804 Registering server=srv129.datacenter,60020,1338673924339, regionCount=0, userLoad=false
> 05,126 Bulk assigning 1252 region(s) across 1 server(s), retainAssignment=true
> 05,546 hd109.datacenter,60020,1338673920529 unassigned znodes=207 of
> {quote}
> *A peek at the AssignmentManager code offers some explanation:*
> {code}
> public void assignAllUserRegions() throws IOException, InterruptedException {
>   // Get all available servers
>   List<HServerInfo> servers = serverManager.getOnlineServersList();
>
>   // Scan META for all user regions, skipping any disabled tables
>   Map<HRegionInfo, HServerAddress> allRegions =
>     MetaReader.fullScan(catalogTracker, this.zkTable.getDisabledTables(), true);
>   if (allRegions == null || allRegions.isEmpty()) return;
>
>   // Determine what type of assignment to do on startup
>   boolean retainAssignment = master.getConfiguration().
>     getBoolean("hbase.master.startup.retainassign", true);
>
>   Map<HServerInfo, List<HRegionInfo>> bulkPlan = null;
>   if (retainAssignment) {
>     // Reuse existing assignment info
>     bulkPlan = LoadBalancer.retainAssignment(allRegions, servers);
>   } else {
>     // assign regions in round-robin fashion
>     bulkPlan = LoadBalancer.roundRobinAssignment(
>       new ArrayList<HRegionInfo>(allRegions.keySet()), servers);
>   }
>   LOG.info("Bulk assigning " + allRegions.size() + " region(s) across " +
>     servers.size() + " server(s), retainAssignment=" + retainAssignment);
>   ...
> {code}
> In the function assignAllUserRegions(), listed above, the AM fetches the server list from ServerManager long before it actually uses it to create the assignment plan.
> In between, it performs a full scan of META to create an assignment map of regions. So even if additional RSes have registered in the meantime (as happened in this case), the AM still has the old list of just one server.
> This code snippet is from 0.90.6, but the same issue exists in 0.92, 0.94, and trunk. Since MSLAB is enabled by default from 0.92 onwards, any large cluster can hit this issue upon cluster start-up when the following sequence holds true:
> # The master starts long before the RSes (by default, "long" ~= 4.5 seconds).
> # All the RSes start together, but one wins the race of registering with the master by a few seconds.
> I am attaching a patch for trunk which moves the code that fetches the RS list from the beginning of the function to where it is first used.
> Apart from this change, one other HBase setting now becomes important due to MSLAB being enabled by default: "hbase.master.wait.on.regionservers.mintostart".
> Large clusters, which now keep MSLAB enabled, must set "hbase.master.wait.on.regionservers.mintostart" to a number higher than the default of 1, to ensure that the master waits for a quorum of RSes sufficient to open all the regions among themselves.
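> The race is easy to see in miniature. The following self-contained sketch (class and method names are illustrative, not HBase APIs) snapshots the server list before a simulated slow META scan, during which more servers register, and shows how the stale snapshot pins every region to the one early registrant while a deferred lookup spreads them:
> {code}
> import java.util.ArrayList;
> import java.util.HashMap;
> import java.util.List;
> import java.util.Map;
>
> // Minimal model of the startup race: servers register while the
> // master is busy scanning META. Names are illustrative, not HBase APIs.
> public class StaleServerListDemo {
>   static List<String> online = new ArrayList<>();
>
>   // Round-robin plan over whatever server list we are handed.
>   static Map<String, List<Integer>> plan(List<String> servers, int regions) {
>     Map<String, List<Integer>> bulkPlan = new HashMap<>();
>     for (String s : servers) bulkPlan.put(s, new ArrayList<>());
>     for (int r = 0; r < regions; r++) {
>       bulkPlan.get(servers.get(r % servers.size())).add(r);
>     }
>     return bulkPlan;
>   }
>
>   public static void main(String[] args) {
>     online.add("srv109");                       // first RS checks in
>
>     // Buggy order: snapshot the list, THEN do the slow META scan.
>     List<String> stale = new ArrayList<>(online);
>     for (String s : new String[] {"srv111", "srv113", "srv115"}) {
>       online.add(s);                            // late RSes register mid-scan
>     }
>     Map<String, List<Integer>> bad = plan(stale, 1252);
>     System.out.println("stale plan servers=" + bad.size());   // 1
>
>     // Fixed order: fetch the list only when it is first used.
>     Map<String, List<Integer>> good = plan(new ArrayList<>(online), 1252);
>     System.out.println("fresh plan servers=" + good.size());  // 4
>   }
> }
> {code}
> This mirrors the attached patch's approach of moving the getOnlineServersList() call down to the point of first use, so the plan reflects every RS that registered during the META scan.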
> I'll create a separate JIRA for the documentation change.

--
This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira