[ 
https://issues.apache.org/jira/browse/HBASE-6389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13479058#comment-13479058
 ] 

Hadoop QA commented on HBASE-6389:
----------------------------------

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12549692/HBASE-6389_trunk_v2.patch
  against trunk revision .

    {color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

    {color:green}+1 tests included{color}.  The patch appears to include 15 new 
or modified tests.

    {color:green}+1 hadoop2.0{color}.  The patch compiles against the hadoop 
2.0 profile.

    {color:red}-1 javadoc{color}.  The javadoc tool appears to have generated 
82 warning messages.

    {color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

    {color:red}-1 findbugs{color}.  The patch appears to introduce 1 new 
Findbugs (version 1.3.9) warnings.

    {color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

    {color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/3076//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/3076//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/3076//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/3076//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/3076//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/3076//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/3076//console

This message is automatically generated.
                
> Modify the conditions to ensure that Master waits for sufficient number of 
> Region Servers before starting region assignments
> ----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-6389
>                 URL: https://issues.apache.org/jira/browse/HBASE-6389
>             Project: HBase
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 0.94.0, 0.96.0
>            Reporter: Aditya Kishore
>            Assignee: Aditya Kishore
>            Priority: Critical
>             Fix For: 0.96.0
>
>         Attachments: HBASE-6389_trunk.patch, HBASE-6389_trunk.patch, 
> HBASE-6389_trunk.patch, HBASE-6389_trunk_v2.patch, 
> org.apache.hadoop.hbase.TestZooKeeper-output.txt, testReplication.jstack
>
>
> Continuing from HBASE-6375.
> It seems I was mistaken in my assumption that changing the value of 
> "hbase.master.wait.on.regionservers.mintostart" to a sufficient number (from 
> default of 1) can help prevent assignment of all regions to one (or a small 
> number of) region server(s).
> While this was the case in 0.90.x and 0.92.x, the behavior changed from 
> 0.94.0 onwards to address HBASE-4993.
> From 0.94.0 onwards, the Master proceeds as soon as the timeout has 
> lapsed, even if "hbase.master.wait.on.regionservers.mintostart" has not 
> been reached.
> The current conditions in waitForRegionServers() make this clear:
> {code:title=ServerManager.java (trunk rev:1360470)}
> ....
>     /**
>      * Wait for the region servers to report in.
>      * We will wait until one of this condition is met:
>      *  - the master is stopped
>      *  - the 'hbase.master.wait.on.regionservers.timeout' is reached
>      *  - the 'hbase.master.wait.on.regionservers.maxtostart' number of
>      *    region servers is reached
>      *  - the 'hbase.master.wait.on.regionservers.mintostart' is reached AND
>      *   there have been no new region server in for
>      *      'hbase.master.wait.on.regionservers.interval' time
>      *
>      * @throws InterruptedException
>      */
>     public void waitForRegionServers(MonitoredTask status)
>     throws InterruptedException {
> ....
> ....
>       while (
>         !this.master.isStopped() &&
>           slept < timeout &&
>           count < maxToStart &&
>           (lastCountChange+interval > now || count < minToStart)
>         ){
> ....
> {code}
> So with the current conditions, the wait ends as soon as the timeout is 
> reached, even if fewer RSes than "mintostart" have checked in with the 
> Master, and the Master proceeds with region assignment among these RSes alone.
> As mentioned in 
> -[HBASE-4993|https://issues.apache.org/jira/browse/HBASE-4993?focusedCommentId=13237196#comment-13237196]-,
>  and I concur, this could have a disastrous effect in a large cluster, 
> especially now that MSLAB is turned on.
> To enforce the required quorum specified by 
> "hbase.master.wait.on.regionservers.mintostart" irrespective of the timeout, 
> these conditions need to be modified as follows:
> {code:title=ServerManager.java}
> ..
>   /**
>    * Wait for the region servers to report in.
>    * We will wait until one of this condition is met:
>    *  - the master is stopped
>    *  - the 'hbase.master.wait.on.regionservers.maxtostart' number of
>    *    region servers is reached
>    *  - the 'hbase.master.wait.on.regionservers.mintostart' is reached AND
>    *   there have been no new region server in for
>    *      'hbase.master.wait.on.regionservers.interval' time AND
>    *   the 'hbase.master.wait.on.regionservers.timeout' is reached
>    *
>    * @throws InterruptedException
>    */
>   public void waitForRegionServers(MonitoredTask status)
> ..
> ..
>     int minToStart = this.master.getConfiguration().
>     getInt("hbase.master.wait.on.regionservers.mintostart", 1);
>     int maxToStart = this.master.getConfiguration().
>     getInt("hbase.master.wait.on.regionservers.maxtostart", Integer.MAX_VALUE);
>     if (maxToStart < minToStart) {
>       maxToStart = minToStart;
>     }
> ..
> ..
>     while (
>       !this.master.isStopped() &&
>         count < maxToStart &&
>         (lastCountChange+interval > now || timeout > slept || count < minToStart)
>       ){
> ..
> {code}
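
To make the difference between the two loop conditions quoted above concrete, here is a minimal, self-contained sketch that evaluates both predicates for one scenario. It is not code from ServerManager.java or from the attached patch: the class name WaitConditionSketch, the two method names, and the sample values are hypothetical and chosen only for illustration. The parameters mirror the local variables used in the quoted snippets.

```java
public class WaitConditionSketch {

    // Loop condition as quoted from trunk rev 1360470:
    // waiting stops as soon as the timeout has elapsed.
    static boolean keepWaitingBefore(boolean stopped, long slept, long timeout,
            int count, int minToStart, int maxToStart,
            long lastCountChange, long interval, long now) {
        return !stopped
            && slept < timeout
            && count < maxToStart
            && (lastCountChange + interval > now || count < minToStart);
    }

    // Loop condition as proposed in the description:
    // the timeout alone can no longer end the wait.
    static boolean keepWaitingAfter(boolean stopped, long slept, long timeout,
            int count, int minToStart, int maxToStart,
            long lastCountChange, long interval, long now) {
        return !stopped
            && count < maxToStart
            && (lastCountChange + interval > now
                || timeout > slept
                || count < minToStart);
    }

    public static void main(String[] args) {
        // Hypothetical scenario (all values are illustrative assumptions):
        // the timeout has elapsed, no new region server has checked in
        // for longer than the interval, and only 1 of the 3 required
        // region servers has reported to the Master.
        boolean stopped = false;
        long timeout = 4500L;
        long interval = 1500L;
        long slept = 5000L;
        long now = 5000L;
        long lastCountChange = 0L;
        int count = 1;
        int minToStart = 3;
        int maxToStart = Integer.MAX_VALUE;

        boolean before = keepWaitingBefore(stopped, slept, timeout, count,
                minToStart, maxToStart, lastCountChange, interval, now);
        boolean after = keepWaitingAfter(stopped, slept, timeout, count,
                minToStart, maxToStart, lastCountChange, interval, now);

        System.out.println("keep waiting (current condition):  " + before);
        System.out.println("keep waiting (proposed condition): " + after);
    }
}
```

In this scenario the current condition evaluates to false, so the Master would start assigning regions to the single region server, while the proposed condition evaluates to true and the wait continues until "mintostart" is satisfied.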

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
