[jira] [Updated] (HBASE-6389) Modify the conditions to ensure that Master waits for sufficient number of Region Servers before starting region assignments

2012-11-03 Thread Lars Hofhansl (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HBASE-6389:
-

Resolution: Fixed
Status: Resolved  (was: Patch Available)

Committed to 0.94 as well.

> Modify the conditions to ensure that Master waits for sufficient number of 
> Region Servers before starting region assignments
> 
>
> Key: HBASE-6389
> URL: https://issues.apache.org/jira/browse/HBASE-6389
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Affects Versions: 0.94.0, 0.96.0
>Reporter: Aditya Kishore
>Assignee: Aditya Kishore
>Priority: Critical
> Fix For: 0.94.3, 0.96.0
>
> Attachments: HBASE-6389_0.94.patch, HBASE-6389_trunk.patch, 
> HBASE-6389_trunk.patch, HBASE-6389_trunk.patch, HBASE-6389_trunk_v2.patch, 
> HBASE-6389_trunk_v2.patch, org.apache.hadoop.hbase.TestZooKeeper-output.txt, 
> testReplication.jstack
>
>
> Continuing from HBASE-6375.
> It seems I was mistaken in my assumption that changing the value of 
> "hbase.master.wait.on.regionservers.mintostart" to a sufficient number (from 
> default of 1) can help prevent assignment of all regions to one (or a small 
> number of) region server(s).
> While this was the case in 0.90.x and 0.92.x, the behavior has changed in 
> 0.94.0 onwards to address HBASE-4993.
> From 0.94.0 onwards, Master will proceed immediately after the timeout has 
> lapsed, even if "hbase.master.wait.on.regionservers.mintostart" has not 
> reached.
> Reading the current conditions of waitForRegionServers() clarifies it
> {code:title=ServerManager.java (trunk rev:1360470)}
> 
> 581 /**
> 582  * Wait for the region servers to report in.
> 583  * We will wait until one of this condition is met:
> 584  *  - the master is stopped
> 585  *  - the 'hbase.master.wait.on.regionservers.timeout' is reached
> 586  *  - the 'hbase.master.wait.on.regionservers.maxtostart' number of
> 587  *region servers is reached
> 588  *  - the 'hbase.master.wait.on.regionservers.mintostart' is reached 
> AND
> 589  *   there have been no new region server in for
> 590  *  'hbase.master.wait.on.regionservers.interval' time
> 591  *
> 592  * @throws InterruptedException
> 593  */
> 594 public void waitForRegionServers(MonitoredTask status)
> 595 throws InterruptedException {
> 
> 
> 612   while (
> 613 !this.master.isStopped() &&
> 614   slept < timeout &&
> 615   count < maxToStart &&
> 616   (lastCountChange+interval > now || count < minToStart)
> 617 ){
> 
> {code}
> So with the current conditions, the wait will end as soon as timeout is 
> reached even lesser number of RS have checked-in with the Master and the 
> master will proceed with the region assignment among these RSes alone.
> As mentioned in 
> -[HBASE-4993|https://issues.apache.org/jira/browse/HBASE-4993?focusedCommentId=13237196#comment-13237196]-,
>  and I concur, this could have disastrous effect in large cluster especially 
> now that MSLAB is turned on.
> To enforce the required quorum as specified by 
> "hbase.master.wait.on.regionservers.mintostart" irrespective of timeout, 
> these conditions need to be modified as following
> {code:title=ServerManager.java}
> ..
>   /**
>* Wait for the region servers to report in.
>* We will wait until one of this condition is met:
>*  - the master is stopped
>*  - the 'hbase.master.wait.on.regionservers.maxtostart' number of
>*region servers is reached
>*  - the 'hbase.master.wait.on.regionservers.mintostart' is reached AND
>*   there have been no new region server in for
>*  'hbase.master.wait.on.regionservers.interval' time AND
>*   the 'hbase.master.wait.on.regionservers.timeout' is reached
>*
>* @throws InterruptedException
>*/
>   public void waitForRegionServers(MonitoredTask status)
> ..
> ..
> int minToStart = this.master.getConfiguration().
> getInt("hbase.master.wait.on.regionservers.mintostart", 1);
> int maxToStart = this.master.getConfiguration().
> getInt("hbase.master.wait.on.regionservers.maxtostart", 
> Integer.MAX_VALUE);
> if (maxToStart < minToStart) {
>   maxToStart = minToStart;
> }
> ..
> ..
> while (
>   !this.master.isStopped() &&
> count < maxToStart &&
> (lastCountChange+interval > now || timeout > slept || count < 
> minToStart)
>   ){
> ..
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-6389) Modify the conditions to ensure that Master waits for sufficient number of Region Servers before starting region assignments

2012-11-02 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-6389:
-

Release Note: 
Reverts the cluster startup behavior to pre 0.94.0.

With this, Master will wait until 
"hbase.master.wait.on.regionservers.mintostart" number of Region Servers have 
registered with it before it starts region assignment. The default value of 
this setting is 1.

In large clusters with thousands of regions you may want to increase this to a 
higher number which is sufficient to handle the task of opening those region in 
parallel.

If left to the default, at times, the Master could assign all regions to a 
single Region Server which will result in slow startup and in worst case could 
OOM the Region Server (some time resulting in META inconsistency).

Here is how it works now (from the javadoc):

We wait until one of these condition is met:
 - the master is stopped
 - the 'hbase.master.wait.on.regionservers.maxtostart' number of region servers 
is reached
 - the 'hbase.master.wait.on.regionservers.mintostart' is reached AND there 
have been no new region server in for 
'hbase.master.wait.on.regionservers.interval' time AND the 
'hbase.master.wait.on.regionservers.timeout' is reached

  was:
Reverts the cluster startup behavior to pre 0.94.0.

With this, Master will wait until 
"hbase.master.wait.on.regionservers.mintostart" number of Region Servers have 
registered with it before it starts region assignment. The default value of 
this setting is 1.

In large clusters with thousands of regions you may want to increase this to a 
higher number which is sufficient to handle the task of opening those region in 
parallel.

If left to the default, at times, the Master could assign all regions to a 
single Region Server which will result in slow startup and in worst case could 
OOM the Region Server (some time resulting in META inconsistency).

Hadoop Flags: Incompatible change,Reviewed

Committed to trunk.  Thanks for the clarifications Aditya (and the fix).  Lars, 
you want this for 0.94.3?  I marked it as an 'incompatible change' but this is 
a bug fix, and the behavior now more rational.

> Modify the conditions to ensure that Master waits for sufficient number of 
> Region Servers before starting region assignments
> 
>
> Key: HBASE-6389
> URL: https://issues.apache.org/jira/browse/HBASE-6389
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Affects Versions: 0.94.0, 0.96.0
>Reporter: Aditya Kishore
>Assignee: Aditya Kishore
>Priority: Critical
> Fix For: 0.94.3, 0.96.0
>
> Attachments: HBASE-6389_0.94.patch, HBASE-6389_trunk.patch, 
> HBASE-6389_trunk.patch, HBASE-6389_trunk.patch, HBASE-6389_trunk_v2.patch, 
> HBASE-6389_trunk_v2.patch, org.apache.hadoop.hbase.TestZooKeeper-output.txt, 
> testReplication.jstack
>
>
> Continuing from HBASE-6375.
> It seems I was mistaken in my assumption that changing the value of 
> "hbase.master.wait.on.regionservers.mintostart" to a sufficient number (from 
> default of 1) can help prevent assignment of all regions to one (or a small 
> number of) region server(s).
> While this was the case in 0.90.x and 0.92.x, the behavior has changed in 
> 0.94.0 onwards to address HBASE-4993.
> From 0.94.0 onwards, Master will proceed immediately after the timeout has 
> lapsed, even if "hbase.master.wait.on.regionservers.mintostart" has not 
> reached.
> Reading the current conditions of waitForRegionServers() clarifies it
> {code:title=ServerManager.java (trunk rev:1360470)}
> 
> 581 /**
> 582  * Wait for the region servers to report in.
> 583  * We will wait until one of this condition is met:
> 584  *  - the master is stopped
> 585  *  - the 'hbase.master.wait.on.regionservers.timeout' is reached
> 586  *  - the 'hbase.master.wait.on.regionservers.maxtostart' number of
> 587  *region servers is reached
> 588  *  - the 'hbase.master.wait.on.regionservers.mintostart' is reached 
> AND
> 589  *   there have been no new region server in for
> 590  *  'hbase.master.wait.on.regionservers.interval' time
> 591  *
> 592  * @throws InterruptedException
> 593  */
> 594 public void waitForRegionServers(MonitoredTask status)
> 595 throws InterruptedException {
> 
> 
> 612   while (
> 613 !this.master.isStopped() &&
> 614   slept < timeout &&
> 615   count < maxToStart &&
> 616   (lastCountChange+interval > now || count < minToStart)
> 617 ){
> 
> {code}
> So with the current conditions, the wait will end as soon as timeout is 
> reached even lesser number of RS have checked-in with the Master and the 
>

[jira] [Updated] (HBASE-6389) Modify the conditions to ensure that Master waits for sufficient number of Region Servers before starting region assignments

2012-10-31 Thread Aditya Kishore (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aditya Kishore updated HBASE-6389:
--

Fix Version/s: 0.94.3

> Modify the conditions to ensure that Master waits for sufficient number of 
> Region Servers before starting region assignments
> 
>
> Key: HBASE-6389
> URL: https://issues.apache.org/jira/browse/HBASE-6389
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Affects Versions: 0.94.0, 0.96.0
>Reporter: Aditya Kishore
>Assignee: Aditya Kishore
>Priority: Critical
> Fix For: 0.94.3, 0.96.0
>
> Attachments: HBASE-6389_0.94.patch, HBASE-6389_trunk.patch, 
> HBASE-6389_trunk.patch, HBASE-6389_trunk.patch, HBASE-6389_trunk_v2.patch, 
> HBASE-6389_trunk_v2.patch, org.apache.hadoop.hbase.TestZooKeeper-output.txt, 
> testReplication.jstack
>
>
> Continuing from HBASE-6375.
> It seems I was mistaken in my assumption that changing the value of 
> "hbase.master.wait.on.regionservers.mintostart" to a sufficient number (from 
> default of 1) can help prevent assignment of all regions to one (or a small 
> number of) region server(s).
> While this was the case in 0.90.x and 0.92.x, the behavior has changed in 
> 0.94.0 onwards to address HBASE-4993.
> From 0.94.0 onwards, Master will proceed immediately after the timeout has 
> lapsed, even if "hbase.master.wait.on.regionservers.mintostart" has not 
> reached.
> Reading the current conditions of waitForRegionServers() clarifies it
> {code:title=ServerManager.java (trunk rev:1360470)}
> 
> 581 /**
> 582  * Wait for the region servers to report in.
> 583  * We will wait until one of this condition is met:
> 584  *  - the master is stopped
> 585  *  - the 'hbase.master.wait.on.regionservers.timeout' is reached
> 586  *  - the 'hbase.master.wait.on.regionservers.maxtostart' number of
> 587  *region servers is reached
> 588  *  - the 'hbase.master.wait.on.regionservers.mintostart' is reached 
> AND
> 589  *   there have been no new region server in for
> 590  *  'hbase.master.wait.on.regionservers.interval' time
> 591  *
> 592  * @throws InterruptedException
> 593  */
> 594 public void waitForRegionServers(MonitoredTask status)
> 595 throws InterruptedException {
> 
> 
> 612   while (
> 613 !this.master.isStopped() &&
> 614   slept < timeout &&
> 615   count < maxToStart &&
> 616   (lastCountChange+interval > now || count < minToStart)
> 617 ){
> 
> {code}
> So with the current conditions, the wait will end as soon as timeout is 
> reached even lesser number of RS have checked-in with the Master and the 
> master will proceed with the region assignment among these RSes alone.
> As mentioned in 
> -[HBASE-4993|https://issues.apache.org/jira/browse/HBASE-4993?focusedCommentId=13237196#comment-13237196]-,
>  and I concur, this could have disastrous effect in large cluster especially 
> now that MSLAB is turned on.
> To enforce the required quorum as specified by 
> "hbase.master.wait.on.regionservers.mintostart" irrespective of timeout, 
> these conditions need to be modified as following
> {code:title=ServerManager.java}
> ..
>   /**
>* Wait for the region servers to report in.
>* We will wait until one of this condition is met:
>*  - the master is stopped
>*  - the 'hbase.master.wait.on.regionservers.maxtostart' number of
>*region servers is reached
>*  - the 'hbase.master.wait.on.regionservers.mintostart' is reached AND
>*   there have been no new region server in for
>*  'hbase.master.wait.on.regionservers.interval' time AND
>*   the 'hbase.master.wait.on.regionservers.timeout' is reached
>*
>* @throws InterruptedException
>*/
>   public void waitForRegionServers(MonitoredTask status)
> ..
> ..
> int minToStart = this.master.getConfiguration().
> getInt("hbase.master.wait.on.regionservers.mintostart", 1);
> int maxToStart = this.master.getConfiguration().
> getInt("hbase.master.wait.on.regionservers.maxtostart", 
> Integer.MAX_VALUE);
> if (maxToStart < minToStart) {
>   maxToStart = minToStart;
> }
> ..
> ..
> while (
>   !this.master.isStopped() &&
> count < maxToStart &&
> (lastCountChange+interval > now || timeout > slept || count < 
> minToStart)
>   ){
> ..
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-6389) Modify the conditions to ensure that Master waits for sufficient number of Region Servers before starting region assignments

2012-10-31 Thread Aditya Kishore (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aditya Kishore updated HBASE-6389:
--

Attachment: HBASE-6389_0.94.patch

Attaching patch for 0.94 branch. The patch passes full test suit on my test 
machine.

> Modify the conditions to ensure that Master waits for sufficient number of 
> Region Servers before starting region assignments
> 
>
> Key: HBASE-6389
> URL: https://issues.apache.org/jira/browse/HBASE-6389
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Affects Versions: 0.94.0, 0.96.0
>Reporter: Aditya Kishore
>Assignee: Aditya Kishore
>Priority: Critical
> Fix For: 0.94.3, 0.96.0
>
> Attachments: HBASE-6389_0.94.patch, HBASE-6389_trunk.patch, 
> HBASE-6389_trunk.patch, HBASE-6389_trunk.patch, HBASE-6389_trunk_v2.patch, 
> HBASE-6389_trunk_v2.patch, org.apache.hadoop.hbase.TestZooKeeper-output.txt, 
> testReplication.jstack
>
>
> Continuing from HBASE-6375.
> It seems I was mistaken in my assumption that changing the value of 
> "hbase.master.wait.on.regionservers.mintostart" to a sufficient number (from 
> default of 1) can help prevent assignment of all regions to one (or a small 
> number of) region server(s).
> While this was the case in 0.90.x and 0.92.x, the behavior has changed in 
> 0.94.0 onwards to address HBASE-4993.
> From 0.94.0 onwards, Master will proceed immediately after the timeout has 
> lapsed, even if "hbase.master.wait.on.regionservers.mintostart" has not 
> reached.
> Reading the current conditions of waitForRegionServers() clarifies it
> {code:title=ServerManager.java (trunk rev:1360470)}
> 
> 581 /**
> 582  * Wait for the region servers to report in.
> 583  * We will wait until one of this condition is met:
> 584  *  - the master is stopped
> 585  *  - the 'hbase.master.wait.on.regionservers.timeout' is reached
> 586  *  - the 'hbase.master.wait.on.regionservers.maxtostart' number of
> 587  *region servers is reached
> 588  *  - the 'hbase.master.wait.on.regionservers.mintostart' is reached 
> AND
> 589  *   there have been no new region server in for
> 590  *  'hbase.master.wait.on.regionservers.interval' time
> 591  *
> 592  * @throws InterruptedException
> 593  */
> 594 public void waitForRegionServers(MonitoredTask status)
> 595 throws InterruptedException {
> 
> 
> 612   while (
> 613 !this.master.isStopped() &&
> 614   slept < timeout &&
> 615   count < maxToStart &&
> 616   (lastCountChange+interval > now || count < minToStart)
> 617 ){
> 
> {code}
> So with the current conditions, the wait will end as soon as timeout is 
> reached even lesser number of RS have checked-in with the Master and the 
> master will proceed with the region assignment among these RSes alone.
> As mentioned in 
> -[HBASE-4993|https://issues.apache.org/jira/browse/HBASE-4993?focusedCommentId=13237196#comment-13237196]-,
>  and I concur, this could have disastrous effect in large cluster especially 
> now that MSLAB is turned on.
> To enforce the required quorum as specified by 
> "hbase.master.wait.on.regionservers.mintostart" irrespective of timeout, 
> these conditions need to be modified as following
> {code:title=ServerManager.java}
> ..
>   /**
>* Wait for the region servers to report in.
>* We will wait until one of this condition is met:
>*  - the master is stopped
>*  - the 'hbase.master.wait.on.regionservers.maxtostart' number of
>*region servers is reached
>*  - the 'hbase.master.wait.on.regionservers.mintostart' is reached AND
>*   there have been no new region server in for
>*  'hbase.master.wait.on.regionservers.interval' time AND
>*   the 'hbase.master.wait.on.regionservers.timeout' is reached
>*
>* @throws InterruptedException
>*/
>   public void waitForRegionServers(MonitoredTask status)
> ..
> ..
> int minToStart = this.master.getConfiguration().
> getInt("hbase.master.wait.on.regionservers.mintostart", 1);
> int maxToStart = this.master.getConfiguration().
> getInt("hbase.master.wait.on.regionservers.maxtostart", 
> Integer.MAX_VALUE);
> if (maxToStart < minToStart) {
>   maxToStart = minToStart;
> }
> ..
> ..
> while (
>   !this.master.isStopped() &&
> count < maxToStart &&
> (lastCountChange+interval > now || timeout > slept || count < 
> minToStart)
>   ){
> ..
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.co

[jira] [Updated] (HBASE-6389) Modify the conditions to ensure that Master waits for sufficient number of Region Servers before starting region assignments

2012-10-19 Thread Aditya Kishore (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aditya Kishore updated HBASE-6389:
--

Release Note: 
Reverts the cluster startup behavior to pre 0.94.0.

With this, Master will wait until 
"hbase.master.wait.on.regionservers.mintostart" number of Region Servers have 
registered with it before it starts region assignment. The default value of 
this setting is 1.

In large clusters with thousands of regions you may want to increase this to a 
higher number which is sufficient to handle the task of opening those region in 
parallel.

If left to the default, at times, the Master could assign all regions to a 
single Region Server which will result in slow startup and in worst case could 
OOM the Region Server (some time resulting in META inconsistency).

  was:
Reverts the cluster startup behavior to pre 0.94.0.

Now, Master will wait until "hbase.master.wait.on.regionservers.mintostart" 
number of Region Servers have registered with it before it starts region 
assignment. The default value of this setting is 1.

In large clusters with thousands of regions you may want to increase this to a 
higher number which is sufficient to handle the task of opening those region in 
parallel.

If left to the default, at times, the Master could assign all regions to a 
single Region Server which will result in slow start and in worst case could 
OOM the Region Server (some time resulting in META inconsistency).


> Modify the conditions to ensure that Master waits for sufficient number of 
> Region Servers before starting region assignments
> 
>
> Key: HBASE-6389
> URL: https://issues.apache.org/jira/browse/HBASE-6389
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Affects Versions: 0.94.0, 0.96.0
>Reporter: Aditya Kishore
>Assignee: Aditya Kishore
>Priority: Critical
> Fix For: 0.96.0
>
> Attachments: HBASE-6389_trunk.patch, HBASE-6389_trunk.patch, 
> HBASE-6389_trunk.patch, HBASE-6389_trunk_v2.patch, HBASE-6389_trunk_v2.patch, 
> org.apache.hadoop.hbase.TestZooKeeper-output.txt, testReplication.jstack
>
>
> Continuing from HBASE-6375.
> It seems I was mistaken in my assumption that changing the value of 
> "hbase.master.wait.on.regionservers.mintostart" to a sufficient number (from 
> default of 1) can help prevent assignment of all regions to one (or a small 
> number of) region server(s).
> While this was the case in 0.90.x and 0.92.x, the behavior has changed in 
> 0.94.0 onwards to address HBASE-4993.
> From 0.94.0 onwards, Master will proceed immediately after the timeout has 
> lapsed, even if "hbase.master.wait.on.regionservers.mintostart" has not 
> reached.
> Reading the current conditions of waitForRegionServers() clarifies it
> {code:title=ServerManager.java (trunk rev:1360470)}
> 
> 581 /**
> 582  * Wait for the region servers to report in.
> 583  * We will wait until one of this condition is met:
> 584  *  - the master is stopped
> 585  *  - the 'hbase.master.wait.on.regionservers.timeout' is reached
> 586  *  - the 'hbase.master.wait.on.regionservers.maxtostart' number of
> 587  *region servers is reached
> 588  *  - the 'hbase.master.wait.on.regionservers.mintostart' is reached 
> AND
> 589  *   there have been no new region server in for
> 590  *  'hbase.master.wait.on.regionservers.interval' time
> 591  *
> 592  * @throws InterruptedException
> 593  */
> 594 public void waitForRegionServers(MonitoredTask status)
> 595 throws InterruptedException {
> 
> 
> 612   while (
> 613 !this.master.isStopped() &&
> 614   slept < timeout &&
> 615   count < maxToStart &&
> 616   (lastCountChange+interval > now || count < minToStart)
> 617 ){
> 
> {code}
> So with the current conditions, the wait will end as soon as timeout is 
> reached even lesser number of RS have checked-in with the Master and the 
> master will proceed with the region assignment among these RSes alone.
> As mentioned in 
> -[HBASE-4993|https://issues.apache.org/jira/browse/HBASE-4993?focusedCommentId=13237196#comment-13237196]-,
>  and I concur, this could have disastrous effect in large cluster especially 
> now that MSLAB is turned on.
> To enforce the required quorum as specified by 
> "hbase.master.wait.on.regionservers.mintostart" irrespective of timeout, 
> these conditions need to be modified as following
> {code:title=ServerManager.java}
> ..
>   /**
>* Wait for the region servers to report in.
>* We will wait until one of this condition is met:
>*  - the master is stopped
>*  - the 'hbase.master.wait.on.regionservers.maxtostart

[jira] [Updated] (HBASE-6389) Modify the conditions to ensure that Master waits for sufficient number of Region Servers before starting region assignments

2012-10-19 Thread Aditya Kishore (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aditya Kishore updated HBASE-6389:
--

Release Note: 
Reverts the cluster startup behavior to pre 0.94.0.

Now, Master will wait until "hbase.master.wait.on.regionservers.mintostart" 
number of Region Servers have registered with it before it starts region 
assignment. The default value of this setting is 1.

In large clusters with thousands of regions you may want to increase this to a 
higher number which is sufficient to handle the task of opening those region in 
parallel.

If left to the default, at times, the Master could assign all regions to a 
single Region Server which will result in slow start and in worst case could 
OOM the Region Server (some time resulting in META inconsistency).

> Modify the conditions to ensure that Master waits for sufficient number of 
> Region Servers before starting region assignments
> 
>
> Key: HBASE-6389
> URL: https://issues.apache.org/jira/browse/HBASE-6389
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Affects Versions: 0.94.0, 0.96.0
>Reporter: Aditya Kishore
>Assignee: Aditya Kishore
>Priority: Critical
> Fix For: 0.96.0
>
> Attachments: HBASE-6389_trunk.patch, HBASE-6389_trunk.patch, 
> HBASE-6389_trunk.patch, HBASE-6389_trunk_v2.patch, HBASE-6389_trunk_v2.patch, 
> org.apache.hadoop.hbase.TestZooKeeper-output.txt, testReplication.jstack
>
>
> Continuing from HBASE-6375.
> It seems I was mistaken in my assumption that changing the value of 
> "hbase.master.wait.on.regionservers.mintostart" to a sufficient number (from 
> default of 1) can help prevent assignment of all regions to one (or a small 
> number of) region server(s).
> While this was the case in 0.90.x and 0.92.x, the behavior has changed in 
> 0.94.0 onwards to address HBASE-4993.
> From 0.94.0 onwards, Master will proceed immediately after the timeout has 
> lapsed, even if "hbase.master.wait.on.regionservers.mintostart" has not 
> reached.
> Reading the current conditions of waitForRegionServers() clarifies it
> {code:title=ServerManager.java (trunk rev:1360470)}
> 
> 581 /**
> 582  * Wait for the region servers to report in.
> 583  * We will wait until one of this condition is met:
> 584  *  - the master is stopped
> 585  *  - the 'hbase.master.wait.on.regionservers.timeout' is reached
> 586  *  - the 'hbase.master.wait.on.regionservers.maxtostart' number of
> 587  *region servers is reached
> 588  *  - the 'hbase.master.wait.on.regionservers.mintostart' is reached 
> AND
> 589  *   there have been no new region server in for
> 590  *  'hbase.master.wait.on.regionservers.interval' time
> 591  *
> 592  * @throws InterruptedException
> 593  */
> 594 public void waitForRegionServers(MonitoredTask status)
> 595 throws InterruptedException {
> 
> 
> 612   while (
> 613 !this.master.isStopped() &&
> 614   slept < timeout &&
> 615   count < maxToStart &&
> 616   (lastCountChange+interval > now || count < minToStart)
> 617 ){
> 
> {code}
> So with the current conditions, the wait will end as soon as timeout is 
> reached even lesser number of RS have checked-in with the Master and the 
> master will proceed with the region assignment among these RSes alone.
> As mentioned in 
> -[HBASE-4993|https://issues.apache.org/jira/browse/HBASE-4993?focusedCommentId=13237196#comment-13237196]-,
>  and I concur, this could have disastrous effect in large cluster especially 
> now that MSLAB is turned on.
> To enforce the required quorum as specified by 
> "hbase.master.wait.on.regionservers.mintostart" irrespective of timeout, 
> these conditions need to be modified as following
> {code:title=ServerManager.java}
> ..
>   /**
>* Wait for the region servers to report in.
>* We will wait until one of this condition is met:
>*  - the master is stopped
>*  - the 'hbase.master.wait.on.regionservers.maxtostart' number of
>*region servers is reached
>*  - the 'hbase.master.wait.on.regionservers.mintostart' is reached AND
>*   there have been no new region server in for
>*  'hbase.master.wait.on.regionservers.interval' time AND
>*   the 'hbase.master.wait.on.regionservers.timeout' is reached
>*
>* @throws InterruptedException
>*/
>   public void waitForRegionServers(MonitoredTask status)
> ..
> ..
> int minToStart = this.master.getConfiguration().
> getInt("hbase.master.wait.on.regionservers.mintostart", 1);
> int maxToStart = this.master.getConfiguration().
> getInt("hbase.master.wait.on.regionservers.maxtostar

[jira] [Updated] (HBASE-6389) Modify the conditions to ensure that Master waits for sufficient number of Region Servers before starting region assignments

2012-10-18 Thread Aditya Kishore (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aditya Kishore updated HBASE-6389:
--

Status: Patch Available  (was: Open)

Resubmitting the patch with some minor changes.

> Modify the conditions to ensure that Master waits for sufficient number of 
> Region Servers before starting region assignments
> 
>
> Key: HBASE-6389
> URL: https://issues.apache.org/jira/browse/HBASE-6389
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Affects Versions: 0.94.0, 0.96.0
>Reporter: Aditya Kishore
>Assignee: Aditya Kishore
>Priority: Critical
> Fix For: 0.96.0
>
> Attachments: HBASE-6389_trunk.patch, HBASE-6389_trunk.patch, 
> HBASE-6389_trunk.patch, HBASE-6389_trunk_v2.patch, HBASE-6389_trunk_v2.patch, 
> org.apache.hadoop.hbase.TestZooKeeper-output.txt, testReplication.jstack
>
>
> Continuing from HBASE-6375.
> It seems I was mistaken in my assumption that changing the value of 
> "hbase.master.wait.on.regionservers.mintostart" to a sufficient number (from 
> default of 1) can help prevent assignment of all regions to one (or a small 
> number of) region server(s).
> While this was the case in 0.90.x and 0.92.x, the behavior has changed in 
> 0.94.0 onwards to address HBASE-4993.
> From 0.94.0 onwards, Master will proceed immediately after the timeout has 
> lapsed, even if "hbase.master.wait.on.regionservers.mintostart" has not 
> reached.
> Reading the current conditions of waitForRegionServers() clarifies it
> {code:title=ServerManager.java (trunk rev:1360470)}
> 
> 581 /**
> 582  * Wait for the region servers to report in.
> 583  * We will wait until one of this condition is met:
> 584  *  - the master is stopped
> 585  *  - the 'hbase.master.wait.on.regionservers.timeout' is reached
> 586  *  - the 'hbase.master.wait.on.regionservers.maxtostart' number of
> 587  *region servers is reached
> 588  *  - the 'hbase.master.wait.on.regionservers.mintostart' is reached 
> AND
> 589  *   there have been no new region server in for
> 590  *  'hbase.master.wait.on.regionservers.interval' time
> 591  *
> 592  * @throws InterruptedException
> 593  */
> 594 public void waitForRegionServers(MonitoredTask status)
> 595 throws InterruptedException {
> 
> 
> 612   while (
> 613 !this.master.isStopped() &&
> 614   slept < timeout &&
> 615   count < maxToStart &&
> 616   (lastCountChange+interval > now || count < minToStart)
> 617 ){
> 
> {code}
> So with the current conditions, the wait will end as soon as timeout is 
> reached even lesser number of RS have checked-in with the Master and the 
> master will proceed with the region assignment among these RSes alone.
> As mentioned in 
> -[HBASE-4993|https://issues.apache.org/jira/browse/HBASE-4993?focusedCommentId=13237196#comment-13237196]-,
>  and I concur, this could have disastrous effect in large cluster especially 
> now that MSLAB is turned on.
> To enforce the required quorum as specified by 
> "hbase.master.wait.on.regionservers.mintostart" irrespective of timeout, 
> these conditions need to be modified as following
> {code:title=ServerManager.java}
> ..
>   /**
>* Wait for the region servers to report in.
>* We will wait until one of this condition is met:
>*  - the master is stopped
>*  - the 'hbase.master.wait.on.regionservers.maxtostart' number of
>*region servers is reached
>*  - the 'hbase.master.wait.on.regionservers.mintostart' is reached AND
>*   there have been no new region server in for
>*  'hbase.master.wait.on.regionservers.interval' time AND
>*   the 'hbase.master.wait.on.regionservers.timeout' is reached
>*
>* @throws InterruptedException
>*/
>   public void waitForRegionServers(MonitoredTask status)
> ..
> ..
> int minToStart = this.master.getConfiguration().
> getInt("hbase.master.wait.on.regionservers.mintostart", 1);
> int maxToStart = this.master.getConfiguration().
> getInt("hbase.master.wait.on.regionservers.maxtostart", 
> Integer.MAX_VALUE);
> if (maxToStart < minToStart) {
>   maxToStart = minToStart;
> }
> ..
> ..
> while (
>   !this.master.isStopped() &&
> count < maxToStart &&
> (lastCountChange+interval > now || timeout > slept || count < 
> minToStart)
>   ){
> ..
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-6389) Modify the conditions to ensure that Master waits for sufficient number of Region Servers before starting region assignments

2012-10-18 Thread Aditya Kishore (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aditya Kishore updated HBASE-6389:
--

Status: Open  (was: Patch Available)

> Modify the conditions to ensure that Master waits for sufficient number of 
> Region Servers before starting region assignments
> 
>
> Key: HBASE-6389
> URL: https://issues.apache.org/jira/browse/HBASE-6389
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Affects Versions: 0.94.0, 0.96.0
>Reporter: Aditya Kishore
>Assignee: Aditya Kishore
>Priority: Critical
> Fix For: 0.96.0
>
> Attachments: HBASE-6389_trunk.patch, HBASE-6389_trunk.patch, 
> HBASE-6389_trunk.patch, HBASE-6389_trunk_v2.patch, HBASE-6389_trunk_v2.patch, 
> org.apache.hadoop.hbase.TestZooKeeper-output.txt, testReplication.jstack
>
>
> Continuing from HBASE-6375.
> It seems I was mistaken in my assumption that changing the value of 
> "hbase.master.wait.on.regionservers.mintostart" to a sufficient number (from 
> default of 1) can help prevent assignment of all regions to one (or a small 
> number of) region server(s).
> While this was the case in 0.90.x and 0.92.x, the behavior has changed in 
> 0.94.0 onwards to address HBASE-4993.
> From 0.94.0 onwards, Master will proceed immediately after the timeout has 
> lapsed, even if "hbase.master.wait.on.regionservers.mintostart" has not 
> reached.
> Reading the current conditions of waitForRegionServers() clarifies it
> {code:title=ServerManager.java (trunk rev:1360470)}
> 
> 581 /**
> 582  * Wait for the region servers to report in.
> 583  * We will wait until one of this condition is met:
> 584  *  - the master is stopped
> 585  *  - the 'hbase.master.wait.on.regionservers.timeout' is reached
> 586  *  - the 'hbase.master.wait.on.regionservers.maxtostart' number of
> 587  *region servers is reached
> 588  *  - the 'hbase.master.wait.on.regionservers.mintostart' is reached 
> AND
> 589  *   there have been no new region server in for
> 590  *  'hbase.master.wait.on.regionservers.interval' time
> 591  *
> 592  * @throws InterruptedException
> 593  */
> 594 public void waitForRegionServers(MonitoredTask status)
> 595 throws InterruptedException {
> 
> 
> 612   while (
> 613 !this.master.isStopped() &&
> 614   slept < timeout &&
> 615   count < maxToStart &&
> 616   (lastCountChange+interval > now || count < minToStart)
> 617 ){
> 
> {code}
> So with the current conditions, the wait will end as soon as timeout is 
> reached even lesser number of RS have checked-in with the Master and the 
> master will proceed with the region assignment among these RSes alone.
> As mentioned in 
> -[HBASE-4993|https://issues.apache.org/jira/browse/HBASE-4993?focusedCommentId=13237196#comment-13237196]-,
>  and I concur, this could have disastrous effect in large cluster especially 
> now that MSLAB is turned on.
> To enforce the required quorum as specified by 
> "hbase.master.wait.on.regionservers.mintostart" irrespective of timeout, 
> these conditions need to be modified as following
> {code:title=ServerManager.java}
> ..
>   /**
>* Wait for the region servers to report in.
>* We will wait until one of this condition is met:
>*  - the master is stopped
>*  - the 'hbase.master.wait.on.regionservers.maxtostart' number of
>*region servers is reached
>*  - the 'hbase.master.wait.on.regionservers.mintostart' is reached AND
>*   there have been no new region server in for
>*  'hbase.master.wait.on.regionservers.interval' time AND
>*   the 'hbase.master.wait.on.regionservers.timeout' is reached
>*
>* @throws InterruptedException
>*/
>   public void waitForRegionServers(MonitoredTask status)
> ..
> ..
> int minToStart = this.master.getConfiguration().
> getInt("hbase.master.wait.on.regionservers.mintostart", 1);
> int maxToStart = this.master.getConfiguration().
> getInt("hbase.master.wait.on.regionservers.maxtostart", 
> Integer.MAX_VALUE);
> if (maxToStart < minToStart) {
>   maxToStart = minToStart;
> }
> ..
> ..
> while (
>   !this.master.isStopped() &&
> count < maxToStart &&
> (lastCountChange+interval > now || timeout > slept || count < 
> minToStart)
>   ){
> ..
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-6389) Modify the conditions to ensure that Master waits for sufficient number of Region Servers before starting region assignments

2012-10-18 Thread Aditya Kishore (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aditya Kishore updated HBASE-6389:
--

Attachment: HBASE-6389_trunk_v2.patch

> Modify the conditions to ensure that Master waits for sufficient number of 
> Region Servers before starting region assignments
> 
>
> Key: HBASE-6389
> URL: https://issues.apache.org/jira/browse/HBASE-6389
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Affects Versions: 0.94.0, 0.96.0
>Reporter: Aditya Kishore
>Assignee: Aditya Kishore
>Priority: Critical
> Fix For: 0.96.0
>
> Attachments: HBASE-6389_trunk.patch, HBASE-6389_trunk.patch, 
> HBASE-6389_trunk.patch, HBASE-6389_trunk_v2.patch, HBASE-6389_trunk_v2.patch, 
> org.apache.hadoop.hbase.TestZooKeeper-output.txt, testReplication.jstack
>
>
> Continuing from HBASE-6375.
> It seems I was mistaken in my assumption that changing the value of 
> "hbase.master.wait.on.regionservers.mintostart" to a sufficient number (from 
> default of 1) can help prevent assignment of all regions to one (or a small 
> number of) region server(s).
> While this was the case in 0.90.x and 0.92.x, the behavior has changed in 
> 0.94.0 onwards to address HBASE-4993.
> From 0.94.0 onwards, Master will proceed immediately after the timeout has 
> lapsed, even if "hbase.master.wait.on.regionservers.mintostart" has not 
> reached.
> Reading the current conditions of waitForRegionServers() clarifies it
> {code:title=ServerManager.java (trunk rev:1360470)}
> 
> 581 /**
> 582  * Wait for the region servers to report in.
> 583  * We will wait until one of this condition is met:
> 584  *  - the master is stopped
> 585  *  - the 'hbase.master.wait.on.regionservers.timeout' is reached
> 586  *  - the 'hbase.master.wait.on.regionservers.maxtostart' number of
> 587  *region servers is reached
> 588  *  - the 'hbase.master.wait.on.regionservers.mintostart' is reached 
> AND
> 589  *   there have been no new region server in for
> 590  *  'hbase.master.wait.on.regionservers.interval' time
> 591  *
> 592  * @throws InterruptedException
> 593  */
> 594 public void waitForRegionServers(MonitoredTask status)
> 595 throws InterruptedException {
> 
> 
> 612   while (
> 613 !this.master.isStopped() &&
> 614   slept < timeout &&
> 615   count < maxToStart &&
> 616   (lastCountChange+interval > now || count < minToStart)
> 617 ){
> 
> {code}
> So with the current conditions, the wait will end as soon as timeout is 
> reached even lesser number of RS have checked-in with the Master and the 
> master will proceed with the region assignment among these RSes alone.
> As mentioned in 
> -[HBASE-4993|https://issues.apache.org/jira/browse/HBASE-4993?focusedCommentId=13237196#comment-13237196]-,
>  and I concur, this could have disastrous effect in large cluster especially 
> now that MSLAB is turned on.
> To enforce the required quorum as specified by 
> "hbase.master.wait.on.regionservers.mintostart" irrespective of timeout, 
> these conditions need to be modified as following
> {code:title=ServerManager.java}
> ..
>   /**
>* Wait for the region servers to report in.
>* We will wait until one of this condition is met:
>*  - the master is stopped
>*  - the 'hbase.master.wait.on.regionservers.maxtostart' number of
>*region servers is reached
>*  - the 'hbase.master.wait.on.regionservers.mintostart' is reached AND
>*   there have been no new region server in for
>*  'hbase.master.wait.on.regionservers.interval' time AND
>*   the 'hbase.master.wait.on.regionservers.timeout' is reached
>*
>* @throws InterruptedException
>*/
>   public void waitForRegionServers(MonitoredTask status)
> ..
> ..
> int minToStart = this.master.getConfiguration().
> getInt("hbase.master.wait.on.regionservers.mintostart", 1);
> int maxToStart = this.master.getConfiguration().
> getInt("hbase.master.wait.on.regionservers.maxtostart", 
> Integer.MAX_VALUE);
> if (maxToStart < minToStart) {
>   maxToStart = minToStart;
> }
> ..
> ..
> while (
>   !this.master.isStopped() &&
> count < maxToStart &&
> (lastCountChange+interval > now || timeout > slept || count < 
> minToStart)
>   ){
> ..
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-6389) Modify the conditions to ensure that Master waits for sufficient number of Region Servers before starting region assignments

2012-10-18 Thread Aditya Kishore (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aditya Kishore updated HBASE-6389:
--

Status: Patch Available  (was: Open)

The new patch addresses test failures in couple of test cases (including 
TestReplication, TestZookeeper).

Also, since the config params are used in quite a few places, I have moved them 
as a public constant.

Please note that there is no change in the logic of the fix. This only fixes 
few bugs in test cases which surfaced due to the fix.

I'll submit the review request later during the day.

> Modify the conditions to ensure that Master waits for sufficient number of 
> Region Servers before starting region assignments
> 
>
> Key: HBASE-6389
> URL: https://issues.apache.org/jira/browse/HBASE-6389
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Affects Versions: 0.94.0, 0.96.0
>Reporter: Aditya Kishore
>Assignee: Aditya Kishore
>Priority: Critical
> Fix For: 0.96.0
>
> Attachments: HBASE-6389_trunk.patch, HBASE-6389_trunk.patch, 
> HBASE-6389_trunk.patch, HBASE-6389_trunk_v2.patch, 
> org.apache.hadoop.hbase.TestZooKeeper-output.txt, testReplication.jstack
>
>
> Continuing from HBASE-6375.
> It seems I was mistaken in my assumption that changing the value of 
> "hbase.master.wait.on.regionservers.mintostart" to a sufficient number (from 
> default of 1) can help prevent assignment of all regions to one (or a small 
> number of) region server(s).
> While this was the case in 0.90.x and 0.92.x, the behavior has changed in 
> 0.94.0 onwards to address HBASE-4993.
> From 0.94.0 onwards, Master will proceed immediately after the timeout has 
> lapsed, even if "hbase.master.wait.on.regionservers.mintostart" has not 
> reached.
> Reading the current conditions of waitForRegionServers() clarifies it
> {code:title=ServerManager.java (trunk rev:1360470)}
> 
> 581 /**
> 582  * Wait for the region servers to report in.
> 583  * We will wait until one of this condition is met:
> 584  *  - the master is stopped
> 585  *  - the 'hbase.master.wait.on.regionservers.timeout' is reached
> 586  *  - the 'hbase.master.wait.on.regionservers.maxtostart' number of
> 587  *region servers is reached
> 588  *  - the 'hbase.master.wait.on.regionservers.mintostart' is reached 
> AND
> 589  *   there have been no new region server in for
> 590  *  'hbase.master.wait.on.regionservers.interval' time
> 591  *
> 592  * @throws InterruptedException
> 593  */
> 594 public void waitForRegionServers(MonitoredTask status)
> 595 throws InterruptedException {
> 
> 
> 612   while (
> 613 !this.master.isStopped() &&
> 614   slept < timeout &&
> 615   count < maxToStart &&
> 616   (lastCountChange+interval > now || count < minToStart)
> 617 ){
> 
> {code}
> So with the current conditions, the wait will end as soon as timeout is 
> reached even lesser number of RS have checked-in with the Master and the 
> master will proceed with the region assignment among these RSes alone.
> As mentioned in 
> -[HBASE-4993|https://issues.apache.org/jira/browse/HBASE-4993?focusedCommentId=13237196#comment-13237196]-,
>  and I concur, this could have disastrous effect in large cluster especially 
> now that MSLAB is turned on.
> To enforce the required quorum as specified by 
> "hbase.master.wait.on.regionservers.mintostart" irrespective of timeout, 
> these conditions need to be modified as following
> {code:title=ServerManager.java}
> ..
>   /**
>* Wait for the region servers to report in.
>* We will wait until one of this condition is met:
>*  - the master is stopped
>*  - the 'hbase.master.wait.on.regionservers.maxtostart' number of
>*region servers is reached
>*  - the 'hbase.master.wait.on.regionservers.mintostart' is reached AND
>*   there have been no new region server in for
>*  'hbase.master.wait.on.regionservers.interval' time AND
>*   the 'hbase.master.wait.on.regionservers.timeout' is reached
>*
>* @throws InterruptedException
>*/
>   public void waitForRegionServers(MonitoredTask status)
> ..
> ..
> int minToStart = this.master.getConfiguration().
> getInt("hbase.master.wait.on.regionservers.mintostart", 1);
> int maxToStart = this.master.getConfiguration().
> getInt("hbase.master.wait.on.regionservers.maxtostart", 
> Integer.MAX_VALUE);
> if (maxToStart < minToStart) {
>   maxToStart = minToStart;
> }
> ..
> ..
> while (
>   !this.master.isStopped() &&
> count < maxToStart &&
> (lastCountChange+interval > now || ti

[jira] [Updated] (HBASE-6389) Modify the conditions to ensure that Master waits for sufficient number of Region Servers before starting region assignments

2012-10-18 Thread Aditya Kishore (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aditya Kishore updated HBASE-6389:
--

Attachment: HBASE-6389_trunk_v2.patch

> Modify the conditions to ensure that Master waits for sufficient number of 
> Region Servers before starting region assignments
> 
>
> Key: HBASE-6389
> URL: https://issues.apache.org/jira/browse/HBASE-6389
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Affects Versions: 0.94.0, 0.96.0
>Reporter: Aditya Kishore
>Assignee: Aditya Kishore
>Priority: Critical
> Fix For: 0.96.0
>
> Attachments: HBASE-6389_trunk.patch, HBASE-6389_trunk.patch, 
> HBASE-6389_trunk.patch, HBASE-6389_trunk_v2.patch, 
> org.apache.hadoop.hbase.TestZooKeeper-output.txt, testReplication.jstack
>
>
> Continuing from HBASE-6375.
> It seems I was mistaken in my assumption that changing the value of 
> "hbase.master.wait.on.regionservers.mintostart" to a sufficient number (from 
> default of 1) can help prevent assignment of all regions to one (or a small 
> number of) region server(s).
> While this was the case in 0.90.x and 0.92.x, the behavior has changed in 
> 0.94.0 onwards to address HBASE-4993.
> From 0.94.0 onwards, Master will proceed immediately after the timeout has 
> lapsed, even if "hbase.master.wait.on.regionservers.mintostart" has not 
> reached.
> Reading the current conditions of waitForRegionServers() clarifies it
> {code:title=ServerManager.java (trunk rev:1360470)}
> 
> 581 /**
> 582  * Wait for the region servers to report in.
> 583  * We will wait until one of this condition is met:
> 584  *  - the master is stopped
> 585  *  - the 'hbase.master.wait.on.regionservers.timeout' is reached
> 586  *  - the 'hbase.master.wait.on.regionservers.maxtostart' number of
> 587  *region servers is reached
> 588  *  - the 'hbase.master.wait.on.regionservers.mintostart' is reached 
> AND
> 589  *   there have been no new region server in for
> 590  *  'hbase.master.wait.on.regionservers.interval' time
> 591  *
> 592  * @throws InterruptedException
> 593  */
> 594 public void waitForRegionServers(MonitoredTask status)
> 595 throws InterruptedException {
> 
> 
> 612   while (
> 613 !this.master.isStopped() &&
> 614   slept < timeout &&
> 615   count < maxToStart &&
> 616   (lastCountChange+interval > now || count < minToStart)
> 617 ){
> 
> {code}
> So with the current conditions, the wait will end as soon as timeout is 
> reached even lesser number of RS have checked-in with the Master and the 
> master will proceed with the region assignment among these RSes alone.
> As mentioned in 
> -[HBASE-4993|https://issues.apache.org/jira/browse/HBASE-4993?focusedCommentId=13237196#comment-13237196]-,
>  and I concur, this could have disastrous effect in large cluster especially 
> now that MSLAB is turned on.
> To enforce the required quorum as specified by 
> "hbase.master.wait.on.regionservers.mintostart" irrespective of timeout, 
> these conditions need to be modified as following
> {code:title=ServerManager.java}
> ..
>   /**
>* Wait for the region servers to report in.
>* We will wait until one of this condition is met:
>*  - the master is stopped
>*  - the 'hbase.master.wait.on.regionservers.maxtostart' number of
>*region servers is reached
>*  - the 'hbase.master.wait.on.regionservers.mintostart' is reached AND
>*   there have been no new region server in for
>*  'hbase.master.wait.on.regionservers.interval' time AND
>*   the 'hbase.master.wait.on.regionservers.timeout' is reached
>*
>* @throws InterruptedException
>*/
>   public void waitForRegionServers(MonitoredTask status)
> ..
> ..
> int minToStart = this.master.getConfiguration().
> getInt("hbase.master.wait.on.regionservers.mintostart", 1);
> int maxToStart = this.master.getConfiguration().
> getInt("hbase.master.wait.on.regionservers.maxtostart", 
> Integer.MAX_VALUE);
> if (maxToStart < minToStart) {
>   maxToStart = minToStart;
> }
> ..
> ..
> while (
>   !this.master.isStopped() &&
> count < maxToStart &&
> (lastCountChange+interval > now || timeout > slept || count < 
> minToStart)
>   ){
> ..
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-6389) Modify the conditions to ensure that Master waits for sufficient number of Region Servers before starting region assignments

2012-07-19 Thread Zhihong Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Ted Yu updated HBASE-6389:
--

Status: Open  (was: Patch Available)

> Modify the conditions to ensure that Master waits for sufficient number of 
> Region Servers before starting region assignments
> 
>
> Key: HBASE-6389
> URL: https://issues.apache.org/jira/browse/HBASE-6389
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Affects Versions: 0.94.0, 0.96.0
>Reporter: Aditya Kishore
>Assignee: Aditya Kishore
>Priority: Critical
> Fix For: 0.96.0, 0.94.2
>
> Attachments: HBASE-6389_trunk.patch, HBASE-6389_trunk.patch, 
> HBASE-6389_trunk.patch, org.apache.hadoop.hbase.TestZooKeeper-output.txt, 
> testReplication.jstack
>
>
> Continuing from HBASE-6375.
> It seems I was mistaken in my assumption that changing the value of 
> "hbase.master.wait.on.regionservers.mintostart" to a sufficient number (from 
> default of 1) can help prevent assignment of all regions to one (or a small 
> number of) region server(s).
> While this was the case in 0.90.x and 0.92.x, the behavior has changed in 
> 0.94.0 onwards to address HBASE-4993.
> From 0.94.0 onwards, Master will proceed immediately after the timeout has 
> lapsed, even if "hbase.master.wait.on.regionservers.mintostart" has not 
> reached.
> Reading the current conditions of waitForRegionServers() clarifies it
> {code:title=ServerManager.java (trunk rev:1360470)}
> 
> 581 /**
> 582  * Wait for the region servers to report in.
> 583  * We will wait until one of this condition is met:
> 584  *  - the master is stopped
> 585  *  - the 'hbase.master.wait.on.regionservers.timeout' is reached
> 586  *  - the 'hbase.master.wait.on.regionservers.maxtostart' number of
> 587  *region servers is reached
> 588  *  - the 'hbase.master.wait.on.regionservers.mintostart' is reached 
> AND
> 589  *   there have been no new region server in for
> 590  *  'hbase.master.wait.on.regionservers.interval' time
> 591  *
> 592  * @throws InterruptedException
> 593  */
> 594 public void waitForRegionServers(MonitoredTask status)
> 595 throws InterruptedException {
> 
> 
> 612   while (
> 613 !this.master.isStopped() &&
> 614   slept < timeout &&
> 615   count < maxToStart &&
> 616   (lastCountChange+interval > now || count < minToStart)
> 617 ){
> 
> {code}
> So with the current conditions, the wait will end as soon as timeout is 
> reached even lesser number of RS have checked-in with the Master and the 
> master will proceed with the region assignment among these RSes alone.
> As mentioned in 
> -[HBASE-4993|https://issues.apache.org/jira/browse/HBASE-4993?focusedCommentId=13237196#comment-13237196]-,
>  and I concur, this could have disastrous effect in large cluster especially 
> now that MSLAB is turned on.
> To enforce the required quorum as specified by 
> "hbase.master.wait.on.regionservers.mintostart" irrespective of timeout, 
> these conditions need to be modified as following
> {code:title=ServerManager.java}
> ..
>   /**
>* Wait for the region servers to report in.
>* We will wait until one of this condition is met:
>*  - the master is stopped
>*  - the 'hbase.master.wait.on.regionservers.maxtostart' number of
>*region servers is reached
>*  - the 'hbase.master.wait.on.regionservers.mintostart' is reached AND
>*   there have been no new region server in for
>*  'hbase.master.wait.on.regionservers.interval' time AND
>*   the 'hbase.master.wait.on.regionservers.timeout' is reached
>*
>* @throws InterruptedException
>*/
>   public void waitForRegionServers(MonitoredTask status)
> ..
> ..
> int minToStart = this.master.getConfiguration().
> getInt("hbase.master.wait.on.regionservers.mintostart", 1);
> int maxToStart = this.master.getConfiguration().
> getInt("hbase.master.wait.on.regionservers.maxtostart", 
> Integer.MAX_VALUE);
> if (maxToStart < minToStart) {
>   maxToStart = minToStart;
> }
> ..
> ..
> while (
>   !this.master.isStopped() &&
> count < maxToStart &&
> (lastCountChange+interval > now || timeout > slept || count < 
> minToStart)
>   ){
> ..
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-6389) Modify the conditions to ensure that Master waits for sufficient number of Region Servers before starting region assignments

2012-07-19 Thread Zhihong Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Ted Yu updated HBASE-6389:
--

Attachment: testReplication.jstack

jstack for the hanging TestReplication

> Modify the conditions to ensure that Master waits for sufficient number of 
> Region Servers before starting region assignments
> 
>
> Key: HBASE-6389
> URL: https://issues.apache.org/jira/browse/HBASE-6389
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Affects Versions: 0.94.0, 0.96.0
>Reporter: Aditya Kishore
>Assignee: Aditya Kishore
>Priority: Critical
> Fix For: 0.96.0, 0.94.2
>
> Attachments: HBASE-6389_trunk.patch, HBASE-6389_trunk.patch, 
> HBASE-6389_trunk.patch, org.apache.hadoop.hbase.TestZooKeeper-output.txt, 
> testReplication.jstack
>
>
> Continuing from HBASE-6375.
> It seems I was mistaken in my assumption that changing the value of 
> "hbase.master.wait.on.regionservers.mintostart" to a sufficient number (from 
> default of 1) can help prevent assignment of all regions to one (or a small 
> number of) region server(s).
> While this was the case in 0.90.x and 0.92.x, the behavior has changed in 
> 0.94.0 onwards to address HBASE-4993.
> From 0.94.0 onwards, Master will proceed immediately after the timeout has 
> lapsed, even if "hbase.master.wait.on.regionservers.mintostart" has not 
> reached.
> Reading the current conditions of waitForRegionServers() clarifies it
> {code:title=ServerManager.java (trunk rev:1360470)}
> 
> 581 /**
> 582  * Wait for the region servers to report in.
> 583  * We will wait until one of this condition is met:
> 584  *  - the master is stopped
> 585  *  - the 'hbase.master.wait.on.regionservers.timeout' is reached
> 586  *  - the 'hbase.master.wait.on.regionservers.maxtostart' number of
> 587  *region servers is reached
> 588  *  - the 'hbase.master.wait.on.regionservers.mintostart' is reached 
> AND
> 589  *   there have been no new region server in for
> 590  *  'hbase.master.wait.on.regionservers.interval' time
> 591  *
> 592  * @throws InterruptedException
> 593  */
> 594 public void waitForRegionServers(MonitoredTask status)
> 595 throws InterruptedException {
> 
> 
> 612   while (
> 613 !this.master.isStopped() &&
> 614   slept < timeout &&
> 615   count < maxToStart &&
> 616   (lastCountChange+interval > now || count < minToStart)
> 617 ){
> 
> {code}
> So with the current conditions, the wait will end as soon as timeout is 
> reached even lesser number of RS have checked-in with the Master and the 
> master will proceed with the region assignment among these RSes alone.
> As mentioned in 
> -[HBASE-4993|https://issues.apache.org/jira/browse/HBASE-4993?focusedCommentId=13237196#comment-13237196]-,
>  and I concur, this could have disastrous effect in large cluster especially 
> now that MSLAB is turned on.
> To enforce the required quorum as specified by 
> "hbase.master.wait.on.regionservers.mintostart" irrespective of timeout, 
> these conditions need to be modified as following
> {code:title=ServerManager.java}
> ..
>   /**
>* Wait for the region servers to report in.
>* We will wait until one of this condition is met:
>*  - the master is stopped
>*  - the 'hbase.master.wait.on.regionservers.maxtostart' number of
>*region servers is reached
>*  - the 'hbase.master.wait.on.regionservers.mintostart' is reached AND
>*   there have been no new region server in for
>*  'hbase.master.wait.on.regionservers.interval' time AND
>*   the 'hbase.master.wait.on.regionservers.timeout' is reached
>*
>* @throws InterruptedException
>*/
>   public void waitForRegionServers(MonitoredTask status)
> ..
> ..
> int minToStart = this.master.getConfiguration().
> getInt("hbase.master.wait.on.regionservers.mintostart", 1);
> int maxToStart = this.master.getConfiguration().
> getInt("hbase.master.wait.on.regionservers.maxtostart", 
> Integer.MAX_VALUE);
> if (maxToStart < minToStart) {
>   maxToStart = minToStart;
> }
> ..
> ..
> while (
>   !this.master.isStopped() &&
> count < maxToStart &&
> (lastCountChange+interval > now || timeout > slept || count < 
> minToStart)
>   ){
> ..
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-6389) Modify the conditions to ensure that Master waits for sufficient number of Region Servers before starting region assignments

2012-07-19 Thread Zhihong Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Ted Yu updated HBASE-6389:
--

Attachment: org.apache.hadoop.hbase.TestZooKeeper-output.txt

Here was the test output from yesterday.

> Modify the conditions to ensure that Master waits for sufficient number of 
> Region Servers before starting region assignments
> 
>
> Key: HBASE-6389
> URL: https://issues.apache.org/jira/browse/HBASE-6389
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Affects Versions: 0.94.0, 0.96.0
>Reporter: Aditya Kishore
>Assignee: Aditya Kishore
>Priority: Critical
> Fix For: 0.96.0, 0.94.2
>
> Attachments: HBASE-6389_trunk.patch, HBASE-6389_trunk.patch, 
> HBASE-6389_trunk.patch, org.apache.hadoop.hbase.TestZooKeeper-output.txt
>
>
> Continuing from HBASE-6375.
> It seems I was mistaken in my assumption that changing the value of 
> "hbase.master.wait.on.regionservers.mintostart" to a sufficient number (from 
> default of 1) can help prevent assignment of all regions to one (or a small 
> number of) region server(s).
> While this was the case in 0.90.x and 0.92.x, the behavior has changed in 
> 0.94.0 onwards to address HBASE-4993.
> From 0.94.0 onwards, Master will proceed immediately after the timeout has 
> lapsed, even if "hbase.master.wait.on.regionservers.mintostart" has not 
> reached.
> Reading the current conditions of waitForRegionServers() clarifies it
> {code:title=ServerManager.java (trunk rev:1360470)}
> 
> 581 /**
> 582  * Wait for the region servers to report in.
> 583  * We will wait until one of this condition is met:
> 584  *  - the master is stopped
> 585  *  - the 'hbase.master.wait.on.regionservers.timeout' is reached
> 586  *  - the 'hbase.master.wait.on.regionservers.maxtostart' number of
> 587  *region servers is reached
> 588  *  - the 'hbase.master.wait.on.regionservers.mintostart' is reached 
> AND
> 589  *   there have been no new region server in for
> 590  *  'hbase.master.wait.on.regionservers.interval' time
> 591  *
> 592  * @throws InterruptedException
> 593  */
> 594 public void waitForRegionServers(MonitoredTask status)
> 595 throws InterruptedException {
> 
> 
> 612   while (
> 613 !this.master.isStopped() &&
> 614   slept < timeout &&
> 615   count < maxToStart &&
> 616   (lastCountChange+interval > now || count < minToStart)
> 617 ){
> 
> {code}
> So with the current conditions, the wait will end as soon as timeout is 
> reached even lesser number of RS have checked-in with the Master and the 
> master will proceed with the region assignment among these RSes alone.
> As mentioned in 
> -[HBASE-4993|https://issues.apache.org/jira/browse/HBASE-4993?focusedCommentId=13237196#comment-13237196]-,
>  and I concur, this could have disastrous effect in large cluster especially 
> now that MSLAB is turned on.
> To enforce the required quorum as specified by 
> "hbase.master.wait.on.regionservers.mintostart" irrespective of timeout, 
> these conditions need to be modified as following
> {code:title=ServerManager.java}
> ..
>   /**
>* Wait for the region servers to report in.
>* We will wait until one of this condition is met:
>*  - the master is stopped
>*  - the 'hbase.master.wait.on.regionservers.maxtostart' number of
>*region servers is reached
>*  - the 'hbase.master.wait.on.regionservers.mintostart' is reached AND
>*   there have been no new region server in for
>*  'hbase.master.wait.on.regionservers.interval' time AND
>*   the 'hbase.master.wait.on.regionservers.timeout' is reached
>*
>* @throws InterruptedException
>*/
>   public void waitForRegionServers(MonitoredTask status)
> ..
> ..
> int minToStart = this.master.getConfiguration().
> getInt("hbase.master.wait.on.regionservers.mintostart", 1);
> int maxToStart = this.master.getConfiguration().
> getInt("hbase.master.wait.on.regionservers.maxtostart", 
> Integer.MAX_VALUE);
> if (maxToStart < minToStart) {
>   maxToStart = minToStart;
> }
> ..
> ..
> while (
>   !this.master.isStopped() &&
> count < maxToStart &&
> (lastCountChange+interval > now || timeout > slept || count < 
> minToStart)
>   ){
> ..
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-6389) Modify the conditions to ensure that Master waits for sufficient number of Region Servers before starting region assignments

2012-07-19 Thread Aditya Kishore (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aditya Kishore updated HBASE-6389:
--

Release Note:   (was: Resubmitting the patch for trunk with the fixed test.)

> Modify the conditions to ensure that Master waits for sufficient number of 
> Region Servers before starting region assignments
> 
>
> Key: HBASE-6389
> URL: https://issues.apache.org/jira/browse/HBASE-6389
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Affects Versions: 0.94.0, 0.96.0
>Reporter: Aditya Kishore
>Assignee: Aditya Kishore
>Priority: Critical
> Fix For: 0.96.0, 0.94.2
>
> Attachments: HBASE-6389_trunk.patch, HBASE-6389_trunk.patch, 
> HBASE-6389_trunk.patch
>
>
> Continuing from HBASE-6375.
> It seems I was mistaken in my assumption that changing the value of 
> "hbase.master.wait.on.regionservers.mintostart" to a sufficient number (from 
> default of 1) can help prevent assignment of all regions to one (or a small 
> number of) region server(s).
> While this was the case in 0.90.x and 0.92.x, the behavior has changed in 
> 0.94.0 onwards to address HBASE-4993.
> From 0.94.0 onwards, Master will proceed immediately after the timeout has 
> lapsed, even if "hbase.master.wait.on.regionservers.mintostart" has not 
> reached.
> Reading the current conditions of waitForRegionServers() clarifies it
> {code:title=ServerManager.java (trunk rev:1360470)}
> 
> 581 /**
> 582  * Wait for the region servers to report in.
> 583  * We will wait until one of this condition is met:
> 584  *  - the master is stopped
> 585  *  - the 'hbase.master.wait.on.regionservers.timeout' is reached
> 586  *  - the 'hbase.master.wait.on.regionservers.maxtostart' number of
> 587  *region servers is reached
> 588  *  - the 'hbase.master.wait.on.regionservers.mintostart' is reached 
> AND
> 589  *   there have been no new region server in for
> 590  *  'hbase.master.wait.on.regionservers.interval' time
> 591  *
> 592  * @throws InterruptedException
> 593  */
> 594 public void waitForRegionServers(MonitoredTask status)
> 595 throws InterruptedException {
> 
> 
> 612   while (
> 613 !this.master.isStopped() &&
> 614   slept < timeout &&
> 615   count < maxToStart &&
> 616   (lastCountChange+interval > now || count < minToStart)
> 617 ){
> 
> {code}
> So with the current conditions, the wait will end as soon as timeout is 
> reached even lesser number of RS have checked-in with the Master and the 
> master will proceed with the region assignment among these RSes alone.
> As mentioned in 
> -[HBASE-4993|https://issues.apache.org/jira/browse/HBASE-4993?focusedCommentId=13237196#comment-13237196]-,
>  and I concur, this could have disastrous effect in large cluster especially 
> now that MSLAB is turned on.
> To enforce the required quorum as specified by 
> "hbase.master.wait.on.regionservers.mintostart" irrespective of timeout, 
> these conditions need to be modified as following
> {code:title=ServerManager.java}
> ..
>   /**
>* Wait for the region servers to report in.
>* We will wait until one of this condition is met:
>*  - the master is stopped
>*  - the 'hbase.master.wait.on.regionservers.maxtostart' number of
>*region servers is reached
>*  - the 'hbase.master.wait.on.regionservers.mintostart' is reached AND
>*   there have been no new region server in for
>*  'hbase.master.wait.on.regionservers.interval' time AND
>*   the 'hbase.master.wait.on.regionservers.timeout' is reached
>*
>* @throws InterruptedException
>*/
>   public void waitForRegionServers(MonitoredTask status)
> ..
> ..
> int minToStart = this.master.getConfiguration().
> getInt("hbase.master.wait.on.regionservers.mintostart", 1);
> int maxToStart = this.master.getConfiguration().
> getInt("hbase.master.wait.on.regionservers.maxtostart", 
> Integer.MAX_VALUE);
> if (maxToStart < minToStart) {
>   maxToStart = minToStart;
> }
> ..
> ..
> while (
>   !this.master.isStopped() &&
> count < maxToStart &&
> (lastCountChange+interval > now || timeout > slept || count < 
> minToStart)
>   ){
> ..
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-6389) Modify the conditions to ensure that Master waits for sufficient number of Region Servers before starting region assignments

2012-07-18 Thread Aditya Kishore (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aditya Kishore updated HBASE-6389:
--

Release Note: Resubmitting the patch for trunk with the fixed test.
Hadoop Flags:   (was: Reviewed)
  Status: Patch Available  (was: Reopened)

> Modify the conditions to ensure that Master waits for sufficient number of 
> Region Servers before starting region assignments
> 
>
> Key: HBASE-6389
> URL: https://issues.apache.org/jira/browse/HBASE-6389
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Affects Versions: 0.94.0, 0.96.0
>Reporter: Aditya Kishore
>Assignee: Aditya Kishore
>Priority: Critical
> Fix For: 0.96.0, 0.94.2
>
> Attachments: HBASE-6389_trunk.patch, HBASE-6389_trunk.patch, 
> HBASE-6389_trunk.patch
>
>
> Continuing from HBASE-6375.
> It seems I was mistaken in my assumption that changing the value of 
> "hbase.master.wait.on.regionservers.mintostart" to a sufficient number (from 
> default of 1) can help prevent assignment of all regions to one (or a small 
> number of) region server(s).
> While this was the case in 0.90.x and 0.92.x, the behavior has changed in 
> 0.94.0 onwards to address HBASE-4993.
> From 0.94.0 onwards, Master will proceed immediately after the timeout has 
> lapsed, even if "hbase.master.wait.on.regionservers.mintostart" has not 
> reached.
> Reading the current conditions of waitForRegionServers() clarifies it
> {code:title=ServerManager.java (trunk rev:1360470)}
> 
> 581 /**
> 582  * Wait for the region servers to report in.
> 583  * We will wait until one of this condition is met:
> 584  *  - the master is stopped
> 585  *  - the 'hbase.master.wait.on.regionservers.timeout' is reached
> 586  *  - the 'hbase.master.wait.on.regionservers.maxtostart' number of
> 587  *region servers is reached
> 588  *  - the 'hbase.master.wait.on.regionservers.mintostart' is reached 
> AND
> 589  *   there have been no new region server in for
> 590  *  'hbase.master.wait.on.regionservers.interval' time
> 591  *
> 592  * @throws InterruptedException
> 593  */
> 594 public void waitForRegionServers(MonitoredTask status)
> 595 throws InterruptedException {
> 
> 
> 612   while (
> 613 !this.master.isStopped() &&
> 614   slept < timeout &&
> 615   count < maxToStart &&
> 616   (lastCountChange+interval > now || count < minToStart)
> 617 ){
> 
> {code}
> So with the current conditions, the wait will end as soon as timeout is 
> reached even lesser number of RS have checked-in with the Master and the 
> master will proceed with the region assignment among these RSes alone.
> As mentioned in 
> -[HBASE-4993|https://issues.apache.org/jira/browse/HBASE-4993?focusedCommentId=13237196#comment-13237196]-,
>  and I concur, this could have disastrous effect in large cluster especially 
> now that MSLAB is turned on.
> To enforce the required quorum as specified by 
> "hbase.master.wait.on.regionservers.mintostart" irrespective of timeout, 
> these conditions need to be modified as following
> {code:title=ServerManager.java}
> ..
>   /**
>* Wait for the region servers to report in.
>* We will wait until one of this condition is met:
>*  - the master is stopped
>*  - the 'hbase.master.wait.on.regionservers.maxtostart' number of
>*region servers is reached
>*  - the 'hbase.master.wait.on.regionservers.mintostart' is reached AND
>*   there have been no new region server in for
>*  'hbase.master.wait.on.regionservers.interval' time AND
>*   the 'hbase.master.wait.on.regionservers.timeout' is reached
>*
>* @throws InterruptedException
>*/
>   public void waitForRegionServers(MonitoredTask status)
> ..
> ..
> int minToStart = this.master.getConfiguration().
> getInt("hbase.master.wait.on.regionservers.mintostart", 1);
> int maxToStart = this.master.getConfiguration().
> getInt("hbase.master.wait.on.regionservers.maxtostart", 
> Integer.MAX_VALUE);
> if (maxToStart < minToStart) {
>   maxToStart = minToStart;
> }
> ..
> ..
> while (
>   !this.master.isStopped() &&
> count < maxToStart &&
> (lastCountChange+interval > now || timeout > slept || count < 
> minToStart)
>   ){
> ..
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-6389) Modify the conditions to ensure that Master waits for sufficient number of Region Servers before starting region assignments

2012-07-18 Thread Aditya Kishore (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aditya Kishore updated HBASE-6389:
--

Attachment: HBASE-6389_trunk.patch

As suspected, the failure was result of a partial shutdown of region server 
expired in the previous test.

The region server thread of this expired server was still alive when the next 
test started which lead the test pre-condition to believe that enough number of 
RSes are available.

The new patch take care of the problem.

> Modify the conditions to ensure that Master waits for sufficient number of 
> Region Servers before starting region assignments
> 
>
> Key: HBASE-6389
> URL: https://issues.apache.org/jira/browse/HBASE-6389
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Affects Versions: 0.94.0, 0.96.0
>Reporter: Aditya Kishore
>Assignee: Aditya Kishore
>Priority: Critical
> Fix For: 0.96.0, 0.94.2
>
> Attachments: HBASE-6389_trunk.patch, HBASE-6389_trunk.patch, 
> HBASE-6389_trunk.patch
>
>
> Continuing from HBASE-6375.
> It seems I was mistaken in my assumption that changing the value of 
> "hbase.master.wait.on.regionservers.mintostart" to a sufficient number (from 
> default of 1) can help prevent assignment of all regions to one (or a small 
> number of) region server(s).
> While this was the case in 0.90.x and 0.92.x, the behavior has changed in 
> 0.94.0 onwards to address HBASE-4993.
> From 0.94.0 onwards, Master will proceed immediately after the timeout has 
> lapsed, even if "hbase.master.wait.on.regionservers.mintostart" has not 
> reached.
> Reading the current conditions of waitForRegionServers() clarifies it
> {code:title=ServerManager.java (trunk rev:1360470)}
> 
> 581 /**
> 582  * Wait for the region servers to report in.
> 583  * We will wait until one of this condition is met:
> 584  *  - the master is stopped
> 585  *  - the 'hbase.master.wait.on.regionservers.timeout' is reached
> 586  *  - the 'hbase.master.wait.on.regionservers.maxtostart' number of
> 587  *region servers is reached
> 588  *  - the 'hbase.master.wait.on.regionservers.mintostart' is reached 
> AND
> 589  *   there have been no new region server in for
> 590  *  'hbase.master.wait.on.regionservers.interval' time
> 591  *
> 592  * @throws InterruptedException
> 593  */
> 594 public void waitForRegionServers(MonitoredTask status)
> 595 throws InterruptedException {
> 
> 
> 612   while (
> 613 !this.master.isStopped() &&
> 614   slept < timeout &&
> 615   count < maxToStart &&
> 616   (lastCountChange+interval > now || count < minToStart)
> 617 ){
> 
> {code}
> So with the current conditions, the wait will end as soon as timeout is 
> reached even lesser number of RS have checked-in with the Master and the 
> master will proceed with the region assignment among these RSes alone.
> As mentioned in 
> -[HBASE-4993|https://issues.apache.org/jira/browse/HBASE-4993?focusedCommentId=13237196#comment-13237196]-,
>  and I concur, this could have disastrous effect in large cluster especially 
> now that MSLAB is turned on.
> To enforce the required quorum as specified by 
> "hbase.master.wait.on.regionservers.mintostart" irrespective of timeout, 
> these conditions need to be modified as following
> {code:title=ServerManager.java}
> ..
>   /**
>* Wait for the region servers to report in.
>* We will wait until one of this condition is met:
>*  - the master is stopped
>*  - the 'hbase.master.wait.on.regionservers.maxtostart' number of
>*region servers is reached
>*  - the 'hbase.master.wait.on.regionservers.mintostart' is reached AND
>*   there have been no new region server in for
>*  'hbase.master.wait.on.regionservers.interval' time AND
>*   the 'hbase.master.wait.on.regionservers.timeout' is reached
>*
>* @throws InterruptedException
>*/
>   public void waitForRegionServers(MonitoredTask status)
> ..
> ..
> int minToStart = this.master.getConfiguration().
> getInt("hbase.master.wait.on.regionservers.mintostart", 1);
> int maxToStart = this.master.getConfiguration().
> getInt("hbase.master.wait.on.regionservers.maxtostart", 
> Integer.MAX_VALUE);
> if (maxToStart < minToStart) {
>   maxToStart = minToStart;
> }
> ..
> ..
> while (
>   !this.master.isStopped() &&
> count < maxToStart &&
> (lastCountChange+interval > now || timeout > slept || count < 
> minToStart)
>   ){
> ..
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please cont

[jira] [Updated] (HBASE-6389) Modify the conditions to ensure that Master waits for sufficient number of Region Servers before starting region assignments

2012-07-18 Thread Lars Hofhansl (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HBASE-6389:
-

Fix Version/s: (was: 0.94.1)
   0.94.2

The change is logically no longer in 0.94.1. Moving to 0.94.2.

> Modify the conditions to ensure that Master waits for sufficient number of 
> Region Servers before starting region assignments
> 
>
> Key: HBASE-6389
> URL: https://issues.apache.org/jira/browse/HBASE-6389
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Affects Versions: 0.94.0, 0.96.0
>Reporter: Aditya Kishore
>Assignee: Aditya Kishore
>Priority: Critical
> Fix For: 0.96.0, 0.94.2
>
> Attachments: HBASE-6389_trunk.patch, HBASE-6389_trunk.patch
>
>
> Continuing from HBASE-6375.
> It seems I was mistaken in my assumption that changing the value of 
> "hbase.master.wait.on.regionservers.mintostart" to a sufficient number (from 
> default of 1) can help prevent assignment of all regions to one (or a small 
> number of) region server(s).
> While this was the case in 0.90.x and 0.92.x, the behavior has changed in 
> 0.94.0 onwards to address HBASE-4993.
> From 0.94.0 onwards, Master will proceed immediately after the timeout has 
> lapsed, even if "hbase.master.wait.on.regionservers.mintostart" has not 
> reached.
> Reading the current conditions of waitForRegionServers() clarifies it
> {code:title=ServerManager.java (trunk rev:1360470)}
> 
> 581 /**
> 582  * Wait for the region servers to report in.
> 583  * We will wait until one of this condition is met:
> 584  *  - the master is stopped
> 585  *  - the 'hbase.master.wait.on.regionservers.timeout' is reached
> 586  *  - the 'hbase.master.wait.on.regionservers.maxtostart' number of
> 587  *region servers is reached
> 588  *  - the 'hbase.master.wait.on.regionservers.mintostart' is reached 
> AND
> 589  *   there have been no new region server in for
> 590  *  'hbase.master.wait.on.regionservers.interval' time
> 591  *
> 592  * @throws InterruptedException
> 593  */
> 594 public void waitForRegionServers(MonitoredTask status)
> 595 throws InterruptedException {
> 
> 
> 612   while (
> 613 !this.master.isStopped() &&
> 614   slept < timeout &&
> 615   count < maxToStart &&
> 616   (lastCountChange+interval > now || count < minToStart)
> 617 ){
> 
> {code}
> So with the current conditions, the wait will end as soon as timeout is 
> reached even lesser number of RS have checked-in with the Master and the 
> master will proceed with the region assignment among these RSes alone.
> As mentioned in 
> -[HBASE-4993|https://issues.apache.org/jira/browse/HBASE-4993?focusedCommentId=13237196#comment-13237196]-,
>  and I concur, this could have disastrous effect in large cluster especially 
> now that MSLAB is turned on.
> To enforce the required quorum as specified by 
> "hbase.master.wait.on.regionservers.mintostart" irrespective of timeout, 
> these conditions need to be modified as following
> {code:title=ServerManager.java}
> ..
>   /**
>* Wait for the region servers to report in.
>* We will wait until one of this condition is met:
>*  - the master is stopped
>*  - the 'hbase.master.wait.on.regionservers.maxtostart' number of
>*region servers is reached
>*  - the 'hbase.master.wait.on.regionservers.mintostart' is reached AND
>*   there have been no new region server in for
>*  'hbase.master.wait.on.regionservers.interval' time AND
>*   the 'hbase.master.wait.on.regionservers.timeout' is reached
>*
>* @throws InterruptedException
>*/
>   public void waitForRegionServers(MonitoredTask status)
> ..
> ..
> int minToStart = this.master.getConfiguration().
> getInt("hbase.master.wait.on.regionservers.mintostart", 1);
> int maxToStart = this.master.getConfiguration().
> getInt("hbase.master.wait.on.regionservers.maxtostart", 
> Integer.MAX_VALUE);
> if (maxToStart < minToStart) {
>   maxToStart = minToStart;
> }
> ..
> ..
> while (
>   !this.master.isStopped() &&
> count < maxToStart &&
> (lastCountChange+interval > now || timeout > slept || count < 
> minToStart)
>   ){
> ..
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-6389) Modify the conditions to ensure that Master waits for sufficient number of Region Servers before starting region assignments

2012-07-13 Thread Lars Hofhansl (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HBASE-6389:
-

Resolution: Fixed
Status: Resolved  (was: Patch Available)

Committed to 0.94 and 0.96

> Modify the conditions to ensure that Master waits for sufficient number of 
> Region Servers before starting region assignments
> 
>
> Key: HBASE-6389
> URL: https://issues.apache.org/jira/browse/HBASE-6389
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Affects Versions: 0.94.0, 0.96.0
>Reporter: Aditya Kishore
>Assignee: Aditya Kishore
>Priority: Critical
> Fix For: 0.96.0, 0.94.1
>
> Attachments: HBASE-6389_trunk.patch, HBASE-6389_trunk.patch
>
>
> Continuing from HBASE-6375.
> It seems I was mistaken in my assumption that changing the value of 
> "hbase.master.wait.on.regionservers.mintostart" to a sufficient number (from 
> default of 1) can help prevent assignment of all regions to one (or a small 
> number of) region server(s).
> While this was the case in 0.90.x and 0.92.x, the behavior has changed in 
> 0.94.0 onwards to address HBASE-4993.
> From 0.94.0 onwards, Master will proceed immediately after the timeout has 
> lapsed, even if "hbase.master.wait.on.regionservers.mintostart" has not 
> reached.
> Reading the current conditions of waitForRegionServers() clarifies it
> {code:title=ServerManager.java (trunk rev:1360470)}
> 
> 581 /**
> 582  * Wait for the region servers to report in.
> 583  * We will wait until one of this condition is met:
> 584  *  - the master is stopped
> 585  *  - the 'hbase.master.wait.on.regionservers.timeout' is reached
> 586  *  - the 'hbase.master.wait.on.regionservers.maxtostart' number of
> 587  *region servers is reached
> 588  *  - the 'hbase.master.wait.on.regionservers.mintostart' is reached 
> AND
> 589  *   there have been no new region server in for
> 590  *  'hbase.master.wait.on.regionservers.interval' time
> 591  *
> 592  * @throws InterruptedException
> 593  */
> 594 public void waitForRegionServers(MonitoredTask status)
> 595 throws InterruptedException {
> 
> 
> 612   while (
> 613 !this.master.isStopped() &&
> 614   slept < timeout &&
> 615   count < maxToStart &&
> 616   (lastCountChange+interval > now || count < minToStart)
> 617 ){
> 
> {code}
> So with the current conditions, the wait will end as soon as timeout is 
> reached even lesser number of RS have checked-in with the Master and the 
> master will proceed with the region assignment among these RSes alone.
> As mentioned in 
> -[HBASE-4993|https://issues.apache.org/jira/browse/HBASE-4993?focusedCommentId=13237196#comment-13237196]-,
>  and I concur, this could have disastrous effect in large cluster especially 
> now that MSLAB is turned on.
> To enforce the required quorum as specified by 
> "hbase.master.wait.on.regionservers.mintostart" irrespective of timeout, 
> these conditions need to be modified as following
> {code:title=ServerManager.java}
> ..
>   /**
>* Wait for the region servers to report in.
>* We will wait until one of this condition is met:
>*  - the master is stopped
>*  - the 'hbase.master.wait.on.regionservers.maxtostart' number of
>*region servers is reached
>*  - the 'hbase.master.wait.on.regionservers.mintostart' is reached AND
>*   there have been no new region server in for
>*  'hbase.master.wait.on.regionservers.interval' time AND
>*   the 'hbase.master.wait.on.regionservers.timeout' is reached
>*
>* @throws InterruptedException
>*/
>   public void waitForRegionServers(MonitoredTask status)
> ..
> ..
> int minToStart = this.master.getConfiguration().
> getInt("hbase.master.wait.on.regionservers.mintostart", 1);
> int maxToStart = this.master.getConfiguration().
> getInt("hbase.master.wait.on.regionservers.maxtostart", 
> Integer.MAX_VALUE);
> if (maxToStart < minToStart) {
>   maxToStart = minToStart;
> }
> ..
> ..
> while (
>   !this.master.isStopped() &&
> count < maxToStart &&
> (lastCountChange+interval > now || timeout > slept || count < 
> minToStart)
>   ){
> ..
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-6389) Modify the conditions to ensure that Master waits for sufficient number of Region Servers before starting region assignments

2012-07-13 Thread Aditya Kishore (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aditya Kishore updated HBASE-6389:
--

Attachment: HBASE-6389_trunk.patch

The test failure were result of masked error in test code which this change 
brought out.

There were two such errors.

# The function 
org.apache.hadoop.hbase.HBaseTestingUtility.startMiniHBaseCluster() was 
overriding the value of 'mintostart' and 'maxtostart' with a single value, even 
if the caller has set them explicitly.
# org.apache.hadoop.hbase.regionserver.TestRSKilledWhenMasterInitializing did 
not set these values even though it kills one RS during master initialization.

The attached patch fixes these two.

> Modify the conditions to ensure that Master waits for sufficient number of 
> Region Servers before starting region assignments
> 
>
> Key: HBASE-6389
> URL: https://issues.apache.org/jira/browse/HBASE-6389
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Affects Versions: 0.94.0, 0.96.0
>Reporter: Aditya Kishore
>Assignee: Aditya Kishore
>Priority: Critical
> Fix For: 0.96.0, 0.94.1
>
> Attachments: HBASE-6389_trunk.patch, HBASE-6389_trunk.patch
>
>
> Continuing from HBASE-6375.
> It seems I was mistaken in my assumption that changing the value of 
> "hbase.master.wait.on.regionservers.mintostart" to a sufficient number (from 
> default of 1) can help prevent assignment of all regions to one (or a small 
> number of) region server(s).
> While this was the case in 0.90.x and 0.92.x, the behavior has changed in 
> 0.94.0 onwards to address HBASE-4993.
> From 0.94.0 onwards, Master will proceed immediately after the timeout has 
> lapsed, even if "hbase.master.wait.on.regionservers.mintostart" has not 
> reached.
> Reading the current conditions of waitForRegionServers() clarifies it
> {code:title=ServerManager.java (trunk rev:1360470)}
> 
> 581 /**
> 582  * Wait for the region servers to report in.
> 583  * We will wait until one of this condition is met:
> 584  *  - the master is stopped
> 585  *  - the 'hbase.master.wait.on.regionservers.timeout' is reached
> 586  *  - the 'hbase.master.wait.on.regionservers.maxtostart' number of
> 587  *region servers is reached
> 588  *  - the 'hbase.master.wait.on.regionservers.mintostart' is reached 
> AND
> 589  *   there have been no new region server in for
> 590  *  'hbase.master.wait.on.regionservers.interval' time
> 591  *
> 592  * @throws InterruptedException
> 593  */
> 594 public void waitForRegionServers(MonitoredTask status)
> 595 throws InterruptedException {
> 
> 
> 612   while (
> 613 !this.master.isStopped() &&
> 614   slept < timeout &&
> 615   count < maxToStart &&
> 616   (lastCountChange+interval > now || count < minToStart)
> 617 ){
> 
> {code}
> So with the current conditions, the wait will end as soon as timeout is 
> reached even lesser number of RS have checked-in with the Master and the 
> master will proceed with the region assignment among these RSes alone.
> As mentioned in 
> -[HBASE-4993|https://issues.apache.org/jira/browse/HBASE-4993?focusedCommentId=13237196#comment-13237196]-,
>  and I concur, this could have disastrous effect in large cluster especially 
> now that MSLAB is turned on.
> To enforce the required quorum as specified by 
> "hbase.master.wait.on.regionservers.mintostart" irrespective of timeout, 
> these conditions need to be modified as following
> {code:title=ServerManager.java}
> ..
>   /**
>* Wait for the region servers to report in.
>* We will wait until one of this condition is met:
>*  - the master is stopped
>*  - the 'hbase.master.wait.on.regionservers.maxtostart' number of
>*region servers is reached
>*  - the 'hbase.master.wait.on.regionservers.mintostart' is reached AND
>*   there have been no new region server in for
>*  'hbase.master.wait.on.regionservers.interval' time AND
>*   the 'hbase.master.wait.on.regionservers.timeout' is reached
>*
>* @throws InterruptedException
>*/
>   public void waitForRegionServers(MonitoredTask status)
> ..
> ..
> int minToStart = this.master.getConfiguration().
> getInt("hbase.master.wait.on.regionservers.mintostart", 1);
> int maxToStart = this.master.getConfiguration().
> getInt("hbase.master.wait.on.regionservers.maxtostart", 
> Integer.MAX_VALUE);
> if (maxToStart < minToStart) {
>   maxToStart = minToStart;
> }
> ..
> ..
> while (
>   !this.master.isStopped() &&
> count < maxToStart &&
> (lastCountChange+interval > now || t

[jira] [Updated] (HBASE-6389) Modify the conditions to ensure that Master waits for sufficient number of Region Servers before starting region assignments

2012-07-12 Thread Aditya Kishore (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aditya Kishore updated HBASE-6389:
--

Attachment: HBASE-6389_trunk.patch

Modified patch.

> Modify the conditions to ensure that Master waits for sufficient number of 
> Region Servers before starting region assignments
> 
>
> Key: HBASE-6389
> URL: https://issues.apache.org/jira/browse/HBASE-6389
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Affects Versions: 0.94.0, 0.96.0
>Reporter: Aditya Kishore
>Assignee: Aditya Kishore
>Priority: Critical
> Fix For: 0.96.0, 0.94.1
>
> Attachments: HBASE-6389_trunk.patch
>
>
> Continuing from HBASE-6375.
> It seems I was mistaken in my assumption that changing the value of 
> "hbase.master.wait.on.regionservers.mintostart" to a sufficient number (from 
> default of 1) can help prevent assignment of all regions to one (or a small 
> number of) region server(s).
> While this was the case in 0.90.x and 0.92.x, the behavior has changed in 
> 0.94.0 onwards to address HBASE-4993.
> From 0.94.0 onwards, Master will proceed immediately after the timeout has 
> lapsed, even if "hbase.master.wait.on.regionservers.mintostart" has not 
> reached.
> Reading the current conditions of waitForRegionServers() clarifies it
> {code:title=ServerManager.java (trunk rev:1360470)}
> 
> 581 /**
> 582  * Wait for the region servers to report in.
> 583  * We will wait until one of this condition is met:
> 584  *  - the master is stopped
> 585  *  - the 'hbase.master.wait.on.regionservers.timeout' is reached
> 586  *  - the 'hbase.master.wait.on.regionservers.maxtostart' number of
> 587  *region servers is reached
> 588  *  - the 'hbase.master.wait.on.regionservers.mintostart' is reached 
> AND
> 589  *   there have been no new region server in for
> 590  *  'hbase.master.wait.on.regionservers.interval' time
> 591  *
> 592  * @throws InterruptedException
> 593  */
> 594 public void waitForRegionServers(MonitoredTask status)
> 595 throws InterruptedException {
> 
> 
> 612   while (
> 613 !this.master.isStopped() &&
> 614   slept < timeout &&
> 615   count < maxToStart &&
> 616   (lastCountChange+interval > now || count < minToStart)
> 617 ){
> 
> {code}
> So with the current conditions, the wait will end as soon as timeout is 
> reached even lesser number of RS have checked-in with the Master and the 
> master will proceed with the region assignment among these RSes alone.
> As mentioned in 
> -[HBASE-4993|https://issues.apache.org/jira/browse/HBASE-4993?focusedCommentId=13237196#comment-13237196]-,
>  and I concur, this could have disastrous effect in large cluster especially 
> now that MSLAB is turned on.
> To enforce the required quorum as specified by 
> "hbase.master.wait.on.regionservers.mintostart" irrespective of timeout, 
> these conditions need to be modified as following
> {code:title=ServerManager.java}
> ..
>   /**
>* Wait for the region servers to report in.
>* We will wait until one of this condition is met:
>*  - the master is stopped
>*  - the 'hbase.master.wait.on.regionservers.maxtostart' number of
>*region servers is reached
>*  - the 'hbase.master.wait.on.regionservers.mintostart' is reached AND
>*   there have been no new region server in for
>*  'hbase.master.wait.on.regionservers.interval' time AND
>*   the 'hbase.master.wait.on.regionservers.timeout' is reached
>*
>* @throws InterruptedException
>*/
>   public void waitForRegionServers(MonitoredTask status)
> ..
> ..
> int minToStart = this.master.getConfiguration().
> getInt("hbase.master.wait.on.regionservers.mintostart", 1);
> int maxToStart = this.master.getConfiguration().
> getInt("hbase.master.wait.on.regionservers.maxtostart", 
> Integer.MAX_VALUE);
> if (maxToStart < minToStart) {
>   maxToStart = minToStart;
> }
> ..
> ..
> while (
>   !this.master.isStopped() &&
> count < maxToStart &&
> (lastCountChange+interval > now || timeout > slept || count < 
> minToStart)
>   ){
> ..
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-6389) Modify the conditions to ensure that Master waits for sufficient number of Region Servers before starting region assignments

2012-07-12 Thread Aditya Kishore (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aditya Kishore updated HBASE-6389:
--

Attachment: (was: HBASE-6389_trunk.patch)

> Modify the conditions to ensure that Master waits for sufficient number of 
> Region Servers before starting region assignments
> 
>
> Key: HBASE-6389
> URL: https://issues.apache.org/jira/browse/HBASE-6389
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Affects Versions: 0.94.0, 0.96.0
>Reporter: Aditya Kishore
>Assignee: Aditya Kishore
>Priority: Critical
> Fix For: 0.96.0, 0.94.1
>
>
> Continuing from HBASE-6375.
> It seems I was mistaken in my assumption that changing the value of 
> "hbase.master.wait.on.regionservers.mintostart" to a sufficient number (from 
> default of 1) can help prevent assignment of all regions to one (or a small 
> number of) region server(s).
> While this was the case in 0.90.x and 0.92.x, the behavior has changed in 
> 0.94.0 onwards to address HBASE-4993.
> From 0.94.0 onwards, Master will proceed immediately after the timeout has 
> lapsed, even if "hbase.master.wait.on.regionservers.mintostart" has not 
> reached.
> Reading the current conditions of waitForRegionServers() clarifies it
> {code:title=ServerManager.java (trunk rev:1360470)}
> 
> 581 /**
> 582  * Wait for the region servers to report in.
> 583  * We will wait until one of this condition is met:
> 584  *  - the master is stopped
> 585  *  - the 'hbase.master.wait.on.regionservers.timeout' is reached
> 586  *  - the 'hbase.master.wait.on.regionservers.maxtostart' number of
> 587  *region servers is reached
> 588  *  - the 'hbase.master.wait.on.regionservers.mintostart' is reached 
> AND
> 589  *   there have been no new region server in for
> 590  *  'hbase.master.wait.on.regionservers.interval' time
> 591  *
> 592  * @throws InterruptedException
> 593  */
> 594 public void waitForRegionServers(MonitoredTask status)
> 595 throws InterruptedException {
> 
> 
> 612   while (
> 613 !this.master.isStopped() &&
> 614   slept < timeout &&
> 615   count < maxToStart &&
> 616   (lastCountChange+interval > now || count < minToStart)
> 617 ){
> 
> {code}
> So with the current conditions, the wait will end as soon as timeout is 
> reached even lesser number of RS have checked-in with the Master and the 
> master will proceed with the region assignment among these RSes alone.
> As mentioned in 
> -[HBASE-4993|https://issues.apache.org/jira/browse/HBASE-4993?focusedCommentId=13237196#comment-13237196]-,
>  and I concur, this could have disastrous effect in large cluster especially 
> now that MSLAB is turned on.
> To enforce the required quorum as specified by 
> "hbase.master.wait.on.regionservers.mintostart" irrespective of timeout, 
> these conditions need to be modified as following
> {code:title=ServerManager.java}
> ..
>   /**
>* Wait for the region servers to report in.
>* We will wait until one of this condition is met:
>*  - the master is stopped
>*  - the 'hbase.master.wait.on.regionservers.maxtostart' number of
>*region servers is reached
>*  - the 'hbase.master.wait.on.regionservers.mintostart' is reached AND
>*   there have been no new region server in for
>*  'hbase.master.wait.on.regionservers.interval' time AND
>*   the 'hbase.master.wait.on.regionservers.timeout' is reached
>*
>* @throws InterruptedException
>*/
>   public void waitForRegionServers(MonitoredTask status)
> ..
> ..
> int minToStart = this.master.getConfiguration().
> getInt("hbase.master.wait.on.regionservers.mintostart", 1);
> int maxToStart = this.master.getConfiguration().
> getInt("hbase.master.wait.on.regionservers.maxtostart", 
> Integer.MAX_VALUE);
> if (maxToStart < minToStart) {
>   maxToStart = minToStart;
> }
> ..
> ..
> while (
>   !this.master.isStopped() &&
> count < maxToStart &&
> (lastCountChange+interval > now || timeout > slept || count < 
> minToStart)
>   ){
> ..
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-6389) Modify the conditions to ensure that Master waits for sufficient number of Region Servers before starting region assignments

2012-07-12 Thread Zhihong Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Ted Yu updated HBASE-6389:
--

Hadoop Flags: Reviewed

> Modify the conditions to ensure that Master waits for sufficient number of 
> Region Servers before starting region assignments
> 
>
> Key: HBASE-6389
> URL: https://issues.apache.org/jira/browse/HBASE-6389
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Affects Versions: 0.94.0, 0.96.0
>Reporter: Aditya Kishore
>Assignee: Aditya Kishore
>Priority: Critical
> Fix For: 0.96.0, 0.94.1
>
> Attachments: HBASE-6389_trunk.patch
>
>
> Continuing from HBASE-6375.
> It seems I was mistaken in my assumption that changing the value of 
> "hbase.master.wait.on.regionservers.mintostart" to a sufficient number (from 
> default of 1) can help prevent assignment of all regions to one (or a small 
> number of) region server(s).
> While this was the case in 0.90.x and 0.92.x, the behavior has changed in 
> 0.94.0 onwards to address HBASE-4993.
> From 0.94.0 onwards, Master will proceed immediately after the timeout has 
> lapsed, even if "hbase.master.wait.on.regionservers.mintostart" has not 
> reached.
> Reading the current conditions of waitForRegionServers() clarifies it
> {code:title=ServerManager.java (trunk rev:1360470)}
> 
> 581 /**
> 582  * Wait for the region servers to report in.
> 583  * We will wait until one of this condition is met:
> 584  *  - the master is stopped
> 585  *  - the 'hbase.master.wait.on.regionservers.timeout' is reached
> 586  *  - the 'hbase.master.wait.on.regionservers.maxtostart' number of
> 587  *region servers is reached
> 588  *  - the 'hbase.master.wait.on.regionservers.mintostart' is reached 
> AND
> 589  *   there have been no new region server in for
> 590  *  'hbase.master.wait.on.regionservers.interval' time
> 591  *
> 592  * @throws InterruptedException
> 593  */
> 594 public void waitForRegionServers(MonitoredTask status)
> 595 throws InterruptedException {
> 
> 
> 612   while (
> 613 !this.master.isStopped() &&
> 614   slept < timeout &&
> 615   count < maxToStart &&
> 616   (lastCountChange+interval > now || count < minToStart)
> 617 ){
> 
> {code}
> So with the current conditions, the wait will end as soon as timeout is 
> reached even lesser number of RS have checked-in with the Master and the 
> master will proceed with the region assignment among these RSes alone.
> As mentioned in 
> -[HBASE-4993|https://issues.apache.org/jira/browse/HBASE-4993?focusedCommentId=13237196#comment-13237196]-,
>  and I concur, this could have disastrous effect in large cluster especially 
> now that MSLAB is turned on.
> To enforce the required quorum as specified by 
> "hbase.master.wait.on.regionservers.mintostart" irrespective of timeout, 
> these conditions need to be modified as following
> {code:title=ServerManager.java}
> ..
>   /**
>* Wait for the region servers to report in.
>* We will wait until one of this condition is met:
>*  - the master is stopped
>*  - the 'hbase.master.wait.on.regionservers.maxtostart' number of
>*region servers is reached
>*  - the 'hbase.master.wait.on.regionservers.mintostart' is reached AND
>*   there have been no new region server in for
>*  'hbase.master.wait.on.regionservers.interval' time AND
>*   the 'hbase.master.wait.on.regionservers.timeout' is reached
>*
>* @throws InterruptedException
>*/
>   public void waitForRegionServers(MonitoredTask status)
> ..
> ..
> int minToStart = this.master.getConfiguration().
> getInt("hbase.master.wait.on.regionservers.mintostart", 1);
> int maxToStart = this.master.getConfiguration().
> getInt("hbase.master.wait.on.regionservers.maxtostart", 
> Integer.MAX_VALUE);
> if (maxToStart < minToStart) {
>   maxToStart = minToStart;
> }
> ..
> ..
> while (
>   !this.master.isStopped() &&
> count < maxToStart &&
> (lastCountChange+interval > now || timeout > slept || count < 
> minToStart)
>   ){
> ..
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-6389) Modify the conditions to ensure that Master waits for sufficient number of Region Servers before starting region assignments

2012-07-12 Thread Lars Hofhansl (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HBASE-6389:
-

Fix Version/s: 0.94.1

> Modify the conditions to ensure that Master waits for sufficient number of 
> Region Servers before starting region assignments
> 
>
> Key: HBASE-6389
> URL: https://issues.apache.org/jira/browse/HBASE-6389
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Affects Versions: 0.94.0, 0.96.0
>Reporter: Aditya Kishore
>Assignee: Aditya Kishore
>Priority: Critical
> Fix For: 0.96.0, 0.94.1
>
> Attachments: HBASE-6389_trunk.patch
>
>
> Continuing from HBASE-6375.
> It seems I was mistaken in my assumption that changing the value of 
> "hbase.master.wait.on.regionservers.mintostart" to a sufficient number (from 
> default of 1) can help prevent assignment of all regions to one (or a small 
> number of) region server(s).
> While this was the case in 0.90.x and 0.92.x, the behavior has changed in 
> 0.94.0 onwards to address HBASE-4993.
> From 0.94.0 onwards, Master will proceed immediately after the timeout has 
> lapsed, even if "hbase.master.wait.on.regionservers.mintostart" has not 
> reached.
> Reading the current conditions of waitForRegionServers() clarifies it
> {code:title=ServerManager.java (trunk rev:1360470)}
> 
> 581 /**
> 582  * Wait for the region servers to report in.
> 583  * We will wait until one of this condition is met:
> 584  *  - the master is stopped
> 585  *  - the 'hbase.master.wait.on.regionservers.timeout' is reached
> 586  *  - the 'hbase.master.wait.on.regionservers.maxtostart' number of
> 587  *region servers is reached
> 588  *  - the 'hbase.master.wait.on.regionservers.mintostart' is reached 
> AND
> 589  *   there have been no new region server in for
> 590  *  'hbase.master.wait.on.regionservers.interval' time
> 591  *
> 592  * @throws InterruptedException
> 593  */
> 594 public void waitForRegionServers(MonitoredTask status)
> 595 throws InterruptedException {
> 
> 
> 612   while (
> 613 !this.master.isStopped() &&
> 614   slept < timeout &&
> 615   count < maxToStart &&
> 616   (lastCountChange+interval > now || count < minToStart)
> 617 ){
> 
> {code}
> So with the current conditions, the wait will end as soon as timeout is 
> reached even lesser number of RS have checked-in with the Master and the 
> master will proceed with the region assignment among these RSes alone.
> As mentioned in 
> -[HBASE-4993|https://issues.apache.org/jira/browse/HBASE-4993?focusedCommentId=13237196#comment-13237196]-,
>  and I concur, this could have disastrous effect in large cluster especially 
> now that MSLAB is turned on.
> To enforce the required quorum as specified by 
> "hbase.master.wait.on.regionservers.mintostart" irrespective of timeout, 
> these conditions need to be modified as following
> {code:title=ServerManager.java}
> ..
>   /**
>* Wait for the region servers to report in.
>* We will wait until one of this condition is met:
>*  - the master is stopped
>*  - the 'hbase.master.wait.on.regionservers.maxtostart' number of
>*region servers is reached
>*  - the 'hbase.master.wait.on.regionservers.mintostart' is reached AND
>*   there have been no new region server in for
>*  'hbase.master.wait.on.regionservers.interval' time AND
>*   the 'hbase.master.wait.on.regionservers.timeout' is reached
>*
>* @throws InterruptedException
>*/
>   public void waitForRegionServers(MonitoredTask status)
> ..
> ..
> int minToStart = this.master.getConfiguration().
> getInt("hbase.master.wait.on.regionservers.mintostart", 1);
> int maxToStart = this.master.getConfiguration().
> getInt("hbase.master.wait.on.regionservers.maxtostart", 
> Integer.MAX_VALUE);
> if (maxToStart < minToStart) {
>   maxToStart = minToStart;
> }
> ..
> ..
> while (
>   !this.master.isStopped() &&
> count < maxToStart &&
> (lastCountChange+interval > now || timeout > slept || count < 
> minToStart)
>   ){
> ..
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-6389) Modify the conditions to ensure that Master waits for sufficient number of Region Servers before starting region assignments

2012-07-12 Thread Aditya Kishore (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aditya Kishore updated HBASE-6389:
--

Summary: Modify the conditions to ensure that Master waits for sufficient 
number of Region Servers before starting region assignments  (was: Modify the 
conditions to ensure that Master waits for suffcient number of Region Servers 
before starting region assignments)

> Modify the conditions to ensure that Master waits for sufficient number of 
> Region Servers before starting region assignments
> 
>
> Key: HBASE-6389
> URL: https://issues.apache.org/jira/browse/HBASE-6389
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Affects Versions: 0.94.0, 0.96.0
>Reporter: Aditya Kishore
>Assignee: Aditya Kishore
>Priority: Critical
> Fix For: 0.96.0
>
> Attachments: HBASE-6389_trunk.patch
>
>
> Continuing from HBASE-6375.
> It seems I was mistaken in my assumption that changing the value of 
> "hbase.master.wait.on.regionservers.mintostart" to a sufficient number (from 
> default of 1) can help prevent assignment of all regions to one (or a small 
> number of) region server(s).
> While this was the case in 0.90.x and 0.92.x, the behavior has changed in 
> 0.94.0 onwards to address HBASE-4993.
> From 0.94.0 onwards, Master will proceed immediately after the timeout has 
> lapsed, even if "hbase.master.wait.on.regionservers.mintostart" has not 
> reached.
> Reading the current conditions of waitForRegionServers() clarifies it
> {code:title=ServerManager.java (trunk rev:1360470)}
> 
> 581 /**
> 582  * Wait for the region servers to report in.
> 583  * We will wait until one of this condition is met:
> 584  *  - the master is stopped
> 585  *  - the 'hbase.master.wait.on.regionservers.timeout' is reached
> 586  *  - the 'hbase.master.wait.on.regionservers.maxtostart' number of
> 587  *region servers is reached
> 588  *  - the 'hbase.master.wait.on.regionservers.mintostart' is reached 
> AND
> 589  *   there have been no new region server in for
> 590  *  'hbase.master.wait.on.regionservers.interval' time
> 591  *
> 592  * @throws InterruptedException
> 593  */
> 594 public void waitForRegionServers(MonitoredTask status)
> 595 throws InterruptedException {
> 
> 
> 612   while (
> 613 !this.master.isStopped() &&
> 614   slept < timeout &&
> 615   count < maxToStart &&
> 616   (lastCountChange+interval > now || count < minToStart)
> 617 ){
> 
> {code}
> So with the current conditions, the wait will end as soon as timeout is 
> reached even lesser number of RS have checked-in with the Master and the 
> master will proceed with the region assignment among these RSes alone.
> As mentioned in 
> -[HBASE-4993|https://issues.apache.org/jira/browse/HBASE-4993?focusedCommentId=13237196#comment-13237196]-,
>  and I concur, this could have disastrous effect in large cluster especially 
> now that MSLAB is turned on.
> To enforce the required quorum as specified by 
> "hbase.master.wait.on.regionservers.mintostart" irrespective of timeout, 
> these conditions need to be modified as following
> {code:title=ServerManager.java}
> ..
>   /**
>* Wait for the region servers to report in.
>* We will wait until one of this condition is met:
>*  - the master is stopped
>*  - the 'hbase.master.wait.on.regionservers.maxtostart' number of
>*region servers is reached
>*  - the 'hbase.master.wait.on.regionservers.mintostart' is reached AND
>*   there have been no new region server in for
>*  'hbase.master.wait.on.regionservers.interval' time AND
>*   the 'hbase.master.wait.on.regionservers.timeout' is reached
>*
>* @throws InterruptedException
>*/
>   public void waitForRegionServers(MonitoredTask status)
> ..
> ..
> int minToStart = this.master.getConfiguration().
> getInt("hbase.master.wait.on.regionservers.mintostart", 1);
> int maxToStart = this.master.getConfiguration().
> getInt("hbase.master.wait.on.regionservers.maxtostart", 
> Integer.MAX_VALUE);
> if (maxToStart < minToStart) {
>   maxToStart = minToStart;
> }
> ..
> ..
> while (
>   !this.master.isStopped() &&
> count < maxToStart &&
> (lastCountChange+interval > now || timeout > slept || count < 
> minToStart)
>   ){
> ..
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/sof