[jira] [Commented] (HBASE-23035) Retain region to the last RegionServer make the failover slower

2020-08-18 Thread Guanghao Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-23035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17179617#comment-17179617
 ] 

Guanghao Zhang commented on HBASE-23035:


{quote}[https://github.com/apache/hbase/blob/c2e0cf989e4a86169219161d4d889db80288e636/hbase-server/src/main/java/org/apache/hadoop/hbase/master/procedure/ServerCrashProcedure.java#L556]
{quote}
Introduced by HBASE-18036. I am +1 to add a config to control this.

> Retain region to the last RegionServer make the failover slower
> ---
>
> Key: HBASE-23035
> URL: https://issues.apache.org/jira/browse/HBASE-23035
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 3.0.0-alpha-1, 2.3.0, 2.2.1, 2.1.6
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
>Priority: Major
> Fix For: 3.0.0-alpha-1, 2.3.0, 2.1.7, 2.2.2
>
>
> Now if one RS crashed, the regions will try to use the old location for the 
> region deploy. But one RS only have 3 threads to open region by default. If a 
> RS have hundreds of regions, the failover is very slower. Assign to same RS 
> may have good locality if the Datanode is deploied on same host. But slower 
> failover make the availability worse. And the locality is not big deal when 
> deploy HBase on cloud.
> This was introduced by HBASE-18946.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-23035) Retain region to the last RegionServer make the failover slower

2020-08-18 Thread Anoop Sam John (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-23035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17179548#comment-17179548
 ] 

Anoop Sam John commented on HBASE-23035:


Ya I have seen this retainAssign stuff in 1.x based clusters.  What I mean is 
in some parts of AM,  we have configs which takes whether locality is to be 
considered and so calc based on that.. So ya a locality sensitive cluster can 
have such a config (new may be) turned ON.

> Retain region to the last RegionServer make the failover slower
> ---
>
> Key: HBASE-23035
> URL: https://issues.apache.org/jira/browse/HBASE-23035
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 3.0.0-alpha-1, 2.3.0, 2.2.1, 2.1.6
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
>Priority: Major
> Fix For: 3.0.0-alpha-1, 2.3.0, 2.1.7, 2.2.2
>
>
> Now if one RS crashed, the regions will try to use the old location for the 
> region deploy. But one RS only have 3 threads to open region by default. If a 
> RS have hundreds of regions, the failover is very slower. Assign to same RS 
> may have good locality if the Datanode is deploied on same host. But slower 
> failover make the availability worse. And the locality is not big deal when 
> deploy HBase on cloud.
> This was introduced by HBASE-18946.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-23035) Retain region to the last RegionServer make the failover slower

2020-08-18 Thread Guanghao Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-23035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17179517#comment-17179517
 ] 

Guanghao Zhang commented on HBASE-23035:


{quote} IMO, we should make retain assignment configurable, so that user can 
decide as per their use case.
{quote}
Ok. If you guys need this feature, please open new issue for this. Thanks.

 
{quote}But currently branch-1 retains the assinement during SCP if same RS came 
up (please correct me if I miss something),
{quote}
Let me take a look about this.

 

> Retain region to the last RegionServer make the failover slower
> ---
>
> Key: HBASE-23035
> URL: https://issues.apache.org/jira/browse/HBASE-23035
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 3.0.0-alpha-1, 2.3.0, 2.2.1, 2.1.6
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
>Priority: Major
> Fix For: 3.0.0-alpha-1, 2.3.0, 2.1.7, 2.2.2
>
>
> Now if one RS crashed, the regions will try to use the old location for the 
> region deploy. But one RS only have 3 threads to open region by default. If a 
> RS have hundreds of regions, the failover is very slower. Assign to same RS 
> may have good locality if the Datanode is deploied on same host. But slower 
> failover make the availability worse. And the locality is not big deal when 
> deploy HBase on cloud.
> This was introduced by HBASE-18946.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-23035) Retain region to the last RegionServer make the failover slower

2020-08-17 Thread Pankaj Kumar (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-23035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17179347#comment-17179347
 ] 

Pankaj Kumar commented on HBASE-23035:
--

{quote}This jira change the "retain" assignment to round-robin assignment, 
which is same with 1.x.x version. This change will make the failover faster and 
improve availability.
{quote}
Yeah [~zghao] , failover will be faster and regions will be available soon but 
this will impact the scan performance in non-cloud scenario.

But currently branch-1 retains the assinement during SCP if same RS came up 
(please correct me if I miss something),

[https://github.com/apache/hbase/blob/c2e0cf989e4a86169219161d4d889db80288e636/hbase-server/src/main/java/org/apache/hadoop/hbase/master/procedure/ServerCrashProcedure.java#L556]

 IMO, we should make retain assignment configurable, so that user can decide s 
per their use case.

> Retain region to the last RegionServer make the failover slower
> ---
>
> Key: HBASE-23035
> URL: https://issues.apache.org/jira/browse/HBASE-23035
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 3.0.0-alpha-1, 2.3.0, 2.2.1, 2.1.6
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
>Priority: Major
> Fix For: 3.0.0-alpha-1, 2.3.0, 2.1.7, 2.2.2
>
>
> Now if one RS crashed, the regions will try to use the old location for the 
> region deploy. But one RS only have 3 threads to open region by default. If a 
> RS have hundreds of regions, the failover is very slower. Assign to same RS 
> may have good locality if the Datanode is deploied on same host. But slower 
> failover make the availability worse. And the locality is not big deal when 
> deploy HBase on cloud.
> This was introduced by HBASE-18946.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-23035) Retain region to the last RegionServer make the failover slower

2020-08-16 Thread Bo Cui (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-23035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17178685#comment-17178685
 ] 

Bo Cui commented on HBASE-23035:


[https://github.com/apache/hbase/blob/c2e0cf989e4a86169219161d4d889db80288e636/hbase-server/src/main/java/org/apache/hadoop/hbase/master/procedure/ServerCrashProcedure.java#L556]

[~anoop.hbase]  u are talking about it?

> Retain region to the last RegionServer make the failover slower
> ---
>
> Key: HBASE-23035
> URL: https://issues.apache.org/jira/browse/HBASE-23035
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 3.0.0-alpha-1, 2.3.0, 2.2.1, 2.1.6
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
>Priority: Major
> Fix For: 3.0.0-alpha-1, 2.3.0, 2.1.7, 2.2.2
>
>
> Now if one RS crashed, the regions will try to use the old location for the 
> region deploy. But one RS only have 3 threads to open region by default. If a 
> RS have hundreds of regions, the failover is very slower. Assign to same RS 
> may have good locality if the Datanode is deploied on same host. But slower 
> failover make the availability worse. And the locality is not big deal when 
> deploy HBase on cloud.
> This was introduced by HBASE-18946.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-23035) Retain region to the last RegionServer make the failover slower

2020-08-13 Thread Anoop Sam John (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-23035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17176848#comment-17176848
 ] 

Anoop Sam John commented on HBASE-23035:


[~Bo Cui] So ur usecase is entire cluster restart and as part of that u want 
the regions to come back to the old RSs itself (as much as possible).  So 
locality can be preserved.
There is a some config around the LB which takes whether to consider the data 
locality aspect in deciding the plan.  Can we make use of the same thing .. May 
be not.. I dont remember details on that conf. [~zghao] 

> Retain region to the last RegionServer make the failover slower
> ---
>
> Key: HBASE-23035
> URL: https://issues.apache.org/jira/browse/HBASE-23035
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 3.0.0-alpha-1, 2.3.0, 2.2.1, 2.1.6
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
>Priority: Major
> Fix For: 3.0.0-alpha-1, 2.3.0, 2.1.7, 2.2.2
>
>
> Now if one RS crashed, the regions will try to use the old location for the 
> region deploy. But one RS only have 3 threads to open region by default. If a 
> RS have hundreds of regions, the failover is very slower. Assign to same RS 
> may have good locality if the Datanode is deploied on same host. But slower 
> failover make the availability worse. And the locality is not big deal when 
> deploy HBase on cloud.
> This was introduced by HBASE-18946.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-23035) Retain region to the last RegionServer make the failover slower

2020-08-12 Thread Bo Cui (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-23035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17176744#comment-17176744
 ] 

Bo Cui commented on HBASE-23035:


[~zghao]

During startup, hbase needs to assign region to previous rs without affecting 
the scan performance,  so we can add conf to solve this problem

> Retain region to the last RegionServer make the failover slower
> ---
>
> Key: HBASE-23035
> URL: https://issues.apache.org/jira/browse/HBASE-23035
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 3.0.0-alpha-1, 2.3.0, 2.2.1, 2.1.6
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
>Priority: Major
> Fix For: 3.0.0-alpha-1, 2.3.0, 2.1.7, 2.2.2
>
>
> Now if one RS crashed, the regions will try to use the old location for the 
> region deploy. But one RS only have 3 threads to open region by default. If a 
> RS have hundreds of regions, the failover is very slower. Assign to same RS 
> may have good locality if the Datanode is deploied on same host. But slower 
> failover make the availability worse. And the locality is not big deal when 
> deploy HBase on cloud.
> This was introduced by HBASE-18946.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-23035) Retain region to the last RegionServer make the failover slower

2020-08-11 Thread Guanghao Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-23035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17175295#comment-17175295
 ] 

Guanghao Zhang commented on HBASE-23035:


This jira change the "retain" assignment to round-robin assignment, which is 
same with 1.x.x version. This change will make the failover faster and improve 
availability.
{quote}Or are there some configs to control this?
{quote}
No configs to control this. The new behavior is the default behavior.

> Retain region to the last RegionServer make the failover slower
> ---
>
> Key: HBASE-23035
> URL: https://issues.apache.org/jira/browse/HBASE-23035
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 3.0.0-alpha-1, 2.3.0, 2.2.1, 2.1.6
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
>Priority: Major
> Fix For: 3.0.0-alpha-1, 2.3.0, 2.1.7, 2.2.2
>
>
> Now if one RS crashed, the regions will try to use the old location for the 
> region deploy. But one RS only have 3 threads to open region by default. If a 
> RS have hundreds of regions, the failover is very slower. Assign to same RS 
> may have good locality if the Datanode is deploied on same host. But slower 
> failover make the availability worse. And the locality is not big deal when 
> deploy HBase on cloud.
> This was introduced by HBASE-18946.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-23035) Retain region to the last RegionServer make the failover slower

2020-08-11 Thread Anoop Sam John (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-23035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17175277#comment-17175277
 ] 

Anoop Sam John commented on HBASE-23035:


[~zghao].. So how is the fix now?  The SSH will round robin the assignment even 
if the down RS came back by the time the AM start its work? Or are there some 
configs to control this? Or any other way?  Sorry did not see the patch

> Retain region to the last RegionServer make the failover slower
> ---
>
> Key: HBASE-23035
> URL: https://issues.apache.org/jira/browse/HBASE-23035
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 3.0.0-alpha-1, 2.3.0, 2.2.1, 2.1.6
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
>Priority: Major
> Fix For: 3.0.0-alpha-1, 2.3.0, 2.1.7, 2.2.2
>
>
> Now if one RS crashed, the regions will try to use the old location for the 
> region deploy. But one RS only have 3 threads to open region by default. If a 
> RS have hundreds of regions, the failover is very slower. Assign to same RS 
> may have good locality if the Datanode is deploied on same host. But slower 
> failover make the availability worse. And the locality is not big deal when 
> deploy HBase on cloud.
> This was introduced by HBASE-18946.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-23035) Retain region to the last RegionServer make the failover slower

2020-08-11 Thread Guanghao Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-23035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17175267#comment-17175267
 ] 

Guanghao Zhang commented on HBASE-23035:


{quote}hi, but some hbase cluster is not on the cloud, 
{quote}
Yes. Here's core problem is the slow failover if you assign all regions to one 
regionserver.

> Retain region to the last RegionServer make the failover slower
> ---
>
> Key: HBASE-23035
> URL: https://issues.apache.org/jira/browse/HBASE-23035
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 3.0.0-alpha-1, 2.3.0, 2.2.1, 2.1.6
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
>Priority: Major
> Fix For: 3.0.0-alpha-1, 2.3.0, 2.1.7, 2.2.2
>
>
> Now if one RS crashed, the regions will try to use the old location for the 
> region deploy. But one RS only have 3 threads to open region by default. If a 
> RS have hundreds of regions, the failover is very slower. Assign to same RS 
> may have good locality if the Datanode is deploied on same host. But slower 
> failover make the availability worse. And the locality is not big deal when 
> deploy HBase on cloud.
> This was introduced by HBASE-18946.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-23035) Retain region to the last RegionServer make the failover slower

2020-08-10 Thread Bo Cui (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-23035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17174189#comment-17174189
 ] 

Bo Cui commented on HBASE-23035:


{quote}And the locality is not big deal when deploy HBase on cloud.
{quote}
[~zghao]

hi, but some hbase cluster is not on the cloud, 

> Retain region to the last RegionServer make the failover slower
> ---
>
> Key: HBASE-23035
> URL: https://issues.apache.org/jira/browse/HBASE-23035
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 3.0.0-alpha-1, 2.3.0, 2.2.1, 2.1.6
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
>Priority: Major
> Fix For: 3.0.0-alpha-1, 2.3.0, 2.1.7, 2.2.2
>
>
> Now if one RS crashed, the regions will try to use the old location for the 
> region deploy. But one RS only have 3 threads to open region by default. If a 
> RS have hundreds of regions, the failover is very slower. Assign to same RS 
> may have good locality if the Datanode is deploied on same host. But slower 
> failover make the availability worse. And the locality is not big deal when 
> deploy HBase on cloud.
> This was introduced by HBASE-18946.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-23035) Retain region to the last RegionServer make the failover slower

2019-09-29 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-23035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16940573#comment-16940573
 ] 

Hudson commented on HBASE-23035:


Results for branch master
[build #1487 on 
builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/master/1487/]: (x) 
*{color:red}-1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://builds.apache.org/job/HBase%20Nightly/job/master/1487//General_Nightly_Build_Report/]




(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://builds.apache.org/job/HBase%20Nightly/job/master/1487//JDK8_Nightly_Build_Report_(Hadoop2)/]


(x) {color:red}-1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://builds.apache.org/job/HBase%20Nightly/job/master/1487//JDK8_Nightly_Build_Report_(Hadoop3)/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


(/) {color:green}+1 client integration test{color}


> Retain region to the last RegionServer make the failover slower
> ---
>
> Key: HBASE-23035
> URL: https://issues.apache.org/jira/browse/HBASE-23035
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 3.0.0, 2.3.0, 2.2.1, 2.1.6
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
>Priority: Major
> Fix For: 3.0.0, 2.3.0, 2.1.7, 2.2.2
>
>
> Now if one RS crashed, the regions will try to use the old location for the 
> region deploy. But one RS only have 3 threads to open region by default. If a 
> RS have hundreds of regions, the failover is very slower. Assign to same RS 
> may have good locality if the Datanode is deploied on same host. But slower 
> failover make the availability worse. And the locality is not big deal when 
> deploy HBase on cloud.
> This was introduced by HBASE-18946.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-23035) Retain region to the last RegionServer make the failover slower

2019-09-28 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-23035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16940194#comment-16940194
 ] 

Hudson commented on HBASE-23035:


Results for branch branch-2.2
[build #644 on 
builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.2/644/]: 
(x) *{color:red}-1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.2/644//General_Nightly_Build_Report/]




(/) {color:green}+1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.2/644//JDK8_Nightly_Build_Report_(Hadoop2)/]


(x) {color:red}-1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.2/644//JDK8_Nightly_Build_Report_(Hadoop3)/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


(/) {color:green}+1 client integration test{color}


> Retain region to the last RegionServer make the failover slower
> ---
>
> Key: HBASE-23035
> URL: https://issues.apache.org/jira/browse/HBASE-23035
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 3.0.0, 2.3.0, 2.2.1, 2.1.6
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
>Priority: Major
> Fix For: 3.0.0, 2.3.0, 2.1.7, 2.2.2
>
>
> Now if one RS crashed, the regions will try to use the old location for the 
> region deploy. But one RS only have 3 threads to open region by default. If a 
> RS have hundreds of regions, the failover is very slower. Assign to same RS 
> may have good locality if the Datanode is deploied on same host. But slower 
> failover make the availability worse. And the locality is not big deal when 
> deploy HBase on cloud.
> This was introduced by HBASE-18946.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-23035) Retain region to the last RegionServer make the failover slower

2019-09-28 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-23035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16940178#comment-16940178
 ] 

Hudson commented on HBASE-23035:


Results for branch branch-2
[build #2305 on 
builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/2305/]: 
(x) *{color:red}-1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/2305//General_Nightly_Build_Report/]




(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/2305//JDK8_Nightly_Build_Report_(Hadoop2)/]


(x) {color:red}-1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/2305//JDK8_Nightly_Build_Report_(Hadoop3)/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


(/) {color:green}+1 client integration test{color}


> Retain region to the last RegionServer make the failover slower
> ---
>
> Key: HBASE-23035
> URL: https://issues.apache.org/jira/browse/HBASE-23035
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 3.0.0, 2.3.0, 2.2.1, 2.1.6
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
>Priority: Major
> Fix For: 3.0.0, 2.3.0, 2.1.7, 2.2.2
>
>
> Now if one RS crashed, the regions will try to use the old location for the 
> region deploy. But one RS only have 3 threads to open region by default. If a 
> RS have hundreds of regions, the failover is very slower. Assign to same RS 
> may have good locality if the Datanode is deploied on same host. But slower 
> failover make the availability worse. And the locality is not big deal when 
> deploy HBase on cloud.
> This was introduced by HBASE-18946.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-23035) Retain region to the last RegionServer make the failover slower

2019-09-28 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-23035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16940117#comment-16940117
 ] 

Hudson commented on HBASE-23035:


Results for branch branch-2.1
[build #1648 on 
builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/1648/]: 
(/) *{color:green}+1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/1648//General_Nightly_Build_Report/]




(/) {color:green}+1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/1648//JDK8_Nightly_Build_Report_(Hadoop2)/]


(/) {color:green}+1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/1648//JDK8_Nightly_Build_Report_(Hadoop3)/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


(/) {color:green}+1 client integration test{color}


> Retain region to the last RegionServer make the failover slower
> ---
>
> Key: HBASE-23035
> URL: https://issues.apache.org/jira/browse/HBASE-23035
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 3.0.0, 2.3.0, 2.2.1, 2.1.6
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
>Priority: Major
> Fix For: 3.0.0, 2.3.0, 2.2.2
>
>
> Now if one RS crashed, the regions will try to use the old location for the 
> region deploy. But one RS only have 3 threads to open region by default. If a 
> RS have hundreds of regions, the failover is very slower. Assign to same RS 
> may have good locality if the Datanode is deploied on same host. But slower 
> failover make the availability worse. And the locality is not big deal when 
> deploy HBase on cloud.
> This was introduced by HBASE-18946.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-23035) Retain region to the last RegionServer make the failover slower

2019-09-25 Thread Guanghao Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-23035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16938247#comment-16938247
 ] 

Guanghao Zhang commented on HBASE-23035:


Ping [~ram_krish] for reviewing [https://github.com/apache/hbase/pull/652]

> Retain region to the last RegionServer make the failover slower
> ---
>
> Key: HBASE-23035
> URL: https://issues.apache.org/jira/browse/HBASE-23035
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 3.0.0, 2.3.0, 2.2.1, 2.1.6
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
>Priority: Major
> Fix For: 3.0.0, 2.3.0, 2.2.2
>
>
> Now if one RS crashed, the regions will try to use the old location for the 
> region deploy. But one RS only have 3 threads to open region by default. If a 
> RS have hundreds of regions, the failover is very slower. Assign to same RS 
> may have good locality if the Datanode is deploied on same host. But slower 
> failover make the availability worse. And the locality is not big deal when 
> deploy HBase on cloud.
> This was introduced by HBASE-18946.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-23035) Retain region to the last RegionServer make the failover slower

2019-09-25 Thread Guanghao Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-23035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16938244#comment-16938244
 ] 

Guanghao Zhang commented on HBASE-23035:


Open a new issue HBASE-23078 for LoadBalancer.

> Retain region to the last RegionServer make the failover slower
> ---
>
> Key: HBASE-23035
> URL: https://issues.apache.org/jira/browse/HBASE-23035
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 3.0.0, 2.3.0, 2.2.1, 2.1.6
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
>Priority: Major
> Fix For: 3.0.0, 2.3.0, 2.2.2
>
>
> Now if one RS crashed, the regions will try to use the old location for the 
> region deploy. But one RS only have 3 threads to open region by default. If a 
> RS have hundreds of regions, the failover is very slower. Assign to same RS 
> may have good locality if the Datanode is deploied on same host. But slower 
> failover make the availability worse. And the locality is not big deal when 
> deploy HBase on cloud.
> This was introduced by HBASE-18946.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-23035) Retain region to the last RegionServer make the failover slower

2019-09-25 Thread Guanghao Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-23035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16938176#comment-16938176
 ] 

Guanghao Zhang commented on HBASE-23035:


There are two problems about the LoadBalancer.

 

1. The cluster means the cluster state of the whole cluster. But 
hasRegionReplica is false, so it only create clusterstate by the regions which 
need to assign, not the whole cluster...
{code:java}
Cluster cluster = createCluster(servers, regions, false);
List unassignedRegions = new ArrayList<>();
roundRobinAssignment(cluster, regions, unassignedRegions,
  servers, assignments);


  protected Cluster createCluster(List servers, 
Collection regions,
  boolean hasRegionReplica) {
// Get the snapshot of the current assignments for the regions in question, 
and then create
// a cluster out of it. Note that we might have replicas already assigned 
to some servers
// earlier. So we want to get the snapshot to see those assignments, but 
this will only contain
// replicas of the regions that are passed (for performance).
Map> clusterState = null;
if (!hasRegionReplica) {
  clusterState = getRegionAssignmentsByServer(regions);
} else {
  // for the case where we have region replica it is better we get the 
entire cluster's snapshot
  clusterState = getRegionAssignmentsByServer(null);
}for (ServerName server : servers) {
  if (!clusterState.containsKey(server)) {
clusterState.put(server, EMPTY_REGION_LIST);
  }
}
return new Cluster(regions, clusterState, null, this.regionFinder,
rackManager);
  }
{code}
2. wouldLowerAvailability method only consider the primary regions. The replica 
region can't assign to same server with primary region. But can be assigned to 
same server with other replica regions.

> Retain region to the last RegionServer make the failover slower
> ---
>
> Key: HBASE-23035
> URL: https://issues.apache.org/jira/browse/HBASE-23035
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 3.0.0, 2.3.0, 2.2.1, 2.1.6
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
>Priority: Major
> Fix For: 3.0.0, 2.3.0, 2.2.2
>
>
> Now if one RS crashed, the regions will try to use the old location for the 
> region deploy. But one RS only have 3 threads to open region by default. If a 
> RS have hundreds of regions, the failover is very slower. Assign to same RS 
> may have good locality if the Datanode is deploied on same host. But slower 
> failover make the availability worse. And the locality is not big deal when 
> deploy HBase on cloud.
> This was introduced by HBASE-18946.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-23035) Retain region to the last RegionServer make the failover slower

2019-09-24 Thread Guanghao Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-23035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16936684#comment-16936684
 ] 

Guanghao Zhang commented on HBASE-23035:


There may have bug in StochasticLoadBalancer. I tried to use it to balance 
cluster. But it cannot assign replica to different servers... But anyway this 
should be another issue. Let me dig more.

> Retain region to the last RegionServer make the failover slower
> ---
>
> Key: HBASE-23035
> URL: https://issues.apache.org/jira/browse/HBASE-23035
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 3.0.0, 2.3.0, 2.2.1, 2.1.6
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
>Priority: Major
> Fix For: 3.0.0, 2.3.0, 2.2.2
>
>
> Now if one RS crashed, the regions will try to use the old location for the 
> region deploy. But one RS only have 3 threads to open region by default. If a 
> RS have hundreds of regions, the failover is very slower. Assign to same RS 
> may have good locality if the Datanode is deploied on same host. But slower 
> failover make the availability worse. And the locality is not big deal when 
> deploy HBase on cloud.
> This was introduced by HBASE-18946.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-23035) Retain region to the last RegionServer make the failover slower

2019-09-24 Thread Guanghao Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-23035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16936501#comment-16936501
 ] 

Guanghao Zhang commented on HBASE-23035:


{quote}bq. So on restart you just want to leave the target location as null and 
allow the LB to take care of the location - right?
{quote}
Yes. The job should done by LB.

> Retain region to the last RegionServer make the failover slower
> ---
>
> Key: HBASE-23035
> URL: https://issues.apache.org/jira/browse/HBASE-23035
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 3.0.0, 2.3.0, 2.2.1, 2.1.6
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
>Priority: Major
> Fix For: 3.0.0, 2.3.0, 2.2.2
>
>
> Now if one RS crashed, the regions will try to use the old location for the 
> region deploy. But one RS only have 3 threads to open region by default. If a 
> RS have hundreds of regions, the failover is very slower. Assign to same RS 
> may have good locality if the Datanode is deploied on same host. But slower 
> failover make the availability worse. And the locality is not big deal when 
> deploy HBase on cloud.
> This was introduced by HBASE-18946.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-23035) Retain region to the last RegionServer make the failover slower

2019-09-23 Thread ramkrishna.s.vasudevan (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-23035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16936390#comment-16936390
 ] 

ramkrishna.s.vasudevan commented on HBASE-23035:


So on restart you just want to leave the target location as null and allow the 
LB to take care of the location - right?

> Retain region to the last RegionServer make the failover slower
> ---
>
> Key: HBASE-23035
> URL: https://issues.apache.org/jira/browse/HBASE-23035
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 3.0.0, 2.3.0, 2.2.1, 2.1.6
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
>Priority: Major
> Fix For: 3.0.0, 2.3.0, 2.2.2
>
>
> Now if one RS crashed, the regions will try to use the old location for the 
> region deploy. But one RS only have 3 threads to open region by default. If a 
> RS have hundreds of regions, the failover is very slower. Assign to same RS 
> may have good locality if the Datanode is deploied on same host. But slower 
> failover make the availability worse. And the locality is not big deal when 
> deploy HBase on cloud.
> This was introduced by HBASE-18946.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-23035) Retain region to the last RegionServer make the failover slower

2019-09-23 Thread stack (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-23035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16936247#comment-16936247
 ] 

stack commented on HBASE-23035:
---

[~zghao] HBASE-18946 is long time ago. Sorry if we messed it up. Thanks for 
finding hole in it.

> Retain region to the last RegionServer make the failover slower
> ---
>
> Key: HBASE-23035
> URL: https://issues.apache.org/jira/browse/HBASE-23035
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 3.0.0, 2.3.0, 2.2.1, 2.1.6
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
>Priority: Major
> Fix For: 3.0.0, 2.3.0, 2.2.2
>
>
> Now if one RS crashed, the regions will try to use the old location for the 
> region deploy. But one RS only have 3 threads to open region by default. If a 
> RS have hundreds of regions, the failover is very slower. Assign to same RS 
> may have good locality if the Datanode is deploied on same host. But slower 
> failover make the availability worse. And the locality is not big deal when 
> deploy HBase on cloud.
> This was introduced by HBASE-18946.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-23035) Retain region to the last RegionServer make the failover slower

2019-09-23 Thread Duo Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-23035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16935654#comment-16935654
 ] 

Duo Zhang commented on HBASE-23035:
---

I think this should be done in balancer.

> Retain region to the last RegionServer make the failover slower
> ---
>
> Key: HBASE-23035
> URL: https://issues.apache.org/jira/browse/HBASE-23035
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 3.0.0, 2.3.0, 2.2.1, 2.1.6
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
>Priority: Major
> Fix For: 3.0.0, 2.3.0, 2.2.2
>
>
> Now if one RS crashed, the regions will try to use the old location for the 
> region deploy. But one RS only have 3 threads to open region by default. If a 
> RS have hundreds of regions, the failover is very slower. Assign to same RS 
> may have good locality if the Datanode is deploied on same host. But slower 
> failover make the availability worse. And the locality is not big deal when 
> deploy HBase on cloud.
> This was introduced by HBASE-18946.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-23035) Retain region to the last RegionServer make the failover slower

2019-09-23 Thread Guanghao Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-23035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16935638#comment-16935638
 ] 

Guanghao Zhang commented on HBASE-23035:


Ping [~ram_krish] [~anoop.hbase] [~stack] [~zhangduo] Any ideas?

> Retain region to the last RegionServer make the failover slower
> ---
>
> Key: HBASE-23035
> URL: https://issues.apache.org/jira/browse/HBASE-23035
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 3.0.0, 2.3.0, 2.2.1, 2.1.6
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
>Priority: Major
> Fix For: 3.0.0, 2.3.0, 2.2.2
>
>
> Now if one RS crashed, the regions will try to use the old location for the 
> region deploy. But one RS only have 3 threads to open region by default. If a 
> RS have hundreds of regions, the failover is very slower. Assign to same RS 
> may have good locality if the Datanode is deploied on same host. But slower 
> failover make the availability worse. And the locality is not big deal when 
> deploy HBase on cloud.
> This was introduced by HBASE-18946.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-23035) Retain region to the last RegionServer make the failover slower

2019-09-23 Thread Guanghao Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-23035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16935637#comment-16935637
 ] 

Guanghao Zhang commented on HBASE-23035:


Read HBASE-18946 again. So the initial problem is "Region replicas should be 
assigned to different servers". But the fix looks not good in HBASE-18946. It 
tried to round-robin assign when create table with region replica. And retain 
the old location for region replica when failover. But load balancer still have 
change to break this and assign region replica to same server? I thought the 
initial problem should be fixed by load balancer. And the failover no need 
retain the old deployment.

 

 

> Retain region to the last RegionServer make the failover slower
> ---
>
> Key: HBASE-23035
> URL: https://issues.apache.org/jira/browse/HBASE-23035
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 3.0.0, 2.3.0, 2.2.1, 2.1.6
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
>Priority: Major
> Fix For: 3.0.0, 2.3.0, 2.2.2
>
>
> Now if one RS crashed, the regions will try to use the old location for the 
> region deploy. But one RS only have 3 threads to open region by default. If a 
> RS have hundreds of regions, the failover is very slower. Assign to same RS 
> may have good locality if the Datanode is deploied on same host. But slower 
> failover make the availability worse. And the locality is not big deal when 
> deploy HBase on cloud.
> This was introduced by HBASE-18946.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-23035) Retain region to the last RegionServer make the failover slower

2019-09-22 Thread Guanghao Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-23035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16935539#comment-16935539
 ] 

Guanghao Zhang commented on HBASE-23035:


Open a PR#652 for addendum patch.

> Retain region to the last RegionServer make the failover slower
> ---
>
> Key: HBASE-23035
> URL: https://issues.apache.org/jira/browse/HBASE-23035
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 3.0.0, 2.3.0, 2.2.1, 2.1.6
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
>Priority: Major
> Fix For: 3.0.0, 2.3.0, 2.2.2
>
>
> Now if one RS crashed, the regions will try to use the old location for the 
> region deploy. But one RS only have 3 threads to open region by default. If a 
> RS have hundreds of regions, the failover is very slower. Assign to same RS 
> may have good locality if the Datanode is deploied on same host. But slower 
> failover make the availability worse. And the locality is not big deal when 
> deploy HBase on cloud.
> This was introduced by HBASE-18946.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-23035) Retain region to the last RegionServer make the failover slower

2019-09-22 Thread Guanghao Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-23035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16935500#comment-16935500
 ] 

Guanghao Zhang commented on HBASE-23035:


I found there are unit test called TestRetainAssignmentOnRestart. But not 
failed on branch-2.2+. Let me dig more...

> Retain region to the last RegionServer make the failover slower
> ---
>
> Key: HBASE-23035
> URL: https://issues.apache.org/jira/browse/HBASE-23035
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 3.0.0, 2.3.0, 2.2.1, 2.1.6
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
>Priority: Major
> Fix For: 3.0.0, 2.3.0, 2.2.2
>
>
> Now if one RS crashed, the regions will try to use the old location for the 
> region deploy. But one RS only have 3 threads to open region by default. If a 
> RS have hundreds of regions, the failover is very slower. Assign to same RS 
> may have good locality if the Datanode is deploied on same host. But slower 
> failover make the availability worse. And the locality is not big deal when 
> deploy HBase on cloud.
> This was introduced by HBASE-18946.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-23035) Retain region to the last RegionServer make the failover slower

2019-09-22 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-23035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16935374#comment-16935374
 ] 

Hudson commented on HBASE-23035:


Results for branch branch-2
[build #2295 on 
builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/2295/]: 
(x) *{color:red}-1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/2295//General_Nightly_Build_Report/]




(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/2295//JDK8_Nightly_Build_Report_(Hadoop2)/]


(x) {color:red}-1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/2295//JDK8_Nightly_Build_Report_(Hadoop3)/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


(/) {color:green}+1 client integration test{color}


> Retain region to the last RegionServer make the failover slower
> ---
>
> Key: HBASE-23035
> URL: https://issues.apache.org/jira/browse/HBASE-23035
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 3.0.0, 2.3.0, 2.2.1, 2.1.6
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
>Priority: Major
> Fix For: 3.0.0, 2.3.0, 2.2.2
>
>
> Now if one RS crashed, the regions will try to use the old location for the 
> region deploy. But one RS only have 3 threads to open region by default. If a 
> RS have hundreds of regions, the failover is very slower. Assign to same RS 
> may have good locality if the Datanode is deploied on same host. But slower 
> failover make the availability worse. And the locality is not big deal when 
> deploy HBase on cloud.
> This was introduced by HBASE-18946.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-23035) Retain region to the last RegionServer make the failover slower

2019-09-22 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-23035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16935335#comment-16935335
 ] 

Hudson commented on HBASE-23035:


Results for branch branch-2.2
[build #634 on 
builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.2/634/]: 
(x) *{color:red}-1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.2/634//General_Nightly_Build_Report/]




(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.2/634//JDK8_Nightly_Build_Report_(Hadoop2)/]


(/) {color:green}+1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.2/634//JDK8_Nightly_Build_Report_(Hadoop3)/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


(/) {color:green}+1 client integration test{color}


> Retain region to the last RegionServer make the failover slower
> ---
>
> Key: HBASE-23035
> URL: https://issues.apache.org/jira/browse/HBASE-23035
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 3.0.0, 2.3.0, 2.2.1, 2.1.6
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
>Priority: Major
> Fix For: 3.0.0, 2.3.0, 2.2.2
>
>
> Now if one RS crashed, the regions will try to use the old location for the 
> region deploy. But one RS only have 3 threads to open region by default. If a 
> RS have hundreds of regions, the failover is very slower. Assign to same RS 
> may have good locality if the Datanode is deploied on same host. But slower 
> failover make the availability worse. And the locality is not big deal when 
> deploy HBase on cloud.
> This was introduced by HBASE-18946.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-23035) Retain region to the last RegionServer make the failover slower

2019-09-22 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-23035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16935311#comment-16935311
 ] 

Hudson commented on HBASE-23035:


Results for branch master
[build #1474 on 
builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/master/1474/]: (x) 
*{color:red}-1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://builds.apache.org/job/HBase%20Nightly/job/master/1474//General_Nightly_Build_Report/]




(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://builds.apache.org/job/HBase%20Nightly/job/master/1474//JDK8_Nightly_Build_Report_(Hadoop2)/]


(x) {color:red}-1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://builds.apache.org/job/HBase%20Nightly/job/master/1474//JDK8_Nightly_Build_Report_(Hadoop3)/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


(/) {color:green}+1 client integration test{color}


> Retain region to the last RegionServer make the failover slower
> ---
>
> Key: HBASE-23035
> URL: https://issues.apache.org/jira/browse/HBASE-23035
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 3.0.0, 2.3.0, 2.2.1, 2.1.6
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
>Priority: Major
> Fix For: 3.0.0, 2.3.0, 2.2.2
>
>
> Now if one RS crashed, the regions will try to use the old location for the 
> region deploy. But one RS only have 3 threads to open region by default. If a 
> RS have hundreds of regions, the failover is very slower. Assign to same RS 
> may have good locality if the Datanode is deploied on same host. But slower 
> failover make the availability worse. And the locality is not big deal when 
> deploy HBase on cloud.
> This was introduced by HBASE-18946.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-23035) Retain region to the last RegionServer make the failover slower

2019-09-22 Thread Guanghao Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-23035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16935291#comment-16935291
 ] 

Guanghao Zhang commented on HBASE-23035:


Create a new PR#650 for  branch-2.1.

> Retain region to the last RegionServer make the failover slower
> ---
>
> Key: HBASE-23035
> URL: https://issues.apache.org/jira/browse/HBASE-23035
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 3.0.0, 2.3.0, 2.2.1, 2.1.6
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
>Priority: Major
> Fix For: 3.0.0, 2.3.0, 2.2.2
>
>
> Now if one RS crashed, the regions will try to use the old location for the 
> region deploy. But one RS only have 3 threads to open region by default. If a 
> RS have hundreds of regions, the failover is very slower. Assign to same RS 
> may have good locality if the Datanode is deploied on same host. But slower 
> failover make the availability worse. And the locality is not big deal when 
> deploy HBase on cloud.
> This was introduced by HBASE-18946.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-23035) Retain region to the last RegionServer make the failover slower

2019-09-21 Thread Guanghao Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-23035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16935042#comment-16935042
 ] 

Guanghao Zhang commented on HBASE-23035:


Pushed to branch-2.2+.

> Retain region to the last RegionServer make the failover slower
> ---
>
> Key: HBASE-23035
> URL: https://issues.apache.org/jira/browse/HBASE-23035
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 3.0.0, 2.3.0, 2.2.1, 2.1.6
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
>Priority: Major
> Fix For: 3.0.0, 2.3.0, 2.2.2
>
>
> Now if one RS crashed, the regions will try to use the old location for the 
> region deploy. But one RS only have 3 threads to open region by default. If a 
> RS have hundreds of regions, the failover is very slower. Assign to same RS 
> may have good locality if the Datanode is deploied on same host. But slower 
> failover make the availability worse. And the locality is not big deal when 
> deploy HBase on cloud.
> This was introduced by HBASE-18946.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-23035) Retain region to the last RegionServer make the failover slower

2019-09-17 Thread Guanghao Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-23035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16931235#comment-16931235
 ] 

Guanghao Zhang commented on HBASE-23035:


bq. We were always doing a round robin method in case of SCP right? I mean for 
non region replica cases?
After HBASE-18946, it will retain the regions to the old RS even no region 
replica.

> Retain region to the last RegionServer make the failover slower
> ---
>
> Key: HBASE-23035
> URL: https://issues.apache.org/jira/browse/HBASE-23035
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 3.0.0, 2.3.0, 2.2.1, 2.1.6
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
>Priority: Major
>
> Now if one RS crashed, the regions will try to use the old location for the 
> region deploy. But one RS only have 3 threads to open region by default. If a 
> RS have hundreds of regions, the failover is very slower. Assign to same RS 
> may have good locality if the Datanode is deploied on same host. But slower 
> failover make the availability worse. And the locality is not big deal when 
> deploy HBase on cloud.
> This was introduced by HBASE-18946.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Commented] (HBASE-23035) Retain region to the last RegionServer make the failover slower

2019-09-17 Thread ramkrishna.s.vasudevan (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-23035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16931193#comment-16931193
 ] 

ramkrishna.s.vasudevan commented on HBASE-23035:


We were always doing a round robin method in case of SCP right? I mean for non 
region replica cases? 

> Retain region to the last RegionServer make the failover slower
> ---
>
> Key: HBASE-23035
> URL: https://issues.apache.org/jira/browse/HBASE-23035
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 3.0.0, 2.3.0, 2.2.1, 2.1.6
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
>Priority: Major
>
> Now if one RS crashed, the regions will try to use the old location for the 
> region deploy. But one RS only have 3 threads to open region by default. If a 
> RS have hundreds of regions, the failover is very slower. Assign to same RS 
> may have good locality if the Datanode is deploied on same host. But slower 
> failover make the availability worse. And the locality is not big deal when 
> deploy HBase on cloud.
> This was introduced by HBASE-18946.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Commented] (HBASE-23035) Retain region to the last RegionServer make the failover slower

2019-09-17 Thread Guanghao Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-23035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16931188#comment-16931188
 ] 

Guanghao Zhang commented on HBASE-23035:


bq. So the server which went down and immediate came back will have lesser 
regions comparably right? 
Yes.

bq. So how/whether that will impact the cluster later?
The cluster will balance again by running balancer.

> Retain region to the last RegionServer make the failover slower
> ---
>
> Key: HBASE-23035
> URL: https://issues.apache.org/jira/browse/HBASE-23035
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 3.0.0, 2.3.0, 2.2.1, 2.1.6
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
>Priority: Major
>
> Now if one RS crashed, the regions will try to use the old location for the 
> region deploy. But one RS only have 3 threads to open region by default. If a 
> RS have hundreds of regions, the failover is very slower. Assign to same RS 
> may have good locality if the Datanode is deploied on same host. But slower 
> failover make the availability worse. And the locality is not big deal when 
> deploy HBase on cloud.
> This was introduced by HBASE-18946.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Commented] (HBASE-23035) Retain region to the last RegionServer make the failover slower

2019-09-17 Thread Anoop Sam John (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-23035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16931181#comment-16931181
 ] 

Anoop Sam John commented on HBASE-23035:


So the server which went down and immediate came back will have lesser regions 
comparably right? In effect some of its old regions got moved to other live 
RSs.  So how/whether that will impact the cluster later? What will happen with 
the later balance happening?

> Retain region to the last RegionServer make the failover slower
> ---
>
> Key: HBASE-23035
> URL: https://issues.apache.org/jira/browse/HBASE-23035
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 3.0.0, 2.3.0, 2.2.1, 2.1.6
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
>Priority: Major
>
> Now if one RS crashed, the regions will try to use the old location for the 
> region deploy. But one RS only have 3 threads to open region by default. If a 
> RS have hundreds of regions, the failover is very slower. Assign to same RS 
> may have good locality if the Datanode is deploied on same host. But slower 
> failover make the availability worse. And the locality is not big deal when 
> deploy HBase on cloud.
> This was introduced by HBASE-18946.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Commented] (HBASE-23035) Retain region to the last RegionServer make the failover slower

2019-09-16 Thread Guanghao Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-23035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16931047#comment-16931047
 ] 

Guanghao Zhang commented on HBASE-23035:


bq. You mean the RS is back online immediately? 
Yes

bq. the default behavior in the old time, is to assign regions to all the live 
servers to make them online soon...
Yes. This issue plan to change to the old behavior.

> Retain region to the last RegionServer make the failover slower
> ---
>
> Key: HBASE-23035
> URL: https://issues.apache.org/jira/browse/HBASE-23035
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 3.0.0, 2.3.0, 2.2.1, 2.1.6
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
>Priority: Major
>
> Now if one RS crashed, the regions will try to use the old location for the 
> region deploy. But one RS only have 3 threads to open region by default. If a 
> RS have hundreds of regions, the failover is very slower. Assign to same RS 
> may have good locality if the Datanode is deploied on same host. But slower 
> failover make the availability worse. And the locality is not big deal when 
> deploy HBase on cloud.
> This was introduced by HBASE-18946.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Commented] (HBASE-23035) Retain region to the last RegionServer make the failover slower

2019-09-16 Thread Duo Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-23035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16931025#comment-16931025
 ] 

Duo Zhang commented on HBASE-23035:
---

You mean the RS is back online immediately? If the RS is dead we will not 
assign regions to it...

And IIRC, the default behavior in the old time, is to assign regions to all the 
live servers to make them online soon... 

> Retain region to the last RegionServer make the failover slower
> ---
>
> Key: HBASE-23035
> URL: https://issues.apache.org/jira/browse/HBASE-23035
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 3.0.0, 2.3.0, 2.2.1, 2.1.6
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
>Priority: Major
>
> Now if one RS crashed, the regions will try to use the old location for the 
> region deploy. But one RS only have 3 threads to open region by default. If a 
> RS have hundreds of regions, the failover is very slower. Assign to same RS 
> may have good locality if the Datanode is deploied on same host. But slower 
> failover make the availability worse. And the locality is not big deal when 
> deploy HBase on cloud.
> This was introduced by HBASE-18946.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)