[jira] [Commented] (HBASE-24152) Add ServerSsdLocalityCostFunction to StochasticLoadBalancer

2020-05-17 Thread Zheng Wang (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-24152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17109784#comment-17109784
 ] 

Zheng Wang commented on HBASE-24152:


{quote}Why can't the SSD aspect be factored into the general server weight?
For example, why can't a replica on SSD be considered 'more' local than a
replica on HDD?
{quote}
I think locality should stay the actual ratio; otherwise users will be
confused by it.

Anyway, I will come back if I find a better way.

Thanks a lot.

> Add ServerSsdLocalityCostFunction to StochasticLoadBalancer
> ---
>
> Key: HBASE-24152
> URL: https://issues.apache.org/jira/browse/HBASE-24152
> Project: HBase
>  Issue Type: New Feature
>  Components: Balancer
>Reporter: Zheng Wang
>Assignee: Zheng Wang
>Priority: Major
>
> When using the ONE_SSD storage policy, or ALL_SSD without enough SSD, some
> HDFS blocks will be on DISK and others on SSD, so it is reasonable for
> StochasticLoadBalancer to consider SSD locality.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-24152) Add ServerSsdLocalityCostFunction to StochasticLoadBalancer

2020-05-12 Thread Michael Stack (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-24152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17105944#comment-17105944
 ] 

Michael Stack commented on HBASE-24152:
---

Pardon me [~filtertip]. I think this is a mistaken direction. The balancer is
complicated enough without extra factors, especially when the cluster is large
with many regions and we are making stochastic calculations.

The balancer factors in macro attributes such as rack locality and then replica
locality. Teaching the balancer to factor in a new attribute, ssd-ism, which is a
sub-attribute of regionservers, seems like the wrong direction. Why can't the
SSD aspect be factored into the general server weight? For example, why can't a
replica on SSD be considered 'more' local than a replica on HDD?
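
The alternative of treating a local SSD replica as 'more' local than a local HDD replica could be folded into one score roughly as follows. This is a minimal sketch, not HBase code: `WeightedLocalitySketch`, `LocalBlock` and the two weights are hypothetical names and values chosen only for illustration.

```java
import java.util.List;

/**
 * Hypothetical sketch: fold storage type into a single locality score.
 * A local replica on SSD counts fully, a local replica on HDD counts only partly.
 * The weights are illustrative assumptions, not HBase settings.
 */
final class WeightedLocalitySketch {
  static final double SSD_WEIGHT = 1.0; // a local SSD replica counts fully
  static final double HDD_WEIGHT = 0.5; // assumed discount for a local HDD replica

  /** One HDFS block of a region, as seen from a single candidate host. */
  static final class LocalBlock {
    final long bytes;
    final boolean hasLocalReplica; // the host stores a replica of this block
    final boolean localOnSsd;      // that local replica is on SSD

    LocalBlock(long bytes, boolean hasLocalReplica, boolean localOnSsd) {
      this.bytes = bytes;
      this.hasLocalReplica = hasLocalReplica;
      this.localOnSsd = localOnSsd;
    }
  }

  /** Byte-weighted score in [0, 1]; no longer a plain "fraction of local bytes". */
  static double score(List<LocalBlock> blocks) {
    long total = 0;
    double weighted = 0.0;
    for (LocalBlock b : blocks) {
      total += b.bytes;
      if (b.hasLocalReplica) {
        weighted += b.bytes * (b.localOnSsd ? SSD_WEIGHT : HDD_WEIGHT);
      }
    }
    return total == 0 ? 0.0 : weighted / total;
  }
}
```

With these assumed weights, a region whose blocks are all local but on HDD scores 0.5, so the value stops being the plain fraction of local bytes. That is the concern Zheng Wang raises in his reply about locality needing to stay the actual ratio.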



[jira] [Commented] (HBASE-24152) Add ServerSsdLocalityCostFunction to StochasticLoadBalancer

2020-05-12 Thread Zheng Wang (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-24152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17105888#comment-17105888
 ] 

Zheng Wang commented on HBASE-24152:


{quote}Thanks. So, one replica goes to SSD and you want the RS w/ that replica 
to be favored over the others?
{quote}
Yeah.
{quote}Please explain why we need to carry around a dedicated weight just for 
this ONE_SSD placement policy; why we can't just use the existing measure?
{quote}
To be exact, the newly added weight is for the SSD storage type; both the
ONE_SSD and ALL_SSD storage policies relate to it.
The existing weight does not consider storage type, so it cannot achieve the
goal you mentioned above.
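
To make the difference concrete, here is a simplified model contrasting the two measures. It is an illustrative assumption rather than the PR's implementation: the existing-style measure counts bytes with any replica on the host, while the SSD measure counts only bytes whose local replica is on SSD. `SsdLocalitySketch`, `Block` and the `Storage` enum are hypothetical names.

```java
import java.util.List;
import java.util.Map;

/**
 * Simplified model (an assumption, not HBase code): existing-style locality counts
 * any replica on the host, SSD locality counts only replicas stored on SSD.
 */
final class SsdLocalitySketch {
  enum Storage { DISK, SSD }

  /** One HDFS block: its size and, for each host holding a replica, that replica's storage type. */
  static final class Block {
    final long bytes;
    final Map<String, Storage> replicas;

    Block(long bytes, Map<String, Storage> replicas) {
      this.bytes = bytes;
      this.replicas = replicas;
    }
  }

  /** Fraction of bytes with a replica on host, whatever the storage type. */
  static double locality(List<Block> blocks, String host) {
    return fraction(blocks, host, false);
  }

  /** Fraction of bytes with a replica on host that sits on SSD. */
  static double ssdLocality(List<Block> blocks, String host) {
    return fraction(blocks, host, true);
  }

  private static double fraction(List<Block> blocks, String host, boolean ssdOnly) {
    long total = 0;
    long matched = 0;
    for (Block b : blocks) {
      total += b.bytes;
      Storage s = b.replicas.get(host);
      if (s != null && (!ssdOnly || s == Storage.SSD)) {
        matched += b.bytes;
      }
    }
    return total == 0 ? 0.0 : (double) matched / total;
  }
}
```

For the ONE_SSD example in this thread (one block with replicas on host-1 DISK, host-2 DISK and host-3 SSD), `locality` is 1.0 on every host, so it cannot tell host-3 apart, while `ssdLocality` is 1.0 only on host-3. For ALL_SSD with too little SSD, a host can have full locality but only partial SSD locality.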



[jira] [Commented] (HBASE-24152) Add ServerSsdLocalityCostFunction to StochasticLoadBalancer

2020-05-11 Thread Zheng Wang (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-24152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17104667#comment-17104667
 ] 

Zheng Wang commented on HBASE-24152:


The explanation of the ONE_SSD storage policy:

"One_SSD - for storing one of the replicas in SSD. The remaining replicas are 
stored in DISK."

Since the replicas are on different hosts, and HDFS tries to read the local
replica first, my starting point is that we should move the region to follow
the replica on SSD. None of the existing cost functions address this, so I
added a new one in this issue.
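
The reasoning above can be modelled in a few lines. This sketch only encodes the rule stated in the comment, that a reader takes its local replica first; it is not the actual HDFS client logic (which orders replicas by network topology and other factors), and `LocalFirstReadSketch` and `storageRead` are hypothetical names.

```java
import java.util.Map;

/**
 * Simplified model of the read rule described in the comment (an assumption):
 * prefer a replica on the reader's own host without looking at storage type,
 * otherwise fall back to the first remote replica in the given order.
 */
final class LocalFirstReadSketch {
  enum Storage { DISK, SSD }

  /** replicas maps host to the storage type of its replica, in the order locations were returned. */
  static Storage storageRead(Map<String, Storage> replicas, String readerHost) {
    if (replicas.isEmpty()) {
      throw new IllegalArgumentException("block has no replicas");
    }
    Storage local = replicas.get(readerHost);
    if (local != null) {
      return local; // a local replica wins even if it is on DISK
    }
    return replicas.values().iterator().next(); // no local replica: take the first remote one
  }
}
```

Under this rule, a region opened on a host that holds only a DISK replica keeps reading from DISK even though an SSD replica exists elsewhere; only moving the region to the SSD host changes what is read.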

Thanks very much. :)



[jira] [Commented] (HBASE-24152) Add ServerSsdLocalityCostFunction to StochasticLoadBalancer

2020-05-11 Thread Michael Stack (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-24152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17104665#comment-17104665
 ] 

Michael Stack commented on HBASE-24152:
---

Thanks. So, one replica goes to SSD and you want the RS w/ that replica to be 
favored over the others? Please explain why we need to carry around a dedicated 
weight just for this ONE_SSD placement policy; why we can't just use the 
existing measure?



[jira] [Commented] (HBASE-24152) Add ServerSsdLocalityCostFunction to StochasticLoadBalancer

2020-05-11 Thread Zheng Wang (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-24152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17104645#comment-17104645
 ] 

Zheng Wang commented on HBASE-24152:


Here is the doc: 
[https://hadoop.apache.org/docs/r2.10.0/hadoop-project-dist/hadoop-hdfs/ArchivalStorage.html]

 



[jira] [Commented] (HBASE-24152) Add ServerSsdLocalityCostFunction to StochasticLoadBalancer

2020-05-11 Thread Michael Stack (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-24152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17104609#comment-17104609
 ] 

Michael Stack commented on HBASE-24152:
---

Sorry [~filtertip], I'm having trouble following what's going on here. I went
looking for docs on ONE_SSD but there don't seem to be any (grepping about in the
HDFS source). I see we set this placement policy per column family. How we make
the jump from a per-column-family storage policy to the balancer, which maps
regions to servers, is giving me trouble. Pardon my being 'thick'.



[jira] [Commented] (HBASE-24152) Add ServerSsdLocalityCostFunction to StochasticLoadBalancer

2020-05-10 Thread Zheng Wang (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-24152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17104054#comment-17104054
 ] 

Zheng Wang commented on HBASE-24152:


{quote}Why not just give host3 a more attractive score?

Up the weights for host1 and host2 because they are 'only' hdd.

Starting a new scoring that is exclusively about SSD doesn't seem like the
right direction. Better if, when scoring a host for the balancer, there is only
one score, which takes into account all factors: the count of regions already
assigned, as well as whether the host has SSD or not.
{quote}
 

Your proposal aims to move more regions to the hosts that have SSD, but it
has three disadvantages:
1. It needs to get the dfs.datanode.data.dir config from the NameNode to judge
whether a host has SSD or not.
2. Hosts may have mixed storage with both HDD and SSD, so it is not easy to
specify a consistent weight for them.
3. It only takes effect after compaction.

The proposal in this issue is about locality: it aims to make the local replica
also the SSD replica. It has two advantages:
1. No need to get config from the NameNode.
2. It takes effect immediately after the region moves.
{quote}
Is this feature for the case where only some hosts in the cluster have SSD?
{quote}
 

Yes. We do not need to worry about these hosts getting too many regions; that
is restrained by RegionCountSkewCostFunction.
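
The restraining effect can be illustrated with a weighted sum of costs, which is broadly how StochasticLoadBalancer combines its cost functions. In this sketch both cost formulas and both multipliers are assumptions chosen for illustration, not the real RegionCountSkewCostFunction, the proposed ServerSsdLocalityCostFunction, or HBase defaults; `CostTradeoffSketch` is a hypothetical name.

```java
/**
 * Illustrative model of a weighted-sum balancer cost. Both cost formulas and both
 * multipliers are assumptions, not HBase's actual cost functions or defaults.
 */
final class CostTradeoffSketch {
  static final double SKEW_MULTIPLIER = 500.0;        // hypothetical weight for region-count skew
  static final double SSD_LOCALITY_MULTIPLIER = 25.0; // hypothetical weight for SSD locality

  /** Sum of |count - mean| over servers, normalized by the all-on-one-server worst case. */
  static double skewCost(int[] assignment, int numServers) {
    int n = assignment.length;
    if (n == 0 || numServers < 2) {
      return 0.0;
    }
    int[] counts = new int[numServers];
    for (int server : assignment) {
      counts[server]++;
    }
    double mean = (double) n / numServers;
    double deviation = 0.0;
    for (int c : counts) {
      deviation += Math.abs(c - mean);
    }
    double worst = 2.0 * n * (numServers - 1) / numServers;
    return deviation / worst;
  }

  /** Fraction of regions not hosted on the server that holds their SSD replica. */
  static double ssdLocalityCost(int[] assignment, int[] ssdHost) {
    if (assignment.length != ssdHost.length) {
      throw new IllegalArgumentException("one SSD host per region expected");
    }
    if (assignment.length == 0) {
      return 0.0;
    }
    int misses = 0;
    for (int r = 0; r < assignment.length; r++) {
      if (assignment[r] != ssdHost[r]) {
        misses++;
      }
    }
    return (double) misses / assignment.length;
  }

  /** Weighted sum, lower is better. assignment[r] is the server index of region r. */
  static double totalCost(int[] assignment, int[] ssdHost, int numServers) {
    return SKEW_MULTIPLIER * skewCost(assignment, numServers)
        + SSD_LOCALITY_MULTIPLIER * ssdLocalityCost(assignment, ssdHost);
  }
}
```

With four regions whose SSD replicas all sit on server 1, packing every region there costs 500 × 1.0 + 25 × 0.0 = 500, while a 2/2 split costs 500 × 0.0 + 25 × 0.5 = 12.5, so the skew term wins and the SSD host is not overloaded. How far the SSD term can pull regions depends entirely on the relative multipliers.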

Thanks.



[jira] [Commented] (HBASE-24152) Add ServerSsdLocalityCostFunction to StochasticLoadBalancer

2020-05-10 Thread Michael Stack (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-24152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17103887#comment-17103887
 ] 

Michael Stack commented on HBASE-24152:
---

Why not just give host3 a more attractive score?

Up the weights for host1 and host2 because they are 'only' hdd.

Starting a new scoring that is exclusively about SSD doesn't seem like the
right direction. Better if, when scoring a host for the balancer, there is only
one score, which takes into account all factors: the count of regions already
assigned, as well as whether the host has SSD or not.

Is this feature for the case where only some hosts in the cluster have SSD?

Thanks.



[jira] [Commented] (HBASE-24152) Add ServerSsdLocalityCostFunction to StochasticLoadBalancer

2020-05-08 Thread Zheng Wang (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-24152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17103071#comment-17103071
 ] 

Zheng Wang commented on HBASE-24152:


{quote}I was reviewing the PR but figured I don't have enough understanding of
what is going on, so let me ask here.

I don't follow why the PR is special-casing SSD weight. Why is it not just
factored into the host general weight? The balancer works at the RS level, not
at the level of storage type? A local replica should be favored whether on SSD or not?

Help me out [~filtertip] Thank you.
{quote}
[~stack] 

Consider this case, with ONE_SSD set as the STORAGE_POLICY.

Region-1 is opened on host-1. It contains one HDFS block whose three
replicas are stored as follows:

replica-1 on host-1(hdd)
replica-2 on host-2(hdd)
replica-3 on host-3(ssd)

The HFile reader will then read from HDD, because host-1 is local and has
high priority.
If we move region-1 to host-3, the reader will read from SSD, and this cost
function makes that move more likely when the balancer makes plans.
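
A minimal sketch of why an SSD-aware term changes the plan for region-1: it scores a host for a one-block region with an existing-style locality term plus an optional SSD term, and accepts a candidate move only when the cost strictly drops. The greedy accept-if-cheaper rule is a simplification of the balancer's search, and `MoveDecisionSketch`, `cost`, `acceptMove` and the multiplier are hypothetical.

```java
import java.util.Map;

/** Hypothetical model of one balancer decision for a single-block region. */
final class MoveDecisionSketch {
  enum Storage { DISK, SSD }

  /** Locality term (any local replica) plus ssdMultiplier times an SSD term (local SSD replica only). */
  static double cost(Map<String, Storage> replicas, String host, double ssdMultiplier) {
    Storage local = replicas.get(host);
    double localityCost = (local == null) ? 1.0 : 0.0;   // existing-style measure
    double ssdCost = (local == Storage.SSD) ? 0.0 : 1.0; // SSD-aware measure
    return localityCost + ssdMultiplier * ssdCost;
  }

  /** Greedy step (a simplification): keep the move only if it strictly lowers the cost. */
  static boolean acceptMove(Map<String, Storage> replicas, String from, String to, double ssdMultiplier) {
    return cost(replicas, to, ssdMultiplier) < cost(replicas, from, ssdMultiplier);
  }
}
```

With `ssdMultiplier = 0.0` the move from host-1 to host-3 is rejected, because both hosts already hold a local replica; with `ssdMultiplier = 1.0` it is accepted, while a move to host-2 (another DISK replica) still is not.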



[jira] [Commented] (HBASE-24152) Add ServerSsdLocalityCostFunction to StochasticLoadBalancer

2020-05-08 Thread Michael Stack (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-24152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17102849#comment-17102849
 ] 

Michael Stack commented on HBASE-24152:
---

I was reviewing the PR but figured I don't have enough understanding of what is
going on, so let me ask here.

I don't follow why the PR is special-casing SSD weight. Why is it not just
factored into the host general weight? The balancer works at the RS level, not at
the level of storage type? A local replica should be favored whether on SSD or not?

Help me out [~filtertip] Thank you.


