[ 
https://issues.apache.org/jira/browse/KUDU-3390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

shenxingwuying updated KUDU-3390:
---------------------------------
    Description: 
The original Jira issue is https://issues.apache.org/jira/browse/KUDU-3061. I 
created this new Jira issue to record some additional information.

 

 
h1. Motivation

The number of leader replicas per tablet server can become imbalanced over 
time, which leads to load skew on some nodes.

There are two reasons for the load skew:
 * The main reason: scan requests have two replica selection modes, 
LeaderOnly (the default) and CLOSEST_REPLICA. For more accurate results, users 
tend to choose the LeaderOnly mode, so the scan load on a tablet server is 
mostly positively correlated with the number of leaders it hosts.

 * The other reason: write requests. Leaders receive write requests while 
followers receive appendEntries (UpdateConsensus in Kudu), and the two 
processing flows differ slightly. This difference is a hidden variable that 
may cause imbalanced load. Leader rebalancing evens out leaders and followers 
across tablet servers, removes this hidden variable, and makes the service 
more stable.

To deal with this situation today, users run the kudu CLI leader_step_down 
command from a script to rebalance the leaders manually. This can be improved: 
the leader kudu-master can do it automatically.
h1. Solution

We can add an automatic leader rebalance task to avoid leader replica skew: a 
periodic task on the kudu-master that performs leader rebalancing.

Leader rebalancing only transfers leadership; it does not copy replicas. The 
basic idea of the algorithm is that, on every tserver, number of leaders : 
number of followers = 1 : (replica_factor - 1).

If leader rebalancing is needed, it is better to enable the replica rebalancer 
as well. If the leader rebalancer is enabled but the auto (replica) rebalancer 
is disabled, the algorithm still works, but the effect is not as good. The 
algorithm converges, and its target is that on every tserver, number of 
leaders : number of followers is 1 : (replica_factor - 1).
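
The target ratio above can be illustrated with a minimal sketch. This is not the actual kudu-master implementation: the function name {{plan_leader_moves}} and the input shape (dicts from tserver name to replica and leader counts) are hypothetical, and the sketch ignores which tablets each tserver actually hosts, so it only shows how the per-tserver counts converge. It uses integer arithmetic (leaders * replica factor - replicas) to avoid rounding issues.

```python
from typing import Dict, List, Tuple


def plan_leader_moves(
    replicas: Dict[str, int],
    leaders: Dict[str, int],
    replica_factor: int,
) -> Tuple[Dict[str, int], List[Tuple[str, str]]]:
    """Greedily move leaders from the most overloaded tserver to the most
    underloaded one until no single transfer reduces the imbalance.

    Hypothetical helper: returns the final leader counts and the list of
    (from_tserver, to_tserver) transfers.
    """
    current = dict(leaders)
    moves: List[Tuple[str, str]] = []
    while True:
        # Scaled surplus: positive means more leaders than the target
        # replicas / replica_factor, negative means fewer.
        surplus = {
            ts: current[ts] * replica_factor - replicas[ts] for ts in current
        }
        # Only tservers that currently hold a leader can give one away.
        donors = [ts for ts in current if current[ts] > 0]
        if not donors:
            break
        src = max(donors, key=lambda ts: surplus[ts])
        dst = min(surplus, key=lambda ts: surplus[ts])
        # One transfer shifts each scaled surplus by replica_factor, so it
        # only helps when the gap is larger than replica_factor.
        if surplus[src] - surplus[dst] <= replica_factor:
            break
        current[src] -= 1
        current[dst] += 1
        moves.append((src, dst))
    return current, moves


if __name__ == "__main__":
    final, moves = plan_leader_moves(
        replicas={"node1": 40, "node2": 40, "node3": 40},
        leaders={"node1": 40, "node2": 0, "node3": 0},
        replica_factor=3,
    )
    print(final, len(moves))
```

With 40 replicas on each of three tservers, a replica factor of 3, and all 40 leaders on node1, this sketch stops after 26 transfers at 14 : 13 : 13, which is within one leader of the ideal 40 / 3 per tserver.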
h1. Leader Rebalance results

I ran some experiments to measure the effect. The cluster has 3 machines, 
running 3 master instances and 3 tserver instances.

I created a table with 40 tablets (partitions) and a replica factor of 3, and 
loaded a large amount of data (40000000 records).

I disabled the leader rebalance function, manually transferred the leadership 
of all tablets to one tserver, and ran writes and scans.

Then I enabled the leader rebalance function and ran the scans again. The 
workload is as below:

The Scan command: {{./kudu_tools/kudu perf table_scan $master_list Student 
-columns=id,name,brief,age,score -num_threads=4 -nofill_cache 
-replica_selection="LEADER"}}

 

In the table below, per-node values are listed in the order node1 : node2 : 
node3, for example a leader ratio of 40: 0: 0 and a CPU usage of 47%, 18%, 19%.

 
|| ||leader ratio||scan time||CPU usage||memory||I/O||
|before leader rebalance|40: 0: 0|811.586 s|47%, 18%, 19%|no changes|102MB/s ioutil:55%, 8KB/s ioutil:2%, 64KB/s ioutil:3%|
|after leader rebalance|13: 14: 13|611.012 s|39%, 45%, 35%|no changes|53MB/s ioutil:31%, 80MB/s ioutil:18%, 45MB/s ioutil:24%|
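
To make the numbers in the table easier to compare, the following short sketch derives the relative scan time reduction and the CPU usage spread across nodes from the measured values. The helper name {{relative_reduction}} and the variable names are only for illustration; the inputs are copied from the table.

```python
def relative_reduction(before: float, after: float) -> float:
    """Fraction by which a measurement dropped, relative to the old value."""
    return (before - after) / before


# Scan times from the table, in seconds.
before_s = 811.586
after_s = 611.012

reduction = relative_reduction(before_s, after_s)
speedup = before_s / after_s

# CPU usage per node (node1, node2, node3), in percent.
cpu_before = [47, 18, 19]
cpu_after = [39, 45, 35]
spread_before = max(cpu_before) - min(cpu_before)
spread_after = max(cpu_after) - min(cpu_after)

print(f"scan time reduction: {reduction:.1%}")  # 24.7%
print(f"speedup: {speedup:.2f}x")               # 1.33x
print(f"CPU spread: {spread_before} -> {spread_after} percentage points")
```

In this experiment the scan time dropped by about a quarter, and the gap between the busiest and the least busy node shrank from 29 to 10 percentage points of CPU usage.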



> add new feature auto leader rebalancer
> --------------------------------------
>
>                 Key: KUDU-3390
>                 URL: https://issues.apache.org/jira/browse/KUDU-3390
>             Project: Kudu
>          Issue Type: New Feature
>            Reporter: shenxingwuying
>            Assignee: shenxingwuying
>            Priority: Major



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
