Song Jiacheng has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21073 )

Change subject: Remove the replace flag if the move fails in auto rebalancing.
......................................................................


Patch Set 14:

(9 comments)

Thanks for the review!

http://gerrit.cloudera.org:8080/#/c/21073/12/src/kudu/master/auto_rebalancer-test.cc
File src/kudu/master/auto_rebalancer-test.cc:

http://gerrit.cloudera.org:8080/#/c/21073/12/src/kudu/master/auto_rebalancer-test.cc@823
PS12, Line 823: rebalanci
> nit: rebalancing
Done


http://gerrit.cloudera.org:8080/#/c/21073/12/src/kudu/master/auto_rebalancer-test.cc@855
PS12, Line 855: &
> style nit for here and below: stick the ampersand to the type, not the vari
Done


http://gerrit.cloudera.org:8080/#/c/21073/12/src/kudu/master/auto_rebalancer.h
File src/kudu/master/auto_rebalancer.h:

http://gerrit.cloudera.org:8080/#/c/21073/12/src/kudu/master/auto_rebalancer.h@133
PS12, Line 133:   // Finds replicas that are specified in 'replica_moves' and 
make requests
              :   // to have them moved in order to rebalance the cluster.
> Please update this in-line doc to reflect the semantics of the changes in t
Done


http://gerrit.cloudera.org:8080/#/c/21073/12/src/kudu/master/auto_rebalancer.cc
File src/kudu/master/auto_rebalancer.cc:

http://gerrit.cloudera.org:8080/#/c/21073/12/src/kudu/master/auto_rebalancer.cc@141
PS12, Line 141:
> Consider tagging this new flag with the 'unsafe' tag.
Done


http://gerrit.cloudera.org:8080/#/c/21073/12/src/kudu/master/auto_rebalancer.cc@142
PS12, Line 142: DEFINE_bool(au
> nit: wrong indent
Done


http://gerrit.cloudera.org:8080/#/c/21073/12/src/kudu/master/auto_rebalancer.cc@698
PS12, Line 698:     }
              :     // If the move was completed, remove it from 
'replica_moves'.
> The message is now quite confusing.  Please separate two logs: logging abou
Done


http://gerrit.cloudera.org:8080/#/c/21073/12/src/kudu/master/auto_rebalancer.cc@724
PS12, Line 724:
> nit: injected artificial test failure
Done


http://gerrit.cloudera.org:8080/#/c/21073/14/src/kudu/master/auto_rebalancer.cc
File src/kudu/master/auto_rebalancer.cc:

http://gerrit.cloudera.org:8080/#/c/21073/14/src/kudu/master/auto_rebalancer.cc@669
PS14, Line 669:       // Remove the replace flag of the replica.
> Moving replica fail is a normal case. Run `kudu cluster rebalance` command
There are two cases if the leader's replace flag is not removed:
  1. Master add a replica to the raft config and the new replica always fails 
to become a voter because of some reasons, so it need to find another replica, 
and another, costing a lot resource.
  2. Master add a replica to the raft config and the new replica finally become 
a voter, and the tablet always has a extra replica which is actually useless.
The leader replica which fails to move is not always found in the next round of 
rebalancing.


http://gerrit.cloudera.org:8080/#/c/21073/12/src/kudu/rebalance/rebalancer.h
File src/kudu/rebalance/rebalancer.h:

http://gerrit.cloudera.org:8080/#/c/21073/12/src/kudu/rebalance/rebalancer.h@158
PS12, Line 158:   enum class RunStatus {
              :     UNKNOWN,
> Why to store this information at all?  This smells not quite good:
Yes, I considered about whether to store the information or get the consensus 
information when the move fails. And I select the former one to avoid more 
GetConsensus requests. And the number of RebalanceMove is usually small in our 
environment.
I will try to do it in the other way.



--
To view, visit http://gerrit.cloudera.org:8080/21073
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I99dafa654878b9d6d8162d84500913ae0655692b
Gerrit-Change-Number: 21073
Gerrit-PatchSet: 14
Gerrit-Owner: Song Jiacheng <songjiach...@thinkingdata.cn>
Gerrit-Reviewer: Alexey Serbin <ale...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mahesh Reddy <mre...@cloudera.com>
Gerrit-Reviewer: Song Jiacheng <songjiach...@thinkingdata.cn>
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Wang Xixu <1450306...@qq.com>
Gerrit-Comment-Date: Mon, 20 May 2024 08:31:53 +0000
Gerrit-HasComments: Yes

Reply via email to