[ 
https://issues.apache.org/jira/browse/KUDU-2914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16942577#comment-16942577
 ] 

Alexey Serbin edited comment on KUDU-2914 at 10/2/19 8:18 AM:
--------------------------------------------------------------

Thank you [~zhangyifan27] for your work on this useful feature!

In theory, once the tserver is marked with the special {{decommissioned}} flag, 
master won't place any new replicas on that tablet server.  So, once the tablet 
server is in {{decommissioned}} mode, it's safe to get the list of tablet 
replicas on the tserver and mark each with the {{REPLACE}} attribute.  After 
that it's necessary to retrieve the list of tablet replicas at the tablet 
server from time to time (e.g., using the {{ListTablets()}} RPC as in 
{{RemoteKsckTabletServer::FetchInfo()}}) and wait until no replicas are left 
there.

Yes, it's possible to mark any number of replicas at a tablet server with the 
{{REPLACE}} attribute.

It should be possible to decommission multiple tablet servers at once, yes.  
However, marking a server as {{decommissioned}} isn't implemented yet.  As a 
temporary workaround, I think it's possible to put tablet servers into 
maintenance mode (see 
[KUDU-2069|https://issues.apache.org/jira/browse/KUDU-2069]) instead of marking 
them {{decommissioned}}.  When a tablet server is put into maintenance 
mode, master doesn't place any new replicas on it, but it's still possible to move 
tablet replicas from it.  See [this 
commit|https://github.com/apache/kudu/commit/5316a89dfd13c36eef078b32043f161e6d0bbf01]
 for details.

So, before proper decommissioning is implemented, the procedure for moving all 
replicas from a tablet server could be the following:
# Put the tablet server into maintenance mode.
# Mark all the replicas at the tablet server with the {{REPLACE}} attribute.
# Periodically retrieve the list of tablet replicas at the server.
# Once all the replicas are gone from the server, shut it down.
# Remove the tablet server from the cluster.
# Declare victory :)
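
A minimal sketch of steps 1-3 above, in Python, just to show the ordering and the polling loop.  It does not call any real Kudu API: {{enter_maintenance}}, {{list_replicas}} and {{mark_replace}} are hypothetical callables that the caller would have to implement (e.g., on top of the maintenance mode from KUDU-2069 and the {{ListTablets()}} RPC), and the poll interval and poll limit are arbitrary assumptions.

```python
import time
from typing import Callable, Iterable


def drain_tablet_server(
    ts_uuid: str,
    enter_maintenance: Callable[[str], None],      # hypothetical hook
    list_replicas: Callable[[str], Iterable[str]],  # hypothetical hook
    mark_replace: Callable[[str, str], None],       # hypothetical hook
    poll_interval_s: float = 30.0,                  # arbitrary default
    max_polls: int = 1000,                          # arbitrary default
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Drain one tablet server; return the number of polls it took."""
    # Step 1: stop the master from placing new replicas on this server.
    enter_maintenance(ts_uuid)

    # Step 2: list the replicas only after step 1, so the list cannot grow,
    # then mark each replica with the REPLACE attribute.
    for tablet_id in list(list_replicas(ts_uuid)):
        mark_replace(tablet_id, ts_uuid)

    # Step 3: periodically re-list the replicas until none are left.
    for poll in range(1, max_polls + 1):
        if not list(list_replicas(ts_uuid)):
            return poll
        sleep(poll_interval_s)
    raise TimeoutError(f"replicas still present on {ts_uuid}")
```

Once the function returns, the server holds no replicas, so shutting it down and removing it from the cluster (steps 4 and 5) is left to the operator.  The poll limit is there only so that a stuck replica surfaces as an error instead of an endless loop.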



> Rebalance tool support moving replicas from some specific tablet servers
> ------------------------------------------------------------------------
>
>                 Key: KUDU-2914
>                 URL: https://issues.apache.org/jira/browse/KUDU-2914
>             Project: Kudu
>          Issue Type: Improvement
>          Components: CLI
>            Reporter: YifanZhang
>            Assignee: YifanZhang
>            Priority: Minor
>
> When we need to remove some tservers from a Kudu cluster (maybe just to 
> save resources or to replace these servers with new ones), it's better to 
> move all replicas on these tservers to other tservers in the cluster in 
> advance, instead of waiting for all the replicas to be evicted and replaced 
> with new ones. This can be achieved by having the rebalance tool support 
> specifying 'blacklist_tservers'.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
