[jira] [Commented] (KUDU-2915) Support to delete dead tservers from CLI

2022-01-23 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/KUDU-2915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17480791#comment-17480791
 ] 

ASF subversion and git services commented on KUDU-2915:
---

Commit 9d01e1046249a815f26c7b5ebb1ceb2b67f72b9e in kudu's branch 
refs/heads/master from zhangyifan27
[ https://gitbox.apache.org/repos/asf?p=kudu.git;h=9d01e10 ]

KUDU-2915: add tool to unregister a tablet server

Add a 'kudu tserver unregister' tool to unregister a tserver from the
master. This tool will be useful when we want to decommission a tserver
without restarting masters.

This tool unregisters the dead tserver from the master's in-memory map and
removes its persisted state from the catalog table by default. It's also
possible to unregister a tserver that is not presumed dead by adding
'-force_unregister_live_tserver', or to keep the tserver's persisted state
by adding '-remove_tserver_state=false'.

Change-Id: If1f5c2979a8d14428f4bcc8e850c57ce228c793a
Reviewed-on: http://gerrit.cloudera.org:8080/18124
Reviewed-by: Alexey Serbin 
Reviewed-by: Andrew Wong 
Tested-by: Kudu Jenkins
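
A minimal sketch of how the tool described in the commit message above might be
invoked from a script. The subcommand and the two flags are taken from the
commit message. The positional arguments (master addresses and tserver UUID),
the helper name build_unregister_cmd, and the sample values are assumptions,
so check 'kudu tserver unregister --help' on your build before relying on them.

```python
import shlex


def build_unregister_cmd(master_addresses, tserver_uuid,
                         force_live=False, remove_state=True):
    """Build an argv list for 'kudu tserver unregister'.

    ASSUMPTION: the tool takes the master addresses and the tserver UUID
    as positional arguments; the commit message does not spell this out.
    """
    cmd = ["kudu", "tserver", "unregister", master_addresses, tserver_uuid]
    if force_live:
        # From the commit message: allows unregistering a tserver that is
        # not presumed dead.
        cmd.append("-force_unregister_live_tserver")
    if not remove_state:
        # From the commit message: keep the tserver's persisted state.
        cmd.append("-remove_tserver_state=false")
    return cmd


if __name__ == "__main__":
    # Hypothetical master addresses and UUID, for illustration only.
    argv = build_unregister_cmd(
        "master-1:7051,master-2:7051,master-3:7051",
        "0123456789abcdef0123456789abcdef",
        remove_state=False,
    )
    # Print the command instead of running it, so it can be reviewed first.
    print(shlex.join(argv))
```

Printing the command rather than executing it is deliberate: unregistering a
tserver changes master state, so an operator would probably want to review the
exact command line first.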


> Support to delete dead tservers from CLI
> 
>
> Key: KUDU-2915
> URL: https://issues.apache.org/jira/browse/KUDU-2915
> Project: Kudu
> Issue Type: Improvement
> Components: CLI, ops-tooling
> Affects Versions: 1.10.0
> Reporter: Hexin
> Assignee: Hexin
> Priority: Major
> Labels: supportability
>
> Nodes in a cluster sometimes crash because of machine problems such as 
> disk corruption, which can be very common. However, if there are any dead 
> tservers, the ksck result always shows an error (e.g. "Not all Tablet Servers 
> are reachable") even though all tables have recovered and are healthy.
> Currently, the only way to get a healthy ksck status is to restart all masters 
> one by one. In some cases, for example when a machine is completely 
> corrupted, we would like to get a healthy ksck status without restarting, 
> because the cluster takes some time to recover after the masters restart, and 
> scans and upserts to tables are affected during that time. The recovery 
> time can be long and depends mainly on the scale of the cluster. This problem 
> can be serious and annoying, especially when tservers crash frequently 
> in a large cluster.
> It would be valuable to have an easier way to delete dead tservers from the 
> master. I will add a kudu command to do so.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (KUDU-2915) Support to delete dead tservers from CLI

2021-11-28 Thread YifanZhang (Jira)


[ 
https://issues.apache.org/jira/browse/KUDU-2915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17449978#comment-17449978
 ] 

YifanZhang commented on KUDU-2915:
--

I think it would be good to introduce a tool to unregister a dead tablet 
server from the master's in-memory state. 

On the other hand, I would also like to know whether it is safe and reasonable 
to have the master proactively forget a tablet server that has been in the 
'dead' state for 'a long time' and has no replicas running on it. If the same 
tablet server came back, the master would re-register it in its in-memory 
state. Would that cause any problems?
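
To make the question above concrete, here is a rough sketch of the kind of
check such a policy would need. It is not Kudu code: the TServerInfo type, the
should_forget function, and the threshold value are all hypothetical and exist
only to illustrate the two conditions mentioned (dead for a long time, no
replicas).

```python
from dataclasses import dataclass


@dataclass
class TServerInfo:
    # Hypothetical view of what a master might track per tablet server.
    uuid: str
    seconds_since_heartbeat: float
    num_replicas: int


def should_forget(ts, forget_after_s):
    """Return True if the tserver has been silent for at least
    forget_after_s seconds and hosts no replicas.

    forget_after_s stands in for 'a long time'; no value is proposed here.
    """
    dead_long_enough = ts.seconds_since_heartbeat >= forget_after_s
    return dead_long_enough and ts.num_replicas == 0


if __name__ == "__main__":
    # Illustrative threshold of one week, chosen arbitrarily.
    week = 7 * 24 * 3600
    servers = [
        TServerInfo("ts-a", seconds_since_heartbeat=10, num_replicas=5),
        TServerInfo("ts-b", seconds_since_heartbeat=2 * week, num_replicas=0),
        TServerInfo("ts-c", seconds_since_heartbeat=2 * week, num_replicas=3),
    ]
    for ts in servers:
        print(ts.uuid, should_forget(ts, week))
```

The sketch only covers the decision itself. Whether forgetting is safe when
the tserver later comes back is the open question and is not answered by it.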
