[jira] [Commented] (KUDU-2915) Support to delete dead tservers from CLI
[ https://issues.apache.org/jira/browse/KUDU-2915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17480791#comment-17480791 ]

ASF subversion and git services commented on KUDU-2915:
---

Commit 9d01e1046249a815f26c7b5ebb1ceb2b67f72b9e in kudu's branch refs/heads/master from zhangyifan27
[ https://gitbox.apache.org/repos/asf?p=kudu.git;h=9d01e10 ]

KUDU-2915: add tool to unregister a tablet server

Add a 'kudu tserver unregister' tool to unregister a tserver from
the master. This tool will be useful when we want to decommission
a tserver without restarting masters.

This tool unregisters the dead tserver from master's in-memory map
and removes its persisted state from catalog table by default. It's
also possible to unregister a tserver which is not presumed dead
by adding '-force_unregister_live_tserver', or keep tserver's
persisted state by adding '-remove_tserver_state=false'.

Change-Id: If1f5c2979a8d14428f4bcc8e850c57ce228c793a
Reviewed-on: http://gerrit.cloudera.org:8080/18124
Reviewed-by: Alexey Serbin
Reviewed-by: Andrew Wong
Tested-by: Kudu Jenkins

> Support to delete dead tservers from CLI
>
>                 Key: KUDU-2915
>                 URL: https://issues.apache.org/jira/browse/KUDU-2915
>             Project: Kudu
>          Issue Type: Improvement
>          Components: CLI, ops-tooling
>    Affects Versions: 1.10.0
>            Reporter: Hexin
>            Assignee: Hexin
>            Priority: Major
>              Labels: supportability
>
> Sometimes nodes in the cluster crash due to machine problems such as
> disk corruption, which can be very common. However, if there are any dead
> tservers, the ksck result will always show an error (e.g. "Not all Tablet
> Servers are reachable") even though all tables have recovered and are healthy.
> Currently, the only way to get a healthy ksck status is to restart all
> masters one by one. In some cases, for example when a machine is completely
> corrupted, we would like to get a healthy ksck status without restarting,
> because after the masters restart the cluster takes some time to recover,
> during which scans of and upserts to tables are affected. The recovery
> time can be long and depends mainly on the scale of the cluster. This problem
> can be serious and annoying, especially when tservers crash frequently
> in a large cluster.
> It would be valuable to have an easier way to delete dead tservers from the
> master. I will add a kudu command to do this.

--
This message was sent by Atlassian Jira
(v8.20.1#820001)
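The flags named in the commit message above can be pictured with a small sketch that only assembles an argument list for 'kudu tserver unregister' and does not run anything. The helper name `build_unregister_command` is hypothetical, and passing the master addresses and the tserver UUID as positional arguments is an assumption not stated in the commit message; only the tool name and the two flags come from it.

```python
from typing import List


def build_unregister_command(
    master_addresses: str,
    tserver_uuid: str,
    force_live: bool = False,
    remove_state: bool = True,
) -> List[str]:
    """Assemble an argv list for 'kudu tserver unregister' (hypothetical helper)."""
    # Assumption: master addresses and tserver UUID are positional arguments.
    cmd = ["kudu", "tserver", "unregister", master_addresses, tserver_uuid]
    if force_live:
        # From the commit message: allows unregistering a tserver not presumed dead.
        cmd.append("-force_unregister_live_tserver")
    if not remove_state:
        # From the commit message: keeps the tserver's persisted state in the catalog table.
        cmd.append("-remove_tserver_state=false")
    return cmd
```

With the defaults, the list contains no extra flags, which matches the default behaviour described in the commit message: only a dead tserver is unregistered, and its persisted state is removed.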
[jira] [Commented] (KUDU-2915) Support to delete dead tservers from CLI
[ https://issues.apache.org/jira/browse/KUDU-2915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17449978#comment-17449978 ]

YifanZhang commented on KUDU-2915:
--

I think it would be good to introduce a tool to unregister a dead tablet
server from the master's in-memory state. On the other hand, I also want to
know whether it is safe and reasonable for the master to proactively forget a
tablet server that has been in the 'dead' state for 'a long time' and has no
replicas running on it. If the same tablet server comes back, the master would
re-register it in its in-memory state. Are there any problems with this
approach?
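The auto-forget idea raised in the comment above can be illustrated with a toy model. This is only a sketch of the proposed policy, not Kudu's master code: the class `TServerRegistry`, its two thresholds, and the `replica_counts` argument are all hypothetical names introduced for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class TServerRegistry:
    """Toy model of a master's in-memory tserver map (not Kudu's actual code)."""
    dead_after_s: float    # hypothetical: silence after which a tserver is presumed dead
    forget_after_s: float  # hypothetical: silence after which a dead tserver may be forgotten
    last_heartbeat: Dict[str, float] = field(default_factory=dict)

    def heartbeat(self, uuid: str, now: float) -> None:
        # A heartbeat from an unknown UUID simply (re-)registers the tserver.
        self.last_heartbeat[uuid] = now

    def is_dead(self, uuid: str, now: float) -> bool:
        return now - self.last_heartbeat[uuid] > self.dead_after_s

    def forget_stale(self, now: float, replica_counts: Dict[str, int]) -> List[str]:
        """Drop tservers that have been silent too long and host no replicas."""
        forgotten = [
            uuid for uuid, seen in self.last_heartbeat.items()
            if now - seen > self.forget_after_s and replica_counts.get(uuid, 0) == 0
        ]
        for uuid in forgotten:
            del self.last_heartbeat[uuid]
        return forgotten
```

In this model a forgotten tserver that heartbeats again is treated like a new registration, which is the behaviour the comment asks about. Whether that is safe in a real cluster is the open question and is not answered by the sketch.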