[jira] [Commented] (KUDU-3346) Rebalance fails when trying to decommission tserver on a rack-aware cluster

2022-01-04 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/KUDU-3346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17469005#comment-17469005
 ] 

ASF subversion and git services commented on KUDU-3346:
---

Commit 5ef0168cf0ae4471632d63cad223d7301f415982 in kudu's branch 
refs/heads/master from zhangyifan27
[ https://gitbox.apache.org/repos/asf?p=kudu.git;h=5ef0168 ]

KUDU-3346: fix rebalancer tool fails to run with '--ignored_tservers'

Prior to this patch, the validity of 'ignored_tservers' was checked in
'BuildClusterInfo'. That led to a failure when 'raw_info' only contained
information about tservers in a specific location. This patch fixes it by
moving the parameter validity check into 'KsckResultsToClusterRawInfo', because
ksck results contain the original, full cluster information.

I noticed that 'ClusterInfo::tservers_to_empty' does not need to be built in
'BuildClusterInfo', because this info is used only for printing the cluster's
stats and running IgnoredTserverRunner. This should be refactored in a
follow-up patch.

This patch adds a regression test for the issue and I also verified this fix on
a real cluster.

Change-Id: I1361f562f3e886077a79c3de8ea5fb2ebb8df6e9
Reviewed-on: http://gerrit.cloudera.org:8080/18114
Reviewed-by: Andrew Wong 
Tested-by: Andrew Wong 


> Rebalance fails when trying to decommission tserver on a rack-aware cluster
> ---
>
> Key: KUDU-3346
> URL: https://issues.apache.org/jira/browse/KUDU-3346
> Project: Kudu
>  Issue Type: Bug
>Affects Versions: 1.15.0
>Reporter: Georgiana Ogrean
>Assignee: YifanZhang
>Priority: Major
> Attachments: rebalance_ignored_tserver_1c.log.Z, rebalance_v1.log.Z
>
>
> When following the steps [in the 
> docs|https://docs.cloudera.com/runtime/7.2.0/administering-kudu/topics/kudu-decommissioning-or-permanently-removing-tablet-server-from-cluster.html]
>  for decommissioning a tserver, the rebalance job fails with:
> {code:java}
> Invalid argument: ignored tserver  is not reported among know 
> tservers 
> {code}
> Steps followed:
> 1. Checked that ksck passes.
> 2. Put the tserver to be decommissioned in maintenance mode.
> {code:bash}
> sudo -u kudu kudu tserver state enter_maintenance $MASTER_ADDRESSES \
>   5ae499b1b870419daabb0e8da90ef233
> {code}
> 3. Ran rebalance with {{-ignored_tservers}} and 
> {{-move_replicas_from_ignored_tservers}} flags.
> {code:bash}
> sudo -u kudu kudu cluster rebalance $MASTER_ADDRESSES \
>   -move_replicas_from_ignored_tservers \
>   -ignored_tservers=5ae499b1b870419daabb0e8da90ef233 -v=1
> {code}
> The logs for the rebalance command are attached.
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (KUDU-3346) Rebalance fails when trying to decommission tserver on a rack-aware cluster

2021-12-24 Thread YifanZhang (Jira)


[ 
https://issues.apache.org/jira/browse/KUDU-3346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17464941#comment-17464941
 ] 

YifanZhang commented on KUDU-3346:
--

I think something goes wrong when populating
`ClusterInfo::tservers_to_empty`, because sometimes the `ClusterRawInfo` only
contains tserver/tablet info for a specific location. I plan to fix it.



[jira] [Commented] (KUDU-3346) Rebalance fails when trying to decommission tserver on a rack-aware cluster

2021-12-23 Thread Georgiana Ogrean (Jira)


[ 
https://issues.apache.org/jira/browse/KUDU-3346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17464857#comment-17464857
 ] 

Georgiana Ogrean commented on KUDU-3346:


In case it helps with getting to the bottom of this:

After noticing that some logs appear twice for tservers in us-east-1c, e.g.
{code:java}
I1223 13:52:53.569551 11613 rebalancer.cc:305] found tserver 
ca2b022920654fd2aacd320adfe39148 at location '/us-east-1/us-east-1c'{code}
I tried placing a tserver in that location into maintenance mode and then
running rebalance with the same flags. It fails with the same error as above.
However, when the ignored tserver was in either of the other two locations in
our cluster, the only thing printed before the failure was the *Locations load
summary* table. With a tserver in us-east-1c ignored, the tool also prints the
*replica distribution summary* tables for that location (both per-server and
per-table). I attached the rebalance log from the run with a us-east-1c tserver
ignored after being put in maintenance mode.

[^rebalance_ignored_tserver_1c.log.Z] 

 
