[
https://issues.apache.org/jira/browse/RATIS-1769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Kaijie Chen updated RATIS-1769:
-------------------------------
Description:
This is a followup of RATIS-1762. -TransferCommand should not change priority
of peers (or at least not by default).-
-Sadly this will break backward compatibility. But version 3.0 hasn't been
released, so it might be OK.-
-Add a new TransferLeadershipCommand which will not change priority of peers
when transfer leadership.-
-The old TransferCommand is deprecated and keeped as is for backward
compatibility reasons.-
Try to avoid changing priorities before transfer leadership in TransferCommand.
It will fallback to "transfer leadership by changing priority" for backward
compatibility.
h3. Example
{code:java}
$ bin/ratis sh election transfer -peers
127.0.0.1:10024,127.0.0.1:10124,127.0.0.1:11124 -address 127.0.0.1:10124[main]
INFO org.reflections.Reflections - Reflections took 157 ms to scan 1 urls,
producing 5 keys and 18 values[main] WARN
org.apache.ratis.metrics.MetricRegistries - Found multiple MetricRegistries
implementations: class org.apache.ratis.metrics.impl.MetricRegistriesImpl,
class org.apache.ratis.metrics.dropwizard3.Dm3MetricRegistriesImpl. Using first
found implementation:
org.apache.ratis.metrics.impl.MetricRegistriesImpl@1e67a849Transferring
leadership to server with address <127.0.0.1:10124> Transferring leadership
initiated{code}
h3. Backward compatibility
Ratis shell version: {{{}3.0.0-SNAPSHOT{}}}.
Ratis server version: {{{}2.4.1{}}}.
{code:java}
$ bin/ratis sh election transfer -peers
127.0.0.1:10024,127.0.0.1:10124,127.0.0.1:11124 -address 127.0.0.1:10124[main]
INFO org.reflections.Reflections - Reflections took 131 ms to scan 1 urls,
producing 5 keys and 18 values[main] WARN
org.apache.ratis.metrics.MetricRegistries - Found multiple MetricRegistries
implementations: class org.apache.ratis.metrics.impl.MetricRegistriesImpl,
class org.apache.ratis.metrics.dropwizard3.Dm3MetricRegistriesImpl. Using first
found implementation:
org.apache.ratis.metrics.impl.MetricRegistriesImpl@1e67a849Transferring
leadership to server with address <127.0.0.1:10124> Changing priority of
<127.0.0.1:10124> to 2: Transferring leadership to server with address
<127.0.0.1:10124> Changing priority of <127.0.0.1:10124> to 3: Transferring
leadership initiated{code}
h3. In case of failure
In most cases, just a retry will fix the problem. And users can also set
timeout manually by {{-timeout}} option.
{code:java}
$ bin/ratis sh election transfer -peers
127.0.0.1:10024,127.0.0.1:10124,127.0.0.1:11124 -address 127.0.0.1:10124[main]
INFO org.reflections.Reflections - Reflections took 135 ms to scan 1 urls,
producing 5 keys and 18 values[main] WARN
org.apache.ratis.metrics.MetricRegistries - Found multiple MetricRegistries
implementations: class org.apache.ratis.metrics.impl.MetricRegistriesImpl,
class org.apache.ratis.metrics.dropwizard3.Dm3MetricRegistriesImpl. Using first
found implementation:
org.apache.ratis.metrics.impl.MetricRegistriesImpl@1e67a849Transferring
leadership to server with address <127.0.0.1:10124> Failed to transfer peer n1
with address 127.0.0.1:10124:
org.apache.ratis.protocol.exceptions.TransferLeadershipException:
n2@group-ABB3109A44C1: Failed to transfer leadership to n1 (timed out 3000ms):
current leader is n2 at
org.apache.ratis.server.impl.TransferLeadership$PendingRequest.complete(TransferLeadership.java:67)
at
org.apache.ratis.server.impl.TransferLeadership.lambda$finish$7(TransferLeadership.java:163)
at java.util.Optional.ifPresent(Optional.java:159) at
org.apache.ratis.server.impl.TransferLeadership.finish(TransferLeadership.java:163)
at
org.apache.ratis.server.impl.TransferLeadership.lambda$start$4(TransferLeadership.java:136)
at
org.apache.ratis.util.TimeoutTimer.lambda$onTimeout$2(TimeoutTimer.java:101) at
org.apache.ratis.util.LogUtils.runAndLog(LogUtils.java:38) at
org.apache.ratis.util.LogUtils$1.run(LogUtils.java:79) at
org.apache.ratis.util.TimeoutTimer$Task.run(TimeoutTimer.java:55) at
java.util.TimerThread.mainLoop(Timer.java:555) at
java.util.TimerThread.run(Timer.java:505){code}
was:
This is a followup of RATIS-1762. -TransferCommand should not change priority
of peers (or at least not by default).-
-Sadly this will break backward compatibility. But version 3.0 hasn't been
released, so it might be OK.-
-Add a new TransferLeadershipCommand which will not change priority of peers
when transfer leadership.-
-The old TransferCommand is deprecated and keeped as is for backward
compatibility reasons.-
Try to avoid changing priorities before transfer leadership in TransferCommand.
It will fallback to "transfer leadership by changing priority" for backward
compatibility.
Example:
{{bin/ratis sh election transfer -peers
127.0.0.1:10024,127.0.0.1:10124,127.0.0.1:11124 -address 127.0.0.1:10024}}
Backward compatibility:
{{{}Ratis shell version: 3.0.0-SNAPSHOT{}}}.
{{{}Ratis server version: 2.4.1{}}}.
{code:java}
$ bin/ratis sh election transfer -peers
127.0.0.1:10024,127.0.0.1:10124,127.0.0.1:11124 -address 127.0.0.1:11124[main]
INFO org.reflections.Reflections - Reflections took 164 ms to scan 1 urls,
producing 5 keys and 18 values[main] WARN
org.apache.ratis.metrics.MetricRegistries - Found multiple MetricRegistries
implementations: class org.apache.ratis.metrics.impl.MetricRegistriesImpl,
class org.apache.ratis.metrics.dropwizard3.Dm3MetricRegistriesImpl. Using first
found implementation:
org.apache.ratis.metrics.impl.MetricRegistriesImpl@1e67a849Transferring
leadership to server with address <127.0.0.1:11124> Changing priority of
<127.0.0.1:11124> to 1: caught an error when executing transfer:
n0@group-ABB3109A44C1 refused to transfer leadership to peer n2 as it does not
has highest priority 9:
peers:[n0|rpc:127.0.0.1:10024|admin:|client:|dataStream:|priority:1|startupRole:FOLLOWER,
n1|rpc:127.0.0.1:10124|admin:|client:|dataStream:|priority:0|startupRole:FOLLOWER,
n2|rpc:127.0.0.1:11124|admin:|client:|dataStream:|priority:1|startupRole:FOLLOWER]|listeners:[],
old=nullTransferring leadership to server with address <127.0.0.1:11124>
Changing priority of <127.0.0.1:11124> to 2: Transferring leadership
initiated{code}
> Avoid changing priorities in TransferCommand unless necessary
> -------------------------------------------------------------
>
> Key: RATIS-1769
> URL: https://issues.apache.org/jira/browse/RATIS-1769
> Project: Ratis
> Issue Type: Sub-task
> Reporter: Kaijie Chen
> Priority: Major
> Time Spent: 3h 40m
> Remaining Estimate: 0h
>
> This is a followup of RATIS-1762. -TransferCommand should not change priority
> of peers (or at least not by default).-
> -Sadly this will break backward compatibility. But version 3.0 hasn't been
> released, so it might be OK.-
> -Add a new TransferLeadershipCommand which will not change priority of peers
> when transfer leadership.-
> -The old TransferCommand is deprecated and keeped as is for backward
> compatibility reasons.-
> Try to avoid changing priorities before transfer leadership in
> TransferCommand.
> It will fallback to "transfer leadership by changing priority" for backward
> compatibility.
> h3. Example
> {code:java}
> $ bin/ratis sh election transfer -peers
> 127.0.0.1:10024,127.0.0.1:10124,127.0.0.1:11124 -address
> 127.0.0.1:10124[main] INFO org.reflections.Reflections - Reflections took 157
> ms to scan 1 urls, producing 5 keys and 18 values[main] WARN
> org.apache.ratis.metrics.MetricRegistries - Found multiple MetricRegistries
> implementations: class org.apache.ratis.metrics.impl.MetricRegistriesImpl,
> class org.apache.ratis.metrics.dropwizard3.Dm3MetricRegistriesImpl. Using
> first found implementation:
> org.apache.ratis.metrics.impl.MetricRegistriesImpl@1e67a849Transferring
> leadership to server with address <127.0.0.1:10124> Transferring leadership
> initiated{code}
> h3. Backward compatibility
> Ratis shell version: {{{}3.0.0-SNAPSHOT{}}}.
> Ratis server version: {{{}2.4.1{}}}.
> {code:java}
> $ bin/ratis sh election transfer -peers
> 127.0.0.1:10024,127.0.0.1:10124,127.0.0.1:11124 -address
> 127.0.0.1:10124[main] INFO org.reflections.Reflections - Reflections took 131
> ms to scan 1 urls, producing 5 keys and 18 values[main] WARN
> org.apache.ratis.metrics.MetricRegistries - Found multiple MetricRegistries
> implementations: class org.apache.ratis.metrics.impl.MetricRegistriesImpl,
> class org.apache.ratis.metrics.dropwizard3.Dm3MetricRegistriesImpl. Using
> first found implementation:
> org.apache.ratis.metrics.impl.MetricRegistriesImpl@1e67a849Transferring
> leadership to server with address <127.0.0.1:10124> Changing priority of
> <127.0.0.1:10124> to 2: Transferring leadership to server with address
> <127.0.0.1:10124> Changing priority of <127.0.0.1:10124> to 3: Transferring
> leadership initiated{code}
> h3. In case of failure
> In most cases, just a retry will fix the problem. And users can also set
> timeout manually by {{-timeout}} option.
> {code:java}
> $ bin/ratis sh election transfer -peers
> 127.0.0.1:10024,127.0.0.1:10124,127.0.0.1:11124 -address
> 127.0.0.1:10124[main] INFO org.reflections.Reflections - Reflections took 135
> ms to scan 1 urls, producing 5 keys and 18 values[main] WARN
> org.apache.ratis.metrics.MetricRegistries - Found multiple MetricRegistries
> implementations: class org.apache.ratis.metrics.impl.MetricRegistriesImpl,
> class org.apache.ratis.metrics.dropwizard3.Dm3MetricRegistriesImpl. Using
> first found implementation:
> org.apache.ratis.metrics.impl.MetricRegistriesImpl@1e67a849Transferring
> leadership to server with address <127.0.0.1:10124> Failed to transfer peer
> n1 with address 127.0.0.1:10124:
> org.apache.ratis.protocol.exceptions.TransferLeadershipException:
> n2@group-ABB3109A44C1: Failed to transfer leadership to n1 (timed out
> 3000ms): current leader is n2 at
> org.apache.ratis.server.impl.TransferLeadership$PendingRequest.complete(TransferLeadership.java:67)
> at
> org.apache.ratis.server.impl.TransferLeadership.lambda$finish$7(TransferLeadership.java:163)
> at java.util.Optional.ifPresent(Optional.java:159) at
> org.apache.ratis.server.impl.TransferLeadership.finish(TransferLeadership.java:163)
> at
> org.apache.ratis.server.impl.TransferLeadership.lambda$start$4(TransferLeadership.java:136)
> at
> org.apache.ratis.util.TimeoutTimer.lambda$onTimeout$2(TimeoutTimer.java:101)
> at org.apache.ratis.util.LogUtils.runAndLog(LogUtils.java:38) at
> org.apache.ratis.util.LogUtils$1.run(LogUtils.java:79) at
> org.apache.ratis.util.TimeoutTimer$Task.run(TimeoutTimer.java:55) at
> java.util.TimerThread.mainLoop(Timer.java:555) at
> java.util.TimerThread.run(Timer.java:505){code}
--
This message was sent by Atlassian Jira
(v8.20.10#820010)