Hello,
Based on your inputs, I was able to reproduce the issue consistently.
1. After starting n1, n2 and n3 nodes
```
./ratis sh group info -peers 0.0.0.0:9000,0.0.0.0:9001,0.0.0.0:9002,0.0.0.0:9003 -groupid 02511d47-d67c-49a3-9011-abb3109a44c2
[main] WARN org.apache.ratis.metrics.MetricRegistriesLoader - Found multiple MetricRegistries: [class org.apache.ratis.metrics.impl.MetricRegistriesImpl, class org.apache.ratis.metrics.dropwizard3.Dm3MetricRegistriesImpl]. Using the first: class org.apache.ratis.metrics.impl.MetricRegistriesImpl
group id: 02511d47-d67c-49a3-9011-abb3109a44c2
leader info: n1(0.0.0.0:9000)
[server {
id: "n1"
address: "0.0.0.0:9000"
startupRole: FOLLOWER
}
commitIndex: 8
, server {
id: "n2"
address: "0.0.0.0:9001"
startupRole: FOLLOWER
}
commitIndex: 8
, server {
id: "n3"
address: "0.0.0.0:9002"
startupRole: FOLLOWER
}
commitIndex: 8
]
applied {
term: 1
index: 8
}
committed {
term: 1
index: 8
}
lastEntry {
term: 1
index: 8
}
```
2. After adding n4 as a listener
```
./ratis sh group info -peers 0.0.0.0:9000,0.0.0.0:9001,0.0.0.0:9002,0.0.0.0:9003 -groupid 02511d47-d67c-49a3-9011-abb3109a44c2
[main] WARN org.apache.ratis.metrics.MetricRegistriesLoader - Found multiple MetricRegistries: [class org.apache.ratis.metrics.impl.MetricRegistriesImpl, class org.apache.ratis.metrics.dropwizard3.Dm3MetricRegistriesImpl]. Using the first: class org.apache.ratis.metrics.impl.MetricRegistriesImpl
group id: 02511d47-d67c-49a3-9011-abb3109a44c2
leader info: n1(0.0.0.0:9000)
[server {
id: "n1"
address: "0.0.0.0:9000"
startupRole: FOLLOWER
}
commitIndex: 12
, server {
id: "n2"
address: "0.0.0.0:9001"
startupRole: FOLLOWER
}
commitIndex: 12
, server {
id: "n3"
address: "0.0.0.0:9002"
startupRole: FOLLOWER
}
commitIndex: 12
, server {
id: "n4"
address: "0.0.0.0:9003"
startupRole: LISTENER
}
commitIndex: 12
]
applied {
term: 1
index: 12
}
committed {
term: 1
index: 12
}
lastEntry {
term: 1
index: 12
}
```
3. After killing n3 and promoting n4 to follower
```
❯ ./ratis sh group info -peers 0.0.0.0:9000,0.0.0.0:9001,0.0.0.0:9002,0.0.0.0:9003 -groupid 02511d47-d67c-49a3-9011-abb3109a44c2
[main] WARN org.apache.ratis.metrics.MetricRegistriesLoader - Found multiple MetricRegistries: [class org.apache.ratis.metrics.impl.MetricRegistriesImpl, class org.apache.ratis.metrics.dropwizard3.Dm3MetricRegistriesImpl]. Using the first: class org.apache.ratis.metrics.impl.MetricRegistriesImpl
group id: 02511d47-d67c-49a3-9011-abb3109a44c2
leader info: n1(0.0.0.0:9000)
[server {
id: "n1"
address: "0.0.0.0:9000"
startupRole: FOLLOWER
}
commitIndex: 16
, server {
id: "n2"
address: "0.0.0.0:9001"
startupRole: FOLLOWER
}
commitIndex: 16
, server {
id: "n4"
address: "0.0.0.0:9003"
startupRole: FOLLOWER
}
commitIndex: 16
]
applied {
term: 1
index: 16
}
committed {
term: 1
index: 16
}
lastEntry {
term: 1
index: 16
}
```
4. After killing the n1 (leader) instance
```
❯ ./ratis sh group info -peers 0.0.0.0:9000,0.0.0.0:9001,0.0.0.0:9002,0.0.0.0:9003 -groupid 02511d47-d67c-49a3-9011-abb3109a44c2
[main] WARN org.apache.ratis.metrics.MetricRegistriesLoader - Found multiple MetricRegistries: [class org.apache.ratis.metrics.impl.MetricRegistriesImpl, class org.apache.ratis.metrics.dropwizard3.Dm3MetricRegistriesImpl]. Using the first: class org.apache.ratis.metrics.impl.MetricRegistriesImpl
org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: UNAVAILABLE: io exception
    at org.apache.ratis.thirdparty.io.grpc.stub.ClientCalls.toStatusRuntimeException(ClientCalls.java:368)
    at org.apache.ratis.thirdparty.io.grpc.stub.ClientCalls.getUnchecked(ClientCalls.java:349)
    at org.apache.ratis.thirdparty.io.grpc.stub.ClientCalls.blockingUnaryCall(ClientCalls.java:174)
    at org.apache.ratis.proto.grpc.AdminProtocolServiceGrpc$AdminProtocolServiceBlockingStub.groupList(AdminProtocolServiceGrpc.java:573)
    at org.apache.ratis.grpc.client.GrpcClientProtocolClient.groupList(GrpcClientProtocolClient.java:167)
    at org.apache.ratis.grpc.client.GrpcClientRpc.sendRequest(GrpcClientRpc.java:106)
    at org.apache.ratis.client.impl.BlockingImpl.sendRequest(BlockingImpl.java:147)
    at org.apache.ratis.client.impl.BlockingImpl.sendRequestWithRetry(BlockingImpl.java:109)
    at org.apache.ratis.client.impl.GroupManagementImpl.list(GroupManagementImpl.java:69)
    at org.apache.ratis.shell.cli.CliUtils.lambda$getGroupId$1(CliUtils.java:118)
    at org.apache.ratis.shell.cli.CliUtils.applyFunctionReturnFirstNonNull(CliUtils.java:72)
    at org.apache.ratis.shell.cli.CliUtils.getGroupId(CliUtils.java:117)
    at org.apache.ratis.shell.cli.sh.command.AbstractRatisCommand.run(AbstractRatisCommand.java:70)
    at org.apache.ratis.shell.cli.sh.group.GroupInfoCommand.run(GroupInfoCommand.java:47)
    at org.apache.ratis.shell.cli.AbstractShell.run(AbstractShell.java:104)
    at org.apache.ratis.shell.cli.sh.RatisShell.main(RatisShell.java:62)
Caused by: org.apache.ratis.thirdparty.io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: /0.0.0.0:9000
Caused by: java.net.ConnectException: Connection refused
    at java.base/sun.nio.ch.Net.pollConnect(Native Method)
    at java.base/sun.nio.ch.Net.pollConnectNow(Net.java:672)
    at java.base/sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:946)
    at org.apache.ratis.thirdparty.io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:336)
    at org.apache.ratis.thirdparty.io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:339)
    at org.apache.ratis.thirdparty.io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:784)
    at org.apache.ratis.thirdparty.io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:732)
    at org.apache.ratis.thirdparty.io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:658)
    at org.apache.ratis.thirdparty.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:562)
    at org.apache.ratis.thirdparty.io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:998)
    at org.apache.ratis.thirdparty.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
    at org.apache.ratis.thirdparty.io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
    at java.base/java.lang.Thread.run(Thread.java:833)
org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: UNAVAILABLE: io exception
    at org.apache.ratis.thirdparty.io.grpc.stub.ClientCalls.toStatusRuntimeException(ClientCalls.java:368)
    at org.apache.ratis.thirdparty.io.grpc.stub.ClientCalls.getUnchecked(ClientCalls.java:349)
    at org.apache.ratis.thirdparty.io.grpc.stub.ClientCalls.blockingUnaryCall(ClientCalls.java:174)
    at org.apache.ratis.proto.grpc.AdminProtocolServiceGrpc$AdminProtocolServiceBlockingStub.groupInfo(AdminProtocolServiceGrpc.java:580)
    at org.apache.ratis.grpc.client.GrpcClientProtocolClient.groupInfo(GrpcClientProtocolClient.java:173)
    at org.apache.ratis.grpc.client.GrpcClientRpc.sendRequest(GrpcClientRpc.java:110)
    at org.apache.ratis.client.impl.BlockingImpl.sendRequest(BlockingImpl.java:147)
    at org.apache.ratis.client.impl.BlockingImpl.sendRequestWithRetry(BlockingImpl.java:109)
    at org.apache.ratis.client.impl.GroupManagementImpl.info(GroupManagementImpl.java:79)
    at org.apache.ratis.shell.cli.CliUtils.lambda$getGroupInfo$2(CliUtils.java:146)
    at org.apache.ratis.shell.cli.CliUtils.applyFunctionReturnFirstNonNull(CliUtils.java:72)
    at org.apache.ratis.shell.cli.CliUtils.getGroupInfo(CliUtils.java:145)
    at org.apache.ratis.shell.cli.sh.command.AbstractRatisCommand.run(AbstractRatisCommand.java:71)
    at org.apache.ratis.shell.cli.sh.group.GroupInfoCommand.run(GroupInfoCommand.java:47)
    at org.apache.ratis.shell.cli.AbstractShell.run(AbstractShell.java:104)
    at org.apache.ratis.shell.cli.sh.RatisShell.main(RatisShell.java:62)
Caused by: org.apache.ratis.thirdparty.io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: /0.0.0.0:9000
Caused by: java.net.ConnectException: Connection refused
    at java.base/sun.nio.ch.Net.pollConnect(Native Method)
    at java.base/sun.nio.ch.Net.pollConnectNow(Net.java:672)
    at java.base/sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:946)
    at org.apache.ratis.thirdparty.io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:336)
    at org.apache.ratis.thirdparty.io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:339)
    at org.apache.ratis.thirdparty.io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:784)
    at org.apache.ratis.thirdparty.io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:732)
    at org.apache.ratis.thirdparty.io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:658)
    at org.apache.ratis.thirdparty.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:562)
    at org.apache.ratis.thirdparty.io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:998)
    at org.apache.ratis.thirdparty.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
    at org.apache.ratis.thirdparty.io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
    at java.base/java.lang.Thread.run(Thread.java:833)
group id: 02511d47-d67c-49a3-9011-abb3109a44c2
leader info: ()
[server {
id: "n2"
address: "0.0.0.0:9001"
startupRole: FOLLOWER
}
commitIndex: 16
, server {
id: "n1"
address: "0.0.0.0:9000"
startupRole: FOLLOWER
}
commitIndex: 16
, server {
id: "n4"
address: "0.0.0.0:9003"
startupRole: FOLLOWER
}
commitIndex: 16
]
applied {
term: 1
index: 16
}
committed {
term: 1
index: 16
}
lastEntry {
term: 1
index: 16
}
```
Logs from n4
```
INFO [2026-01-15 17:48:06,696] [grpc-default-executor-2] [RaftServer$Division]: n4@group-ABB3109A44C2 replies to PRE_VOTE vote request: n2<-n4#0:FAIL-t1-last:(t:1, i:16). Peer's state: n4@group-ABB3109A44C2:t1, leader=n1, voted=null, raftlog=Memoized:n4@group-ABB3109A44C2-SegmentedRaftLog:OPENED:c16:last(t:1, i:16), conf=conf: {index: 15, cur=peers:[n1|0.0.0.0:9000, n2|0.0.0.0:9001, n4|0.0.0.0:9003]|listeners:[], old=null}
INFO [2026-01-15 17:48:06,897] [grpc-default-executor-2] [RaftServer$Division]: n4@group-ABB3109A44C2: receive requestVote(PRE_VOTE, n2, group-ABB3109A44C2, 1, (t:1, i:16))
INFO [2026-01-15 17:48:06,897] [grpc-default-executor-2] [VoteContext]: n4@group-ABB3109A44C2-LISTENER: reject PRE_VOTE from n2: this server is a listener, who is a non-voting member
```
Logs from n2
```
INFO [2026-01-15 17:48:03,347] [n2@group-ABB3109A44C2-LeaderElection176] [LeaderElection]: n2@group-ABB3109A44C2-LeaderElection176 PRE_VOTE round 0: submit vote requests at term 1 for conf: {index: 15, cur=peers:[n1|0.0.0.0:9000, n2|0.0.0.0:9001, n4|0.0.0.0:9003]|listeners:[], old=null}
INFO [2026-01-15 17:48:03,348] [n2@group-ABB3109A44C2-LeaderElection176] [LeaderElection]: n2@group-ABB3109A44C2-LeaderElection176 got exception when requesting votes: java.util.concurrent.ExecutionException: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: UNAVAILABLE: io exception
INFO [2026-01-15 17:48:03,352] [n2@group-ABB3109A44C2-LeaderElection176] [LeaderElection]: n2@group-ABB3109A44C2-LeaderElection176: PRE_VOTE REJECTED received 1 response(s) and 1 exception(s):
INFO [2026-01-15 17:48:03,352] [n2@group-ABB3109A44C2-LeaderElection176] [LeaderElection]: Response 0: n2<-n4#0:FAIL-t1-last:(t:1, i:16)
INFO [2026-01-15 17:48:03,352] [n2@group-ABB3109A44C2-LeaderElection176] [LeaderElection]: Exception 1: java.util.concurrent.ExecutionException: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: UNAVAILABLE: io exception
```
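This indicates that the cluster is left without a leader: n4 still rejects n2's PRE_VOTE as a listener, even though the configuration at index 15 lists n4 as a peer and no listeners.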
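To make the failure concrete, here is a minimal sketch of the vote arithmetic. It assumes standard Raft counting, where a candidate counts its own vote and needs a strict majority of the voting peers. `PreVoteTally`, `Reply`, `majority` and `wins` are hypothetical names used only for illustration; this is not Ratis code.

```java
import java.util.List;

/** Hypothetical model of the PRE_VOTE tally seen in the n2 logs; not Ratis code. */
public class PreVoteTally {
    /** Possible outcomes of a vote request sent to one other peer. */
    enum Reply { GRANTED, REJECTED, UNREACHABLE }

    /** Votes needed to win among the given number of voting peers. */
    static int majority(int votingPeers) {
        return votingPeers / 2 + 1;
    }

    /** Assumption: the candidate counts its own vote, then adds each GRANTED reply. */
    static boolean wins(int votingPeers, List<Reply> repliesFromOthers) {
        long granted = 1 + repliesFromOthers.stream()
            .filter(r -> r == Reply.GRANTED)
            .count();
        return granted >= majority(votingPeers);
    }

    public static void main(String[] args) {
        // conf at index 15 lists peers [n1, n2, n4], so there are three voting members.
        // n1 is down (UNREACHABLE) and n4 rejects because it still acts as a listener.
        System.out.println(wins(3, List.of(Reply.UNREACHABLE, Reply.REJECTED)));
        // Same cluster if n4 had granted the vote instead.
        System.out.println(wins(3, List.of(Reply.UNREACHABLE, Reply.GRANTED)));
    }
}
```

Under these assumptions n2 holds only its own vote, which is below the majority of 2, so it can never pass PRE_VOTE while n4 keeps rejecting.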
I am willing to contribute. Could you guide me on where to start?
Regards,
Snehasish
On Wed, 14 Jan 2026 at 08:44, Snehasish Roy <[email protected]>
wrote:
> Hello,
>
>
> Thank you for your inputs. I will check and update this thread.
>
>
> Regards,
> Snehasish
>
> On Wed, 7 Jan, 2026, 8:52 am Xinyu Tan, <[email protected]> wrote:
>
>> Hi, Snehasish
>>
>> In your scenario, if you kill n3, which is acting as a follower, the
>> cluster will have 3 non-listeners and 1 listener, with one follower already
>> offline. At this point, the majority situation becomes quite risky because
>> if any other non-listener goes down from here, the Raft group will not be
>> able to form a quorum and elect a new leader.
>>
>> Although you have promoted n4 to a follower and removed n3, before this
>> request completes, the majority of the Raft group is still 2. Therefore,
>> after you kill n1, a new leader cannot be elected. In my understanding,
>> this phenomenon is not a bug and aligns with the expected behavior of the
>> algorithm.
>>
>> If you want to test how to safely promote a listener to a follower, make
>> sure that until the promotion request completes (you can confirm this with
>> shell commands as suggested by sze), a majority of the current leader and
>> follower members stays online. Otherwise, the promotion will not succeed,
>> and this is not a problem with the implementation but a boundary of the
>> Raft algorithm.
>>
>> Feel free to do more testing on this feature of Ratis. If you encounter
>> either of the following issues, it would indicate that there is indeed a
>> problem with the implementation, and we welcome discussions and
>> contributions:
>> * You find that even with the majority of leader and follower members
>> online, you still cannot successfully promote a listener to a follower.
>> * In your case, because the majority was not maintained, the member
>> change failed. But after you restart n1 or n3 and re-establish the
>> majority, the Raft group still cannot elect a leader, or it elects a
>> leader but fails to perform member changes.
>>
>> We look forward to your testing.
>>
>> Best
>> --------------
>> Xinyu Tan
>>
>>
>> On 2025/12/29 10:53:40 Snehasish Roy wrote:
>> > Hello everyone,
>> >
>> > Happy Holidays. This is my first email to this community, so kindly
>> > excuse me for any mistakes.
>> >
>> > I initially started a 3-node Ratis cluster and then added a listener to
>> > the cluster using setConfiguration(List.of(n1,n2,n3), List.of(n4)),
>> > based on the following documentation:
>> > https://jojochuang.github.io/ratis-site/docs/developer-guide/listeners
>> >
>> > ```
>> > INFO [2025-12-29 15:57:01,887] [n1-server-thread1] [RaftServer$Division]: n1@group-ABB3109A44C2-LeaderStateImpl: startSetConfiguration SetConfigurationRequest:client-044D31187FB4->n1@group-ABB3109A44C2, cid=3, seq=null, RW, null, SET_UNCONDITIONALLY, servers:[n1|0.0.0.0:9000, n2|0.0.0.0:9001, n3|0.0.0.0:9002], listeners:[n4|0.0.0.0:9003]
>> > ```
>> >
>> > Then I killed one of the Ratis follower nodes (n3) and promoted the
>> > listener to a follower using the setConfiguration(List.of(n1,n2,n4))
>> > command to maintain the cluster size of 3.
>> > Please note that n3 has been removed from the list of followers, there
>> > are no more listeners in the cluster, and no failures were observed
>> > while issuing the command.
>> >
>> > ```
>> > INFO [2025-12-29 16:02:54,227] [n1-server-thread2] [RaftServer$Division]: n1@group-ABB3109A44C2-LeaderStateImpl: startSetConfiguration SetConfigurationRequest:client-2438CA24E2F3->n1@group-ABB3109A44C2, cid=4, seq=null, RW, null, SET_UNCONDITIONALLY, servers:[n1|0.0.0.0:9000, n2|0.0.0.0:9001, n4|0.0.0.0:9003], listeners:[]
>> > ```
>> >
>> > Then I killed the leader instance n1, after which n2 attempted to become
>> > the leader and started asking for votes from n1 and n4. There is no
>> > response from n1 as it's not alive, and n4 is rejecting the PRE_VOTE
>> > request from n2 because it still thinks it's a listener.
>> >
>> > Logs from n2
>> > ```
>> > INFO [2025-12-29 16:04:10,051] [n2@group-ABB3109A44C2-LeaderElection30] [LeaderElection]: n2@group-ABB3109A44C2-LeaderElection30 PRE_VOTE round 0: submit vote requests at term 1 for conf: {index: 15, cur=peers:[n1|0.0.0.0:9000, n2|0.0.0.0:9001, n4|0.0.0.0:9003]|listeners:[], old=null}
>> > INFO [2025-12-29 16:04:10,052] [n2@group-ABB3109A44C2-LeaderElection30] [LeaderElection]: n2@group-ABB3109A44C2-LeaderElection30 got exception when requesting votes: java.util.concurrent.ExecutionException: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: UNAVAILABLE: io exception
>> > INFO [2025-12-29 16:04:10,054] [n2@group-ABB3109A44C2-LeaderElection30] [LeaderElection]: n2@group-ABB3109A44C2-LeaderElection30: PRE_VOTE REJECTED received 1 response(s) and 1 exception(s):
>> > INFO [2025-12-29 16:04:10,054] [n2@group-ABB3109A44C2-LeaderElection30] [LeaderElection]: Response 0: n2<-n4#0:FAIL-t1-last:(t:1, i:16)
>> > INFO [2025-12-29 16:04:10,054] [n2@group-ABB3109A44C2-LeaderElection30] [LeaderElection]: Exception 1: java.util.concurrent.ExecutionException: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: UNAVAILABLE: io exception
>> > ```
>> >
>> >
>> > Due to the lack of a leader, the cluster is no longer stable.
>> >
>> > Logs from n4
>> > ```
>> > INFO [2025-12-29 16:05:03,405] [grpc-default-executor-2] [RaftServer$Division]: n4@group-ABB3109A44C2: receive requestVote(PRE_VOTE, n2, group-ABB3109A44C2, 1, (t:1, i:16))
>> > INFO [2025-12-29 16:05:03,405] [grpc-default-executor-2] [VoteContext]: n4@group-ABB3109A44C2-LISTENER: reject PRE_VOTE from n2: this server is a listener, who is a non-voting member
>> > INFO [2025-12-29 16:05:03,405] [grpc-default-executor-2] [RaftServer$Division]: n4@group-ABB3109A44C2 replies to PRE_VOTE vote request: n2<-n4#0:FAIL-t1-last:(t:1, i:16). Peer's state: n4@group-ABB3109A44C2:t1, leader=n1, voted=null, raftlog=Memoized:n4@group-ABB3109A44C2-SegmentedRaftLog:OPENED:c16:last(t:1, i:16), conf=conf: {index: 15, cur=peers:[n1|0.0.0.0:9000, n2|0.0.0.0:9001, n4|0.0.0.0:9003]|listeners:[], old=null}
>> > ```
>> >
>> > So my question is: how do I correctly promote a listener to a follower?
>> > Did I miss some step? Or is there a bug in the code? If it's the latter,
>> > I would be happy to contribute. Please let me know if you need any more
>> > debugging information.
>> >
>> > Thank you again for looking into this issue.
>> >
>> >
>> > Regards,
>> > Snehasish
>> >
>>
>