[ 
https://issues.apache.org/jira/browse/RATIS-178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16301135#comment-16301135
 ] 

Tsz Wo Nicholas Sze commented on RATIS-178:
-------------------------------------------

{code}
  @Test
  public void testLateServerStart() throws Exception {
    final int numServer = 3;
    LOG.info("Running testLateServerStart");
    final MiniRaftCluster cluster = newCluster(numServer);
    cluster.initServers();

    // start all servers except one
    final Iterator<RaftServerProxy> i = cluster.getServers().iterator();
    for(int j = 1; j < numServer; j++) {
      i.next().start();
    }

    final RaftServerImpl leader = waitForLeader(cluster);
    TimeUnit.SECONDS.sleep(10);

    // start the last server
    final RaftServerProxy lastServer = i.next();
    lastServer.start();
    final RaftPeerId lastServerLeaderId = JavaUtils.attempt(
        () -> getLeader(lastServer.getImpl().getState()),
        10, 1000, "getLeaderId", LOG);
    Assert.assertEquals(leader.getId(), lastServerLeaderId);
  }

  static RaftPeerId getLeader(ServerState state) {
    final RaftPeerId leader = state.getLeaderId();
    if (leader == null) {
      throw new IllegalStateException("No leader yet");
    }
    return leader;
  }
{code}
The test above can reproduce the bug.  The last server, s2, may not be able to 
join the group: s2 keeps starting leader elections, but s0 and s1 keep 
withholding their votes.
{code}
2017-12-22 16:48:13,621 INFO  impl.RaftServerImpl 
(RaftServerImpl.java:requestVote(622)) - s0 Withhold vote from server s2 with 
term 1. This server:  LEADER group-C2DF75108086 s0:t2, leader=s0, voted=s0, 
raftlog=[(t:2, i:0)], conf=[s0:0.0.0.0:55968, s1:0.0.0.0:55969, 
s2:0.0.0.0:55970], old=null RUNNING, last rpc time from leader s0 is -1
2017-12-22 16:48:13,621 INFO  impl.RaftServerImpl 
(RaftServerImpl.java:requestVote(622)) - s1 Withhold vote from server s2 with 
term 1. This server:FOLLOWER group-C2DF75108086 s1:t2, leader=s0, voted=s0, 
raftlog=[(t:2, i:0)], conf=[s0:0.0.0.0:55968, s1:0.0.0.0:55969, 
s2:0.0.0.0:55970], old=null RUNNING, last rpc time from leader s0 is 11137ms
{code}
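
For context, the log lines above are consistent with the generic Raft rule that 
a server refuses a vote request whose term is older than its own (s2 asks with 
term 1 while s0 and s1 are at t2).  Below is a minimal Java sketch of that rule 
only.  It is not the Ratis code in RaftServerImpl.requestVote, and the names 
VoteSketch, VoteReply and handleRequestVote are hypothetical.

```java
// Hypothetical illustration of the generic Raft term check; not Ratis code.
public class VoteSketch {
  /** Reply to a vote request: whether the vote is granted, and the replier's term. */
  static final class VoteReply {
    final boolean granted;
    final long term;

    VoteReply(boolean granted, long term) {
      this.granted = granted;
      this.term = term;
    }
  }

  /**
   * Decide on a vote request using only the term comparison.
   * A real server would also check whether it has already voted in this term
   * and whether the candidate's log is up to date; those checks are omitted.
   */
  static VoteReply handleRequestVote(long candidateTerm, long currentTerm) {
    if (candidateTerm < currentTerm) {
      // Stale candidate: withhold the vote and report the newer term.
      return new VoteReply(false, currentTerm);
    }
    // Simplification (assumption): grant whenever the term is not stale.
    return new VoteReply(true, candidateTerm);
  }

  public static void main(String[] args) {
    // Mirrors the log: s2 requests a vote with term 1 from a server at term 2.
    final VoteReply reply = handleRequestVote(1, 2);
    System.out.println("granted=" + reply.granted + ", term=" + reply.term);
    // prints "granted=false, term=2"
  }
}
```

In generic Raft, the term carried in the reply is what lets a stale candidate 
catch up.  Whether that step is what goes wrong here is not established by the 
logs above.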


> The third server cannot join the raft group
> -------------------------------------------
>
>                 Key: RATIS-178
>                 URL: https://issues.apache.org/jira/browse/RATIS-178
>             Project: Ratis
>          Issue Type: Bug
>            Reporter: Tsz Wo Nicholas Sze
>
> When two servers start in a 3-server group, they may elect a leader and then 
> start the service.  Then, start the third server.  It somehow fails to join 
> the group.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
