Hi Eric,

I just read the HeartbeatThread.java file. A follower waits up to ClusterConstant.getConnectionTimeoutInMS() for a new heartbeat. If none arrives in that time, it waits a random interval and then starts its election.
The leader sends a heartbeat every ClusterConstant.getHeartBeatIntervalMs(). It seems that the follower does not know the heartbeat interval...

Best,
-----------------------------------
Xiangdong Huang
School of Software, Tsinghua University

黄向东 清华大学 软件学院

Eric Pai <[email protected]> wrote on Sat, Aug 21, 2021 at 7:33 PM:

> Hi, all,
>
> Now the randomElectionWait time is hard-coded as 3-5 s, which is not
> suitable when heartbeat_interval_ms and election_timeout_ms are very small.
>
> I have decided to change it to
> [2 * heartbeat_interval_ms, 2 * heartbeat_interval_ms + 50 ms).
>
> The 50 ms is taken from the Raft paper, which gives a low probability of
> split votes and a fast election when they do happen.
>
> But I haven't found any detailed description of the relationship between
> heartbeat_interval_ms and the minimum waiting time.
>
> Any good suggestions?
>
> From: 白 渐
> Sent: August 18, 2021 22:14
> To: [email protected]
> Subject: Conclusion about JIRA issue[IOTDB-1564]: Make leader failure
> detection and election faster
>
> Hi, all,
>
> @Xinyu Tan and I have reached a conclusion about refining the heartbeat
> and election related timeout parameters:
>
> JIRA link: https://issues.apache.org/jira/browse/IOTDB-1564
>
> Two parameters are added:
>
> heartbeat_interval_ms (t1): The time interval (ms) between two rounds of
> heartbeat broadcast by the leader of one raft group.
>
> election_timeout_ms (t2 and t3): The election timeout of candidates and
> followers, also used as the time to wait for the voting result.
>
>                         t1             t1
> Leader view:    Send HB - - -> Send HB - - -> Send HB
>
>                                          t2                               t3
> Follower view:  Receive HB - - -> Receive HB - - - - -> HB expired /      - - - - -> Election Timeout
>                                                         Start election
>
> I will do the following work in due course:
>
> 1. Coding.
>
> 2. Proper test cases.
>
> 3. Docs about the new parameters.
>
> Thanks.
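
P.S. To make the proposed window concrete, here is a minimal sketch of how a wait in [2 * heartbeat_interval_ms, 2 * heartbeat_interval_ms + 50 ms) could be drawn. It is only an illustration: the class ElectionWait, the method randomElectionWaitMs and the constant JITTER_MS are hypothetical names and are not taken from the IoTDB code base. Only the interval itself comes from the proposal quoted above.

```java
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical helper, not part of IoTDB: illustrates the proposed wait window.
public class ElectionWait {
  // Width of the random window in milliseconds (the 50 ms from the proposal).
  static final long JITTER_MS = 50;

  // Returns a wait in [2 * heartbeatIntervalMs, 2 * heartbeatIntervalMs + JITTER_MS).
  static long randomElectionWaitMs(long heartbeatIntervalMs) {
    long base = 2 * heartbeatIntervalMs;
    // nextLong(bound) returns a value in [0, bound), so the upper end is exclusive.
    return base + ThreadLocalRandom.current().nextLong(JITTER_MS);
  }

  public static void main(String[] args) {
    // Example with an assumed heartbeat interval of 1000 ms.
    long waitMs = randomElectionWaitMs(1000);
    System.out.println("random election wait (ms): " + waitMs);
  }
}
```

With this shape the lower bound scales with the heartbeat interval, while the spread between followers stays fixed at 50 ms regardless of how small the interval is configured.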
