Hi, Xiangdong and Xinyu,

The PR https://github.com/apache/iotdb/pull/3797 for JIRA https://issues.apache.org/jira/browse/IOTDB-1564 is ready for review. Please give some suggestions on the code.

Thanks.

-----Original Message-----
From: Xiangdong Huang <saint...@gmail.com>
Sent: August 25, 2021 12:02
To: dev <dev@iotdb.apache.org>
Subject: Re: Re: Conclusion about JIRA issue[IOTDB-1564]: Make leader failure detection and election faster

Hi,

The current code is:

```
long electionWait =
    ClusterConstant.getElectionLeastTimeOutMs()
        + Math.abs(random.nextLong() % ClusterConstant.getElectionRandomTimeOutMs());
```

where the comment says: electionLeastTimeOutMs should be at least as long as a heartbeat.

IMO, these two parameters are enough, and we do not need to add more parameters. But the default values can be changed:

1. electionLeastTimeOutMs can be heartbeat * 2 or something similar, rather than 2 seconds by default.
2. By default, electionRandomTimeOutMs can be 50 ms, or something like heartbeat / 10?

Best,
-----------------------------------
Xiangdong Huang
School of Software, Tsinghua University

黄向东 清华大学 软件学院

On Monday, August 23, 2021 at 10:18 AM, Eric Pai <ericpa...@hotmail.com> wrote:
>
> Hi, Xiangdong,
>
> So what are your suggestions about the election waiting time? Add another
> configuration parameter called election_wait_time_ms, or leave it as a
> shorter hardcoded constant?
>
> From: Eric Pai <ericpa...@hotmail.com>
> Date: Saturday, August 21, 2021 at 7:32 PM
> To: "dev@iotdb.apache.org" <dev@iotdb.apache.org>
> Subject: Re: Conclusion about JIRA issue[IOTDB-1564]: Make leader failure
> detection and election faster
>
> Hi, all,
>
> Now the randomElectionWait time is hardcoded as 3-5 s, which is not suitable
> when heartbeat_interval_ms and election_timeout_ms are small.
>
> I have decided to change it to [2 * heartbeat_interval_ms,
> 2 * heartbeat_interval_ms + 50 ms).
>
> The 50 ms is taken from the Raft paper; it gives a low probability of split
> votes and a fast election when split votes do happen.
>
> But I haven't found any detailed description of the relationship between
> heartbeat_interval_ms and the least waiting time.
>
> Any good suggestions?
>
> From: 白 渐
> Sent: August 18, 2021 22:14
> To: dev@iotdb.apache.org
> Subject: Conclusion about JIRA issue[IOTDB-1564]: Make leader failure
> detection and election faster
>
> Hi, all,
>
> @Xinyu Tan and I have reached a conclusion about refining the heartbeat and
> election related timeout parameters:
>
> JIRA link: https://issues.apache.org/jira/browse/IOTDB-1564
>
> Two parameters are added:
>
> heartbeat_interval_ms (t1): The time interval (ms) between two rounds of
> heartbeat broadcast by a raft group leader.
>
> election_timeout_ms (t2 and t3): The election timeout of candidates and
> followers, also used as the time to wait for a voting result.
>
>                        t1             t1
> Leader view:   Send HB - - -> Send HB - - -> Send HB
>
>                                              t2
> Follower view: Receive HB - - -> Receive HB - - - - -> HB expired /
>                               t3
>                Start election - - - - -> Election Timeout
>
> I will do the following work in due course:
>
> 1. Coding.
>
> 2. Proper test cases.
>
> 3. Docs about new parameters.
>
> Thanks.
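
For reference, the wait range discussed in this thread, [2 * heartbeat_interval_ms, 2 * heartbeat_interval_ms + 50 ms), could be sketched in Java roughly as below. This is only an illustration and not the code in the PR: the class name ElectionWaitSketch, the method electionWaitMs, and the 1000 ms heartbeat value are assumptions made for the example. It uses Math.floorMod instead of Math.abs(... % ...), which is one possible way to keep the random offset non-negative.

```java
import java.util.Random;

public class ElectionWaitSketch {
  // Hypothetical heartbeat interval, chosen only for this example.
  static final long HEARTBEAT_INTERVAL_MS = 1000L;
  // Random spread of 50 ms, as suggested in the thread.
  static final long ELECTION_RANDOM_SPREAD_MS = 50L;

  /** Returns a wait time in [2 * heartbeatIntervalMs, 2 * heartbeatIntervalMs + spreadMs). */
  static long electionWaitMs(Random random, long heartbeatIntervalMs, long spreadMs) {
    long least = 2 * heartbeatIntervalMs;
    // floorMod returns a value in [0, spreadMs) when spreadMs is positive,
    // even if nextLong() is negative.
    return least + Math.floorMod(random.nextLong(), spreadMs);
  }

  public static void main(String[] args) {
    Random random = new Random();
    // Print a few sample wait times to show the spread between nodes.
    for (int i = 0; i < 5; i++) {
      System.out.println(
          electionWaitMs(random, HEARTBEAT_INTERVAL_MS, ELECTION_RANDOM_SPREAD_MS));
    }
  }
}
```

With these example values every node waits between 2000 ms and 2049 ms before starting an election, so the least wait scales with the heartbeat interval while the random part stays small.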