[jira] [Updated] (RATIS-857) Thread unsafe RaftServerMetrics::metricsMap HashMap in multi thread
[ https://issues.apache.org/jira/browse/RATIS-857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

runzhiwang updated RATIS-857:
-----------------------------
    Attachment:     (was: RATIS-881.001.patch)

> Thread unsafe RaftServerMetrics::metricsMap HashMap in multi thread
> -------------------------------------------------------------------
>
> Key: RATIS-857
> URL: https://issues.apache.org/jira/browse/RATIS-857
> Project: Ratis
> Issue Type: Bug
> Components: metrics
> Reporter: runzhiwang
> Assignee: runzhiwang
> Priority: Major
> Fix For: 0.6.0
>
> Attachments: RATIS-857.001.patch
>
>
> *What's the problem?*
> The {color:#DE350B}static{color} variable [RaftServerMetrics::metricsMap|https://github.com/apache/incubator-ratis/blob/master/ratis-server/src/main/java/org/apache/ratis/server/impl/RaftServerMetrics.java#L71] is of type HashMap, which is not thread-safe.
> However, entries are [put|https://github.com/apache/incubator-ratis/blob/master/ratis-server/src/main/java/org/apache/ratis/server/impl/RaftServerMetrics.java#L76] into metricsMap by different threads, because each RaftServerImpl instance adds its own entry when it is created.

--
This message was sent by Atlassian Jira (v8.3.4#803005)
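One common way to make a shared static registry safe for concurrent registration is to back it with ConcurrentHashMap and create entries through computeIfAbsent. The sketch below only illustrates that general technique; it is not the attached patch, and the class ServerMetricsSketch, its methods, and the "server-N" ids are hypothetical names invented for the example.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Hypothetical stand-in for a per-server metrics object; not the Ratis class. */
final class ServerMetricsSketch {
  // ConcurrentHashMap tolerates concurrent put/get; a plain HashMap does not.
  private static final Map<String, ServerMetricsSketch> METRICS_MAP = new ConcurrentHashMap<>();

  private final String serverId;

  private ServerMetricsSketch(String serverId) {
    this.serverId = serverId;
  }

  /** Returns the single instance for serverId, creating it atomically if absent. */
  static ServerMetricsSketch getOrCreate(String serverId) {
    return METRICS_MAP.computeIfAbsent(serverId, ServerMetricsSketch::new);
  }

  static int registeredCount() {
    return METRICS_MAP.size();
  }

  String getServerId() {
    return serverId;
  }

  /** Registers the same `servers` ids from `threads` threads at once; returns the map size. */
  static int registerConcurrently(int servers, int threads) {
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    CountDownLatch start = new CountDownLatch(1);
    for (int t = 0; t < threads; t++) {
      pool.execute(() -> {
        try {
          start.await(); // release all workers together to maximize contention
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
          return;
        }
        for (int i = 0; i < servers; i++) {
          getOrCreate("server-" + i);
        }
      });
    }
    start.countDown();
    pool.shutdown();
    try {
      if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
        throw new IllegalStateException("workers did not finish in time");
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
    return registeredCount();
  }

  public static void main(String[] args) {
    System.out.println(registerConcurrently(100, 8));
  }
}
```

With computeIfAbsent, every thread that asks for the same id gets the same instance, so the map holds exactly one entry per id regardless of how many threads register it.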
[jira] [Commented] (RATIS-883) Failed UT: testStateMachineMetrics.checkFollowerCommitLagsLeader
[ https://issues.apache.org/jira/browse/RATIS-883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17092523#comment-17092523 ]

runzhiwang commented on RATIS-883:
----------------------------------

[~shashikant] Could you help review this patch? Thank you very much.

> Failed UT: testStateMachineMetrics.checkFollowerCommitLagsLeader
> ----------------------------------------------------------------
>
> Key: RATIS-883
> URL: https://issues.apache.org/jira/browse/RATIS-883
> Project: Ratis
> Issue Type: Bug
> Reporter: runzhiwang
> Assignee: runzhiwang
> Priority: Major
> Attachments: RATIS-883.001.patch, screenshot-1.png
>
>
> *What's the problem?*
> !screenshot-1.png!
> *What's the reason?*
> The reason is that the follower updates commitInfoCache after the leader does.
> The call stack through which the follower updates commitInfoCache is:
> RaftServerImpl::appendEntriesAsync
> -> state.updateStateMachine
> -> StateMachineUpdater::applyLog
> -> RaftServerImpl::applyLogToStateMachine
> -> RaftServerImpl::replyPendingRequest
> -> RaftServerImpl::getCommitInfos
> -> infos.add(commitInfoCache.update(getPeer(), state.getLog().getLastCommittedIndex()))
> -> CommitInfoCache::update.
> The call stack through which the leader updates commitInfoCache is:
> the follower finishes RaftServerImpl::appendEntriesAsync and returns the reply
> -> GrpcLogAppender::runAppenderImpl
> -> GrpcLogAppender::appendLog
> -> LogAppender::createRequest
> -> LeaderState::newAppendEntriesRequestProto
> -> RaftServerImpl::getCommitInfos
> -> LeaderState::updateFollowerCommitInfos
> -> CommitInfoCache::update.
> Because the follower has to notify the StateMachineUpdater thread to update CommitInfoCache, we cannot ensure that the follower updates CommitInfoCache before the leader does.
> *How to fix?*
> The follower updates CommitInfoCache before returning the reply to the leader.

--
This message was sent by Atlassian Jira (v8.3.4#803005)
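The ordering proposed under "How to fix" can be illustrated with a small, self-contained sketch. It is not the attached patch and does not use Ratis classes: CommitInfoCacheSketch, FollowerSketch, and LeaderSketch are hypothetical names, and the cache is assumed to keep only the largest commit index seen per peer. The point it shows is that when the follower writes its own cache on the request-handling thread before completing the reply, the leader's view of that follower can never be ahead of the follower's own view.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Hypothetical, simplified cache: remembers the largest commit index seen per peer. */
final class CommitInfoCacheSketch {
  private final ConcurrentMap<String, Long> map = new ConcurrentHashMap<>();

  /** Atomically stores max(cached, newIndex) for the peer and returns the stored value. */
  long update(String peerId, long newIndex) {
    return map.merge(peerId, newIndex, Long::max);
  }

  /** Returns the cached index, or -1 if the peer has never been recorded. */
  long get(String peerId) {
    return map.getOrDefault(peerId, -1L);
  }
}

final class FollowerSketch {
  final CommitInfoCacheSketch cache = new CommitInfoCacheSketch();
  private final String id;

  FollowerSketch(String id) {
    this.id = id;
  }

  /** Handles one append request; the cache is written BEFORE the reply is completed. */
  CompletableFuture<Long> appendEntries(long commitIndex) {
    cache.update(id, commitIndex); // synchronous, no hand-off to a background updater thread
    return CompletableFuture.completedFuture(commitIndex);
  }
}

final class LeaderSketch {
  final CommitInfoCacheSketch cache = new CommitInfoCacheSketch();

  /** Sends one append and records the follower's commit index once the reply has arrived. */
  void replicate(FollowerSketch follower, String followerId, long commitIndex) {
    follower.appendEntries(commitIndex)
        .thenAccept(replied -> cache.update(followerId, replied))
        .join();
  }

  public static void main(String[] args) {
    FollowerSketch follower = new FollowerSketch("s1");
    LeaderSketch leader = new LeaderSketch();
    leader.replicate(follower, "s1", 5L);
    System.out.println(follower.cache.get("s1") + " " + leader.cache.get("s1"));
  }
}
```

If the follower instead handed the cache update to another thread and replied immediately, the leader's thenAccept callback could run first, which is the interleaving the issue describes.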
[jira] [Updated] (RATIS-883) Failed UT: testStateMachineMetrics.checkFollowerCommitLagsLeader
[ https://issues.apache.org/jira/browse/RATIS-883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] runzhiwang updated RATIS-883: - Attachment: RATIS-883.001.patch > Failed UT: testStateMachineMetrics.checkFollowerCommitLagsLeader > > > Key: RATIS-883 > URL: https://issues.apache.org/jira/browse/RATIS-883 > Project: Ratis > Issue Type: Bug >Reporter: runzhiwang >Assignee: runzhiwang >Priority: Major > Attachments: RATIS-883.001.patch, screenshot-1.png > > > *What's the problem ?* > !screenshot-1.png! > *What's the reason ?* > The reason is follower update commitInfoCache after leader. > The stack of follower update commitInfoCache is: > RaftServerImpl::appendEntriesAsync > -> state.updateStateMachine > -> StateMachineUpdater::applyLog > -> RaftServerImpl::applyLogToStateMachine > -> RaftServerImpl::replyPendingRequest > -> RaftServerImpl::getCommitInfos > -> infos.add(commitInfoCache.update(getPeer(), > state.getLog().getLastCommittedIndex())) > -> CommitInfoCache::update. > The stack of leader update commitInfoCache is: > follower finish RaftServerImpl::appendEntriesAsync and return reply > -> GrpcLogAppender::runAppenderImpl > -> GrpcLogAppender::appendLog > ->LogAppender::createRequest > ->LeaderState::newAppendEntriesRequestProto > ->RaftServerImpl::getCommitInfos > ->LeaderState::updateFollowerCommitInfos > ->CommitInfoCache::update. > Because follower need to notify thread StateMachineUpdater to update > CommitInfoCache, we can not ensure follower update CommitInfoCache before > leader. > *How to fix ?* > Follower update CommitInfoCache before return reply to leader. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (RATIS-883) Failed UT: testStateMachineMetrics.checkFollowerCommitLagsLeader
[ https://issues.apache.org/jira/browse/RATIS-883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] runzhiwang updated RATIS-883: - Description: *What's the problem ?* !screenshot-1.png! *What's the reason ?* The reason is follower update commitInfoCache after leader. The stack of follower update commitInfoCache is: RaftServerImpl::appendEntriesAsync -> state.updateStateMachine -> StateMachineUpdater::applyLog -> RaftServerImpl::applyLogToStateMachine -> RaftServerImpl::replyPendingRequest -> RaftServerImpl::getCommitInfos -> infos.add(commitInfoCache.update(getPeer(), state.getLog().getLastCommittedIndex())) -> CommitInfoCache::update. The stack of leader update commitInfoCache is: follower finish RaftServerImpl::appendEntriesAsync and return reply -> GrpcLogAppender::runAppenderImpl -> GrpcLogAppender::appendLog ->LogAppender::createRequest ->LeaderState::newAppendEntriesRequestProto ->RaftServerImpl::getCommitInfos ->LeaderState::updateFollowerCommitInfos ->CommitInfoCache::update. Because follower need to notify thread StateMachineUpdater to update CommitInfoCache, we can not ensure follower update CommitInfoCache before leader. *How to fix ?* Follower update CommitInfoCache before return reply to leader. was: *What's the problem ?* !screenshot-1.png! *What's the reason ?* The reason is follower update commitInfoCache after leader. The stack of follower update commitInfoCache is: RaftServerImpl::appendEntriesAsync -> state.updateStateMachine -> StateMachineUpdater::applyLog -> RaftServerImpl::applyLogToStateMachine -> RaftServerImpl::replyPendingRequest -> RaftServerImpl::getCommitInfos -> infos.add(commitInfoCache.update(getPeer(), state.getLog().getLastCommittedIndex())) -> CommitInfoCache::update. 
The stack of leader update commitInfoCache is: follower finish RaftServerImpl::appendEntriesAsync and return reply -> GrpcLogAppender::runAppenderImpl -> GrpcLogAppender::appendLog ->LogAppender::createRequest ->LeaderState::newAppendEntriesRequestProto ->RaftServerImpl::getCommitInfos ->LeaderState::updateFollowerCommitInfos ->CommitInfoCache::update. Because follower need to notify thread StateMachineUpdater to update CommitInfoCache, we can not ensure follower update CommitInfoCache before leader. > Failed UT: testStateMachineMetrics.checkFollowerCommitLagsLeader > > > Key: RATIS-883 > URL: https://issues.apache.org/jira/browse/RATIS-883 > Project: Ratis > Issue Type: Bug >Reporter: runzhiwang >Assignee: runzhiwang >Priority: Major > Attachments: screenshot-1.png > > > *What's the problem ?* > !screenshot-1.png! > *What's the reason ?* > The reason is follower update commitInfoCache after leader. > The stack of follower update commitInfoCache is: > RaftServerImpl::appendEntriesAsync > -> state.updateStateMachine > -> StateMachineUpdater::applyLog > -> RaftServerImpl::applyLogToStateMachine > -> RaftServerImpl::replyPendingRequest > -> RaftServerImpl::getCommitInfos > -> infos.add(commitInfoCache.update(getPeer(), > state.getLog().getLastCommittedIndex())) > -> CommitInfoCache::update. > The stack of leader update commitInfoCache is: > follower finish RaftServerImpl::appendEntriesAsync and return reply > -> GrpcLogAppender::runAppenderImpl > -> GrpcLogAppender::appendLog > ->LogAppender::createRequest > ->LeaderState::newAppendEntriesRequestProto > ->RaftServerImpl::getCommitInfos > ->LeaderState::updateFollowerCommitInfos > ->CommitInfoCache::update. > Because follower need to notify thread StateMachineUpdater to update > CommitInfoCache, we can not ensure follower update CommitInfoCache before > leader. > *How to fix ?* > Follower update CommitInfoCache before return reply to leader. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (RATIS-883) Failed UT: testStateMachineMetrics.checkFollowerCommitLagsLeader
[ https://issues.apache.org/jira/browse/RATIS-883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] runzhiwang updated RATIS-883: - Description: *What's the problem ?* !screenshot-1.png! *What's the reason ?* The reason is follower update commitInfoCache after leader. The stack of follower update commitInfoCache is: RaftServerImpl::appendEntriesAsync -> state.updateStateMachine -> StateMachineUpdater::applyLog -> RaftServerImpl::applyLogToStateMachine -> RaftServerImpl::replyPendingRequest -> RaftServerImpl::getCommitInfos -> infos.add(commitInfoCache.update(getPeer(), state.getLog().getLastCommittedIndex())) -> CommitInfoCache::update. The stack of leader update commitInfoCache is: follower finish RaftServerImpl::appendEntriesAsync and return reply -> GrpcLogAppender::runAppenderImpl -> GrpcLogAppender::appendLog ->LogAppender::createRequest ->LeaderState::newAppendEntriesRequestProto ->RaftServerImpl::getCommitInfos ->LeaderState::updateFollowerCommitInfos ->CommitInfoCache::update. Because follower need to notify thread StateMachineUpdater to update CommitInfoCache, we can not ensure follower update CommitInfoCache before leader. was: *What's the problem ?* !screenshot-1.png! *What's the reason ?* The reason is follower update commitInfoCache after leader. The stack of follower update commitInfoCache is: RaftServerImpl::appendEntriesAsync -> state.updateStateMachine -> StateMachineUpdater::applyLog -> RaftServerImpl::applyLogToStateMachine -> RaftServerImpl::replyPendingRequest -> RaftServerImpl::getCommitInfos -> infos.add(commitInfoCache.update(getPeer(), state.getLog().getLastCommittedIndex())) -> CommitInfoCache::update. 
The stack of leader update commitInfoCache is: follower finish RaftServerImpl::appendEntriesAsync and return reply -> GrpcLogAppender::runAppenderImpl -> GrpcLogAppender::appendLog ->LogAppender::createRequest ->LeaderState::newAppendEntriesRequestProto ->RaftServerImpl::getCommitInfos ->LeaderState::updateFollowerCommitInfos ->CommitInfoCache::update. > Failed UT: testStateMachineMetrics.checkFollowerCommitLagsLeader > > > Key: RATIS-883 > URL: https://issues.apache.org/jira/browse/RATIS-883 > Project: Ratis > Issue Type: Bug >Reporter: runzhiwang >Assignee: runzhiwang >Priority: Major > Attachments: screenshot-1.png > > > *What's the problem ?* > !screenshot-1.png! > *What's the reason ?* > The reason is follower update commitInfoCache after leader. > The stack of follower update commitInfoCache is: > RaftServerImpl::appendEntriesAsync > -> state.updateStateMachine > -> StateMachineUpdater::applyLog > -> RaftServerImpl::applyLogToStateMachine > -> RaftServerImpl::replyPendingRequest > -> RaftServerImpl::getCommitInfos > -> infos.add(commitInfoCache.update(getPeer(), > state.getLog().getLastCommittedIndex())) > -> CommitInfoCache::update. > The stack of leader update commitInfoCache is: > follower finish RaftServerImpl::appendEntriesAsync and return reply > -> GrpcLogAppender::runAppenderImpl > -> GrpcLogAppender::appendLog > ->LogAppender::createRequest > ->LeaderState::newAppendEntriesRequestProto > ->RaftServerImpl::getCommitInfos > ->LeaderState::updateFollowerCommitInfos > ->CommitInfoCache::update. > Because follower need to notify thread StateMachineUpdater to update > CommitInfoCache, we can not ensure follower update CommitInfoCache before > leader. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (RATIS-883) Failed UT: testStateMachineMetrics.checkFollowerCommitLagsLeader
[ https://issues.apache.org/jira/browse/RATIS-883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] runzhiwang updated RATIS-883: - Description: *What's the problem ?* !screenshot-1.png! *What's the reason ?* The reason is follower update commitInfoCache after leader. The stack of follower update commitInfoCache is: RaftServerImpl::appendEntriesAsync -> state.updateStateMachine -> StateMachineUpdater::applyLog -> RaftServerImpl::applyLogToStateMachine -> RaftServerImpl::replyPendingRequest -> RaftServerImpl::getCommitInfos -> infos.add(commitInfoCache.update(getPeer(), state.getLog().getLastCommittedIndex())) -> CommitInfoCache::update. The stack of leader update commitInfoCache is: follower finish RaftServerImpl::appendEntriesAsync and return reply -> GrpcLogAppender::runAppenderImpl -> GrpcLogAppender::appendLog ->LogAppender::createRequest ->LeaderState::newAppendEntriesRequestProto ->RaftServerImpl::getCommitInfos ->LeaderState::updateFollowerCommitInfos ->CommitInfoCache::update. was: *What's the problem ?* !screenshot-1.png! *What's the reason ?* The reason is follower update commitInfoCache after leader. The stack of follower update commitInfoCache is: RaftServerImpl::appendEntriesAsync -> state.updateStateMachine -> StateMachineUpdater::applyLog -> RaftServerImpl::applyLogToStateMachine -> RaftServerImpl::replyPendingRequest -> RaftServerImpl::getCommitInfos -> infos.add(commitInfoCache.update(getPeer(), state.getLog().getLastCommittedIndex())) -> CommitInfoCache::update. The stack of leader update commitInfoCache is: follower finish RaftServerImpl::appendEntriesAsync and return reply -> GrpcLogAppender::runAppenderImpl -> GrpcLogAppender::appendLog ->LogAppender::createRequest ->LeaderState::newAppendEntriesRequestProto ->RaftServerImpl::getCommitInfos ->LeaderState::updateFollowerCommitInfos- >CommitInfoCache::update. 
> Failed UT: testStateMachineMetrics.checkFollowerCommitLagsLeader > > > Key: RATIS-883 > URL: https://issues.apache.org/jira/browse/RATIS-883 > Project: Ratis > Issue Type: Bug >Reporter: runzhiwang >Assignee: runzhiwang >Priority: Major > Attachments: screenshot-1.png > > > *What's the problem ?* > !screenshot-1.png! > *What's the reason ?* > The reason is follower update commitInfoCache after leader. > The stack of follower update commitInfoCache is: > RaftServerImpl::appendEntriesAsync > -> state.updateStateMachine > -> StateMachineUpdater::applyLog > -> RaftServerImpl::applyLogToStateMachine > -> RaftServerImpl::replyPendingRequest > -> RaftServerImpl::getCommitInfos > -> infos.add(commitInfoCache.update(getPeer(), > state.getLog().getLastCommittedIndex())) > -> CommitInfoCache::update. > The stack of leader update commitInfoCache is: > follower finish RaftServerImpl::appendEntriesAsync and return reply > -> GrpcLogAppender::runAppenderImpl > -> GrpcLogAppender::appendLog > ->LogAppender::createRequest > ->LeaderState::newAppendEntriesRequestProto > ->RaftServerImpl::getCommitInfos > ->LeaderState::updateFollowerCommitInfos > ->CommitInfoCache::update. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (RATIS-883) Failed UT: testStateMachineMetrics.checkFollowerCommitLagsLeader
[ https://issues.apache.org/jira/browse/RATIS-883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] runzhiwang updated RATIS-883: - Description: *What's the problem ?* !screenshot-1.png! *What's the reason ?* The reason is follower update commitInfoCache after leader. The stack of follower update commitInfoCache is: RaftServerImpl::appendEntriesAsync -> state.updateStateMachine -> StateMachineUpdater::applyLog -> RaftServerImpl::applyLogToStateMachine -> RaftServerImpl::replyPendingRequest -> RaftServerImpl::getCommitInfos -> infos.add(commitInfoCache.update(getPeer(), state.getLog().getLastCommittedIndex())) -> CommitInfoCache::update. The stack of leader update commitInfoCache is: follower finish RaftServerImpl::appendEntriesAsync and return reply -> GrpcLogAppender::runAppenderImpl -> GrpcLogAppender::appendLog ->LogAppender::createRequest ->LeaderState::newAppendEntriesRequestProto ->RaftServerImpl::getCommitInfos ->LeaderState::updateFollowerCommitInfos- >CommitInfoCache::update. was: *What's the problem ?* !screenshot-1.png! *What's the reason ?* The reason is follower update commitInfoCache after leader. The stack of follower update commitInfoCache is: RaftServerImpl::appendEntriesAsync-> state.updateStateMachine -> StateMachineUpdater::applyLog -> RaftServerImpl::applyLogToStateMachine -> RaftServerImpl::replyPendingRequest -> RaftServerImpl::getCommitInfos -> infos.add(commitInfoCache.update(getPeer(), state.getLog().getLastCommittedIndex())) -> CommitInfoCache::update. The stack of leader update commitInfoCache is: follower finish RaftServerImpl::appendEntriesAsync and return reply-> GrpcLogAppender::runAppenderImpl -> GrpcLogAppender::appendLog ->LogAppender::createRequest ->LeaderState::newAppendEntriesRequestProto ->RaftServerImpl::getCommitInfos ->LeaderState::updateFollowerCommitInfos->CommitInfoCache::update. 
> Failed UT: testStateMachineMetrics.checkFollowerCommitLagsLeader > > > Key: RATIS-883 > URL: https://issues.apache.org/jira/browse/RATIS-883 > Project: Ratis > Issue Type: Bug >Reporter: runzhiwang >Assignee: runzhiwang >Priority: Major > Attachments: screenshot-1.png > > > *What's the problem ?* > !screenshot-1.png! > *What's the reason ?* > The reason is follower update commitInfoCache after leader. > The stack of follower update commitInfoCache is: > RaftServerImpl::appendEntriesAsync > -> state.updateStateMachine > -> StateMachineUpdater::applyLog > -> RaftServerImpl::applyLogToStateMachine > -> RaftServerImpl::replyPendingRequest > -> RaftServerImpl::getCommitInfos > -> infos.add(commitInfoCache.update(getPeer(), > state.getLog().getLastCommittedIndex())) > -> CommitInfoCache::update. > The stack of leader update commitInfoCache is: > follower finish RaftServerImpl::appendEntriesAsync and return reply > -> GrpcLogAppender::runAppenderImpl > -> GrpcLogAppender::appendLog > ->LogAppender::createRequest > ->LeaderState::newAppendEntriesRequestProto > ->RaftServerImpl::getCommitInfos > ->LeaderState::updateFollowerCommitInfos- > >CommitInfoCache::update. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (RATIS-883) Failed UT: testStateMachineMetrics.checkFollowerCommitLagsLeader
[ https://issues.apache.org/jira/browse/RATIS-883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] runzhiwang updated RATIS-883: - Description: *What's the problem ?* !screenshot-1.png! *What's the reason ?* The reason is follower update commitInfoCache after leader. The stack of follower update commitInfoCache is: RaftServerImpl::appendEntriesAsync-> state.updateStateMachine -> StateMachineUpdater::applyLog -> RaftServerImpl::applyLogToStateMachine -> RaftServerImpl::replyPendingRequest -> RaftServerImpl::getCommitInfos -> infos.add(commitInfoCache.update(getPeer(), state.getLog().getLastCommittedIndex())) -> CommitInfoCache::update. The stack of leader update commitInfoCache is: follower finish RaftServerImpl::appendEntriesAsync and return reply-> GrpcLogAppender::runAppenderImpl -> GrpcLogAppender::appendLog ->LogAppender::createRequest ->LeaderState::newAppendEntriesRequestProto ->RaftServerImpl::getCommitInfos ->LeaderState::updateFollowerCommitInfos->CommitInfoCache::update. was: *What's the problem ?* !screenshot-1.png! *What's the reason ?* The reason is follower update commitInfoCache after leader. The stack of follower update commitInfoCache is: RaftServerImpl::appendEntriesAsync-> state.updateStateMachine -> StateMachineUpdater::applyLog -> RaftServerImpl::applyLogToStateMachine -> RaftServerImpl::replyPendingRequest -> RaftServerImpl::getCommitInfos -> infos.add(commitInfoCache.update(getPeer(), state.getLog().getLastCommittedIndex())) -> CommitInfoCache::update. The stack of leader update commitInfoCache is: follower finish RaftServerImpl::appendEntriesAsync and return reply-> GrpcLogAppender::runAppenderImpl -> GrpcLogAppender::appendLog ->LogAppender::createRequest ->LeaderState::newAppendEntriesRequestProto ->RaftServerImpl::getCommitInfos ->LeaderState::updateFollowerCommitInfos->CommitInfoCache::update. 
> Failed UT: testStateMachineMetrics.checkFollowerCommitLagsLeader > > > Key: RATIS-883 > URL: https://issues.apache.org/jira/browse/RATIS-883 > Project: Ratis > Issue Type: Bug >Reporter: runzhiwang >Assignee: runzhiwang >Priority: Major > Attachments: screenshot-1.png > > > *What's the problem ?* > !screenshot-1.png! > *What's the reason ?* > The reason is follower update commitInfoCache after leader. > The stack of follower update commitInfoCache is: > RaftServerImpl::appendEntriesAsync-> state.updateStateMachine -> > StateMachineUpdater::applyLog -> RaftServerImpl::applyLogToStateMachine -> > RaftServerImpl::replyPendingRequest -> RaftServerImpl::getCommitInfos -> > infos.add(commitInfoCache.update(getPeer(), > state.getLog().getLastCommittedIndex())) -> CommitInfoCache::update. > The stack of leader update commitInfoCache is: follower finish > RaftServerImpl::appendEntriesAsync and return reply-> > GrpcLogAppender::runAppenderImpl > -> GrpcLogAppender::appendLog ->LogAppender::createRequest > ->LeaderState::newAppendEntriesRequestProto ->RaftServerImpl::getCommitInfos > ->LeaderState::updateFollowerCommitInfos->CommitInfoCache::update. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (RATIS-883) Failed UT: testStateMachineMetrics.checkFollowerCommitLagsLeader
[ https://issues.apache.org/jira/browse/RATIS-883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] runzhiwang updated RATIS-883: - Description: *What's the problem ?* !screenshot-1.png! *What's the reason ?* The reason is follower update commitInfoCache after leader. The stack of follower update commitInfoCache is: RaftServerImpl::appendEntriesAsync-> state.updateStateMachine -> StateMachineUpdater::applyLog -> RaftServerImpl::applyLogToStateMachine -> RaftServerImpl::replyPendingRequest -> RaftServerImpl::getCommitInfos -> infos.add(commitInfoCache.update(getPeer(), state.getLog().getLastCommittedIndex())) -> CommitInfoCache::update. The stack of leader update commitInfoCache is: follower finish RaftServerImpl::appendEntriesAsync and return reply-> GrpcLogAppender::runAppenderImpl -> GrpcLogAppender::appendLog ->LogAppender::createRequest ->LeaderState::newAppendEntriesRequestProto ->RaftServerImpl::getCommitInfos ->LeaderState::updateFollowerCommitInfos->CommitInfoCache::update. was: *What's the problem ?* !screenshot-1.png! *What's the reason ?* The reason is follower update commitInfoCache after leader. The stack of follower update commitInfoCache is: RaftServerImpl::appendEntriesAsync-> state.updateStateMachine -> StateMachineUpdater::applyLog -> RaftServerImpl::applyLogToStateMachine -> RaftServerImpl::replyPendingRequest -> RaftServerImpl::getCommitInfos -> infos.add(commitInfoCache.update(getPeer(), state.getLog().getLastCommittedIndex())) -> CommitInfoProto::update. Leader update > Failed UT: testStateMachineMetrics.checkFollowerCommitLagsLeader > > > Key: RATIS-883 > URL: https://issues.apache.org/jira/browse/RATIS-883 > Project: Ratis > Issue Type: Bug >Reporter: runzhiwang >Assignee: runzhiwang >Priority: Major > Attachments: screenshot-1.png > > > *What's the problem ?* > !screenshot-1.png! > *What's the reason ?* > The reason is follower update commitInfoCache after leader. 
> The stack of follower update commitInfoCache is: > RaftServerImpl::appendEntriesAsync-> state.updateStateMachine -> > StateMachineUpdater::applyLog -> RaftServerImpl::applyLogToStateMachine -> > RaftServerImpl::replyPendingRequest -> RaftServerImpl::getCommitInfos -> > infos.add(commitInfoCache.update(getPeer(), > state.getLog().getLastCommittedIndex())) -> CommitInfoCache::update. > The stack of leader update commitInfoCache is: follower finish > RaftServerImpl::appendEntriesAsync and return reply-> > GrpcLogAppender::runAppenderImpl -> GrpcLogAppender::appendLog > ->LogAppender::createRequest ->LeaderState::newAppendEntriesRequestProto > ->RaftServerImpl::getCommitInfos > ->LeaderState::updateFollowerCommitInfos->CommitInfoCache::update. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (RATIS-883) Failed UT: testStateMachineMetrics.checkFollowerCommitLagsLeader
[ https://issues.apache.org/jira/browse/RATIS-883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] runzhiwang updated RATIS-883: - Description: *What's the problem ?* !screenshot-1.png! *What's the reason ?* The reason is follower update commitInfoCache after leader. The stack of follower update commitInfoCache is: RaftServerImpl::appendEntriesAsync-> state.updateStateMachine -> StateMachineUpdater::applyLog -> RaftServerImpl::applyLogToStateMachine -> RaftServerImpl::replyPendingRequest -> RaftServerImpl::getCommitInfos -> infos.add(commitInfoCache.update(getPeer(), state.getLog().getLastCommittedIndex())) -> CommitInfoProto::update. Leader update was: *What's the problem ?* !screenshot-1.png! *What's the reason ?* The reason is follower update commitInfoCache after leader. Follower update commitInfoCache when finish appendEntry and then state.updateStateMachine -> StateMachineUpdater::applyLog -> RaftServerImpl::applyLogToStateMachine -> RaftServerImpl::replyPendingRequest -> RaftServerImpl::getCommitInfos -> infos.add(commitInfoCache.update(getPeer(), state.getLog().getLastCommittedIndex())) -> CommitInfoProto::update. > Failed UT: testStateMachineMetrics.checkFollowerCommitLagsLeader > > > Key: RATIS-883 > URL: https://issues.apache.org/jira/browse/RATIS-883 > Project: Ratis > Issue Type: Bug >Reporter: runzhiwang >Assignee: runzhiwang >Priority: Major > Attachments: screenshot-1.png > > > *What's the problem ?* > !screenshot-1.png! > *What's the reason ?* > The reason is follower update commitInfoCache after leader. > The stack of follower update commitInfoCache is: > RaftServerImpl::appendEntriesAsync-> state.updateStateMachine -> > StateMachineUpdater::applyLog -> RaftServerImpl::applyLogToStateMachine -> > RaftServerImpl::replyPendingRequest -> RaftServerImpl::getCommitInfos -> > infos.add(commitInfoCache.update(getPeer(), > state.getLog().getLastCommittedIndex())) -> CommitInfoProto::update. 
> Leader update -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (RATIS-883) Failed UT: testStateMachineMetrics.checkFollowerCommitLagsLeader
[ https://issues.apache.org/jira/browse/RATIS-883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] runzhiwang updated RATIS-883: - Description: *What's the problem ?* !screenshot-1.png! *What's the reason ?* The reason is follower update commitInfoCache after leader. Follower update commitInfoCache when finish appendEntry and then state.updateStateMachine -> StateMachineUpdater::applyLog -> RaftServerImpl::applyLogToStateMachine -> RaftServerImpl::replyPendingRequest -> RaftServerImpl::getCommitInfos -> infos.add(commitInfoCache.update(getPeer(), state.getLog().getLastCommittedIndex())) -> CommitInfoProto::update. was: *What's the problem ?* !screenshot-1.png! *What's the reason ?* The reason is follower update commitInfoCache after leader. > Failed UT: testStateMachineMetrics.checkFollowerCommitLagsLeader > > > Key: RATIS-883 > URL: https://issues.apache.org/jira/browse/RATIS-883 > Project: Ratis > Issue Type: Bug >Reporter: runzhiwang >Assignee: runzhiwang >Priority: Major > Attachments: screenshot-1.png > > > *What's the problem ?* > !screenshot-1.png! > *What's the reason ?* > The reason is follower update commitInfoCache after leader. > Follower update commitInfoCache when finish appendEntry and then > state.updateStateMachine -> StateMachineUpdater::applyLog -> > RaftServerImpl::applyLogToStateMachine -> RaftServerImpl::replyPendingRequest > -> RaftServerImpl::getCommitInfos -> > infos.add(commitInfoCache.update(getPeer(), > state.getLog().getLastCommittedIndex())) -> CommitInfoProto::update. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (RATIS-883) Failed UT: testStateMachineMetrics.checkFollowerCommitLagsLeader
[ https://issues.apache.org/jira/browse/RATIS-883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] runzhiwang updated RATIS-883: - Description: *What's the problem ?* !screenshot-1.png! *What's the reason ?* The reason is follower update commitInfoCache after leader. was: *What's the problem ?* !screenshot-1.png! *What's the reason ?* > Failed UT: testStateMachineMetrics.checkFollowerCommitLagsLeader > > > Key: RATIS-883 > URL: https://issues.apache.org/jira/browse/RATIS-883 > Project: Ratis > Issue Type: Bug >Reporter: runzhiwang >Assignee: runzhiwang >Priority: Major > Attachments: screenshot-1.png > > > *What's the problem ?* > !screenshot-1.png! > *What's the reason ?* > The reason is follower update commitInfoCache after leader. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (RATIS-883) Failed UT: testStateMachineMetrics.checkFollowerCommitLagsLeader
[ https://issues.apache.org/jira/browse/RATIS-883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] runzhiwang updated RATIS-883: - Attachment: screenshot-1.png > Failed UT: testStateMachineMetrics.checkFollowerCommitLagsLeader > > > Key: RATIS-883 > URL: https://issues.apache.org/jira/browse/RATIS-883 > Project: Ratis > Issue Type: Bug >Reporter: runzhiwang >Assignee: runzhiwang >Priority: Major > Attachments: screenshot-1.png > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (RATIS-883) Failed UT: testStateMachineMetrics.checkFollowerCommitLagsLeader
[ https://issues.apache.org/jira/browse/RATIS-883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] runzhiwang updated RATIS-883: - Description: *What's the problem ?* !screenshot-1.png! *What's the reason ?* > Failed UT: testStateMachineMetrics.checkFollowerCommitLagsLeader > > > Key: RATIS-883 > URL: https://issues.apache.org/jira/browse/RATIS-883 > Project: Ratis > Issue Type: Bug >Reporter: runzhiwang >Assignee: runzhiwang >Priority: Major > Attachments: screenshot-1.png > > > *What's the problem ?* > !screenshot-1.png! > *What's the reason ?* -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (RATIS-883) Failed UT: testStateMachineMetrics.checkFollowerCommitLagsLeader
runzhiwang created RATIS-883: Summary: Failed UT: testStateMachineMetrics.checkFollowerCommitLagsLeader Key: RATIS-883 URL: https://issues.apache.org/jira/browse/RATIS-883 Project: Ratis Issue Type: Bug Reporter: runzhiwang Assignee: runzhiwang -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (RATIS-857) Thread unsafe RaftServerMetrics::metricsMap HashMap in multi thread
[ https://issues.apache.org/jira/browse/RATIS-857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] runzhiwang updated RATIS-857: - Attachment: RATIS-881.001.patch > Thread unsafe RaftServerMetrics::metricsMap HashMap in multi thread > --- > > Key: RATIS-857 > URL: https://issues.apache.org/jira/browse/RATIS-857 > Project: Ratis > Issue Type: Bug > Components: metrics >Reporter: runzhiwang >Assignee: runzhiwang >Priority: Major > Fix For: 0.6.0 > > Attachments: RATIS-857.001.patch, RATIS-881.001.patch > > > *What's the problem ?* > The {color:#DE350B}static{color} variable > [RaftServerMetrics::metricsMap|https://github.com/apache/incubator-ratis/blob/master/ratis-server/src/main/java/org/apache/ratis/server/impl/RaftServerMetrics.java#L71] > is type of HashMap, which is not thread safe. But entry will be > [put|https://github.com/apache/incubator-ratis/blob/master/ratis-server/src/main/java/org/apache/ratis/server/impl/RaftServerMetrics.java#L76] > into metricsMap by different thread, when create each RaftServerImpl > instance. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (RATIS-881) Failed unit test because test before MiniRaftCluster ready
[ https://issues.apache.org/jira/browse/RATIS-881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] runzhiwang updated RATIS-881: - Attachment: RATIS-881.001.patch > Failed unit test because test before MiniRaftCluster ready > -- > > Key: RATIS-881 > URL: https://issues.apache.org/jira/browse/RATIS-881 > Project: Ratis > Issue Type: Bug >Reporter: runzhiwang >Assignee: runzhiwang >Priority: Major > Attachments: RATIS-881.001.patch, screenshot-1.png > > > For the failed > [TestRaftWithGrpc::testStateMachineMetrics|https://builds.apache.org/job/PreCommit-RATIS-Build/1305/testReport/org.apache.ratis.grpc/TestRaftWithGrpc/testStateMachineMetrics/], > the reason is the > [RaftServerMetrics::getPeerCommitIndexGauge|https://github.com/apache/incubator-ratis/blob/master/ratis-server/src/main/java/org/apache/ratis/server/impl/RaftServerMetrics.java#L141] > happens before > [RaftServerMetrics::addPeerCommitIndexGauge|https://github.com/apache/incubator-ratis/blob/master/ratis-server/src/main/java/org/apache/ratis/server/impl/RaftServerMetrics.java#L122]. 
> > When some RaftServerImpl calls [setRole(RaftPeerRole.LEADER, > "changeToLeader")|https://github.com/apache/incubator-ratis/blob/master/ratis-server/src/main/java/org/apache/ratis/server/impl/RaftServerImpl.java#L345], > the statement > [waitForLeader|https://github.com/apache/incubator-ratis/blob/master/ratis-server/src/test/java/org/apache/ratis/RaftBasicTests.java#L446] > succeeds in getting the leader and the test begins, but the chain > [role.startLeaderState|https://github.com/apache/incubator-ratis/blob/master/ratis-server/src/main/java/org/apache/ratis/server/impl/RaftServerImpl.java#L349] > -> > [new > LeaderState|https://github.com/apache/incubator-ratis/blob/master/ratis-server/src/main/java/org/apache/ratis/server/impl/RoleInfo.java#L94] > -> > [LeaderState::addSenders|https://github.com/apache/incubator-ratis/blob/master/ratis-server/src/main/java/org/apache/ratis/server/impl/LeaderState.java#L409]->[RaftServerMetrics::addFollower|https://github.com/apache/incubator-ratis/blob/master/ratis-server/src/main/java/org/apache/ratis/server/impl/RaftServerMetrics.java#L106] > -> > [RaftServerMetrics::addPeerCommitIndexGauge|https://github.com/apache/incubator-ratis/blob/master/ratis-server/src/main/java/org/apache/ratis/server/impl/RaftServerMetrics.java#L122] > has not finished yet. > !screenshot-1.png! -- This message was sent by Atlassian Jira (v8.3.4#803005)
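The race in RATIS-881 above (a gauge is read before it has been registered) can be guarded against in a test by polling until the gauge exists. The following is only a sketch under assumptions, not the attached patch: WaitForGaugeSketch, waitFor, registerLater, and the "peer-1" key are hypothetical names, and a plain map stands in for the metrics registry.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

// Hypothetical test helper; a plain map stands in for the metrics registry.
final class WaitForGaugeSketch {
  private WaitForGaugeSketch() {}

  // Polls the condition until it holds or the timeout elapses; returns whether it held.
  static boolean waitFor(BooleanSupplier condition, long timeoutMs, long sleepMs) {
    final long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
    while (!condition.getAsBoolean()) {
      if (System.nanoTime() - deadline >= 0) {
        return false;
      }
      try {
        Thread.sleep(sleepMs);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return false;
      }
    }
    return true;
  }

  // Simulates a leader that registers its gauge some time after the role change.
  static Thread registerLater(Map<String, Long> gauges, String name, long delayMs) {
    Thread t = new Thread(() -> {
      try {
        Thread.sleep(delayMs);
        gauges.put(name, 0L);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    });
    t.setDaemon(true);
    t.start();
    return t;
  }

  public static void main(String[] args) {
    Map<String, Long> gauges = new ConcurrentHashMap<>();
    registerLater(gauges, "peer-1", 50);
    // Only read the gauge once it is known to be registered.
    boolean ready = waitFor(() -> gauges.containsKey("peer-1"), 5000, 10);
    System.out.println("gauge registered: " + ready);
  }
}
```

The alternative design is to make the leader finish its setup before it becomes visible as leader; the polling helper only hides the ordering problem on the test side.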
[jira] [Updated] (RATIS-840) Memory leak of LogAppender
[ https://issues.apache.org/jira/browse/RATIS-840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marton Elek updated RATIS-840: -- Priority: Blocker (was: Critical) > Memory leak of LogAppender > -- > > Key: RATIS-840 > URL: https://issues.apache.org/jira/browse/RATIS-840 > Project: Ratis > Issue Type: Bug > Components: server >Reporter: runzhiwang >Assignee: runzhiwang >Priority: Blocker > Attachments: RATIS-840.001.patch, RATIS-840.002.patch, > RATIS-840.003.patch, image-2020-04-06-14-27-28-485.png, > image-2020-04-06-14-27-39-582.png, screenshot-1.png, screenshot-2.png > > > *What's the problem ?* > After running hadoop-ozone for 4 days, the datanode leaks memory. A heap dump > shows 460710 instances of GrpcLogAppender, but only 6 > instances of SenderList, each containing 1-2 instances of > GrpcLogAppender. There are also a lot of logs related to > [LeaderState::restartSender|https://github.com/apache/incubator-ratis/blob/master/ratis-server/src/main/java/org/apache/ratis/server/impl/LeaderState.java#L428]. > {code:java}INFO impl.RaftServerImpl: > 1665f5ea-ab17-4a0e-af6d-6958efd322fa@group-F64B465F37B5-LeaderState: > Restarting GrpcLogAppender for > 1665f5ea-ab17-4a0e-af6d-6958efd322fa@group-F64B465F37B5-\u003e229cbcc1-a3b2-4383-9c0d-c0f4c28c3d4a\n","stream":"stderr","time":"2020-04-06T03:59:53.37892512Z"}{code} > > So many GrpcLogAppender instances did not stop their Daemon Thread when > removed from senders. > !image-2020-04-06-14-27-28-485.png! > !image-2020-04-06-14-27-39-582.png! > > *Why is > [LeaderState::restartSender|https://github.com/apache/incubator-ratis/blob/master/ratis-server/src/main/java/org/apache/ratis/server/impl/LeaderState.java#L428] > called so many times ?* > 1. As the image shows, when a group is removed, SegmentedRaftLog is closed, and > GrpcLogAppender throws an exception when it finds that the SegmentedRaftLog was closed. 
> Then GrpcLogAppender is > [restarted|https://github.com/apache/incubator-ratis/blob/master/ratis-server/src/main/java/org/apache/ratis/server/impl/LogAppender.java#L94], > and the new GrpcLogAppender throws an exception again when it finds that the > SegmentedRaftLog was closed, so GrpcLogAppender is restarted again ... > . This results in an infinite restart of GrpcLogAppender. > 2. Actually, when a group is removed, GrpcLogAppender is stopped first: > RaftServerImpl::shutdown -> > [RoleInfo::shutdownLeaderState|https://github.com/apache/incubator-ratis/blob/master/ratis-server/src/main/java/org/apache/ratis/server/impl/RaftServerImpl.java#L266] > -> LeaderState::stop -> LogAppender::stopAppender, and then SegmentedRaftLog > is closed: RaftServerImpl::shutdown -> > [ServerState:close|https://github.com/apache/incubator-ratis/blob/master/ratis-server/src/main/java/org/apache/ratis/server/impl/RaftServerImpl.java#L271] > ... . Although RoleInfo::shutdownLeaderState is called before ServerState:close, > the GrpcLogAppender is stopped asynchronously. So the infinite restart of > GrpcLogAppender happens when GrpcLogAppender stops after SegmentedRaftLog > is closed. > !screenshot-1.png! > *Why did GrpcLogAppender not stop the Daemon Thread when removed from senders > ?* > I found a lot of GrpcLogAppender instances blocked inside log4j. I think > GrpcLogAppender restarts too fast and then blocks in log4j. > !screenshot-2.png! > *Can the new GrpcLogAppender work normally ?* > 1. Even without the above problem, the newly created GrpcLogAppender > still cannot work normally. > 2. When a new GrpcLogAppender is created, a new FollowerInfo is also created: > LeaderState::addAndStartSenders -> > LeaderState::addSenders->RaftServerImpl::newLogAppender -> [new > FollowerInfo|https://github.com/apache/incubator-ratis/blob/master/ratis-server/src/main/java/org/apache/ratis/server/impl/RaftServerImpl.java#L129] > 3. 
When the newly created GrpcLogAppender appends an entry to the follower, the > follower responds SUCCESS. > 4. Then LeaderState::updateCommit -> [LeaderState::getMajorityMin | > https://github.com/apache/incubator-ratis/blob/master/ratis-server/src/main/java/org/apache/ratis/server/impl/LeaderState.java#L599] > -> > [voterLists.get(0) | > https://github.com/apache/incubator-ratis/blob/master/ratis-server/src/main/java/org/apache/ratis/server/impl/LeaderState.java#L607]. > {color:#DE350B}The error happens because voterLists.get(0) returns the > FollowerInfo of the old GrpcLogAppender, not the FollowerInfo of the new > GrpcLogAppender. {color} > 5. Because the majority commit obtained from the FollowerInfo of the old > GrpcLogAppender never changes, the leader cannot update the commit even though the > follower has appended the entry successfully. So the newly created > GrpcLogAppender can never work normally. > 6. The reason of unit test of
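The restart loop described in RATIS-840 above (an appender fails because the log is closed, is restarted, and fails again) can be broken by checking whether the log is still open before restarting. The sketch below only illustrates that idea and is not taken from the attached patches: RestartGuardSketch, closeLog, restartSenderIfOpen, and restartCount are hypothetical names, and a boolean flag stands in for the state of SegmentedRaftLog.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical model of a leader that restarts a failed appender only while the log is open.
final class RestartGuardSketch {
  // Stands in for "the raft log has been closed" (for example, on group removal).
  private final AtomicBoolean logClosed = new AtomicBoolean(false);
  // Counts how many replacement appenders were created.
  private final AtomicInteger restarts = new AtomicInteger();

  void closeLog() {
    logClosed.set(true);
  }

  // Called when an appender fails; returns true only if a replacement was started.
  boolean restartSenderIfOpen() {
    if (logClosed.get()) {
      // Restarting would fail again for the same reason, so stop here.
      return false;
    }
    restarts.incrementAndGet();
    return true;
  }

  int restartCount() {
    return restarts.get();
  }

  public static void main(String[] args) {
    RestartGuardSketch leader = new RestartGuardSketch();
    System.out.println("restart while open: " + leader.restartSenderIfOpen());
    leader.closeLog();
    System.out.println("restart after close: " + leader.restartSenderIfOpen());
    System.out.println("total restarts: " + leader.restartCount());
  }
}
```

A guard like this addresses only the infinite restart; the stale FollowerInfo returned by voterLists.get(0) is a separate problem that the sketch does not cover.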
[jira] [Resolved] (RATIS-880) Update github description and disable merge options apart from Squash and merge
[ https://issues.apache.org/jira/browse/RATIS-880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marton Elek resolved RATIS-880. --- Fix Version/s: 0.6.0 Resolution: Fixed > Update github description and disable merge options apart from Squash and > merge > --- > > Key: RATIS-880 > URL: https://issues.apache.org/jira/browse/RATIS-880 > Project: Ratis > Issue Type: Bug >Reporter: Mukul Kumar Singh >Assignee: Mukul Kumar Singh >Priority: Major > Fix For: 0.6.0 > > Time Spent: 20m > Remaining Estimate: 0h > > Update github description and disable merge options apart from Squash and > merge -- This message was sent by Atlassian Jira (v8.3.4#803005)