[jira] [Updated] (RATIS-857) Thread unsafe RaftServerMetrics::metricsMap HashMap in multi thread

2020-04-25 Thread runzhiwang (Jira)


 [ 
https://issues.apache.org/jira/browse/RATIS-857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

runzhiwang updated RATIS-857:
-
Attachment: (was: RATIS-881.001.patch)

> Thread unsafe RaftServerMetrics::metricsMap HashMap in multi thread
> ---
>
> Key: RATIS-857
> URL: https://issues.apache.org/jira/browse/RATIS-857
> Project: Ratis
>  Issue Type: Bug
>  Components: metrics
>Reporter: runzhiwang
>Assignee: runzhiwang
>Priority: Major
> Fix For: 0.6.0
>
> Attachments: RATIS-857.001.patch
>
>
> *What's the problem?*
> The {color:#DE350B}static{color} variable 
> [RaftServerMetrics::metricsMap|https://github.com/apache/incubator-ratis/blob/master/ratis-server/src/main/java/org/apache/ratis/server/impl/RaftServerMetrics.java#L71]
>  is a HashMap, which is not thread-safe. However, entries are 
> [put|https://github.com/apache/incubator-ratis/blob/master/ratis-server/src/main/java/org/apache/ratis/server/impl/RaftServerMetrics.java#L76]
>  into metricsMap by different threads whenever a RaftServerImpl 
> instance is created.
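
One way to picture a fix for the problem described above is to back the static map with a ConcurrentHashMap and register entries through computeIfAbsent. This is only a sketch under stated assumptions, not the attached patch: the class MetricsRegistrySketch, its methods, and the use of a String server id as the key are all hypothetical stand-ins for RaftServerMetrics.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for RaftServerMetrics; names and structure are illustrative only.
final class MetricsRegistrySketch {
  // ConcurrentHashMap instead of HashMap: safe for concurrent reads and writes.
  private static final Map<String, MetricsRegistrySketch> METRICS_MAP =
      new ConcurrentHashMap<>();

  private final String serverId;

  private MetricsRegistrySketch(String serverId) {
    this.serverId = serverId;
  }

  // computeIfAbsent is atomic on ConcurrentHashMap, so each id maps to exactly one instance
  // even when several threads register the same id at the same time.
  static MetricsRegistrySketch getOrCreate(String serverId) {
    return METRICS_MAP.computeIfAbsent(serverId, MetricsRegistrySketch::new);
  }

  static int size() {
    return METRICS_MAP.size();
  }

  // Registers ids "server-0" .. "server-(servers-1)" from `threads` threads at once
  // and returns the resulting map size.
  static int registerConcurrently(int threads, int servers) {
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    CountDownLatch start = new CountDownLatch(1);
    for (int t = 0; t < threads; t++) {
      pool.execute(() -> {
        try {
          start.await(); // release all workers together to maximize contention
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
          return;
        }
        for (int i = 0; i < servers; i++) {
          getOrCreate("server-" + i);
        }
      });
    }
    start.countDown();
    pool.shutdown();
    try {
      if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
        throw new IllegalStateException("workers did not finish in time");
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
    return size();
  }
}
```

With a plain HashMap in place of the ConcurrentHashMap, the same concurrent registration has no defined outcome: entries can be lost or the table can be corrupted during a resize.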



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (RATIS-883) Failed UT: testStateMachineMetrics.checkFollowerCommitLagsLeader

2020-04-25 Thread runzhiwang (Jira)


[ 
https://issues.apache.org/jira/browse/RATIS-883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17092523#comment-17092523
 ] 

runzhiwang commented on RATIS-883:
--

[~shashikant] Could you help review this patch? Thank you very much.

> Failed UT: testStateMachineMetrics.checkFollowerCommitLagsLeader
> 
>
> Key: RATIS-883
> URL: https://issues.apache.org/jira/browse/RATIS-883
> Project: Ratis
>  Issue Type: Bug
>Reporter: runzhiwang
>Assignee: runzhiwang
>Priority: Major
> Attachments: RATIS-883.001.patch, screenshot-1.png
>
>
> *What's the problem?*
>  !screenshot-1.png! 
> *What's the reason?*
> The follower updates commitInfoCache after the leader does.
> The follower updates commitInfoCache through this call chain: 
> RaftServerImpl::appendEntriesAsync
> -> state.updateStateMachine 
> -> StateMachineUpdater::applyLog 
> -> RaftServerImpl::applyLogToStateMachine
> -> RaftServerImpl::replyPendingRequest 
> -> RaftServerImpl::getCommitInfos 
> -> infos.add(commitInfoCache.update(getPeer(), 
> state.getLog().getLastCommittedIndex())) 
> -> CommitInfoCache::update.
> The leader updates commitInfoCache through this call chain: 
> the follower finishes RaftServerImpl::appendEntriesAsync and returns its reply
> -> GrpcLogAppender::runAppenderImpl 
> -> GrpcLogAppender::appendLog 
> -> LogAppender::createRequest 
> -> LeaderState::newAppendEntriesRequestProto 
> -> RaftServerImpl::getCommitInfos 
> -> LeaderState::updateFollowerCommitInfos
> -> CommitInfoCache::update.
> Because the follower has to notify the StateMachineUpdater thread, which then 
> updates CommitInfoCache, we cannot ensure that the follower updates 
> CommitInfoCache before the leader does.
> *How to fix?*
> The follower should update CommitInfoCache before returning its reply to the leader.
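
The ordering proposed in the fix above can be sketched as follows. This is a simplified illustration under stated assumptions, not the attached patch and not the real Ratis classes: Cache, appendEntriesAndReply, and onReply are hypothetical names, peers are plain String ids, and the rule that an update keeps the larger index is an assumption about how a commit-info cache would behave.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical model of the follower/leader ordering; not the Ratis implementation.
final class CommitInfoOrderingSketch {

  // Assumed simplification of a commit-info cache: peer id -> highest known commit index.
  static final class Cache {
    private final ConcurrentMap<String, Long> map = new ConcurrentHashMap<>();

    // Keeps the larger of the old and new index, so a stale update never moves it backwards.
    long update(String peerId, long newCommitIndex) {
      return map.merge(peerId, newCommitIndex, Long::max);
    }

    // Returns -1 when nothing has been recorded for the peer yet.
    long get(String peerId) {
      return map.getOrDefault(peerId, -1L);
    }
  }

  // Follower side: update the follower's own cache synchronously, then produce the reply.
  // The returned value stands in for the commit index carried by the reply.
  static long appendEntriesAndReply(Cache followerCache, String followerId, long commitIndex) {
    followerCache.update(followerId, commitIndex); // happens before the reply is returned
    return commitIndex;
  }

  // Leader side: the follower's commit index is recorded only once the reply has arrived.
  static void onReply(Cache leaderCache, String followerId, long repliedCommitIndex) {
    leaderCache.update(followerId, repliedCommitIndex);
  }
}
```

Because the follower's update happens on the same path that returns the reply, the leader cannot record a commit index for the follower that the follower itself has not yet recorded. If the update is instead handed to a separate updater thread, that guarantee is lost.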





[jira] [Updated] (RATIS-883) Failed UT: testStateMachineMetrics.checkFollowerCommitLagsLeader

2020-04-25 Thread runzhiwang (Jira)


 [ 
https://issues.apache.org/jira/browse/RATIS-883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

runzhiwang updated RATIS-883:
-
Attachment: RATIS-883.001.patch

> Failed UT: testStateMachineMetrics.checkFollowerCommitLagsLeader
> 
>
> Key: RATIS-883
> URL: https://issues.apache.org/jira/browse/RATIS-883
> Project: Ratis
>  Issue Type: Bug
>Reporter: runzhiwang
>Assignee: runzhiwang
>Priority: Major
> Attachments: RATIS-883.001.patch, screenshot-1.png
>
>
> *What's the problem?*
>  !screenshot-1.png! 
> *What's the reason?*
> The follower updates commitInfoCache after the leader does.
> The follower updates commitInfoCache through this call chain: 
> RaftServerImpl::appendEntriesAsync
> -> state.updateStateMachine 
> -> StateMachineUpdater::applyLog 
> -> RaftServerImpl::applyLogToStateMachine
> -> RaftServerImpl::replyPendingRequest 
> -> RaftServerImpl::getCommitInfos 
> -> infos.add(commitInfoCache.update(getPeer(), 
> state.getLog().getLastCommittedIndex())) 
> -> CommitInfoCache::update.
> The leader updates commitInfoCache through this call chain: 
> the follower finishes RaftServerImpl::appendEntriesAsync and returns its reply
> -> GrpcLogAppender::runAppenderImpl 
> -> GrpcLogAppender::appendLog 
> -> LogAppender::createRequest 
> -> LeaderState::newAppendEntriesRequestProto 
> -> RaftServerImpl::getCommitInfos 
> -> LeaderState::updateFollowerCommitInfos
> -> CommitInfoCache::update.
> Because the follower has to notify the StateMachineUpdater thread, which then 
> updates CommitInfoCache, we cannot ensure that the follower updates 
> CommitInfoCache before the leader does.
> *How to fix?*
> The follower should update CommitInfoCache before returning its reply to the leader.





[jira] [Updated] (RATIS-883) Failed UT: testStateMachineMetrics.checkFollowerCommitLagsLeader

2020-04-25 Thread runzhiwang (Jira)


 [ 
https://issues.apache.org/jira/browse/RATIS-883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

runzhiwang updated RATIS-883:
-
Description: 
*What's the problem?*
 !screenshot-1.png! 

*What's the reason?*
The follower updates commitInfoCache after the leader does.

The follower updates commitInfoCache through this call chain: 
RaftServerImpl::appendEntriesAsync
-> state.updateStateMachine 
-> StateMachineUpdater::applyLog 
-> RaftServerImpl::applyLogToStateMachine
-> RaftServerImpl::replyPendingRequest 
-> RaftServerImpl::getCommitInfos 
-> infos.add(commitInfoCache.update(getPeer(), 
state.getLog().getLastCommittedIndex())) 
-> CommitInfoCache::update.

The leader updates commitInfoCache through this call chain: 
the follower finishes RaftServerImpl::appendEntriesAsync and returns its reply
-> GrpcLogAppender::runAppenderImpl 
-> GrpcLogAppender::appendLog 
-> LogAppender::createRequest 
-> LeaderState::newAppendEntriesRequestProto 
-> RaftServerImpl::getCommitInfos 
-> LeaderState::updateFollowerCommitInfos
-> CommitInfoCache::update.


Because the follower has to notify the StateMachineUpdater thread, which then 
updates CommitInfoCache, we cannot ensure that the follower updates 
CommitInfoCache before the leader does.

*How to fix?*
The follower should update CommitInfoCache before returning its reply to the leader.

  was:
*What's the problem ?*
 !screenshot-1.png! 

*What's the reason ?*
The reason is follower update commitInfoCache after leader.

The stack of follower update commitInfoCache is: 
RaftServerImpl::appendEntriesAsync
-> state.updateStateMachine 
-> StateMachineUpdater::applyLog 
-> RaftServerImpl::applyLogToStateMachine
-> RaftServerImpl::replyPendingRequest 
-> RaftServerImpl::getCommitInfos 
-> infos.add(commitInfoCache.update(getPeer(), 
state.getLog().getLastCommittedIndex())) 
-> CommitInfoCache::update.

The stack of leader update commitInfoCache is: 
follower finish RaftServerImpl::appendEntriesAsync and return reply
-> GrpcLogAppender::runAppenderImpl 
-> GrpcLogAppender::appendLog 
->LogAppender::createRequest 
->LeaderState::newAppendEntriesRequestProto 
->RaftServerImpl::getCommitInfos 
->LeaderState::updateFollowerCommitInfos
->CommitInfoCache::update.


Because follower need to notify thread StateMachineUpdater to update 
CommitInfoCache, we can not ensure follower update CommitInfoCache before 
leader.


> Failed UT: testStateMachineMetrics.checkFollowerCommitLagsLeader
> 
>
> Key: RATIS-883
> URL: https://issues.apache.org/jira/browse/RATIS-883
> Project: Ratis
>  Issue Type: Bug
>Reporter: runzhiwang
>Assignee: runzhiwang
>Priority: Major
> Attachments: screenshot-1.png
>
>
> *What's the problem?*
>  !screenshot-1.png! 
> *What's the reason?*
> The follower updates commitInfoCache after the leader does.
> The follower updates commitInfoCache through this call chain: 
> RaftServerImpl::appendEntriesAsync
> -> state.updateStateMachine 
> -> StateMachineUpdater::applyLog 
> -> RaftServerImpl::applyLogToStateMachine
> -> RaftServerImpl::replyPendingRequest 
> -> RaftServerImpl::getCommitInfos 
> -> infos.add(commitInfoCache.update(getPeer(), 
> state.getLog().getLastCommittedIndex())) 
> -> CommitInfoCache::update.
> The leader updates commitInfoCache through this call chain: 
> the follower finishes RaftServerImpl::appendEntriesAsync and returns its reply
> -> GrpcLogAppender::runAppenderImpl 
> -> GrpcLogAppender::appendLog 
> -> LogAppender::createRequest 
> -> LeaderState::newAppendEntriesRequestProto 
> -> RaftServerImpl::getCommitInfos 
> -> LeaderState::updateFollowerCommitInfos
> -> CommitInfoCache::update.
> Because the follower has to notify the StateMachineUpdater thread, which then 
> updates CommitInfoCache, we cannot ensure that the follower updates 
> CommitInfoCache before the leader does.
> *How to fix?*
> The follower should update CommitInfoCache before returning its reply to the leader.





[jira] [Updated] (RATIS-883) Failed UT: testStateMachineMetrics.checkFollowerCommitLagsLeader

2020-04-25 Thread runzhiwang (Jira)


 [ 
https://issues.apache.org/jira/browse/RATIS-883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

runzhiwang updated RATIS-883:
-
Description: 
*What's the problem ?*
 !screenshot-1.png! 

*What's the reason ?*
The reason is follower update commitInfoCache after leader.

The stack of follower update commitInfoCache is: 
RaftServerImpl::appendEntriesAsync
-> state.updateStateMachine 
-> StateMachineUpdater::applyLog 
-> RaftServerImpl::applyLogToStateMachine
-> RaftServerImpl::replyPendingRequest 
-> RaftServerImpl::getCommitInfos 
-> infos.add(commitInfoCache.update(getPeer(), 
state.getLog().getLastCommittedIndex())) 
-> CommitInfoCache::update.

The stack of leader update commitInfoCache is: 
follower finish RaftServerImpl::appendEntriesAsync and return reply
-> GrpcLogAppender::runAppenderImpl 
-> GrpcLogAppender::appendLog 
->LogAppender::createRequest 
->LeaderState::newAppendEntriesRequestProto 
->RaftServerImpl::getCommitInfos 
->LeaderState::updateFollowerCommitInfos
->CommitInfoCache::update.


Because follower need to notify thread StateMachineUpdater to update 
CommitInfoCache, we can not ensure follower update CommitInfoCache before 
leader.

  was:
*What's the problem ?*
 !screenshot-1.png! 

*What's the reason ?*
The reason is follower update commitInfoCache after leader.

The stack of follower update commitInfoCache is: 
RaftServerImpl::appendEntriesAsync
-> state.updateStateMachine 
-> StateMachineUpdater::applyLog
-> RaftServerImpl::applyLogToStateMachine
-> RaftServerImpl::replyPendingRequest 
-> RaftServerImpl::getCommitInfos 
-> infos.add(commitInfoCache.update(getPeer(), 
state.getLog().getLastCommittedIndex())) 
-> CommitInfoCache::update.

The stack of leader update commitInfoCache is: 
follower finish RaftServerImpl::appendEntriesAsync and return reply
-> GrpcLogAppender::runAppenderImpl 
-> GrpcLogAppender::appendLog 
->LogAppender::createRequest 
->LeaderState::newAppendEntriesRequestProto 
->RaftServerImpl::getCommitInfos 
->LeaderState::updateFollowerCommitInfos
->CommitInfoCache::update.


> Failed UT: testStateMachineMetrics.checkFollowerCommitLagsLeader
> 
>
> Key: RATIS-883
> URL: https://issues.apache.org/jira/browse/RATIS-883
> Project: Ratis
>  Issue Type: Bug
>Reporter: runzhiwang
>Assignee: runzhiwang
>Priority: Major
> Attachments: screenshot-1.png
>
>
> *What's the problem ?*
>  !screenshot-1.png! 
> *What's the reason ?*
> The reason is follower update commitInfoCache after leader.
> The stack of follower update commitInfoCache is: 
> RaftServerImpl::appendEntriesAsync
> -> state.updateStateMachine 
> -> StateMachineUpdater::applyLog 
> -> RaftServerImpl::applyLogToStateMachine
> -> RaftServerImpl::replyPendingRequest 
> -> RaftServerImpl::getCommitInfos 
> -> infos.add(commitInfoCache.update(getPeer(), 
> state.getLog().getLastCommittedIndex())) 
> -> CommitInfoCache::update.
> The stack of leader update commitInfoCache is: 
> follower finish RaftServerImpl::appendEntriesAsync and return reply
> -> GrpcLogAppender::runAppenderImpl 
> -> GrpcLogAppender::appendLog 
> ->LogAppender::createRequest 
> ->LeaderState::newAppendEntriesRequestProto 
> ->RaftServerImpl::getCommitInfos 
> ->LeaderState::updateFollowerCommitInfos
> ->CommitInfoCache::update.
> Because follower need to notify thread StateMachineUpdater to update 
> CommitInfoCache, we can not ensure follower update CommitInfoCache before 
> leader.





[jira] [Updated] (RATIS-883) Failed UT: testStateMachineMetrics.checkFollowerCommitLagsLeader

2020-04-25 Thread runzhiwang (Jira)


 [ 
https://issues.apache.org/jira/browse/RATIS-883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

runzhiwang updated RATIS-883:
-
Description: 
*What's the problem ?*
 !screenshot-1.png! 

*What's the reason ?*
The reason is follower update commitInfoCache after leader.

The stack of follower update commitInfoCache is: 
RaftServerImpl::appendEntriesAsync
-> state.updateStateMachine 
-> StateMachineUpdater::applyLog
-> RaftServerImpl::applyLogToStateMachine
-> RaftServerImpl::replyPendingRequest 
-> RaftServerImpl::getCommitInfos 
-> infos.add(commitInfoCache.update(getPeer(), 
state.getLog().getLastCommittedIndex())) 
-> CommitInfoCache::update.

The stack of leader update commitInfoCache is: 
follower finish RaftServerImpl::appendEntriesAsync and return reply
-> GrpcLogAppender::runAppenderImpl 
-> GrpcLogAppender::appendLog 
->LogAppender::createRequest 
->LeaderState::newAppendEntriesRequestProto 
->RaftServerImpl::getCommitInfos 
->LeaderState::updateFollowerCommitInfos
->CommitInfoCache::update.

  was:
*What's the problem ?*
 !screenshot-1.png! 

*What's the reason ?*
The reason is follower update commitInfoCache after leader.

The stack of follower update commitInfoCache is: 
RaftServerImpl::appendEntriesAsync
-> state.updateStateMachine 
-> StateMachineUpdater::applyLog
-> RaftServerImpl::applyLogToStateMachine
-> RaftServerImpl::replyPendingRequest 
-> RaftServerImpl::getCommitInfos 
-> infos.add(commitInfoCache.update(getPeer(), 
state.getLog().getLastCommittedIndex())) 
-> CommitInfoCache::update.

The stack of leader update commitInfoCache is: 
follower finish RaftServerImpl::appendEntriesAsync and return reply
-> GrpcLogAppender::runAppenderImpl 
-> GrpcLogAppender::appendLog 
->LogAppender::createRequest 
->LeaderState::newAppendEntriesRequestProto 
->RaftServerImpl::getCommitInfos 
->LeaderState::updateFollowerCommitInfos-
>CommitInfoCache::update.


> Failed UT: testStateMachineMetrics.checkFollowerCommitLagsLeader
> 
>
> Key: RATIS-883
> URL: https://issues.apache.org/jira/browse/RATIS-883
> Project: Ratis
>  Issue Type: Bug
>Reporter: runzhiwang
>Assignee: runzhiwang
>Priority: Major
> Attachments: screenshot-1.png
>
>
> *What's the problem ?*
>  !screenshot-1.png! 
> *What's the reason ?*
> The reason is follower update commitInfoCache after leader.
> The stack of follower update commitInfoCache is: 
> RaftServerImpl::appendEntriesAsync
> -> state.updateStateMachine 
> -> StateMachineUpdater::applyLog
> -> RaftServerImpl::applyLogToStateMachine
> -> RaftServerImpl::replyPendingRequest 
> -> RaftServerImpl::getCommitInfos 
> -> infos.add(commitInfoCache.update(getPeer(), 
> state.getLog().getLastCommittedIndex())) 
> -> CommitInfoCache::update.
> The stack of leader update commitInfoCache is: 
> follower finish RaftServerImpl::appendEntriesAsync and return reply
> -> GrpcLogAppender::runAppenderImpl 
> -> GrpcLogAppender::appendLog 
> ->LogAppender::createRequest 
> ->LeaderState::newAppendEntriesRequestProto 
> ->RaftServerImpl::getCommitInfos 
> ->LeaderState::updateFollowerCommitInfos
> ->CommitInfoCache::update.





[jira] [Updated] (RATIS-883) Failed UT: testStateMachineMetrics.checkFollowerCommitLagsLeader

2020-04-25 Thread runzhiwang (Jira)


 [ 
https://issues.apache.org/jira/browse/RATIS-883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

runzhiwang updated RATIS-883:
-
Description: 
*What's the problem ?*
 !screenshot-1.png! 

*What's the reason ?*
The reason is follower update commitInfoCache after leader.

The stack of follower update commitInfoCache is: 
RaftServerImpl::appendEntriesAsync
-> state.updateStateMachine 
-> StateMachineUpdater::applyLog
-> RaftServerImpl::applyLogToStateMachine
-> RaftServerImpl::replyPendingRequest 
-> RaftServerImpl::getCommitInfos 
-> infos.add(commitInfoCache.update(getPeer(), 
state.getLog().getLastCommittedIndex())) 
-> CommitInfoCache::update.

The stack of leader update commitInfoCache is: 
follower finish RaftServerImpl::appendEntriesAsync and return reply
-> GrpcLogAppender::runAppenderImpl 
-> GrpcLogAppender::appendLog 
->LogAppender::createRequest 
->LeaderState::newAppendEntriesRequestProto 
->RaftServerImpl::getCommitInfos 
->LeaderState::updateFollowerCommitInfos-
>CommitInfoCache::update.

  was:
*What's the problem ?*
 !screenshot-1.png! 

*What's the reason ?*
The reason is follower update commitInfoCache after leader.

The stack of follower update commitInfoCache is: 
RaftServerImpl::appendEntriesAsync-> state.updateStateMachine -> 
StateMachineUpdater::applyLog -> RaftServerImpl::applyLogToStateMachine -> 
RaftServerImpl::replyPendingRequest -> RaftServerImpl::getCommitInfos -> 
infos.add(commitInfoCache.update(getPeer(), 
state.getLog().getLastCommittedIndex())) -> CommitInfoCache::update.

The stack of leader update commitInfoCache is: follower finish 
RaftServerImpl::appendEntriesAsync and return reply-> 
GrpcLogAppender::runAppenderImpl 
-> GrpcLogAppender::appendLog ->LogAppender::createRequest 
->LeaderState::newAppendEntriesRequestProto ->RaftServerImpl::getCommitInfos 
->LeaderState::updateFollowerCommitInfos->CommitInfoCache::update.


> Failed UT: testStateMachineMetrics.checkFollowerCommitLagsLeader
> 
>
> Key: RATIS-883
> URL: https://issues.apache.org/jira/browse/RATIS-883
> Project: Ratis
>  Issue Type: Bug
>Reporter: runzhiwang
>Assignee: runzhiwang
>Priority: Major
> Attachments: screenshot-1.png
>
>
> *What's the problem ?*
>  !screenshot-1.png! 
> *What's the reason ?*
> The reason is follower update commitInfoCache after leader.
> The stack of follower update commitInfoCache is: 
> RaftServerImpl::appendEntriesAsync
> -> state.updateStateMachine 
> -> StateMachineUpdater::applyLog
> -> RaftServerImpl::applyLogToStateMachine
> -> RaftServerImpl::replyPendingRequest 
> -> RaftServerImpl::getCommitInfos 
> -> infos.add(commitInfoCache.update(getPeer(), 
> state.getLog().getLastCommittedIndex())) 
> -> CommitInfoCache::update.
> The stack of leader update commitInfoCache is: 
> follower finish RaftServerImpl::appendEntriesAsync and return reply
> -> GrpcLogAppender::runAppenderImpl 
> -> GrpcLogAppender::appendLog 
> ->LogAppender::createRequest 
> ->LeaderState::newAppendEntriesRequestProto 
> ->RaftServerImpl::getCommitInfos 
> ->LeaderState::updateFollowerCommitInfos
> ->CommitInfoCache::update.





[jira] [Updated] (RATIS-883) Failed UT: testStateMachineMetrics.checkFollowerCommitLagsLeader

2020-04-25 Thread runzhiwang (Jira)


 [ 
https://issues.apache.org/jira/browse/RATIS-883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

runzhiwang updated RATIS-883:
-
Description: 
*What's the problem ?*
 !screenshot-1.png! 

*What's the reason ?*
The reason is follower update commitInfoCache after leader.

The stack of follower update commitInfoCache is: 
RaftServerImpl::appendEntriesAsync-> state.updateStateMachine -> 
StateMachineUpdater::applyLog -> RaftServerImpl::applyLogToStateMachine -> 
RaftServerImpl::replyPendingRequest -> RaftServerImpl::getCommitInfos -> 
infos.add(commitInfoCache.update(getPeer(), 
state.getLog().getLastCommittedIndex())) -> CommitInfoCache::update.

The stack of leader update commitInfoCache is: follower finish 
RaftServerImpl::appendEntriesAsync and return reply-> 
GrpcLogAppender::runAppenderImpl 
-> GrpcLogAppender::appendLog ->LogAppender::createRequest 
->LeaderState::newAppendEntriesRequestProto ->RaftServerImpl::getCommitInfos 
->LeaderState::updateFollowerCommitInfos->CommitInfoCache::update.

  was:
*What's the problem ?*
 !screenshot-1.png! 

*What's the reason ?*
The reason is follower update commitInfoCache after leader.

The stack of follower update commitInfoCache is: 
RaftServerImpl::appendEntriesAsync-> state.updateStateMachine -> 
StateMachineUpdater::applyLog -> RaftServerImpl::applyLogToStateMachine -> 
RaftServerImpl::replyPendingRequest -> RaftServerImpl::getCommitInfos -> 
infos.add(commitInfoCache.update(getPeer(), 
state.getLog().getLastCommittedIndex())) -> CommitInfoCache::update.

The stack of leader update commitInfoCache is: follower finish 
RaftServerImpl::appendEntriesAsync and return reply-> 
GrpcLogAppender::runAppenderImpl -> GrpcLogAppender::appendLog 
->LogAppender::createRequest ->LeaderState::newAppendEntriesRequestProto 
->RaftServerImpl::getCommitInfos 
->LeaderState::updateFollowerCommitInfos->CommitInfoCache::update.


> Failed UT: testStateMachineMetrics.checkFollowerCommitLagsLeader
> 
>
> Key: RATIS-883
> URL: https://issues.apache.org/jira/browse/RATIS-883
> Project: Ratis
>  Issue Type: Bug
>Reporter: runzhiwang
>Assignee: runzhiwang
>Priority: Major
> Attachments: screenshot-1.png
>
>
> *What's the problem ?*
>  !screenshot-1.png! 
> *What's the reason ?*
> The reason is follower update commitInfoCache after leader.
> The stack of follower update commitInfoCache is: 
> RaftServerImpl::appendEntriesAsync-> state.updateStateMachine -> 
> StateMachineUpdater::applyLog -> RaftServerImpl::applyLogToStateMachine -> 
> RaftServerImpl::replyPendingRequest -> RaftServerImpl::getCommitInfos -> 
> infos.add(commitInfoCache.update(getPeer(), 
> state.getLog().getLastCommittedIndex())) -> CommitInfoCache::update.
> The stack of leader update commitInfoCache is: follower finish 
> RaftServerImpl::appendEntriesAsync and return reply-> 
> GrpcLogAppender::runAppenderImpl 
> -> GrpcLogAppender::appendLog ->LogAppender::createRequest 
> ->LeaderState::newAppendEntriesRequestProto ->RaftServerImpl::getCommitInfos 
> ->LeaderState::updateFollowerCommitInfos->CommitInfoCache::update.





[jira] [Updated] (RATIS-883) Failed UT: testStateMachineMetrics.checkFollowerCommitLagsLeader

2020-04-25 Thread runzhiwang (Jira)


 [ 
https://issues.apache.org/jira/browse/RATIS-883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

runzhiwang updated RATIS-883:
-
Description: 
*What's the problem ?*
 !screenshot-1.png! 

*What's the reason ?*
The reason is follower update commitInfoCache after leader.

The stack of follower update commitInfoCache is: 
RaftServerImpl::appendEntriesAsync-> state.updateStateMachine -> 
StateMachineUpdater::applyLog -> RaftServerImpl::applyLogToStateMachine -> 
RaftServerImpl::replyPendingRequest -> RaftServerImpl::getCommitInfos -> 
infos.add(commitInfoCache.update(getPeer(), 
state.getLog().getLastCommittedIndex())) -> CommitInfoCache::update.

The stack of leader update commitInfoCache is: follower finish 
RaftServerImpl::appendEntriesAsync and return reply-> 
GrpcLogAppender::runAppenderImpl -> GrpcLogAppender::appendLog 
->LogAppender::createRequest ->LeaderState::newAppendEntriesRequestProto 
->RaftServerImpl::getCommitInfos 
->LeaderState::updateFollowerCommitInfos->CommitInfoCache::update.

  was:
*What's the problem ?*
 !screenshot-1.png! 

*What's the reason ?*
The reason is follower update commitInfoCache after leader.

The stack of follower update commitInfoCache is: 
RaftServerImpl::appendEntriesAsync-> state.updateStateMachine -> 
StateMachineUpdater::applyLog -> RaftServerImpl::applyLogToStateMachine -> 
RaftServerImpl::replyPendingRequest -> RaftServerImpl::getCommitInfos -> 
infos.add(commitInfoCache.update(getPeer(), 
state.getLog().getLastCommittedIndex())) -> CommitInfoProto::update.

Leader update 


> Failed UT: testStateMachineMetrics.checkFollowerCommitLagsLeader
> 
>
> Key: RATIS-883
> URL: https://issues.apache.org/jira/browse/RATIS-883
> Project: Ratis
>  Issue Type: Bug
>Reporter: runzhiwang
>Assignee: runzhiwang
>Priority: Major
> Attachments: screenshot-1.png
>
>
> *What's the problem ?*
>  !screenshot-1.png! 
> *What's the reason ?*
> The reason is follower update commitInfoCache after leader.
> The stack of follower update commitInfoCache is: 
> RaftServerImpl::appendEntriesAsync-> state.updateStateMachine -> 
> StateMachineUpdater::applyLog -> RaftServerImpl::applyLogToStateMachine -> 
> RaftServerImpl::replyPendingRequest -> RaftServerImpl::getCommitInfos -> 
> infos.add(commitInfoCache.update(getPeer(), 
> state.getLog().getLastCommittedIndex())) -> CommitInfoCache::update.
> The stack of leader update commitInfoCache is: follower finish 
> RaftServerImpl::appendEntriesAsync and return reply-> 
> GrpcLogAppender::runAppenderImpl -> GrpcLogAppender::appendLog 
> ->LogAppender::createRequest ->LeaderState::newAppendEntriesRequestProto 
> ->RaftServerImpl::getCommitInfos 
> ->LeaderState::updateFollowerCommitInfos->CommitInfoCache::update.





[jira] [Updated] (RATIS-883) Failed UT: testStateMachineMetrics.checkFollowerCommitLagsLeader

2020-04-25 Thread runzhiwang (Jira)


 [ 
https://issues.apache.org/jira/browse/RATIS-883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

runzhiwang updated RATIS-883:
-
Description: 
*What's the problem ?*
 !screenshot-1.png! 

*What's the reason ?*
The reason is follower update commitInfoCache after leader.

The stack of follower update commitInfoCache is: 
RaftServerImpl::appendEntriesAsync-> state.updateStateMachine -> 
StateMachineUpdater::applyLog -> RaftServerImpl::applyLogToStateMachine -> 
RaftServerImpl::replyPendingRequest -> RaftServerImpl::getCommitInfos -> 
infos.add(commitInfoCache.update(getPeer(), 
state.getLog().getLastCommittedIndex())) -> CommitInfoProto::update.

Leader update 

  was:
*What's the problem ?*
 !screenshot-1.png! 

*What's the reason ?*
The reason is follower update commitInfoCache after leader.

Follower update commitInfoCache when finish appendEntry and then 
state.updateStateMachine -> StateMachineUpdater::applyLog -> 
RaftServerImpl::applyLogToStateMachine -> RaftServerImpl::replyPendingRequest 
-> RaftServerImpl::getCommitInfos -> 
infos.add(commitInfoCache.update(getPeer(), 
state.getLog().getLastCommittedIndex())) -> CommitInfoProto::update.


> Failed UT: testStateMachineMetrics.checkFollowerCommitLagsLeader
> 
>
> Key: RATIS-883
> URL: https://issues.apache.org/jira/browse/RATIS-883
> Project: Ratis
>  Issue Type: Bug
>Reporter: runzhiwang
>Assignee: runzhiwang
>Priority: Major
> Attachments: screenshot-1.png
>
>
> *What's the problem ?*
>  !screenshot-1.png! 
> *What's the reason ?*
> The reason is follower update commitInfoCache after leader.
> The stack of follower update commitInfoCache is: 
> RaftServerImpl::appendEntriesAsync-> state.updateStateMachine -> 
> StateMachineUpdater::applyLog -> RaftServerImpl::applyLogToStateMachine -> 
> RaftServerImpl::replyPendingRequest -> RaftServerImpl::getCommitInfos -> 
> infos.add(commitInfoCache.update(getPeer(), 
> state.getLog().getLastCommittedIndex())) -> CommitInfoProto::update.
> Leader update 





[jira] [Updated] (RATIS-883) Failed UT: testStateMachineMetrics.checkFollowerCommitLagsLeader

2020-04-25 Thread runzhiwang (Jira)


 [ 
https://issues.apache.org/jira/browse/RATIS-883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

runzhiwang updated RATIS-883:
-
Description: 
*What's the problem ?*
 !screenshot-1.png! 

*What's the reason ?*
The reason is follower update commitInfoCache after leader.

Follower update commitInfoCache when finish appendEntry and then 
state.updateStateMachine -> StateMachineUpdater::applyLog -> 
RaftServerImpl::applyLogToStateMachine -> RaftServerImpl::replyPendingRequest 
-> RaftServerImpl::getCommitInfos -> 
infos.add(commitInfoCache.update(getPeer(), 
state.getLog().getLastCommittedIndex())) -> CommitInfoProto::update.

  was:
*What's the problem ?*
 !screenshot-1.png! 

*What's the reason ?*
The reason is follower update commitInfoCache after leader.


> Failed UT: testStateMachineMetrics.checkFollowerCommitLagsLeader
> 
>
> Key: RATIS-883
> URL: https://issues.apache.org/jira/browse/RATIS-883
> Project: Ratis
>  Issue Type: Bug
>Reporter: runzhiwang
>Assignee: runzhiwang
>Priority: Major
> Attachments: screenshot-1.png
>
>
> *What's the problem ?*
>  !screenshot-1.png! 
> *What's the reason ?*
> The reason is follower update commitInfoCache after leader.
> Follower update commitInfoCache when finish appendEntry and then 
> state.updateStateMachine -> StateMachineUpdater::applyLog -> 
> RaftServerImpl::applyLogToStateMachine -> RaftServerImpl::replyPendingRequest 
> -> RaftServerImpl::getCommitInfos -> 
> infos.add(commitInfoCache.update(getPeer(), 
> state.getLog().getLastCommittedIndex())) -> CommitInfoProto::update.





[jira] [Updated] (RATIS-883) Failed UT: testStateMachineMetrics.checkFollowerCommitLagsLeader

2020-04-25 Thread runzhiwang (Jira)


 [ 
https://issues.apache.org/jira/browse/RATIS-883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

runzhiwang updated RATIS-883:
-
Description: 
*What's the problem ?*
 !screenshot-1.png! 

*What's the reason ?*
The reason is follower update commitInfoCache after leader.

  was:
*What's the problem ?*
 !screenshot-1.png! 

*What's the reason ?*


> Failed UT: testStateMachineMetrics.checkFollowerCommitLagsLeader
> 
>
> Key: RATIS-883
> URL: https://issues.apache.org/jira/browse/RATIS-883
> Project: Ratis
>  Issue Type: Bug
>Reporter: runzhiwang
>Assignee: runzhiwang
>Priority: Major
> Attachments: screenshot-1.png
>
>
> *What's the problem ?*
>  !screenshot-1.png! 
> *What's the reason ?*
> The reason is follower update commitInfoCache after leader.





[jira] [Updated] (RATIS-883) Failed UT: testStateMachineMetrics.checkFollowerCommitLagsLeader

2020-04-25 Thread runzhiwang (Jira)


 [ 
https://issues.apache.org/jira/browse/RATIS-883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

runzhiwang updated RATIS-883:
-
Attachment: screenshot-1.png

> Failed UT: testStateMachineMetrics.checkFollowerCommitLagsLeader
> 
>
> Key: RATIS-883
> URL: https://issues.apache.org/jira/browse/RATIS-883
> Project: Ratis
>  Issue Type: Bug
>Reporter: runzhiwang
>Assignee: runzhiwang
>Priority: Major
> Attachments: screenshot-1.png
>
>






[jira] [Updated] (RATIS-883) Failed UT: testStateMachineMetrics.checkFollowerCommitLagsLeader

2020-04-25 Thread runzhiwang (Jira)


 [ 
https://issues.apache.org/jira/browse/RATIS-883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

runzhiwang updated RATIS-883:
-
Description: 
*What's the problem ?*
 !screenshot-1.png! 

*What's the reason ?*

> Failed UT: testStateMachineMetrics.checkFollowerCommitLagsLeader
> 
>
> Key: RATIS-883
> URL: https://issues.apache.org/jira/browse/RATIS-883
> Project: Ratis
>  Issue Type: Bug
>Reporter: runzhiwang
>Assignee: runzhiwang
>Priority: Major
> Attachments: screenshot-1.png
>
>
> *What's the problem ?*
>  !screenshot-1.png! 
> *What's the reason ?*





[jira] [Created] (RATIS-883) Failed UT: testStateMachineMetrics.checkFollowerCommitLagsLeader

2020-04-25 Thread runzhiwang (Jira)
runzhiwang created RATIS-883:


 Summary: Failed UT: 
testStateMachineMetrics.checkFollowerCommitLagsLeader
 Key: RATIS-883
 URL: https://issues.apache.org/jira/browse/RATIS-883
 Project: Ratis
  Issue Type: Bug
Reporter: runzhiwang
Assignee: runzhiwang








[jira] [Updated] (RATIS-857) Thread unsafe RaftServerMetrics::metricsMap HashMap in multi thread

2020-04-25 Thread runzhiwang (Jira)


 [ 
https://issues.apache.org/jira/browse/RATIS-857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

runzhiwang updated RATIS-857:
-
Attachment: RATIS-881.001.patch

> Thread unsafe RaftServerMetrics::metricsMap HashMap in multi thread
> ---
>
> Key: RATIS-857
> URL: https://issues.apache.org/jira/browse/RATIS-857
> Project: Ratis
>  Issue Type: Bug
>  Components: metrics
>Reporter: runzhiwang
>Assignee: runzhiwang
>Priority: Major
> Fix For: 0.6.0
>
> Attachments: RATIS-857.001.patch, RATIS-881.001.patch
>
>
> *What's the problem ?*
> The {color:#DE350B}static{color} variable 
> [RaftServerMetrics::metricsMap|https://github.com/apache/incubator-ratis/blob/master/ratis-server/src/main/java/org/apache/ratis/server/impl/RaftServerMetrics.java#L71]
>  is a HashMap, which is not thread-safe. However, entries are 
> [put|https://github.com/apache/incubator-ratis/blob/master/ratis-server/src/main/java/org/apache/ratis/server/impl/RaftServerMetrics.java#L76]
>  into metricsMap by different threads, whenever a RaftServerImpl 
> instance is created.
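
The race described above can be avoided by backing the static map with a concurrent implementation. The following is a minimal sketch of that idea, not the content of the attached patch: the class `MetricsRegistrySketch`, its methods, and the use of `Object` as the metrics value are hypothetical stand-ins for illustration only.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for a static metrics registry; not the actual Ratis class.
class MetricsRegistrySketch {
  // ConcurrentHashMap allows concurrent registration without external locking.
  private static final Map<String, Object> METRICS_MAP = new ConcurrentHashMap<>();

  // Returns the metrics object for serverId, creating it at most once.
  static Object getOrCreate(String serverId) {
    return METRICS_MAP.computeIfAbsent(serverId, id -> new Object());
  }

  // Registers the same `servers` ids from `threads` threads and returns the map size.
  static int registerConcurrently(int threads, int servers) {
    METRICS_MAP.clear();
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    CountDownLatch start = new CountDownLatch(1);
    for (int t = 0; t < threads; t++) {
      pool.execute(() -> {
        try {
          start.await(); // release all workers at once to maximize contention
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
          return;
        }
        for (int i = 0; i < servers; i++) {
          getOrCreate("server-" + i);
        }
      });
    }
    start.countDown();
    pool.shutdown();
    try {
      if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
        throw new IllegalStateException("workers did not finish in time");
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("interrupted while waiting", e);
    }
    return METRICS_MAP.size();
  }

  public static void main(String[] args) {
    System.out.println(registerConcurrently(8, 1000));
  }
}
```

With a plain HashMap the same workload may lose entries or corrupt the table. `computeIfAbsent` on a ConcurrentHashMap is atomic per key, so each id is registered exactly once.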





[jira] [Updated] (RATIS-881) Failed unit test because test before MiniRaftCluster ready

2020-04-25 Thread runzhiwang (Jira)


 [ 
https://issues.apache.org/jira/browse/RATIS-881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

runzhiwang updated RATIS-881:
-
Attachment: RATIS-881.001.patch

> Failed unit test because test before MiniRaftCluster ready
> --
>
> Key: RATIS-881
> URL: https://issues.apache.org/jira/browse/RATIS-881
> Project: Ratis
>  Issue Type: Bug
>Reporter: runzhiwang
>Assignee: runzhiwang
>Priority: Major
> Attachments: RATIS-881.001.patch, screenshot-1.png
>
>
> For the failed 
> [TestRaftWithGrpc::testStateMachineMetrics|https://builds.apache.org/job/PreCommit-RATIS-Build/1305/testReport/org.apache.ratis.grpc/TestRaftWithGrpc/testStateMachineMetrics/],
>  the reason is that 
> [RaftServerMetrics::getPeerCommitIndexGauge|https://github.com/apache/incubator-ratis/blob/master/ratis-server/src/main/java/org/apache/ratis/server/impl/RaftServerMetrics.java#L141]
>  happens before 
> [RaftServerMetrics::addPeerCommitIndexGauge|https://github.com/apache/incubator-ratis/blob/master/ratis-server/src/main/java/org/apache/ratis/server/impl/RaftServerMetrics.java#L122].
>   
> When a RaftServerImpl calls [setRole(RaftPeerRole.LEADER, 
> "changeToLeader")|https://github.com/apache/incubator-ratis/blob/master/ratis-server/src/main/java/org/apache/ratis/server/impl/RaftServerImpl.java#L345],
>  the statement 
> [waitForLeader|https://github.com/apache/incubator-ratis/blob/master/ratis-server/src/test/java/org/apache/ratis/RaftBasicTests.java#L446]
>  succeeds in getting the leader and the test begins, but 
> [role.startLeaderState|https://github.com/apache/incubator-ratis/blob/master/ratis-server/src/main/java/org/apache/ratis/server/impl/RaftServerImpl.java#L349]
>  ->
>  [new 
> LeaderState|https://github.com/apache/incubator-ratis/blob/master/ratis-server/src/main/java/org/apache/ratis/server/impl/RoleInfo.java#L94]
>  ->
> [LeaderState::addSenders|https://github.com/apache/incubator-ratis/blob/master/ratis-server/src/main/java/org/apache/ratis/server/impl/LeaderState.java#L409]->[RaftServerMetrics::addFollower|https://github.com/apache/incubator-ratis/blob/master/ratis-server/src/main/java/org/apache/ratis/server/impl/RaftServerMetrics.java#L106]
>  -> 
> [RaftServerMetrics::addPeerCommitIndexGauge|https://github.com/apache/incubator-ratis/blob/master/ratis-server/src/main/java/org/apache/ratis/server/impl/RaftServerMetrics.java#L122]
>  has not finished.
> !screenshot-1.png! 
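
One way a test can tolerate this ordering is to poll for the gauge with a timeout instead of reading it immediately after the leader is found. The following is a minimal sketch under that assumption, not the content of the attached patch: `GaugeWaitSketch`, `waitFor`, and the `peerCommitIndex` key are hypothetical names, and the background thread only simulates leader initialization finishing after the role change is visible.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Hypothetical test helper; not part of Ratis.
class GaugeWaitSketch {
  // Polls `lookup` until it yields a value or `timeoutMs` elapses.
  static <T> T waitFor(Supplier<Optional<T>> lookup, long timeoutMs, long sleepMs) {
    final long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
    while (true) {
      Optional<T> value = lookup.get();
      if (value.isPresent()) {
        return value.get();
      }
      if (System.nanoTime() - deadline >= 0) {
        throw new IllegalStateException("value not available after " + timeoutMs + " ms");
      }
      try {
        Thread.sleep(sleepMs);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new IllegalStateException("interrupted while waiting", e);
      }
    }
  }

  // Simulates a leader that registers its gauge shortly after the role change is visible.
  static long demo() {
    Map<String, Long> gauges = new ConcurrentHashMap<>();
    Thread leaderInit = new Thread(() -> {
      try {
        Thread.sleep(50); // leader initialization is still in progress
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return;
      }
      gauges.put("peerCommitIndex", 42L);
    });
    leaderInit.start();
    // The "test" waits for the gauge instead of assuming it already exists.
    return GaugeWaitSketch.<Long>waitFor(
        () -> Optional.ofNullable(gauges.get("peerCommitIndex")), 5000, 10);
  }

  public static void main(String[] args) {
    System.out.println(demo());
  }
}
```

Polling keeps the test independent of how long leader initialization takes, at the cost of a bounded delay when the gauge is genuinely missing.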





[jira] [Updated] (RATIS-840) Memory leak of LogAppender

2020-04-25 Thread Marton Elek (Jira)


 [ 
https://issues.apache.org/jira/browse/RATIS-840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marton Elek updated RATIS-840:
--
Priority: Blocker  (was: Critical)

> Memory leak of LogAppender
> --
>
> Key: RATIS-840
> URL: https://issues.apache.org/jira/browse/RATIS-840
> Project: Ratis
>  Issue Type: Bug
>  Components: server
>Reporter: runzhiwang
>Assignee: runzhiwang
>Priority: Blocker
> Attachments: RATIS-840.001.patch, RATIS-840.002.patch, 
> RATIS-840.003.patch, image-2020-04-06-14-27-28-485.png, 
> image-2020-04-06-14-27-39-582.png, screenshot-1.png, screenshot-2.png
>
>
> *What's the problem ?*
>  After running hadoop-ozone for 4 days, the datanode leaked memory. A heap 
> dump showed 460710 instances of GrpcLogAppender, but only 6 instances of 
> SenderList, each containing 1-2 instances of GrpcLogAppender. There are also 
> many log entries related to 
> [LeaderState::restartSender|https://github.com/apache/incubator-ratis/blob/master/ratis-server/src/main/java/org/apache/ratis/server/impl/LeaderState.java#L428].
>  {code:java}INFO impl.RaftServerImpl: 
> 1665f5ea-ab17-4a0e-af6d-6958efd322fa@group-F64B465F37B5-LeaderState: 
> Restarting GrpcLogAppender for 
> 1665f5ea-ab17-4a0e-af6d-6958efd322fa@group-F64B465F37B5-\u003e229cbcc1-a3b2-4383-9c0d-c0f4c28c3d4a\n","stream":"stderr","time":"2020-04-06T03:59:53.37892512Z"}{code}
>  
>  So many GrpcLogAppender instances did not stop their Daemon thread when 
> they were removed from senders. 
>  !image-2020-04-06-14-27-28-485.png! 
>  !image-2020-04-06-14-27-39-582.png! 
>  
> *Why is 
> [LeaderState::restartSender|https://github.com/apache/incubator-ratis/blob/master/ratis-server/src/main/java/org/apache/ratis/server/impl/LeaderState.java#L428]
>  called so many times?*
> 1. As the image shows, when a group is removed, the SegmentedRaftLog is closed, and 
> GrpcLogAppender throws an exception when it finds the SegmentedRaftLog closed. 
> The GrpcLogAppender is then 
> [restarted|https://github.com/apache/incubator-ratis/blob/master/ratis-server/src/main/java/org/apache/ratis/server/impl/LogAppender.java#L94],
>  and the new GrpcLogAppender throws an exception again when it finds the 
> SegmentedRaftLog closed, so it is restarted again, and so on. 
> The result is an infinite restart loop of GrpcLogAppender.
> 2. Actually, when a group is removed, GrpcLogAppender is stopped first: 
> RaftServerImpl::shutdown -> 
> [RoleInfo::shutdownLeaderState|https://github.com/apache/incubator-ratis/blob/master/ratis-server/src/main/java/org/apache/ratis/server/impl/RaftServerImpl.java#L266]
>  -> LeaderState::stop -> LogAppender::stopAppender, and then SegmentedRaftLog 
> is closed: RaftServerImpl::shutdown -> 
> [ServerState:close|https://github.com/apache/incubator-ratis/blob/master/ratis-server/src/main/java/org/apache/ratis/server/impl/RaftServerImpl.java#L271]
>  ... . Although RoleInfo::shutdownLeaderState is called before ServerState:close, 
> the GrpcLogAppender is stopped asynchronously. So the infinite restart of 
> GrpcLogAppender happens when GrpcLogAppender stops after SegmentedRaftLog 
> is closed.
>  !screenshot-1.png! 
> *Why did GrpcLogAppender not stop the Daemon thread when removed from 
> senders?*
>  I found many GrpcLogAppender instances blocked inside logs4j. I think 
> GrpcLogAppender restarts too quickly and then blocks in logs4j.
>  !screenshot-2.png! 
> *Can the new GrpcLogAppender work normally?*
> 1. Even without the above problem, the newly created GrpcLogAppender 
> still cannot work normally. 
> 2. When a new GrpcLogAppender is created, a new FollowerInfo is also created: 
> LeaderState::addAndStartSenders -> 
> LeaderState::addSenders -> RaftServerImpl::newLogAppender -> [new 
> FollowerInfo|https://github.com/apache/incubator-ratis/blob/master/ratis-server/src/main/java/org/apache/ratis/server/impl/RaftServerImpl.java#L129]
> 3. When the newly created GrpcLogAppender appends an entry to the follower, the 
> follower responds with SUCCESS.
> 4. Then LeaderState::updateCommit -> [LeaderState::getMajorityMin | 
> https://github.com/apache/incubator-ratis/blob/master/ratis-server/src/main/java/org/apache/ratis/server/impl/LeaderState.java#L599]
>  -> 
> [voterLists.get(0) | 
> https://github.com/apache/incubator-ratis/blob/master/ratis-server/src/main/java/org/apache/ratis/server/impl/LeaderState.java#L607].
>  {color:#DE350B}The error happens because voterLists.get(0) returns the 
> FollowerInfo of the old GrpcLogAppender, not the FollowerInfo of the new 
> GrpcLogAppender. {color}
> 5. The majority commit obtained from the FollowerInfo of the old 
> GrpcLogAppender never changes, so even though the follower has appended the entry 
> successfully, the leader cannot update the commit. The newly created 
> GrpcLogAppender can therefore never work normally.
> 6. The reason of unit test of 
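
The restart loop described in the RATIS-840 description above can be illustrated with a guard that restarts an appender only while a restart can still succeed. The following is a minimal sketch of that idea, not the content of the attached patches: `RestartGuardSketch` and all of its members are hypothetical, and the closed log and transient failures are simulated with simple flags.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical model of an appender loop; not the actual GrpcLogAppender.
class RestartGuardSketch {
  private final AtomicBoolean logClosed = new AtomicBoolean();
  private final AtomicInteger transientFailures = new AtomicInteger();
  private final AtomicInteger restarts = new AtomicInteger();

  // Simulates the log being closed, for example when a group is removed.
  void closeLog() {
    logClosed.set(true);
  }

  // Makes the next n append attempts fail with a recoverable error.
  void injectTransientFailures(int n) {
    transientFailures.set(n);
  }

  // One append attempt: fails permanently once the log is closed.
  private void appendOnce() {
    if (logClosed.get()) {
      throw new IllegalStateException("log is closed");
    }
    if (transientFailures.getAndUpdate(n -> n > 0 ? n - 1 : 0) > 0) {
      throw new RuntimeException("transient failure");
    }
  }

  // Runs up to maxAttempts appends and returns the total number of restarts so far.
  int run(int maxAttempts) {
    for (int i = 0; i < maxAttempts; i++) {
      try {
        appendOnce();
      } catch (RuntimeException e) {
        if (logClosed.get()) {
          break; // restarting cannot help once the log is closed
        }
        restarts.incrementAndGet(); // recoverable failure: restart and retry
      }
    }
    return restarts.get();
  }

  public static void main(String[] args) {
    RestartGuardSketch appender = new RestartGuardSketch();
    appender.injectTransientFailures(3);
    System.out.println(appender.run(10)); // restarts caused by transient failures
    appender.closeLog();
    System.out.println(appender.run(10)); // no further restarts after close
  }
}
```

Without the `logClosed` check in the catch block, every attempt after the close would count as another restart, which mirrors the unbounded growth of appender instances reported above.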

[jira] [Resolved] (RATIS-880) Update github description and disable merge options apart from Squash and merge

2020-04-25 Thread Marton Elek (Jira)


 [ 
https://issues.apache.org/jira/browse/RATIS-880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marton Elek resolved RATIS-880.
---
Fix Version/s: 0.6.0
   Resolution: Fixed

> Update github description and disable merge options apart from Squash and 
> merge
> ---
>
> Key: RATIS-880
> URL: https://issues.apache.org/jira/browse/RATIS-880
> Project: Ratis
>  Issue Type: Bug
>Reporter: Mukul Kumar Singh
>Assignee: Mukul Kumar Singh
>Priority: Major
> Fix For: 0.6.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Update github description and disable merge options apart from Squash and 
> merge


