Brokenice0415 commented on code in PR #939:
URL: https://github.com/apache/ratis/pull/939#discussion_r1361478687
##########
ratis-grpc/src/main/java/org/apache/ratis/grpc/server/GrpcLogAppender.java:
##########
@@ -143,7 +143,12 @@ private void resetClient(AppendEntriesRequest request,
boolean onError) {
if (request != null && request.isHeartbeat()) {
return;
}
- getFollower().decreaseNextIndex(nextIndex);
+ // decrease next index
+ final long oldNextIndex = getFollower().getNextIndex();
+ final long matchIndex = getFollower().getMatchIndex();
+ getFollower().updateNextIndex(
+ Math.max(matchIndex + 1
+ , oldNextIndex <= 0L? oldNextIndex:
Math.min(oldNextIndex - 1, nextIndex)));
Review Comment:
@szetszwo, I agree that we should not update the next index blindly.
I wrote the patch based on how `FollowerInfoImpl` operates on the next index;
it aims to add a lower bound of `match index + 1` to the decreased next index.
I also think I should call `setNextIndex` instead of `updateNextIndex` to
ensure that the decrease actually takes effect.
```java
@Override
public void decreaseNextIndex(long newNextIndex) {
  nextIndex.updateUnconditionally(
      old -> old <= 0L ? old : Math.min(old - 1, newNextIndex),
      message -> info("decreaseNextIndex", message));
}

@Override
public void setNextIndex(long newNextIndex) {
  nextIndex.updateUnconditionally(
      old -> newNextIndex >= 0 ? newNextIndex : old,
      message -> info("setNextIndex", message));
}

@Override
public void updateNextIndex(long newNextIndex) {
  nextIndex.updateToMax(newNextIndex,
      message -> debug("updateNextIndex", message));
}
```
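To illustrate why `updateNextIndex` cannot be used for a decrease, here is a minimal sketch. The class `NextIndexModes` and its methods are hypothetical stand-ins built on `AtomicLong`, not Ratis code, and the sketch assumes that `updateToMax` keeps the larger of the old and new values, as its name suggests.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical stand-in for the index holder used by FollowerInfoImpl; not Ratis code. */
final class NextIndexModes {
  private final AtomicLong nextIndex;

  NextIndexModes(long initial) {
    this.nextIndex = new AtomicLong(initial);
  }

  /** Models updateNextIndex (assumed semantics): the value only ever moves forward. */
  long updateToMax(long newValue) {
    return nextIndex.updateAndGet(old -> Math.max(old, newValue));
  }

  /** Models setNextIndex: any non-negative value is accepted, even a smaller one. */
  long set(long newValue) {
    return nextIndex.updateAndGet(old -> newValue >= 0 ? newValue : old);
  }

  long get() {
    return nextIndex.get();
  }

  public static void main(String[] args) {
    NextIndexModes index = new NextIndexModes(5);
    System.out.println(index.updateToMax(3)); // prints 5: the smaller value is ignored
    System.out.println(index.set(3));         // prints 3: the decrease takes effect
  }
}
```

Under that assumption, a max-style update silently drops any attempt to lower the next index, which is why the unconditional setter is the better fit for the decrease path.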
The bug I described happens in the following case:
1. Leader 0 appends the no-op log entry [0] and it reaches consensus in the cluster.
2. Follower 1's next index is now 1 and its match index is 0.
3. Leader 0 sends a heartbeat to follower 1.
4. Follower 1 crashes before replying to the heartbeat.
5. Leader 0 decreases follower 1's next index to 0, the same value as the match
index, because the new next index is computed as `min(old - 1, newNextIndex)`:
```
2023-10-17 10:59:26 WARN GrpcLogAppender:122 -
1@group-6F7570313233->0-AppendLogResponseHandler: Failed appendEntries:
org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: UNAVAILABLE:
Network closed for unknown reason
2023-10-17 10:59:26 INFO FollowerInfo:51 - 1@group-6F7570313233->0:
nextIndex: updateUnconditionally 1 -> 0
```
6. The next index is now no longer greater than the match index.
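The arithmetic in the steps above can be replayed with a small sketch. `BoundedDecrease` and its methods are hypothetical helpers written for illustration, not part of Ratis, and the requested `newNextIndex` of 1 is an assumption, since the value passed in the failing run is not shown here.

```java
/** Hypothetical helper replaying the scenario; not part of Ratis. */
final class BoundedDecrease {
  /** Current rule in decreaseNextIndex: min(old - 1, newNextIndex), unless old <= 0. */
  static long decrease(long old, long newNextIndex) {
    return old <= 0L ? old : Math.min(old - 1, newNextIndex);
  }

  /** Proposed rule: the same decrease, but never below matchIndex + 1. */
  static long decreaseBounded(long old, long newNextIndex, long matchIndex) {
    return Math.max(matchIndex + 1, decrease(old, newNextIndex));
  }

  public static void main(String[] args) {
    long old = 1;       // follower 1's next index after entry [0] is committed
    long match = 0;     // follower 1's match index
    long requested = 1; // assumed value of newNextIndex

    System.out.println(decrease(old, requested));               // prints 0
    System.out.println(decreaseBounded(old, requested, match)); // prints 1
  }
}
```

With the lower bound applied, the next index stays at `match index + 1`, so the leader does not step back over an entry the follower is already known to hold.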
> since it got an error, it should try resending the same (previous log
> index + 1) even if (previous log index + 1 <= match index).

Resending the same request is always correct and only costs some redundant
requests, but it would still be better to optimize it.

> One reason could be due to inconsistency; see getNextIndexForInconsistency.

In this case, the patch behaves the same as before, provided it calls
`setNextIndex` instead of `updateNextIndex`.