[ 
https://issues.apache.org/jira/browse/RATIS-2278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17943815#comment-17943815
 ] 

Xinhao GU edited comment on RATIS-2278 at 4/12/25 1:40 PM:
-----------------------------------------------------------

!image-2025-04-12-21-39-10-485.png!

Although we changed the the Preconditions to validate startIndex, the ERROR 
appears again. So I think the  the Preconditions should be removed completely. 
[~szetszwo] 
{code:java}
--- 
a/ratis-server/src/main/java/org/apache/ratis/server/impl/ServerImplUtils.java
+++ 
b/ratis-server/src/main/java/org/apache/ratis/server/impl/ServerImplUtils.java
@@ -138,13 +138,6 @@ public final class ServerImplUtils {
 
     synchronized void append(List<ConsecutiveIndices> entriesTermIndices) {
       for(ConsecutiveIndices indices : entriesTermIndices) {
-        // validate startIndex
-        final Map.Entry<Long, ConsecutiveIndices> lastEntry = map.lastEntry();
-        if (lastEntry != null) {
-          final long nextIndex = lastEntry.getValue().getNextIndex();
-          Preconditions.assertTrue(indices.startIndex >= nextIndex,
-                  () -> "startIndex = " + indices.startIndex + " < nextIndex = 
" + nextIndex);
-        }
         map.put(indices.startIndex, indices);
       }
     } {code}
 

 

 


was (Author: JIRAUSER308993):
!image-2025-04-12-21-39-10-485.png!
{code:java}
 {code}
Although we changed the the Preconditions to validate startIndex, the ERROR 
appears again. So I think the  the Preconditions should be removed completely. 
[~szetszwo] 
{code:java}
--- 
a/ratis-server/src/main/java/org/apache/ratis/server/impl/ServerImplUtils.java
+++ 
b/ratis-server/src/main/java/org/apache/ratis/server/impl/ServerImplUtils.java
@@ -138,13 +138,6 @@ public final class ServerImplUtils {
 
     synchronized void append(List<ConsecutiveIndices> entriesTermIndices) {
       for(ConsecutiveIndices indices : entriesTermIndices) {
-        // validate startIndex
-        final Map.Entry<Long, ConsecutiveIndices> lastEntry = map.lastEntry();
-        if (lastEntry != null) {
-          final long nextIndex = lastEntry.getValue().getNextIndex();
-          Preconditions.assertTrue(indices.startIndex >= nextIndex,
-                  () -> "startIndex = " + indices.startIndex + " < nextIndex = 
" + nextIndex);
-        }
         map.put(indices.startIndex, indices);
       }
     } {code}
 

 

 

> Follower Fails to Append Entries Due to Index Validation Race Condition in 
> NavigableIndices
> -------------------------------------------------------------------------------------------
>
>                 Key: RATIS-2278
>                 URL: https://issues.apache.org/jira/browse/RATIS-2278
>             Project: Ratis
>          Issue Type: Bug
>          Components: Leader
>            Reporter: Xinhao GU
>            Assignee: Xinhao GU
>            Priority: Major
>             Fix For: 3.2.0
>
>         Attachments: image-2025-04-06-22-56-34-207.png, 
> image-2025-04-12-21-39-10-485.png
>
>          Time Spent: 40m
>  Remaining Estimate: 0h
>
> h2. *Problem Description*
> When a follower receives overlapping {{appendEntries}} requests from the 
> leader, it may throw an {{IllegalStateException}} during log index 
> validation. Specifically, the {{NavigableIndices}} class fails to handle 
> concurrent log appends properly, causing the follower to incorrectly reject 
> valid entries with an "index mismatch" error.
> h2. *Steps to Reproduce*
>  # {*}Cluster Setup{*}: Deploy a Raft group with at least 1 leader and 2 
> followers.
>  # {*}High Concurrency Scenario{*}: Trigger frequent {{appendEntries}} 
> requests from the leader to a follower.
>  # {*}Overlapping Requests{*}: Ensure two threads ({{{}thread-1{}}} and 
> {{{}thread-2{}}}) process {{appendEntries}} requests simultaneously:
>  ** {{thread-1}} appends logs with indices {{16093–16185}} and enters 
> {{state.getLog().append()}} but hasn’t invoked {{{}removeExisting(){}}}.
>  ** {{thread-2}} attempts to append logs starting at index {{16093}} (from a 
> new request) while {{thread-1}} is still processing.
>  # {*}Validation Failure{*}: The {{NavigableIndices.append()}} method detects 
> a gap between {{{}thread-1{}}}'s last index ({{{}16185{}}}) and 
> {{{}thread-2{}}}'s start index ({{{}16093{}}}), throwing an error.
> h2. *Expected Behavior*
> Concurrent {{appendEntries}} requests should be serialized or validated 
> atomically to prevent index gaps. Followers should accept logs as long as 
> they pass the initial {{ConsecutiveIndices.convert()}} check.
> h2. *Actual Behavior*
>  
> {code:java}
> 2025-03-04 05:44:14,946 [7-server-thread2] ERROR 
> o.a.r.s.i.RaftServerImpl:1488 - 7@group-000200000000: Failed appendEntries 
> 8->7#73-t49,previous=(t:3, i:16092),leaderCommit=16184,initializing? 
> false,entries: size=93, first=(t:4, i:16093), 
> CONFIGURATIONENTRY(current:id:"6"address:"172.16.2.14:10750"startupRole:FOLLOWER,
>  id:"7"address:"172.16.2.15:10750"startupRole:FOLLOWER, 
> id:"8"address:"172.16.2.11:10750"startupRole:FOLLOWER, old 
> java.lang.IllegalStateException: startIndex: expected == 16186 but computed 
> == 16093 at 
> org.apache.ratis.util.Preconditions.assertTrue(Preconditions.java:77)
> at org.apache.ratis.util.Preconditions.assertSame(Preconditions.java:87)
> at 
> org.apache.ratis.server.impl.ServerImplUtils$NavigableIndices.append(ServerImplUtils.java:144)
> at 
> org.apache.ratis.server.impl.RaftServerImpl.appendLog(RaftServerImpl.java:1631)
> at 
> org.apache.ratis.server.impl.RaftServerImpl.appendEntriesAsync(RaftServerImpl.java:1596)
> at 
> org.apache.ratis.server.impl.RaftServerImpl.appendEntriesAsync(RaftServerImpl.java:1486)
> at 
> org.apache.ratis.server.impl.RaftServerProxy.lambda$null$28(RaftServerProxy.java:646)
> at org.apache.ratis.util.JavaUtils.callAsUnchecked(JavaUtils.java:118)
> at 
> org.apache.ratis.server.impl.RaftServerImpl.lambda$executeSubmitServerRequestAsync$10(RaftServerImpl.java:906)
> at 
> java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1768)
> at 
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
> at 
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
> at java.base/java.lang.Thread.run(Thread.java:833)
> 2025-03-04 05:44:14,950 [grpc-default-executor-8] WARN 
> o.a.ratis.util.LogUtils:129 - 7: Failed APPEND_ENTRIES request 
> 8->7#73-t49,previous=(t:3, i:16092),leaderCommit=16184,initializing? 
> false,entries: size=93, first=(t:4, i:16093), 
> CONFIGURATIONENTRY(current:id:"6"address:"172.16.2.14:10750"startupRole:FOLLOWER,
>  id:"7"address:"172.16.2.15:10750"startupRole:FOLLOWER, 
> id:"8"address:"172.16.2.11:10750"startupRole:FOLLOWER, old 
> java.util.concurrent.CompletionException: java.io.IOException: 
> java.lang.IllegalStateException: startIndex: expected == 16186 but computed 
> == 16093
> at org.apache.ratis.util.JavaUtils.callAsUnchecked(JavaUtils.java:122)
> at 
> org.apache.ratis.server.impl.RaftServerImpl.lambda$executeSubmitServerRequestAsync$10(RaftServerImpl.java:906)
> at 
> java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1768)
> at 
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
> at 
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
> at java.base/java.lang.Thread.run(Thread.java:833)
> Caused by: java.io.IOException: java.lang.IllegalStateException: startIndex: 
> expected == 16186 but computed == 16093
> at org.apache.ratis.util.IOUtils.asIOException(IOUtils.java:56)
> at 
> org.apache.ratis.server.impl.RaftServerImpl.appendEntriesAsync(RaftServerImpl.java:1490)
> at 
> org.apache.ratis.server.impl.RaftServerProxy.lambda$null$28(RaftServerProxy.java:646)
> at org.apache.ratis.util.JavaUtils.callAsUnchecked(JavaUtils.java:118)
> ... 5 common frames omitted
> Caused by: java.lang.IllegalStateException: startIndex: expected == 16186 but 
> computed == 16093
> at org.apache.ratis.util.Preconditions.assertTrue(Preconditions.java:77)
> at org.apache.ratis.util.Preconditions.assertSame(Preconditions.java:87)
> at 
> org.apache.ratis.server.impl.ServerImplUtils$NavigableIndices.append(ServerImplUtils.java:144)
> at 
> org.apache.ratis.server.impl.RaftServerImpl.appendLog(RaftServerImpl.java:1631)
> at 
> org.apache.ratis.server.impl.RaftServerImpl.appendEntriesAsync(RaftServerImpl.java:1596)
> at 
> org.apache.ratis.server.impl.RaftServerImpl.appendEntriesAsync(RaftServerImpl.java:1486)
> ... 7 common frames omitted*{code}
> *!image-2025-04-06-22-56-34-207.png|width=1143,height=288!*
>  
> h2. *Root Cause Analysis*
>  # {*}Thread Race Condition{*}:
> The {{NavigableIndices}} class uses {{synchronized}} methods for {{append()}} 
> and {{{}removeExisting(){}}}, but *inter-method concurrency* is not handled. 
> If two threads interleave execution (e.g., {{thread-1}} appends logs but 
> hasn’t cleaned up {{map}} via {{{}removeExisting(){}}}, while {{thread-2}} 
> starts a new append), the index continuity check in {{append()}} fails 
> erroneously.
>  # {*}Redundant Validation{*}:
> The {{append()}} method revalidates log continuity using the {{{}map{}}}’s 
> last entry, even though {{ConsecutiveIndices.convert()}} already ensures 
> intra-request index continuity. This creates unnecessary 
> concurrency-sensitive checks.
> h2. *Proposed Solutions*
> h3. {*}Option 1{*}: Remove Redundant Validation in {{append()}}
> Modify {{NavigableIndices.append()}} to skip the startIndex check, relying 
> solely on {{convert()}} for validation.
> {*}Justification{*}:
>  * {{convert()}} already ensures all entries in a single request are 
> consecutive.
>  * {{removeExisting()}} guarantees cleanup after appending, making 
> cross-request checks redundant and race-prone.
> h3. {*}Option 2{*}: Global Lock for {{NavigableIndices'map}}
> Add a global lock for map to serialize all {{append()}} and 
> {{removeExisting()}} operations. However, this could introduce performance 
> bottlenecks.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to