Re: [PR] HDDS-13281. Disable Ratis metadata write to Raft Log on OM & SCM. [ozone]

via GitHub Mon, 30 Jun 2025 15:43:54 -0700


szetszwo commented on code in PR #8637:
URL: https://github.com/apache/ozone/pull/8637#discussion_r2176032836



##########
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OmSnapshotManager.java:
##########
@@ -905,7 +905,37 @@ public void snapshotLimitCheck() throws IOException, 
OMException {
   }
 
   public void decrementInFlightSnapshotCount() {
-    inFlightSnapshotCount.decrementAndGet();
+    /*
+     * There is a race condition here because of which
+     * inFlightSnapshotCount could become negative sometimes.
+     *
+     * When we get a call back from Ratis on notifyLeaderReady,
+     * we assume that all the pending transactions (from RaftLog)
+     * are applied to the SateMachine and we go a head and reset the

Review Comment:
   This assumption is incorrect. -- LeaderReady means that the leader is able 
to commit a new transaction.  Ratis should not care about how the StateMachine 
applies the transactions.



##########
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OmSnapshotManager.java:
##########
@@ -905,7 +905,37 @@ public void snapshotLimitCheck() throws IOException, 
OMException {
   }
 
   public void decrementInFlightSnapshotCount() {
-    inFlightSnapshotCount.decrementAndGet();
+    /*
+     * There is a race condition here because of which
+     * inFlightSnapshotCount could become negative sometimes.
+     *
+     * When we get a call back from Ratis on notifyLeaderReady,
+     * we assume that all the pending transactions (from RaftLog)
+     * are applied to the SateMachine and we go a head and reset the
+     * inFlightSnapshotCount to 0. The expectation here is that after
+     * the leader election we want to start inFlightSnapshotCount from 0.
+     *
+     * The applyTransaction in OzoneManagerStateMachine processes
+     * the calls in an async manner and returns a CompletableFuture,
+     * the transactions are not yet fully processed by the
+     * OzoneManagerStateMachine when Ratis notifies leader ready. (RATIS-2313)
+     *
+     * Because of the async processing in OzoneManagerStateMachine and
+     * Ratis not waiting for the CompletableFuture to complete, we mark the
+     * leader as ready (reset inFlightSnapshotCount to 0) even before we 
complete
+     * processing all the pending transactions from the old term. If there is a
+     * create snapshot transaction in the Raft Log from old term and it gets
+     * processed after we reset inFlightSnapshotCount, the count would become 
-1
+     * as we decrement the inFlightSnapshotCount when the 
OMSnapshotCreateRequest
+     * processing is completed.
+     *
+     * The workaround here is to make sure that we don't make 
inFlightSnapshotCount
+     * negative. We should be able to remove the workaround after RATIS-2313.

Review Comment:
   As mentioned in [this 
comment](https://issues.apache.org/jira/browse/RATIS-2313?focusedCommentId=17987031&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17987031),
 we should not change Ratis for fixing the `inFlightSnapshotCount` bug.
   
   BTW, I can see that `inFlightSnapshotCount` may have other bugs such as the 
request could be disallowed by PrepareState:
   
https://github.com/apache/ozone/blob/18845e0954e2b7c237d28bd65ed394f11f434fe2/hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/ratis/OzoneManagerRatisServer.java#L260
   It seems that the count won't  be decremented in that case.  The problem is 
the account logic of `inFlightSnapshotCount` has flaw.



##########
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OmSnapshotManager.java:
##########
@@ -905,7 +905,37 @@ public void snapshotLimitCheck() throws IOException, 
OMException {
   }
 
   public void decrementInFlightSnapshotCount() {
-    inFlightSnapshotCount.decrementAndGet();
+    /*
+     * There is a race condition here because of which
+     * inFlightSnapshotCount could become negative sometimes.
+     *
+     * When we get a call back from Ratis on notifyLeaderReady,
+     * we assume that all the pending transactions (from RaftLog)
+     * are applied to the SateMachine and we go a head and reset the
+     * inFlightSnapshotCount to 0. The expectation here is that after

Review Comment:
   We should fix the accounting logic of `inFlightSnapshotCount`.
   - When a request is submitted, increment the count;
   - When the same request is completed/falied/cancelled, decrement the count.
   
   It should not rely on an external condition to reset the count.
   



##########
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OmSnapshotManager.java:
##########
@@ -905,7 +905,37 @@ public void snapshotLimitCheck() throws IOException, 
OMException {
   }
 
   public void decrementInFlightSnapshotCount() {
-    inFlightSnapshotCount.decrementAndGet();
+    /*
+     * There is a race condition here because of which
+     * inFlightSnapshotCount could become negative sometimes.
+     *
+     * When we get a call back from Ratis on notifyLeaderReady,
+     * we assume that all the pending transactions (from RaftLog)
+     * are applied to the SateMachine and we go a head and reset the
+     * inFlightSnapshotCount to 0. The expectation here is that after
+     * the leader election we want to start inFlightSnapshotCount from 0.
+     *
+     * The applyTransaction in OzoneManagerStateMachine processes
+     * the calls in an async manner and returns a CompletableFuture,
+     * the transactions are not yet fully processed by the
+     * OzoneManagerStateMachine when Ratis notifies leader ready. (RATIS-2313)
+     *
+     * Because of the async processing in OzoneManagerStateMachine and
+     * Ratis not waiting for the CompletableFuture to complete, we mark the
+     * leader as ready (reset inFlightSnapshotCount to 0) even before we 
complete
+     * processing all the pending transactions from the old term. If there is a
+     * create snapshot transaction in the Raft Log from old term and it gets
+     * processed after we reset inFlightSnapshotCount, the count would become 
-1
+     * as we decrement the inFlightSnapshotCount when the 
OMSnapshotCreateRequest
+     * processing is completed.
+     *
+     * The workaround here is to make sure that we don't make 
inFlightSnapshotCount
+     * negative. We should be able to remove the workaround after RATIS-2313.
+     */
+    int result = inFlightSnapshotCount.decrementAndGet();
+    if (result < 0) {
+      resetInFlightSnapshotCount();
+    }

Review Comment:
   The workaround is fine.  How about updating the comment to below?
   ```java
   // TODO this is a work around for the accounting logic of 
`inFlightSnapshotCount`.
   //    - It incorrectly assumes that LeaderReady means that there are no 
inflight snapshot requests.
   //      We may consider fixing it by waiting all the pending requests in 
notifyLeaderReady().
   //    - Also, it seems to have another bug that the PrepareState could 
disallow snapshot requests.
   //      In such case, `inFlightSnapshotCount` won't be decremented. 
   ```
   
   We should file a separate Ozone JIRA for fixing it.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org
For additional commands, e-mail: issues-h...@ozone.apache.org

Re: [PR] HDDS-13281. Disable Ratis metadata write to Raft Log on OM & SCM. [ozone]

Reply via email to