szetszwo commented on code in PR #8637: URL: https://github.com/apache/ozone/pull/8637#discussion_r2176032836
########## hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OmSnapshotManager.java: ########## @@ -905,7 +905,37 @@ public void snapshotLimitCheck() throws IOException, OMException { } public void decrementInFlightSnapshotCount() { - inFlightSnapshotCount.decrementAndGet(); + /* + * There is a race condition here because of which + * inFlightSnapshotCount could become negative sometimes. + * + * When we get a call back from Ratis on notifyLeaderReady, + * we assume that all the pending transactions (from RaftLog) + * are applied to the SateMachine and we go a head and reset the Review Comment: This assumption is incorrect. -- LeaderReady means that the leader is able to commit a new transaction. Ratis should not care about how the StateMachine applies the transactions. ########## hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OmSnapshotManager.java: ########## @@ -905,7 +905,37 @@ public void snapshotLimitCheck() throws IOException, OMException { } public void decrementInFlightSnapshotCount() { - inFlightSnapshotCount.decrementAndGet(); + /* + * There is a race condition here because of which + * inFlightSnapshotCount could become negative sometimes. + * + * When we get a call back from Ratis on notifyLeaderReady, + * we assume that all the pending transactions (from RaftLog) + * are applied to the SateMachine and we go a head and reset the + * inFlightSnapshotCount to 0. The expectation here is that after + * the leader election we want to start inFlightSnapshotCount from 0. + * + * The applyTransaction in OzoneManagerStateMachine processes + * the calls in an async manner and returns a CompletableFuture, + * the transactions are not yet fully processed by the + * OzoneManagerStateMachine when Ratis notifies leader ready. (RATIS-2313) + * + * Because of the async processing in OzoneManagerStateMachine and + * Ratis not waiting for the CompletableFuture to complete, we mark the + * leader as ready (reset inFlightSnapshotCount to 0) even before we complete + * processing all the pending transactions from the old term. If there is a + * create snapshot transaction in the Raft Log from old term and it gets + * processed after we reset inFlightSnapshotCount, the count would become -1 + * as we decrement the inFlightSnapshotCount when the OMSnapshotCreateRequest + * processing is completed. + * + * The workaround here is to make sure that we don't make inFlightSnapshotCount + * negative. We should be able to remove the workaround after RATIS-2313. Review Comment: As mentioned in [this comment](https://issues.apache.org/jira/browse/RATIS-2313?focusedCommentId=17987031&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17987031), we should not change Ratis for fixing the `inFlightSnapshotCount` bug. BTW, I can see that `inFlightSnapshotCount` may have other bugs such as the request could be disallowed by PrepareState: https://github.com/apache/ozone/blob/18845e0954e2b7c237d28bd65ed394f11f434fe2/hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/ratis/OzoneManagerRatisServer.java#L260 It seems that the count won't be decremented in that case. The problem is the account logic of `inFlightSnapshotCount` has flaw. ########## hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OmSnapshotManager.java: ########## @@ -905,7 +905,37 @@ public void snapshotLimitCheck() throws IOException, OMException { } public void decrementInFlightSnapshotCount() { - inFlightSnapshotCount.decrementAndGet(); + /* + * There is a race condition here because of which + * inFlightSnapshotCount could become negative sometimes. + * + * When we get a call back from Ratis on notifyLeaderReady, + * we assume that all the pending transactions (from RaftLog) + * are applied to the SateMachine and we go a head and reset the + * inFlightSnapshotCount to 0. The expectation here is that after Review Comment: We should fix the accounting logic of `inFlightSnapshotCount`. - When a request is submitted, increment the count; - When the same request is completed/falied/cancelled, decrement the count. It should not rely on an external condition to reset the count. ########## hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OmSnapshotManager.java: ########## @@ -905,7 +905,37 @@ public void snapshotLimitCheck() throws IOException, OMException { } public void decrementInFlightSnapshotCount() { - inFlightSnapshotCount.decrementAndGet(); + /* + * There is a race condition here because of which + * inFlightSnapshotCount could become negative sometimes. + * + * When we get a call back from Ratis on notifyLeaderReady, + * we assume that all the pending transactions (from RaftLog) + * are applied to the SateMachine and we go a head and reset the + * inFlightSnapshotCount to 0. The expectation here is that after + * the leader election we want to start inFlightSnapshotCount from 0. + * + * The applyTransaction in OzoneManagerStateMachine processes + * the calls in an async manner and returns a CompletableFuture, + * the transactions are not yet fully processed by the + * OzoneManagerStateMachine when Ratis notifies leader ready. (RATIS-2313) + * + * Because of the async processing in OzoneManagerStateMachine and + * Ratis not waiting for the CompletableFuture to complete, we mark the + * leader as ready (reset inFlightSnapshotCount to 0) even before we complete + * processing all the pending transactions from the old term. If there is a + * create snapshot transaction in the Raft Log from old term and it gets + * processed after we reset inFlightSnapshotCount, the count would become -1 + * as we decrement the inFlightSnapshotCount when the OMSnapshotCreateRequest + * processing is completed. + * + * The workaround here is to make sure that we don't make inFlightSnapshotCount + * negative. We should be able to remove the workaround after RATIS-2313. + */ + int result = inFlightSnapshotCount.decrementAndGet(); + if (result < 0) { + resetInFlightSnapshotCount(); + } Review Comment: The workaround is fine. How about updating the comment to below? ```java // TODO this is a work around for the accounting logic of `inFlightSnapshotCount`. // - It incorrectly assumes that LeaderReady means that there are no inflight snapshot requests. // We may consider fixing it by waiting all the pending requests in notifyLeaderReady(). // - Also, it seems to have another bug that the PrepareState could disallow snapshot requests. // In such case, `inFlightSnapshotCount` won't be decremented. ``` We should file a separate Ozone JIRA for fixing it. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org For additional commands, e-mail: issues-h...@ozone.apache.org