szetszwo commented on code in PR #9796:
URL: https://github.com/apache/ozone/pull/9796#discussion_r2842292299


##########
hadoop-ozone/common/src/main/java/org/apache/hadoop/ozone/om/helpers/ReadConsistency.java:
##########
@@ -0,0 +1,137 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.ozone.om.helpers;
+
+import org.apache.hadoop.ozone.protocol.proto.OzoneManagerProtocolProtos.ReadConsistencyType;
+
+/**
+ * Supported read consistency. It consists of two elements
+ * <ol>
+ *   <li>
+ *     {@link ConsistencyType} which specifies different types of read consistency
+ *     (e.g. LINEARIZABLE, LOCAL_LEASE)
+ *   </li>
+ *   <li>
+ *     Whether to allow follower read. Some consistency types supports
+ *     follower read (e.g. LINEARIZABLE) while other consistency types only
+ *     supports leader read (e.g. NON_LINEARIZABLE).
+ *   </li>
+ * </ol>
+ *
+ * handles different types of read consistency (e.g. LINEARIZABLE, LOCAL_LEASE)
+ */
+public enum ReadConsistency {
+  DEFAULT(ConsistencyType.NON_LINEARIZABLE, false),
+  STALE(ConsistencyType.STALE, true),
+  LINEARIZABLE_LEADER_READ(ConsistencyType.LINEARIZABLE, false),
+  LINEARIZABLE_FOLLOWER_READ(ConsistencyType.LINEARIZABLE, true),
+  LOCAL_LEASE_FOLLOWER_READ(ConsistencyType.LOCAL_LEASE, true);

Review Comment:
   1. When LINEARIZABLE is enabled, do we really want to disallow follower read?
   2. Stale read can be supported by local lease with an infinite log lag limit and time limit, so we may remove STALE for simplicity.
   3. I think we don't need the inner enum `ConsistencyType`.  Just add a boolean field for linearizable.
   4. `isAllowFollowerRead` sounds odd.  Let's use nouns for the fields and put the verb in the method name.
   
   If the answer to 1. is "no", I suggest the code below:
   
   ```java
   public enum ReadConsistency {
     DEFAULT(false, false),
     LINEARIZABLE(true, true),
     LOCAL_LEASE(false, true);
   
     private final boolean linearizable;
     private final boolean followerRead;
   
     ReadConsistency(boolean linearizable, boolean followerRead) {
       this.linearizable = linearizable;
       this.followerRead = followerRead;
     }
   
     public boolean isLinearizable() {
       return linearizable;
     }
   
     public boolean allowFollowerRead() {
       return followerRead;
     }
   
     ...
   }
   ```
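   
   A minimal sketch of how a caller might consume the two flags, assuming the three-constant enum suggested above. `ReadConsistencySketch`, `Path`, and `route` are hypothetical names used only for illustration and are not part of the PR; the sketch also ignores server-side configuration, such as whether the Raft server enables linearizable read.
   
   ```java
   /** Illustrative only: mirrors the suggested enum; Path and route() are hypothetical. */
   final class ReadConsistencySketch {
     enum ReadConsistency {
       DEFAULT(false, false),
       LINEARIZABLE(true, true),
       LOCAL_LEASE(false, true);
   
       private final boolean linearizable;
       private final boolean followerRead;
   
       ReadConsistency(boolean linearizable, boolean followerRead) {
         this.linearizable = linearizable;
         this.followerRead = followerRead;
       }
   
       boolean isLinearizable() {
         return linearizable;
       }
   
       boolean allowFollowerRead() {
         return followerRead;
       }
     }
   
     /** Hypothetical read paths, named for this sketch only. */
     enum Path { LOCAL, READ_INDEX, FAILOVER_TO_LEADER }
   
     /** Picks a path from the two flags and from whether the serving node is the leader. */
     static Path route(ReadConsistency consistency, boolean isLeader) {
       if (!isLeader && !consistency.allowFollowerRead()) {
         // A follower cannot serve this consistency level, so the client must retry on the leader.
         return Path.FAILOVER_TO_LEADER;
       }
       // Assumption: linearizable reads go through a read-index style path, the rest are served locally.
       return consistency.isLinearizable() ? Path.READ_INDEX : Path.LOCAL;
     }
   }
   ```
   
   With this shape, `route(DEFAULT, false)` yields `FAILOVER_TO_LEADER`, while `route(LINEARIZABLE, false)` yields `READ_INDEX`, so no inner enum is needed to make the decision.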
   



##########
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/protocolPB/OzoneManagerProtocolServerSideTranslatorPB.java:
##########
@@ -216,42 +221,152 @@ public OMRequest getLastRequestToSubmit() {
 
   private OMResponse submitReadRequestToOM(OMRequest request)
       throws ServiceException {
-    // Read from leader or followers using linearizable read
-    if (ozoneManager.getConfig().isFollowerReadLocalLeaseEnabled() &&
-        allowFollowerReadLocalLease(omRatisServer.getServerDivision(),
-            ozoneManager.getConfig().getFollowerReadLocalLeaseLagLimit(),
-            ozoneManager.getConfig().getFollowerReadLocalLeaseTimeMs())) {
-      ozoneManager.getMetrics().incNumFollowerReadLocalLeaseSuccess();
+    if (request.getCmdType().equals(PrepareStatus)) {
+      // PrepareStatus is an OM request that only target a single OM node.
+      // Therefore, all PrepareStatus requests should be served immediately without failover regardless
+      // of the OM node leadership or the read consistency. See PrepareSubCommand.
+      // The implementation is not ideal, but exists for compatibility reason.
       return handler.handleReadRequest(request);
-    } 
-    // Get current OM's role
-    RaftServerStatus raftServerStatus = omRatisServer.getLeaderStatus();
-    // === 1. Follower linearizable read ===
-    if (raftServerStatus == NOT_LEADER && omRatisServer.isLinearizableRead()) {
-      ozoneManager.getMetrics().incNumLinearizableRead();
-      return ozoneManager.getOmExecutionFlow().submit(request, false);
     }
-    // === 2. Leader local read (skip ReadIndex if allowed) ===
-    if (raftServerStatus == LEADER_AND_READY || request.getCmdType().equals(PrepareStatus)) {
-      if (ozoneManager.getConfig().isAllowLeaderSkipLinearizableRead()) {
-        ozoneManager.getMetrics().incNumLeaderSkipLinearizableRead();
-        // leader directly serves local committed data
+
+    if (!request.hasReadConsistencyHint() || !request.getReadConsistencyHint().hasConsistencyType() ||
+        request.getReadConsistencyHint().getConsistencyType() == CONSISTENCY_TYPE_UNKNOWN) {
+      // Read from leader or followers using linearizable read
+      if (ozoneManager.getConfig().isFollowerReadLocalLeaseEnabled() &&
+          allowFollowerReadLocalLease(omRatisServer.getServerDivision(),
+              ozoneManager.getConfig().getFollowerReadLocalLeaseLagLimit(),
+              ozoneManager.getConfig().getFollowerReadLocalLeaseTimeMs())) {
+        ozoneManager.getMetrics().incNumFollowerReadLocalLeaseSuccess();
         return handler.handleReadRequest(request);
       }
-      // otherwise use linearizable path when enabled
-      if (omRatisServer.isLinearizableRead()) {
+      // Get current OM's role
+      RaftServerStatus raftServerStatus = omRatisServer.getLeaderStatus();
+      // === 1. Follower linearizable read ===
+      if (raftServerStatus == NOT_LEADER && omRatisServer.isLinearizableRead()) {
         ozoneManager.getMetrics().incNumLinearizableRead();
         return ozoneManager.getOmExecutionFlow().submit(request, false);
       }
+      // === 2. Leader local read (skip ReadIndex if allowed) ===
+      if (raftServerStatus == LEADER_AND_READY) {
+        if (ozoneManager.getConfig().isAllowLeaderSkipLinearizableRead()) {
+          ozoneManager.getMetrics().incNumLeaderSkipLinearizableRead();
+          // leader directly serves local committed data
+          return handler.handleReadRequest(request);
+        }
+        // otherwise use linearizable path when enabled
+        if (omRatisServer.isLinearizableRead()) {
+          ozoneManager.getMetrics().incNumLinearizableRead();
+          return ozoneManager.getOmExecutionFlow().submit(request, false);
+        }
 
-      // fallback to local read
-      return handler.handleReadRequest(request);
+        // fallback to local read
+        return handler.handleReadRequest(request);
+      } else {
+        throw createLeaderErrorException(raftServerStatus);
+      }
     } else {
-      throw createLeaderErrorException(raftServerStatus);
+      // If read consistency hint is specified, we should try to respect it although
+      // there is no guarantee since it depends on the OM node configuration (e.g.
+      // whether OM Raft server enables linearizable read).
+      ReadConsistencyHint readConsistencyHint = request.getReadConsistencyHint();
+      ReadConsistencyType consistencyType = readConsistencyHint.getConsistencyType();
+      RaftServerStatus raftServerStatus;
+      switch (consistencyType) {
+      case STALE:
+        // Serve the stale read request immediately for both leader and follower
+        ozoneManager.getMetrics().incNumStaleRead();
+        return handler.handleReadRequest(request);
+      case LOCAL_LEASE_FOLLOWER_READ:
+        raftServerStatus = omRatisServer.getLeaderStatus();
+        switch (raftServerStatus) {
+        case NOT_LEADER:
+          if (!ozoneManager.getConfig().isFollowerReadLocalLeaseEnabled()) {
+            throw createLeaderErrorException(raftServerStatus);
+          }
+          LocalLeaseContext localLeaseContext = readConsistencyHint.getLocalLeaseContext();
+          long localLeaseLagLimit = localLeaseContext.getLagLimit() > 0 ?
+              localLeaseContext.getLagLimit() : ozoneManager.getConfig().getFollowerReadLocalLeaseLagLimit();
+          long localLeaseLeaseTimeMs = localLeaseContext.getLeaseTimeMs() > 0 ?
+              localLeaseContext.getLeaseTimeMs() : ozoneManager.getConfig().getFollowerReadLocalLeaseTimeMs();
+          if (allowFollowerReadLocalLease(omRatisServer.getServerDivision(),
+              localLeaseLagLimit, localLeaseLeaseTimeMs)) {
+            ozoneManager.getMetrics().incNumFollowerReadLocalLeaseSuccess();
+            return handler.handleReadRequest(request);
+          }
+          // The LocalLease lag is too high, trigger failover
+          throw createLeaderErrorException(raftServerStatus);
+        case LEADER_AND_NOT_READY:

Review Comment:
   For LOCAL_LEASE_FOLLOWER_READ, the LEADER_AND_READY case should be treated the same as the NOT_LEADER case, i.e. just consider the node a follower.
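   
   A minimal sketch of that dispatch, assuming the local-lease check can be reduced to a boolean. The status names come from the diff above; `LocalLeaseDispatchSketch`, `Outcome`, and `dispatch` are hypothetical, and failing over for LEADER_AND_NOT_READY is an assumption, since that branch is not shown in the hunk.
   
   ```java
   /** Illustrative only: Outcome and dispatch() are hypothetical names, not part of the PR. */
   final class LocalLeaseDispatchSketch {
     enum RaftServerStatus { NOT_LEADER, LEADER_AND_NOT_READY, LEADER_AND_READY }
   
     enum Outcome { SERVE_LOCALLY, FAILOVER }
   
     static Outcome dispatch(RaftServerStatus status, boolean localLeaseEnabled,
         boolean leaseCheckPasses) {
       switch (status) {
       case NOT_LEADER:
       case LEADER_AND_READY:
         // A ready leader falls through to the follower branch: same lease check, same outcome.
         if (localLeaseEnabled && leaseCheckPasses) {
           return Outcome.SERVE_LOCALLY;
         }
         return Outcome.FAILOVER;
       case LEADER_AND_NOT_READY:
       default:
         // Assumption: a leader that is not ready cannot serve reads, so the client fails over.
         return Outcome.FAILOVER;
       }
     }
   }
   ```
   
   Sharing one branch keeps the lag and lease handling in a single place, so `dispatch(LEADER_AND_READY, true, true)` and `dispatch(NOT_LEADER, true, true)` both return `SERVE_LOCALLY`.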



##########
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/protocolPB/OzoneManagerProtocolServerSideTranslatorPB.java:
##########
@@ -216,42 +221,152 @@ public OMRequest getLastRequestToSubmit() {
 
   private OMResponse submitReadRequestToOM(OMRequest request)
       throws ServiceException {
-    // Read from leader or followers using linearizable read
-    if (ozoneManager.getConfig().isFollowerReadLocalLeaseEnabled() &&
-        allowFollowerReadLocalLease(omRatisServer.getServerDivision(),
-            ozoneManager.getConfig().getFollowerReadLocalLeaseLagLimit(),
-            ozoneManager.getConfig().getFollowerReadLocalLeaseTimeMs())) {
-      ozoneManager.getMetrics().incNumFollowerReadLocalLeaseSuccess();
+    if (request.getCmdType().equals(PrepareStatus)) {
+      // PrepareStatus is an OM request that only target a single OM node.
+      // Therefore, all PrepareStatus requests should be served immediately without failover regardless
+      // of the OM node leadership or the read consistency. See PrepareSubCommand.
+      // The implementation is not ideal, but exists for compatibility reason.
       return handler.handleReadRequest(request);
-    } 
-    // Get current OM's role
-    RaftServerStatus raftServerStatus = omRatisServer.getLeaderStatus();
-    // === 1. Follower linearizable read ===
-    if (raftServerStatus == NOT_LEADER && omRatisServer.isLinearizableRead()) {
-      ozoneManager.getMetrics().incNumLinearizableRead();
-      return ozoneManager.getOmExecutionFlow().submit(request, false);
     }
-    // === 2. Leader local read (skip ReadIndex if allowed) ===
-    if (raftServerStatus == LEADER_AND_READY || request.getCmdType().equals(PrepareStatus)) {
-      if (ozoneManager.getConfig().isAllowLeaderSkipLinearizableRead()) {
-        ozoneManager.getMetrics().incNumLeaderSkipLinearizableRead();
-        // leader directly serves local committed data
+
+    if (!request.hasReadConsistencyHint() || !request.getReadConsistencyHint().hasConsistencyType() ||
+        request.getReadConsistencyHint().getConsistencyType() == CONSISTENCY_TYPE_UNKNOWN) {
+      // Read from leader or followers using linearizable read
+      if (ozoneManager.getConfig().isFollowerReadLocalLeaseEnabled() &&
+          allowFollowerReadLocalLease(omRatisServer.getServerDivision(),
+              ozoneManager.getConfig().getFollowerReadLocalLeaseLagLimit(),
+              ozoneManager.getConfig().getFollowerReadLocalLeaseTimeMs())) {
+        ozoneManager.getMetrics().incNumFollowerReadLocalLeaseSuccess();
         return handler.handleReadRequest(request);
       }
-      // otherwise use linearizable path when enabled
-      if (omRatisServer.isLinearizableRead()) {
+      // Get current OM's role
+      RaftServerStatus raftServerStatus = omRatisServer.getLeaderStatus();
+      // === 1. Follower linearizable read ===
+      if (raftServerStatus == NOT_LEADER && omRatisServer.isLinearizableRead()) {
         ozoneManager.getMetrics().incNumLinearizableRead();
         return ozoneManager.getOmExecutionFlow().submit(request, false);
       }
+      // === 2. Leader local read (skip ReadIndex if allowed) ===
+      if (raftServerStatus == LEADER_AND_READY) {
+        if (ozoneManager.getConfig().isAllowLeaderSkipLinearizableRead()) {
+          ozoneManager.getMetrics().incNumLeaderSkipLinearizableRead();
+          // leader directly serves local committed data
+          return handler.handleReadRequest(request);
+        }
+        // otherwise use linearizable path when enabled
+        if (omRatisServer.isLinearizableRead()) {
+          ozoneManager.getMetrics().incNumLinearizableRead();
+          return ozoneManager.getOmExecutionFlow().submit(request, false);
+        }
 
-      // fallback to local read
-      return handler.handleReadRequest(request);
+        // fallback to local read
+        return handler.handleReadRequest(request);
+      } else {
+        throw createLeaderErrorException(raftServerStatus);
+      }
     } else {
-      throw createLeaderErrorException(raftServerStatus);
+      // If read consistency hint is specified, we should try to respect it although
+      // there is no guarantee since it depends on the OM node configuration (e.g.
+      // whether OM Raft server enables linearizable read).
+      ReadConsistencyHint readConsistencyHint = request.getReadConsistencyHint();
+      ReadConsistencyType consistencyType = readConsistencyHint.getConsistencyType();
+      RaftServerStatus raftServerStatus;
+      switch (consistencyType) {
+      case STALE:
+        // Serve the stale read request immediately for both leader and follower
+        ozoneManager.getMetrics().incNumStaleRead();
+        return handler.handleReadRequest(request);
+      case LOCAL_LEASE_FOLLOWER_READ:
+        raftServerStatus = omRatisServer.getLeaderStatus();
+        switch (raftServerStatus) {
+        case NOT_LEADER:
+          if (!ozoneManager.getConfig().isFollowerReadLocalLeaseEnabled()) {
+            throw createLeaderErrorException(raftServerStatus);
+          }
+          LocalLeaseContext localLeaseContext = readConsistencyHint.getLocalLeaseContext();
+          long localLeaseLagLimit = localLeaseContext.getLagLimit() > 0 ?
+              localLeaseContext.getLagLimit() : ozoneManager.getConfig().getFollowerReadLocalLeaseLagLimit();
+          long localLeaseLeaseTimeMs = localLeaseContext.getLeaseTimeMs() > 0 ?
+              localLeaseContext.getLeaseTimeMs() : ozoneManager.getConfig().getFollowerReadLocalLeaseTimeMs();

Review Comment:
   Use `localLeaseContext.hasLagLimit()` and `hasLeaseTimeMs()` instead of the `> 0` checks.  We may use -1 to allow infinite lag/time.
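   
   A minimal sketch of the presence check and the -1 sentinel, written without the protobuf classes so it stands alone. `OptionalLong` stands in for a generated `hasX()`/`getX()` pair; `LocalLeaseLimitSketch`, `UNLIMITED`, `resolve`, and `withinLimit` are hypothetical names, not part of the PR.
   
   ```java
   import java.util.OptionalLong;
   
   /** Illustrative only: OptionalLong models "field present" versus "field absent". */
   final class LocalLeaseLimitSketch {
     /** Assumed sentinel meaning "no limit", as proposed in the comment above. */
     static final long UNLIMITED = -1L;
   
     /** Uses the client value whenever it is present, even if it is 0 or -1. */
     static long resolve(OptionalLong clientValue, long serverDefault) {
       return clientValue.isPresent() ? clientValue.getAsLong() : serverDefault;
     }
   
     /** True if the observed lag (or elapsed time) is acceptable under the limit. */
     static boolean withinLimit(long observed, long limit) {
       return limit == UNLIMITED || observed <= limit;
     }
   }
   ```
   
   Unlike a `> 0` test, a presence check lets a client explicitly send 0 or -1 without being silently overridden by the server default, and only an absent field falls back to the configured value.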
   


