Re: [PR] KAFKA-17743: Add minBytes implementation to DelayedShareFetch [kafka]

via GitHub Thu, 31 Oct 2024 11:21:21 -0700


junrao commented on code in PR #17539:
URL: https://github.com/apache/kafka/pull/17539#discussion_r1824937560



##########
core/src/main/java/kafka/server/share/DelayedShareFetch.java:
##########
@@ -65,6 +70,7 @@ public class DelayedShareFetch extends DelayedOperation {
         this.shareFetchData = shareFetchData;
         this.replicaManager = replicaManager;
         this.topicPartitionDataFromTryComplete = new LinkedHashMap<>();
+        this.logReadResponse = new LinkedHashMap<>();

Review Comment:
   Rename `topicPartitionDataFromTryComplete` => `partitionsToComplete`?
   Rename `logReadResponse` => `partitionsAlreadyFetched`?



##########
core/src/main/java/kafka/server/share/DelayedShareFetch.java:
##########
@@ -146,16 +145,34 @@ public void onComplete() {
      */
     @Override
     public boolean tryComplete() {
-        topicPartitionDataFromTryComplete = acquirablePartitions();
-
-        if (!topicPartitionDataFromTryComplete.isEmpty()) {
-            boolean completedByMe = forceComplete();
-            // If invocation of forceComplete is not successful, then that means the request is already completed
-            // hence release the acquired locks.
-            if (!completedByMe) {
-                releasePartitionLocks(shareFetchData.groupId(), topicPartitionDataFromTryComplete.keySet());
+        if (anySharePartitionNoLongerExists()) {

Review Comment:
   We are still calling `sharePartitionManager.sharePartition` in multiple 
places (`anySharePartitionNoLongerExists`, `acquirablePartitions`, 
`isMinBytesSatisfied` and `maybeUpdateFetchOffsetMetadataForTopicPartitions`), 
each of which needs to handle a null `SharePartition`, since the share partition 
could disappear at any time. I was thinking that we could get all share partitions 
once at the beginning and pass them to the other methods. This way, the null 
handling is done only once.
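
A minimal sketch of the "resolve once, pass around" idea, not the PR's actual code. `ResolveOnceSketch` and `resolveAll` are hypothetical names, and plain strings stand in for `TopicIdPartition` and `SharePartition`. The lookup function plays the role of `sharePartitionManager.sharePartition`, which may return null.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

// Hypothetical illustration; generic K/V stand in for TopicIdPartition/SharePartition.
final class ResolveOnceSketch {

    /**
     * Looks up every requested key exactly once. Returns Optional.empty() as soon
     * as any lookup yields null, so callers never receive a partially resolved map
     * and never need their own null checks.
     */
    static <K, V> Optional<Map<K, V>> resolveAll(Iterable<K> keys, Function<K, V> lookup) {
        Map<K, V> resolved = new LinkedHashMap<>();
        for (K key : keys) {
            V value = lookup.apply(key);
            if (value == null) {
                return Optional.empty();
            }
            resolved.put(key, value);
        }
        return Optional.of(resolved);
    }

    public static void main(String[] args) {
        // Assumed sample data: two known partitions, one unknown.
        Map<String, String> registry = Map.of("topic-0", "sp0", "topic-1", "sp1");
        System.out.println(resolveAll(List.of("topic-0", "topic-1"), registry::get).isPresent());
        System.out.println(resolveAll(List.of("topic-0", "topic-2"), registry::get).isPresent());
    }
}
```

With something along these lines, `tryComplete` could bail out once if the result is empty and otherwise hand the resolved map to the downstream helpers, assuming those helpers are changed to accept it.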



##########
core/src/main/java/kafka/server/share/DelayedShareFetch.java:
##########
@@ -207,7 +224,158 @@ Map<TopicIdPartition, FetchRequest.PartitionData> acquirablePartitions() {
         return topicPartitionData;
     }
 
-    private void releasePartitionLocks(String groupId, Set<TopicIdPartition> topicIdPartitions) {
+    // In case, fetch offset metadata doesn't exist for one or more topic partitions, we do a
+    // replicaManager.readFromLog to populate the offset metadata.
+    private Map<TopicIdPartition, LogReadResult> maybeUpdateFetchOffsetMetadataForTopicPartitions(Map<TopicIdPartition, FetchRequest.PartitionData> topicPartitionData) {
+        Map<TopicIdPartition, FetchRequest.PartitionData> missingFetchOffsetMetadataTopicPartitions = new LinkedHashMap<>();
+        for (Map.Entry<TopicIdPartition, FetchRequest.PartitionData> entry : topicPartitionData.entrySet()) {
+            TopicIdPartition topicIdPartition = entry.getKey();
+            SharePartition sharePartition = sharePartitionManager.sharePartition(shareFetchData.groupId(), topicIdPartition);
+            if (sharePartition.fetchOffsetMetadata().isEmpty()) {
+                missingFetchOffsetMetadataTopicPartitions.put(topicIdPartition, entry.getValue());
+            }
+        }
+        if (missingFetchOffsetMetadataTopicPartitions.isEmpty()) {
+            return null;
+        }
+        // We fetch data from replica manager corresponding to the topic partitions that have missing fetch offset metadata.
+        Map<TopicIdPartition, LogReadResult> replicaManagerReadResponseData = readFromLog(missingFetchOffsetMetadataTopicPartitions);
+        return updateFetchOffsetMetadataForMissingTopicPartitions(replicaManagerReadResponseData);
+    }
+
+    private Map<TopicIdPartition, LogReadResult> updateFetchOffsetMetadataForMissingTopicPartitions(
+        Map<TopicIdPartition, LogReadResult> replicaManagerReadResponseData) {
+        for (Map.Entry<TopicIdPartition, LogReadResult> entry : replicaManagerReadResponseData.entrySet()) {
+            TopicIdPartition topicIdPartition = entry.getKey();
+            SharePartition sharePartition = sharePartitionManager.sharePartition(shareFetchData.groupId(), topicIdPartition);
+            LogReadResult replicaManagerLogReadResult = entry.getValue();
+            if (replicaManagerLogReadResult == null) {
+                log.debug("Replica manager read log result {} does not contain topic partition {}",
+                    replicaManagerReadResponseData, topicIdPartition);
+                continue;
+            }
+            sharePartition.updateLatestFetchOffsetMetadata(Optional.of(replicaManagerLogReadResult.info().fetchOffsetMetadata));
+        }
+        return replicaManagerReadResponseData;
+    }
+
+    private boolean isMinBytesSatisfied(Map<TopicIdPartition, FetchRequest.PartitionData> topicPartitionData) {
+        long accumulatedSize = 0;
+        try {
+            for (Map.Entry<TopicIdPartition, FetchRequest.PartitionData> entry : topicPartitionData.entrySet()) {
+                TopicIdPartition topicIdPartition = entry.getKey();
+                FetchRequest.PartitionData partitionData = entry.getValue();
+                LogOffsetMetadata endOffsetMetadata = endOffsetMetadataForTopicPartition(topicIdPartition);
+
+                if (endOffsetMetadata == LogOffsetMetadata.UNKNOWN_OFFSET_METADATA)
+                    continue;
+
+                SharePartition sharePartition = sharePartitionManager.sharePartition(shareFetchData.groupId(), topicIdPartition);
+
+                Optional<LogOffsetMetadata> optionalFetchOffsetMetadata = sharePartition.fetchOffsetMetadata();
+                if (optionalFetchOffsetMetadata.isEmpty() || optionalFetchOffsetMetadata.get() == LogOffsetMetadata.UNKNOWN_OFFSET_METADATA)
+                    continue;
+                LogOffsetMetadata fetchOffsetMetadata = optionalFetchOffsetMetadata.get();
+
+                if (fetchOffsetMetadata.messageOffset > endOffsetMetadata.messageOffset) {
+                    log.debug("Satisfying delayed share fetch request for group {}, member {} since it is fetching later segments of " +
+                        "topicIdPartition {}", shareFetchData.groupId(), shareFetchData.memberId(), topicIdPartition);
+                    return true;
+                } else if (fetchOffsetMetadata.messageOffset < endOffsetMetadata.messageOffset) {
+                    if (fetchOffsetMetadata.onOlderSegment(endOffsetMetadata)) {
+                        // This can happen when the fetch operation is falling behind the current segment or the partition
+                        // has just rolled a new segment.
+                        log.debug("Satisfying delayed share fetch request for group {}, member {} immediately since it is fetching older " +
+                            "segments of topicIdPartition {}", shareFetchData.groupId(), shareFetchData.memberId(), topicIdPartition);
+                        return true;
+                    } else if (fetchOffsetMetadata.onSameSegment(endOffsetMetadata)) {
+                        // we take the partition fetch size as upper bound when accumulating the bytes.
+                        long bytesAvailable = Math.min(endOffsetMetadata.positionDiff(fetchOffsetMetadata), partitionData.maxBytes);
+                        accumulatedSize += bytesAvailable;
+                    }
+                }
+            }
+            return accumulatedSize >= shareFetchData.fetchParams().minBytes;
+        } catch (Exception e) {
+            // Ideally we should complete the share fetch request's future exceptionally in this case from tryComplete itself.
+            // A function that can be utilized is handleFetchException in an in-flight PR https://github.com/apache/kafka/pull/16842.
+            // Perhaps, once the mentioned PR is merged, I'll change it to better exception handling.
+            log.error("Error processing the minBytes criteria for share fetch request", e);
+            return true;
+        }
+    }
+
+    private LogOffsetMetadata endOffsetMetadataForTopicPartition(TopicIdPartition topicIdPartition) {
+        Partition partition = replicaManager.getPartitionOrException(topicIdPartition.topicPartition());
+        LogOffsetSnapshot offsetSnapshot = partition.fetchOffsetSnapshot(Optional.empty(), true);
+        // The FetchIsolation type that we use for share fetch is FetchIsolation.HIGH_WATERMARK. In the future, we can
+        // extend it other FetchIsolation types.
+        FetchIsolation isolationType = shareFetchData.fetchParams().isolation;
+        if (isolationType == FetchIsolation.LOG_END)
+            return offsetSnapshot.logEndOffset;
+        else if (isolationType == FetchIsolation.HIGH_WATERMARK)
+            return offsetSnapshot.highWatermark;
+        else
+            return offsetSnapshot.lastStableOffset;
+
+    }
+
+    private Map<TopicIdPartition, LogReadResult> readFromLog(Map<TopicIdPartition, FetchRequest.PartitionData> topicPartitionData) {
+        Seq<Tuple2<TopicIdPartition, LogReadResult>> responseLogResult = replicaManager.readFromLog(
+            shareFetchData.fetchParams(),
+            CollectionConverters.asScala(
+                topicPartitionData.entrySet().stream().map(entry ->
+                    new Tuple2<>(entry.getKey(), entry.getValue())).collect(Collectors.toList())
+            ),
+            QuotaFactory.UNBOUNDED_QUOTA,
+            true);
+
+        Map<TopicIdPartition, LogReadResult> responseData = new HashMap<>();
+        responseLogResult.foreach(tpLogResult -> {
+            responseData.put(tpLogResult._1(), tpLogResult._2());
+            return BoxedUnit.UNIT;
+        });
+
+        log.trace("Data successfully retrieved by replica manager: {}", responseData);
+        return responseData;
+    }
+
+    private boolean anySharePartitionNoLongerExists() {
+        for (TopicIdPartition topicIdPartition: shareFetchData.partitionMaxBytes().keySet()) {
+            SharePartition sharePartition = sharePartitionManager.sharePartition(shareFetchData.groupId(), topicIdPartition);
+            if (sharePartition == null) {
+                log.debug("Encountered null share partition for groupId={}, topicIdPartition={}. Skipping it.", shareFetchData.groupId(), topicIdPartition);
+                return true;
+            }
+        }
+        return false;
+    }
+
+    // Visible for testing.
+    Map<TopicIdPartition, LogReadResult> combineLogReadResponse(Map<TopicIdPartition, FetchRequest.PartitionData> topicPartitionData) {
+        Map<TopicIdPartition, FetchRequest.PartitionData> missingLogReadTopicPartitions = new LinkedHashMap<>();
+        for (Map.Entry<TopicIdPartition, FetchRequest.PartitionData> entry : topicPartitionData.entrySet()) {
+            TopicIdPartition topicIdPartition = entry.getKey();
+            FetchRequest.PartitionData partitionData = entry.getValue();
+            if (!logReadResponse.containsKey(topicIdPartition)) {
+                missingLogReadTopicPartitions.put(topicIdPartition, partitionData);
+            }
+        }

Review Comment:
   This code can be a bit more concise.
   
   ```
   topicPartitionData.forEach((topicIdPartition, partitionData) -> {
       if (!logReadResponse.containsKey(topicIdPartition)) {
           missingLogReadTopicPartitions.put(topicIdPartition, partitionData);
       }
   });
   ```



##########
core/src/main/java/kafka/server/share/DelayedShareFetch.java:
##########
@@ -207,7 +224,158 @@ Map<TopicIdPartition, FetchRequest.PartitionData> acquirablePartitions() {
         return topicPartitionData;
     }
 
-    private void releasePartitionLocks(String groupId, Set<TopicIdPartition> topicIdPartitions) {
+    // In case, fetch offset metadata doesn't exist for one or more topic partitions, we do a
+    // replicaManager.readFromLog to populate the offset metadata.
+    private Map<TopicIdPartition, LogReadResult> maybeUpdateFetchOffsetMetadataForTopicPartitions(Map<TopicIdPartition, FetchRequest.PartitionData> topicPartitionData) {

Review Comment:
   `maybeUpdateFetchOffsetMetadataForTopicPartitions` => `maybeReadFromLogAndUpdateFetchOffsetMetadata`?



##########
core/src/main/java/kafka/server/share/DelayedShareFetch.java:
##########
@@ -207,7 +224,158 @@ Map<TopicIdPartition, FetchRequest.PartitionData> acquirablePartitions() {
         return topicPartitionData;
     }
 
-    private void releasePartitionLocks(String groupId, Set<TopicIdPartition> topicIdPartitions) {
+    // In case, fetch offset metadata doesn't exist for one or more topic partitions, we do a
+    // replicaManager.readFromLog to populate the offset metadata.
+    private Map<TopicIdPartition, LogReadResult> maybeUpdateFetchOffsetMetadataForTopicPartitions(Map<TopicIdPartition, FetchRequest.PartitionData> topicPartitionData) {
+        Map<TopicIdPartition, FetchRequest.PartitionData> missingFetchOffsetMetadataTopicPartitions = new LinkedHashMap<>();
+        for (Map.Entry<TopicIdPartition, FetchRequest.PartitionData> entry : topicPartitionData.entrySet()) {
+            TopicIdPartition topicIdPartition = entry.getKey();
+            SharePartition sharePartition = sharePartitionManager.sharePartition(shareFetchData.groupId(), topicIdPartition);
+            if (sharePartition.fetchOffsetMetadata().isEmpty()) {
+                missingFetchOffsetMetadataTopicPartitions.put(topicIdPartition, entry.getValue());
+            }
+        }
+        if (missingFetchOffsetMetadataTopicPartitions.isEmpty()) {
+            return null;

Review Comment:
   Could we just return an empty map? This way, the caller doesn't need to do 
a null check.
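
A small sketch of the empty-map convention suggested here, under stated assumptions. `EmptyMapSketch` and `readMissing` are hypothetical names, the predicate stands in for `sharePartition.fetchOffsetMetadata().isPresent()`, and returning the filtered map stands in for the result of the log read.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical illustration; not the PR's implementation.
final class EmptyMapSketch {

    /**
     * Collects the entries whose key has no metadata yet. When nothing is missing,
     * returns an immutable empty map rather than null, so callers can iterate or
     * call isEmpty() without a null check.
     */
    static <K, V> Map<K, V> readMissing(Map<K, V> requested, Predicate<K> hasMetadata) {
        Map<K, V> missing = new LinkedHashMap<>();
        requested.forEach((key, value) -> {
            if (!hasMetadata.test(key)) {
                missing.put(key, value);
            }
        });
        if (missing.isEmpty()) {
            return Collections.emptyMap();
        }
        // Stand-in for the real log read over the missing partitions.
        return missing;
    }
}
```

A caller can then write `readMissing(...).forEach(...)` or merge the result into another map unconditionally.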



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
