AndrewJSchofield commented on code in PR #16969:
URL: https://github.com/apache/kafka/pull/16969#discussion_r1750086012


##########
core/src/main/java/kafka/server/share/DelayedShareFetch.java:
##########
@@ -0,0 +1,210 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package kafka.server.share;
+
+import kafka.server.DelayedOperation;
+import kafka.server.DelayedOperationPurgatory;
+import kafka.server.LogReadResult;
+import kafka.server.QuotaFactory;
+import kafka.server.ReplicaManager;
+
+import org.apache.kafka.common.TopicIdPartition;
+import org.apache.kafka.common.requests.FetchRequest;
+import org.apache.kafka.storage.internals.log.FetchPartitionData;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.LinkedHashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.Optional;
+import java.util.Set;
+import java.util.stream.Collectors;
+
+import scala.Option;
+import scala.Tuple2;
+import scala.collection.Seq;
+import scala.jdk.javaapi.CollectionConverters;
+import scala.runtime.BoxedUnit;
+
+/**
+ * A delayed share fetch operation has been introduced in case there is no share partition for which we can acquire records. We will try to wait

Review Comment:
   I don't think this comment is strictly true now that this PR has evolved into a more complete solution. The delayed share fetch operation is used whenever we cannot satisfy a share fetch immediately.
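   
   A wording along these lines would match the current behaviour more closely (just a sketch, not exact text):
   
   ```java
   /**
    * A delayed share fetch operation is used whenever a share fetch request
    * cannot be satisfied immediately, i.e. when no share partition currently
    * has records that can be acquired. It waits up to MaxWaitMs for records
    * to become acquirable before completing the share fetch request.
    */
   ```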



##########
core/src/main/java/kafka/server/share/DelayedShareFetchKey.java:
##########
@@ -0,0 +1,61 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package kafka.server.share;
+
+import kafka.server.DelayedOperationKey;
+
+import org.apache.kafka.common.TopicIdPartition;
+
+import java.util.Objects;
+
+/**
+ * A key for delayed operations that fetch data for share consumers.
+ */
+public class DelayedShareFetchKey implements DelayedOperationKey {
+    private final TopicIdPartition topicIdPartition;
+    private final String groupId;
+
+    DelayedShareFetchKey(TopicIdPartition topicIdPartition, String groupId) {

Review Comment:
   nit: I think group ID should come first logically. If you look at `keyLabel()` below, you have been inconsistent.
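   
   Roughly this (a sketch; `keyLabel()` and any callers would use the same group-first order):
   
   ```java
   DelayedShareFetchKey(String groupId, TopicIdPartition topicIdPartition) {
       this.groupId = groupId;
       this.topicIdPartition = topicIdPartition;
   }
   ```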



##########
core/src/main/java/kafka/server/share/ShareFetchUtils.java:
##########
@@ -0,0 +1,118 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package kafka.server.share;
+
+import kafka.server.ReplicaManager;
+
+import org.apache.kafka.common.TopicIdPartition;
+import org.apache.kafka.common.message.ShareFetchResponseData;
+import org.apache.kafka.common.protocol.Errors;
+import org.apache.kafka.common.record.FileRecords;
+import org.apache.kafka.common.requests.ListOffsetsRequest;
+import org.apache.kafka.storage.internals.log.FetchPartitionData;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.Optional;
+import java.util.concurrent.CompletableFuture;
+
+import scala.Option;
+import scala.Tuple2;
+
+/**
+ * Utility class for post-processing of share fetch operations.
+ */
+public class ShareFetchUtils {
+    private static final Logger log = LoggerFactory.getLogger(ShareFetchUtils.class);
+
+    static CompletableFuture<Map<TopicIdPartition, ShareFetchResponseData.PartitionData>> processFetchResponse(
+            SharePartitionManager.ShareFetchPartitionData shareFetchPartitionData,
+            List<Tuple2<TopicIdPartition, FetchPartitionData>> responseData,
+            Map<SharePartitionManager.SharePartitionKey, SharePartition> partitionCacheMap,
+            ReplicaManager replicaManager
+    ) {
+        Map<TopicIdPartition, CompletableFuture<ShareFetchResponseData.PartitionData>> futures = new HashMap<>();
+        responseData.forEach(data -> {
+            TopicIdPartition topicIdPartition = data._1;
+            FetchPartitionData fetchPartitionData = data._2;
+
+            SharePartition sharePartition = partitionCacheMap.get(new SharePartitionManager.SharePartitionKey(
+                    shareFetchPartitionData.groupId(), topicIdPartition));
+            futures.put(topicIdPartition, sharePartition.acquire(shareFetchPartitionData.memberId(), fetchPartitionData)
+                    .handle((acquiredRecords, throwable) -> {
+                        log.trace("Acquired records for topicIdPartition: {} with share fetch data: {}, records: {}",
+                                topicIdPartition, shareFetchPartitionData, acquiredRecords);
+                        ShareFetchResponseData.PartitionData partitionData = new ShareFetchResponseData.PartitionData()
+                                .setPartitionIndex(topicIdPartition.partition());
+
+                        if (throwable != null) {
+                            partitionData.setErrorCode(Errors.forException(throwable).code());
+                            return partitionData;
+                        }
+
+                        if (fetchPartitionData.error.code() == Errors.OFFSET_OUT_OF_RANGE.code()) {
+                            // In case we get OFFSET_OUT_OF_RANGE error, that's because the LSO is later than the fetch offset.

Review Comment:
   In this case, this is Log Start Offset, as opposed to Last Stable Offset? Maybe this comment could be a bit tighter.
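   
   Assuming it is the log start offset that has moved, something like this would remove the ambiguity:
   
   ```java
   // OFFSET_OUT_OF_RANGE means the fetch offset is no longer valid, typically
   // because the log start offset has advanced past it ("LSO" here meaning
   // log start offset, not last stable offset).
   ```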



##########
core/src/main/java/kafka/server/share/DelayedShareFetch.java:
##########
@@ -0,0 +1,210 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package kafka.server.share;
+
+import kafka.server.DelayedOperation;
+import kafka.server.DelayedOperationPurgatory;
+import kafka.server.LogReadResult;
+import kafka.server.QuotaFactory;
+import kafka.server.ReplicaManager;
+
+import org.apache.kafka.common.TopicIdPartition;
+import org.apache.kafka.common.requests.FetchRequest;
+import org.apache.kafka.storage.internals.log.FetchPartitionData;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.LinkedHashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.Optional;
+import java.util.Set;
+import java.util.stream.Collectors;
+
+import scala.Option;
+import scala.Tuple2;
+import scala.collection.Seq;
+import scala.jdk.javaapi.CollectionConverters;
+import scala.runtime.BoxedUnit;
+
+/**
+ * A delayed share fetch operation has been introduced in case there is no share partition for which we can acquire records. We will try to wait
+ * for MaxWaitMs for records to be released else complete the share fetch request.
+ */
+public class DelayedShareFetch extends DelayedOperation {
+    private final SharePartitionManager.ShareFetchPartitionData shareFetchPartitionData;
+    private final ReplicaManager replicaManager;
+    private final Map<SharePartitionManager.SharePartitionKey, SharePartition> partitionCacheMap;
+    private Map<TopicIdPartition, FetchRequest.PartitionData> topicPartitionDataFromTryComplete = new LinkedHashMap<>();
+    private final DelayedOperationPurgatory<DelayedShareFetch> delayedShareFetchPurgatory;
+
+    private static final Logger log = LoggerFactory.getLogger(DelayedShareFetch.class);
+
+    DelayedShareFetch(
+            SharePartitionManager.ShareFetchPartitionData shareFetchPartitionData,
+            ReplicaManager replicaManager,
+            Map<SharePartitionManager.SharePartitionKey, SharePartition> partitionCacheMap,
+            DelayedOperationPurgatory<DelayedShareFetch> delayedShareFetchPurgatory) {
+        super(shareFetchPartitionData.fetchParams().maxWaitMs, Option.empty());
+        this.shareFetchPartitionData = shareFetchPartitionData;
+        this.replicaManager = replicaManager;
+        this.partitionCacheMap = partitionCacheMap;
+        this.delayedShareFetchPurgatory = delayedShareFetchPurgatory;
+    }
+
+    @Override
+    public void onExpiration() {
+    }
+
+    /**
+     * Complete the share fetch operation by fetching records for all partitions in the share fetch request irrespective
+     * of whether they have any acquired records. This is called when the fetch operation is forced to complete either
+     * because records can be acquired for some partitions or due to MaxWaitMs timeout.
+     */
+    @Override
+    public void onComplete() {
+        log.trace("Completing the delayed share fetch request for group {}, 
member {}, " +
+                        "topic partitions {}", 
shareFetchPartitionData.groupId(),
+                shareFetchPartitionData.memberId(), 
shareFetchPartitionData.partitionMaxBytes().keySet());
+
+        if (shareFetchPartitionData.future().isDone())
+            return;
+
+        Map<TopicIdPartition, FetchRequest.PartitionData> topicPartitionData;
+        // tryComplete did not invoke forceComplete, so we need to check if we have any partitions to fetch.
+        if (topicPartitionDataFromTryComplete.isEmpty())
+            topicPartitionData = acquirablePartitions();
+        // tryComplete invoked forceComplete, so we can use the data from tryComplete.
+        else
+            topicPartitionData = topicPartitionDataFromTryComplete;
+        try {
+            if (topicPartitionData.isEmpty()) {
+                // No locks for share partitions could be acquired, so we complete the request with an empty response.
+                shareFetchPartitionData.future().complete(Collections.emptyMap());
+                return;
+            }
+            log.trace("Fetchable share partitions data: {} with groupId: {} fetch params: {}",
+                    topicPartitionData, shareFetchPartitionData.groupId(), shareFetchPartitionData.fetchParams());
+
+            Seq<Tuple2<TopicIdPartition, LogReadResult>> responseLogResult = replicaManager.readFromLog(
+                shareFetchPartitionData.fetchParams(),
+                CollectionConverters.asScala(
+                    topicPartitionData.entrySet().stream().map(entry ->
+                        new Tuple2<>(entry.getKey(), entry.getValue())).collect(Collectors.toList())
+                ),
+                QuotaFactory.UnboundedQuota$.MODULE$,
+                true);
+
+            List<Tuple2<TopicIdPartition, FetchPartitionData>> responseData = new ArrayList<>();
+            responseLogResult.foreach(tpLogResult -> {
+                TopicIdPartition topicIdPartition = tpLogResult._1();
+                LogReadResult logResult = tpLogResult._2();
+                FetchPartitionData fetchPartitionData = logResult.toFetchPartitionData(false);
+                responseData.add(new Tuple2<>(topicIdPartition, fetchPartitionData));
+                return BoxedUnit.UNIT;
+            });
+
+            log.trace("Data successfully retrieved by replica manager: {}", 
responseData);
+            ShareFetchUtils.processFetchResponse(shareFetchPartitionData, 
responseData, partitionCacheMap, replicaManager)
+                .whenComplete((result, throwable) -> {
+                    if (throwable != null) {
+                        log.error("Error processing fetch response for share 
partitions", throwable);
+                        
shareFetchPartitionData.future().completeExceptionally(throwable);
+                    } else {
+                        shareFetchPartitionData.future().complete(result);
+                    }
+                    // Releasing the lock to move ahead with the next request 
in queue.
+                    releasePartitionsLock(shareFetchPartitionData.groupId(), 
topicPartitionData.keySet());
+                    // If we have a fetch request completed for a 
topic-partition, it means  the HWM has advanced,
+                    // then we should check if there is a pending share fetch 
request for the topic-partition and complete it.
+                    result.keySet().forEach(topicIdPartition -> 
delayedShareFetchPurgatory.checkAndComplete(
+                            new DelayedShareFetchKey(topicIdPartition, 
shareFetchPartitionData.groupId())));
+                });
+
+        } catch (Exception e) {
+            // Release the locks acquired for the partitions in the share fetch request in case there is an exception
+            log.error("Error processing delayed share fetch request", e);
+            shareFetchPartitionData.future().completeExceptionally(e);
+            releasePartitionsLock(shareFetchPartitionData.groupId(), topicPartitionData.keySet());
+        }
+    }
+
+    /**
+     * Try to complete the fetch operation if we can acquire records for any partition in the share fetch request.
+     */
+    @Override
+    public boolean tryComplete() {
+        log.trace("Try to complete the delayed share fetch request for group 
{}, member {}, topic partitions {}",
+                shareFetchPartitionData.groupId(), 
shareFetchPartitionData.memberId(),
+                shareFetchPartitionData.partitionMaxBytes().keySet());
+
+        topicPartitionDataFromTryComplete = acquirablePartitions();
+
+        if (!topicPartitionDataFromTryComplete.isEmpty())
+            return forceComplete();
+        log.info("Can't acquire records for any partition in the share fetch 
request for group {}, member {}, " +
+                "topic partitions {}", shareFetchPartitionData.groupId(),
+                shareFetchPartitionData.memberId(), 
shareFetchPartitionData.partitionMaxBytes().keySet());
+        return false;
+    }
+
+    /**
+     * Prepare fetch request structure for partitions in the share fetch request for which we can acquire records.
+     */
+    // Visible for testing
+    Map<TopicIdPartition, FetchRequest.PartitionData> acquirablePartitions() {
+        // Initialize the topic partitions for which the fetch should be attempted.
+        Map<TopicIdPartition, FetchRequest.PartitionData> topicPartitionData = new LinkedHashMap<>();
+
+        shareFetchPartitionData.partitionMaxBytes().keySet().forEach(topicIdPartition -> {
+            SharePartition sharePartition = partitionCacheMap.get(new SharePartitionManager.SharePartitionKey(
+                    shareFetchPartitionData.groupId(), topicIdPartition));
+
+            int partitionMaxBytes = shareFetchPartitionData.partitionMaxBytes().getOrDefault(topicIdPartition, 0);
+            // Add the share partition to the list of partitions to be fetched only if we can
+            // acquire the fetch lock on it.
+            if (sharePartition.maybeAcquireFetchLock()) {
+                // If the share partition is already at capacity, we should not attempt to fetch.
+                if (sharePartition.canAcquireRecords()) {
+                    topicPartitionData.put(
+                            topicIdPartition,
+                            new FetchRequest.PartitionData(
+                                    topicIdPartition.topicId(),
+                                    sharePartition.nextFetchOffset(),
+                                    0,
+                                    partitionMaxBytes,
+                                    Optional.empty()
+                            )
+                    );
+                } else {
+                    sharePartition.releaseFetchLock();
+                    log.trace("Record lock partition limit exceeded for 
SharePartition {}, " +
+                            "cannot acquire more records", sharePartition);
+                }
+            }
+        });
+        return topicPartitionData;
+    }
+
+    private void releasePartitionsLock(String groupId, Set<TopicIdPartition> topicIdPartitions) {

Review Comment:
   `releasePartitionLocks` I would say. This sounds like one lock for many partitions.
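   
   i.e. (a sketch; the body stays as it is, just the rename):
   
   ```java
   private void releasePartitionLocks(String groupId, Set<TopicIdPartition> topicIdPartitions) {
       // ... release the fetch lock on each partition, as before
   }
   ```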



##########
core/src/main/java/kafka/server/share/DelayedShareFetchKey.java:
##########
@@ -0,0 +1,61 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package kafka.server.share;
+
+import kafka.server.DelayedOperationKey;
+
+import org.apache.kafka.common.TopicIdPartition;
+
+import java.util.Objects;
+
+/**
+ * A key for delayed operations that fetch data for share consumers.
+ */
+public class DelayedShareFetchKey implements DelayedOperationKey {
+    private final TopicIdPartition topicIdPartition;
+    private final String groupId;
+
+    DelayedShareFetchKey(TopicIdPartition topicIdPartition, String groupId) {
+        this.topicIdPartition = topicIdPartition;
+        this.groupId = groupId;
+    }
+
+    @Override
+    public boolean equals(Object o) {
+        if (this == o) return true;
+        if (o == null || getClass() != o.getClass()) return false;
+        DelayedShareFetchKey that = (DelayedShareFetchKey) o;
+        return topicIdPartition.equals(that.topicIdPartition) && 
groupId.equals(that.groupId);
+    }
+
+    @Override
+    public int hashCode() {
+        return Objects.hash(topicIdPartition, groupId);
+    }
+
+    @Override
+    public String toString() {
+        return "SharePartitionOperationKey(topicIdPartition=" + 
topicIdPartition +

Review Comment:
   This is really `DelayedShareFetchKey`.
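   
   Presumably a copy/paste leftover; the string should name this class, e.g. (assuming the rest of the method appends `groupId`):
   
   ```java
   @Override
   public String toString() {
       return "DelayedShareFetchKey(topicIdPartition=" + topicIdPartition +
               ", groupId=" + groupId + ")";
   }
   ```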



##########
core/src/main/java/kafka/server/share/SharePartition.java:
##########
@@ -1664,6 +1673,12 @@ && checkForStartOffsetWithinBatch(inFlightBatch.firstOffset(), inFlightBatch.las
                     // Even if write share group state RPC call fails, we will still go ahead with the state transition.
                     // Update the cached state and start and end offsets after releasing the acquisition lock on timeout.
                     maybeUpdateCachedStateAndOffsets();
+
+                    // If we have an acquisition lock timeout for a share-partition, then we should check if
+                    // there is a pending share fetch request for the share-partition and complete it.
+                    DelayedShareFetchKey delayedShareFetchKey = new DelayedShareFetchKey(topicIdPartition, groupId);
+                    delayedShareFetchPurgatory.checkAndComplete(delayedShareFetchKey);
+

Review Comment:
   nit: Spurious blank line



##########
core/src/main/scala/kafka/server/BrokerServer.scala:
##########
@@ -416,7 +416,8 @@ class BrokerServer(
         config.shareGroupConfig.shareGroupDeliveryCountLimit,
         config.shareGroupConfig.shareGroupPartitionMaxRecordLocks,
         persister,
-        new Metrics()
+        new Metrics(),
+        config.shareGroupConfig.shareFetchPurgatoryPurgeIntervalRequests()

Review Comment:
   nit: Unnecessary () and I suggest that it would be nicer to have all of the config arguments adjacent on the constructor to the SPM.



##########
core/src/main/java/kafka/server/share/SharePartitionManager.java:
##########
@@ -540,156 +558,54 @@ void maybeProcessFetchQueue() {
             return;
         }
 
+        if (shareFetchPartitionData.partitionMaxBytes.isEmpty()) {
+            // If there are no partitions to fetch then complete the future with an empty map.
+            shareFetchPartitionData.future.complete(Collections.emptyMap());
+            // Release the lock so that other threads can process the queue.
+            releaseProcessFetchQueueLock();
+            if (!fetchQueue.isEmpty())
+                maybeProcessFetchQueue();
+            return;
+        }
+
         try {
             shareFetchPartitionData.partitionMaxBytes.keySet().forEach(topicIdPartition -> {
                 SharePartitionKey sharePartitionKey = sharePartitionKey(
                     shareFetchPartitionData.groupId,
                     topicIdPartition
                 );
-                SharePartition sharePartition = partitionCacheMap.computeIfAbsent(sharePartitionKey,
+                partitionCacheMap.computeIfAbsent(sharePartitionKey,
                     k -> {
                         long start = time.hiResClockMs();
                         SharePartition partition = new SharePartition(shareFetchPartitionData.groupId, topicIdPartition, maxInFlightMessages, maxDeliveryCount,
-                            recordLockDurationMs, timer, time, persister);
+                            recordLockDurationMs, timer, time, persister, delayedShareFetchPurgatory);
                         this.shareGroupMetrics.partitionLoadTime(start);
                         return partition;
                     });
-                int partitionMaxBytes = shareFetchPartitionData.partitionMaxBytes.getOrDefault(topicIdPartition, 0);
-                // Add the share partition to the list of partitions to be fetched only if we can
-                // acquire the fetch lock on it.
-                if (sharePartition.maybeAcquireFetchLock()) {
-                    // If the share partition is already at capacity, we should not attempt to fetch.
-                    if (sharePartition.canAcquireRecords()) {
-                        topicPartitionData.put(
-                            topicIdPartition,
-                            new FetchRequest.PartitionData(
-                                topicIdPartition.topicId(),
-                                sharePartition.nextFetchOffset(),
-                                0,
-                                partitionMaxBytes,
-                                Optional.empty()
-                            )
-                        );
-                    } else {
-                        sharePartition.releaseFetchLock();
-                        log.info("Record lock partition limit exceeded for 
SharePartition with key {}, " +
-                            "cannot acquire more records", sharePartitionKey);
-                    }
-                }
             });
 
-            if (topicPartitionData.isEmpty()) {
-                // No locks for share partitions could be acquired, so we complete the request and
-                // will re-fetch for the client in next poll.
-                shareFetchPartitionData.future.complete(Collections.emptyMap());
-                // Though if no partitions can be locked then there must be some other request which
-                // is in-flight and should release the lock. But it's safe to release the lock as
-                // the lock on share partition already exists which facilitates correct behaviour
-                // with multiple requests from queue being processed.
-                releaseProcessFetchQueueLock();
-                if (!fetchQueue.isEmpty())
-                    maybeProcessFetchQueue();
-                return;
-            }
+            Set<Object> delayedShareFetchWatchKeys = new HashSet<>();
+            shareFetchPartitionData.partitionMaxBytes.keySet().forEach(
+                topicIdPartition -> delayedShareFetchWatchKeys.add(
+                    new DelayedShareFetchKey(topicIdPartition, shareFetchPartitionData.groupId)));
 
-            log.trace("Fetchable share partitions data: {} with groupId: {} 
fetch params: {}",
-                topicPartitionData, shareFetchPartitionData.groupId, 
shareFetchPartitionData.fetchParams);
-
-            replicaManager.fetchMessages(
-                shareFetchPartitionData.fetchParams,
-                CollectionConverters.asScala(
-                    topicPartitionData.entrySet().stream().map(entry ->
-                        new Tuple2<>(entry.getKey(), entry.getValue())).collect(Collectors.toList())
-                ),
-                QuotaFactory.UnboundedQuota$.MODULE$,
-                responsePartitionData -> {
-                    log.trace("Data successfully retrieved by replica manager: {}", responsePartitionData);
-                    List<Tuple2<TopicIdPartition, FetchPartitionData>> responseData = CollectionConverters.asJava(
-                        responsePartitionData);
-                    processFetchResponse(shareFetchPartitionData, responseData).whenComplete(
-                        (result, throwable) -> {
-                            if (throwable != null) {
-                                log.error("Error processing fetch response for share partitions", throwable);
-                                shareFetchPartitionData.future.completeExceptionally(throwable);
-                            } else {
-                                shareFetchPartitionData.future.complete(result);
-                            }
-                            // Releasing the lock to move ahead with the next request in queue.
-                            releaseFetchQueueAndPartitionsLock(shareFetchPartitionData.groupId, topicPartitionData.keySet());
-                        });
-                    return BoxedUnit.UNIT;
-                });
+            // Add the share fetch to the delayed share fetch purgatory to process the fetch request.
+            addDelayedShareFetch(new DelayedShareFetch(shareFetchPartitionData, replicaManager, partitionCacheMap, delayedShareFetchPurgatory),
+                delayedShareFetchWatchKeys);
 
+            // Release the lock so that other threads can process the queue.
+            releaseProcessFetchQueueLock();
             // If there are more requests in the queue, then process them.
             if (!fetchQueue.isEmpty())
                 maybeProcessFetchQueue();
 
         } catch (Exception e) {
             // In case exception occurs then release the locks so queue can be further processed.
             log.error("Error processing fetch queue for share partitions", e);
-            shareFetchPartitionData.future.completeExceptionally(e);
-            releaseFetchQueueAndPartitionsLock(shareFetchPartitionData.groupId, topicPartitionData.keySet());
+            releaseProcessFetchQueueLock();
         }

Review Comment:
   Doesn't the fetch queue potentially stall if this method exits exceptionally? You release the lock, but do not call `maybeProcessFetchQueue`.
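   
   A minimal sketch of what I mean, mirroring the happy path above (same helpers as in the diff):
   
   ```java
   } catch (Exception e) {
       // In case an exception occurs, release the lock so the queue can be further processed.
       log.error("Error processing fetch queue for share partitions", e);
       releaseProcessFetchQueueLock();
       // Keep draining the queue so other pending share fetch requests do not stall.
       if (!fetchQueue.isEmpty())
           maybeProcessFetchQueue();
   }
   ```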


