Re: [PR] KAFKA-17002: Integrated partition leader epoch for Persister APIs (KIP-932) [kafka]

via GitHub Thu, 24 Oct 2024 23:19:08 -0700


apoorvmittal10 commented on code in PR #16842:
URL: https://github.com/apache/kafka/pull/16842#discussion_r1815160655



##########
core/src/main/java/kafka/server/share/SharePartitionManager.java:
##########
@@ -606,22 +641,51 @@ private void maybeCompleteInitializationWithException(
             return;
         }
 
+        // Remove the partition from the cache as it's failed to initialize.
+        partitionCacheMap.remove(sharePartitionKey);
+        // The partition initialization failed, so complete the request with the exception.
+        // The server should not be in this state, so log the error on broker and surface the same
+        // to the client. The broker should not be in this state, investigate the root cause of the error.
+        log.error("Error initializing share partition with key {}", sharePartitionKey, throwable);
+        maybeCompleteShareFetchExceptionally(future, Collections.singletonList(topicIdPartition), throwable);
+    }
+
+    private void handleSharePartitionException(
+        SharePartitionKey sharePartitionKey,
+        Throwable throwable
+    ) {
         if (throwable instanceof NotLeaderOrFollowerException || throwable instanceof FencedStateEpochException) {
             log.info("The share partition with key {} is fenced: {}", sharePartitionKey, throwable.getMessage());
             // The share partition is fenced hence remove the partition from map and let the client retry.
             // But surface the error to the client so client might take some action i.e. re-fetch
             // the metadata and retry the fetch on new leader.
             partitionCacheMap.remove(sharePartitionKey);

Review Comment:
   So we remove the share partition from the cache in two places: 1) when initialization fails, and 2) when a fenced error occurs.
   For 1, it's safe, as the partition is still in the initialization state.
   
   For 2, I was of mixed opinion. Currently, all interactions with a share partition happen after fetching the instance from the cache, so once it is removed or re-initialized, the new state should be visible. But if the old share partition instance is already held by another thread, `acknowledge` will fail anyway, whereas `fetch` can still succeed. Do you think it would be sensible to add a `Fenced` state to `SharePartition` such that, once it is set, the `fetch lock` on that share partition cannot be acquired? Should we also add an `active` status check to all `SharePartition` APIs?
   cc: @adixitconfluent
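The `Fenced`-state proposal above can be sketched in Java as follows. This is a minimal, hypothetical model, not Kafka's code: `SketchSharePartition`, `SketchPartitionCache`, `markFenced`, `maybeAcquireFetchLock`, `onFenced` and the `String` cache key (standing in for `SharePartitionKey`) are illustrative names only. The point it shows is that fencing the instance at the moment it is evicted from the cache means a thread still holding the stale instance can no longer take the fetch lock, and the same `active` guard can sit at the top of every other API.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical, simplified model of a share partition; not Kafka's SharePartition.
class SketchSharePartition {
    enum State { ACTIVE, FENCED }

    private final AtomicReference<State> state = new AtomicReference<>(State.ACTIVE);
    private final AtomicBoolean fetchLock = new AtomicBoolean(false);

    // Once fenced, the instance never becomes active again.
    void markFenced() {
        state.set(State.FENCED);
    }

    boolean isActive() {
        return state.get() == State.ACTIVE;
    }

    // Grants the fetch lock only if the partition is active and the lock is free.
    boolean maybeAcquireFetchLock() {
        if (!isActive()) {
            return false; // fast path: fenced instances never hand out the lock
        }
        if (!fetchLock.compareAndSet(false, true)) {
            return false; // another fetch already holds the lock
        }
        // Re-check: markFenced() may have run between the first check and the CAS.
        if (!isActive()) {
            fetchLock.set(false);
            return false;
        }
        return true;
    }

    void releaseFetchLock() {
        fetchLock.set(false);
    }

    // Hypothetical "active" guard applied at the start of a public API.
    void acknowledge(long offset) {
        if (!isActive()) {
            throw new IllegalStateException("share partition is fenced");
        }
        // Acknowledgement handling for `offset` is omitted in this sketch.
    }
}

// Hypothetical stand-in for the manager's partition cache.
class SketchPartitionCache {
    private final ConcurrentHashMap<String, SketchSharePartition> cache = new ConcurrentHashMap<>();

    SketchSharePartition getOrCreate(String key) {
        return cache.computeIfAbsent(key, k -> new SketchSharePartition());
    }

    // Evict and fence in one step, so stale holders observe the fenced state.
    void onFenced(String key) {
        SketchSharePartition removed = cache.remove(key);
        if (removed != null) {
            removed.markFenced();
        }
    }
}
```

In this sketch the next cache lookup after eviction creates a fresh instance, while any thread still holding the old one is refused the fetch lock and gets an error from `acknowledge`. A fetch that already held the lock before fencing is not interrupted, so it would still need some other guard (for example, the state-epoch fencing this PR deals with).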



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.