apoorvmittal10 commented on code in PR #16842:
URL: https://github.com/apache/kafka/pull/16842#discussion_r1815160655
##########
core/src/main/java/kafka/server/share/SharePartitionManager.java:
##########
@@ -606,22 +641,51 @@ private void maybeCompleteInitializationWithException(
return;
}
+ // Remove the partition from the cache as it's failed to initialize.
+ partitionCacheMap.remove(sharePartitionKey);
+ // The partition initialization failed, so complete the request with the exception.
+ // The server should not be in this state, so log the error on broker and surface the same
+ // to the client. The broker should not be in this state, investigate the root cause of the error.
+ log.error("Error initializing share partition with key {}", sharePartitionKey, throwable);
+ maybeCompleteShareFetchExceptionally(future, Collections.singletonList(topicIdPartition), throwable);
+ }
+
+ private void handleSharePartitionException(
+ SharePartitionKey sharePartitionKey,
+ Throwable throwable
+ ) {
if (throwable instanceof NotLeaderOrFollowerException || throwable instanceof FencedStateEpochException) {
log.info("The share partition with key {} is fenced: {}", sharePartitionKey, throwable.getMessage());
// The share partition is fenced hence remove the partition from map and let the client retry.
// But surface the error to the client so client might take some action i.e. re-fetch
// the metadata and retry the fetch on new leader.
partitionCacheMap.remove(sharePartitionKey);
Review Comment:
So we remove the share partition from the cache in 2 places: 1) when
initialization fails, and 2) when a fenced error occurs.
For 1, it's safe as the partition is still in the initialization state.
For 2, I have mixed opinions. Since all interactions with a share partition
currently go through fetching the instance from the cache, once it is removed
or re-initialized the new state should appear. But if the old share partition
instance is already held by some other thread, then `acknowledge` will fail
anyway but `fetch` can still succeed. Do you think it would be sensible to add
another state to `SharePartition`, `Fenced`, such that once it is set the
`fetch lock` on that share partition can no longer be acquired? Do you think we
should have an `active` status check on all Share Partition APIs as well?
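To make the `Fenced` idea concrete, here is a minimal sketch of what I have in mind. It is not the existing `SharePartition` code: the class `FencedStateSketch`, the nested `PartitionSketch`, the `State` enum and the method names are all hypothetical, and the lock is modelled as a plain `AtomicBoolean` purely for illustration.

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class FencedStateSketch {

    // Hypothetical lifecycle states; only ACTIVE permits fetching.
    public enum State { INITIALIZING, ACTIVE, FENCED }

    // Hypothetical stand-in for a share partition instance that may still be
    // referenced by other threads after it has been removed from the cache.
    public static final class PartitionSketch {
        private volatile State state = State.ACTIVE;
        private final AtomicBoolean fetchLock = new AtomicBoolean(false);

        // Would be called by the manager when it removes the instance from the cache.
        public void markFenced() {
            state = State.FENCED;
        }

        // Returns true only if the partition is active and the lock was free.
        public boolean maybeAcquireFetchLock() {
            if (state != State.ACTIVE) {
                return false;
            }
            if (!fetchLock.compareAndSet(false, true)) {
                return false;
            }
            // Re-check after acquiring: the partition may have been fenced in between.
            if (state != State.ACTIVE) {
                fetchLock.set(false);
                return false;
            }
            return true;
        }

        public void releaseFetchLock() {
            fetchLock.set(false);
        }
    }

    public static void main(String[] args) {
        PartitionSketch stale = new PartitionSketch();
        System.out.println(stale.maybeAcquireFetchLock()); // prints "true"
        stale.releaseFetchLock();
        stale.markFenced();
        System.out.println(stale.maybeAcquireFetchLock()); // prints "false"
    }
}
```

With something like this, a thread that still holds the old instance after the fenced error cannot start a new `fetch`, and the same state check could guard the other Share Partition APIs.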
cc: @adixitconfluent