junrao commented on code in PR #19437:
URL: https://github.com/apache/kafka/pull/19437#discussion_r2049498194
##########
core/src/main/java/kafka/server/share/DelayedShareFetch.java:
##########
@@ -277,9 +323,15 @@ public boolean tryComplete() {
return false;
} catch (Exception e) {
log.error("Error processing delayed share fetch request", e);
- releasePartitionLocks(topicPartitionData.keySet());
- partitionsAcquired.clear();
- partitionsAlreadyFetched.clear();
+ // In case we have a remote fetch exception, we have already released locks for partitions which have potential
+ // local log read. We do not release locks for partitions which have a remote storage read because we need to
Review Comment:
@adixitconfluent : I am saying that the code may never release the share
partition lock. This is a bit subtle, but here is a possible scenario.
Thread 1 calls `DelayedOperationPurgatory.checkAndComplete()`, which
eventually calls the following.
```
boolean safeTryComplete() {
    lock.lock();
    try {
        if (isCompleted()) return false;
        else return tryComplete();
    } finally {
        lock.unlock();
    }
}
```
Suppose that `isCompleted()` returns false and thread 1 is just about to
call `tryComplete()`.
Now, the expiration thread kicks in and calls `run()` shown below. It
sets `completed` to true and runs through `onComplete()`.
```
public void run() {
    if (forceComplete())
        onExpiration();
}
public boolean forceComplete() {
    if (completed.compareAndSet(false, true)) {
        // cancel the timeout timer
        cancel();
        onComplete();
        return true;
    } else {
        return false;
    }
}
```
Now thread 1 continues in `tryComplete()`. It acquires the share partition
lock for a remote partition and sets `remoteStorageFetchException`. It then
calls `forceComplete()`. Since `completed` is already true, `forceComplete()`
returns false immediately without calling `onComplete()`. So, the acquired
share partition lock will never be released. The same problem exists for local
fetch in `tryComplete()`, and we have the following code to handle the lock
release there.
```
boolean completedByMe = forceComplete();
// If invocation of forceComplete is not successful, then that means the request is already completed
// hence release the acquired locks.
if (!completedByMe) {
    releasePartitionLocks(partitionsAcquired.keySet());
}
```
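To make the interleaving concrete, here is a minimal single-threaded sketch
that replays the scenario above step by step. It is not the Kafka code:
`LockLeakSketch`, `SketchOperation`, `heldLocks`, and
`tryCompleteAfterExpiration()` are hypothetical names, and the share partition
lock is modelled as a simple counter. It only illustrates why the
`completedByMe` check is needed on the remote fetch path as well.
```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

public class LockLeakSketch {

    /** Hypothetical stand-in for a delayed operation; not the Kafka class. */
    static class SketchOperation {
        private final AtomicBoolean completed = new AtomicBoolean(false);
        // Hypothetical bookkeeping: number of share partition locks currently held.
        final AtomicInteger heldLocks = new AtomicInteger(0);
        private final boolean releaseWhenNotCompletedByMe;

        SketchOperation(boolean releaseWhenNotCompletedByMe) {
            this.releaseWhenNotCompletedByMe = releaseWhenNotCompletedByMe;
        }

        // Same shape as forceComplete() quoted above: only the first caller runs onComplete().
        boolean forceComplete() {
            if (completed.compareAndSet(false, true)) {
                onComplete();
                return true;
            }
            return false;
        }

        // Releases whatever locks are held at the moment it runs.
        void onComplete() {
            heldLocks.set(0);
        }

        // Replays the race: expiration runs after the isCompleted() check
        // but before the body of tryComplete().
        boolean tryCompleteAfterExpiration() {
            // Thread 1 has already observed isCompleted() == false.
            forceComplete();             // expiration thread wins; no locks held yet
            heldLocks.incrementAndGet(); // thread 1 acquires a share partition lock
            boolean completedByMe = forceComplete(); // false: already completed
            if (!completedByMe && releaseWhenNotCompletedByMe) {
                heldLocks.set(0);        // explicit release, as on the local fetch path
            }
            return completedByMe;
        }
    }

    /** Returns the number of locks still held after the replayed interleaving. */
    static int leakedLocks(boolean releaseWhenNotCompletedByMe) {
        SketchOperation op = new SketchOperation(releaseWhenNotCompletedByMe);
        op.tryCompleteAfterExpiration();
        return op.heldLocks.get();
    }

    public static void main(String[] args) {
        System.out.println("without release: " + leakedLocks(false)); // prints 1
        System.out.println("with release: " + leakedLocks(true));     // prints 0
    }
}
```
In this sketch the lock stays held when the `forceComplete()` return value is
ignored, and it is released once the `!completedByMe` branch is applied.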