lianetm commented on code in PR #18737:
URL: https://github.com/apache/kafka/pull/18737#discussion_r1945191355
##########
clients/src/main/java/org/apache/kafka/clients/consumer/internals/CommitRequestManager.java:
##########
@@ -324,7 +323,7 @@ public CompletableFuture<Void>
maybeAutoCommitSyncBeforeRevocation(final long de
CompletableFuture<Void> result = new CompletableFuture<>();
OffsetCommitRequestState requestState =
- createOffsetCommitRequest(subscriptions.allConsumed(), deadlineMs);
+ createOffsetCommitRequest(latestPartitionOffsets, deadlineMs);
Review Comment:
yes, I get your concern (and I'm still going over all this), but this is how
I see it now:
1. we would be only waiting to trigger the reconciliation, not to process it
until the end (once triggered, it will carry on in the background as always,
not blocking on anything new, just the commit request to complete and the
callbacks, as always, we should keep this behaviour)
3. we're just saying we will align the reconciliation triggering with the
consumer poll (like the classic does btw) because we need to wait for stable
positions to start reconciling a new assignment. So yes, there is a delay to
start reconciling, but it's for correctness: we have to commit before a
rebalance, but we cannot guarantee we can commit correctly the consumed
positions if we don't have stable positions.
Looking at the poll loop from a high level, these are the 3 main blocks:
1. app thread poll start (PollEvent)
2. update fetch positions
3. fetch
So we're saying we change to trigger a reconciliation only on 1, when we
have stable positions, so we know the allConsumed to commit (and then
rebalance). Of course that means that if we get a HB response with a new
assignment right after the `PollEvent` (1), we would have to wait until the
next `PollEvent` to start reconciling that assignment. But with the current
version of triggering reconciliations freely in the background, that's exactly
the root cause of the problem imo: we start a reconciliation when 2/3 are
happening, and it's a mess because we cannot determine the `allConsumed` to
commit, it's a moving target (until we know the records have been returned, and
that's on the next PollEvent).
Thoughts?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]