lianetm commented on code in PR #18737:
URL: https://github.com/apache/kafka/pull/18737#discussion_r1945191355
##########
clients/src/main/java/org/apache/kafka/clients/consumer/internals/CommitRequestManager.java:
##########
@@ -324,7 +323,7 @@ public CompletableFuture<Void>
maybeAutoCommitSyncBeforeRevocation(final long de
CompletableFuture<Void> result = new CompletableFuture<>();
OffsetCommitRequestState requestState =
- createOffsetCommitRequest(subscriptions.allConsumed(), deadlineMs);
+ createOffsetCommitRequest(latestPartitionOffsets, deadlineMs);
Review Comment:
yes, I get your concern (and I'm still going over all this), but this is how
I see it now:
1. we would be only waiting to trigger the reconciliation, not to process it
until the end (once triggered, it will carry on in the background as always,
not blocking on anything new, just the commit request to complete and the
callbacks, as always)
3. we're just saying we will align the reconciliation triggering with the
consumer poll (like the classic does btw) because we need to wait for stable
positions to start reconciling a new assignment. So yes, there is a delay to
start reconciling, but it's for correctness: we have to commit before a
rebalance, but we cannot guarantee we can commit correctly the consumed
positions if we don't have stable positions.
Looking at the poll loop from a high level, these are the 3 main blocks:
1. app thread poll start with PollEvent
2. update fetch positions
3. fetch
So we're saying we trigger a reconciliation only on 1, when we have stable
positions, so we know the allConsumed to commit (and then rebalance). Of course
that means that if we get a HB response with a new assignment right after the
`PollEvent` (1), we would have to wait until the next `PollEvent` to start
reconciling that assignment. But with the current version of triggering
reconciliations freely in the background, that's exactly the root cause of the
problem imo: we start a reconciliation when 2/3 are happening, and it's a mess
because we cannot determine the `allConsumed` to commit, it's a moving target
(until we know the records have been returned, and that's on the next
PollEvent).
Thoughts?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]