lianetm commented on code in PR #20521:
URL: https://github.com/apache/kafka/pull/20521#discussion_r2430990238
##########
clients/clients-integration-tests/src/test/java/org/apache/kafka/clients/consumer/ConsumerIntegrationTest.java:
##########
@@ -123,7 +123,6 @@ public void onPartitionsAssigned(Collection<TopicPartition>
partitions) {
});
TestUtils.waitForCondition(() ->
consumer.poll(Duration.ofSeconds(1)).count() == 1,
- 5000,
Review Comment:
I'm afraid this change might be hiding something relevant. This number was
to repeat the operation (poll with 1s timeout) several times until that timeout
expired. Having the async consumer not giving results with it, makes me think
that we may be still blocking more that needed on a single internal poll
iteration (same bit we addressed on this
[comment](https://github.com/apache/kafka/pull/20521/files#r2411533109) to
ensure we reduce the `pollTimeout` to the `retryBackoff` if there are some
partitions missing positions, but seems we're still missing part of it.
Back to reviewing the `pollForFetches`, I realized that the difference might
be that the classic consumer does not attempt to updatePositions until the
coordinator poll returns true (member active in a group)
https://github.com/apache/kafka/blob/44edff4bc7ad38cc9134cbab730c1b71869e3edd/clients/src/main/java/org/apache/kafka/clients/consumer/internals/ClassicKafkaConsumer.java#L679-L683
(so the cachedSubscriptionHasAllFetchPositions remains false that whole time
really, and pollTimeout will be reduced to retryBackoff I expect)
Honestly I think this might be a gap before this PR, but surfacing now
because we do need to be able to perform the internal poll iterations to get
things done? As I see it, the gap is:
- classic consumer: calls to poll(1s) (high timeout), will block on
`pollForFetches` only for the `retryBackoff` (short timeout) until the member
is active in a group and retrieves all positions. Only then (in a group with
positions for all partitions), it will block for the full/remaining
pollTimeout.
- new consumer: same calls to poll(1s) (high timeout) will potentially block
for the full 1s waiting for data on the buffer, not performing any of the
polling logic introduced with this PR.
Makes sense? Still going over this, tricky bit.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]