lianetm commented on code in PR #20521:
URL: https://github.com/apache/kafka/pull/20521#discussion_r2430990238


##########
clients/clients-integration-tests/src/test/java/org/apache/kafka/clients/consumer/ConsumerIntegrationTest.java:
##########
@@ -123,7 +123,6 @@ public void onPartitionsAssigned(Collection<TopicPartition> 
partitions) {
             });
 
             TestUtils.waitForCondition(() -> 
consumer.poll(Duration.ofSeconds(1)).count() == 1,
-                    5000,

Review Comment:
   I'm afraid this change might be hiding something relevant. This number was 
to repeat the operation (poll with 1s timeout) several times until that timeout 
expired. Having the async consumer not giving results with it, makes me think 
that we may be still blocking more that needed on a single internal poll 
iteration (same bit we addressed on this 
[comment](https://github.com/apache/kafka/pull/20521/files#r2411533109) to 
ensure we reduce the `pollTimeout` to the `retryBackoff` if there are some 
partitions missing positions, but seems we're still missing part of it.  
   
   Back to reviewing the `pollForFetches`, I realized that the difference might 
be that the classic consumer does not attempt to updatePositions until the 
coordinator poll returns true (member active in a group)
   
https://github.com/apache/kafka/blob/44edff4bc7ad38cc9134cbab730c1b71869e3edd/clients/src/main/java/org/apache/kafka/clients/consumer/internals/ClassicKafkaConsumer.java#L679-L683
 (so the cachedSubscriptionHasAllFetchPositions remains false that whole time 
really, and pollTimeout will be reduced to retryBackoff I expect)
   
   Honestly I think this might be a gap before this PR, but surfacing now 
because we do need to be able to perform the internal poll iterations to get 
things done? As I see it, the gap is: 
   - classic consumer: calls to poll(1s) (high timeout), will block on 
`pollForFetches` only for the `retryBackoff` (short timeout) until the member 
is active in a group and retrieves all positions. Only then (in a group with 
positions for all partitions), it will block for the full/remaining 
pollTimeout.  
   - new consumer: same calls to poll(1s) (high timeout) will potentially block 
for the full 1s waiting for data on the buffer, not performing any of the 
polling logic introduced with this PR. 
   
   Makes sense? Still going over this, tricky bit. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to