lianetm commented on code in PR #16031:
URL: https://github.com/apache/kafka/pull/16031#discussion_r1624689065


##########
clients/src/main/java/org/apache/kafka/clients/consumer/internals/TopicMetadataRequestManager.java:
##########
@@ -61,30 +62,28 @@
  */
 
 public class TopicMetadataRequestManager implements RequestManager {
+    private final Time time;
     private final boolean allowAutoTopicCreation;
     private final List<TopicMetadataRequestState> inflightRequests;
+    private final int requestTimeoutMs;
     private final long retryBackoffMs;
     private final long retryBackoffMaxMs;
     private final Logger log;
     private final LogContext logContext;
 
-    public TopicMetadataRequestManager(final LogContext context, final ConsumerConfig config) {
+    public TopicMetadataRequestManager(final LogContext context, final Time time, final ConsumerConfig config) {
         logContext = context;
         log = logContext.logger(getClass());
+        this.time = time;
         inflightRequests = new LinkedList<>();
+        requestTimeoutMs = config.getInt(ConsumerConfig.REQUEST_TIMEOUT_MS_CONFIG);
         retryBackoffMs = config.getLong(ConsumerConfig.RETRY_BACKOFF_MS_CONFIG);
         retryBackoffMaxMs = config.getLong(ConsumerConfig.RETRY_BACKOFF_MAX_MS_CONFIG);
         allowAutoTopicCreation = config.getBoolean(ConsumerConfig.ALLOW_AUTO_CREATE_TOPICS_CONFIG);
     }
 
     @Override
     public NetworkClientDelegate.PollResult poll(final long currentTimeMs) {
-        // Prune any requests which have timed out
-        List<TopicMetadataRequestState> expiredRequests = inflightRequests.stream()
-                .filter(req -> req.isExpired(currentTimeMs))
-                .collect(Collectors.toList());
-        expiredRequests.forEach(TopicMetadataRequestState::expire);
-

Review Comment:
   I totally agree that we shouldn't prune expired requests on poll before returning the requests (because that does not give requests with timeout 0 the single chance to run that we wanted). But not pruning at all on poll goes a bit further than what I was expecting.
   
   It seems conceptually wrong for the manager to keep requests that it has sent out and knows are expired. It then relies on a timeout response to expire the request. With this the outcome is the same, but the timing is very different: the manager will wait for the full request_timeout before throwing a `TimeoutException` for a request that it knew was expired from the moment it first polled for it, right? (All that time it keeps the request in its queue as in-flight, skipping it on every poll just because it hasn't received a response.) I expect that tests that throw Timeout on poll would fail with this, and they would need to wait for a network timeout response that was not needed before, so that the manager actually throws the `TimeoutException`; that's what I'm referring to.
   
   Managers already apply expiration logic to requests:
   1. on poll (before returning the requests) and on response (before this PR)
   2. on response only (this PR)
   3. on poll (after returning the requests) and on response -> would this be a sensible compromise, so that we end up with managers that apply the expiration logic consistently wherever they can? It's closer to what we had before this PR, and it also fixes the fire-and-forget behaviour we were missing.
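   A minimal sketch of what option 3 could look like, using simplified stand-in classes (`TimedRequest` and `SketchManager` are illustrative names, not the actual Kafka types): requests are returned first, so a timeout-0 request still gets its single chance to run, and only then is anything already expired pruned from the in-flight list.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Illustrative stand-in for an in-flight request with a deadline.
class TimedRequest {
    final String topic;
    final long deadlineMs;
    boolean expired = false;

    TimedRequest(String topic, long deadlineMs) {
        this.topic = topic;
        this.deadlineMs = deadlineMs;
    }

    boolean isExpired(long nowMs) {
        return nowMs >= deadlineMs;
    }

    void expire() {
        // In the real manager this would complete the request's
        // future exceptionally with a TimeoutException.
        expired = true;
    }
}

class SketchManager {
    final List<TimedRequest> inflightRequests = new ArrayList<>();

    // Option 3: collect the requests to return first (so timeout-0
    // requests get their one chance to be sent), then prune anything
    // that is already expired before the next poll cycle.
    List<TimedRequest> poll(long currentTimeMs) {
        List<TimedRequest> toReturn = new ArrayList<>(inflightRequests);
        Iterator<TimedRequest> it = inflightRequests.iterator();
        while (it.hasNext()) {
            TimedRequest req = it.next();
            if (req.isExpired(currentTimeMs)) {
                req.expire();
                it.remove();
            }
        }
        return toReturn;
    }
}
```

   With this shape the manager never waits on a network timeout for a request it already knew was expired, which matches the timing expectation described above.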
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
