junrao commented on code in PR #14111: URL: https://github.com/apache/kafka/pull/14111#discussion_r1286480033
########## clients/src/main/java/org/apache/kafka/clients/producer/internals/RecordAccumulator.java: ########## @@ -613,7 +623,7 @@ private long batchReady(long nowMs, boolean exhausted, TopicPartition part, Node long waitedTimeMs, boolean backingOff, boolean full, long nextReadyCheckDelayMs, Set<Node> readyNodes) { if (!readyNodes.contains(leader) && !isMuted(part)) { - long timeToWaitMs = backingOff ? retryBackoffMs : lingerMs; + long timeToWaitMs = backingOff ? retryBackoff.backoff(0) : lingerMs; Review Comment: Hmm, we should use the number of retries for the batch instead of 0 for calculating timeToWaitMs, right? Also, this is an existing issue. It seems that nowMs is never used. ########## clients/src/main/java/org/apache/kafka/clients/Metadata.java: ########## @@ -144,13 +159,25 @@ public long metadataExpireMs() { */ public synchronized int requestUpdate() { this.needFullUpdate = true; + this.backoffUpdateRequests = 0L; Review Comment: Hmm, this probably needs some more thought. For example, if a produce request fails, we request a metadata update and set backoffUpdateRequests to 0. However, if we exponentially back off the produce request because of stale metadata, it seems that we should do the same for refreshing the metadata. Otherwise, we likely will still have the stale metadata when retrying the produce request. Ditto for fetch requests. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org