[ https://issues.apache.org/jira/browse/KAFKA-9677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17058293#comment-17058293 ]
ASF GitHub Bot commented on KAFKA-9677:
---------------------------------------

apovzner commented on pull request #8290: KAFKA-9677: Fix consumer fetch with small consume bandwidth quotas
URL: https://github.com/apache/kafka/pull/8290

Since we changed quota communication with KIP-219, fetch requests get throttled by returning an empty response with the delay in `throttle_time_ms`, and the Kafka consumer retries after the delay.

With default configs, the maximum fetch size could be as big as 50MB (or 10MB per partition). The default broker config (1-second window, 10 full windows of tracked bandwidth/thread utilization usage) means that a consumer quota below 5MB/s per broker (50MB over the 10-second tracked window) may block consumers from being able to fetch any data.

This PR ensures that consumers cannot get blocked by quota by capping `fetchMaxBytes` in KafkaApis.handleFetchRequest() to <tracking window> * <consume bandwidth quota>. In the example of default configs (10-second quota window) and a 1MB/s consumer bandwidth quota, fetchMaxBytes would be capped to 10MB.

### Committer Checklist (excluded from commit message)
- [ ] Verify design and implementation
- [ ] Verify test coverage and CI build status
- [ ] Verify documentation (including upgrade notes)

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

> Low consume bandwidth quota may cause consumer not being able to fetch data
> ---------------------------------------------------------------------------
>
>                 Key: KAFKA-9677
>                 URL: https://issues.apache.org/jira/browse/KAFKA-9677
>             Project: Kafka
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 2.0.1, 2.1.1, 2.2.2, 2.4.0, 2.3.1
>            Reporter: Anna Povzner
>            Assignee: Anna Povzner
>            Priority: Major
>
> Since we changed quota communication with KIP-219, fetch requests get
> throttled by returning an empty response with the delay in `throttle_time_ms`,
> and the Kafka consumer retries after the delay.
> With default configs, the maximum fetch size could be as big as 50MB (or 10MB
> per partition). The default broker config (1-second window, 10 full windows
> of tracked bandwidth/thread utilization usage) means that a consumer quota
> below 5MB/s (per broker) may stop fetch requests from ever being successful.
> Or the other way around: a 1MB/s consumer quota (per broker) means that any
> fetch request that gets >= 10MB of data (10 seconds * 1MB/second) in the
> response will never get through. From the consumer's point of view, the
> behavior will be: the consumer gets an empty response with
> throttle_time_ms > 0, waits for the throttle time delay, and then sends the
> fetch request again; the fetch response is still too big, so the broker sends
> another empty response with a throttle time, and so on in a never-ending loop.
> h3. Proposed fix
> Return less data in the fetch response in this case: cap `fetchMaxBytes` passed
> to replicaManager.fetchMessages() from KafkaApis.handleFetchRequest() to
> <tracking window> * <consume bandwidth quota>. In the example of default
> configs and a 1MB/s consumer bandwidth quota, fetchMaxBytes will be 10MB.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
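
The capping rule described in the proposed fix (<tracking window> * <consume bandwidth quota>) can be illustrated with a small standalone sketch. This is not the code from the pull request: the class `FetchCapSketch`, its method names, and its parameters are hypothetical and chosen only for illustration, and the window is modeled as 10 samples of 1 second each, following the description above.

```java
// Hypothetical illustration of the cap described in KAFKA-9677; not Kafka's actual code.
public final class FetchCapSketch {
    private FetchCapSketch() {}

    /** Bytes the quota permits over the whole tracked window, saturating at Long.MAX_VALUE. */
    public static long quotaWindowBytes(long quotaBytesPerSec, int numSamples, int sampleWindowSec) {
        if (quotaBytesPerSec < 0 || numSamples <= 0 || sampleWindowSec <= 0) {
            throw new IllegalArgumentException("quota must be >= 0 and window sizes must be > 0");
        }
        long windowSec = (long) numSamples * sampleWindowSec;
        // Avoid overflow when the quota is effectively unlimited.
        if (quotaBytesPerSec >= Long.MAX_VALUE / windowSec) {
            return Long.MAX_VALUE;
        }
        return quotaBytesPerSec * windowSec;
    }

    /** The requested fetch size, capped to what the quota allows over the tracked window. */
    public static int cappedFetchMaxBytes(int requestedMaxBytes, long quotaBytesPerSec,
                                          int numSamples, int sampleWindowSec) {
        long cap = quotaWindowBytes(quotaBytesPerSec, numSamples, sampleWindowSec);
        // The result never exceeds requestedMaxBytes, so the narrowing cast is safe.
        return (int) Math.min((long) requestedMaxBytes, cap);
    }

    public static void main(String[] args) {
        int requested = 50 * 1024 * 1024;   // 50MB fetch size from the example
        long quota = 1024L * 1024L;         // 1MB/s consume bandwidth quota
        // 10 samples of 1 second each, as assumed in the issue description.
        System.out.println(cappedFetchMaxBytes(requested, quota, 10, 1)); // prints 10485760
    }
}
```

With the numbers from the issue (a 50MB requested fetch, a 1MB/s quota, and a 10-second window), the sketch caps the fetch at 10MB, so a response of that size no longer exceeds what the quota permits over the tracked window.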