[jira] [Commented] (KAFKA-4725) Kafka broker fails due to OOM when producer exceeds throttling quota for extended periods of time
[ https://issues.apache.org/jira/browse/KAFKA-4725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15851730#comment-15851730 ]

Ismael Juma commented on KAFKA-4725:

Nice catch, a contribution via a PR would be welcome indeed.

> Kafka broker fails due to OOM when producer exceeds throttling quota for extended periods of time
>
>                 Key: KAFKA-4725
>                 URL: https://issues.apache.org/jira/browse/KAFKA-4725
>             Project: Kafka
>          Issue Type: Bug
>          Components: core, producer
>    Affects Versions: 0.10.1.1
>         Environment: Ubuntu Trusty (14.04.5), Oracle JDK 8
>            Reporter: Jeff Chao
>            Priority: Critical
>              Labels: reliability
>             Fix For: 0.10.3.0, 0.10.2.1
>
>         Attachments: oom-references.png
>
>
> Steps to Reproduce:
> 1. Create a non-compacted topic with 1 partition
> 2. Set a produce quota of 512 KB/s
> 3. Send messages at 20 MB/s
> 4. Observe heap memory growth as time progresses
>
> Investigation:
> While running performance tests with a user configured with a produce quota,
> we found that the lead broker serving the requests would exhaust heap memory
> if the producer sustained an inbound request throughput greater than the
> produce quota.
> Upon further investigation, we took a heap dump from that broker process and
> discovered the ThrottledResponse object has an indirect reference to the
> byte[] holding the messages associated with the ProduceRequest.
> We're happy contributing a patch but in the meantime wanted to first raise
> the issue and get feedback from the community.

--
This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (KAFKA-4725) Kafka broker fails due to OOM when producer exceeds throttling quota for extended periods of time
[ https://issues.apache.org/jira/browse/KAFKA-4725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15851880#comment-15851880 ]

Tim Carey-Smith commented on KAFKA-4725:

Hi there,

Jeff and I have prototyped a fix for this bug. We repeated our stress tests against a new build and have not yet been able to reproduce the leak.

The branch is hosted on GitHub at https://github.com/apache/kafka/compare/0.10.1.1...heroku:fix-throttled-response-leak

Before we open a PR, which base branch should we set as the target for the PR?

Thanks,
Tim
[jira] [Commented] (KAFKA-4725) Kafka broker fails due to OOM when producer exceeds throttling quota for extended periods of time
[ https://issues.apache.org/jira/browse/KAFKA-4725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15851885#comment-15851885 ]

Ismael Juma commented on KAFKA-4725:

Great. The target should be trunk, we cherry-pick to other branches during the merge.
[jira] [Commented] (KAFKA-4725) Kafka broker fails due to OOM when producer exceeds throttling quota for extended periods of time
[ https://issues.apache.org/jira/browse/KAFKA-4725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15851898#comment-15851898 ]

Jeff Chao commented on KAFKA-4725:

Ok, we'll base it off trunk and open up a PR. Thanks.
[jira] [Commented] (KAFKA-4725) Kafka broker fails due to OOM when producer exceeds throttling quota for extended periods of time
[ https://issues.apache.org/jira/browse/KAFKA-4725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15852189#comment-15852189 ]

ASF GitHub Bot commented on KAFKA-4725:

GitHub user halorgium opened a pull request:

    https://github.com/apache/kafka/pull/2496

    KAFKA-4725: Stop leaking messages in produce request body when requests are delayed

    This change is in response to [KAFKA-4725](https://issues.apache.org/jira/browse/KAFKA-4725).

    When a produce request is received, if the user/client is exceeding their produce quota, the response will be delayed until the quota is refilled appropriately.
    Unfortunately, the request body is still referenced in the callback which in turn leaks the messages contained within the request.

    This change allows the `KafkaApis` method to take ownership of the request body from the `RequestChannel.Request` object.

    I am not sure whether this breaks other invariants which are assumed within other parts of Kafka.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/heroku/kafka fix-throttled-response-leak

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/kafka/pull/2496.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2496

commit ddb0541b156db546fbf6e065670fb25d6e4baba2
Author: Tim Carey-Smith
Date:   2017-02-01T23:18:43Z

    Stop leaking produce request in throttled requests

    Further isolate the request from the callbacks

    Remove pointless changes

    Move body ownership logic into RequestChannel
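The retention pattern described in the pull request can be illustrated with a small, self-contained Java sketch. It is not Kafka's code: `Request`, `Throttled`, `handleLeaky`, `handleFixed`, `takeBody`, and the in-memory delay queue are hypothetical stand-ins chosen for illustration, and the actual change in `KafkaApis` and `RequestChannel` may differ in its details. The sketch only shows why a delayed response that keeps its request alive also keeps the message bytes alive, and how handing the body over to the handler breaks that chain.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

final class ThrottleLeakSketch {

    /** Hypothetical stand-in for a received request; not Kafka's actual class. */
    static final class Request {
        final int correlationId;
        byte[] body; // the potentially large produce payload

        Request(int correlationId, byte[] body) {
            this.correlationId = correlationId;
            this.body = body;
        }

        /** Transfers ownership: returns the body and drops this object's reference. */
        byte[] takeBody() {
            byte[] taken = body;
            body = null;
            return taken;
        }
    }

    /** Hypothetical delayed response: holds the request until the throttle expires. */
    static final class Throttled {
        final Request request;
        final Runnable sendResponse;

        Throttled(Request request, Runnable sendResponse) {
            this.request = request;
            this.sendResponse = sendResponse;
        }
    }

    static final Queue<Throttled> delayQueue = new ArrayDeque<>();
    static final List<Integer> sent = new ArrayList<>();

    /** Pretends to append the messages to the log; returns the bytes written. */
    static int append(byte[] messages) {
        return messages.length;
    }

    /** Leaky variant: the queued request still references its body while it waits. */
    static void handleLeaky(Request request) {
        append(request.body);
        delayQueue.add(new Throttled(request, () -> sent.add(request.correlationId)));
    }

    /** Fixed variant: the handler takes the body, so the queued request no longer holds it. */
    static void handleFixed(Request request) {
        byte[] body = request.takeBody();
        append(body); // 'body' is a local, so it is unreachable once this method returns
        delayQueue.add(new Throttled(request, () -> sent.add(request.correlationId)));
    }

    /** Bytes still reachable through requests waiting in the delay queue. */
    static long retainedBytes() {
        long total = 0;
        for (Throttled t : delayQueue) {
            if (t.request.body != null) {
                total += t.request.body.length;
            }
        }
        return total;
    }

    /** Simulates the throttle delays expiring: sends every queued response. */
    static void drain() {
        Throttled t;
        while ((t = delayQueue.poll()) != null) {
            t.sendResponse.run();
        }
    }

    public static void main(String[] args) {
        for (int i = 0; i < 3; i++) {
            handleLeaky(new Request(i, new byte[1 << 20]));
        }
        System.out.println("leaky retained bytes: " + retainedBytes());
        drain();

        for (int i = 3; i < 6; i++) {
            handleFixed(new Request(i, new byte[1 << 20]));
        }
        System.out.println("fixed retained bytes: " + retainedBytes());
        drain();
        System.out.println("responses sent: " + sent);
    }
}
```

In the leaky variant, every response waiting in the queue pins its full payload, so a producer that keeps sending well above its quota makes the retained bytes grow with the length of the queue. In the fixed variant the queued entries hold only the small request metadata needed to send the response.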
[jira] [Commented] (KAFKA-4725) Kafka broker fails due to OOM when producer exceeds throttling quota for extended periods of time
[ https://issues.apache.org/jira/browse/KAFKA-4725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15856965#comment-15856965 ]

Tim Carey-Smith commented on KAFKA-4725:

We have run stress tests on builds which include this patch based on the 0.10.1.1 tag, the 0.10.2 branch and the trunk branch.
We were unable to reproduce the memory leak and feel comfortable with this change.

>             Fix For: 0.10.2.0, 0.10.3.0
[jira] [Commented] (KAFKA-4725) Kafka broker fails due to OOM when producer exceeds throttling quota for extended periods of time
[ https://issues.apache.org/jira/browse/KAFKA-4725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15856990#comment-15856990 ]

ASF GitHub Bot commented on KAFKA-4725:

Github user asfgit closed the pull request at:

    https://github.com/apache/kafka/pull/2496