[jira] [Commented] (KAFKA-3310) fetch requests can trigger repeated NPE when quota is enabled
[ https://issues.apache.org/jira/browse/KAFKA-3310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15179001#comment-15179001 ]

ASF GitHub Bot commented on KAFKA-3310:
---

Github user asfgit closed the pull request at:

    https://github.com/apache/kafka/pull/989

> fetch requests can trigger repeated NPE when quota is enabled
> -
>
>                 Key: KAFKA-3310
>                 URL: https://issues.apache.org/jira/browse/KAFKA-3310
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 0.9.0.0, 0.9.0.1
>            Reporter: Jun Rao
>            Assignee: Aditya Auradkar
>            Priority: Blocker
>             Fix For: 0.10.0.0
>
> We saw the following NPE when consumer quota is enabled. NPE is triggered on
> every fetch request from the client.
>
> java.lang.NullPointerException
>     at kafka.server.ClientQuotaManager.recordAndMaybeThrottle(ClientQuotaManager.scala:122)
>     at kafka.server.KafkaApis.kafka$server$KafkaApis$$sendResponseCallback$3(KafkaApis.scala:419)
>     at kafka.server.KafkaApis$$anonfun$handleFetchRequest$1.apply(KafkaApis.scala:436)
>     at kafka.server.KafkaApis$$anonfun$handleFetchRequest$1.apply(KafkaApis.scala:436)
>     at kafka.server.ReplicaManager.fetchMessages(ReplicaManager.scala:481)
>     at kafka.server.KafkaApis.handleFetchRequest(KafkaApis.scala:431)
>     at kafka.server.KafkaApis.handle(KafkaApis.scala:69)
>     at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:60)
>     at java.lang.Thread.run(Thread.java:745)
>
> One possible cause of this is the logic of removing inactive sensors.
> Currently, in ClientQuotaManager, we create two sensors per clientId: a
> throttleTimeSensor and a quotaSensor. Each sensor expires if it's not
> actively updated for 1 hour. What can happen is that initially, the quota is
> not exceeded. So, quotaSensor is being updated actively, but
> throttleTimeSensor is not. At some point, throttleTimeSensor is removed by
> the expiring thread. Now, we are in a situation that quotaSensor is
> registered, but throttleTimeSensor is not. Later on, if the quota is
> exceeded, we will hit the above NPE when trying to update throttleTimeSensor.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
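
The failure mode in the issue description can be illustrated with a toy model. This is only a sketch and does not use Kafka's classes: `ToySensorRegistry`, its methods, the sensor names, and the millisecond clock are hypothetical stand-ins for the metrics registry and the expiring thread. It shows how one of the two per-client sensors can be removed while the other stays registered, so that an unguarded update hits a null reference.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only: none of these names are Kafka APIs.
public class ToySensorRegistry {
    // Inactivity window after which a sensor is dropped (1 hour, per the issue text).
    static final long EXPIRY_MS = 60L * 60L * 1000L;
    static final long TEN_MINUTES_MS = 10L * 60L * 1000L;

    static final class Sensor {
        long lastRecordedMs;
        double lastValue;

        Sensor(long nowMs) {
            this.lastRecordedMs = nowMs;
        }

        void record(double value, long nowMs) {
            this.lastValue = value;
            this.lastRecordedMs = nowMs;
        }
    }

    private final Map<String, Sensor> sensors = new HashMap<>();

    void register(String name, long nowMs) {
        sensors.put(name, new Sensor(nowMs));
    }

    // Returns null when the sensor was never registered or has been expired.
    Sensor get(String name) {
        return sensors.get(name);
    }

    // Stand-in for the expiring thread: drop sensors idle for longer than EXPIRY_MS.
    void expireInactive(long nowMs) {
        sensors.values().removeIf(s -> nowMs - s.lastRecordedMs > EXPIRY_MS);
    }

    // Returns true if the unguarded throttle-time update throws an NPE.
    static boolean reproduce() {
        ToySensorRegistry registry = new ToySensorRegistry();
        registry.register("quota-client1", 0L);
        registry.register("throttleTime-client1", 0L);

        // For 70 minutes the quota is not exceeded, so only the quota sensor is updated.
        long now = 0L;
        for (int i = 0; i < 7; i++) {
            now += TEN_MINUTES_MS;
            registry.get("quota-client1").record(1024.0, now);
        }
        registry.expireInactive(now); // removes only the idle throttle-time sensor

        try {
            // The quota is now exceeded; the code assumes the sensor still exists.
            registry.get("throttleTime-client1").record(250.0, now);
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("NPE reproduced: " + reproduce());
    }
}
```

In this model the quota sensor survives because it was recorded at the same instant the expiry check runs, while the throttle-time sensor has been idle for 70 minutes and is dropped.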
[jira] [Commented] (KAFKA-3310) fetch requests can trigger repeated NPE when quota is enabled
[ https://issues.apache.org/jira/browse/KAFKA-3310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15174366#comment-15174366 ]

Aditya Auradkar commented on KAFKA-3310:

[~junrao] - can you take a look?
[jira] [Commented] (KAFKA-3310) fetch requests can trigger repeated NPE when quota is enabled
[ https://issues.apache.org/jira/browse/KAFKA-3310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15174365#comment-15174365 ]

ASF GitHub Bot commented on KAFKA-3310:
---

GitHub user auradkar opened a pull request:

    https://github.com/apache/kafka/pull/989

    KAFKA-3310: Fix for NPEs observed when throttling clients.

    The fix basically ensures that the throttleTimeSensor is non-null before handing off to record the metric value. We also record the throttle time to 0 so that we don't recreate the sensor always.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/auradkar/kafka KAFKA-3310

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/kafka/pull/989.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #989

commit cd5007eb3c94ae2d1983cc6a4b9a9fe4e96ff1b1
Author: Aditya Auradkar
Date:   2016-03-01T20:18:59Z

    KAFKA-3310: Fix for NPEs observed when throttling clients.

    The fix basically ensures that the throttleTimeSensor is non-null before handing off to record the metric value. We also record the throttle time to 0 so that we don't recreate the sensor always.
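
The two ideas in the pull request description (make sure the throttle-time sensor exists before recording, and record a throttle time of 0 on unthrottled requests so the sensor stays active) can be sketched in a toy model. This is not the code from the pull request: `GuardedThrottleRecorder`, `getOrCreate`, `recordRequest`, and the sensor naming are all hypothetical, and the one-hour expiry is taken from the issue text.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only: an illustration of the approach, not Kafka's ClientQuotaManager.
public class GuardedThrottleRecorder {
    static final long EXPIRY_MS = 60L * 60L * 1000L;
    static final long TEN_MINUTES_MS = 10L * 60L * 1000L;

    static final class Sensor {
        long lastRecordedMs;
        double lastValue;

        void record(double value, long nowMs) {
            this.lastValue = value;
            this.lastRecordedMs = nowMs;
        }
    }

    private final Map<String, Sensor> sensors = new HashMap<>();
    int sensorCreations = 0; // counts how often a sensor had to be (re)created

    // Guard: never hand back null; recreate the sensor if it was expired.
    private Sensor getOrCreate(String name) {
        Sensor sensor = sensors.get(name);
        if (sensor == null) {
            sensor = new Sensor();
            sensors.put(name, sensor);
            sensorCreations++;
        }
        return sensor;
    }

    // Stand-in for the expiring thread.
    void expireInactive(long nowMs) {
        sensors.values().removeIf(s -> nowMs - s.lastRecordedMs > EXPIRY_MS);
    }

    // Every request records on both sensors; throttleMs is 0 when the quota is not exceeded.
    void recordRequest(String clientId, double bytes, long throttleMs, long nowMs) {
        getOrCreate("quota-" + clientId).record(bytes, nowMs);
        getOrCreate("throttleTime-" + clientId).record((double) throttleMs, nowMs);
    }

    double lastThrottleMs(String clientId) {
        return getOrCreate("throttleTime-" + clientId).lastValue;
    }

    // Runs 70 minutes of unthrottled traffic, then one throttled request.
    static GuardedThrottleRecorder run() {
        GuardedThrottleRecorder recorder = new GuardedThrottleRecorder();
        long now = 0L;
        for (int i = 0; i < 7; i++) {
            recorder.recordRequest("client1", 1024.0, 0L, now);
            now += TEN_MINUTES_MS;
            recorder.expireInactive(now);
        }
        recorder.recordRequest("client1", 1024.0, 250L, now);
        return recorder;
    }

    public static void main(String[] args) {
        GuardedThrottleRecorder recorder = run();
        System.out.println("sensor creations: " + recorder.sensorCreations);
        System.out.println("last throttle ms: " + recorder.lastThrottleMs("client1"));
    }
}
```

In this model, recording 0 keeps both sensors within the expiry window, so each is created only once. The `getOrCreate` guard still covers the case where one sensor expires anyway.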
[jira] [Commented] (KAFKA-3310) fetch requests can trigger repeated NPE when quota is enabled
[ https://issues.apache.org/jira/browse/KAFKA-3310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15173295#comment-15173295 ]

Jun Rao commented on KAFKA-3310:

[~aauradkar], that depends. In this case, the NPE is triggered directly when handling the fetch request in KafkaApis. The throttle time sensor is actually recorded before we add the request to the delay queue. So, we will send an empty fetch response with an unexpected error.

However, the same NPE could be triggered when we try to complete a fetch request from the fetch purgatory. In this case, we won't even be able to send a fetch response, so the fetch request will time out. What's worse is that there could be other fetch requests (both consumer and follower) in the fetch purgatory off the same key. Since we hit the unexpected exception while evaluating the completeness of this particular fetch request, we will skip the checking of other fetch requests on the same chain and therefore may delay other fetch requests.

It seems that this problem can show up pretty easily. Just upgrade the broker to 0.9.0, start a consumer, wait for more than an hour, then set the consumer quota. If the consumer fetch request is now throttled, we will hit the NPE.

Recording 0 on the throttled time sensor probably fixes most of the problem, but I am not sure if it fixes this completely. Since these two sensors are not updated at exactly the same time, it seems that it's still possible for the throttled time sensor to expire before the quota sensor?
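
The purgatory concern in the comment above (an unexpected exception while checking one fetch request causes the remaining requests on the same chain to be skipped) can be illustrated with a toy loop. This is a sketch under stated assumptions, not Kafka's purgatory code: `ToyWatcherChain`, `checkAll`, and the use of `BooleanSupplier` as a stand-in for a delayed operation's completion check are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BooleanSupplier;

// Toy model only: a list of completion checks evaluated in order.
public class ToyWatcherChain {

    // Returns how many checks were started before the loop ended.
    static int checkAll(List<BooleanSupplier> chain) {
        int checked = 0;
        try {
            for (BooleanSupplier completionCheck : chain) {
                checked++;
                completionCheck.getAsBoolean(); // may throw an unexpected exception
            }
        } catch (RuntimeException e) {
            // The exception escapes the loop, so the remaining checks never run.
        }
        return checked;
    }

    static int demo() {
        List<BooleanSupplier> chain = new ArrayList<>();
        chain.add(() -> true);
        // Stand-in for the fetch request whose completion hits the missing sensor.
        chain.add(() -> {
            throw new NullPointerException("throttle time sensor missing");
        });
        chain.add(() -> true);
        chain.add(() -> true);
        return checkAll(chain);
    }

    public static void main(String[] args) {
        System.out.println("checked " + demo() + " of 4 requests");
    }
}
```

In this model the third and fourth checks are never evaluated, which corresponds to the delayed fetch requests described in the comment.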
[jira] [Commented] (KAFKA-3310) fetch requests can trigger repeated NPE when quota is enabled
[ https://issues.apache.org/jira/browse/KAFKA-3310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15173223#comment-15173223 ]

Aditya Auradkar commented on KAFKA-3310:

[~junrao] - Just making sure, you observe that the response is still delayed, right? The throttle time sensor is the last thing that is recorded and the element has been added to the delay queue, so the fetchResponseCallback should fire after the throttle time.
[jira] [Commented] (KAFKA-3310) fetch requests can trigger repeated NPE when quota is enabled
[ https://issues.apache.org/jira/browse/KAFKA-3310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15173204#comment-15173204 ]

Aditya Auradkar commented on KAFKA-3310:

[~junrao] - Let me investigate this. If this is a problem, it should be easy to fix by recording 0 on the throttle time sensor every time.
[jira] [Commented] (KAFKA-3310) fetch requests can trigger repeated NPE when quota is enabled
[ https://issues.apache.org/jira/browse/KAFKA-3310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15172950#comment-15172950 ]

Jun Rao commented on KAFKA-3310:

[~aauradkar], do you think this is a problem?