[ 
https://issues.apache.org/jira/browse/KAFKA-7136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajini Sivaram updated KAFKA-7136:
----------------------------------
    Description: 
We noticed a deadlock in {{PushHttpMetricsReporter}}. Locking for metrics was 
changed under KAFKA-6765 to avoid {{NullPointerException}} in metrics reporters 
due to concurrent reads and updates. {{PushHttpMetricsReporter}} acquires its 
own lock to process metric registration, which is invoked while the sensor lock 
is held. When reporting, it also reads metric values, which requires the sensor 
lock, while holding its own lock (the inverse order). This resulted in the 
deadlock below.
{quote}Found one Java-level deadlock:
 Java stack information for the threads listed above:
 ===================================================
 "StreamThread-7":
 at org.apache.kafka.tools.PushHttpMetricsReporter.metricChange(PushHttpMetricsReporter.java:144)
 - waiting to lock <0x0000000655a54310> (a java.lang.Object)
 at org.apache.kafka.common.metrics.Metrics.registerMetric(Metrics.java:563)
 - locked <0x0000000655a44a28> (a org.apache.kafka.common.metrics.Metrics)
 at org.apache.kafka.common.metrics.Sensor.add(Sensor.java:236)
 - locked <0x000000065629c170> (a org.apache.kafka.common.metrics.Sensor)
 at org.apache.kafka.common.metrics.Sensor.add(Sensor.java:217)
 at org.apache.kafka.common.network.Selector$SelectorMetrics.maybeRegisterConnectionMetrics(Selector.java:1016)
 at org.apache.kafka.common.network.Selector.pollSelectionKeys(Selector.java:462)
 at org.apache.kafka.common.network.Selector.poll(Selector.java:425)
 at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:510)
 at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:271)
 at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:242)
 at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:218)
 at org.apache.kafka.clients.consumer.internals.Fetcher.getTopicMetadata(Fetcher.java:274)
 at org.apache.kafka.clients.consumer.internals.Fetcher.getAllTopicMetadata(Fetcher.java:254)
 at org.apache.kafka.clients.consumer.KafkaConsumer.listTopics(KafkaConsumer.java:1820)
 at org.apache.kafka.clients.consumer.KafkaConsumer.listTopics(KafkaConsumer.java:1798)
 at org.apache.kafka.streams.processor.internals.StoreChangelogReader.refreshChangelogInfo(StoreChangelogReader.java:224)
 at org.apache.kafka.streams.processor.internals.StoreChangelogReader.initialize(StoreChangelogReader.java:121)
 at org.apache.kafka.streams.processor.internals.StoreChangelogReader.restore(StoreChangelogReader.java:74)
 at org.apache.kafka.streams.processor.internals.TaskManager.updateNewAndRestoringTasks(TaskManager.java:317)
 at org.apache.kafka.streams.processor.internals.StreamThread.runOnce(StreamThread.java:824)
 at org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:767)
 at org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:736)

 "pool-17-thread-1":
 at org.apache.kafka.common.metrics.KafkaMetric.measurableValue(KafkaMetric.java:82)
 - waiting to lock <0x000000065629c170> (a org.apache.kafka.common.metrics.Sensor)
 at org.apache.kafka.common.metrics.KafkaMetric.value(KafkaMetric.java:58)
 at org.apache.kafka.tools.PushHttpMetricsReporter$HttpReporter.run(PushHttpMetricsReporter.java:177)
 - locked <0x0000000655a54310> (a java.lang.Object)
 at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
 at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
 at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
 at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
 at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
 at java.lang.Thread.run(Thread.java:748)

Found 1 deadlock.
{quote}
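
The lock-order inversion described above can be illustrated with a minimal sketch. It is not the Kafka code and not necessarily the change made for this issue; {{LockOrderSketch}}, {{FakeSensor}} and {{SnapshotReporter}} are hypothetical names used only for illustration. The sketch assumes one possible way of avoiding the inversion: the reporter copies its metric map while holding its own lock, releases that lock, and only then reads values, so the reporter lock is never held while a sensor lock is requested.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; none of these types exist in Kafka.
final class LockOrderSketch {

    // Stand-in for a sensor: one monitor guards registration, updates and reads.
    static final class FakeSensor {
        private double total;

        synchronized void record(double v) {
            total += v;
        }

        synchronized double value() {
            return total;
        }

        // Registration notifies the reporter while the sensor monitor is held,
        // mirroring the "sensor lock, then reporter lock" order in the trace.
        synchronized void register(String name, SnapshotReporter reporter) {
            reporter.metricChange(name, this);
        }
    }

    // Stand-in for a reporter that guards its metric map with its own lock.
    static final class SnapshotReporter {
        private final Object lock = new Object();
        private final Map<String, FakeSensor> metrics = new LinkedHashMap<>();

        // Called with the sensor monitor already held by the caller.
        void metricChange(String name, FakeSensor sensor) {
            synchronized (lock) {
                metrics.put(name, sensor);
            }
        }

        // Copy the map under the reporter lock, then read values after
        // releasing it, so the order "reporter lock, then sensor lock"
        // never occurs.
        Map<String, Double> report() {
            Map<String, FakeSensor> snapshot;
            synchronized (lock) {
                snapshot = new LinkedHashMap<>(metrics);
            }
            Map<String, Double> values = new LinkedHashMap<>();
            for (Map.Entry<String, FakeSensor> e : snapshot.entrySet()) {
                values.put(e.getKey(), e.getValue().value());
            }
            return values;
        }
    }
}
```

Calling {{value()}} inside the {{synchronized (lock)}} block in {{report()}} would reintroduce the inverse order seen in the "pool-17-thread-1" trace. The trade-off of the snapshot approach is that a reported value may belong to a metric that was removed after the copy was taken.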



> PushHttpMetricsReporter may deadlock when processing metrics changes
> --------------------------------------------------------------------
>
>                 Key: KAFKA-7136
>                 URL: https://issues.apache.org/jira/browse/KAFKA-7136
>             Project: Kafka
>          Issue Type: Bug
>          Components: metrics
>    Affects Versions: 1.1.0, 2.0.0
>            Reporter: Rajini Sivaram
>            Assignee: Rajini Sivaram
>            Priority: Blocker
>             Fix For: 2.0.0
>
>



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
