[ https://issues.apache.org/jira/browse/KAFKA-7136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16535907#comment-16535907 ]
Dong Lin commented on KAFKA-7136:
---------------------------------

[~guozhang] [~ijuma] [~rsivaram] Not sure if it is required to re-run system tests for every new RC. If so, could you please help re-run system tests using [https://github.com/apache/kafka/tree/1.1.1-rc3] ? Thanks!

> PushHttpMetricsReporter may deadlock when processing metrics changes
> --------------------------------------------------------------------
>
>                 Key: KAFKA-7136
>                 URL: https://issues.apache.org/jira/browse/KAFKA-7136
>             Project: Kafka
>          Issue Type: Bug
>          Components: metrics
>    Affects Versions: 1.1.0, 2.0.0
>            Reporter: Rajini Sivaram
>            Assignee: Rajini Sivaram
>            Priority: Blocker
>             Fix For: 2.0.0, 1.1.1
>
>
> We noticed a deadlock in {{PushHttpMetricsReporter}}. Locking for metrics was
> changed under KAFKA-6765 to avoid {{NullPointerException}} in metrics
> reporters due to concurrent read and updates. {{PushHttpMetricsReporter}}
> requires a lock to process metrics registration that is invoked while holding
> the sensor lock. It also reads metrics attempting to acquire sensor lock
> while holding its lock (inverse order). This resulted in the deadlock below.
> {quote}Found one Java-level deadlock:
> Java stack information for the threads listed above:
> ===================================================
> "StreamThread-7":
>     at org.apache.kafka.tools.PushHttpMetricsReporter.metricChange(PushHttpMetricsReporter.java:144)
>     - waiting to lock <0x0000000655a54310> (a java.lang.Object)
>     at org.apache.kafka.common.metrics.Metrics.registerMetric(Metrics.java:563)
>     - locked <0x0000000655a44a28> (a org.apache.kafka.common.metrics.Metrics)
>     at org.apache.kafka.common.metrics.Sensor.add(Sensor.java:236)
>     - locked <0x000000065629c170> (a org.apache.kafka.common.metrics.Sensor)
>     at org.apache.kafka.common.metrics.Sensor.add(Sensor.java:217)
>     at org.apache.kafka.common.network.Selector$SelectorMetrics.maybeRegisterConnectionMetrics(Selector.java:1016)
>     at org.apache.kafka.common.network.Selector.pollSelectionKeys(Selector.java:462)
>     at org.apache.kafka.common.network.Selector.poll(Selector.java:425)
>     at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:510)
>     at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:271)
>     at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:242)
>     at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:218)
>     at org.apache.kafka.clients.consumer.internals.Fetcher.getTopicMetadata(Fetcher.java:274)
>     at org.apache.kafka.clients.consumer.internals.Fetcher.getAllTopicMetadata(Fetcher.java:254)
>     at org.apache.kafka.clients.consumer.KafkaConsumer.listTopics(KafkaConsumer.java:1820)
>     at org.apache.kafka.clients.consumer.KafkaConsumer.listTopics(KafkaConsumer.java:1798)
>     at org.apache.kafka.streams.processor.internals.StoreChangelogReader.refreshChangelogInfo(StoreChangelogReader.java:224)
>     at org.apache.kafka.streams.processor.internals.StoreChangelogReader.initialize(StoreChangelogReader.java:121)
>     at org.apache.kafka.streams.processor.internals.StoreChangelogReader.restore(StoreChangelogReader.java:74)
>     at org.apache.kafka.streams.processor.internals.TaskManager.updateNewAndRestoringTasks(TaskManager.java:317)
>     at org.apache.kafka.streams.processor.internals.StreamThread.runOnce(StreamThread.java:824)
>     at org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:767)
>     at org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:736)
> "pool-17-thread-1":
>     at org.apache.kafka.common.metrics.KafkaMetric.measurableValue(KafkaMetric.java:82)
>     - waiting to lock <0x000000065629c170> (a org.apache.kafka.common.metrics.Sensor)
>     at org.apache.kafka.common.metrics.KafkaMetric.value(KafkaMetric.java:58)
>     at org.apache.kafka.tools.PushHttpMetricsReporter$HttpReporter.run(PushHttpMetricsReporter.java:177)
>     - locked <0x0000000655a54310> (a java.lang.Object)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>     at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748)
> Found 1 deadlock.
> {quote}

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
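
The inverse lock order described in the issue (sensor lock then reporter lock on one thread, reporter lock then sensor lock on the other) can be avoided in general by never holding both locks at once on the reporting side. The following is a minimal sketch of that idea, not the change that was actually committed for KAFKA-7136. The names LockOrderSketch, FakeSensor and FakeReporter are hypothetical stand-ins and are not Kafka classes. The reporter copies its registered metrics while holding its own lock, releases that lock, and only then reads values under each sensor's monitor.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;

class LockOrderSketch {

    /** Hypothetical stand-in for a sensor: its value is guarded by the sensor's own monitor. */
    static final class FakeSensor {
        private double total;

        synchronized void record(double v) { total += v; }

        synchronized double value() { return total; }
    }

    /** Hypothetical stand-in for a reporter that has its own private lock. */
    static final class FakeReporter {
        private final Object lock = new Object();
        private final List<FakeSensor> registered = new ArrayList<>();

        // May be invoked while the caller already holds a sensor monitor.
        void metricChange(FakeSensor sensor) {
            synchronized (lock) {
                registered.add(sensor);
            }
        }

        // Copies the list under the reporter lock, then releases it before
        // touching any sensor monitor, so the two locks are never nested here.
        double report() {
            List<FakeSensor> snapshot;
            synchronized (lock) {
                snapshot = new ArrayList<>(registered);
            }
            double sum = 0.0;
            for (FakeSensor sensor : snapshot) {
                sum += sensor.value();
            }
            return sum;
        }
    }

    /** Registers sensors on one thread while another reports concurrently; returns the final sum. */
    static double demo() {
        FakeReporter reporter = new FakeReporter();
        AtomicBoolean done = new AtomicBoolean(false);

        Thread registrar = new Thread(() -> {
            for (int i = 0; i < 100; i++) {
                FakeSensor sensor = new FakeSensor();
                synchronized (sensor) {            // sensor lock first ...
                    sensor.record(1.0);
                    reporter.metricChange(sensor); // ... then reporter lock
                }
            }
            done.set(true);
        });
        Thread poller = new Thread(() -> {
            while (!done.get()) {
                reporter.report();
            }
        });

        registrar.start();
        poller.start();
        try {
            registrar.join();
            poller.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return reporter.report();
    }

    public static void main(String[] args) {
        System.out.println("final sum = " + demo());
    }
}
```

If report() instead read sensor.value() inside the synchronized (lock) block, the two threads would acquire the same pair of locks in opposite orders, which is the pattern visible in the stack traces above.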