RivenSun created KAFKA-13663:
--------------------------------

             Summary: IllegalMonitorStateException in ProducerMetadata.awaitUpdate
                 Key: KAFKA-13663
                 URL: https://issues.apache.org/jira/browse/KAFKA-13663
             Project: Kafka
          Issue Type: Bug
          Components: clients
    Affects Versions: 3.0.0
            Reporter: RivenSun
            Assignee: Guozhang Wang


Since our service upgraded kafka-clients to {*}2.8.1{*}, the producer sometimes 
throws the following exception when sending a message. Once the exception has 
occurred, it keeps occurring and the producer does not recover by itself.
{code:java}
java.lang.IllegalMonitorStateException: current thread is not owner
        at java.base/java.lang.Object.wait(Native Method)
        at org.apache.kafka.common.utils.SystemTime.waitObject(SystemTime.java:55)
        at org.apache.kafka.clients.producer.internals.ProducerMetadata.awaitUpdate(ProducerMetadata.java:119)
        at org.apache.kafka.clients.producer.KafkaProducer.waitOnMetadata(KafkaProducer.java:1047)
        at org.apache.kafka.clients.producer.KafkaProducer.doSend(KafkaProducer.java:906)
        at org.apache.kafka.clients.producer.KafkaProducer.send(KafkaProducer.java:885)
 {code}
 

 

The kafka-clients version we used before was {*}2.2.2{*}, and this exception 
never occurred with it.
Comparing the kafka-clients source code of the two versions, I found one 
change that very likely makes this a *regression bug.*
In version 2.2.2, `KafkaProducer#waitOnMetadata` calls 
`Metadata#awaitUpdate`.
Since version {*}2.3.0{*}, `KafkaProducer#waitOnMetadata` calls 
`ProducerMetadata#awaitUpdate` instead.

The crucial point is how the `Object#wait(long timeoutMillis)` method is 
called:
In the `Metadata#awaitUpdate` method, thread safety is ensured by the 
`synchronized` keyword on the method itself;
In the `ProducerMetadata#awaitUpdate` method, in addition to the `synchronized` 
on the method itself, {color:#FF0000}*there is a second 
`synchronized` in the `SystemTime#waitObject` method;*{color}
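
To make the two layers of locking concrete, here is a minimal sketch of the 
pattern described above. It is not the Kafka source: the class 
`NestedMonitorSketch`, its `version` field and the simplified `waitObject` 
helper are hypothetical stand-ins, assumed only for illustration.

```java
import java.util.function.BooleanSupplier;

// Hypothetical, simplified stand-in for the pattern described above; not Kafka's actual code.
class NestedMonitorSketch {
    private int version = 0;

    // Inner layer: acquires the monitor of obj itself before calling obj.wait(...).
    static boolean waitObject(Object obj, BooleanSupplier condition, long deadlineMs)
            throws InterruptedException {
        synchronized (obj) {
            while (!condition.getAsBoolean()) {
                long remainingMs = deadlineMs - System.currentTimeMillis();
                if (remainingMs <= 0) {
                    return false; // deadline reached, condition still false
                }
                obj.wait(remainingMs); // releases the monitor completely while waiting
            }
            return true;
        }
    }

    // Outer layer: a synchronized method, so the monitor of `this` is already
    // held when waitObject re-enters it.
    synchronized boolean awaitUpdate(int lastVersion, long timeoutMs) throws InterruptedException {
        long deadlineMs = System.currentTimeMillis() + timeoutMs;
        return waitObject(this, () -> version > lastVersion, deadlineMs);
    }

    // Bumps the version and wakes up any thread blocked in awaitUpdate.
    synchronized void update() {
        version++;
        notifyAll();
    }

    public static void main(String[] args) throws InterruptedException {
        NestedMonitorSketch metadata = new NestedMonitorSketch();
        Thread updater = new Thread(() -> {
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            metadata.update();
        });
        updater.start();
        boolean updated = metadata.awaitUpdate(0, 2000);
        updater.join();
        System.out.println("updated=" + updated);
    }
}
```

In this sketch both `synchronized` layers lock the same object, so the waiting 
thread owns the monitor when `wait` is called and no exception is expected.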

Although `synchronized` is reentrant, it is not clear whether the nested 
`synchronized` in `ProducerMetadata#awaitUpdate` behaves inconsistently across 
different JDK versions and so causes this exception. What is certain is that 
this exception never occurred with version 2.2.2 of KafkaProducer.
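
For reference, the general rule behind this exception can be shown with a 
small standalone example. `Object#wait` throws `IllegalMonitorStateException` 
when the calling thread does not own the monitor of the object it waits on. 
The class `MonitorOwnershipDemo` and its method names are made up for this 
illustration, and it does not reproduce the producer failure itself.

```java
// Illustrative only: shows when Object.wait throws IllegalMonitorStateException.
class MonitorOwnershipDemo {

    // Calls wait without owning any monitor. Returns true if the exception was thrown.
    static boolean waitWithoutMonitor(Object lock) throws InterruptedException {
        try {
            lock.wait(1);
            return false;
        } catch (IllegalMonitorStateException e) {
            return true;
        }
    }

    // Calls wait inside two nested synchronized blocks on the same object (reentrant).
    static boolean waitWithNestedMonitor(Object lock) throws InterruptedException {
        try {
            synchronized (lock) {
                synchronized (lock) {
                    lock.wait(1);
                }
            }
            return false;
        } catch (IllegalMonitorStateException e) {
            return true;
        }
    }

    // Holds the monitor of one object but waits on a different one.
    static boolean waitHoldingOtherMonitor(Object held, Object target) throws InterruptedException {
        try {
            synchronized (held) {
                target.wait(1);
            }
            return false;
        } catch (IllegalMonitorStateException e) {
            return true;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Object a = new Object();
        Object b = new Object();
        System.out.println("no monitor held:    thrown=" + waitWithoutMonitor(a));         // true
        System.out.println("nested, same lock:  thrown=" + waitWithNestedMonitor(a));      // false
        System.out.println("other monitor held: thrown=" + waitHoldingOtherMonitor(a, b)); // true
    }
}
```

Reentrant locking of the same object is fine here. The exception only appears 
when the object passed to `wait` is not the one whose monitor the thread 
holds.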

 
h2. Suggestion:


We can keep only one `synchronized` by removing the `synchronized` on the 
`ProducerMetadata#awaitUpdate` method.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
