[ 
https://issues.apache.org/jira/browse/KAFKA-7459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16632886#comment-16632886
 ] 

ASF GitHub Bot commented on KAFKA-7459:
---------------------------------------

hzxa21 opened a new pull request #5717: KAFKA-7459: Use thread-safe Pool 
instead of non-thread-safe mutable.HashMap for requestRateInternal
URL: https://github.com/apache/kafka/pull/5717
 
 
   After KAFKA-6514, we add API version as a tag for the RequestsPerSec metric 
but in the implementation, we use the non-threadsafe mutable.HashMap to store 
the version -> metric mapping without any protection 
(https://github.com/apache/kafka/pull/4506/files#diff-d0332a0ff31df50afce3809d90505b25R357
 ). This can mess up the data structure and cause unexpected behavior 
(https://github.com/scala/bug/issues/10436 ). This PR changes 
requestRateInternal to use the thread-safe Pool instead.
   
   *More detailed description of your change,
   if necessary. The PR title and PR message become
   the squashed commit message, so use a separate
   comment to ping reviewers.*
   
   *Summary of testing strategy (including rationale)
   for the feature or bug fix. Unit and/or integration
   tests are expected for any behaviour change and
   system tests should be considered for larger changes.*
   
   ### Committer Checklist (excluded from commit message)
   - [ ] Verify design and implementation 
   - [ ] Verify test coverage and CI build status
   - [ ] Verify documentation (including upgrade notes)
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Concurrency bug in updating RequestsPerSec metric 
> --------------------------------------------------
>
>                 Key: KAFKA-7459
>                 URL: https://issues.apache.org/jira/browse/KAFKA-7459
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 2.0.0, 2.0.1
>            Reporter: Zhanxiang (Patrick) Huang
>            Assignee: Zhanxiang (Patrick) Huang
>            Priority: Critical
>
> After KAFKA-6514, we add API version as a tag for the RequestsPerSec metric 
> but in the implementation, we use the non-threadsafe mutable.HashMap to store 
> the version -> metric mapping without any protection 
> ([https://github.com/apache/kafka/pull/4506/files#diff-d0332a0ff31df50afce3809d90505b25R357|https://github.com/apache/kafka/pull/4506/files#diff-d0332a0ff31df50afce3809d90505b25R357).]
>  ). This can mess up the data structure and cause unexpected behavior 
> ([https://github.com/scala/bug/issues/10436|https://github.com/scala/bug/issues/10436).]
>  ). We should use ConcurrentHashMap instead.
>  
> In our case, clean shutdown a 2.0 broker takes forever because of this 
> concurrency bug leading to an infinite loop in HapMap resize.
>  Thread-1 is doing the clean shutdown but stuck on waiting for one of the 
> network thread to shutdown: 
> {noformat}
> "Thread-1" #25 prio=5 os_prio=0 tid=0x00007f05c4010000 nid=0x79f4 waiting on 
> condition [0x00007f0597cfb000]
>    java.lang.Thread.State: WAITING (parking)
>         at sun.misc.Unsafe.park(Native Method)
>         - parking to wait for  <0x00000001ffad1500> (a 
> java.util.concurrent.CountDownLatch$Sync)
>         at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>         at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
>         at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
>         at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
>         at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:231)
>         at kafka.network.AbstractServerThread.shutdown(SocketServer.scala:282)
>         at kafka.network.Processor.shutdown(SocketServer.scala:873)
>         at 
> kafka.network.Acceptor$$anonfun$shutdown$3.apply(SocketServer.scala:368)
>         at 
> kafka.network.Acceptor$$anonfun$shutdown$3.apply(SocketServer.scala:368)
>         at 
> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>         at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
>         at kafka.network.Acceptor.shutdown(SocketServer.scala:368)
>         - locked <0x00000001fdcf1000> (a kafka.network.Acceptor)
>         at 
> kafka.network.SocketServer$$anonfun$stopProcessingRequests$2.apply(SocketServer.scala:178)
>         at 
> kafka.network.SocketServer$$anonfun$stopProcessingRequests$2.apply(SocketServer.scala:178)
>         at scala.collection.Iterator$class.foreach(Iterator.scala:891)
>         at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
>         at 
> scala.collection.MapLike$DefaultValuesIterable.foreach(MapLike.scala:206)
>         at 
> kafka.network.SocketServer.stopProcessingRequests(SocketServer.scala:178)
>         - locked <0x00000001fba0e610> (a kafka.network.SocketServer)
>         at 
> kafka.server.KafkaServer$$anonfun$shutdown$3.apply$mcV$sp(KafkaServer.scala:595)
>         at kafka.utils.CoreUtils$.swallow(CoreUtils.scala:86)
>         at kafka.server.KafkaServer.shutdown(KafkaServer.scala:595){noformat}
> The network thread is always in HashTable.resize and never finishes 
> updateRequestMetrics:
> {noformat}
> "kafka-network-thread-13673-ListenerName(SSL)-SSL-2" #201 prio=5 os_prio=0 
> tid=0x00007f441dae4000 nid=0x4cdc runnable [0x00007f2404189000]
>    java.lang.Thread.State: RUNNABLE
>         at 
> scala.collection.mutable.HashTable$class.resize(HashTable.scala:268)
>         at 
> scala.collection.mutable.HashTable$class.scala$collection$mutable$HashTable$$addEntry0(HashTable.scala:157)
>         at 
> scala.collection.mutable.HashTable$class.addEntry(HashTable.scala:148)
>         at scala.collection.mutable.HashMap.addEntry(HashMap.scala:40)
>         at scala.collection.mutable.HashMap.addEntry(HashMap.scala:93)
>         at scala.collection.mutable.HashMap.getOrElseUpdate(HashMap.scala:79)
>         at kafka.network.RequestMetrics.requestRate(RequestChannel.scala:438)
>         at 
> kafka.network.RequestChannel$Request$$anonfun$updateRequestMetrics$1.apply(RequestChannel.scala:161)
>         at 
> kafka.network.RequestChannel$Request$$anonfun$updateRequestMetrics$1.apply(RequestChannel.scala:159)
>         at scala.collection.immutable.List.foreach(List.scala:392)
>         at 
> kafka.network.RequestChannel$Request.updateRequestMetrics(RequestChannel.scala:159)
>         at 
> kafka.network.Processor.kafka$network$Processor$$updateRequestMetrics(SocketServer.scala:740)
>         at 
> kafka.network.Processor$$anonfun$processCompletedSends$1.apply(SocketServer.scala:720)
>         at 
> kafka.network.Processor$$anonfun$processCompletedSends$1.apply(SocketServer.scala:715)
>         at scala.collection.Iterator$class.foreach(Iterator.scala:891)
>         at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
>         at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
>         at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
>         at 
> kafka.network.Processor.processCompletedSends(SocketServer.scala:715)
>         at kafka.network.Processor.run(SocketServer.scala:585)
>         at java.lang.Thread.run(Thread.java:745)
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to