[jira] [Commented] (KAFKA-9963) High CPU during replication of replaced (empty) broker

2020-05-12 Thread Ishwor Gurung (Jira)


[ https://issues.apache.org/jira/browse/KAFKA-9963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17105606#comment-17105606 ]

Ishwor Gurung commented on KAFKA-9963:
--

[~blodsbror] Just out of curiosity, what was the issue?

> High CPU during replication of replaced (empty) broker
> --
>
> Key: KAFKA-9963
> URL: https://issues.apache.org/jira/browse/KAFKA-9963
> Project: Kafka
>  Issue Type: Bug
>  Components: core
> Affects Versions: 2.4.1
> Reporter: Evan Williams
> Priority: Major
>
> When replacing a broker with an empty data dir and the same broker ID, we 
> are seeing very high CPU usage during replication, generally up to 100% for 
> some time, on a 4 vCPU (EC2 R5) host. This is a 6-host cluster with approx 
> 1000 topics and 3000 partitions.
>  
> There is of course traffic being served as well, as the broker catches up and 
> becomes leader of partitions, but due to the high replication CPU usage, 
> clients start to have connection issues. Normal traffic on this host is 
> around 40% CPU once it's completely replicated. I'm not sure if this is 
> 'normal' and we just need to throw more resources at the hosts, or if 
> replication should not really use this much CPU.
> num.replica.fetchers=2 (not high). The Java version is OpenJDK Runtime 
> Environment Corretto-8.252.09.1 (build 1.8.0_252-b09) (AWS's own build).
> CPU profiling (2 minutes) during this 'replace' scenario shows this:
>  
> {code:java}
>   5473000  19.43%  5473  java.util.TreeMap$PrivateEntryIterator.nextEntry
>   4975000  17.66%  4975  scala.collection.convert.Wrappers$JIteratorWrapper.hasNext
>   4417000  15.68%  4417  java.util.TreeMap.successor
>   1773000   6.29%  1773  java.util.TreeMap$ValueIterator.next
>   1700000   6.03%  1700  java.util.TreeMap$PrivateEntryIterator.hasNext
>    601000   2.13%   601  scala.collection.convert.Wrappers$JIteratorWrapper.next
>    516000   1.83%   516  writev
> --- 3885000 ns (13.79%), 3885 samples
>   [ 0] java.util.TreeMap$PrivateEntryIterator.nextEntry
>   [ 1] java.util.TreeMap$ValueIterator.next
>   [ 2] scala.collection.convert.Wrappers$JIteratorWrapper.next
>   [ 3] scala.collection.Iterator.find
>   [ 4] scala.collection.Iterator.find$
>   [ 5] scala.collection.AbstractIterator.find
>   [ 6] scala.collection.IterableLike.find
>   [ 7] scala.collection.IterableLike.find$
>   [ 8] scala.collection.AbstractIterable.find
>   [ 9] kafka.log.ProducerStateManager.lastStableOffset
>   [10] kafka.log.Log.$anonfun$append$12
>   [11] kafka.log.Log.$anonfun$append$2
>   [12] kafka.log.Log.append
>   [13] kafka.log.Log.appendAsFollower
>   [14] kafka.cluster.Partition.$anonfun$doAppendRecordsToFollowerOrFutureReplica$1
>   [15] kafka.cluster.Partition.doAppendRecordsToFollowerOrFutureReplica
>   [16] kafka.cluster.Partition.appendRecordsToFollowerOrFutureReplica
>   [17] kafka.server.ReplicaFetcherThread.processPartitionData
>   [18] kafka.server.AbstractFetcherThread.$anonfun$processFetchRequest$7
>   [19] kafka.server.AbstractFetcherThread.$anonfun$processFetchRequest$6
>   [20] kafka.server.AbstractFetcherThread.$anonfun$processFetchRequest$6$adapted
>   [21] kafka.server.AbstractFetcherThread$$Lambda$552.191789933.apply
>   [22] scala.collection.mutable.ResizableArray.foreach
>   [23] scala.collection.mutable.ResizableArray.foreach$
>   [24] scala.collection.mutable.ArrayBuffer.foreach
>   [25] kafka.server.AbstractFetcherThread.$anonfun$processFetchRequest$5
>   [26] kafka.server.AbstractFetcherThread.processFetchRequest
>   [27] kafka.server.AbstractFetcherThread.$anonfun$maybeFetch$3
>   [28] kafka.server.AbstractFetcherThread.maybeFetch
>   [29] kafka.server.AbstractFetcherThread.doWork
>   [30] kafka.utils.ShutdownableThread.run
>
> {code}
> {code:java}
>   --- 3632000 ns (12.89%), 3632 samples
>   [ 0] scala.collection.convert.Wrappers$JIteratorWrapper.hasNext
>   [ 1] scala.collection.Iterator.find
>   [ 2] scala.collection.Iterator.find$
>   [ 3] scala.collection.AbstractIterator.find
>   [ 4] scala.collection.IterableLike.find
>   [ 5] scala.collection.IterableLike.find$
>   [ 6] scala.collection.AbstractIterable.find
>   [ 7] kafka.log.ProducerStateManager.lastStableOffset
>   [ 8] kafka.log.Log.$anonfun$append$12
>   [ 9] kafka.log.Log.$anonfun$append$2
>   [10] kafka.log.Log.append
>   [11] kafka.log.Log.appendAsFollower
>   [12] kafka.cluster.Partition.$anonfun$doAppendRecordsToFollowerOrFutureReplica$1
>   [13] kafka.cluster.Partition.doAppendRecordsToFollowerOrFutureReplica
>   [14] kafka.cluster.Partition.appendRecordsToFollowerOrFutureReplica
>   [15] kafka.server.ReplicaFetcherThread.processPartitionData
>   [16] kafka.server.AbstractFetcherThread.$anonfun$processFetchReque
> {code}
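
For context on the hot frames above: the profile bottoms out in 
kafka.log.ProducerStateManager.lastStableOffset iterating a java.util.TreeMap 
through a Scala collection wrapper. Below is a minimal, self-contained Scala 
sketch of that access pattern, not the actual Kafka source; the class and 
method names are illustrative stand-ins. It shows why a linear find over the 
sorted entries costs O(n) on every follower append, which adds up while a 
freshly wiped replica replays every partition.

{code:scala}
import java.util.TreeMap

import scala.collection.JavaConverters._

// Illustrative stand-in for the producer-state bookkeeping seen in the profile;
// not the real Kafka classes.
final case class TxnEntry(producerId: Long, firstOffset: Long)

final class OngoingTxnIndex {
  // Entries keyed by first offset, kept sorted -- hence the TreeMap iterator
  // frames (PrivateEntryIterator.nextEntry / TreeMap.successor) in the traces.
  private val ongoing = new TreeMap[Long, TxnEntry]()

  def put(entry: TxnEntry): Unit = ongoing.put(entry.firstOffset, entry)

  // A linear `find` over the wrapped Java iterator (the JIteratorWrapper
  // hasNext/next frames). Invoked on the append path, this walks the tree
  // until the predicate matches -- O(entries) per appended batch.
  def firstUndecidedOffset(excludeProducerId: Long): Option[Long] =
    ongoing.values.asScala
      .find(_.producerId != excludeProducerId)
      .map(_.firstOffset)
}

object OngoingTxnIndexDemo {
  def main(args: Array[String]): Unit = {
    val index = new OngoingTxnIndex
    (1L to 10000L).foreach(i => index.put(TxnEntry(producerId = i, firstOffset = i * 10)))
    // Each append-time lookup re-scans from the smallest tracked offset.
    println(index.firstUndecidedOffset(excludeProducerId = 1L))
  }
}
{code}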

[jira] [Commented] (KAFKA-9963) High CPU during replication of replaced (empty) broker

2020-05-12 Thread Evan Williams (Jira)


[ https://issues.apache.org/jira/browse/KAFKA-9963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17105612#comment-17105612 ]

Evan Williams commented on KAFKA-9963:
--

[~ishworgurung] Essentially, on a 4 vCPU (EC2 R5) instance we had always left 
num.network.threads and num.io.threads at their defaults (3 and 8). However, I 
just changed them both to 3 (i.e. one less than the number of cores), and we 
managed to never go past 80% CPU. The average idle network/IO threads weren't 
depleted either. So it's perhaps a fairly obvious solution (or workaround). 
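
A minimal server.properties sketch of the change described above; the values 
are simply what worked on this 4 vCPU host and should be treated as assumptions 
to tune per instance size, not a general recommendation:

{code}
# Defaults are num.network.threads=3 and num.io.threads=8; capping both at
# one less than the core count left CPU headroom for the replica fetchers.
num.network.threads=3
num.io.threads=3

# Unchanged in this scenario (quoted from the report above).
num.replica.fetchers=2
{code}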

> High CPU during replication of replaced (empty) broker
> --
>
> Key: KAFKA-9963
> URL: https://issues.apache.org/jira/browse/KAFKA-9963
> Project: Kafka
>  Issue Type: Bug
>  Components: core
> Affects Versions: 2.4.1
> Reporter: Evan Williams
> Priority: Major
>