[ 
https://issues.apache.org/jira/browse/KAFKA-493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13973313#comment-13973313
 ] 

Andrew Olson commented on KAFKA-493:
------------------------------------

I built and deployed the latest trunk code to test the benefit of KAFKA-1356. 
CPU usage was reduced some, but unfortunately still relatively high, around 
35-40% of a core. The sun.nio.ch.SocketChannelImpl.write(ByteBuffer) and 
sun.nio.ch.SocketChannelImpl.read(ByteBuffer) methods were the top two hot 
spots, and profiling didn't reveal any obvious opportunities for optimization. 
Looking at the ReplicaFetcherThread metrics, the fetch response message size is 
20-25kb.

Changing replica.fetch.wait.max.ms from the default value of 500 to 10000 
(along with setting replica.lag.time.max.ms=30000) cut CPU to about 5-10% of a 
core, which seems reasonable. Predictably, there were clear spikes on the CPU 
graph up to 30-50% every 10 seconds that appear to account for the vast 
majority of overall CPU use. What are the disadvantages of significantly 
decreasing the replica fetch frequency like this? Would it affect synchronous 
producer performance? With a large number of partitions being replicated, it 
seems like a configuration change that might be beneficial.

> High CPU usage on inactive server
> ---------------------------------
>
>                 Key: KAFKA-493
>                 URL: https://issues.apache.org/jira/browse/KAFKA-493
>             Project: Kafka
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 0.8.0
>            Reporter: Jay Kreps
>             Fix For: 0.8.2
>
>         Attachments: Kafka-sampling1.zip, Kafka-sampling2.zip, 
> Kafka-sampling3.zip, Kafka-trace1.zip, Kafka-trace2.zip, Kafka-trace3.zip, 
> backtraces.txt, stacktrace.txt
>
>
> > I've been playing with the 0.8 branch of Kafka and noticed that idle CPU 
> > usage is fairly high (13% of a 
> > core). Is that to be expected? I did look at the stack, but didn't see 
> > anything obvious. A background 
> > task?
> > I wanted to mention how I am getting into this state. I've set up two 
> > machines with the latest 0.8 
> > code base and am using a replication factor of 2. On starting the brokers 
> > there is no idle CPU activity. 
> > Then I run a test that essential does 10k publish operations followed by 
> > immediate consume operations 
> > (I was measuring latency). Once this has run the kafka nodes seem to 
> > consistently be consuming CPU 
> > essentially forever.
> hprof results:
> THREAD START (obj=500003ae, id = 200004, name="RMI TCP Accept-0", 
> group="system")
> THREAD START (obj=500003ae, id = 200005, name="RMI TCP Accept-9999", 
> group="system")
> THREAD START (obj=500003ae, id = 200006, name="RMI TCP Accept-0", 
> group="system")
> THREAD START (obj=500003ae, id = 200001, name="main", group="main")
> THREAD START (obj=500003ae, id = 200007, name="Thread-2", group="main")
> THREAD START (obj=500003ae, id = 200008, name="Thread-3", group="main")
> THREAD START (obj=500003ae, id = 200009, name="kafka-processor-9092-0", 
> group="main")
> THREAD START (obj=500003ae, id = 200010, name="kafka-processor-9092-1", 
> group="main")
> THREAD START (obj=500003ae, id = 200011, name="kafka-acceptor", group="main")
> THREAD START (obj=5000074b, id = 200012, 
> name="ZkClient-EventThread-20-localhost:2181", group="main")
> THREAD START (obj=5000076e, id = 200014, name="main-SendThread()", 
> group="main")
> THREAD START (obj=5000076d, id = 200013, name="main-EventThread", 
> group="main")
> THREAD START (obj=500003ae, id = 200015, name="metrics-meter-tick-thread-1", 
> group="main")
> THREAD START (obj=500003ae, id = 200016, name="metrics-meter-tick-thread-2", 
> group="main")
> THREAD START (obj=500003ae, id = 200017, name="request-expiration-task", 
> group="main")
> THREAD START (obj=500003ae, id = 200018, name="request-expiration-task", 
> group="main")
> THREAD START (obj=500003ae, id = 200019, name="kafka-request-handler-0", 
> group="main")
> THREAD START (obj=500003ae, id = 200020, name="kafka-request-handler-1", 
> group="main")
> THREAD START (obj=500003ae, id = 200021, name="Thread-6", group="main")
> THREAD START (obj=500003ae, id = 200022, name="Thread-7", group="main")
> THREAD START (obj=50000899, id = 200023, name="ReplicaFetcherThread-0-2 on 
> broker 1, ", group="main")
> THREAD START (obj=50000899, id = 200024, name="ReplicaFetcherThread-0-3 on 
> broker 1, ", group="main")
> THREAD START (obj=50000899, id = 200025, name="ReplicaFetcherThread-0-0 on 
> broker 1, ", group="main")
> THREAD START (obj=50000899, id = 200026, name="ReplicaFetcherThread-0-1 on 
> broker 1, ", group="main")
> THREAD START (obj=500003ae, id = 200028, name="SIGINT handler", 
> group="system")
> THREAD START (obj=500003ae, id = 200029, name="Thread-5", group="main")
> THREAD START (obj=5000074b, id = 200030, name="Thread-1", group="main")
> THREAD START (obj=5000074b, id = 200031, name="Thread-0", group="main")
> THREAD END (id = 200031)
> THREAD END (id = 200029)
> THREAD END (id = 200020)
> THREAD END (id = 200019)
> THREAD END (id = 200008)
> THREAD END (id = 200021)
> THREAD END (id = 200007)
> THREAD END (id = 200022)
> THREAD END (id = 200018)
> THREAD END (id = 200017)
> THREAD END (id = 200012)
> THREAD END (id = 200013)
> THREAD END (id = 200014)
> THREAD END (id = 200025)
> THREAD END (id = 200023)
> THREAD END (id = 200026)
> THREAD END (id = 200024)
> THREAD END (id = 200011)
> THREAD END (id = 200009)
> THREAD END (id = 200010)
> THREAD END (id = 200030)
> THREAD END (id = 200028)
> TRACE 301281:
>         sun.nio.ch.EPollArrayWrapper.epollWait(EPollArrayWrapper.java:Unknown 
> line)
>         sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:228)
>         sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:81)
>         sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:87)
>         sun.nio.ch.SelectorImpl.select(SelectorImpl.java:98)
>         
> sun.nio.ch.SocketAdaptor$SocketInputStream.read(SocketAdaptor.java:218)
>         sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:103)
>         
> java.nio.channels.Channels$ReadableByteChannelImpl.read(Channels.java:385)
>         kafka.utils.Utils$.read(Utils.scala:453)
>         
> kafka.network.BoundedByteBufferReceive.readFrom(BoundedByteBufferReceive.scala:54)
>         kafka.network.Receive$class.readCompletely(Transmission.scala:56)
>         
> kafka.network.BoundedByteBufferReceive.readCompletely(BoundedByteBufferReceive.scala:29)
>         kafka.network.BlockingChannel.receive(BlockingChannel.scala:92)
>         kafka.consumer.SimpleConsumer.liftedTree1$1(SimpleConsumer.scala:67)
>         kafka.consumer.SimpleConsumer.sendRequest(SimpleConsumer.scala:65)
>         kafka.consumer.SimpleConsumer.fetch(SimpleConsumer.scala:95)
>         
> kafka.server.AbstractFetcherThread.doWork(AbstractFetcherThread.scala:71)
>         kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:50)
> TRACE 300427:
>         java.net.PlainSocketImpl.socketAccept(PlainSocketImpl.java:Unknown 
> line)
>         
> java.net.AbstractPlainSocketImpl.accept(AbstractPlainSocketImpl.java:398)
>         java.net.ServerSocket.implAccept(ServerSocket.java:522)
>         java.net.ServerSocket.accept(ServerSocket.java:490)
>         
> sun.rmi.transport.tcp.TCPTransport$AcceptLoop.executeAcceptLoop(TCPTransport.java:387)
>         
> sun.rmi.transport.tcp.TCPTransport$AcceptLoop.run(TCPTransport.java:359)
>         java.lang.Thread.run(Thread.java:722)
> TRACE 300963:
>         sun.nio.ch.EPollArrayWrapper.epollWait(EPollArrayWrapper.java:Unknown 
> line)
>         sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:228)
>         sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:81)
>         sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:87)
>         sun.nio.ch.SelectorImpl.select(SelectorImpl.java:98)
>         kafka.network.Processor.run(SocketServer.scala:202)
>         java.lang.Thread.run(Thread.java:722)
> TRACE 300558:
>         java.net.PlainSocketImpl.socketAccept(PlainSocketImpl.java:Unknown 
> line)
>         
> java.net.AbstractPlainSocketImpl.accept(AbstractPlainSocketImpl.java:398)
>         java.net.ServerSocket.implAccept(ServerSocket.java:522)
>         java.net.ServerSocket.accept(ServerSocket.java:490)
>         
> sun.management.jmxremote.LocalRMIServerSocketFactory$1.accept(LocalRMIServerSocketFactory.java:52)
>         
> sun.rmi.transport.tcp.TCPTransport$AcceptLoop.executeAcceptLoop(TCPTransport.java:387)
>         
> sun.rmi.transport.tcp.TCPTransport$AcceptLoop.run(TCPTransport.java:359)
>         java.lang.Thread.run(Thread.java:722)
> TRACE 300978:
>         sun.nio.ch.EPollArrayWrapper.epollWait(EPollArrayWrapper.java:Unknown 
> line)
>         sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:228)
>         sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:81)
>         sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:87)
>         sun.nio.ch.SelectorImpl.select(SelectorImpl.java:98)
>         kafka.network.Acceptor.run(SocketServer.scala:142)
>         java.lang.Thread.run(Thread.java:722)
> TRACE 301024:
>         sun.nio.ch.EPollArrayWrapper.epollWait(EPollArrayWrapper.java:Unknown 
> line)
>         sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:228)
>         sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:81)
>         sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:87)
>         sun.nio.ch.SelectorImpl.select(SelectorImpl.java:98)
>         org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1134)
> TRACE 301830:
>         sun.misc.Unsafe.park(Unsafe.java:Unknown line)
>         java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
>         
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2082)
>         java.util.concurrent.DelayQueue.take(DelayQueue.java:220)
>         
> kafka.server.RequestPurgatory$ExpiredRequestReaper.pollExpired(RequestPurgatory.scala:282)
>         
> kafka.server.RequestPurgatory$ExpiredRequestReaper.run(RequestPurgatory.scala:235)
>         java.lang.Thread.run(Thread.java:722)
> CPU SAMPLES BEGIN (total = 1878003) Thu Aug 30 22:26:05 2012
> rank   self  accum   count trace method
>    1 36.22% 36.22%  680139 301281 sun.nio.ch.EPollArrayWrapper.epollWait
>    2 18.15% 54.36%  340770 300427 java.net.PlainSocketImpl.socketAccept
>    3 18.13% 72.49%  340458 300963 sun.nio.ch.EPollArrayWrapper.epollWait
>    4  9.07% 81.56%  170382 300558 java.net.PlainSocketImpl.socketAccept
>    5  9.07% 90.63%  170340 300978 sun.nio.ch.EPollArrayWrapper.epollWait
>    6  9.06% 99.69%  170139 301024 sun.nio.ch.EPollArrayWrapper.epollWait
>    7  0.18% 99.88%    3429 301830 sun.misc.Unsafe.park
> CPU SAMPLES END



--
This message was sent by Atlassian JIRA
(v6.2#6252)

Reply via email to