[ 
https://issues.apache.org/jira/browse/KAFKA-493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13973365#comment-13973365
 ] 

Jay Kreps commented on KAFKA-493:
---------------------------------

Ideally there should be no downside of longer expiration times. However there 
are two bugs that lead to us setting that lower timeout:
1. I think there is a race condition that effects messages with acks=-1 (full 
commit) which can lead to waiting until the full fetch timeout to acknowledge 
the producer in rare cases. Not sure if this issue still exists.
2. The second issue was that when the fetcher disconnects we don't proactively 
purge any asynchronous requests it has. So very long timeouts combined with a 
consumer with high error rate can accumulate memory.

> High CPU usage on inactive server
> ---------------------------------
>
>                 Key: KAFKA-493
>                 URL: https://issues.apache.org/jira/browse/KAFKA-493
>             Project: Kafka
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 0.8.0
>            Reporter: Jay Kreps
>             Fix For: 0.8.2
>
>         Attachments: Kafka-sampling1.zip, Kafka-sampling2.zip, 
> Kafka-sampling3.zip, Kafka-trace1.zip, Kafka-trace2.zip, Kafka-trace3.zip, 
> backtraces.txt, stacktrace.txt
>
>
> > I've been playing with the 0.8 branch of Kafka and noticed that idle CPU 
> > usage is fairly high (13% of a 
> > core). Is that to be expected? I did look at the stack, but didn't see 
> > anything obvious. A background 
> > task?
> > I wanted to mention how I am getting into this state. I've set up two 
> > machines with the latest 0.8 
> > code base and am using a replication factor of 2. On starting the brokers 
> > there is no idle CPU activity. 
> > Then I run a test that essential does 10k publish operations followed by 
> > immediate consume operations 
> > (I was measuring latency). Once this has run the kafka nodes seem to 
> > consistently be consuming CPU 
> > essentially forever.
> hprof results:
> THREAD START (obj=500003ae, id = 200004, name="RMI TCP Accept-0", 
> group="system")
> THREAD START (obj=500003ae, id = 200005, name="RMI TCP Accept-9999", 
> group="system")
> THREAD START (obj=500003ae, id = 200006, name="RMI TCP Accept-0", 
> group="system")
> THREAD START (obj=500003ae, id = 200001, name="main", group="main")
> THREAD START (obj=500003ae, id = 200007, name="Thread-2", group="main")
> THREAD START (obj=500003ae, id = 200008, name="Thread-3", group="main")
> THREAD START (obj=500003ae, id = 200009, name="kafka-processor-9092-0", 
> group="main")
> THREAD START (obj=500003ae, id = 200010, name="kafka-processor-9092-1", 
> group="main")
> THREAD START (obj=500003ae, id = 200011, name="kafka-acceptor", group="main")
> THREAD START (obj=5000074b, id = 200012, 
> name="ZkClient-EventThread-20-localhost:2181", group="main")
> THREAD START (obj=5000076e, id = 200014, name="main-SendThread()", 
> group="main")
> THREAD START (obj=5000076d, id = 200013, name="main-EventThread", 
> group="main")
> THREAD START (obj=500003ae, id = 200015, name="metrics-meter-tick-thread-1", 
> group="main")
> THREAD START (obj=500003ae, id = 200016, name="metrics-meter-tick-thread-2", 
> group="main")
> THREAD START (obj=500003ae, id = 200017, name="request-expiration-task", 
> group="main")
> THREAD START (obj=500003ae, id = 200018, name="request-expiration-task", 
> group="main")
> THREAD START (obj=500003ae, id = 200019, name="kafka-request-handler-0", 
> group="main")
> THREAD START (obj=500003ae, id = 200020, name="kafka-request-handler-1", 
> group="main")
> THREAD START (obj=500003ae, id = 200021, name="Thread-6", group="main")
> THREAD START (obj=500003ae, id = 200022, name="Thread-7", group="main")
> THREAD START (obj=50000899, id = 200023, name="ReplicaFetcherThread-0-2 on 
> broker 1, ", group="main")
> THREAD START (obj=50000899, id = 200024, name="ReplicaFetcherThread-0-3 on 
> broker 1, ", group="main")
> THREAD START (obj=50000899, id = 200025, name="ReplicaFetcherThread-0-0 on 
> broker 1, ", group="main")
> THREAD START (obj=50000899, id = 200026, name="ReplicaFetcherThread-0-1 on 
> broker 1, ", group="main")
> THREAD START (obj=500003ae, id = 200028, name="SIGINT handler", 
> group="system")
> THREAD START (obj=500003ae, id = 200029, name="Thread-5", group="main")
> THREAD START (obj=5000074b, id = 200030, name="Thread-1", group="main")
> THREAD START (obj=5000074b, id = 200031, name="Thread-0", group="main")
> THREAD END (id = 200031)
> THREAD END (id = 200029)
> THREAD END (id = 200020)
> THREAD END (id = 200019)
> THREAD END (id = 200008)
> THREAD END (id = 200021)
> THREAD END (id = 200007)
> THREAD END (id = 200022)
> THREAD END (id = 200018)
> THREAD END (id = 200017)
> THREAD END (id = 200012)
> THREAD END (id = 200013)
> THREAD END (id = 200014)
> THREAD END (id = 200025)
> THREAD END (id = 200023)
> THREAD END (id = 200026)
> THREAD END (id = 200024)
> THREAD END (id = 200011)
> THREAD END (id = 200009)
> THREAD END (id = 200010)
> THREAD END (id = 200030)
> THREAD END (id = 200028)
> TRACE 301281:
>         sun.nio.ch.EPollArrayWrapper.epollWait(EPollArrayWrapper.java:Unknown 
> line)
>         sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:228)
>         sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:81)
>         sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:87)
>         sun.nio.ch.SelectorImpl.select(SelectorImpl.java:98)
>         
> sun.nio.ch.SocketAdaptor$SocketInputStream.read(SocketAdaptor.java:218)
>         sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:103)
>         
> java.nio.channels.Channels$ReadableByteChannelImpl.read(Channels.java:385)
>         kafka.utils.Utils$.read(Utils.scala:453)
>         
> kafka.network.BoundedByteBufferReceive.readFrom(BoundedByteBufferReceive.scala:54)
>         kafka.network.Receive$class.readCompletely(Transmission.scala:56)
>         
> kafka.network.BoundedByteBufferReceive.readCompletely(BoundedByteBufferReceive.scala:29)
>         kafka.network.BlockingChannel.receive(BlockingChannel.scala:92)
>         kafka.consumer.SimpleConsumer.liftedTree1$1(SimpleConsumer.scala:67)
>         kafka.consumer.SimpleConsumer.sendRequest(SimpleConsumer.scala:65)
>         kafka.consumer.SimpleConsumer.fetch(SimpleConsumer.scala:95)
>         
> kafka.server.AbstractFetcherThread.doWork(AbstractFetcherThread.scala:71)
>         kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:50)
> TRACE 300427:
>         java.net.PlainSocketImpl.socketAccept(PlainSocketImpl.java:Unknown 
> line)
>         
> java.net.AbstractPlainSocketImpl.accept(AbstractPlainSocketImpl.java:398)
>         java.net.ServerSocket.implAccept(ServerSocket.java:522)
>         java.net.ServerSocket.accept(ServerSocket.java:490)
>         
> sun.rmi.transport.tcp.TCPTransport$AcceptLoop.executeAcceptLoop(TCPTransport.java:387)
>         
> sun.rmi.transport.tcp.TCPTransport$AcceptLoop.run(TCPTransport.java:359)
>         java.lang.Thread.run(Thread.java:722)
> TRACE 300963:
>         sun.nio.ch.EPollArrayWrapper.epollWait(EPollArrayWrapper.java:Unknown 
> line)
>         sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:228)
>         sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:81)
>         sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:87)
>         sun.nio.ch.SelectorImpl.select(SelectorImpl.java:98)
>         kafka.network.Processor.run(SocketServer.scala:202)
>         java.lang.Thread.run(Thread.java:722)
> TRACE 300558:
>         java.net.PlainSocketImpl.socketAccept(PlainSocketImpl.java:Unknown 
> line)
>         
> java.net.AbstractPlainSocketImpl.accept(AbstractPlainSocketImpl.java:398)
>         java.net.ServerSocket.implAccept(ServerSocket.java:522)
>         java.net.ServerSocket.accept(ServerSocket.java:490)
>         
> sun.management.jmxremote.LocalRMIServerSocketFactory$1.accept(LocalRMIServerSocketFactory.java:52)
>         
> sun.rmi.transport.tcp.TCPTransport$AcceptLoop.executeAcceptLoop(TCPTransport.java:387)
>         
> sun.rmi.transport.tcp.TCPTransport$AcceptLoop.run(TCPTransport.java:359)
>         java.lang.Thread.run(Thread.java:722)
> TRACE 300978:
>         sun.nio.ch.EPollArrayWrapper.epollWait(EPollArrayWrapper.java:Unknown 
> line)
>         sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:228)
>         sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:81)
>         sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:87)
>         sun.nio.ch.SelectorImpl.select(SelectorImpl.java:98)
>         kafka.network.Acceptor.run(SocketServer.scala:142)
>         java.lang.Thread.run(Thread.java:722)
> TRACE 301024:
>         sun.nio.ch.EPollArrayWrapper.epollWait(EPollArrayWrapper.java:Unknown 
> line)
>         sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:228)
>         sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:81)
>         sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:87)
>         sun.nio.ch.SelectorImpl.select(SelectorImpl.java:98)
>         org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1134)
> TRACE 301830:
>         sun.misc.Unsafe.park(Unsafe.java:Unknown line)
>         java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
>         
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2082)
>         java.util.concurrent.DelayQueue.take(DelayQueue.java:220)
>         
> kafka.server.RequestPurgatory$ExpiredRequestReaper.pollExpired(RequestPurgatory.scala:282)
>         
> kafka.server.RequestPurgatory$ExpiredRequestReaper.run(RequestPurgatory.scala:235)
>         java.lang.Thread.run(Thread.java:722)
> CPU SAMPLES BEGIN (total = 1878003) Thu Aug 30 22:26:05 2012
> rank   self  accum   count trace method
>    1 36.22% 36.22%  680139 301281 sun.nio.ch.EPollArrayWrapper.epollWait
>    2 18.15% 54.36%  340770 300427 java.net.PlainSocketImpl.socketAccept
>    3 18.13% 72.49%  340458 300963 sun.nio.ch.EPollArrayWrapper.epollWait
>    4  9.07% 81.56%  170382 300558 java.net.PlainSocketImpl.socketAccept
>    5  9.07% 90.63%  170340 300978 sun.nio.ch.EPollArrayWrapper.epollWait
>    6  9.06% 99.69%  170139 301024 sun.nio.ch.EPollArrayWrapper.epollWait
>    7  0.18% 99.88%    3429 301830 sun.misc.Unsafe.park
> CPU SAMPLES END



--
This message was sent by Atlassian JIRA
(v6.2#6252)

Reply via email to