[ https://issues.apache.org/jira/browse/KAFKA-1642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14234297#comment-14234297 ]
Bhavesh Mistry commented on KAFKA-1642:
---------------------------------------
[~ewencp],
1) I will post this to KAFKA-1788 and perhaps link the issues.
2) True, some sort of measure of execution time would be great, for example
the 5th, 10th, 25th, 50th, 95th and 99th percentiles. The point is just to
measure the duration and report the rate of execution.
3) I agree with what you are saying, and I have observed the same behavior. My
only recommendation is to add some intelligence around *timeouts*: if the
timeout is zero on consecutive iterations for a long period, then there is a
problem. (A little more defensive.)
4) Again, I agree with your point, but in your previous comments you mentioned
that you might consider having back-off logic further up the chain. So I was
just checking whether run() is the best place to do that check. Again, maybe
add some intelligence here: if you get consecutive "Exception"s, then the
likelihood of high CPU usage is high.
5) OK. I understand you to be saying that data needs to be dequeued so that
more data can be enqueued, even in the event of network loss. Is my
understanding correct?
6) All I am saying is that a network firewall rule (such as only 2 TCP
connections per source host), or brokers running out of file descriptors, can
prevent a new connection to a broker from being established while the client
still has a live and active TCP connection to the same broker. But based on
what I see in the method *initiateConnect*, it will mark the entire broker or
node status as disconnected. Is this expected behavior? So the question is:
will the client continue to send data?
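
A rough sketch of the defensive check suggested in points 3 and 4 might look
like the following. SpinGuard, its constructor parameters, and the thresholds
are hypothetical names and values chosen for illustration; none of this is
taken from the Kafka code base or from the attached patches. The idea is only
to count consecutive "suspicious" loop iterations (a zero poll timeout, or a
caught exception) and suggest a capped, doubling back-off once a threshold is
reached.

```java
/**
 * Hypothetical helper, NOT part of Kafka: counts consecutive suspicious
 * iterations of an I/O loop and suggests a back-off delay once there
 * have been too many in a row.
 */
final class SpinGuard {
    private final int threshold;      // consecutive events needed before backing off
    private final long baseBackoffMs; // delay suggested when the threshold is first hit
    private final long maxBackoffMs;  // upper bound on the suggested delay
    private int consecutive = 0;      // current run of suspicious iterations

    SpinGuard(int threshold, long baseBackoffMs, long maxBackoffMs) {
        if (threshold < 1 || baseBackoffMs < 1 || maxBackoffMs < 1) {
            throw new IllegalArgumentException("all arguments must be >= 1");
        }
        this.threshold = threshold;
        this.baseBackoffMs = baseBackoffMs;
        this.maxBackoffMs = maxBackoffMs;
    }

    /**
     * Records one loop iteration. Pass true if the iteration looked
     * suspicious (for example the poll timeout was zero, or an exception
     * was caught). Returns the suggested sleep in milliseconds, 0 for none.
     */
    long record(boolean suspicious) {
        if (!suspicious) {
            consecutive = 0; // a healthy iteration ends the run
            return 0L;
        }
        if (consecutive < Integer.MAX_VALUE) {
            consecutive++;
        }
        if (consecutive < threshold) {
            return 0L; // still within tolerance
        }
        // Double the delay once per iteration beyond the threshold, capped.
        long delay = Math.min(baseBackoffMs, maxBackoffMs);
        int doublings = consecutive - threshold;
        for (int i = 0; i < doublings; i++) {
            if (delay > maxBackoffMs / 2) {
                return maxBackoffMs; // doubling again would exceed the cap
            }
            delay *= 2;
        }
        return delay;
    }

    /** Length of the current run of suspicious iterations. */
    int consecutive() {
        return consecutive;
    }
}
```

In a loop such as run(), the caller would pass true whenever the computed
timeout was zero or an exception was caught, and sleep for the returned number
of milliseconds before the next iteration. Whether run() is the right place
for such a check is exactly the open question in point 4.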
Thank you very much for entertaining my questions so far and I will test out
the patch next week.
Thanks,
Bhavesh
> [Java New Producer Kafka Trunk] CPU Usage Spike to 100% when network
> connection is lost
> ---------------------------------------------------------------------------------------
>
> Key: KAFKA-1642
> URL: https://issues.apache.org/jira/browse/KAFKA-1642
> Project: Kafka
> Issue Type: Bug
> Components: producer
> Affects Versions: 0.8.2
> Reporter: Bhavesh Mistry
> Assignee: Ewen Cheslack-Postava
> Priority: Blocker
> Fix For: 0.8.2
>
> Attachments:
> 0001-Initial-CPU-Hish-Usage-by-Kafka-FIX-and-Also-fix-CLO.patch,
> KAFKA-1642.patch, KAFKA-1642.patch, KAFKA-1642_2014-10-20_17:33:57.patch,
> KAFKA-1642_2014-10-23_16:19:41.patch
>
>
> I see my CPU spike to 100% when the network connection is lost for a while.
> It seems the network I/O threads are very busy logging the following error
> message. Is this expected behavior?
> 2014-09-17 14:06:16.830 [kafka-producer-network-thread] ERROR org.apache.kafka.clients.producer.internals.Sender - Uncaught error in kafka producer I/O thread:
> java.lang.IllegalStateException: No entry found for node -2
> at org.apache.kafka.clients.ClusterConnectionStates.nodeState(ClusterConnectionStates.java:110)
> at org.apache.kafka.clients.ClusterConnectionStates.disconnected(ClusterConnectionStates.java:99)
> at org.apache.kafka.clients.NetworkClient.initiateConnect(NetworkClient.java:394)
> at org.apache.kafka.clients.NetworkClient.maybeUpdateMetadata(NetworkClient.java:380)
> at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:174)
> at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:175)
> at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:115)
> at java.lang.Thread.run(Thread.java:744)
> Thanks,
> Bhavesh