[ 
https://issues.apache.org/jira/browse/KAFKA-1642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14234297#comment-14234297
 ] 

Bhavesh Mistry commented on KAFKA-1642:
---------------------------------------

[~ewencp],

1) I will post to KAFKA-1788 and perhaps link the issues.
2) True, some sort of measurement would be great: the 5, 10 ... 25, 50, 95 and 
99 percentiles of execution time would be great.  The point is just to measure 
the duration and report the rate of execution. 
3) I agree with what you are saying, and I have observed the same behavior.  My 
only recommendation is to add some intelligence around *timeouts*: if the 
timeout is zero on consecutive iterations over a long period, then there is a 
problem. (A little more defensive.) 
4) Again, I agree with your point, but in your previous comments you had 
mentioned that you might consider having back-off logic further up the chain, 
so I was just checking whether run() is the best place to do that check.  
Again, maybe add some intelligence here: if you get consecutive “Exception”s, 
then the likelihood of high CPU is high.  
 
5) OK, I agree.  What you are saying is that data needs to be dequeued so that 
more data can be enqueued, even in the event of network loss.  Is my 
understanding correct?

6) All I am saying is that a network firewall rule (such as only 2 TCP 
connections per source host), or brokers running out of file descriptors, can 
prevent a new connection to a broker from being established while the client 
still has a live and active TCP connection to the same broker.  But based on 
what I see in the method *initiateConnect*, it will mark the entire broker 
(node) status as disconnected.  Is this expected behavior?  So the question 
is: will the client continue to send data?
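
To make the suggestion in points 3 and 4 a bit more concrete, here is a rough 
sketch of the kind of defensive check I have in mind.  It is not taken from the 
patch or from the Kafka code base: the class SpinGuard, its method record() and 
all thresholds are hypothetical names and values chosen only for illustration.  
It counts consecutive "suspicious" loop iterations (a zero timeout or a caught 
exception) and, once a threshold is reached, returns a capped, doubling 
back-off delay.

```java
/**
 * Hypothetical helper (not part of the Kafka client): detects a run of
 * consecutive suspicious loop iterations and suggests a back-off delay.
 */
final class SpinGuard {
    private final int threshold;       // suspicious iterations tolerated before backing off
    private final long baseBackoffMs;  // first back-off delay
    private final long maxBackoffMs;   // upper bound for the delay
    private int consecutive = 0;       // current run of suspicious iterations

    SpinGuard(int threshold, long baseBackoffMs, long maxBackoffMs) {
        if (threshold < 1 || baseBackoffMs < 1 || maxBackoffMs < baseBackoffMs) {
            throw new IllegalArgumentException("invalid guard settings");
        }
        this.threshold = threshold;
        this.baseBackoffMs = baseBackoffMs;
        this.maxBackoffMs = maxBackoffMs;
    }

    /**
     * Record one loop iteration. Pass true when the iteration looked
     * suspicious (for example a zero timeout or a caught exception).
     * Returns the number of milliseconds to wait before the next
     * iteration; 0 means no back-off is needed.
     */
    long record(boolean suspicious) {
        if (!suspicious) {
            consecutive = 0;           // a healthy iteration resets the run
            return 0L;
        }
        if (consecutive < Integer.MAX_VALUE) {
            consecutive++;
        }
        if (consecutive < threshold) {
            return 0L;                 // still within tolerance
        }
        // Double the delay for each iteration past the threshold, up to the cap.
        long delay = baseBackoffMs;
        for (int i = threshold; i < consecutive && delay < maxBackoffMs; i++) {
            delay *= 2;
        }
        return Math.min(delay, maxBackoffMs);
    }
}
```

A loop such as run() could call record() once per iteration and sleep for the 
returned number of milliseconds when it is non-zero.  Whether run() or some 
place further up the chain is the right spot for this is exactly the open 
question in point 4.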

Thank you very much for entertaining my questions so far and I will test out 
the patch next week.

Thanks,

Bhavesh 

> [Java New Producer Kafka Trunk] CPU Usage Spike to 100% when network 
> connection is lost
> ---------------------------------------------------------------------------------------
>
>                 Key: KAFKA-1642
>                 URL: https://issues.apache.org/jira/browse/KAFKA-1642
>             Project: Kafka
>          Issue Type: Bug
>          Components: producer 
>    Affects Versions: 0.8.2
>            Reporter: Bhavesh Mistry
>            Assignee: Ewen Cheslack-Postava
>            Priority: Blocker
>             Fix For: 0.8.2
>
>         Attachments: 
> 0001-Initial-CPU-Hish-Usage-by-Kafka-FIX-and-Also-fix-CLO.patch, 
> KAFKA-1642.patch, KAFKA-1642.patch, KAFKA-1642_2014-10-20_17:33:57.patch, 
> KAFKA-1642_2014-10-23_16:19:41.patch
>
>
> I see my CPU spike to 100% when network connection is lost for while.  It 
> seems network  IO thread are very busy logging following error message.  Is 
> this expected behavior ?
> 2014-09-17 14:06:16.830 [kafka-producer-network-thread] ERROR 
> org.apache.kafka.clients.producer.internals.Sender - Uncaught error in kafka 
> producer I/O thread: 
> java.lang.IllegalStateException: No entry found for node -2
> at 
> org.apache.kafka.clients.ClusterConnectionStates.nodeState(ClusterConnectionStates.java:110)
> at 
> org.apache.kafka.clients.ClusterConnectionStates.disconnected(ClusterConnectionStates.java:99)
> at 
> org.apache.kafka.clients.NetworkClient.initiateConnect(NetworkClient.java:394)
> at 
> org.apache.kafka.clients.NetworkClient.maybeUpdateMetadata(NetworkClient.java:380)
> at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:174)
> at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:175)
> at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:115)
> at java.lang.Thread.run(Thread.java:744)
> Thanks,
> Bhavesh



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
