[ 
https://issues.apache.org/jira/browse/KAFKA-7012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16504977#comment-16504977
 ] 

rajadayalan perumalsamy edited comment on KAFKA-7012 at 6/7/18 6:29 PM:
------------------------------------------------------------------------

Thanks, my comments are added inline.

the attached profile screenshot shows only 210ms (out of >650k ms) of self time 
in pool.tryAllocate().
 a possible time drain with the patch is determineHandlingOrder() - it may 
shuffle the sockets in order to guarantee fairness under low memory conditions 
- but it doesn't appear in the screenshot.
     - I did not notice any issue with memory. I can do profiling with memory 
and include it.
 there's also no equivalent screenshot of a detailed profile breakdown for the 
results without the patch, so no base to compare to.
     - Attaching screenshot Commit-f15cdbc91b-profile2 for comparison.
 was the profiler using cpu time or wall clock time for measurements?
     - In the attached screenshot, Total time is the wall clock time and Total 
time(CPU) is the CPU time.
 were the ssl settings (cipher used etc) the same across measurements?
     - SSL settings and the cipher used are the same across measurements.
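
For anyone comparing the two columns: a minimal sketch of the difference between 
wall clock time and CPU time, assuming a standard JVM where per-thread CPU timing 
is supported. The class name WallVsCpuTime and the method measure() are 
hypothetical and are not part of Kafka or of the profiler used for the screenshots.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

public class WallVsCpuTime {

    /**
     * Runs the task on the current thread and returns {wallNanos, cpuNanos}.
     * Note: getCurrentThreadCpuTime() returns -1 if CPU time measurement is
     * disabled, in which case the CPU figure here is not meaningful.
     */
    static long[] measure(Runnable task) {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        long wallStart = System.nanoTime();
        long cpuStart = threads.getCurrentThreadCpuTime();
        task.run();
        long cpuNanos = threads.getCurrentThreadCpuTime() - cpuStart;
        long wallNanos = System.nanoTime() - wallStart;
        return new long[] {wallNanos, cpuNanos};
    }

    public static void main(String[] args) {
        // A blocked thread accumulates wall clock time but almost no CPU time.
        long[] sleeping = measure(() -> {
            try {
                Thread.sleep(200);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        // A busy loop accumulates both wall clock time and CPU time.
        long[] spinning = measure(() -> {
            long end = System.nanoTime() + 200_000_000L;
            while (System.nanoTime() < end) {
                // spin
            }
        });

        System.out.printf("sleeping: wall=%d ms, cpu=%d ms%n",
                sleeping[0] / 1_000_000, sleeping[1] / 1_000_000);
        System.out.printf("spinning: wall=%d ms, cpu=%d ms%n",
                spinning[0] / 1_000_000, spinning[1] / 1_000_000);
    }
}
```

A thread that mostly waits shows a large wall clock total with a small CPU total, 
while a thread that polls in a tight loop shows the two totals close together. 
That is why both columns matter when reading the network thread profiles.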


was (Author: rajadayalanvdms):
My comments inline

the attached profile screenshot shows only 210ms (out of >650k ms) of self time 
in pool.tryAllocate().
a possible time drain with the patch is determineHandlingOrder() - it may 
shuffle the sockets in order to guarantee fairness under low memory conditions 
- but it doesn't appear in the screenshot.
memory was not an issue, which is the reason I did not include it in the 
screenshot. I can do profiling with memory and include it.
there's also no equivalent screenshot of a detailed profile breakdown for the 
results without the patch, so no base to compare to.
attaching screenshot Commit-f15cdbc91b-profile2 for comparing.
was the profiler using cpu time or wall clock time for measurements?
In the screenshot Total time is the wall clock time and Total time(CPU) is the 
cpu time
were the ssl settings (cipher used etc) the same across measurements?
SSL settings and the cipher used are the same across measurements.

> Performance issue upgrading to kafka 1.0.1 or 1.1
> -------------------------------------------------
>
>                 Key: KAFKA-7012
>                 URL: https://issues.apache.org/jira/browse/KAFKA-7012
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 1.1.0, 1.0.1
>            Reporter: rajadayalan perumalsamy
>            Assignee: praveen
>            Priority: Major
>         Attachments: Commit-47ee8e954-profile.png, 
> Commit-47ee8e954-profile2.png, Commit-f15cdbc91b-profile.png, 
> Commit-f15cdbc91b-profile2.png
>
>
> We are trying to upgrade a Kafka cluster from Kafka 0.11.0.1 to Kafka 1.0.1. 
> After upgrading 1 node in the cluster, we noticed that the network threads 
> use most of the CPU. It is a 3 node cluster with 15k messages/sec on each 
> node. With Kafka 0.11.0.1, typical CPU usage of the servers is around 50 to 
> 60% (less than 1 vCPU). After the upgrade, CPU usage is high and depends on 
> the number of network threads used. If network threads is set to 8, CPU 
> usage is around 850% (9 vCPUs), and if it is set to 4, CPU usage is around 
> 450% (5 vCPUs). The same Kafka server.properties is used for both.
> We did further analysis with git bisect and a couple of builds and deploys, 
> and traced the issue to commit 47ee8e954df62b9a79099e944ec4be29afe046f6. CPU 
> usage is fine for commit f15cdbc91b240e656d9a2aeb6877e94624b21f8d, but with 
> commit 47ee8e954df62b9a79099e944ec4be29afe046f6 CPU usage has increased. We 
> have attached screenshots of profiling done with both commits. Screenshot 
> Commit-f15cdbc91b-profile shows less CPU usage by network threads, and 
> screenshots Commit-47ee8e954-profile and Commit-47ee8e954-profile2 show 
> higher CPU usage (almost the entire CPU usage) by network threads. We also 
> noticed that the kafka.network.Processor.poll() method is invoked 10 times 
> more often with commit 47ee8e954df62b9a79099e944ec4be29afe046f6.
> We need the issue to be resolved in order to upgrade the cluster. Please let 
> me know if you need any additional information.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
