[ https://issues.apache.org/jira/browse/KAFKA-5781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16150090#comment-16150090 ]
Raoufeh Hashemian commented on KAFKA-5781:
------------------------------------------

Just attached the log files and a plot of the produce latency. The times in the plot are 6 hours behind the UTC time in the logs, so the peaks happened at 05:22, 05:36, and 05:49 UTC.

> Frequent long produce latency periods that result in reduced produce rate.
> --------------------------------------------------------------------------
>
>                 Key: KAFKA-5781
>                 URL: https://issues.apache.org/jira/browse/KAFKA-5781
>             Project: Kafka
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 0.11.0.0
>         Environment: CentOS Linux release 7.3.1611, Kernel 3.10, java version "1.8.0_121"
>            Reporter: Raoufeh Hashemian
>         Attachments: controler.log, frequent_latency_increase_diskactivity.png, frequent_latency_increase.png, frequent_latency_increase_zoomed.png, gc0.log, GC time.png, produce_delay.png, server.log, state-change.log.zip
>
> When we upgraded from Kafka 0.10.2 to 0.11.0, I started to see frequent throughput drops with a predictable pattern (the attached file shows the pattern over a 14-hour period). This resulted in a degradation of up to 30% in our overall produce throughput.
> The drops can be correlated with a significant increase in 99th-percentile latency (up to 4 seconds). We have a cluster of 6 brokers and a single topic. The problem happens both with and without consumers running, so I only included a case without consumers.
> There is no specific message in the broker logs when the latency surge happens. However, I found a correlation between the log rotation messages in the log and the longer cycles in the pattern (details shown in the attached graph: frequent_latency_increase.png).
> Each increased latency period takes 5 to 20 minutes to finish (shown in the zoomed graph in the attached files).
> The broker CPU utilization goes down during this time and some read disk activity is observed (see attached graph).
> This pattern started to appear in our environment exactly at the time when we switched to Kafka 0.11.0. We kept idempotence set to false and didn't make any configuration changes as we switched. So I was wondering if it could be a bug or a configuration that needs to be changed after the upgrade?

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
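The reported correlation between log-rotation messages and the longer latency cycles can be checked mechanically by pulling the segment-roll timestamps out of server.log and shifting them onto the plot's time axis. A minimal sketch follows; the exact log-line format ("Rolled new log segment ...") and the fixed 6-hour plot offset are assumptions based on the comment above, not confirmed details of this environment:

```python
import re
from datetime import datetime, timedelta

# Matches a broker log line such as:
#   [2017-08-31 05:22:10,123] INFO Rolled new log segment for 'mytopic-0' in 2 ms. (kafka.log.Log)
# The exact wording is an assumption; adjust the pattern to the actual server.log.
ROLL_RE = re.compile(
    r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d+\].*Rolled new log segment"
)

def roll_times(lines, plot_offset_hours=-6):
    """Return segment-roll timestamps shifted by the plot's UTC offset."""
    times = []
    for line in lines:
        m = ROLL_RE.match(line)
        if m:
            t = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S")
            times.append(t + timedelta(hours=plot_offset_hours))
    return times

sample = [
    "[2017-08-31 05:22:10,123] INFO Rolled new log segment for "
    "'mytopic-0' in 2 ms. (kafka.log.Log)"
]
print(roll_times(sample))  # roll time shifted from UTC onto the plot's axis
```

The shifted timestamps can then be overlaid on produce_delay.png to see whether each roll event lines up with the start of a latency peak.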