[ https://issues.apache.org/jira/browse/KAFKA-3135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15788686#comment-15788686 ]
Jeff Widman commented on KAFKA-3135:
------------------------------------

It's not currently a critical issue for my company. Typically when we're considering upgrading we look at outstanding bugs to evaluate whether to upgrade or wait, so I just wanted the tags to be corrected. Thanks [~ewencp] for handling.

> Unexpected delay before fetch response transmission
> ---------------------------------------------------
>
>                 Key: KAFKA-3135
>                 URL: https://issues.apache.org/jira/browse/KAFKA-3135
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 0.9.0.0, 0.10.1.0, 0.9.0.1, 0.10.0.0, 0.10.0.1, 0.10.1.1
>            Reporter: Jason Gustafson
>            Assignee: Jason Gustafson
>            Priority: Critical
>             Fix For: 0.10.2.0
>
>
> From the user list, Krzysztof Ciesielski reports the following:
> {quote}
> Scenario description:
> First, a producer writes 500000 elements into a topic
> Then, a consumer starts to read, polling in a loop.
> When "max.partition.fetch.bytes" is set to a relatively small value, each
> "consumer.poll()" returns a batch of messages.
> If this value is left as default, the output tends to look like this:
> Poll returned 13793 elements
> Poll returned 13793 elements
> Poll returned 13793 elements
> Poll returned 13793 elements
> Poll returned 0 elements
> Poll returned 0 elements
> Poll returned 0 elements
> Poll returned 0 elements
> Poll returned 13793 elements
> Poll returned 13793 elements
> As we can see, there are weird "gaps" when poll returns 0 elements for some
> time. What is the reason for that? Maybe there are some good practices
> about setting "max.partition.fetch.bytes" which I don't follow?
> {quote}
> The gist to reproduce this problem is here:
> https://gist.github.com/kciesielski/054bb4359a318aa17561.
> After some initial investigation, the delay appears to be in the server's
> networking layer. Basically I see a delay of 5 seconds from the time that
> Selector.send() is invoked in SocketServer.Processor with the fetch response
> to the time that the send is completed. Using netstat in the middle of the
> delay shows the following output:
> {code}
> tcp4 0 0 10.191.0.30.55455 10.191.0.30.9092 ESTABLISHED
> tcp4 0 102400 10.191.0.30.9092 10.191.0.30.55454 ESTABLISHED
> {code}
> From this, it looks like the data reaches the send buffer, but needs to be
> flushed.
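
As a reading aid for the netstat output quoted in the issue, here is a minimal Python sketch. It assumes the BSD/macOS column order (Proto, Recv-Q, Send-Q, Local Address, Foreign Address, state), in which the third field is the number of bytes still waiting in the socket's send queue. The names `SocketLine`, `parse_netstat_line` and `pending_sends` are illustrative only and are not part of Kafka or netstat. The sketch picks out connections on the broker port (9092 in the report) whose send queue is non-empty, which is the condition the description interprets as "reached the send buffer, but not yet flushed".

```python
from typing import List, NamedTuple


class SocketLine(NamedTuple):
    """One TCP row of BSD-style netstat output (assumed column order)."""
    proto: str
    recv_q: int
    send_q: int
    local: str
    foreign: str
    state: str


def parse_netstat_line(line: str) -> SocketLine:
    """Split a row into its six whitespace-separated fields."""
    proto, recv_q, send_q, local, foreign, state = line.split()
    return SocketLine(proto, int(recv_q), int(send_q), local, foreign, state)


def pending_sends(lines: List[str], port: int = 9092) -> List[SocketLine]:
    """Return sockets bound to `port` locally that still have bytes in Send-Q."""
    result = []
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue
        sock = parse_netstat_line(raw)
        # BSD netstat prints addresses as "a.b.c.d.port"; the port is the last field.
        local_port = sock.local.rsplit(".", 1)[-1]
        if local_port == str(port) and sock.send_q > 0:
            result.append(sock)
    return result


# The two rows quoted in the issue description.
sample = [
    "tcp4 0 0 10.191.0.30.55455 10.191.0.30.9092 ESTABLISHED",
    "tcp4 0 102400 10.191.0.30.9092 10.191.0.30.55454 ESTABLISHED",
]

for sock in pending_sends(sample):
    print(f"{sock.local} -> {sock.foreign}: {sock.send_q} bytes queued")
```

On the quoted sample this reports only the broker-side socket (local port 9092), with 102400 bytes queued. The client-side row has an empty send queue and is skipped.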