[jira] [Commented] (KAFKA-3135) Unexpected delay before fetch response transmission
[ https://issues.apache.org/jira/browse/KAFKA-3135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15788686#comment-15788686 ] Jeff Widman commented on KAFKA-3135: It's not currently a critical issue for my company. Typically when we're considering upgrading we look at outstanding bugs to evaluate whether to upgrade or wait, so I just wanted the tags to be corrected. Thanks [~ewencp] for handling. > Unexpected delay before fetch response transmission > --- > > Key: KAFKA-3135 > URL: https://issues.apache.org/jira/browse/KAFKA-3135 > Project: Kafka > Issue Type: Bug >Affects Versions: 0.9.0.0, 0.10.1.0, 0.9.0.1, 0.10.0.0, 0.10.0.1, 0.10.1.1 >Reporter: Jason Gustafson >Assignee: Jason Gustafson >Priority: Critical > Fix For: 0.10.2.0 > > > From the user list, Krzysztof Ciesielski reports the following: > {quote} > Scenario description: > First, a producer writes 50 elements into a topic > Then, a consumer starts to read, polling in a loop. > When "max.partition.fetch.bytes" is set to a relatively small value, each > "consumer.poll()" returns a batch of messages. > If this value is left as default, the output tends to look like this: > Poll returned 13793 elements > Poll returned 13793 elements > Poll returned 13793 elements > Poll returned 13793 elements > Poll returned 0 elements > Poll returned 0 elements > Poll returned 0 elements > Poll returned 0 elements > Poll returned 13793 elements > Poll returned 13793 elements > As we can see, there are weird "gaps" when poll returns 0 elements for some > time. What is the reason for that? Maybe there are some good practices > about setting "max.partition.fetch.bytes" which I don't follow? > {quote} > The gist to reproduce this problem is here: > https://gist.github.com/kciesielski/054bb4359a318aa17561. > After some initial investigation, the delay appears to be in the server's > networking layer. Basically I see a delay of 5 seconds from the time that > Selector.send() is invoked in SocketServer.Processor with the fetch response > to the time that the send is completed. Using netstat in the middle of the > delay shows the following output: > {code} > tcp4 0 0 10.191.0.30.55455 10.191.0.30.9092 ESTABLISHED > tcp4 0 102400 10.191.0.30.9092 10.191.0.30.55454 ESTABLISHED > {code} > From this, it looks like the data reaches the send buffer, but needs to be > flushed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-3135) Unexpected delay before fetch response transmission
[ https://issues.apache.org/jira/browse/KAFKA-3135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15788679#comment-15788679 ] Ewen Cheslack-Postava commented on KAFKA-3135: -- Also, /cc [~hachikuji] since he was the one that made the most progress on this. > Unexpected delay before fetch response transmission > --- > > Key: KAFKA-3135 > URL: https://issues.apache.org/jira/browse/KAFKA-3135 > Project: Kafka > Issue Type: Bug >Affects Versions: 0.9.0.0, 0.10.1.0, 0.9.0.1, 0.10.0.0, 0.10.0.1, 0.10.1.1 >Reporter: Jason Gustafson >Assignee: Jason Gustafson >Priority: Critical > Fix For: 0.10.2.0 > > > From the user list, Krzysztof Ciesielski reports the following: > {quote} > Scenario description: > First, a producer writes 50 elements into a topic > Then, a consumer starts to read, polling in a loop. > When "max.partition.fetch.bytes" is set to a relatively small value, each > "consumer.poll()" returns a batch of messages. > If this value is left as default, the output tends to look like this: > Poll returned 13793 elements > Poll returned 13793 elements > Poll returned 13793 elements > Poll returned 13793 elements > Poll returned 0 elements > Poll returned 0 elements > Poll returned 0 elements > Poll returned 0 elements > Poll returned 13793 elements > Poll returned 13793 elements > As we can see, there are weird "gaps" when poll returns 0 elements for some > time. What is the reason for that? Maybe there are some good practices > about setting "max.partition.fetch.bytes" which I don't follow? > {quote} > The gist to reproduce this problem is here: > https://gist.github.com/kciesielski/054bb4359a318aa17561. > After some initial investigation, the delay appears to be in the server's > networking layer. Basically I see a delay of 5 seconds from the time that > Selector.send() is invoked in SocketServer.Processor with the fetch response > to the time that the send is completed. Using netstat in the middle of the > delay shows the following output: > {code} > tcp4 0 0 10.191.0.30.55455 10.191.0.30.9092 ESTABLISHED > tcp4 0 102400 10.191.0.30.9092 10.191.0.30.55454 ESTABLISHED > {code} > From this, it looks like the data reaches the send buffer, but needs to be > flushed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-3135) Unexpected delay before fetch response transmission
[ https://issues.apache.org/jira/browse/KAFKA-3135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15788677#comment-15788677 ] Ewen Cheslack-Postava commented on KAFKA-3135: -- [~jeffwidman] I've updated the affected versions, and I'll track this for the 0.10.2.0 release. Do we have an idea of how critical this issue is? It seems like it should be hit fairly commonly (based on ideas that it's due to TCP persist timer issues), but we've seen few comments since Mar/Apr of earlier this year. Is this still a critical issue or should we downgrade it to something that would be a good improvement and make it a major/minor/nice-to-have? > Unexpected delay before fetch response transmission > --- > > Key: KAFKA-3135 > URL: https://issues.apache.org/jira/browse/KAFKA-3135 > Project: Kafka > Issue Type: Bug >Affects Versions: 0.9.0.0, 0.10.1.0, 0.9.0.1, 0.10.0.0, 0.10.0.1, 0.10.1.1 >Reporter: Jason Gustafson >Assignee: Jason Gustafson >Priority: Critical > Fix For: 0.10.2.0 > > > From the user list, Krzysztof Ciesielski reports the following: > {quote} > Scenario description: > First, a producer writes 50 elements into a topic > Then, a consumer starts to read, polling in a loop. > When "max.partition.fetch.bytes" is set to a relatively small value, each > "consumer.poll()" returns a batch of messages. > If this value is left as default, the output tends to look like this: > Poll returned 13793 elements > Poll returned 13793 elements > Poll returned 13793 elements > Poll returned 13793 elements > Poll returned 0 elements > Poll returned 0 elements > Poll returned 0 elements > Poll returned 0 elements > Poll returned 13793 elements > Poll returned 13793 elements > As we can see, there are weird "gaps" when poll returns 0 elements for some > time. What is the reason for that? Maybe there are some good practices > about setting "max.partition.fetch.bytes" which I don't follow? > {quote} > The gist to reproduce this problem is here: > https://gist.github.com/kciesielski/054bb4359a318aa17561. > After some initial investigation, the delay appears to be in the server's > networking layer. Basically I see a delay of 5 seconds from the time that > Selector.send() is invoked in SocketServer.Processor with the fetch response > to the time that the send is completed. Using netstat in the middle of the > delay shows the following output: > {code} > tcp4 0 0 10.191.0.30.55455 10.191.0.30.9092 ESTABLISHED > tcp4 0 102400 10.191.0.30.9092 10.191.0.30.55454 ESTABLISHED > {code} > From this, it looks like the data reaches the send buffer, but needs to be > flushed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-3135) Unexpected delay before fetch response transmission
[ https://issues.apache.org/jira/browse/KAFKA-3135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15750082#comment-15750082 ] Jeff Widman commented on KAFKA-3135: Can the tags on this issue be updated to note that it applies to 0.10.x versions as well? At least, that's what it seems from reading through the issue description. > Unexpected delay before fetch response transmission > --- > > Key: KAFKA-3135 > URL: https://issues.apache.org/jira/browse/KAFKA-3135 > Project: Kafka > Issue Type: Bug >Affects Versions: 0.9.0.0 >Reporter: Jason Gustafson >Assignee: Jason Gustafson >Priority: Critical > Fix For: 0.10.2.0 > > > From the user list, Krzysztof Ciesielski reports the following: > {quote} > Scenario description: > First, a producer writes 50 elements into a topic > Then, a consumer starts to read, polling in a loop. > When "max.partition.fetch.bytes" is set to a relatively small value, each > "consumer.poll()" returns a batch of messages. > If this value is left as default, the output tends to look like this: > Poll returned 13793 elements > Poll returned 13793 elements > Poll returned 13793 elements > Poll returned 13793 elements > Poll returned 0 elements > Poll returned 0 elements > Poll returned 0 elements > Poll returned 0 elements > Poll returned 13793 elements > Poll returned 13793 elements > As we can see, there are weird "gaps" when poll returns 0 elements for some > time. What is the reason for that? Maybe there are some good practices > about setting "max.partition.fetch.bytes" which I don't follow? > {quote} > The gist to reproduce this problem is here: > https://gist.github.com/kciesielski/054bb4359a318aa17561. > After some initial investigation, the delay appears to be in the server's > networking layer. Basically I see a delay of 5 seconds from the time that > Selector.send() is invoked in SocketServer.Processor with the fetch response > to the time that the send is completed. Using netstat in the middle of the > delay shows the following output: > {code} > tcp4 0 0 10.191.0.30.55455 10.191.0.30.9092 ESTABLISHED > tcp4 0 102400 10.191.0.30.9092 10.191.0.30.55454 ESTABLISHED > {code} > From this, it looks like the data reaches the send buffer, but needs to be > flushed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-3135) Unexpected delay before fetch response transmission
[ https://issues.apache.org/jira/browse/KAFKA-3135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15234170#comment-15234170 ] Jay Kreps commented on KAFKA-3135: -- This seems like an important one to get to the bottom of... > Unexpected delay before fetch response transmission > --- > > Key: KAFKA-3135 > URL: https://issues.apache.org/jira/browse/KAFKA-3135 > Project: Kafka > Issue Type: Bug >Affects Versions: 0.9.0.0 >Reporter: Jason Gustafson >Assignee: Jason Gustafson >Priority: Critical > Fix For: 0.10.0.1 > > > From the user list, Krzysztof Ciesielski reports the following: > {quote} > Scenario description: > First, a producer writes 50 elements into a topic > Then, a consumer starts to read, polling in a loop. > When "max.partition.fetch.bytes" is set to a relatively small value, each > "consumer.poll()" returns a batch of messages. > If this value is left as default, the output tends to look like this: > Poll returned 13793 elements > Poll returned 13793 elements > Poll returned 13793 elements > Poll returned 13793 elements > Poll returned 0 elements > Poll returned 0 elements > Poll returned 0 elements > Poll returned 0 elements > Poll returned 13793 elements > Poll returned 13793 elements > As we can see, there are weird "gaps" when poll returns 0 elements for some > time. What is the reason for that? Maybe there are some good practices > about setting "max.partition.fetch.bytes" which I don't follow? > {quote} > The gist to reproduce this problem is here: > https://gist.github.com/kciesielski/054bb4359a318aa17561. > After some initial investigation, the delay appears to be in the server's > networking layer. Basically I see a delay of 5 seconds from the time that > Selector.send() is invoked in SocketServer.Processor with the fetch response > to the time that the send is completed. Using netstat in the middle of the > delay shows the following output: > {code} > tcp4 0 0 10.191.0.30.55455 10.191.0.30.9092 ESTABLISHED > tcp4 0 102400 10.191.0.30.9092 10.191.0.30.55454 ESTABLISHED > {code} > From this, it looks like the data reaches the send buffer, but needs to be > flushed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-3135) Unexpected delay before fetch response transmission
[ https://issues.apache.org/jira/browse/KAFKA-3135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15214993#comment-15214993 ] Jason Gustafson commented on KAFKA-3135: I made a little progress understanding this problem. The 5 second pause that we're seeing is caused by the TCP persist timer, which kicks in when the remote window size gets small enough. You can see this by watching the traffic through tcpdump and confirming the persist timeout event with 'netstat -s'. Clearly there are some cases where the client is not reading the data off the socket fast enough, which is causing the window size to fall and the server to back off. > Unexpected delay before fetch response transmission > --- > > Key: KAFKA-3135 > URL: https://issues.apache.org/jira/browse/KAFKA-3135 > Project: Kafka > Issue Type: Bug >Affects Versions: 0.9.0.0 >Reporter: Jason Gustafson >Assignee: Jason Gustafson >Priority: Critical > Fix For: 0.10.0.1 > > > From the user list, Krzysztof Ciesielski reports the following: > {quote} > Scenario description: > First, a producer writes 50 elements into a topic > Then, a consumer starts to read, polling in a loop. > When "max.partition.fetch.bytes" is set to a relatively small value, each > "consumer.poll()" returns a batch of messages. > If this value is left as default, the output tends to look like this: > Poll returned 13793 elements > Poll returned 13793 elements > Poll returned 13793 elements > Poll returned 13793 elements > Poll returned 0 elements > Poll returned 0 elements > Poll returned 0 elements > Poll returned 0 elements > Poll returned 13793 elements > Poll returned 13793 elements > As we can see, there are weird "gaps" when poll returns 0 elements for some > time. What is the reason for that? Maybe there are some good practices > about setting "max.partition.fetch.bytes" which I don't follow? > {quote} > The gist to reproduce this problem is here: > https://gist.github.com/kciesielski/054bb4359a318aa17561. > After some initial investigation, the delay appears to be in the server's > networking layer. Basically I see a delay of 5 seconds from the time that > Selector.send() is invoked in SocketServer.Processor with the fetch response > to the time that the send is completed. Using netstat in the middle of the > delay shows the following output: > {code} > tcp4 0 0 10.191.0.30.55455 10.191.0.30.9092 ESTABLISHED > tcp4 0 102400 10.191.0.30.9092 10.191.0.30.55454 ESTABLISHED > {code} > From this, it looks like the data reaches the send buffer, but needs to be > flushed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-3135) Unexpected delay before fetch response transmission
[ https://issues.apache.org/jira/browse/KAFKA-3135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15208740#comment-15208740 ] Jason Gustafson commented on KAFKA-3135: [~rangadi] [~kciesielski] Thanks for letting me know. I tried to reproduce previously on Linux, but wasn't able to. Going to bump this to critical and see if I can isolate the problem. > Unexpected delay before fetch response transmission > --- > > Key: KAFKA-3135 > URL: https://issues.apache.org/jira/browse/KAFKA-3135 > Project: Kafka > Issue Type: Bug >Affects Versions: 0.9.0.0 >Reporter: Jason Gustafson >Assignee: Jason Gustafson >Priority: Critical > Fix For: 0.10.0.1 > > > From the user list, Krzysztof Ciesielski reports the following: > {quote} > Scenario description: > First, a producer writes 50 elements into a topic > Then, a consumer starts to read, polling in a loop. > When "max.partition.fetch.bytes" is set to a relatively small value, each > "consumer.poll()" returns a batch of messages. > If this value is left as default, the output tends to look like this: > Poll returned 13793 elements > Poll returned 13793 elements > Poll returned 13793 elements > Poll returned 13793 elements > Poll returned 0 elements > Poll returned 0 elements > Poll returned 0 elements > Poll returned 0 elements > Poll returned 13793 elements > Poll returned 13793 elements > As we can see, there are weird "gaps" when poll returns 0 elements for some > time. What is the reason for that? Maybe there are some good practices > about setting "max.partition.fetch.bytes" which I don't follow? > {quote} > The gist to reproduce this problem is here: > https://gist.github.com/kciesielski/054bb4359a318aa17561. > After some initial investigation, the delay appears to be in the server's > networking layer. Basically I see a delay of 5 seconds from the time that > Selector.send() is invoked in SocketServer.Processor with the fetch response > to the time that the send is completed. Using netstat in the middle of the > delay shows the following output: > {code} > tcp4 0 0 10.191.0.30.55455 10.191.0.30.9092 ESTABLISHED > tcp4 0 102400 10.191.0.30.9092 10.191.0.30.55454 ESTABLISHED > {code} > From this, it looks like the data reaches the send buffer, but needs to be > flushed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-3135) Unexpected delay before fetch response transmission
[ https://issues.apache.org/jira/browse/KAFKA-3135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15208512#comment-15208512 ] Raghu Angadi commented on KAFKA-3135: - [~hachikuji], mine is also on Linux with Java8 (docker containers on GCE). In one of the tests with default receive buffers, select() took alternated between waiting 0 ms and 20-30ms.. reading a few KB each time. > Unexpected delay before fetch response transmission > --- > > Key: KAFKA-3135 > URL: https://issues.apache.org/jira/browse/KAFKA-3135 > Project: Kafka > Issue Type: Bug >Affects Versions: 0.9.0.0 >Reporter: Jason Gustafson >Assignee: Jason Gustafson > Fix For: 0.10.0.1 > > > From the user list, Krzysztof Ciesielski reports the following: > {quote} > Scenario description: > First, a producer writes 50 elements into a topic > Then, a consumer starts to read, polling in a loop. > When "max.partition.fetch.bytes" is set to a relatively small value, each > "consumer.poll()" returns a batch of messages. > If this value is left as default, the output tends to look like this: > Poll returned 13793 elements > Poll returned 13793 elements > Poll returned 13793 elements > Poll returned 13793 elements > Poll returned 0 elements > Poll returned 0 elements > Poll returned 0 elements > Poll returned 0 elements > Poll returned 13793 elements > Poll returned 13793 elements > As we can see, there are weird "gaps" when poll returns 0 elements for some > time. What is the reason for that? Maybe there are some good practices > about setting "max.partition.fetch.bytes" which I don't follow? > {quote} > The gist to reproduce this problem is here: > https://gist.github.com/kciesielski/054bb4359a318aa17561. > After some initial investigation, the delay appears to be in the server's > networking layer. Basically I see a delay of 5 seconds from the time that > Selector.send() is invoked in SocketServer.Processor with the fetch response > to the time that the send is completed. Using netstat in the middle of the > delay shows the following output: > {code} > tcp4 0 0 10.191.0.30.55455 10.191.0.30.9092 ESTABLISHED > tcp4 0 102400 10.191.0.30.9092 10.191.0.30.55454 ESTABLISHED > {code} > From this, it looks like the data reaches the send buffer, but needs to be > flushed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-3135) Unexpected delay before fetch response transmission
[ https://issues.apache.org/jira/browse/KAFKA-3135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15208129#comment-15208129 ] Krzysiek Ciesielski commented on KAFKA-3135: [~hachikuji] Actually, this error has been originally spotted on Linux. To be precise, a colleague of mine was preparing some Kafka evaluation for his customer on EC2 with AWS Linux AMI (64 bit). Kafka cluster has been configured with Docker using "java8" as base image. After noticing strange lags, we were able to reproduce this on OSX with code from attached gist. > Unexpected delay before fetch response transmission > --- > > Key: KAFKA-3135 > URL: https://issues.apache.org/jira/browse/KAFKA-3135 > Project: Kafka > Issue Type: Bug >Affects Versions: 0.9.0.0 >Reporter: Jason Gustafson >Assignee: Jason Gustafson > > From the user list, Krzysztof Ciesielski reports the following: > {quote} > Scenario description: > First, a producer writes 50 elements into a topic > Then, a consumer starts to read, polling in a loop. > When "max.partition.fetch.bytes" is set to a relatively small value, each > "consumer.poll()" returns a batch of messages. > If this value is left as default, the output tends to look like this: > Poll returned 13793 elements > Poll returned 13793 elements > Poll returned 13793 elements > Poll returned 13793 elements > Poll returned 0 elements > Poll returned 0 elements > Poll returned 0 elements > Poll returned 0 elements > Poll returned 13793 elements > Poll returned 13793 elements > As we can see, there are weird "gaps" when poll returns 0 elements for some > time. What is the reason for that? Maybe there are some good practices > about setting "max.partition.fetch.bytes" which I don't follow? > {quote} > The gist to reproduce this problem is here: > https://gist.github.com/kciesielski/054bb4359a318aa17561. > After some initial investigation, the delay appears to be in the server's > networking layer. Basically I see a delay of 5 seconds from the time that > Selector.send() is invoked in SocketServer.Processor with the fetch response > to the time that the send is completed. Using netstat in the middle of the > delay shows the following output: > {code} > tcp4 0 0 10.191.0.30.55455 10.191.0.30.9092 ESTABLISHED > tcp4 0 102400 10.191.0.30.9092 10.191.0.30.55454 ESTABLISHED > {code} > From this, it looks like the data reaches the send buffer, but needs to be > flushed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-3135) Unexpected delay before fetch response transmission
[ https://issues.apache.org/jira/browse/KAFKA-3135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15200041#comment-15200041 ] Jason Gustafson commented on KAFKA-3135: [~rangadi] Are you on OSX? So far, I've only seen this reproduced on macs. > Unexpected delay before fetch response transmission > --- > > Key: KAFKA-3135 > URL: https://issues.apache.org/jira/browse/KAFKA-3135 > Project: Kafka > Issue Type: Bug >Affects Versions: 0.9.0.0 >Reporter: Jason Gustafson >Assignee: Jason Gustafson > > From the user list, Krzysztof Ciesielski reports the following: > {quote} > Scenario description: > First, a producer writes 50 elements into a topic > Then, a consumer starts to read, polling in a loop. > When "max.partition.fetch.bytes" is set to a relatively small value, each > "consumer.poll()" returns a batch of messages. > If this value is left as default, the output tends to look like this: > Poll returned 13793 elements > Poll returned 13793 elements > Poll returned 13793 elements > Poll returned 13793 elements > Poll returned 0 elements > Poll returned 0 elements > Poll returned 0 elements > Poll returned 0 elements > Poll returned 13793 elements > Poll returned 13793 elements > As we can see, there are weird "gaps" when poll returns 0 elements for some > time. What is the reason for that? Maybe there are some good practices > about setting "max.partition.fetch.bytes" which I don't follow? > {quote} > The gist to reproduce this problem is here: > https://gist.github.com/kciesielski/054bb4359a318aa17561. > After some initial investigation, the delay appears to be in the server's > networking layer. Basically I see a delay of 5 seconds from the time that > Selector.send() is invoked in SocketServer.Processor with the fetch response > to the time that the send is completed. Using netstat in the middle of the > delay shows the following output: > {code} > tcp4 0 0 10.191.0.30.55455 10.191.0.30.9092 ESTABLISHED > tcp4 0 102400 10.191.0.30.9092 10.191.0.30.55454 ESTABLISHED > {code} > From this, it looks like the data reaches the send buffer, but needs to be > flushed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-3135) Unexpected delay before fetch response transmission
[ https://issues.apache.org/jira/browse/KAFKA-3135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15199004#comment-15199004 ] Raghu Angadi commented on KAFKA-3135: - {quote} {code} tcp4 0 0 10.191.0.30.55455 10.191.0.30.9092 ESTABLISHED tcp4 0 102400 10.191.0.30.9092 10.191.0.30.55454 ESTABLISHED {code} {quote} This shows server is ok. It is trying to send and already flushed/wrote on its end of the tcp connection. If you check netstat for the side of the connection (local port 55454) you should see data in receive buffers. Still don't know why consumer is not reading. I am also seeing similar weird behavior. with 64K receiver buffer reduces the intensity of the problem, but does not fix it. > Unexpected delay before fetch response transmission > --- > > Key: KAFKA-3135 > URL: https://issues.apache.org/jira/browse/KAFKA-3135 > Project: Kafka > Issue Type: Bug >Affects Versions: 0.9.0.0 >Reporter: Jason Gustafson >Assignee: Jason Gustafson > > From the user list, Krzysztof Ciesielski reports the following: > {quote} > Scenario description: > First, a producer writes 50 elements into a topic > Then, a consumer starts to read, polling in a loop. > When "max.partition.fetch.bytes" is set to a relatively small value, each > "consumer.poll()" returns a batch of messages. > If this value is left as default, the output tends to look like this: > Poll returned 13793 elements > Poll returned 13793 elements > Poll returned 13793 elements > Poll returned 13793 elements > Poll returned 0 elements > Poll returned 0 elements > Poll returned 0 elements > Poll returned 0 elements > Poll returned 13793 elements > Poll returned 13793 elements > As we can see, there are weird "gaps" when poll returns 0 elements for some > time. What is the reason for that? Maybe there are some good practices > about setting "max.partition.fetch.bytes" which I don't follow? > {quote} > The gist to reproduce this problem is here: > https://gist.github.com/kciesielski/054bb4359a318aa17561. > After some initial investigation, the delay appears to be in the server's > networking layer. Basically I see a delay of 5 seconds from the time that > Selector.send() is invoked in SocketServer.Processor with the fetch response > to the time that the send is completed. Using netstat in the middle of the > delay shows the following output: > {code} > tcp4 0 0 10.191.0.30.55455 10.191.0.30.9092 ESTABLISHED > tcp4 0 102400 10.191.0.30.9092 10.191.0.30.55454 ESTABLISHED > {code} > From this, it looks like the data reaches the send buffer, but needs to be > flushed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-3135) Unexpected delay before fetch response transmission
[ https://issues.apache.org/jira/browse/KAFKA-3135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15147379#comment-15147379 ] Ismael Juma commented on KAFKA-3135: [~hachikuji], do you know why the receive buffer size was changed to 32k in the new consumer? > Unexpected delay before fetch response transmission > --- > > Key: KAFKA-3135 > URL: https://issues.apache.org/jira/browse/KAFKA-3135 > Project: Kafka > Issue Type: Bug >Affects Versions: 0.9.0.0 >Reporter: Jason Gustafson >Assignee: Jason Gustafson > > From the user list, Krzysztof Ciesielski reports the following: > {quote} > Scenario description: > First, a producer writes 50 elements into a topic > Then, a consumer starts to read, polling in a loop. > When "max.partition.fetch.bytes" is set to a relatively small value, each > "consumer.poll()" returns a batch of messages. > If this value is left as default, the output tends to look like this: > Poll returned 13793 elements > Poll returned 13793 elements > Poll returned 13793 elements > Poll returned 13793 elements > Poll returned 0 elements > Poll returned 0 elements > Poll returned 0 elements > Poll returned 0 elements > Poll returned 13793 elements > Poll returned 13793 elements > As we can see, there are weird "gaps" when poll returns 0 elements for some > time. What is the reason for that? Maybe there are some good practices > about setting "max.partition.fetch.bytes" which I don't follow? > {quote} > The gist to reproduce this problem is here: > https://gist.github.com/kciesielski/054bb4359a318aa17561. > After some initial investigation, the delay appears to be in the server's > networking layer. Basically I see a delay of 5 seconds from the time that > Selector.send() is invoked in SocketServer.Processor with the fetch response > to the time that the send is completed. Using netstat in the middle of the > delay shows the following output: > {code} > tcp4 0 0 10.191.0.30.55455 10.191.0.30.9092 ESTABLISHED > tcp4 0 102400 10.191.0.30.9092 10.191.0.30.55454 ESTABLISHED > {code} > From this, it looks like the data reaches the send buffer, but needs to be > flushed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-3135) Unexpected delay before fetch response transmission
[ https://issues.apache.org/jira/browse/KAFKA-3135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15116561#comment-15116561 ] Jason Gustafson commented on KAFKA-3135: Interestingly, I can reproduce this pause even using the old consumer on 0.8.2 if I change the receive buffer size to 32K (as is the default for the new consumer). Similarly, by changing the new consumer's receive buffer to 64K (as is the default for the old consumer), the problem seems to go away. This may be a viable workaround until the root cause is known. Note I have only tested this on my macbook (OS-X 10.10.5) with Oracle's jdk (1.8.0_65-b17). > Unexpected delay before fetch response transmission > --- > > Key: KAFKA-3135 > URL: https://issues.apache.org/jira/browse/KAFKA-3135 > Project: Kafka > Issue Type: Bug >Affects Versions: 0.9.0.0 >Reporter: Jason Gustafson >Assignee: Jason Gustafson > > From the user list, Krzysztof Ciesielski reports the following: > {quote} > Scenario description: > First, a producer writes 50 elements into a topic > Then, a consumer starts to read, polling in a loop. > When "max.partition.fetch.bytes" is set to a relatively small value, each > "consumer.poll()" returns a batch of messages. > If this value is left as default, the output tends to look like this: > Poll returned 13793 elements > Poll returned 13793 elements > Poll returned 13793 elements > Poll returned 13793 elements > Poll returned 0 elements > Poll returned 0 elements > Poll returned 0 elements > Poll returned 0 elements > Poll returned 13793 elements > Poll returned 13793 elements > As we can see, there are weird "gaps" when poll returns 0 elements for some > time. What is the reason for that? Maybe there are some good practices > about setting "max.partition.fetch.bytes" which I don't follow? > {quote} > The gist to reproduce this problem is here: > https://gist.github.com/kciesielski/054bb4359a318aa17561. > After some initial investigation, the delay appears to be in the server's > networking layer. Basically I see a delay of 5 seconds from the time that > Selector.send() is invoked in SocketServer.Processor with the fetch response > to the time that the send is completed. Using netstat in the middle of the > delay shows the following output: > {code} > tcp4 0 0 10.191.0.30.55455 10.191.0.30.9092 ESTABLISHED > tcp4 0 102400 10.191.0.30.9092 10.191.0.30.55454 ESTABLISHED > {code} > From this, it looks like the data reaches the send buffer, but needs to be > flushed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)