[jira] [Commented] (KAFKA-171) Kafka producer should do a single write to send message sets
[ https://issues.apache.org/jira/browse/KAFKA-171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13136193#comment-13136193 ]

Jay Kreps commented on KAFKA-171:

Attached is a draft patch which turns the request into a single write. This is just a draft. If it actually improves performance, we should change Receive to use ScatteringByteChannel for consistency and also clean up a few more files with the same trick.

On my Mac laptop I do see a change in tcpdump, which seems to eliminate the 4-byte send. However, I don't see any positive result in performance for synchronous single-threaded sends of 10-byte messages (which should be the worst case for this). I think this may just be because I am testing over localhost.

Here are the details on the results I have:

TRUNK:
jkreps-mn:kafka-git jkreps$ sudo tcpdump -i lo0 port 9093
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on lo0, link-type NULL (BSD loopback), capture size 96 bytes
10:32:30.128938 IP jkreps-mn.linkedin.biz.56953 > jkreps-mn.linkedin.biz.9093: S 323648854:323648854(0) win 65535
10:32:30.129004 IP jkreps-mn.linkedin.biz.9093 > jkreps-mn.linkedin.biz.56953: S 526915069:526915069(0) ack 323648855 win 65535
10:32:30.129013 IP jkreps-mn.linkedin.biz.56953 > jkreps-mn.linkedin.biz.9093: . ack 1 win 65535
10:32:30.129022 IP jkreps-mn.linkedin.biz.9093 > jkreps-mn.linkedin.biz.56953: . ack 1 win 65535
10:32:30.129306 IP jkreps-mn.linkedin.biz.56953 > jkreps-mn.linkedin.biz.9093: P 1:5(4) ack 1 win 65535
10:32:30.129319 IP jkreps-mn.linkedin.biz.9093 > jkreps-mn.linkedin.biz.56953: . ack 5 win 65535
10:32:30.129339 IP jkreps-mn.linkedin.biz.56953 > jkreps-mn.linkedin.biz.9093: P 5:41(36) ack 1 win 65535
10:32:30.129350 IP jkreps-mn.linkedin.biz.9093 > jkreps-mn.linkedin.biz.56953: . ack 41 win 65535
10:32:30.151892 IP jkreps-mn.linkedin.biz.56953 > jkreps-mn.linkedin.biz.9093: F 41:41(0) ack 1 win 65535
10:32:30.151938 IP jkreps-mn.linkedin.biz.9093 > jkreps-mn.linkedin.biz.56953: . ack 42 win 65535
10:32:30.151946 IP jkreps-mn.linkedin.biz.56953 > jkreps-mn.linkedin.biz.9093: . ack 1 win 65535
10:32:30.152554 IP jkreps-mn.linkedin.biz.9093 > jkreps-mn.linkedin.biz.56953: F 1:1(0) ack 42 win 65535
10:32:30.152571 IP jkreps-mn.linkedin.biz.56953 > jkreps-mn.linkedin.biz.9093: . ack 2 win 65535

PATCHED:
jkreps-mn:kafka-git jkreps$ sudo tcpdump -i lo0 port 9093
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on lo0, link-type NULL (BSD loopback), capture size 96 bytes
10:35:40.637220 IP jkreps-mn.linkedin.biz.56993 > jkreps-mn.linkedin.biz.9093: S 1456363353:1456363353(0) win 65535
10:35:40.637287 IP jkreps-mn.linkedin.biz.9093 > jkreps-mn.linkedin.biz.56993: S 1260172914:1260172914(0) ack 1456363354 win 65535
10:35:40.637296 IP jkreps-mn.linkedin.biz.56993 > jkreps-mn.linkedin.biz.9093: . ack 1 win 65535
10:35:40.637306 IP jkreps-mn.linkedin.biz.9093 > jkreps-mn.linkedin.biz.56993: . ack 1 win 65535
10:35:40.657848 IP jkreps-mn.linkedin.biz.56993 > jkreps-mn.linkedin.biz.9093: P 1:41(40) ack 1 win 65535
10:35:40.657886 IP jkreps-mn.linkedin.biz.9093 > jkreps-mn.linkedin.biz.56993: . ack 41 win 65535
10:35:40.711399 IP jkreps-mn.linkedin.biz.56993 > jkreps-mn.linkedin.biz.9093: F 41:41(0) ack 1 win 65535
10:35:40.711430 IP jkreps-mn.linkedin.biz.9093 > jkreps-mn.linkedin.biz.56993: . ack 42 win 65535
10:35:40.711437 IP jkreps-mn.linkedin.biz.56993 > jkreps-mn.linkedin.biz.9093: . ack 1 win 65535
10:35:40.762640 IP jkreps-mn.linkedin.biz.9093 > jkreps-mn.linkedin.biz.56993: F 1:1(0) ack 42 win 65535
10:35:40.762678 IP jkreps-mn.linkedin.biz.56993 > jkreps-mn.linkedin.biz.9093: . ack 2 win 65535

TRUNK:
bin/kafka-producer-perf-test.sh --topic test --brokerinfo zk.connect=localhost:2181 --messages 300000 --message-size 10 --batch-size 1 --threads 1
...
[2011-10-26 10:33:58,458] INFO Total Num Messages: 300000 bytes: 3000000 in 13.636 secs (kafka.tools.ProducerPerformance$)
[2011-10-26 10:33:58,459] INFO Messages/sec: 22000.5867 (kafka.tools.ProducerPerformance$)
[2011-10-26 10:33:58,459] INFO MB/sec: 0.2098 (kafka.tools.ProducerPerformance$)

PATCHED:
jkreps-mn:kafka-git jkreps$ bin/kafka-producer-perf-test.sh --topic test --brokerinfo zk.connect=localhost:2181 --messages 300000 --message-size 10 --batch-size 1 --threads 1
...
[2011-10-26 10:38:03,965] INFO Total Num Messages: 300000 bytes: 3000000 in 13.254 secs (kafka.tools.ProducerPerformance$)
[2011-10-26 10:38:03,965] INFO Messages/sec: 22634.6763 (kafka.tools.ProducerPerformance$)
[2011-10-26 10:38:03,966] INFO MB/sec: 0.2159 (kafka.tools.ProducerPerformance$)
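The difference the trace shows (a 4-byte segment followed by a 36-byte one, versus a single 40-byte segment) comes down to how many write calls reach the channel. A minimal sketch, assuming a hypothetical `RecordingChannel` test double that stands in for the socket: it counts write calls and their sizes, not TCP segments, so it says nothing about how the kernel packetizes the data. The 36-byte body only mirrors the trace above.

```scala
import java.nio.ByteBuffer
import java.nio.channels.GatheringByteChannel
import scala.collection.mutable.ArrayBuffer

// Hypothetical test double standing in for a socket: it accepts every byte
// offered and records the number of bytes taken by each write call.
class RecordingChannel extends GatheringByteChannel {
  val writeSizes: ArrayBuffer[Long] = ArrayBuffer.empty[Long]
  private var open = true

  // Plain write: consume everything remaining in src in one call.
  override def write(src: ByteBuffer): Int = {
    val n = src.remaining()
    src.position(src.limit())
    writeSizes += n.toLong
    n
  }

  // Gathering write: consume srcs(offset) .. srcs(offset + length - 1) in one call.
  override def write(srcs: Array[ByteBuffer], offset: Int, length: Int): Long = {
    var total = 0L
    for (i <- offset until offset + length) {
      total += srcs(i).remaining()
      srcs(i).position(srcs(i).limit())
    }
    writeSizes += total
    total
  }

  override def write(srcs: Array[ByteBuffer]): Long = write(srcs, 0, srcs.length)
  override def isOpen(): Boolean = open
  override def close(): Unit = { open = false }
}

// Builds a 4-byte size header and a zero-filled body of the given length.
def sizeAndData(payloadSize: Int): (ByteBuffer, ByteBuffer) = {
  val size = ByteBuffer.allocate(4)
  size.putInt(payloadSize)
  size.flip()
  (size, ByteBuffer.allocate(payloadSize))
}

// Trunk-style: header and body as two separate writes.
val (trunkSize, trunkData) = sizeAndData(36)
val trunk = new RecordingChannel
trunk.write(trunkSize)
trunk.write(trunkData)

// Patched-style: one gathering write over both buffers.
val (patchedSize, patchedData) = sizeAndData(36)
val patched = new RecordingChannel
patched.write(Array(patchedSize, patchedData))

println(trunk.writeSizes.mkString(", "))   // 4, 36
println(patched.writeSizes.mkString(", ")) // 40
```

Whether two separate write calls actually turn into two segments on a real socket depends on the OS and on Nagle's algorithm; the gathering form simply hands the kernel the whole request at once.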
[jira] [Commented] (KAFKA-171) Kafka producer should do a single write to send message sets
[ https://issues.apache.org/jira/browse/KAFKA-171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13136201#comment-13136201 ]

Neha Narkhede commented on KAFKA-171:

This is a good change to make. A couple of comments:

1. Since we are changing WritableByteChannel to GatheringByteChannel, it is better to change the return type of writeTo and writeCompletely to return long instead of int. This will avoid the coercion to Int in BoundedByteBufferSend.scala.
2. There are a couple of other places where we do these double writes, e.g. OffsetArraySend, MessageSetSend, etc. We might as well fix those?

> Kafka producer should do a single write to send message sets
>
> Key: KAFKA-171
> URL: https://issues.apache.org/jira/browse/KAFKA-171
> Project: Kafka
> Issue Type: Bug
> Components: core
> Affects Versions: 0.7, 0.8
> Reporter: Jay Kreps
> Assignee: Jay Kreps
> Fix For: 0.8
>
> Attachments: KAFKA-171-draft.patch
>
> From email thread:
> http://mail-archives.apache.org/mod_mbox/incubator-kafka-dev/201110.mbox/%3ccafbh0q1pyuj32thbayq29e6j4wt_mrg5suusfdegwj6rmex...@mail.gmail.com%3e
> > Before sending an actual message, kafka producer do send a (control)
> > message of 4 bytes to the server. Kafka producer always does this action
> > before send some message to the server.
> I think this is because in BoundedByteBufferSend.scala we do essentially
>   channel.write(sizeBuffer)
>   channel.write(dataBuffer)
> The correct solution is to use vector I/O and instead do
>   channel.write(Array(sizeBuffer, dataBuffer))

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
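The vector I/O fix from the issue description and Neha's first point fit together naturally: `GatheringByteChannel.write(ByteBuffer[])` already returns a `long`, so a send built on it can report a `Long` without any coercion. A minimal sketch, assuming a hypothetical `SizePrefixedSend` class (not Kafka's actual `BoundedByteBufferSend`) with `writeTo` and `writeCompletely` methods named after the ones discussed here.

```scala
import java.nio.ByteBuffer
import java.nio.channels.{GatheringByteChannel, Pipe}

// Hypothetical sketch, not Kafka's BoundedByteBufferSend: a 4-byte size
// header followed by the request body, sent with gathering writes.
class SizePrefixedSend(body: ByteBuffer) {
  private val sizeBuffer = ByteBuffer.allocate(4)
  sizeBuffer.putInt(body.remaining())
  sizeBuffer.flip()
  private val buffers = Array(sizeBuffer, body)

  def complete: Boolean = !sizeBuffer.hasRemaining && !body.hasRemaining

  // One vectored write; GatheringByteChannel.write returns a Long, so the
  // count is passed through without narrowing to Int.
  def writeTo(channel: GatheringByteChannel): Long = channel.write(buffers)

  // Repeats until both buffers are drained, since a write may be partial.
  // On a full non-blocking channel this loop would spin; a real
  // implementation would wait on a selector instead.
  def writeCompletely(channel: GatheringByteChannel): Long = {
    var written = 0L
    while (!complete) written += writeTo(channel)
    written
  }
}

// Usage: a Pipe's sink is a GatheringByteChannel, so no socket is needed.
val pipe = Pipe.open()
val send = new SizePrefixedSend(ByteBuffer.wrap("hello".getBytes("UTF-8")))
val sent = send.writeCompletely(pipe.sink())
println(sent) // 9 (4-byte header + 5-byte body)
```

Keeping both buffers in one array means a partial write simply resumes from wherever the buffers' positions were left, so the header and body never need separate bookkeeping.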
[jira] [Commented] (KAFKA-171) Kafka producer should do a single write to send message sets
[ https://issues.apache.org/jira/browse/KAFKA-171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13136223#comment-13136223 ]

Jay Kreps commented on KAFKA-171:

Moving off localhost, between my Mac laptop and my dev workstation (Linux), I see similar results:

TRUNK:
jkreps-mn:kafka-git jkreps$ bin/kafka-producer-perf-test.sh --topic test --brokerinfo zk.connect=jkreps-ld:2181 --messages 500000 --message-size 10 --batch-size 1 --threads 1
[2011-10-26 11:59:51,795] INFO Total Num Messages: 500000 bytes: 5000000 in 13.046 secs (kafka.tools.ProducerPerformance$)
[2011-10-26 11:59:51,795] INFO Messages/sec: 38325.9237 (kafka.tools.ProducerPerformance$)
[2011-10-26 11:59:51,795] INFO MB/sec: 0.3655 (kafka.tools.ProducerPerformance$)

PATCHED:
jkreps-mn:kafka-git jkreps$ bin/kafka-producer-perf-test.sh --topic test --brokerinfo zk.connect=jkreps-ld:2181 --messages 500000 --message-size 10 --batch-size 1 --threads 1
[2011-10-26 11:58:42,335] INFO Total Num Messages: 500000 bytes: 5000000 in 13.125 secs (kafka.tools.ProducerPerformance$)
[2011-10-26 11:58:42,335] INFO Messages/sec: 38095.2381 (kafka.tools.ProducerPerformance$)
[2011-10-26 11:58:42,335] INFO MB/sec: 0.3633 (kafka.tools.ProducerPerformance$)
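The Messages/sec figures from the two pairs of runs can be compared directly. This short computation uses only the logged rates. It gives roughly +2.9% over localhost and -0.6% between machines, so the two changes point in opposite directions. That fits the reading that there is no clear performance effect, although with one run per configuration these differences cannot be told apart from run-to-run noise.

```scala
// Relative change of the patched rate versus the trunk rate.
def relativeChange(trunk: Double, patched: Double): Double = (patched - trunk) / trunk

// Messages/sec values copied from the ProducerPerformance log lines.
val localhost = relativeChange(22000.5867, 22634.6763)
val laptopToWorkstation = relativeChange(38325.9237, 38095.2381)

println(f"localhost: ${localhost * 100}%+.2f%%")                        // localhost: +2.88%
println(f"laptop to workstation: ${laptopToWorkstation * 100}%+.2f%%") // laptop to workstation: -0.60%
```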
[jira] [Commented] (KAFKA-171) Kafka producer should do a single write to send message sets
[ https://issues.apache.org/jira/browse/KAFKA-171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13136622#comment-13136622 ]

Chris Burroughs commented on KAFKA-171:

Even if this doesn't measurably improve node-to-node performance (and I'm not sure we should expect it to, since we don't have to wait for an ACK to send the next packet), isn't it definitely making life better for network engineers?
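The network-side benefit can be made concrete with a rough count. A sketch that assumes minimal 20-byte IPv4 and 20-byte TCP headers with no options, and ignores link-layer framing and the connect/close packets. The segment sizes and ACK counts are taken from the request portion of the tcpdump traces in the first comment: two data segments and two ACKs on trunk, one of each on the patch.

```scala
// Assumed header sizes: IPv4 without options (20) + TCP without options (20).
// Real traces usually carry TCP options and link-layer framing, so actual
// overhead per packet is higher than this.
final case class Exchange(payloadSegments: Seq[Int], acks: Int) {
  private val headerBytes = 20 + 20
  def packets: Int = payloadSegments.size + acks
  def wireBytes: Int = payloadSegments.map(_ + headerBytes).sum + acks * headerBytes
}

val trunk = Exchange(Seq(4, 36), acks = 2)
val patched = Exchange(Seq(40), acks = 1)

println(s"trunk: ${trunk.packets} packets, ${trunk.wireBytes} bytes")       // trunk: 4 packets, 200 bytes
println(s"patched: ${patched.packets} packets, ${patched.wireBytes} bytes") // patched: 2 packets, 120 bytes
```

Under these assumptions the patch halves the packet count for a small request, which is the kind of saving that shows up on the network even when end-to-end latency over a fast link does not change.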
[jira] [Commented] (KAFKA-171) Kafka producer should do a single write to send message sets
[ https://issues.apache.org/jira/browse/KAFKA-171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13136662#comment-13136662 ]

Jay Kreps commented on KAFKA-171:

Yes, I think we should do it. My concern was just that I might be misunderstanding tcpdump or something, since I find this a little counter-intuitive.
[jira] [Commented] (KAFKA-171) Kafka producer should do a single write to send message sets
[ https://issues.apache.org/jira/browse/KAFKA-171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13140857#comment-13140857 ]

Neha Narkhede commented on KAFKA-171:

Since we are changing WritableByteChannel to GatheringByteChannel, would it be better to change the return type of writeTo and writeCompletely to return long instead of int? This will avoid the coercion to Int in BoundedByteBufferSend.scala.
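The risk behind the coercion to Int is that narrowing a Long with `asInstanceOf[Int]` never fails. It keeps the low 32 bits, so a count above `Int.MaxValue` silently wraps to a negative value. For a single bounded request this is unlikely to matter in practice. The following sketch just shows the behaviour, plus one checked alternative, `java.lang.Math.toIntExact` (Java 8+).

```scala
import scala.util.Try

// asInstanceOf[Int] on a Long keeps the low 32 bits without any check.
val small: Long = 40L
val large: Long = Int.MaxValue.toLong + 1

println(small.asInstanceOf[Int]) // 40
println(large.asInstanceOf[Int]) // -2147483648

// Checked narrowing: throws ArithmeticException instead of wrapping.
val checked = Try(java.lang.Math.toIntExact(large))
println(checked.isFailure) // true
```

Returning `Long` end to end, as suggested, sidesteps the question entirely instead of choosing between silent wrapping and an exception.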
[jira] [Commented] (KAFKA-171) Kafka producer should do a single write to send message sets
[ https://issues.apache.org/jira/browse/KAFKA-171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13140935#comment-13140935 ]

Neha Narkhede commented on KAFKA-171:

nnarkhed-mn:kafka-171 nnarkhed$ find . -name "*scala" -exec grep -Hi "asInstanceOf\[Int\]" {} \;
./core/src/main/scala/kafka/api/OffsetRequest.scala: header.putInt(size.asInstanceOf[Int] + 2)
./core/src/main/scala/kafka/api/ProducerRequest.scala: def sizeInBytes(): Int = 2 + topic.length + 4 + 4 + messages.sizeInBytes.asInstanceOf[Int]
./core/src/main/scala/kafka/network/BoundedByteBufferSend.scala: written.asInstanceOf[Int]
./core/src/main/scala/kafka/producer/SyncProducer.scala:val setSize = messages.sizeInBytes.asInstanceOf[Int]
./core/src/main/scala/kafka/server/MessageSetSend.scala: header.putInt(size.asInstanceOf[Int] + 2)
./core/src/main/scala/kafka/server/MessageSetSend.scala: written += fileBytesSent.asInstanceOf[Int]
./core/src/main/scala/kafka/server/MessageSetSend.scala: def sendSize: Int = size.asInstanceOf[Int] + header.capacity
./core/src/main/scala/kafka/utils/Utils.scala:buffer.putInt((value & 0xffffffffL).asInstanceOf[Int])
./core/src/main/scala/kafka/utils/Utils.scala:buffer.putInt(index, (value & 0xffffffffL).asInstanceOf[Int])

It's not great that we have so many places where we need to worry about coercion, but we can clean this up the next time we change the on-wire protocol.

+1 on the latest patch.
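The two Utils.scala hits differ from the rest: they mask a Long before narrowing it, so the coercion is deliberate and stores the low 32 bits of a value the caller treats as an unsigned int. A sketch assuming the mask is `0xffffffffL`; `putUnsignedInt` and `getUnsignedInt` here are illustrative helpers, not code copied from Kafka.

```scala
import java.nio.ByteBuffer

// Illustrative helpers: write the low 32 bits of a Long as a (signed) Int...
def putUnsignedInt(buffer: ByteBuffer, value: Long): Unit =
  buffer.putInt((value & 0xffffffffL).asInstanceOf[Int])

// ...and read them back, masking off the sign extension to recover the value.
def getUnsignedInt(buffer: ByteBuffer): Long =
  buffer.getInt() & 0xffffffffL

val buf = ByteBuffer.allocate(4)
putUnsignedInt(buf, 4000000000L) // above Int.MaxValue but below 2^32
buf.flip()
val roundTrip = getUnsignedInt(buf)
println(roundTrip) // 4000000000
```

Values of 2^32 or more would still lose their high bits, which is why this pattern only suits fields defined as 32-bit unsigned on the wire.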
[jira] [Commented] (KAFKA-171) Kafka producer should do a single write to send message sets
[ https://issues.apache.org/jira/browse/KAFKA-171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13141272#comment-13141272 ]

Jun Rao commented on KAFKA-171:

MessageSet has a couple of unused imports. Other than that, the patch looks good.
[jira] [Commented] (KAFKA-171) Kafka producer should do a single write to send message sets
[ https://issues.apache.org/jira/browse/KAFKA-171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13141295#comment-13141295 ]

Jay Kreps commented on KAFKA-171:

Cool, will clean up imports before checking in. I am going to hold off on this until after 0.7 goes out.
[jira] [Commented] (KAFKA-171) Kafka producer should do a single write to send message sets
[ https://issues.apache.org/jira/browse/KAFKA-171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13141374#comment-13141374 ]

Neha Narkhede commented on KAFKA-171:

You can check it into trunk. 0.7 is going off its own branch.