[jira] [Created] (KAFKA-170) Support for non-blocking polling on multiple streams
Support for non-blocking polling on multiple streams
----------------------------------------------------

                 Key: KAFKA-170
                 URL: https://issues.apache.org/jira/browse/KAFKA-170
             Project: Kafka
          Issue Type: New Feature
          Components: core
    Affects Versions: 0.8
            Reporter: Jay Kreps

Currently we provide a blocking iterator in the consumer. This is a good mechanism for consuming data from a single topic, but is limited as a mechanism for polling multiple streams. For example, if one wants to implement a non-blocking union across multiple streams, this is hard to do because calls may block indefinitely. A similar situation arises when trying to implement a streaming join between two streams.

I would propose two changes:

1. Implement a next(timeout) interface on KafkaMessageStream. This handles certain limited cases nicely and is easy to implement, but doesn't actually cover the two cases above.
2. Add an interface to poll streams. I don't know the best approach for the latter API, but it is important to get it right. One option would be to add a ConsumerConnector.drainTopics(topic1, topic2, ...) which blocks until there is at least one message and then returns a list of triples (topic, partition, message).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
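As a rough illustration of the proposed drainTopics-style call (a hypothetical sketch, not Kafka's actual API; all names here are made up): per-topic fetchers feed one shared queue, and a drain call blocks until at least one message arrives, then returns everything buffered as (topic, partition, message) triples.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of a drainTopics-style poller (NOT Kafka's real API):
// fetchers for each subscribed topic deliver into one shared queue, and
// drainTopics() blocks until at least one message is available, then returns
// all currently buffered (topic, partition, message) triples.
public class MultiStreamPoller {
    public static class Triple {
        public final String topic;
        public final int partition;
        public final byte[] message;
        Triple(String topic, int partition, byte[] message) {
            this.topic = topic;
            this.partition = partition;
            this.message = message;
        }
    }

    private final BlockingQueue<Triple> shared = new LinkedBlockingQueue<>();

    // Called by per-topic fetcher threads.
    public void deliver(String topic, int partition, byte[] message) {
        shared.add(new Triple(topic, partition, message));
    }

    // Block for at most the given timeout waiting for the first message, then
    // drain whatever else is already buffered; an empty list means a timeout.
    public List<Triple> drainTopics(long timeout, TimeUnit unit) throws InterruptedException {
        List<Triple> out = new ArrayList<>();
        Triple first = shared.poll(timeout, unit);
        if (first == null)
            return out; // timed out: a non-blocking union can move on to other work
        out.add(first);
        shared.drainTo(out);
        return out;
    }
}
```

Because the bounded wait covers all streams at once, a union or streaming join built on top can make progress even when one input is idle.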
[jira] [Commented] (KAFKA-170) Support for non-blocking polling on multiple streams
[ https://issues.apache.org/jira/browse/KAFKA-170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13135764#comment-13135764 ]

Taylor Gautier commented on KAFKA-170:
--------------------------------------

Hi - I solved this problem in the NodeJS client here: https://github.com/tagged/node-kafka

You may or may not like the approach, but at least you can see which solution I went with. Node of course is an event-based system, so it's more natural to use callbacks there, which may not necessarily be appropriate for Java.
Re: kafka message format
Yes, there is a 10 byte overhead per message. Here is the exact message format:

    message length  : 4 bytes (value: 1+1+4+n)
    magic value     : 1 byte
    compression attr: 1 byte
    crc             : 4 bytes
    payload         : n bytes

The data sent over the wire also includes the topic name, partition, and total length of the request, in addition to the messages. Those are shared by messages in the same message set. So, if you send more messages in a single produce request, you should see per-message overhead closer to 10 bytes.

Thanks,

Jun

On Wed, Oct 26, 2011 at 12:12 AM, Bao Thai Ngo <baothai...@gmail.com> wrote:

Hi,

We have just inspected the message format and found some information unclear to us. Before the investigation, I knew that each Kafka message has an overhead of only 10 bytes. Below is what we did with Kafka 0.7 RC2 on a CentOS 5.6 server:

- start kafka server on port
- tcpdump -i lo -w test.pcap port and host localhost -v
- start kafka producer
- have the kafka producer send some short messages (hello, aaa) to the server

We found 2 important items:

1. Before sending an actual message, the kafka producer sends a (control) message of 4 bytes to the server. The producer always does this before sending a message.
2. Sending a message of 15 bytes (10 bytes of overhead + 5 bytes of payload) from the producer to the server gives us an IP packet of 83 bytes (IP header: 20, TCP header: 32, data: 31 bytes). The data of the IP packet is 31 bytes instead of 15.

Could you please give us a detailed explanation of these 2 items? Please also see the attachment for further information.

Thanks,
~Thai
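The layout Jun describes can be sketched in plain Java (an illustration of the 0.7-era framing, not Kafka's actual serialization code): a 5-byte payload like "hello" frames to exactly 15 bytes.

```java
import java.nio.ByteBuffer;
import java.util.zip.CRC32;

// Sketch of the 0.7-era per-message wire layout described above:
// 4-byte length (value = 1 + 1 + 4 + n), 1-byte magic, 1-byte compression
// attribute, 4-byte CRC of the payload, then the n payload bytes.
public class MessageFraming {
    static final int OVERHEAD = 4 + 1 + 1 + 4; // 10 bytes per message

    public static ByteBuffer frame(byte[] payload) {
        CRC32 crc = new CRC32();
        crc.update(payload);
        ByteBuffer buf = ByteBuffer.allocate(OVERHEAD + payload.length);
        buf.putInt(1 + 1 + 4 + payload.length); // length field excludes itself
        buf.put((byte) 1);                      // magic value
        buf.put((byte) 0);                      // compression attribute (0 = none)
        buf.putInt((int) crc.getValue());       // payload checksum
        buf.put(payload);
        buf.flip();
        return buf;
    }
}
```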
[jira] [Commented] (KAFKA-170) Support for non-blocking polling on multiple streams
[ https://issues.apache.org/jira/browse/KAFKA-170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13136013#comment-13136013 ]

Jun Rao commented on KAFKA-170:
-------------------------------

1. KafkaMessageStream currently does have a timeout controlled by consumer.timeout.ms. If the next call on the iterator doesn't get any message within that time, an exception is thrown. This is probably not what you want, since the iterator may never time out if one topic has new messages constantly.

2. Do we really need to return the triple? Do users care about the topic/partition that a message comes from, or do they just want to consume messages from all topics? If they don't, we can probably implement a special fetcher that puts messages from different topics into a shared in-memory queue for the end user to iterate over. The interface for KafkaMessageStream may not need to be changed.

Taylor, could you elaborate on your approach a bit more here?
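Jun's first point can be made concrete with a small sketch (made-up names, not the real KafkaMessageStream): a next() that throws after a consumer.timeout.ms-style wait. The caveat he notes still holds: as long as some message keeps arriving on any feeding topic, next() never times out, so this alone does not bound the wait across a union of busy and idle streams.

```java
import java.util.NoSuchElementException;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Sketch of a stream iterator with a consumer.timeout.ms-style timeout
// (illustrative only, not Kafka's actual KafkaMessageStream): next() blocks
// up to timeoutMs for a message and throws if none arrives in that window.
public class TimeoutIterator {
    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();
    private final long timeoutMs;

    public TimeoutIterator(long timeoutMs) {
        this.timeoutMs = timeoutMs;
    }

    // Called by the fetcher feeding this stream.
    public void offer(String msg) {
        queue.add(msg);
    }

    // Blocks up to timeoutMs; throws on inactivity, like the exception thrown
    // when consumer.timeout.ms elapses. A steady trickle of messages resets
    // the clock every call, so the iterator may never time out overall.
    public String next() throws InterruptedException {
        String msg = queue.poll(timeoutMs, TimeUnit.MILLISECONDS);
        if (msg == null)
            throw new NoSuchElementException("consumer timed out after " + timeoutMs + " ms");
        return msg;
    }
}
```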
Re: kafka message format
Jun is correct about the request size. One thing you point out that is a concern is that the send of the request size is actually turning into a separate packet (because we write it separately and have enabled TCP_NODELAY). I have implemented a patch which turns this into a single write using vector I/O (and it actually shortens the code too). So far I don't see any perf improvement from this; I will post the results I have so far. Let's move the discussion to a JIRA: https://issues.apache.org/jira/browse/KAFKA-171
[jira] [Created] (KAFKA-171) Kafka producer should do a single write to send message sets
Kafka producer should do a single write to send message sets
-------------------------------------------------------------

                 Key: KAFKA-171
                 URL: https://issues.apache.org/jira/browse/KAFKA-171
             Project: Kafka
          Issue Type: Bug
          Components: core
    Affects Versions: 0.7, 0.8
            Reporter: Jay Kreps
            Assignee: Jay Kreps
             Fix For: 0.8

From email thread: http://mail-archives.apache.org/mod_mbox/incubator-kafka-dev/201110.mbox/%3ccafbh0q1pyuj32thbayq29e6j4wt_mrg5suusfdegwj6rmex...@mail.gmail.com%3e

"Before sending an actual message, the kafka producer sends a (control) message of 4 bytes to the server. The producer always does this before sending a message."

I think this is because in BoundedByteBufferSend.scala we do essentially:

    channel.write(sizeBuffer)
    channel.write(dataBuffer)

The correct solution is to use vector I/O and instead do:

    channel.write(Array(sizeBuffer, dataBuffer))
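The same fix can be illustrated in plain Java NIO (a sketch under made-up names, not the actual Kafka patch): GatheringByteChannel.write(ByteBuffer[]) hands the size prefix and the body to the kernel in one call, so with TCP_NODELAY enabled they need not be flushed as two packets.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.GatheringByteChannel;

// Sketch (plain Java NIO, not the Kafka patch itself): frame a request with a
// 4-byte size prefix and send prefix + body in ONE gathering write, instead
// of two separate write() calls that TCP_NODELAY would flush as two packets.
public final class SingleWriteSend {
    public static long send(GatheringByteChannel channel, ByteBuffer data) throws IOException {
        ByteBuffer size = ByteBuffer.allocate(4);
        size.putInt(data.remaining());
        size.flip();
        ByteBuffer[] buffers = { size, data };
        long written = 0;
        // A single vectored write may still be partial; loop until both drain.
        while (size.hasRemaining() || data.hasRemaining())
            written += channel.write(buffers);
        return written;
    }
}
```

Note that write(ByteBuffer[]) returns a long, which is also why the follow-up review below suggests widening the send path's return types.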
[jira] [Created] (KAFKA-172) The existing perf tools are buggy
The existing perf tools are buggy
---------------------------------

                 Key: KAFKA-172
                 URL: https://issues.apache.org/jira/browse/KAFKA-172
             Project: Kafka
          Issue Type: Bug
            Reporter: Neha Narkhede
            Assignee: Neha Narkhede
             Fix For: 0.8

The existing perf tools - ProducerPerformance.scala, ConsumerPerformance.scala and SimpleConsumerPerformance.scala - are buggy. It will be good to:

1. move them to a perf directory, along with helper scripts
2. fix the bugs, so that they measure throughput correctly
[jira] [Updated] (KAFKA-171) Kafka producer should do a single write to send message sets
[ https://issues.apache.org/jira/browse/KAFKA-171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jay Kreps updated KAFKA-171:
----------------------------

    Attachment: KAFKA-171-draft.patch
[jira] [Commented] (KAFKA-171) Kafka producer should do a single write to send message sets
[ https://issues.apache.org/jira/browse/KAFKA-171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13136193#comment-13136193 ]

Jay Kreps commented on KAFKA-171:
---------------------------------

Attached is a draft patch which turns the request into a single write. This is just a draft; if this actually improves performance we should change Receive to use ScatteringByteChannel for consistency, and also clean up a few more files with the same trick.

On my mac laptop I do see a change in tcpdump which seems to eliminate the 4 byte send. However, I don't see any positive result in performance for synchronous single-threaded sends of 10 byte messages (which should be the worst case for this). I think this may just be because I am testing over localhost. Here are the details on the results I have:

TRUNK:

jkreps-mn:kafka-git jkreps$ sudo tcpdump -i lo0 port 9093
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on lo0, link-type NULL (BSD loopback), capture size 96 bytes
10:32:30.128938 IP jkreps-mn.linkedin.biz.56953 > jkreps-mn.linkedin.biz.9093: S 323648854:323648854(0) win 65535 <mss 16344,nop,wscale 3,nop,nop,timestamp 377871870 0,sackOK,eol>
10:32:30.129004 IP jkreps-mn.linkedin.biz.9093 > jkreps-mn.linkedin.biz.56953: S 526915069:526915069(0) ack 323648855 win 65535 <mss 16344,nop,wscale 3,nop,nop,timestamp 377871870 377871870,sackOK,eol>
10:32:30.129013 IP jkreps-mn.linkedin.biz.56953 > jkreps-mn.linkedin.biz.9093: . ack 1 win 65535 <nop,nop,timestamp 377871870 377871870>
10:32:30.129022 IP jkreps-mn.linkedin.biz.9093 > jkreps-mn.linkedin.biz.56953: . ack 1 win 65535 <nop,nop,timestamp 377871870 377871870>
10:32:30.129306 IP jkreps-mn.linkedin.biz.56953 > jkreps-mn.linkedin.biz.9093: P 1:5(4) ack 1 win 65535 <nop,nop,timestamp 377871870 377871870>
10:32:30.129319 IP jkreps-mn.linkedin.biz.9093 > jkreps-mn.linkedin.biz.56953: . ack 5 win 65535 <nop,nop,timestamp 377871870 377871870>
10:32:30.129339 IP jkreps-mn.linkedin.biz.56953 > jkreps-mn.linkedin.biz.9093: P 5:41(36) ack 1 win 65535 <nop,nop,timestamp 377871870 377871870>
10:32:30.129350 IP jkreps-mn.linkedin.biz.9093 > jkreps-mn.linkedin.biz.56953: . ack 41 win 65535 <nop,nop,timestamp 377871870 377871870>
10:32:30.151892 IP jkreps-mn.linkedin.biz.56953 > jkreps-mn.linkedin.biz.9093: F 41:41(0) ack 1 win 65535 <nop,nop,timestamp 377871870 377871870>
10:32:30.151938 IP jkreps-mn.linkedin.biz.9093 > jkreps-mn.linkedin.biz.56953: . ack 42 win 65535 <nop,nop,timestamp 377871870 377871870>
10:32:30.151946 IP jkreps-mn.linkedin.biz.56953 > jkreps-mn.linkedin.biz.9093: . ack 1 win 65535 <nop,nop,timestamp 377871870 377871870>
10:32:30.152554 IP jkreps-mn.linkedin.biz.9093 > jkreps-mn.linkedin.biz.56953: F 1:1(0) ack 42 win 65535 <nop,nop,timestamp 377871870 377871870>
10:32:30.152571 IP jkreps-mn.linkedin.biz.56953 > jkreps-mn.linkedin.biz.9093: . ack 2 win 65535 <nop,nop,timestamp 377871870 377871870>

PATCHED:

jkreps-mn:kafka-git jkreps$ sudo tcpdump -i lo0 port 9093
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on lo0, link-type NULL (BSD loopback), capture size 96 bytes
10:35:40.637220 IP jkreps-mn.linkedin.biz.56993 > jkreps-mn.linkedin.biz.9093: S 1456363353:1456363353(0) win 65535 <mss 16344,nop,wscale 3,nop,nop,timestamp 377873772 0,sackOK,eol>
10:35:40.637287 IP jkreps-mn.linkedin.biz.9093 > jkreps-mn.linkedin.biz.56993: S 1260172914:1260172914(0) ack 1456363354 win 65535 <mss 16344,nop,wscale 3,nop,nop,timestamp 377873772 377873772,sackOK,eol>
10:35:40.637296 IP jkreps-mn.linkedin.biz.56993 > jkreps-mn.linkedin.biz.9093: . ack 1 win 65535 <nop,nop,timestamp 377873772 377873772>
10:35:40.637306 IP jkreps-mn.linkedin.biz.9093 > jkreps-mn.linkedin.biz.56993: . ack 1 win 65535 <nop,nop,timestamp 377873772 377873772>
10:35:40.657848 IP jkreps-mn.linkedin.biz.56993 > jkreps-mn.linkedin.biz.9093: P 1:41(40) ack 1 win 65535 <nop,nop,timestamp 377873773 377873772>
10:35:40.657886 IP jkreps-mn.linkedin.biz.9093 > jkreps-mn.linkedin.biz.56993: . ack 41 win 65535 <nop,nop,timestamp 377873773 377873773>
10:35:40.711399 IP jkreps-mn.linkedin.biz.56993 > jkreps-mn.linkedin.biz.9093: F 41:41(0) ack 1 win 65535 <nop,nop,timestamp 377873773 377873773>
10:35:40.711430 IP jkreps-mn.linkedin.biz.9093 > jkreps-mn.linkedin.biz.56993: . ack 42 win 65535 <nop,nop,timestamp 377873773 377873773>
10:35:40.711437 IP jkreps-mn.linkedin.biz.56993 > jkreps-mn.linkedin.biz.9093: . ack 1 win 65535 <nop,nop,timestamp 377873773 377873773>
10:35:40.762640 IP jkreps-mn.linkedin.biz.9093 > jkreps-mn.linkedin.biz.56993: F 1:1(0) ack 42 win 65535 <nop,nop,timestamp 377873774 377873773>
10:35:40.762678 IP jkreps-mn.linkedin.biz.56993 > jkreps-mn.linkedin.biz.9093: . ack 2 win 65535 <nop,nop,timestamp 377873774 377873774>

TRUNK:

bin/kafka-producer-perf-test.sh --topic test --brokerinfo zk.connect=localhost:2181 --messages 30 --message-size 10 --batch-size 1 --threads 1 ... [2011-10-26
[jira] [Commented] (KAFKA-171) Kafka producer should do a single write to send message sets
[ https://issues.apache.org/jira/browse/KAFKA-171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13136201#comment-13136201 ]

Neha Narkhede commented on KAFKA-171:
-------------------------------------

This is a good change to make. A couple of comments:

1. Since we are changing WritableByteChannel to GatheringByteChannel, it is better to change the return type of writeTo and writeCompletely to long, instead of int. This will avoid the coercion to Int in BoundedByteBufferSend.scala.
2. There are a couple of other places where we do these double writes, e.g. OffsetArraySend, MessageSetSend etc. We might as well fix those?
[jira] [Commented] (KAFKA-171) Kafka producer should do a single write to send message sets
[ https://issues.apache.org/jira/browse/KAFKA-171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13136223#comment-13136223 ]

Jay Kreps commented on KAFKA-171:
---------------------------------

Moving off localhost, between my mac laptop and dev workstation (linux), I see similar results:

TRUNK:

jkreps-mn:kafka-git jkreps$ bin/kafka-producer-perf-test.sh --topic test --brokerinfo zk.connect=jkreps-ld:2181 --messages 50 --message-size 10 --batch-size 1 --threads 1
[2011-10-26 11:59:51,795] INFO Total Num Messages: 50 bytes: 500 in 13.046 secs (kafka.tools.ProducerPerformance$)
[2011-10-26 11:59:51,795] INFO Messages/sec: 38325.9237 (kafka.tools.ProducerPerformance$)
[2011-10-26 11:59:51,795] INFO MB/sec: 0.3655 (kafka.tools.ProducerPerformance$)

PATCHED:

jkreps-mn:kafka-git jkreps$ bin/kafka-producer-perf-test.sh --topic test --brokerinfo zk.connect=jkreps-ld:2181 --messages 50 --message-size 10 --batch-size 1 --threads 1
[2011-10-26 11:58:42,335] INFO Total Num Messages: 50 bytes: 500 in 13.125 secs (kafka.tools.ProducerPerformance$)
[2011-10-26 11:58:42,335] INFO Messages/sec: 38095.2381 (kafka.tools.ProducerPerformance$)
[2011-10-26 11:58:42,335] INFO MB/sec: 0.3633 (kafka.tools.ProducerPerformance$)
[jira] [Created] (KAFKA-173) Support encoding for non ascii characters
Support encoding for non ascii characters
-----------------------------------------

                 Key: KAFKA-173
                 URL: https://issues.apache.org/jira/browse/KAFKA-173
             Project: Kafka
          Issue Type: Bug
          Components: clients
            Reporter: Alejandro
            Priority: Minor

See attached patch.
[jira] [Updated] (KAFKA-173) Support encoding for non ascii characters
[ https://issues.apache.org/jira/browse/KAFKA-173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alejandro updated KAFKA-173:
----------------------------

    Attachment: 0001-Several-fixes-for-Ruby-client.patch
[jira] [Created] (KAFKA-174) Add performance suite for Kafka
Add performance suite for Kafka
-------------------------------

                 Key: KAFKA-174
                 URL: https://issues.apache.org/jira/browse/KAFKA-174
             Project: Kafka
          Issue Type: New Feature
            Reporter: Neha Narkhede
             Fix For: 0.8

This is a placeholder JIRA for adding a perf suite to Kafka. The high level proposal is here: https://cwiki.apache.org/confluence/display/KAFKA/Performance+testing

There will be more JIRAs covering smaller tasks to fully implement this. They will be linked to this JIRA.
[jira] [Updated] (KAFKA-91) zkclient does not show up in pom
[ https://issues.apache.org/jira/browse/KAFKA-91?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Burroughs updated KAFKA-91:
---------------------------------

    Attachment: k91-v2.txt

Sorry, missed your reply; rebased as of 76957e5e3a748b59525e5e7934f93721eb8f4c38.

zkclient does not show up in pom
--------------------------------

                 Key: KAFKA-91
                 URL: https://issues.apache.org/jira/browse/KAFKA-91
             Project: Kafka
          Issue Type: Bug
          Components: packaging
            Reporter: Chris Burroughs
            Assignee: Chris Burroughs
            Priority: Minor
             Fix For: 0.8
         Attachments: k91-v1.txt, k91-v2.txt

The pom created by `make-pom` does not include zkclient, which is of course a key dependency. Not sure yet how to pull in zkclient while excluding sbt itself.

    $ cat core/target/scala_2.8.0/kafka-0.7.pom | grep -i zkclient | wc -l
    0
[jira] [Commented] (KAFKA-170) Support for non-blocking polling on multiple streams
[ https://issues.apache.org/jira/browse/KAFKA-170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13136630#comment-13136630 ]

Chris Burroughs commented on KAFKA-170:
---------------------------------------

"Do users care about the topic/partition?"

I think yes. If we allow users to provide an arbitrary partitioner, the partitioning may be meaningful throughout their pipeline.
[jira] [Created] (KAFKA-176) Fix existing perf tools
Fix existing perf tools
-----------------------

                 Key: KAFKA-176
                 URL: https://issues.apache.org/jira/browse/KAFKA-176
             Project: Kafka
          Issue Type: Sub-task
            Reporter: Neha Narkhede
            Assignee: Neha Narkhede
             Fix For: 0.8

The existing perf tools - ProducerPerformance.scala, ConsumerPerformance.scala and SimpleConsumerPerformance.scala - are slightly buggy. It will be good to:

1. move them to a perf directory from the existing kafka/tools location
2. fix the bugs, so that they measure throughput correctly
[jira] [Updated] (KAFKA-176) Fix existing perf tools
[ https://issues.apache.org/jira/browse/KAFKA-176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Neha Narkhede updated KAFKA-176:
--------------------------------

    Status: Patch Available (was: Open)
[jira] [Updated] (KAFKA-176) Fix existing perf tools
[ https://issues.apache.org/jira/browse/KAFKA-176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Neha Narkhede updated KAFKA-176:
--------------------------------

    Attachment: kafka-176.patch

This patch fixes a bunch of problems in the perf tools. I'll try to explain the changes at a high level, since I don't remember the exact list of little bugs that existed:

1. Refactor ProducerPerformance to include just one code path for both the sync and async producer
2. Fix bugs in ProducerPerformance in the throughput calculation and in the logic for multiple threads
3. Fix various bugs in the consumer performance tools
4. Fix the output of the tools to include useful info in CSV format
5. Move all of the above to a perf sub-project
Re: kafka message format
Jun, Jay:

Thanks for the info and the patch.

~Thai
[jira] [Commented] (KAFKA-176) Fix existing perf tools
[ https://issues.apache.org/jira/browse/KAFKA-176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13136761#comment-13136761 ]

Jun Rao commented on KAFKA-176:
-------------------------------

ProducerPerformance:

1. The following code seems to be used to enable/disable the header. Is it better to control that in config instead of using debug logging (so that it's not mixed with other debug logging)? Also, the header info is not complete; it is missing the first few fields. The header is probably useful for the stats printed out periodically in each thread, so it should be printed out early, if enabled.

    if(logger.isDebugEnabled)
      logger.debug("message size, batch size, total data sent in MB, MB/sec, total data sent in nMsg, nMsg/sec")

2. Whether avgPerf is specified or not, the user is probably always interested in the aggregated numbers across all threads. How about we always print those out and have a config option showDetails to enable/disable periodic reporting in each thread? Ditto in the other perf tools.

3. ProducerThread has multiple bugs:
3.1. Variable-sized messages are not picked up in async mode.
3.2. In sync mode, messageSet needs to be reset for each batch if messages are of variable size (seems to be an existing bug).

4. It's better not to duplicate the following code. Defining it once in a static method seems better.

    println(("%s, %d, %d, %d, %d, %.2f, %.4f, %d, %.4f").format(formattedReportTime, config.compressionCodec.codec,
      threadId, config.messageSize, config.batchSize, (bytesSent*1.0)/(1024 * 1024), mbPerSec, nSends, numMessagesPerSec))
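Jun's point 4 amounts to hoisting the duplicated report line into one shared helper so every thread emits the same CSV columns. A minimal Java sketch (field names mirror the Scala snippet above but are otherwise illustrative):

```java
import java.util.Locale;

// Sketch of a shared CSV-report helper (illustrative, not the actual Kafka
// perf-tool code): one static method formats the per-thread stats line, so
// the column layout cannot drift between the places that print it.
public final class PerfReport {
    public static String csvLine(String time, int codec, int threadId, int messageSize,
                                 int batchSize, long bytesSent, double mbPerSec,
                                 long nSends, double msgPerSec) {
        // Locale.US keeps the decimal separator a dot regardless of platform locale.
        return String.format(Locale.US, "%s, %d, %d, %d, %d, %.2f, %.4f, %d, %.4f",
                time, codec, threadId, messageSize, batchSize,
                bytesSent * 1.0 / (1024 * 1024), mbPerSec, nSends, msgPerSec);
    }
}
```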