[jira] [Created] (KAFKA-170) Support for non-blocking polling on multiple streams

2011-10-26 Thread Jay Kreps (Created) (JIRA)
Support for non-blocking polling on multiple streams


 Key: KAFKA-170
 URL: https://issues.apache.org/jira/browse/KAFKA-170
 Project: Kafka
  Issue Type: New Feature
  Components: core
Affects Versions: 0.8
Reporter: Jay Kreps


Currently we provide a blocking iterator in the consumer. This is a good 
mechanism for consuming data from a single topic, but is limited as a mechanism 
for polling multiple streams.

For example, if one wants to implement a non-blocking union across multiple 
streams this is hard to do because calls may block indefinitely. A similar 
situation arises when trying to implement a streaming join between two 
streams.

I would propose two changes:
1. Implement a next(timeout) interface on KafkaMessageStream. This handles 
certain limited cases nicely and is easy to implement with minimal change, but 
doesn't actually cover the two cases above.
2. Add an interface to poll streams.

I don't know the best approach for the latter API, but it is important to get it 
right. One option would be to add a ConsumerConnector.drainTopics(topic1, 
topic2, ...) which blocks until there is at least one message and then 
returns a list of triples (topic, partition, message).
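A minimal sketch of what such a drainTopics-style call could behave like, assuming all streams feed one shared in-memory queue (the class, record, and method names here are all hypothetical, not part of any Kafka API):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch of the proposed drainTopics behaviour: block until at
// least one message is available, then return everything currently buffered
// as (topic, partition, message) triples, without blocking again.
public class DrainSketch {
    public record Triple(String topic, int partition, String message) {}

    private final BlockingQueue<Triple> shared = new LinkedBlockingQueue<>();

    // In a real client the per-stream fetchers would enqueue here.
    public void deliver(String topic, int partition, String message) {
        shared.add(new Triple(topic, partition, message));
    }

    // Blocks for the first message, then drains the rest non-blockingly.
    public List<Triple> drainTopics() {
        List<Triple> out = new ArrayList<>();
        try {
            out.add(shared.take()); // block until at least one message exists
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return out;
        }
        shared.drainTo(out);        // grab whatever else is already buffered
        return out;
    }

    public static void main(String[] args) {
        DrainSketch c = new DrainSketch();
        c.deliver("clicks", 0, "m1");
        c.deliver("views", 2, "m2");
        List<Triple> batch = c.drainTopics();
        System.out.println(batch.size());         // 2
        System.out.println(batch.get(0).topic()); // clicks
    }
}
```

Because the call blocks only for the first message, a union or join over several topics never stalls indefinitely on one idle stream.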

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (KAFKA-170) Support for non-blocking polling on multiple streams

2011-10-26 Thread Taylor Gautier (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13135764#comment-13135764
 ] 

Taylor Gautier commented on KAFKA-170:
--

Hi - I solved this problem in the NodeJS client here: 
https://github.com/tagged/node-kafka

You may or may not like the approach, but at least you can see which solution 
I went with. Node of course is an event-based system, so it's more natural to 
use callbacks there, which may not necessarily be appropriate for Java.
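For comparison, a callback-style consumer along the lines Taylor describes could be sketched in Java like this (all names are illustrative, not taken from any actual client):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiConsumer;

// Sketch of an event/callback-style consumer API in the spirit of the
// Node.js client: instead of blocking on an iterator, the caller registers
// a handler and messages are dispatched to it as they arrive.
public class CallbackConsumer {
    private final List<BiConsumer<String, String>> handlers = new ArrayList<>();

    // Register a callback invoked with (topic, message) for each delivery.
    public void onMessage(BiConsumer<String, String> handler) {
        handlers.add(handler);
    }

    // In a real client this would be driven by the network/fetcher layer.
    public void dispatch(String topic, String message) {
        for (BiConsumer<String, String> h : handlers) {
            h.accept(topic, message);
        }
    }

    public static void main(String[] args) {
        CallbackConsumer consumer = new CallbackConsumer();
        List<String> seen = new ArrayList<>();
        consumer.onMessage((topic, msg) -> seen.add(topic + ":" + msg));
        consumer.dispatch("clicks", "m1");
        consumer.dispatch("views", "m2");
        System.out.println(seen); // [clicks:m1, views:m2]
    }
}
```

The trade-off is the one noted above: callbacks fit an event loop naturally, but push threading and back-pressure concerns onto the client library rather than the caller.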






Re: kafka message format

2011-10-26 Thread Jun Rao
Yes, there is a 10 byte overhead per message. Here is the exact message
format.

message length   : 4 bytes (value: 1+1+4+n)
magic value      : 1 byte
compression attr : 1 byte
crc              : 4 bytes
payload          : n bytes


The data sent over the wire also includes the topic name, partition, and total
length of the request, in addition to the message. Those are shared by all
messages in the same message set. So, if you send more messages in a single
produce request, you should see per-message overhead closer to 10 bytes.
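The framing above can be reproduced directly; the following sketch serializes one message using exactly the field sizes listed in the thread (the class and method names are illustrative, not Kafka code):

```java
import java.nio.ByteBuffer;
import java.util.zip.CRC32;

// Sketch of the 0.7-era per-message wire framing described above:
// 4-byte length, 1-byte magic, 1-byte compression attribute, 4-byte CRC,
// then the payload. Field sizes come from this thread, not a formal spec.
public class MessageFraming {
    static final int OVERHEAD = 4 + 1 + 1 + 4; // length + magic + attr + crc = 10

    public static ByteBuffer frame(byte[] payload) {
        ByteBuffer buf = ByteBuffer.allocate(OVERHEAD + payload.length);
        buf.putInt(1 + 1 + 4 + payload.length); // message length = 1+1+4+n
        buf.put((byte) 1);                      // magic value
        buf.put((byte) 0);                      // compression attr (0 = none)
        CRC32 crc = new CRC32();
        crc.update(payload);
        buf.putInt((int) crc.getValue());       // crc of the payload
        buf.put(payload);                       // payload: n bytes
        buf.flip();
        return buf;
    }

    public static void main(String[] args) {
        ByteBuffer m = frame("hello".getBytes());
        System.out.println(m.remaining()); // 15 = 10 bytes overhead + 5 payload
    }
}
```

This matches the observation below: a 5-byte payload yields a 15-byte message, and the topic/partition/request-length fields are added once per request, not per message.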

Thanks,

Jun

On Wed, Oct 26, 2011 at 12:12 AM, Bao Thai Ngo baothai...@gmail.com wrote:

 Hi,

 We have just inspected the message format and found some information that
 was unclear to us. Before the investigation, I understood that each Kafka
 message has an overhead of only 10 bytes.

 Below is what we have done with Kafka 0.7 RC2 on a CentOS 5.6 server:
 - start kafka server on port 
 - tcpdump -i lo -w test.pcap port  and host localhost -v
 - start kafka producer
 - have kafka producer send some sort of message (hello, aaa) to the server

 We found 2 important items:
 1. Before sending an actual message, the Kafka producer sends a (control)
 message of 4 bytes to the server. The producer always does this before
 sending a message to the server.
 2. Sending a message of 15 bytes (10 bytes of overhead + 5 bytes of payload)
 from the producer to the server gives us an IP packet of 83 bytes (IP header:
 20, TCP header: 32, data: 31 bytes). The data portion of the IP packet is 31
 instead of 15 bytes.

 Could you please give us a detailed explanation about these 2 items? Please
 also see the attachment for further information.

 Thanks,
 ~Thai







[jira] [Commented] (KAFKA-170) Support for non-blocking polling on multiple streams

2011-10-26 Thread Jun Rao (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13136013#comment-13136013
 ] 

Jun Rao commented on KAFKA-170:
---

1. KafkaMessageStream currently does have a timeout, controlled by 
consumer.timeout.ms. If the next call on the iterator doesn't get any message 
within that time, an exception is thrown during the next call. This is probably 
not what you want, since the iterator may never time out if one topic has new 
messages constantly.

2. Do we really need to return the triple? Do users care about the 
topic/partition that a message comes from, or do they simply want to consume 
messages from all topics? If they don't, we can probably implement a 
special fetcher that puts messages from different topics into a shared in-memory 
queue for the end user to iterate over. The interface for KafkaMessageStream may 
not need to be changed.

Taylor, 

Could you elaborate your approach a bit more here?






Re: kafka message format

2011-10-26 Thread Jay Kreps
Jun is correct about the request size. One thing you point out that is a
concern is that the send of the request size is actually turning into a
separate packet (because we write it separately and have enabled TCP_NODELAY).

I have implemented a patch which turns this into a single write using vectored
I/O (and actually shortens the code too).

So far I don't see any perf improvement from this. Will post the results I
see so far. Let's move the discussion to a JIRA:
https://issues.apache.org/jira/browse/KAFKA-171








[jira] [Created] (KAFKA-171) Kafka producer should do a single write to send message sets

2011-10-26 Thread Jay Kreps (Created) (JIRA)
Kafka producer should do a single write to send message sets


 Key: KAFKA-171
 URL: https://issues.apache.org/jira/browse/KAFKA-171
 Project: Kafka
  Issue Type: Bug
  Components: core
Affects Versions: 0.7, 0.8
Reporter: Jay Kreps
Assignee: Jay Kreps
 Fix For: 0.8


From email thread: 
http://mail-archives.apache.org/mod_mbox/incubator-kafka-dev/201110.mbox/%3ccafbh0q1pyuj32thbayq29e6j4wt_mrg5suusfdegwj6rmex...@mail.gmail.com%3e

 Before sending an actual message, the kafka producer sends a (control) message
 of 4 bytes to the server. The producer always does this before sending a
 message to the server.

I think this is because in BoundedByteBufferSend.scala we essentially do
 channel.write(sizeBuffer)
 channel.write(dataBuffer)

The correct solution is to use vectored I/O and instead do
 channel.write(Array(sizeBuffer, dataBuffer))
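A sketch of the fix in Java, assuming a GatheringByteChannel (names here are illustrative; this is not the actual BoundedByteBufferSend code):

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.GatheringByteChannel;
import java.nio.channels.Pipe;

// Sketch of the proposed fix: a single gathering (vectored) write sends the
// 4-byte size prefix and the payload together, so with TCP_NODELAY enabled
// the kernel is not forced to emit the size prefix as its own packet.
public class GatheringWrite {
    public static long send(GatheringByteChannel ch, ByteBuffer data) throws IOException {
        ByteBuffer size = ByteBuffer.allocate(4);
        size.putInt(data.remaining());
        size.flip();
        // One vectored write instead of channel.write(size); channel.write(data)
        return ch.write(new ByteBuffer[] { size, data });
    }

    public static void main(String[] args) throws IOException {
        Pipe pipe = Pipe.open(); // in-memory channel pair for demonstration
        long written = send(pipe.sink(), ByteBuffer.wrap("hello".getBytes()));
        System.out.println(written); // 9 = 4-byte size prefix + 5-byte payload
    }
}
```

Note that GatheringByteChannel.write(ByteBuffer[]) returns a long, which relates to the review comment below about the return type of writeTo.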





[jira] [Created] (KAFKA-172) The existing perf tools are buggy

2011-10-26 Thread Neha Narkhede (Created) (JIRA)
The existing perf tools are buggy
-

 Key: KAFKA-172
 URL: https://issues.apache.org/jira/browse/KAFKA-172
 Project: Kafka
  Issue Type: Bug
Reporter: Neha Narkhede
Assignee: Neha Narkhede
 Fix For: 0.8


The existing perf tools - ProducerPerformance.scala, ConsumerPerformance.scala 
and SimpleConsumerPerformance.scala - are buggy. It would be good to:

1. move them to a perf directory, along with helper scripts
2. fix the bugs, so that they measure throughput correctly





[jira] [Updated] (KAFKA-171) Kafka producer should do a single write to send message sets

2011-10-26 Thread Jay Kreps (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jay Kreps updated KAFKA-171:


Attachment: KAFKA-171-draft.patch






[jira] [Commented] (KAFKA-171) Kafka producer should do a single write to send message sets

2011-10-26 Thread Jay Kreps (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13136193#comment-13136193
 ] 

Jay Kreps commented on KAFKA-171:
-

Attached is a draft patch which turns the request into a single write. This is 
just a draft; if this actually improves performance we should change Receive to 
use ScatteringByteChannel for consistency and also clean up a few more files 
with the same trick.

On my Mac laptop I do see a change in tcpdump which seems to eliminate the 
4-byte send. However, I don't see any positive result in performance for 
synchronous single-threaded sends of 10-byte messages (which should be the 
worst case for this). I think this may just be because I am testing over 
localhost.

Here are the details on the results I have:

TRUNK:
jkreps-mn:kafka-git jkreps$ sudo tcpdump -i lo0 port 9093 
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on lo0, link-type NULL (BSD loopback), capture size 96 bytes
10:32:30.128938 IP jkreps-mn.linkedin.biz.56953 > jkreps-mn.linkedin.biz.9093: S 323648854:323648854(0) win 65535 <mss 16344,nop,wscale 3,nop,nop,timestamp 377871870 0,sackOK,eol>
10:32:30.129004 IP jkreps-mn.linkedin.biz.9093 > jkreps-mn.linkedin.biz.56953: S 526915069:526915069(0) ack 323648855 win 65535 <mss 16344,nop,wscale 3,nop,nop,timestamp 377871870 377871870,sackOK,eol>
10:32:30.129013 IP jkreps-mn.linkedin.biz.56953 > jkreps-mn.linkedin.biz.9093: . ack 1 win 65535 <nop,nop,timestamp 377871870 377871870>
10:32:30.129022 IP jkreps-mn.linkedin.biz.9093 > jkreps-mn.linkedin.biz.56953: . ack 1 win 65535 <nop,nop,timestamp 377871870 377871870>
10:32:30.129306 IP jkreps-mn.linkedin.biz.56953 > jkreps-mn.linkedin.biz.9093: P 1:5(4) ack 1 win 65535 <nop,nop,timestamp 377871870 377871870>
10:32:30.129319 IP jkreps-mn.linkedin.biz.9093 > jkreps-mn.linkedin.biz.56953: . ack 5 win 65535 <nop,nop,timestamp 377871870 377871870>
10:32:30.129339 IP jkreps-mn.linkedin.biz.56953 > jkreps-mn.linkedin.biz.9093: P 5:41(36) ack 1 win 65535 <nop,nop,timestamp 377871870 377871870>
10:32:30.129350 IP jkreps-mn.linkedin.biz.9093 > jkreps-mn.linkedin.biz.56953: . ack 41 win 65535 <nop,nop,timestamp 377871870 377871870>
10:32:30.151892 IP jkreps-mn.linkedin.biz.56953 > jkreps-mn.linkedin.biz.9093: F 41:41(0) ack 1 win 65535 <nop,nop,timestamp 377871870 377871870>
10:32:30.151938 IP jkreps-mn.linkedin.biz.9093 > jkreps-mn.linkedin.biz.56953: . ack 42 win 65535 <nop,nop,timestamp 377871870 377871870>
10:32:30.151946 IP jkreps-mn.linkedin.biz.56953 > jkreps-mn.linkedin.biz.9093: . ack 1 win 65535 <nop,nop,timestamp 377871870 377871870>
10:32:30.152554 IP jkreps-mn.linkedin.biz.9093 > jkreps-mn.linkedin.biz.56953: F 1:1(0) ack 42 win 65535 <nop,nop,timestamp 377871870 377871870>
10:32:30.152571 IP jkreps-mn.linkedin.biz.56953 > jkreps-mn.linkedin.biz.9093: . ack 2 win 65535 <nop,nop,timestamp 377871870 377871870>

PATCHED:
jkreps-mn:kafka-git jkreps$ sudo tcpdump -i lo0 port 9093 
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on lo0, link-type NULL (BSD loopback), capture size 96 bytes
10:35:40.637220 IP jkreps-mn.linkedin.biz.56993 > jkreps-mn.linkedin.biz.9093: S 1456363353:1456363353(0) win 65535 <mss 16344,nop,wscale 3,nop,nop,timestamp 377873772 0,sackOK,eol>
10:35:40.637287 IP jkreps-mn.linkedin.biz.9093 > jkreps-mn.linkedin.biz.56993: S 1260172914:1260172914(0) ack 1456363354 win 65535 <mss 16344,nop,wscale 3,nop,nop,timestamp 377873772 377873772,sackOK,eol>
10:35:40.637296 IP jkreps-mn.linkedin.biz.56993 > jkreps-mn.linkedin.biz.9093: . ack 1 win 65535 <nop,nop,timestamp 377873772 377873772>
10:35:40.637306 IP jkreps-mn.linkedin.biz.9093 > jkreps-mn.linkedin.biz.56993: . ack 1 win 65535 <nop,nop,timestamp 377873772 377873772>
10:35:40.657848 IP jkreps-mn.linkedin.biz.56993 > jkreps-mn.linkedin.biz.9093: P 1:41(40) ack 1 win 65535 <nop,nop,timestamp 377873773 377873772>
10:35:40.657886 IP jkreps-mn.linkedin.biz.9093 > jkreps-mn.linkedin.biz.56993: . ack 41 win 65535 <nop,nop,timestamp 377873773 377873773>
10:35:40.711399 IP jkreps-mn.linkedin.biz.56993 > jkreps-mn.linkedin.biz.9093: F 41:41(0) ack 1 win 65535 <nop,nop,timestamp 377873773 377873773>
10:35:40.711430 IP jkreps-mn.linkedin.biz.9093 > jkreps-mn.linkedin.biz.56993: . ack 42 win 65535 <nop,nop,timestamp 377873773 377873773>
10:35:40.711437 IP jkreps-mn.linkedin.biz.56993 > jkreps-mn.linkedin.biz.9093: . ack 1 win 65535 <nop,nop,timestamp 377873773 377873773>
10:35:40.762640 IP jkreps-mn.linkedin.biz.9093 > jkreps-mn.linkedin.biz.56993: F 1:1(0) ack 42 win 65535 <nop,nop,timestamp 377873774 377873773>
10:35:40.762678 IP jkreps-mn.linkedin.biz.56993 > jkreps-mn.linkedin.biz.9093: . ack 2 win 65535 <nop,nop,timestamp 377873774 377873774>

TRUNK:
bin/kafka-producer-perf-test.sh --topic test --brokerinfo 
zk.connect=localhost:2181 --messages 30 --message-size 10 --batch-size 1 
--threads 1
...
[2011-10-26 

[jira] [Commented] (KAFKA-171) Kafka producer should do a single write to send message sets

2011-10-26 Thread Neha Narkhede (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13136201#comment-13136201
 ] 

Neha Narkhede commented on KAFKA-171:
-

This is a good change to make. A couple of comments -

1. Since we are changing WritableByteChannel to GatheringByteChannel, it is 
better to change the return type of writeTo and writeCompletely to long 
instead of int. This will avoid the coercion to Int in 
BoundedByteBufferSend.scala.

2. There are a couple of other places where we do these double writes, e.g. 
OffsetArraySend, MessageSetSend, etc. We might as well fix those?






[jira] [Commented] (KAFKA-171) Kafka producer should do a single write to send message sets

2011-10-26 Thread Jay Kreps (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13136223#comment-13136223
 ] 

Jay Kreps commented on KAFKA-171:
-

Moving off localhost, between my Mac laptop and a dev workstation (Linux), I see 
similar results:

TRUNK:
jkreps-mn:kafka-git jkreps$ bin/kafka-producer-perf-test.sh --topic test 
--brokerinfo zk.connect=jkreps-ld:2181 --messages 500000 --message-size 10 
--batch-size 1 --threads 1
[2011-10-26 11:59:51,795] INFO Total Num Messages: 500000 bytes: 5000000 in 
13.046 secs (kafka.tools.ProducerPerformance$)
[2011-10-26 11:59:51,795] INFO Messages/sec: 38325.9237 
(kafka.tools.ProducerPerformance$)
[2011-10-26 11:59:51,795] INFO MB/sec: 0.3655 (kafka.tools.ProducerPerformance$)

PATCHED:
jkreps-mn:kafka-git jkreps$ bin/kafka-producer-perf-test.sh --topic test 
--brokerinfo zk.connect=jkreps-ld:2181 --messages 500000 --message-size 10 
--batch-size 1 --threads 1
[2011-10-26 11:58:42,335] INFO Total Num Messages: 500000 bytes: 5000000 in 
13.125 secs (kafka.tools.ProducerPerformance$)
[2011-10-26 11:58:42,335] INFO Messages/sec: 38095.2381 
(kafka.tools.ProducerPerformance$)
[2011-10-26 11:58:42,335] INFO MB/sec: 0.3633 (kafka.tools.ProducerPerformance$)






[jira] [Created] (KAFKA-173) Support encoding for non ascii characters

2011-10-26 Thread Alejandro (Created) (JIRA)
Support encoding for non ascii characters
-

 Key: KAFKA-173
 URL: https://issues.apache.org/jira/browse/KAFKA-173
 Project: Kafka
  Issue Type: Bug
  Components: clients
Reporter: Alejandro
Priority: Minor


See attached patch.






[jira] [Updated] (KAFKA-173) Support encoding for non ascii characters

2011-10-26 Thread Alejandro (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alejandro updated KAFKA-173:


Attachment: 0001-Several-fixes-for-Ruby-client.patch






[jira] [Created] (KAFKA-174) Add performance suite for Kafka

2011-10-26 Thread Neha Narkhede (Created) (JIRA)
Add performance suite for Kafka
---

 Key: KAFKA-174
 URL: https://issues.apache.org/jira/browse/KAFKA-174
 Project: Kafka
  Issue Type: New Feature
Reporter: Neha Narkhede
 Fix For: 0.8


This is a placeholder JIRA for adding a perf suite to Kafka. The high level 
proposal is here -
https://cwiki.apache.org/confluence/display/KAFKA/Performance+testing

There will be more JIRAs covering smaller tasks to fully implement this. They 
will be linked to this JIRA. 





[jira] [Updated] (KAFKA-91) zkclient does not show up in pom

2011-10-26 Thread Chris Burroughs (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-91?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Burroughs updated KAFKA-91:
-

Attachment: k91-v2.txt

Sorry, I missed your reply; rebased as of 76957e5e3a748b59525e5e7934f93721eb8f4c38

 zkclient does not show up in pom
 

 Key: KAFKA-91
 URL: https://issues.apache.org/jira/browse/KAFKA-91
 Project: Kafka
  Issue Type: Bug
  Components: packaging
Reporter: Chris Burroughs
Assignee: Chris Burroughs
Priority: Minor
 Fix For: 0.8

 Attachments: k91-v1.txt, k91-v2.txt


 The pom created by `make-pom` does not include zkclient, which is of course 
 a key dependency. Not sure yet how to pull in zkclient while excluding sbt 
 itself.
 $ cat core/target/scala_2.8.0/kafka-0.7.pom | grep -i zkclient | wc -l
 0





[jira] [Commented] (KAFKA-170) Support for non-blocking polling on multiple streams

2011-10-26 Thread Chris Burroughs (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13136630#comment-13136630
 ] 

Chris Burroughs commented on KAFKA-170:
---

 Do users care about the topic/partition

I think yes. If we allow users to provide an arbitrary partitioner, the 
partitioning may be meaningful throughout their pipeline.






[jira] [Created] (KAFKA-176) Fix existing perf tools

2011-10-26 Thread Neha Narkhede (Created) (JIRA)
Fix existing perf tools
---

 Key: KAFKA-176
 URL: https://issues.apache.org/jira/browse/KAFKA-176
 Project: Kafka
  Issue Type: Sub-task
Reporter: Neha Narkhede
Assignee: Neha Narkhede
 Fix For: 0.8


The existing perf tools - ProducerPerformance.scala, ConsumerPerformance.scala 
and SimpleConsumerPerformance.scala - are slightly buggy. It would be good to:

1. move them to a perf directory from the existing kafka/tools location
2. fix the bugs, so that they measure throughput correctly






[jira] [Updated] (KAFKA-176) Fix existing perf tools

2011-10-26 Thread Neha Narkhede (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neha Narkhede updated KAFKA-176:


Status: Patch Available  (was: Open)






[jira] [Updated] (KAFKA-176) Fix existing perf tools

2011-10-26 Thread Neha Narkhede (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neha Narkhede updated KAFKA-176:


Attachment: kafka-176.patch

This patch fixes a bunch of problems in the perf tools. I'll try to explain the 
changes at a high level, since I don't remember the exact list of little bugs 
that existed -

1. Refactor ProducerPerformance to include just one code path for both the sync 
and async producer
2. Fix bugs in ProducerPerformance in the throughput calculation and the 
logic for handling multiple threads
3. Fix various bugs in the consumer performance tools
4. Fix the output of the tools to include useful info in CSV format
5. Move all of the above to a perf sub-project






Re: kafka message format

2011-10-26 Thread Bao Thai Ngo
Jun, Jay:

Thanks for the info and the patch.

~Thai

On Thu, Oct 27, 2011 at 12:54 AM, Jay Kreps jay.kr...@gmail.com wrote:

 Jun is correct about the request size. One thing you point out that is a
 concern is that the send of the request size actually turns into a
 separate packet (because we write it separately and have enabled TCP_NODELAY).

 I have implemented a patch which turns this into a single write using
 vectored I/O (and actually shortens the code too).

 So far I don't see any perf improvement from this. I will post the results I
 see so far. Let's move the discussion to the JIRA:
 https://issues.apache.org/jira/browse/KAFKA-171
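For readers unfamiliar with vectored I/O: the idea is to hand the channel the 4-byte size prefix and the request body as separate buffers in a single write call, so the kernel can send them in one packet instead of two. A self-contained Java sketch using a GatheringByteChannel (a temp-file FileChannel stands in for the socket; this is an illustration, not Kafka's actual code):

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class GatheringWriteDemo {
    // Write a size-prefixed frame with ONE vectored write and return the bytes.
    static byte[] frame(byte[] payload) throws IOException {
        // Buffer 1: the 4-byte, big-endian size prefix.
        ByteBuffer size = ByteBuffer.allocate(4).putInt(payload.length);
        size.flip();
        // Buffer 2: the request body itself.
        ByteBuffer body = ByteBuffer.wrap(payload);

        // A SocketChannel is also a GatheringByteChannel; a temp file
        // stands in here so the example stays self-contained.
        Path tmp = Files.createTempFile("frame", ".bin");
        try (FileChannel ch = FileChannel.open(tmp, StandardOpenOption.WRITE)) {
            ch.write(new ByteBuffer[] { size, body }); // one write, two buffers
        }
        byte[] framed = Files.readAllBytes(tmp);
        Files.delete(tmp);
        return framed;
    }

    public static void main(String[] args) throws IOException {
        byte[] framed = frame("hello".getBytes(StandardCharsets.US_ASCII));
        System.out.println(framed.length); // prints 9: 4-byte prefix + 5-byte payload
    }
}
```

With TCP_NODELAY enabled, two separate write() calls tend to become two segments on the wire, which is exactly the extra-packet behavior Thai observed below.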

 On Wed, Oct 26, 2011 at 12:12 AM, Bao Thai Ngo baothai...@gmail.comwrote:

 Hi,

 We have just inspected the message format and found some of the
 information unclear to us. Before the investigation, I understood that each
 Kafka message has an overhead of only 10 bytes.

 Below is what we have done with Kafka 0.7 RC2 on a CentOS 5.6 server:
 - start kafka server on port 
 - tcpdump -i lo -w test.pcap port  and host localhost -v
 - start kafka producer
 - have the Kafka producer send some short messages (hello, aaa) to the server

 We found 2 important items:
 1. Before sending an actual message, the Kafka producer sends a (control)
 message of 4 bytes to the server. The producer always does this
 before sending a message to the server.
 2. Sending a message of 15 bytes (10 bytes of overhead + 5 bytes of
 message payload) from the producer to the server gives us an IP packet of 83
 bytes (IP header: 20, TCP header: 32, data: 31 bytes). The data portion of
 the IP packet is 31 bytes instead of 15.
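The per-message arithmetic in item 2 can be sanity-checked if one assumes the 0.7-style message layout with a magic-1 header: a 4-byte length, 1-byte magic, 1-byte attributes, and 4-byte CRC, which together give the 10-byte overhead. The sketch below encodes only that assumption; the remaining 16 bytes of the 31-byte TCP payload would then belong to the produce-request envelope (size prefix, request id, topic, partition, message-set length), whose exact breakdown depends on the topic-name length, which isn't stated here:

```java
public class WireOverhead {
    // Assumed 0.7 magic-1 message header: length + magic + attributes + CRC.
    static final int MESSAGE_OVERHEAD = 4 + 1 + 1 + 4; // = 10 bytes

    // On-the-wire size of one message for a given payload length.
    static int messageSize(int payloadLen) {
        return MESSAGE_OVERHEAD + payloadLen;
    }

    public static void main(String[] args) {
        // A 5-byte payload like "hello" becomes a 15-byte message,
        // matching the 15 bytes described in item 2.
        System.out.println(messageSize(5)); // prints 15
    }
}
```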

 Could you please give us a detailed explanation about these 2
 items? Please also see the attachment for further information.

 Thanks,
 ~Thai








[jira] [Commented] (KAFKA-176) Fix existing perf tools

2011-10-26 Thread Jun Rao (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13136761#comment-13136761
 ] 

Jun Rao commented on KAFKA-176:
---

ProducerPerformance:
1. The following code seems to be used to enable/disable the header. Is it 
better to control that with a config option instead of debug logging (so that 
it's not mixed with other debug logging)? Also, the header info is not 
complete; it is missing the first few fields. The header is probably useful 
for the stats printed out periodically in each thread, so it should be 
printed out early, if enabled.
if(logger.isDebugEnabled)
  logger.debug("message size, batch size, total data sent in MB, MB/sec, total data sent in nMsg, nMsg/sec")
2. Whether avgPerf is specified or not, the user is probably always interested 
in the aggregated numbers across all threads. How about we always print those 
and add a config option showDetails to enable/disable periodic reporting in 
each thread? Ditto for the other perf tools.
3. ProducerThread has multiple bugs:
  3.1. Variable-sized messages are not picked up in Async mode
  3.2. In Sync mode, messageSet needs to be reset for each batch, if messages 
are of variable size (seems to be an existing bug)
4. It's better not to duplicate the following code. Defining it once in a 
static method seems better. 
println(("%s, %d, %d, %d, %d, %.2f, %.4f, %d, %.4f").format(formattedReportTime, config.compressionCodec.codec,
  threadId, config.messageSize, config.batchSize,
  (bytesSent*1.0)/(1024 * 1024), mbPerSec, nSends, numMessagesPerSec))


