About Kafka 0.8 producer throughput test

2013-01-16 Thread Jun Guo -X (jungu - CIIC at Cisco)
Hi,
  I have run the producer (Kafka 0.8) throughput test many times, but the average
throughput is only 3 MB/s. My test environment is as follows:
   CPU cores:  16
   Vendor ID:  GenuineIntel
   CPU family: 6
   CPU MHz:    2899.999
   Cache size: 20480 KB
   CPU level:  13
   Memory:     16330832 KB (15.57 GB)
   Disk:       RAID 5

   I don’t know the disk details, such as its rotation speed, but I did test
the disk's I/O performance. The write rate is 500 to 600 MB/s and the read rate
is 180 MB/s. The details are below.
[Image: disk I/O test results]

I adjusted the broker configuration file as the official documentation
recommends, and set the JVM heap size to 5120 MB.
I ran the producer performance test with the script kafka-producer-perf-test.sh,
using this command:
bin/kafka-producer-perf-test.sh --broker-list 10.75.167.46:49092 \
  --topics topic_perf_46_1,topic_perf_46_2,topic_perf_46_3,topic_perf_46_4,topic_perf_46_5,topic_perf_46_6,topic_perf_46_7,topic_perf_46_8,topic_perf_46_9,topic_perf_46_10 \
  --initial-message-id 0 --threads 200 --messages 100 --message-size 200 \
  --compression-codec 1

But the result is still far below what the official documentation reports
(50 MB/s; the figure in your paper is 100 MB/s). The test result is:
2013-01-17 04:15:24:768, 2013-01-17 04:25:01:637, 0, 200, 200, 1907.35, 3.3064, 1000, 17334.9582

I also ran a consumer throughput test; the result is about 60 MB/s, while the
official documentation reports 100 MB/s.
I really don’t know why.
High throughput is one of Kafka's most important features, so I am very
concerned about this.

Thanks and best regards!
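As a sanity check on the result line above, the rate can be recomputed from the two timestamps. This sketch assumes the timestamps end in milliseconds, that 1907.35 is the total data sent in MB (2^20 bytes) and that 17334.9582 is messages per second; these column meanings are inferred from the numbers, not taken from the tool's documentation.

```python
from datetime import datetime

# Timestamps copied from the result line; the field after the last colon is milliseconds.
FMT = "%Y-%m-%d %H:%M:%S:%f"  # %f right-pads "768" to 768000 microseconds
start = datetime.strptime("2013-01-17 04:15:24:768", FMT)
end = datetime.strptime("2013-01-17 04:25:01:637", FMT)

elapsed_s = (end - start).total_seconds()

# Assumed column meanings (inferred from the numbers, not from the tool's docs).
total_mb = 1907.35          # total data sent, in MB of 2**20 bytes
msgs_per_s = 17334.9582     # messages per second
message_size = 200          # bytes, from --message-size 200

mb_per_s = total_mb / elapsed_s
implied_messages = msgs_per_s * elapsed_s
implied_mb = implied_messages * message_size / 2**20

print(f"elapsed: {elapsed_s:.3f} s")
print(f"throughput: {mb_per_s:.4f} MB/s")
print(f"implied messages: {implied_messages:,.0f}")
print(f"implied data: {implied_mb:.2f} MB")
```

The figures are internally consistent (roughly 10 million 200-byte messages over about 577 seconds), so 3.3064 MB/s is simply total bytes divided by wall-clock time for the whole run; on its own it does not show where the time went.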

From: Jay Kreps [mailto:jkr...@linkedin.com]
Sent: January 16, 2013 2:22
To: Jun Guo -X (jungu - CIIC at Cisco)
Subject: RE: About acknowledge from broker to producer in your paper.

Not sure which version you are using...

In 0.7 this would happen only if there was a socket level error (i.e. can't 
connect to the host). This covers a lot of cases since in the event of I/O 
errors (disk full, etc) we just have that node shut itself down to let others 
take over.

In 0.8 we send all errors back to the client.

So the difference is that, for example, in the event of a disk error, in 0.7 
the client would send a message, the broker would get an error and shoot itself 
in the head and disconnect its clients, and the client would get the error the 
next time it tried to send a message. So in 0.7 the error might not get passed 
back to the client until the second message send. In 0.8 this would happen with 
the first send, which is an improvement.

-Jay
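Jay's comparison can be illustrated with a toy simulation. Everything below is hypothetical (the `Broker` class and the `send_07`/`send_08` functions are not Kafka APIs); it only models the behaviour he describes: a 0.7 broker that shuts itself down on a disk error, so the client first notices on its next send, versus a 0.8 broker that returns the error in its response. The sketch assumes the 0.8 broker stays up after reporting the error, which the reply does not state.

```python
class Broker:
    """Hypothetical broker whose disk has failed (a toy model, not Kafka code)."""

    def __init__(self):
        self.connected = True

    def append(self, msg):
        # Every write hits the disk error in this model.
        return "disk error"


def send_07(broker, msg):
    """0.7 style: fire-and-forget; only a dropped connection is visible."""
    if not broker.connected:
        return "socket error"
    if broker.append(msg) is not None:
        # The broker shuts itself down and disconnects its clients,
        # but nothing is reported for this send.
        broker.connected = False
    return None


def send_08(broker, msg):
    """0.8 style: the broker returns every error to the client."""
    if not broker.connected:
        return "socket error"
    return broker.append(msg)


b07, b08 = Broker(), Broker()
errors_07 = [send_07(b07, "m1"), send_07(b07, "m2")]
errors_08 = [send_08(b08, "m1"), send_08(b08, "m2")]
print("0.7:", errors_07)  # the error first surfaces on the second send
print("0.8:", errors_08)  # the error surfaces on the first send
```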

From: Jun Guo -X (jungu - CIIC at Cisco) [ju...@cisco.com]
Sent: Monday, January 14, 2013 9:45 PM
To: Jay Kreps
Subject: About acknowledge from broker to producer in your paper.
Hi,
   I have read your paper "Kafka: a Distributed Messaging System for Log
Processing".
   In the experimental results section, there is the following passage:

   There are a few reasons why Kafka performed much better. First, the 
Kafka producer currently doesn’t wait for acknowledgements from the broker and 
sends messages as faster as the broker can handle. This significantly increased 
the throughput of the publisher. With a batch size of 50, a single Kafka 
producer almost saturated the 1Gb link between the producer and the broker. 
This is a valid optimization for the log aggregation case, as data must be sent 
asynchronously to avoid introducing any latency into the live serving of 
traffic. We note that without acknowledging the producer, there is no guarantee 
that every published message is actually received by the broker. For many types 
of log data, it is desirable to trade durability for throughput, as long as the 
number of dropped messages is relatively small. However, we do plan to
address the durability issue for more critical data in the future.

   But I have run a series of tests and found that if I shut down all the
brokers and then send a message from the producer, the producer reports
kafka.common.FailedToSendMessageException: "Failed to send messages after 3
tries."
   If there is no acknowledgement from the broker, how does the producer know
that the send failed? And how does it retry 3 times?

   Maybe "acknowledgement" in your paper refers to something else. If so,
please tell me what it means.

   Many thanks and best regards!

Guo Jun
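One way to see how a producer can detect failure without any broker acknowledgement: when every broker is down, opening the connection itself fails, which is the socket-level error Jay mentions in his reply. A minimal sketch of a retry loop built on that, assuming a hypothetical `send_once` callable that raises `ConnectionError`; the exception class is a stand-in, not the real `kafka.common.FailedToSendMessageException`. In 0.8 the retry count is configurable (the `message.send.max.retries` producer setting), but Kafka's exact attempt accounting may differ from this loop.

```python
import time


class FailedToSendMessageException(Exception):
    """Stand-in for kafka.common.FailedToSendMessageException (not the real class)."""


def send_with_retries(send_once, max_retries=3, backoff_s=0.1):
    """Call send_once() until it succeeds, giving up after max_retries attempts.

    send_once is a hypothetical callable that raises ConnectionError when the
    broker cannot be reached. No acknowledgement from the broker is needed to
    detect that: the connection attempt itself fails.
    """
    for attempt in range(1, max_retries + 1):
        try:
            return send_once()
        except ConnectionError:
            if attempt < max_retries:
                time.sleep(backoff_s)  # wait a little before the next try
    raise FailedToSendMessageException(
        f"Failed to send messages after {max_retries} tries.")


attempts = []


def all_brokers_down():
    attempts.append(1)
    raise ConnectionError("connection refused")


try:
    send_with_retries(all_brokers_down, backoff_s=0)
except FailedToSendMessageException as e:
    print(e, "| attempts:", len(attempts))
```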


RE: About kafka 0.8 producer zookeeper-based load balancing on per-request basis

2013-01-14 Thread Jun Guo -X (jungu - CIIC at Cisco)
Thanks for your kind reply.

From: Jun Rao [mailto:jun...@gmail.com]
Sent: January 15, 2013 13:53
To: dev@kafka.apache.org; Jun Guo -X (jungu - CIIC at Cisco)
Subject: Re: About kafka 0.8 producer zookeeper-based load balancing on 
per-request basis

Basically, we spread partitions among multiple brokers. If a message is sent 
without a key, the producer picks a random partition to balance the load. If a 
message has a key, the default partitioner hashes the key to one of the 
partitions deterministically. Then, the load may not always be balanced.

Thanks,

Jun
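Jun's description amounts to a hash-mod scheme with a random fallback. A minimal sketch, assuming keys are strings hashed the way Java's `String.hashCode()` does; the function names are hypothetical, and Kafka's own default partitioner may differ in details such as how it takes the absolute value or how it picks a partition for unkeyed messages.

```python
import random


def java_string_hash(s: str) -> int:
    """Java's String.hashCode() as a signed 32-bit int (exact for BMP characters).

    Used instead of Python's hash(), which is randomized per process for str.
    """
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    return h - (1 << 32) if h >= (1 << 31) else h


def choose_partition(key, num_partitions, rng=random):
    """Pick a partition: random when there is no key, hash-mod otherwise."""
    if key is None:
        return rng.randrange(num_partitions)  # no key: spread load randomly
    # Same key -> same partition, every time; hot keys can unbalance the load.
    return abs(java_string_hash(key)) % num_partitions


for key in ["user-42", "user-42", "user-7", None]:
    print(key, "->", choose_partition(key, 4))
```

This is why keyed traffic is only as balanced as the key distribution: all messages for one key land on one partition, and therefore on that partition's broker.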
On Mon, Jan 14, 2013 at 9:35 PM, Jun Guo -X (jungu - CIIC at Cisco)
<ju...@cisco.com> wrote:
Hi,
We know that in Kafka 0.8 the producer connects to the brokers directly,
without connecting to ZooKeeper. Then how does it achieve ZooKeeper-based load
balancing on a per-request basis?
When a topic is created, its partitions are distributed across one or more
brokers. When a message is sent, it is delivered to a particular partition
according to its key. That is to say, a message with a given key must be sent
to a fixed partition on a fixed broker. So how does the so-called load
balancing work?

Best Regards



RE: About kafka 0.8 producer zookeeper-based load balancing on per-request basis

2013-01-14 Thread Jun Guo -X (jungu - CIIC at Cisco)
> In 0.8, we got rid of zookeeper from the producer and replaced it with a
> metadata API.
What is the API like? Can we call the API manually?
> On startup and whenever a request fails, the producer refreshes its view of the
> cluster by sending a metadata request to any of the brokers.
But in my test with Kafka 0.8, if the producer's broker.list consists of
broker 1 and broker 2, and neither broker 1 nor broker 2 is running (only
broker 3 is), then when the producer starts it cannot send any data for a new
topic. It reports that fetching topic metadata for the topics from broker 1 and
broker 2 failed.
On the other hand, if the broker.list consists of broker 1 and broker 2, and
brokers 1 and 3 are running, then when the producer starts sending data for a
new topic, there will be two partitions on broker 1 and two on broker 3
(num.partitions is configured as 4 on all brokers).
So we must guarantee that at least one broker in the broker.list is working
normally, right?
Why did you get rid of ZooKeeper from the producer? With ZooKeeper (as in
Kafka 0.7.2), the producer does not have this restriction.
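The behaviour described above is consistent with broker.list being used only to bootstrap: the producer asks the listed brokers, in turn, for topic metadata and fails if none of them answers, even when other brokers in the cluster are up. A minimal sketch of that idea, with a hypothetical `fetch_metadata` callable standing in for the real metadata request; the names and the error text are illustrative, not Kafka's code.

```python
class MetadataFetchError(Exception):
    """Hypothetical error raised when no listed broker answers."""


def bootstrap_metadata(broker_list, topics, fetch_metadata):
    """Ask each broker in broker_list, in order, for topic metadata.

    fetch_metadata(broker, topics) is a hypothetical callable that raises
    ConnectionError when the broker is unreachable. Brokers outside
    broker_list are never contacted here, even if they are running.
    """
    for broker in broker_list:
        try:
            return fetch_metadata(broker, topics)
        except ConnectionError:
            continue  # this broker is down; try the next one
    raise MetadataFetchError(
        f"fetching topic metadata for topics {topics} from brokers {broker_list} failed")


live = {"broker3"}  # scenario 1: only broker 3 is running


def fetch_metadata(broker, topics):
    if broker not in live:
        raise ConnectionError(broker)
    # Any live broker can describe the whole cluster, including brokers
    # that are not in broker_list.
    return {t: sorted(live) for t in topics}


try:
    bootstrap_metadata(["broker1", "broker2"], ["new_topic"], fetch_metadata)
    scenario1_failed = False
except MetadataFetchError as e:
    scenario1_failed = True
    print(e)

live.add("broker1")  # scenario 2: brokers 1 and 3 are running
meta = bootstrap_metadata(["broker1", "broker2"], ["new_topic"], fetch_metadata)
print(meta)
```

In the second scenario the returned metadata names broker 3 even though it is not in the list, which matches the observation that partitions ended up on brokers 1 and 3.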




From: Neha Narkhede [mailto:neha.narkh...@gmail.com]
Sent: January 15, 2013 13:59
To: dev@kafka.apache.org
Cc: Jun Guo -X (jungu - CIIC at Cisco)
Subject: Re: About kafka 0.8 producer zookeeper-based load balancing on 
per-request basis


Than how it achieve zookeeper-based load balance on per-request basis?

You are asking two related but different questions. One is how load
balancing works and the other is how broker discovery works. Jun explained
how load balancing works and how requests are routed to partitions. In 0.7,
there were 2 options for broker discovery - zookeeper and hardware load 
balancer (VIP). The zookeeper based producer got notified by zookeeper whenever 
a new broker came up or an existing broker went down. In 0.8, we got rid of 
zookeeper from the producer and replaced it with a metadata API. On startup and 
whenever a request fails, the producer refreshes its view of the cluster by 
sending a metadata request to any of the brokers. So if a broker goes down or a 
new broker comes up, the leader for some partitions might change and the 
producer will know since its requests to the older leaders will fail.

Hope that helps,
Neha
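Neha's description of the 0.8 approach can be sketched as a client-side cache of partition leaders that is built on startup and refreshed from any reachable broker whenever a send fails. Everything here (the class, its methods, and the `fetch_leaders` and `send` callables) is hypothetical; it models the behaviour described in the reply, not Kafka's implementation.

```python
class MetadataCachingProducer:
    """Hypothetical producer: caches partition leaders, refreshes them on failure."""

    def __init__(self, brokers, fetch_leaders, send):
        self.brokers = list(brokers)        # brokers to ask for metadata
        self.fetch_leaders = fetch_leaders  # broker -> {partition: leader}
        self.send = send                    # (leader, partition, msg); raises ConnectionError
        self.leaders = {}
        self.refresh()                      # build the initial view on startup

    def refresh(self):
        """Ask any reachable broker for the current partition leaders."""
        for broker in self.brokers:
            try:
                self.leaders = self.fetch_leaders(broker)
                return
            except ConnectionError:
                continue
        raise RuntimeError("no broker answered the metadata request")

    def produce(self, partition, msg):
        try:
            self.send(self.leaders[partition], partition, msg)
        except ConnectionError:
            # The cached leader is gone: refresh the cluster view and retry once.
            self.refresh()
            self.send(self.leaders[partition], partition, msg)


# Toy cluster: partition -> leader, plus the set of brokers that are up.
cluster = {0: "b1", 1: "b2"}
up = {"b1", "b2"}
delivered = []


def fetch_leaders(broker):
    if broker not in up:
        raise ConnectionError(broker)
    return dict(cluster)


def send(leader, partition, msg):
    if leader not in up:
        raise ConnectionError(leader)
    delivered.append((leader, partition, msg))


producer = MetadataCachingProducer(["b1", "b2"], fetch_leaders, send)
producer.produce(0, "a")      # goes to b1
up.discard("b1")              # b1 fails ...
cluster[0] = "b2"             # ... and b2 becomes leader for partition 0
producer.produce(0, "b")      # send to b1 fails, refresh, resend to b2
print(delivered)
```

No ZooKeeper watch is needed in this model: the failed request is itself the signal that the cached view is stale.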




kafka 0.8 producer throughput

2013-01-08 Thread Jun Guo -X (jungu - CIIC at Cisco)
According to the official Kafka documentation, producer throughput is about
50 MB/s. But in my tests, producer throughput is only about 2 MB/s. The test
environment matches what the documentation describes: one producer, one broker
and one ZooKeeper, each on its own machine. The message size is 100 bytes, the
batch size is 200, and the flush interval is 600 messages. With the same
environment and the same configuration, why is there such a big gap between my
test result and the documentation?


Kafka 0.8 producer can't specify zk.connect without specifying broker.list

2013-01-06 Thread Jun Guo -X (jungu - CIIC at Cisco)
Hi all,
   I find that in Kafka 0.7.2 we can specify either zk.connect or
broker.list. But in Kafka 0.8 we can only specify broker.list; we can't
specify zk.connect without also specifying broker.list. I think this means we
can't load-balance the producer through ZooKeeper. Does anyone use Kafka 0.8,
or have some understanding of this?
   Many thanks!
Best regards


