About Kafka 0.8 producer throughput test
Hi,

I have run the producer (Kafka 0.8) throughput test many times, but the average value is only 3 MB/s. Below is my test environment:

CPU cores: 16
Vendor_id: GenuineIntel
CPU family: 6
CPU MHz: 2899.999
Cache size: 20480 KB
CPU level: 13
Memory: 16330832 KB = 15.57 GB
Disk: RAID5

I don't know the details of the disk, such as rotation speed, but I ran some I/O performance tests on it: the write rate is 500-600 MB/s and the read rate is 180 MB/s.

I adjusted the broker configuration file as the official document says, and set the JVM heap to 5120 MB. I ran the producer performance test with the kafka-producer-perf-test.sh script; the test command is:

bin/kafka-producer-perf-test.sh --broker-list 10.75.167.46:49092 --topics topic_perf_46_1,topic_perf_46_2,topic_perf_46_3,topic_perf_46_4,topic_perf_46_5,topic_perf_46_6,topic_perf_46_7,topic_perf_46_8,topic_perf_46_9,topic_perf_46_10 --initial-message-id 0 --threads 200 --messages 100 --message-size 200 --compression-codec 1

But the test result is not nearly as good as the official document says (50 MB/s, and the value in your paper is 100 MB/s). The test result is:

2013-01-17 04:15:24:768, 2013-01-17 04:25:01:637, 0, 200, 200, 1907.35, 3.3064, 1000, 17334.9582

On the other hand, I ran the consumer throughput test, and the result is about 60 MB/s, while the value in the official document is 100 MB/s. I really don't know why. High throughput is one of the most important features of Kafka, so I am really concerned about it.

Thanks and best regards!

From: Jay Kreps [mailto:jkr...@linkedin.com]
Sent: January 16, 2013 2:22
To: Jun Guo -X (jungu - CIIC at Cisco)
Subject: RE: About acknowledge from broker to producer in your paper.

Not sure which version you are using... In 0.7 this would happen only if there was a socket-level error (i.e. it can't connect to the host).
This covers a lot of cases, since in the event of I/O errors (disk full, etc.) we just have that node shut itself down to let others take over. In 0.8 we send all errors back to the client.

So the difference is that, for example, in the event of a disk error: in 0.7 the client would send a message, the broker would get an error, shoot itself in the head, and disconnect its clients, and the client would get the error the next time it tried to send a message. So in 0.7 the error might not get passed back to the client until the second message send. In 0.8 this happens on the first send, which is an improvement.

-Jay

From: Jun Guo -X (jungu - CIIC at Cisco) [ju...@cisco.com]
Sent: Monday, January 14, 2013 9:45 PM
To: Jay Kreps
Subject: About acknowledge from broker to producer in your paper.

Hi,

I have read your paper "Kafka: a Distributed Messaging System for Log Processing". In the experimental results part, there are some words as below:

"There are a few reasons why Kafka performed much better. First, the Kafka producer currently doesn't wait for acknowledgements from the broker and sends messages as fast as the broker can handle. This significantly increased the throughput of the publisher. With a batch size of 50, a single Kafka producer almost saturated the 1Gb link between the producer and the broker. This is a valid optimization for the log aggregation case, as data must be sent asynchronously to avoid introducing any latency into the live serving of traffic. We note that without acknowledging the producer, there is no guarantee that every published message is actually received by the broker. For many types of log data, it is desirable to trade durability for throughput, as long as the number of dropped messages is relatively small. However, we do plan to address the durability issue for more critical data in the future."

But I have done a series of tests.
I found that if I shut down all the brokers and then send a message from the producer, the producer reports kafka.common.FailedToSendMessageException. It says: "Failed to send messages after 3 tries."

If there is no acknowledgement from the broker, how does the producer find out that the send failed? And how does it try 3 times? Maybe the acknowledgement in your paper refers to something else; if so, please tell me what "acknowledgement" means here.

Many thanks and best regards!
Guo Jun
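The two observations are compatible: the paper's "no acknowledgement" refers to application-level acks from the broker, but when no broker is listening at all, the TCP connect itself fails, and the 0.8 producer retries the send up to message.send.max.retries (default 3) before throwing FailedToSendMessageException. A minimal Python sketch of that retry loop, with hypothetical names that only illustrate the behaviour (this is not the actual Kafka client source):

```python
# Illustrative sketch of the 0.8 producer's retry behaviour.
# send_fn stands in for one synchronous network send; a broker that is
# down surfaces as a socket-level error (OSError) even with acks disabled.

class FailedToSendMessageException(Exception):
    pass

def send_with_retries(send_fn, message, max_retries=3):
    """Try a send up to max_retries times; a connection-level failure on
    every attempt ends in FailedToSendMessageException, matching the
    'Failed to send messages after 3 tries.' message seen in the test."""
    for attempt in range(max_retries):
        try:
            return send_fn(message)
        except OSError:
            continue  # broker unreachable; retry (the real client also backs off)
    raise FailedToSendMessageException(
        "Failed to send messages after %d tries." % max_retries)
```

In the real client the pause between attempts is governed by retry.backoff.ms; the point here is only that connection errors are visible to the producer even when no broker-level acknowledgement is requested.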
RE: About kafka 0.8 producer zookeeper-based load balancing on per-request basis
Thanks for your kind reply.

From: Jun Rao [mailto:jun...@gmail.com]
Sent: January 15, 2013 13:53
To: dev@kafka.apache.org; Jun Guo -X (jungu - CIIC at Cisco)
Subject: Re: About kafka 0.8 producer zookeeper-based load balancing on per-request basis

Basically, we spread partitions among multiple brokers. If a message is sent without a key, the producer picks a random partition to balance the load. If a message has a key, the default partitioner hashes the key to one of the partitions deterministically. In that case, the load may not always be balanced.

Thanks,
Jun

On Mon, Jan 14, 2013 at 9:35 PM, Jun Guo -X (jungu - CIIC at Cisco) <ju...@cisco.com> wrote:

Hi,

We know that in Kafka 0.8 the producer connects to the broker directly, without connecting to ZooKeeper. Then how does it achieve zookeeper-based load balancing on a per-request basis? When a topic is created, its partitions are distributed across one or more brokers. When a message is sent, it is delivered to a certain partition according to its key. That is to say, a given key must be sent to a fixed partition on a fixed broker. How does the so-called load balancing work?

Best Regards
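Jun's partitioning rule fits in a few lines. A Python sketch for illustration only (the real 0.8 default partitioner computes the Java key.hashCode modulo the partition count; Python's hash() stands in for hashCode here):

```python
import random

def choose_partition(key, num_partitions):
    """Sketch of the 0.8 producer's partition choice: a keyless message
    goes to a random partition (load balancing); a keyed message is hashed
    deterministically, so one hot key can leave the load unbalanced."""
    if key is None:
        return random.randrange(num_partitions)
    return abs(hash(key)) % num_partitions
```

The same key always lands on the same partition within a run, which is exactly why keyed traffic is not guaranteed to be balanced.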
RE: About kafka 0.8 producer zookeeper-based load balancing on per-request basis
> In 0.8, we got rid of zookeeper from the producer and replaced it with a
> metadata API.

What is the API like? Can we call the API manually?

> On startup and whenever a request fails, the producer refreshes its view of
> the cluster by sending a metadata request to any of the brokers.

But in my test with Kafka 0.8, if the broker.list for a producer consists of broker 1 and broker 2, and neither broker 1 nor broker 2 is up while only broker 3 is running, then the producer can't send any data for a new topic. It says fetching topic metadata from broker 1 and broker 2 failed. On the other hand, if the broker.list consists of broker 1 and broker 2, and broker 1 and broker 3 are working, then when the producer starts to send data for a new topic there will be two partitions on broker 1 and two on broker 3 (num.partitions is configured to 4 on all brokers). So we must guarantee that at least one broker in the broker.list is working normally, right? Why did you get rid of ZooKeeper from the producer? With ZooKeeper (as in Kafka 0.7.2), the producer does not have this restriction.

From: Neha Narkhede [mailto:neha.narkh...@gmail.com]
Sent: January 15, 2013 13:59
To: dev@kafka.apache.org
Cc: Jun Guo -X (jungu - CIIC at Cisco)
Subject: Re: About kafka 0.8 producer zookeeper-based load balancing on per-request basis

> Than how it achieve zookeeper-based load balance on per-request basis?

You are asking two related but different questions: one is how load balancing works, and the other is how broker discovery works. Jun explained how load balancing works and how requests are routed to partitions. In 0.7, there were two options for broker discovery: zookeeper and a hardware load balancer (VIP). The zookeeper-based producer got notified by zookeeper whenever a new broker came up or an existing broker went down. In 0.8, we got rid of zookeeper from the producer and replaced it with a metadata API.
On startup and whenever a request fails, the producer refreshes its view of the cluster by sending a metadata request to any of the brokers. So if a broker goes down or a new broker comes up, the leader for some partitions might change, and the producer will know, since its requests to the old leaders will fail.

Hope that helps,
Neha

On Mon, Jan 14, 2013 at 9:52 PM, Jun Rao <jun...@gmail.com> wrote:

Basically, we spread partitions among multiple brokers. If a message is sent without a key, the producer picks a random partition to balance the load. If a message has a key, the default partitioner hashes the key to one of the partitions deterministically. In that case, the load may not always be balanced.

Thanks,
Jun

On Mon, Jan 14, 2013 at 9:35 PM, Jun Guo -X (jungu - CIIC at Cisco) <ju...@cisco.com> wrote:

> Hi,
> We know that in Kafka 0.8 the producer connects to the broker directly,
> without connecting to ZooKeeper. Then how does it achieve zookeeper-based
> load balancing on a per-request basis?
> When a topic is created, its partitions are distributed across one or more
> brokers. When a message is sent, it is delivered to a certain partition
> according to its key. That is to say, a given key must be sent to a fixed
> partition on a fixed broker. How does the so-called load balancing work?
>
> Best Regards
>
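The bootstrap behaviour described in this thread, including the failure when every broker in broker.list is down, can be sketched as follows. The helper names are hypothetical, not the Kafka client code:

```python
# Sketch of 0.8-style broker discovery: the producer asks each broker in
# its configured list for topic metadata; the first reachable broker
# answers for the whole cluster (the metadata describes all brokers and
# partition leaders, including brokers not in the configured list).

def fetch_metadata(broker_list, request_fn):
    """Try each bootstrap broker in turn. If none respond, the producer
    cannot start -- matching the 'fetching topic metadata ... failed'
    error seen when only an unlisted broker is up."""
    errors = []
    for broker in broker_list:
        try:
            return request_fn(broker)
        except ConnectionError as e:
            errors.append((broker, str(e)))
    raise RuntimeError("fetching topic metadata failed for all brokers: %r" % errors)
```

This is why at least one broker in the configured list must be reachable at startup, even though the cluster itself may contain other live brokers.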
kafka 0.8 producer throughput
According to the Kafka official document, producer throughput is about 50 MB/s, but in my tests the producer throughput is only about 2 MB/s. The test environment is the same as the document describes: one producer, one broker, and one ZooKeeper, each on an independent machine; message size is 100 bytes, batch size is 200, and the flush interval is 600 messages. The environment and the configuration are the same, so why is there such a big gap between my test result and what the document says?
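One plausible source of a gap this large is the producer running in synchronous, acknowledged mode, while the published benchmark numbers were measured with asynchronous batching and no broker acknowledgement. A hypothetical producer.properties for an 0.8-style test is sketched below; the values are illustrative assumptions mirroring the batch size mentioned above, not settings taken from the official documentation:

```properties
# Hypothetical producer.properties for an 0.8 throughput test.
# Values are illustrative assumptions, not official recommendations.
metadata.broker.list=10.75.167.46:49092

# Batch sends in the background instead of one request per message.
producer.type=async
batch.num.messages=200

# Fire-and-forget, as in the paper's benchmark (no broker acknowledgement).
request.required.acks=0

# Compression codec: 0 = none, 1 = gzip, 2 = snappy.
compression.codec=1
```

With producer.type=sync and acks enabled, each message pays a full network round trip, which alone can account for an order-of-magnitude throughput difference.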
Kafka 0.8 producer can't specify zk.connect without specifying broker.list
Hi all,

I find that in Kafka 0.7.2 we can specify either zk.connect or broker.list, but in Kafka 0.8 we can only specify broker.list; we can't specify zk.connect without also specifying broker.list. I think this means we can't balance the producer through ZooKeeper. Has anyone used Kafka 0.8 or have any insight into this?

Many thanks!

Best Regards