Re: Kafka Performance Tuning
Hi Yashika,

No logs in the broker log is not normal. Can you verify whether you turned off logging in your log4j properties file? If so, please enable it, try again, and see what is in the logs.

Tim

On Thu, Apr 24, 2014 at 10:53 PM, Yashika Gupta yashika.gu...@impetus.co.in wrote:

Jun,

I am using Kafka 2.8.0-0.8.0. There are no logs for the past month in the controller and state-change logs, though I can see some GC logs in the kafka-home-dir/logs folder:

  zookeeper-gc.log
  kafkaServer-gc.log

Yashika

________________________________
From: Jun Rao jun...@gmail.com
Sent: Friday, April 25, 2014 9:03 AM
To: users@kafka.apache.org
Subject: Re: Kafka Performance Tuning

Which version of Kafka are you using? Any error in the controller and state-change logs?

Thanks,
Jun

On Thu, Apr 24, 2014 at 7:37 PM, Yashika Gupta yashika.gu...@impetus.co.in wrote:

I am running a single broker and the leader column has 0 as the value.

pushkar priyadarshi priyadarshi.push...@gmail.com wrote:

You can use kafka-list-topic.sh to find out if the leader for a particular topic is available. -1 in the leader column might indicate trouble.

On Fri, Apr 25, 2014 at 6:34 AM, Guozhang Wang wangg...@gmail.com wrote:

Could you double check if the topic LOGFILE04 is already created on the servers?
Guozhang

On Thu, Apr 24, 2014 at 10:46 AM, Yashika Gupta yashika.gu...@impetus.co.in wrote:

Jun, the detailed logs are as follows:

24.04.2014 13:37:31812 INFO main kafka.producer.SyncProducer - Disconnecting from localhost:9092
24.04.2014 13:37:38612 WARN main kafka.producer.BrokerPartitionInfo - Error while fetching metadata [{TopicMetadata for topic LOGFILE04 - No partition metadata for topic LOGFILE04 due to kafka.common.LeaderNotAvailableException}] for topic [LOGFILE04]: class kafka.common.LeaderNotAvailableException
24.04.2014 13:37:40712 INFO main kafka.client.ClientUtils$ - Fetching metadata from broker id:0,host:localhost,port:9092 with correlation id 1 for 1 topic(s) Set(LOGFILE04)
24.04.2014 13:37:41212 INFO main kafka.producer.SyncProducer - Connected to localhost:9092 for producing
24.04.2014 13:37:48812 INFO main kafka.producer.SyncProducer - Disconnecting from localhost:9092
24.04.2014 13:37:48912 WARN main kafka.producer.BrokerPartitionInfo - Error while fetching metadata [{TopicMetadata for topic LOGFILE04 - No partition metadata for topic LOGFILE04 due to kafka.common.LeaderNotAvailableException}] for topic [LOGFILE04]: class kafka.common.LeaderNotAvailableException
24.04.2014 13:37:49012 ERROR main kafka.producer.async.DefaultEventHandler - Failed to collate messages by topic, partition due to: Failed to fetch topic metadata for topic: LOGFILE04
24.04.2014 13:39:96513 WARN ConsumerFetcherThread-produceLogLine2_vcmd-devanshu-1398361030812-8a0c706e-0-0 kafka.consumer.ConsumerFetcherThread - [ConsumerFetcherThread-produceLogLine2_vcmd-devanshu-1398361030812-8a0c706e-0-0], Error in fetch Name: FetchRequest; Version: 0; CorrelationId: 4; ClientId: produceLogLine2-ConsumerFetcherThread-produceLogLine2_vcmd-devanshu-1398361030812-8a0c706e-0-0; ReplicaId: -1; MaxWait: 6 ms; MinBytes: 1 bytes; RequestInfo: [LOGFILE04,0] - PartitionFetchInfo(2,1048576)
java.net.SocketTimeoutException
        at sun.nio.ch.SocketAdaptor$SocketInputStream.read(SocketAdaptor.java:229)
        at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:103)
        at java.nio.channels.Channels$ReadableByteChannelImpl.read(Channels.java:385)
        at kafka.utils.Utils$.read(Unknown Source)
        at kafka.network.BoundedByteBufferReceive.readFrom(Unknown Source)
        at kafka.network.Receive$class.readCompletely(Unknown Source)
        at kafka.network.BoundedByteBufferReceive.readCompletely(Unknown Source)
        at kafka.network.BlockingChannel.receive(Unknown Source)
        at kafka.consumer.SimpleConsumer.liftedTree1$1(Unknown Source)
        at kafka.consumer.SimpleConsumer.kafka$consumer$SimpleConsumer$$sendRequest(Unknown Source)
        at kafka.consumer.SimpleConsumer$$anonfun$fetch$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(Unknown Source)
        at kafka.consumer.SimpleConsumer$$anonfun$fetch$1$$anonfun$apply$mcV$sp$1.apply(Unknown Source)
        at kafka.consumer.SimpleConsumer$$anonfun$fetch$1$$anonfun$apply$mcV$sp$1.apply(Unknown Source)
        at kafka.metrics.KafkaTimer.time(Unknown Source)
        at kafka.consumer.SimpleConsumer$$anonfun$fetch$1.apply$mcV$sp(Unknown Source)
        at kafka.consumer.SimpleConsumer$$anonfun$fetch$1.apply(Unknown Source)
        at kafka.consumer.SimpleConsumer$$anonfun$fetch$1.apply(Unknown Source)
        at kafka.metrics.KafkaTimer.time(Unknown Source)
        at
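For reference, enabling the broker, controller, and state-change logs via log4j looks roughly like the following. This is a sketch modeled on the stock config/log4j.properties shipped with the 0.8 distribution; the appender names and file paths here are illustrative, so check them against the copy in your own installation.

```properties
# Sketch of a log4j.properties for a Kafka 0.8 broker.
# Root logger: INFO to the main broker log.
log4j.rootLogger=INFO, kafkaAppender

log4j.appender.kafkaAppender=org.apache.log4j.DailyRollingFileAppender
log4j.appender.kafkaAppender.File=logs/server.log
log4j.appender.kafkaAppender.layout=org.apache.log4j.PatternLayout
log4j.appender.kafkaAppender.layout.ConversionPattern=[%d] %p %m (%c)%n

log4j.appender.stateChangeAppender=org.apache.log4j.DailyRollingFileAppender
log4j.appender.stateChangeAppender.File=logs/state-change.log
log4j.appender.stateChangeAppender.layout=org.apache.log4j.PatternLayout
log4j.appender.stateChangeAppender.layout.ConversionPattern=[%d] %p %m (%c)%n

log4j.appender.controllerAppender=org.apache.log4j.DailyRollingFileAppender
log4j.appender.controllerAppender.File=logs/controller.log
log4j.appender.controllerAppender.layout=org.apache.log4j.PatternLayout
log4j.appender.controllerAppender.layout.ConversionPattern=[%d] %p %m (%c)%n

# These two loggers feed the controller and state-change logs
# (leader elections, ISR changes) that Jun asked about.
log4j.logger.kafka.controller=TRACE, controllerAppender
log4j.logger.state.change.logger=TRACE, stateChangeAppender
```

If these loggers are missing or set to OFF, the controller and state-change logs stay empty, which would explain seeing only the GC logs.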
RE: Kafka Performance Tuning
Timothy,

I checked out https://issues.apache.org/jira/browse/KAFKA-1124

So I created the topic manually, enabled the log4j properties, and re-ran my test. There are no ERROR entries in the controller and state-change logs, and I am still getting the SocketTimeoutException:

25.04.2014 03:05:10115 WARN ConsumerFetcherThread-consumerDeviceError2_vcmd-devanshu-1398408813021-efe24d49-0-0 kafka.consumer.ConsumerFetcherThread - [ConsumerFetcherThread-consumerDeviceError2_vcmd-devanshu-1398408813021-efe24d49-0-0], Error in fetch Name: FetchRequest; Version: 0; CorrelationId: 26; ClientId: consumerDeviceError2-ConsumerFetcherThread-consumerDeviceError2_vcmd-devanshu-1398408813021-efe24d49-0-0; ReplicaId: -1; MaxWait: 6 ms; MinBytes: 1 bytes; RequestInfo: [LOGLINE10,0] - PartitionFetchInfo(12,1048576)
java.net.SocketTimeoutException
        at sun.nio.ch.SocketAdaptor$SocketInputStream.read(SocketAdaptor.java:229)
        at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:103)
        at java.nio.channels.Channels$ReadableByteChannelImpl.read(Channels.java:385)
        at kafka.utils.Utils$.read(Unknown Source)
        at kafka.network.BoundedByteBufferReceive.readFrom(Unknown Source)
        at kafka.network.Receive$class.readCompletely(Unknown Source)
        at kafka.network.BoundedByteBufferReceive.readCompletely(Unknown Source)
        at kafka.network.BlockingChannel.receive(Unknown Source)
        at kafka.consumer.SimpleConsumer.liftedTree1$1(Unknown Source)
        at kafka.consumer.SimpleConsumer.kafka$consumer$SimpleConsumer$$sendRequest(Unknown Source)
        at kafka.consumer.SimpleConsumer$$anonfun$fetch$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(Unknown Source)
        at kafka.consumer.SimpleConsumer$$anonfun$fetch$1$$anonfun$apply$mcV$sp$1.apply(Unknown Source)
        at kafka.consumer.SimpleConsumer$$anonfun$fetch$1$$anonfun$apply$mcV$sp$1.apply(Unknown Source)
        at kafka.metrics.KafkaTimer.time(Unknown Source)
        at kafka.consumer.SimpleConsumer$$anonfun$fetch$1.apply$mcV$sp(Unknown Source)
        at kafka.consumer.SimpleConsumer$$anonfun$fetch$1.apply(Unknown Source)
        at kafka.consumer.SimpleConsumer$$anonfun$fetch$1.apply(Unknown Source)
        at kafka.metrics.KafkaTimer.time(Unknown Source)
        at kafka.consumer.SimpleConsumer.fetch(Unknown Source)
        at kafka.server.AbstractFetcherThread.processFetchRequest(Unknown Source)
        at kafka.server.AbstractFetcherThread.doWork(Unknown Source)
        at kafka.utils.ShutdownableThread.run(Unknown Source)

My 8 consumers are running in parallel with different consumer group ids, all trying to read from LOGLINE10. Two of them are able to read the LOGLINE10 topic, but the other six are failing with the above exception.

Yashika

________________________________
From: Timothy Chen tnac...@gmail.com
Sent: Friday, April 25, 2014 11:40:33 AM
To: users@kafka.apache.org
Subject: Re: Kafka Performance Tuning

Hi Yashika,

No logs in the broker log is not normal. Can you verify whether you turned off logging in your log4j properties file? If so, please enable it, try again, and see what is in the logs.

Tim
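One configuration interaction worth ruling out here (a suggestion on my part, not something confirmed earlier in this thread): in the 0.8 high-level consumer, socket.timeout.ms should be comfortably greater than fetch.wait.max.ms, or a long-poll fetch held open by the broker can hit java.net.SocketTimeoutException on the client side even when the broker is healthy. A hypothetical consumer.properties fragment (group id and ZooKeeper address are placeholders):

```properties
# Hypothetical consumer.properties fragment for a Kafka 0.8 high-level consumer.
zookeeper.connect=localhost:2181
group.id=consumerDeviceError2

# Keep the socket timeout well above the fetch long-poll wait, so the
# broker's delayed fetch response cannot outlive the client's socket read.
fetch.wait.max.ms=100
socket.timeout.ms=30000
```

With eight consumer groups all long-polling one broker, a socket timeout smaller than the effective fetch wait would let some fetcher threads time out while others succeed, which matches the 2-of-8 symptom described above.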
Re: performance testing data to share
Bert,

Thanks for sharing. Which version of Kafka were you testing?

Jun

On Fri, Apr 25, 2014 at 3:11 PM, Bert Corderman bertc...@gmail.com wrote:

I have been testing Kafka for the past week or so and figured I would share my results so far. I am not sure if the formatting will survive email, but here are the results in a Google doc...all 1,100 of them:
https://docs.google.com/spreadsheets/d/1UL-o2MiV0gHZtL4jFWNyqRTQl41LFdM0upjRIwCWNgQ/edit?usp=sharing

One thing I found is that there appears to be a bottleneck in kafka-producer-perf-test.sh. The servers I used for testing have 12 7.2K drives and 16 cores. I was NOT able to scale the broker past 350 MB/sec when adding drives, even though I was able to get 150 MB/sec from a single drive. I wanted to determine the source of the low utilization, so I tried changing the following:

- log.flush.interval.messages on the broker
- log.flush.interval.ms on the broker
- num.io.threads on the broker
- thread settings on the producer
- producer message sizes
- producer batch sizes
- different numbers of topics (which impacts the number of drives used)

None of the above had any impact. The last thing I tried was running multiple producers, which had a very noticeable impact. As previously mentioned, I had already tested the thread setting of the producer and found it to scale when increasing the thread count through 1, 2, 4, and 8; after that it plateaued, so I had been using 8 threads for each test. To show the impact of the number of producers, I created 12 topics with partition counts from 1 to 12. I used a single broker with no replication and configured the producer(s) to send 10 million 2200-byte messages in batches of 400 with no ack. Running three producers gave almost double the throughput of a single producer.

Other key points learned so far:

- Ensure you are using the correct network interface (use advertised.host.name if the servers have multiple interfaces).
- Use batching on the producer. With a single broker, sending 2200-byte messages in batches of 200 resulted in 283 MB/sec, vs. 44 MB/sec with a batch size of 1.
- The message size, the setting of request.required.acks, and the number of replicas (only when ack is set to all) had the most influence on overall throughput.
- The following table shows results of testing with message sizes of 200, 300, 1000, and 2200 bytes on a three-node cluster. Each message size was tested with the three available ack modes (NONE, LEADER, and ALL) and with replication of two and three copies. Having three copies of the data is recommended; both are included for reference.

message.size  acks    Repl=2 MB.sec  Repl=2 nMsg.sec  Repl=3 MB.sec  Repl=3 nMsg.sec  Per Server MB.sec  Per Server nMsg.sec
200           NONE    251            1,313,888        237            1,242,390        79                 414,130
300           NONE    345            1,204,384        320            1,120,197        107                373,399
1000          NONE    522            546,896          515            540,541          172                180,180
2200          NONE    368            175,165          367            174,709          122                58,236
200           LEADER  115            604,376          141            739,754          47                 246,585
300           LEADER  186            650,280          192            670,062          64                 223,354
1000          LEADER  340            356,659          328            343,808          109                114,603
2200          LEADER  310            147,846          293            139,729          98                 46,576
200           ALL     74             385,594          58             304,386          19                 101,462
300           ALL     105            367,282          78             272,316          26                 90,772
1000          ALL     203            212,400          124            130,305          41                 43,435
2200          ALL     212            100,820          136            64,835           45                 21,612

Some observations from the above table:

- Increasing the number of replicas when request.required.acks is none or leader has only limited impact on overall performance (additional resources are required to replicate the data, but during these tests this did not impact producer throughput).
- Compression is not shown, as the data generated for the test turned out not to be realistic for a production workload (GZIP compressed the data 300:1, which is unrealistic).
- For some reason a message size of 1000 bytes performed the best. Need to look into this more.

Thanks
Bert
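A quick arithmetic note on the table above: the two Per Server columns work out to the Replica=3 cluster totals divided evenly across the three brokers and rounded to the nearest unit. A small sketch spot-checking a few rows (my own reconstruction, not part of Bert's test harness):

```python
# Spot-check the "Per Server" columns: Replica=3 cluster throughput
# divided evenly across the 3 brokers, rounded to the nearest unit.
CLUSTER_SIZE = 3

# (message.size, acks, Replica=3 MB.sec, Replica=3 nMsg.sec)
rows = [
    (200, "NONE", 237, 1_242_390),
    (1000, "NONE", 515, 540_541),
    (2200, "ALL", 136, 64_835),
]

for size, acks, mb_sec, nmsg_sec in rows:
    per_server_mb = round(mb_sec / CLUSTER_SIZE)
    per_server_nmsg = round(nmsg_sec / CLUSTER_SIZE)
    print(f"{size:>5}B {acks:<6} -> {per_server_mb} MB.sec, {per_server_nmsg:,} nMsg.sec per server")
```

Running these three rows reproduces the table's per-server figures (79 / 414,130, 172 / 180,180, and 45 / 21,612 respectively), so the per-server columns carry no extra measurement, just the Replica=3 totals normalized by node count.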