Re: Kafka Performance Tuning

2014-04-25 Thread Timothy Chen
Hi Yashika,

Having no entries in the broker log is not normal. Can you verify whether
you turned off logging in your log4j properties file?

If you did, please enable it, try again, and see what is in the logs.
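If logging was disabled, a minimal config/log4j.properties fragment along these lines would restore broker logging. (The appender name, file location, and layout here are assumed from the stock 0.8 config, not taken from Yashika's setup; adjust to match your install.)

```properties
# config/log4j.properties -- minimal sketch to re-enable broker logging
log4j.rootLogger=INFO, kafkaAppender

log4j.appender.kafkaAppender=org.apache.log4j.DailyRollingFileAppender
log4j.appender.kafkaAppender.DatePattern='.'yyyy-MM-dd-HH
log4j.appender.kafkaAppender.File=${kafka.logs.dir}/server.log
log4j.appender.kafkaAppender.layout=org.apache.log4j.PatternLayout
log4j.appender.kafkaAppender.layout.ConversionPattern=[%d] %p %m (%c)%n
```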

Tim

On Thu, Apr 24, 2014 at 10:53 PM, Yashika Gupta
yashika.gu...@impetus.co.in wrote:
 Jun,

 I am using Kafka 0.8.0 (the Scala 2.8.0 build).
 There are no logs for the past month in the controller and state-change log.

 Though I can see some GC logs in the kafka-home-dir/logs folder:
 zookeeper-gc.log
 kafkaServer-gc.log


 Yashika
 __
 From: Jun Rao jun...@gmail.com
 Sent: Friday, April 25, 2014 9:03 AM
 To: users@kafka.apache.org
 Subject: Re: Kafka Performance Tuning

 Which version of Kafka are you using? Any error in the controller and
 state-change log?

 Thanks,

 Jun


 On Thu, Apr 24, 2014 at 7:37 PM, Yashika Gupta
 yashika.gu...@impetus.co.inwrote:

 I am running a single broker and the leader column has 0 as the value.

 pushkar priyadarshi priyadarshi.push...@gmail.com wrote:


 you can use kafka-list-topic.sh to find out whether the leader for a
 particular topic is available. A value of -1 in the leader column might
 indicate trouble.
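The check above can be sketched as follows. A live cluster would be queried with `bin/kafka-list-topic.sh --zookeeper localhost:2181 --topic LOGFILE04` (ZooKeeper address assumed); here we parse a sample output line, whose exact format is assumed from 0.8-era tooling, to show the leader test:

```shell
# Sample line in the shape kafka-list-topic.sh prints (assumed format):
SAMPLE="topic: LOGFILE04	partition: 0	leader: -1	replicas: 0	isr:"

# Extract the value after "leader:"
LEADER=$(printf '%s\n' "$SAMPLE" | sed 's/.*leader: \([-0-9]*\).*/\1/')

if [ "$LEADER" = "-1" ]; then
  echo "no leader for partition -- broker/controller trouble"
else
  echo "leader is broker $LEADER"
fi
```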


 On Fri, Apr 25, 2014 at 6:34 AM, Guozhang Wang wangg...@gmail.com wrote:

  Could you double check if the topic LOGFILE04 is already created on the
  servers?
 
  Guozhang
 
 
  On Thu, Apr 24, 2014 at 10:46 AM, Yashika Gupta 
  yashika.gu...@impetus.co.in
   wrote:
 
   Jun,
  
   The detailed logs are as follows:
  
   24.04.2014 13:37:31812 INFO main kafka.producer.SyncProducer -
   Disconnecting from localhost:9092
   24.04.2014 13:37:38612 WARN main kafka.producer.BrokerPartitionInfo -
   Error while fetching metadata [{TopicMetadata for topic LOGFILE04 -
   No partition metadata for topic LOGFILE04 due to
   kafka.common.LeaderNotAvailableException}] for topic [LOGFILE04]: class
   kafka.common.LeaderNotAvailableException
   24.04.2014 13:37:40712 INFO main kafka.client.ClientUtils$ - Fetching
   metadata from broker id:0,host:localhost,port:9092 with correlation id
 1
   for 1 topic(s) Set(LOGFILE04)
   24.04.2014 13:37:41212 INFO main kafka.producer.SyncProducer -
 Connected
   to localhost:9092 for producing
   24.04.2014 13:37:48812 INFO main kafka.producer.SyncProducer -
   Disconnecting from localhost:9092
   24.04.2014 13:37:48912 WARN main kafka.producer.BrokerPartitionInfo -
   Error while fetching metadata [{TopicMetadata for topic LOGFILE04 -
   No partition metadata for topic LOGFILE04 due to
   kafka.common.LeaderNotAvailableException}] for topic [LOGFILE04]: class
   kafka.common.LeaderNotAvailableException
   24.04.2014 13:37:49012 ERROR main
  kafka.producer.async.DefaultEventHandler
   - Failed to collate messages by topic, partition due to: Failed to
 fetch
   topic metadata for topic: LOGFILE04
  
  
   24.04.2014 13:39:96513 WARN
  
 
 ConsumerFetcherThread-produceLogLine2_vcmd-devanshu-1398361030812-8a0c706e-0-0
   kafka.consumer.ConsumerFetcherThread -
  
 
 [ConsumerFetcherThread-produceLogLine2_vcmd-devanshu-1398361030812-8a0c706e-0-0],
   Error in fetch Name: FetchRequest; Version: 0; CorrelationId: 4;
  ClientId:
  
 
 produceLogLine2-ConsumerFetcherThread-produceLogLine2_vcmd-devanshu-1398361030812-8a0c706e-0-0;
   ReplicaId: -1; MaxWait: 6 ms; MinBytes: 1 bytes; RequestInfo:
   [LOGFILE04,0] - PartitionFetchInfo(2,1048576)
   java.net.SocketTimeoutException
   at
   sun.nio.ch.SocketAdaptor$SocketInputStream.read(SocketAdaptor.java:229)
   at
  sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:103)
   at
  
 
 java.nio.channels.Channels$ReadableByteChannelImpl.read(Channels.java:385)
   at kafka.utils.Utils$.read(Unknown Source)
   at kafka.network.BoundedByteBufferReceive.readFrom(Unknown
  Source)
   at kafka.network.Receive$class.readCompletely(Unknown Source)
   at
 kafka.network.BoundedByteBufferReceive.readCompletely(Unknown
   Source)
   at kafka.network.BlockingChannel.receive(Unknown Source)
   at kafka.consumer.SimpleConsumer.liftedTree1$1(Unknown Source)
   at
  
 
 kafka.consumer.SimpleConsumer.kafka$consumer$SimpleConsumer$$sendRequest(Unknown
   Source)
   at
  
 
 kafka.consumer.SimpleConsumer$$anonfun$fetch$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(Unknown
   Source)
   at
  
 
 kafka.consumer.SimpleConsumer$$anonfun$fetch$1$$anonfun$apply$mcV$sp$1.apply(Unknown
   Source)
   at
  
 
 kafka.consumer.SimpleConsumer$$anonfun$fetch$1$$anonfun$apply$mcV$sp$1.apply(Unknown
   Source)
   at kafka.metrics.KafkaTimer.time(Unknown Source)
   at
   kafka.consumer.SimpleConsumer$$anonfun$fetch$1.apply$mcV$sp(Unknown
  Source)
   at kafka.consumer.SimpleConsumer$$anonfun$fetch$1.apply(Unknown
   Source)
   at kafka.consumer.SimpleConsumer$$anonfun$fetch$1.apply(Unknown
   Source)
   at kafka.metrics.KafkaTimer.time(Unknown Source)
   at 

RE: Kafka Performance Tuning

2014-04-25 Thread Yashika Gupta
Timothy,

I checked out https://issues.apache.org/jira/browse/KAFKA-1124
So I created the topic manually, enabled the log4j properties, and re-ran my
test.

There are no ERROR entries in the controller and state-change logs,
and I am still getting the SocketTimeoutException:

25.04.2014 03:05:10115 WARN 
ConsumerFetcherThread-consumerDeviceError2_vcmd-devanshu-1398408813021-efe24d49-0-0
 kafka.consumer.ConsumerFetcherThread - 
[ConsumerFetcherThread-consumerDeviceError2_vcmd-devanshu-1398408813021-efe24d49-0-0],
 Error in fetch Name: FetchRequest; Version: 0; CorrelationId: 26; ClientId: 
consumerDeviceError2-ConsumerFetcherThread-consumerDeviceError2_vcmd-devanshu-1398408813021-efe24d49-0-0;
 ReplicaId: -1; MaxWait: 6 ms; MinBytes: 1 bytes; RequestInfo: 
[LOGLINE10,0] - PartitionFetchInfo(12,1048576)
java.net.SocketTimeoutException
at 
sun.nio.ch.SocketAdaptor$SocketInputStream.read(SocketAdaptor.java:229)
at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:103)
at 
java.nio.channels.Channels$ReadableByteChannelImpl.read(Channels.java:385)
at kafka.utils.Utils$.read(Unknown Source)
at kafka.network.BoundedByteBufferReceive.readFrom(Unknown Source)
at kafka.network.Receive$class.readCompletely(Unknown Source)
at kafka.network.BoundedByteBufferReceive.readCompletely(Unknown Source)
at kafka.network.BlockingChannel.receive(Unknown Source)
at kafka.consumer.SimpleConsumer.liftedTree1$1(Unknown Source)
at 
kafka.consumer.SimpleConsumer.kafka$consumer$SimpleConsumer$$sendRequest(Unknown
 Source)
at 
kafka.consumer.SimpleConsumer$$anonfun$fetch$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(Unknown
 Source)
at 
kafka.consumer.SimpleConsumer$$anonfun$fetch$1$$anonfun$apply$mcV$sp$1.apply(Unknown
 Source)
at 
kafka.consumer.SimpleConsumer$$anonfun$fetch$1$$anonfun$apply$mcV$sp$1.apply(Unknown
 Source)
at kafka.metrics.KafkaTimer.time(Unknown Source)
at kafka.consumer.SimpleConsumer$$anonfun$fetch$1.apply$mcV$sp(Unknown 
Source)
at kafka.consumer.SimpleConsumer$$anonfun$fetch$1.apply(Unknown Source)
at kafka.consumer.SimpleConsumer$$anonfun$fetch$1.apply(Unknown Source)
at kafka.metrics.KafkaTimer.time(Unknown Source)
at kafka.consumer.SimpleConsumer.fetch(Unknown Source)
at kafka.server.AbstractFetcherThread.processFetchRequest(Unknown 
Source)
at kafka.server.AbstractFetcherThread.doWork(Unknown Source)
at kafka.utils.ShutdownableThread.run(Unknown Source)

My 8 consumers run in parallel with different consumer group ids, all
trying to read from LOGLINE10.
2 of them are able to read the LOGLINE10 topic, but the other 6 fail with
the above exception.
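One hedged suggestion, not a confirmed fix, given the repeated fetch timeouts: in the 0.8 high-level consumer, socket.timeout.ms should comfortably exceed fetch.wait.max.ms, otherwise a long-polling fetch can time out at the socket level before the broker responds. The values below are illustrative:

```properties
# consumer.properties -- socket timeout must exceed the fetch long-poll wait
fetch.wait.max.ms=100
socket.timeout.ms=30000
```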

Yashika




From: Timothy Chen tnac...@gmail.com
Sent: Friday, April 25, 2014 11:40:33 AM
To: users@kafka.apache.org
Subject: Re: Kafka Performance Tuning

Hi Yashika,

Having no entries in the broker log is not normal. Can you verify whether
you turned off logging in your log4j properties file?

If you did, please enable it, try again, and see what is in the logs.

Tim

On Thu, Apr 24, 2014 at 10:53 PM, Yashika Gupta
yashika.gu...@impetus.co.in wrote:
 Jun,

 I am using Kafka 0.8.0 (the Scala 2.8.0 build).
 There are no logs for the past month in the controller and state-change log.

 Though I can see some GC logs in the kafka-home-dir/logs folder:
 zookeeper-gc.log
 kafkaServer-gc.log


 Yashika
 __
 From: Jun Rao jun...@gmail.com
 Sent: Friday, April 25, 2014 9:03 AM
 To: users@kafka.apache.org
 Subject: Re: Kafka Performance Tuning

 Which version of Kafka are you using? Any error in the controller and
 state-change log?

 Thanks,

 Jun


 On Thu, Apr 24, 2014 at 7:37 PM, Yashika Gupta
 yashika.gu...@impetus.co.inwrote:

 I am running a single broker and the leader column has 0 as the value.

 pushkar priyadarshi priyadarshi.push...@gmail.com wrote:


 you can use kafka-list-topic.sh to find out whether the leader for a
 particular topic is available. A value of -1 in the leader column might
 indicate trouble.


 On Fri, Apr 25, 2014 at 6:34 AM, Guozhang Wang wangg...@gmail.com wrote:

  Could you double check if the topic LOGFILE04 is already created on the
  servers?
 
  Guozhang
 
 
  On Thu, Apr 24, 2014 at 10:46 AM, Yashika Gupta 
  yashika.gu...@impetus.co.in
   wrote:
 
   Jun,
  
   The detailed logs are as follows:
  
   24.04.2014 13:37:31812 INFO main kafka.producer.SyncProducer -
   Disconnecting from localhost:9092
   24.04.2014 13:37:38612 WARN main kafka.producer.BrokerPartitionInfo -
   Error while fetching metadata [{TopicMetadata for topic LOGFILE04 -
   No partition metadata for topic LOGFILE04 due to
   kafka.common.LeaderNotAvailableException}] for topic [LOGFILE04]: class
   kafka.common.LeaderNotAvailableException
   24.04.2014 13:37:40712 INFO main kafka.client.ClientUtils$ - Fetching
   metadata from broker 

Re: performance testing data to share

2014-04-25 Thread Jun Rao
Bert,

Thanks for sharing. Which version of Kafka were you testing?


Jun


On Fri, Apr 25, 2014 at 3:11 PM, Bert Corderman bertc...@gmail.com wrote:

 I have been testing Kafka for the past week or so and figured I would share
 my results so far.


 I am not sure whether the formatting will survive email, so here are the
 results in a Google doc... all 1,100 of them:



 https://docs.google.com/spreadsheets/d/1UL-o2MiV0gHZtL4jFWNyqRTQl41LFdM0upjRIwCWNgQ/edit?usp=sharing



 One thing I found is there appears to be a bottleneck in
 kafka-producer-perf-test.sh


 The servers I used for testing have 12 7.2K drives and 16 cores.  I was NOT
 able to scale the broker past 350MB/sec when adding drives, even though I
 was able to get 150MB/sec from a single drive.  I wanted to determine the
 source of the low utilization.


 I tried changing the following:

 ·log.flush.interval.messages on the broker

 ·log.flush.interval.ms on the broker

 ·num.io.threads on the broker

 ·thread settings on the producer

 ·producer message sizes

 ·producer batch sizes

 ·different numbers of topics (which impacts the number of drives used)

 None of the above had any impact.  The last thing I tried was running
 multiple producers, which had a very noticeable impact.  As previously
 mentioned, I had already tested the producer thread setting and found it
 to scale when increasing the thread count through 1, 2, 4 and 8.  After
 that it plateaued, so I used 8 threads for each test.  To show the impact
 of the number of producers, I created 12 topics with partition counts
 from 1 to 12.  I used a single broker with no replication and configured
 the producer(s) to send 10 million 2200-byte messages in batches of 400
 with no ack.
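For reference, the invocation below is roughly what that test describes. The flag names are assumed from the 0.8-era perf tooling (check them against `bin/kafka-producer-perf-test.sh --help` on your install), and the topic name is a placeholder; the command is printed rather than executed here, since running it needs a live broker:

```shell
# Sketch of the producer perf test described above (flags assumed, not verified)
CMD="bin/kafka-producer-perf-test.sh \
  --broker-list localhost:9092 \
  --topics perf-test-12p \
  --messages 10000000 \
  --message-size 2200 \
  --batch-size 400 \
  --threads 8 \
  --request-num-acks 0"

# Print instead of run; a live single broker is assumed for the real test
echo "$CMD"
```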


 Running with three producers gave almost double the throughput of one
 producer.


 Other Key points learned so far

 ·Ensure you are using the correct network interface (use
 advertised.host.name if the servers have multiple interfaces).

 ·Use batching on the producer.  With a single broker, sending 2200-byte
 messages in batches of 200 resulted in 283MB/sec, vs. 44MB/sec with a
 batch size of 1.

 ·The message size, the configuration of request.required.acks and
 the number of replicas (only when ack is set to all) had the most influence
 on the overall throughput.

 ·The following table shows results of testing with message sizes
 of 200, 300, 1000 and 2200 bytes on a three-node cluster.  Each message
 size was tested with the three available ack modes (NONE, LEADER and ALL)
 and with replication of two and three copies.  Having three copies of data
 is recommended; however, both are included for reference.
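The multi-interface fix mentioned above maps to a broker setting; a minimal server.properties fragment (the host name below is a placeholder, not a value from this thread):

```properties
# config/server.properties -- advertise the interface clients should use
advertised.host.name=broker1.internal.example
port=9092
```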

  message.size  acks    |  Replica=2           |  Replica=3           |  Per Server (Replica=3)
                        |  MB.sec   nMsg.sec   |  MB.sec   nMsg.sec   |  MB.sec   nMsg.sec
  200           NONE    |  251      1,313,888  |  237      1,242,390  |  79       414,130
  300           NONE    |  345      1,204,384  |  320      1,120,197  |  107      373,399
  1000          NONE    |  522      546,896    |  515      540,541    |  172      180,180
  2200          NONE    |  368      175,165    |  367      174,709    |  122      58,236
  200           LEADER  |  115      604,376    |  141      739,754    |  47       246,585
  300           LEADER  |  186      650,280    |  192      670,062    |  64       223,354
  1000          LEADER  |  340      356,659    |  328      343,808    |  109      114,603
  2200          LEADER  |  310      147,846    |  293      139,729    |  98       46,576
  200           ALL     |  74       385,594    |  58       304,386    |  19       101,462
  300           ALL     |  105      367,282    |  78       272,316    |  26       90,772
  1000          ALL     |  203      212,400    |  124      130,305    |  41       43,435
  2200          ALL     |  212      100,820    |  136      64,835     |  45       21,612
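As a quick arithmetic cross-check of the table: nMsg.sec is approximately MB.sec * 2^20 / message.size, and the Per Server columns are the three-node (Replica=3) totals divided by 3. A small sketch spot-checking a few rows:

```python
# Spot-check a few rows of the benchmark table for internal consistency.
MIB = 1024 * 1024

# (message_size, acks, r3_mb_sec, r3_msg_sec, per_server_mb_sec) from the table
rows = [
    (200, "NONE", 237, 1_242_390, 79),
    (1000, "NONE", 515, 540_541, 172),
    (2200, "LEADER", 293, 139_729, 98),
]

for size, acks, mb_sec, msg_sec, per_server in rows:
    # messages/sec derived from the MB/sec figure and the message size
    derived_msg_sec = mb_sec * MIB / size
    assert abs(derived_msg_sec - msg_sec) / msg_sec < 0.05, (size, acks)
    # per-server throughput is the cluster total spread over 3 brokers
    assert round(mb_sec / 3) == per_server, (size, acks)
    print(size, acks, "consistent")
```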



 Some observations from the above table:

 ·Increasing the number of replicas when request.required.acks is
 none or leader-only has limited impact on overall performance (additional
 resources are required to replicate data, but during these tests this did
 not impact producer throughput).

 ·Compression is not shown because the data generated for the test is
 not realistic for a production workload (GZIP compressed the data 300:1,
 which is unrealistic).

 ·For some reason a message size of 1000 bytes performed the best.
 Need to look into this more.


 Thanks

 Bert