Re: can producers(from same system) send messages to separate broker systems?

2013-05-20 Thread Chitra Raveendran
Hi Neha

No, I haven't experienced any noticeable latency so far. The high-priority
data is too critical to tolerate any latency, which is why I wanted to
optimize everything before deployment.

I'm using 0.7.2 since my consumer is a Storm spout, and I read that Storm
is most compatible with 0.7.2.

Does 0.7.2 have that feature as well?
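
For reference, this is roughly how I'm planning to wire the two producers
(0.7 producer API; just a sketch with placeholder broker ids and hostnames,
and property names as I understand them from the 0.7 docs):

import java.util.Properties;

import kafka.javaapi.producer.Producer;
import kafka.javaapi.producer.ProducerData;
import kafka.producer.ProducerConfig;

public class DualProducers {
    // Each producer gets its own static broker list, so the two data paths
    // never share a broker. 0.7 format: brokerId:host:port, comma-separated.
    private static Producer<String, String> buildProducer(String brokerList) {
        Properties props = new Properties();
        props.put("serializer.class", "kafka.serializer.StringEncoder");
        props.put("broker.list", brokerList);
        return new Producer<String, String>(new ProducerConfig(props));
    }

    public static void main(String[] args) {
        // Normal-priority data goes to brokers 1 and 2 only.
        Producer<String, String> normal = buildProducer("1:broker1:9092,2:broker2:9092");
        // High-priority data goes straight to broker 3.
        Producer<String, String> highPriority = buildProducer("3:broker3:9092");

        normal.send(new ProducerData<String, String>("normal-topic", "a normal event"));
        highPriority.send(new ProducerData<String, String>("priority-topic", "a critical event"));

        normal.close();
        highPriority.close();
    }
}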





On Sat, May 18, 2013 at 2:55 AM, Neha Narkhede neha.narkh...@gmail.com wrote:

 Do you have any tests that measure whether your high-priority data is being
 delayed? Assuming you are using 0.8, the end-to-end latency can be reduced
 by tuning some configs on the consumer (fetch.min.bytes, fetch.wait.max.ms).
 The defaults for these configs are already tuned for low latency though.

 Thanks,
 Neha


 On Tue, May 14, 2013 at 11:46 AM, Chitra Raveendran 
 chitra.raveend...@fluturasolutions.com wrote:

  Thanks for the reply,
 
  I have a 3-node Kafka cluster with 2 topics (one very high priority, the
  other normal data). I need to send the normal data to two brokers in the
  cluster, and the high-priority data directly to the 3rd broker.
 
  This is so that my high-priority data has a clear path and can be
  transmitted without any delay at all.
 
  What should I do to achieve this? Is specifying an appropriate broker.list
  for each producer enough?
 
 
  On Tue, May 14, 2013 at 9:11 PM, Neha Narkhede neha.narkh...@gmail.com
  wrote:
 
   Yes there can be. You just need to make sure they are configured with
   separate broker.list.
  
   Thanks,
   Neha
   On May 14, 2013 5:28 AM, Andrea Gazzarini 
 andrea.gazzar...@gmail.com
   wrote:
  
Of course...what is the problem? Or maybe you're missing some other
constraint?
   
On 05/14/2013 02:20 PM, Chitra Raveendran wrote:
   
Hi
   
Can  there be 2 producers in the same server, both sending their own
separate topics to separate broker systems?
   
   
   
  
 
 
 
  --
  Chitra Raveendran
  Data Scientist
  *Flutura Business Solutions Pvt. Ltd*
  Tel : +918197563660
  email  : chitra.raveend...@fluturasoutions.com
 




-- 
Chitra Raveendran
Data Scientist
*Flutura Business Solutions Pvt. Ltd*
email  : chitra.raveend...@fluturasoutions.com


Re: can producers(from same system) send messages to separate broker systems?

2013-05-20 Thread Neha Narkhede
The feature I mentioned is only available in 0.8. In 0.7.2, you can tweak
the producer batch size and the flush interval on the broker for the
high-priority topics. Note that setting those too low will have performance
implications.
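
Roughly, the knobs I mean look like this on the producer side (a sketch only;
0.7-era property names from memory, so please double-check them against the
0.7 configuration docs):

import java.util.Properties;

import kafka.javaapi.producer.Producer;
import kafka.producer.ProducerConfig;

public class LowLatencyProducer {
    public static Producer<String, String> build() {
        Properties props = new Properties();
        props.put("serializer.class", "kafka.serializer.StringEncoder");
        props.put("broker.list", "3:broker3:9092"); // hypothetical high-priority broker
        props.put("producer.type", "async");        // batching applies to the async producer
        props.put("batch.size", "10");              // smaller batch => lower per-message latency
        props.put("queue.time", "50");              // max ms a message waits in the async queue
        return new Producer<String, String>(new ProducerConfig(props));
    }
    // On the broker side (server.properties), flushing more eagerly reduces latency
    // at the cost of more fsyncs, for example:
    //   log.flush.interval=100               (messages between flushes)
    //   log.default.flush.interval.ms=200    (max ms between flushes)
}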

Thanks,
Neha
On May 17, 2013 2:25 PM, Neha Narkhede neha.narkh...@gmail.com wrote:

 Do you have any tests that measure whether your high-priority data is being
 delayed? Assuming you are using 0.8, the end-to-end latency can be reduced
 by tuning some configs on the consumer (fetch.min.bytes, fetch.wait.max.ms).
 The defaults for these configs are already tuned for low latency though.

 Thanks,
 Neha


 On Tue, May 14, 2013 at 11:46 AM, Chitra Raveendran 
 chitra.raveend...@fluturasolutions.com wrote:

 Thanks for the reply,

 I have a 3-node Kafka cluster with 2 topics (one very high priority, the
 other normal data). I need to send the normal data to two brokers in the
 cluster, and the high-priority data directly to the 3rd broker.

 This is so that my high-priority data has a clear path and can be
 transmitted without any delay at all.

 What should I do to achieve this? Is specifying an appropriate broker.list
 for each producer enough?


 On Tue, May 14, 2013 at 9:11 PM, Neha Narkhede neha.narkh...@gmail.com
 wrote:

  Yes there can be. You just need to make sure they are configured with
  separate broker.list.
 
  Thanks,
  Neha
  On May 14, 2013 5:28 AM, Andrea Gazzarini andrea.gazzar...@gmail.com
 
  wrote:
 
   Of course...what is the problem? Or maybe you're missing some other
   constraint?
  
   On 05/14/2013 02:20 PM, Chitra Raveendran wrote:
  
   Hi
  
   Can  there be 2 producers in the same server, both sending their own
   separate topics to separate broker systems?
  
  
  
 



 --
 Chitra Raveendran
 Data Scientist
 *Flutura Business Solutions Pvt. Ltd*
 Tel : +918197563660
 email  : chitra.raveend...@fluturasoutions.com





Re: What happens if one broker goes down

2013-05-20 Thread Neha Narkhede
0.7.2 does not support replication, so when a broker goes down there can
be some data loss. If you are OK with duplicates, you can configure the
producer-side retries to be higher.
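
If you prefer to handle this in your own code instead of relying on the
producer config, a bounded retry around the send, along these lines (just a
sketch; the retry count and backoff are arbitrary), at least avoids silently
dropping messages:

import kafka.javaapi.producer.Producer;
import kafka.javaapi.producer.ProducerData;

public class RetryingSender {
    /** Retry a failed send a few times before giving up, instead of dropping the message. */
    public static void sendWithRetry(Producer<String, String> producer,
                                     String topic, String message,
                                     int maxRetries) throws Exception {
        for (int attempt = 0; ; attempt++) {
            try {
                producer.send(new ProducerData<String, String>(topic, message));
                return; // sent successfully
            } catch (Exception e) { // e.g. connection refused while the broker is down
                if (attempt >= maxRetries) {
                    // Out of retries: spool the message somewhere durable rather than losing it.
                    throw e;
                }
                Thread.sleep(1000L * (attempt + 1)); // simple backoff before retrying
            }
        }
    }
}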

Thanks,
Neha
On May 19, 2013 11:32 PM, Chitra Raveendran 
chitra.raveend...@fluturasolutions.com wrote:

 Hi

 When my broker went down, my producer just stopped reading the file, throwing
 an exception saying the connection to the Kafka broker was refused.
 I tried working around the exception with a try/catch, but I'm still seeing
 data loss. What am I doing wrong? Please help.

 For reference:
 I'm using Kafka 0.7.2 since my consumer is a Storm spout, and I read that
 Storm is most compatible with Kafka 0.7.2.
 It doesn't support replication, right?


 On Fri, May 17, 2013 at 6:56 PM, Neha Narkhede neha.narkh...@gmail.com
 wrote:

  You can read the high level design of kafka replication here
  http://www.slideshare.net/junrao/kafka-replication-apachecon2013
 
  Generally if your replication factor is more than 1 you shouldn't see
 data
  loss in your test. When a broker fails, the producer will get an
 exception
  and it will retry.
 
  Thanks,
  Neha
  On May 16, 2013 10:21 PM, Chitra Raveendran 
  chitra.raveend...@fluturasolutions.com wrote:
 
   Hi
  
   I was just doing some benchmarking with a 3node cluster. If one broker
  goes
   down , what happens? Does the producer go down ? that's what happened
 in
  my
   case.
   Is the data lost? Or does it get distributed amongst the other brokers?
  
   --
   Chitra Raveendran
   Data Scientist
   *Flutura |  Decision Sciences & Analytics*
   mail  : chitra.raveend...@fluturasoutions.com
  
 



 --
 Chitra Raveendran
 Data Scientist
 *Flutura Business Solutions Pvt. Ltd*
 Tel : +918197563660
 email  : chitra.raveend...@fluturasoutions.com



Re: hi plz reple me

2013-05-20 Thread Neha Narkhede
You can do that using Kafka. Please read the design details here -
http://kafka.apache.org/07/design.html
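
In short, every consumer group gets its own full copy of each message, so
producing once and consuming it in several places just means giving each
consumer a different group id. A minimal sketch (0.7-era property names, as I
recall them):

import java.util.Properties;

import kafka.consumer.Consumer;
import kafka.consumer.ConsumerConfig;
import kafka.javaapi.consumer.ConsumerConnector;

public class GroupExample {
    // Each group id tracks its own offsets, so both connectors below will
    // independently receive every message published to the topic.
    static ConsumerConnector connect(String groupId) {
        Properties props = new Properties();
        props.put("zk.connect", "zkhost:2181"); // hypothetical ZooKeeper address
        props.put("groupid", groupId);          // 0.7-era name for the consumer group
        return Consumer.createJavaConsumerConnector(new ConsumerConfig(props));
    }

    public static void main(String[] args) {
        ConsumerConnector billing = connect("billing");
        ConsumerConnector analytics = connect("analytics");
        // ... create message streams from each connector and consume as usual.
    }
}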

Thanks,
Neha
On May 20, 2013 6:57 AM, satya prakash satyacusa...@gmail.com wrote:

 I am using Kafka. I need to create one message on the producer side and send
 it to multiple consumers. How can I do that?



RE: are commitOffsets botched to zookeeper?

2013-05-20 Thread Seshadri, Balaji
Hi Neha,

Is moving to ZooKeeper 3.4.x a big change?

Can you please explain which parts it affects (the consumer API, for example)?

Thanks,

Balaji
-Original Message-
From: Neha Narkhede [mailto:neha.narkh...@gmail.com] 
Sent: Friday, May 17, 2013 7:35 AM
To: users@kafka.apache.org
Subject: RE: are commitOffsets botched to zookeeper?

Upgrading to a new ZooKeeper version is not an easy change. Also, ZooKeeper
3.3.4 is much more stable compared to 3.4.x. We think it is better not to
club 2 big changes together. So most likely this will be a post-0.8 item for
stability purposes.

Thanks,
Neha
On May 17, 2013 6:31 AM, Withers, Robert robert.with...@dish.com wrote:

 Awesome!  Thanks for the clarification.  I would like to offer my strong
 vote that this get tackled before a beta, to get it firmly into 0.8.
 Stabilize everything else to the existing use, but make offset updates
 batched.

 thanks,
 rob
 
 From: Neha Narkhede [neha.narkh...@gmail.com]
 Sent: Friday, May 17, 2013 7:17 AM
 To: users@kafka.apache.org
 Subject: RE: are commitOffsets botched to zookeeper?

 Sorry I wasn't clear. Zookeeper 3.4.x has this feature. As soon as 08 is
 stable and released it will be worth looking into when we can use zookeeper
 3.4.x.

 Thanks,
 Neha
 On May 16, 2013 10:32 PM, Rob Withers reefed...@gmail.com wrote:

  Can a request be made to zookeeper for this feature?
 
  Thanks,
  rob
 
   -Original Message-
   From: Neha Narkhede [mailto:neha.narkh...@gmail.com]
   Sent: Thursday, May 16, 2013 9:53 PM
   To: users@kafka.apache.org
   Subject: Re: are commitOffsets botched to zookeeper?
  
   Currently Kafka depends on zookeeper 3.3.4 that doesn't have a batch
  write
   api. So if you commit after every message at a high rate, it will be
 slow
  and
   inefficient. Besides it will cause zookeeper performance to degrade.
  
   Thanks,
   Neha
   On May 16, 2013 6:54 PM, Rob Withers reefed...@gmail.com wrote:
  
We are calling commitOffsets after every message consumption.  It
looks to be ~60% slower, with 29 partitions.  If a single KafkaStream
thread is from a connector, and there are 29 partitions, then
commitOffsets sends 29 offset updates, correct?  Are these offset
updates batched in one send to zookeeper?
   
thanks,
rob
 
 



RE: are commitOffsets botched to zookeeper?

2013-05-20 Thread Neha Narkhede
ZooKeeper 3.4.x is API-compatible. However, to get the full benefit, we will
have to change the Kafka code to use the batch API that ZooKeeper 3.4.x
provides. Also, we use the zkclient library to interface with ZooKeeper; we
might have to patch that to use the ZooKeeper 3.4.x APIs.
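
For reference, the 3.4.x batch API looks roughly like this (a sketch only,
assuming the consumer's existing /consumers/<group>/offsets/<topic>/<partition>
path layout):

import java.util.ArrayList;
import java.util.List;
import java.util.Map;

import org.apache.zookeeper.Op;
import org.apache.zookeeper.ZooKeeper;

public class BatchedOffsetCommit {
    /**
     * Write all partition offsets for a consumer group in one round trip using
     * the transactional multi() call added in ZooKeeper 3.4.x (not in 3.3.4).
     */
    public static void commit(ZooKeeper zk, String group, String topic,
                              Map<String, Long> offsetsByPartition) throws Exception {
        List<Op> ops = new ArrayList<Op>();
        for (Map.Entry<String, Long> entry : offsetsByPartition.entrySet()) {
            String path = "/consumers/" + group + "/offsets/" + topic + "/" + entry.getKey();
            // Version -1 means "any version"; all updates succeed or fail together.
            ops.add(Op.setData(path, entry.getValue().toString().getBytes(), -1));
        }
        zk.multi(ops);
    }
}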

Thanks,
Neha
On May 20, 2013 9:36 AM, Seshadri, Balaji balaji.sesha...@dish.com
wrote:

 Hi Neha,

 Is moving to ZooKeeper 3.4.x a big change?
 
 Can you please explain which parts it affects (the consumer API, for example)?

 Thanks,

 Balaji
 -Original Message-
 From: Neha Narkhede [mailto:neha.narkh...@gmail.com]
 Sent: Friday, May 17, 2013 7:35 AM
 To: users@kafka.apache.org
 Subject: RE: are commitOffsets botched to zookeeper?

 Upgrading to a new ZooKeeper version is not an easy change. Also, ZooKeeper
 3.3.4 is much more stable compared to 3.4.x. We think it is better not to
 club 2 big changes together. So most likely this will be a post-0.8 item for
 stability purposes.

 Thanks,
 Neha
 On May 17, 2013 6:31 AM, Withers, Robert robert.with...@dish.com
 wrote:

  Awesome!  Thanks for the clarification.  I would like to offer my strong
  vote that this get tackled before a beta, to get it firmly into 0.8.
  Stabilize everything else to the existing use, but make offset updates
  batched.
 
  thanks,
  rob
  
  From: Neha Narkhede [neha.narkh...@gmail.com]
  Sent: Friday, May 17, 2013 7:17 AM
  To: users@kafka.apache.org
  Subject: RE: are commitOffsets botched to zookeeper?
 
  Sorry I wasn't clear. Zookeeper 3.4.x has this feature. As soon as 08 is
  stable and released it will be worth looking into when we can use
 zookeeper
  3.4.x.
 
  Thanks,
  Neha
  On May 16, 2013 10:32 PM, Rob Withers reefed...@gmail.com wrote:
 
   Can a request be made to zookeeper for this feature?
  
   Thanks,
   rob
  
-Original Message-
From: Neha Narkhede [mailto:neha.narkh...@gmail.com]
Sent: Thursday, May 16, 2013 9:53 PM
To: users@kafka.apache.org
Subject: Re: are commitOffsets botched to zookeeper?
   
Currently Kafka depends on zookeeper 3.3.4 that doesn't have a batch
   write
api. So if you commit after every message at a high rate, it will be
  slow
   and
inefficient. Besides it will cause zookeeper performance to degrade.
   
Thanks,
Neha
On May 16, 2013 6:54 PM, Rob Withers reefed...@gmail.com wrote:
   
 We are calling commitOffsets after every message consumption.  It
 looks to be ~60% slower, with 29 partitions.  If a single
 KafkaStream
 thread is from a connector, and there are 29 partitions, then
 commitOffsets sends 29 offset updates, correct?  Are these offset
 updates batched in one send to zookeeper?

 thanks,
 rob
  
  
 



Re: Relationship between Zookeeper and Kafka

2013-05-20 Thread Scott Clasen
My guess is that EBS is your bottleneck.  Try running on instance-local
disks and compare your results.  Is this 0.8? What replication factor are
you using?


On Mon, May 20, 2013 at 8:11 AM, Jason Weiss jason_we...@rapid7.com wrote:

 I'm trying to maximize my throughput and seem to have hit a ceiling.
 Everything described below is running in AWS.

 I have configured a Kafka cluster with 5 machines, M1.Large, with 600
 provisioned IOPS storage for each EC2 instance. I have a Zookeeper server
 (we aren't in production yet, so I didn't take the time to setup a ZK
 cluster). Publishing to a single topic from 7 different clients, I seem to
 max out at around 20,000 eps with a fixed 2K message size. Each broker
 defines 10 file segments, with a 25000 message / 5 second flush
 configuration in server.properties. I have stuck with 8 threads. My
 producers (Java) are configured with batch.num.messages at 50, and
 queue.buffering.max.messages at 100.

 When I went from 4 servers in the cluster to 5 servers, I only saw an
 increase of about 500 events per second in throughput. In sharp contrast,
 when I run a complete environment on my MacBook Pro, tuned as described
 above but with a single ZK and a single Kafka broker, I am seeing 61,000
 events per second. I don't think I'm network constrained in the AWS
 environment (producer side) because when I add one more client, my MacBook
 Pro, I see a proportionate decrease in EC2 client throughput, and the net
 result is an identical 20,000 eps. Stated differently, my EC2 instances give
 up throughput when my local MacBook Pro joins the array of producers such
 that the throughput is exactly the same.

 Does anyone have any additional suggestions on what else I could tune to
 try and hit our goal, 50,000 eps with a 5 machine cluster? Based on the
 whitepapers published, LinkedIn describes a peak of 170,000 events per
 second across their cluster. My 20,000 seems so far away from their
 production figures.

 What is the relationship, in terms of performance, between ZK and Kafka?
 Do I need to have a more performant ZK cluster, the same, or does it really
 not matter in terms of maximizing throughput?

 Thanks for any suggestions – I've been pulling knobs and turning levers on
 this for several days now.


 Jason

 This electronic message contains information which may be confidential or
 privileged. The information is intended for the use of the individual or
 entity named above. If you are not the intended recipient, be aware that
 any disclosure, copying, distribution or use of the contents of this
 information is prohibited. If you have received this electronic
 transmission in error, please notify us by e-mail at (
 postmas...@rapid7.com) immediately.



Re: Relationship between Zookeeper and Kafka

2013-05-20 Thread Scott Clasen
Ahh, yeah, PIOPS is definitely faster than standard EBS, but still much
slower than local disk.

You could try benchmarking local disk to see what the instances you are
using are capable of, then try tweaking IOPS etc. to see where you get.

M1.Larges aren't super fast, so your MacBook beating them isn't surprising
to me.


On Mon, May 20, 2013 at 10:01 AM, Jason Weiss jason_we...@rapid7.com wrote:

 Hi Scott.

 I'm using Kafka 0.7.2. I am using the default replication factor, since I
 don't recall changing that configuration at all.

 I'm using provisioned IOPS, which from attending the AWS event in NYC a
 few weeks ago was presented as the fastest storage option for EC2. A
 number of partners presented success stories in terms of throughput with
 provisioned IOPS. I've tried to follow that model.


 Jason

 On 5/20/13 12:56 PM, Scott Clasen sc...@heroku.com wrote:

 My guess, EBS is likely your bottleneck.  Try running on instance local
 disks, and compare your results.  Is this 0.8? What replication factor are
 you using?
 
 
 On Mon, May 20, 2013 at 8:11 AM, Jason Weiss jason_we...@rapid7.com
 wrote:
 
  I'm trying to maximize my throughput and seem to have hit a ceiling.
  Everything described below is running in AWS.
 
  I have configured a Kafka cluster with 5 machines, M1.Large, with 600
  provisioned IOPS storage for each EC2 instance. I have a Zookeeper
 server
  (we aren't in production yet, so I didn't take the time to setup a ZK
  cluster). Publishing to a single topic from 7 different clients, I seem
 to
  max out at around 20,000 eps with a fixed 2K message size. Each broker
  defines 10 file segments, with a 25000 message / 5 second flush
  configuration in server.properties. I have stuck with 8 threads. My
  producers (Java) are configured with batch.num.messages at 50, and
  queue.buffering.max.messages at 100.
 
  When I went from 4 servers in the cluster to 5 servers, I only saw an
  increase of about 500 events per second in throughput. In sharp
 contrast,
  when I run a complete environment on my MacBook Pro, tuned as described
  above but with a single ZK and a single Kafka broker, I am seeing 61,000
  events per second. I don't think I'm network constrained in the AWS
  environment (producer side) because when I add one more client, my
 MacBook
  Pro, I see a proportionate decrease in EC2 client throughput, and the
 net
  result is an identical 20,000 eps. Stated differently, my EC2 instance
 give
  up throughput when my local MacBook Pro joins the array of producers
 such
  that the throughput is exactly the same.
 
  Does anyone have any additional suggestions on what else I could tune to
  try and hit our goal, 50,000 eps with a 5 machine cluster? Based on the
  whitepapers published, LinkedIn describes a peak of 170,000 events per
  second across their cluster. My 20,000 seems so far away from their
  production figures.
 
  What is the relationship, in terms of performance, between ZK and Kafka?
  Do I need to have a more performant ZK cluster, the same, or does it
 really
  not matter in terms of maximizing throughput.
 
  Thanks for any suggestions – I've been pulling knobs and turning levers
 on
  this for several days now.
 
 
  Jason
 




Re: Update: RE: are commitOffsets botched to zookeeper?

2013-05-20 Thread Alex Zuzin
Did so. The proposal looks perfectly sensible on first reading.

I understand that the patches in
https://issues.apache.org/jira/browse/KAFKA-657 are already in the trunk and
scheduled for 0.8.1? Are they going out with 0.8? If not, what's the ETA for
0.8.1?

Either way, I'm going to try my hand at backing this with MySQL and report the 
results here shortly.
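
The shape I have in mind is roughly this (a hypothetical table and JDBC
sketch, untested):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class MysqlOffsetStore {
    // Assumes a table like:
    //   CREATE TABLE offsets (
    //     consumer_group VARCHAR(128), topic VARCHAR(128), partition_id INT,
    //     consumed_offset BIGINT,
    //     PRIMARY KEY (consumer_group, topic, partition_id));
    private static final String UPSERT =
        "INSERT INTO offsets (consumer_group, topic, partition_id, consumed_offset) " +
        "VALUES (?, ?, ?, ?) ON DUPLICATE KEY UPDATE consumed_offset = VALUES(consumed_offset)";

    private final Connection conn;

    public MysqlOffsetStore(String jdbcUrl) throws SQLException {
        // Assumes the MySQL JDBC driver is on the classpath.
        this.conn = DriverManager.getConnection(jdbcUrl);
    }

    /** Record the latest consumed offset for one (group, topic, partition). */
    public void commit(String group, String topic, int partition, long offset) throws SQLException {
        PreparedStatement stmt = conn.prepareStatement(UPSERT);
        try {
            stmt.setString(1, group);
            stmt.setString(2, topic);
            stmt.setInt(3, partition);
            stmt.setLong(4, offset);
            stmt.executeUpdate();
        } finally {
            stmt.close();
        }
    }
}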

-- 
If you can't conceal it well, expose it with all your might
Alex Zuzin


On Monday, May 20, 2013 at 10:24 AM, Neha Narkhede wrote:

 No problem. You can take a look at some of the thoughts we had on improving
 the offset storage here -
 https://cwiki.apache.org/confluence/display/KAFKA/Offset+Management.
 Suggestions are welcome.
 
 Thanks,
 Neha
 
 
 On Fri, May 17, 2013 at 2:40 PM, Alex Zuzin carna...@gmail.com 
 (mailto:carna...@gmail.com) wrote:
 
  Neha,
  
  apologies, I just re-read what I sent and realized my you wasn't
  specific enough - it meant the Kafka team ;).
  
  --
  If you can't conceal it well, expose it with all your might
  Alex Zuzin
  
  
  On Friday, May 17, 2013 at 2:25 PM, Alex Zuzin wrote:
  
   Have you considered abstracting offset storage away so people could
  implement their own?
   Would you take a patch if I'd stabbed at it, and if yes, what's the
  
  process (pardon the n00b)?
   
   KCBO,
   --
   If you can't conceal it well, expose it with all your might
   Alex Zuzin
   
   
   On Friday, May 17, 2013 at 2:22 PM, Neha Narkhede wrote:
   
There is no particular need for storing the offsets in zookeeper. In
  fact
with Kafka 0.8, since partitions will be highly available, offsets
   
  
  could be
stored in Kafka topics. However, we haven't ironed out the design for
   
  
  this
yet.

Thanks,
Neha


On Fri, May 17, 2013 at 2:19 PM, Scott Clasen sc...@heroku.com 
(mailto:sc...@heroku.com)(mailto:
  sc...@heroku.com (mailto:sc...@heroku.com)) wrote:

 afaik you dont 'have' to store the consumed offsets in zk right,
  this is
 only automatic with some of the clients?
 
 why not store them in a data store that can write at the rate that
  you
 require?
 
 
 On Fri, May 17, 2013 at 2:15 PM, Withers, Robert 
  robert.with...@dish.com (mailto:robert.with...@dish.com)
  wrote:
 
 
 
  Update from our OPS team, regarding zookeeper 3.4.x. Given
  stability,
  adoption of offset batching would be the only remaining bit of
 

   
  
  work to
  
 
 
 do.
  Still, I totally understand the restraint for 0.8...
  
  
  As exercise in upgradability of zookeeper, I did a
  out-of-thebox
  upgrade on Zookeeper. I downloaded a generic distribution of Apache
  Zookeeper and used it for the upgrade.
  
  Kafka included version of Zookeeper 3.3.3.
  Out of the box Apache Zookeeper 3.4.5 (which I upgraded to)
  
  Running, working great. I did *not* have to wipe out the zookeeper
  databases. All data stayed intact.
  
  I got a new feature, which allows auto-purging of logs. This keeps
  OPS
  maintenance to a minimum.
  
  
  thanks,
  rob
  
  
  -Original Message-
  From: Withers, Robert [mailto:robert.with...@dish.com]
  Sent: Friday, May 17, 2013 7:38 AM
  To: users@kafka.apache.org (mailto:users@kafka.apache.org)
  Subject: RE: are commitOffsets botched to zookeeper?
  
  Fair enough, this is something to look forward to. I appreciate the
  restraint you show to stay out of troubled waters. :)
  
  thanks,
  rob
  
  
  From: Neha Narkhede [neha.narkh...@gmail.com 
  (mailto:neha.narkh...@gmail.com) (mailto:
  
 

   
  
  neha.narkh...@gmail.com (mailto:neha.narkh...@gmail.com))]
  Sent: Friday, May 17, 2013 7:35 AM
  To: users@kafka.apache.org (mailto:users@kafka.apache.org)
  Subject: RE: are commitOffsets botched to zookeeper?
  
  Upgrading to a new zookeeper version is not an easy change. Also
 zookeeper
  3.3.4 is much more stable compared to 3.4.x. We think it is better
 
 

   
  
  not to
  club 2 big changes together. So most likely this will be a post 08
 

   
  
  item
  
 
 
 for
  stability purposes.
  
  Thanks,
  Neha
  On May 17, 2013 6:31 AM, Withers, Robert 
  
 
 

   
  
  robert.with...@dish.com (mailto:robert.with...@dish.com)
  wrote:
  
   Awesome! Thanks for the clarification. I would like to offer my
   strong vote that this get tackled before a beta, to get it
   
  
  
 

   
  
  firmly into
   
  
  
  0.8.
   Stabilize everything else to the existing use, but make offset
  
  
 

   
  
  updates
   batched.
   
   

Re: Relationship between Zookeeper and Kafka

2013-05-20 Thread Ken Krugler
Hi Jason,

On May 20, 2013, at 10:01am, Jason Weiss wrote:

 Hi Scott.
 
 I'm using Kafka 0.7.2. I am using the default replication factor, since I
 don't recall changing that configuration at all.
 
 I'm using provisioned IOPS, which from attending the AWS event in NYC a
 few weeks ago was presented as the fastest storage option for EC2. A
 number of partners presented success stories in terms of throughput with
 provisioned IOPS. I've tried to follow that model.

In my experience directly hitting an ephemeral drive on m1.large is faster than 
using EBS.

I've seen some articles where RAIDing multiple EBS volumes can exceed the 
performance of ephemeral drives, but with high variability.

If you want to maximize performance, set up a (smaller) cluster of
SSD-backed instances with 10Gb Ethernet in the same cluster group.

E.g. test with three cr1.8xlarge instances.

-- Ken


 On 5/20/13 12:56 PM, Scott Clasen sc...@heroku.com wrote:
 
 My guess, EBS is likely your bottleneck.  Try running on instance local
 disks, and compare your results.  Is this 0.8? What replication factor are
 you using?
 
 
 On Mon, May 20, 2013 at 8:11 AM, Jason Weiss jason_we...@rapid7.com
 wrote:
 
 I'm trying to maximize my throughput and seem to have hit a ceiling.
 Everything described below is running in AWS.
 
 I have configured a Kafka cluster with 5 machines, M1.Large, with 600
 provisioned IOPS storage for each EC2 instance. I have a Zookeeper
 server
 (we aren't in production yet, so I didn't take the time to setup a ZK
 cluster). Publishing to a single topic from 7 different clients, I seem
 to
 max out at around 20,000 eps with a fixed 2K message size. Each broker
 defines 10 file segments, with a 25000 message / 5 second flush
 configuration in server.properties. I have stuck with 8 threads. My
 producers (Java) are configured with batch.num.messages at 50, and
 queue.buffering.max.messages at 100.
 
 When I went from 4 servers in the cluster to 5 servers, I only saw an
 increase of about 500 events per second in throughput. In sharp
 contrast,
 when I run a complete environment on my MacBook Pro, tuned as described
 above but with a single ZK and a single Kafka broker, I am seeing 61,000
 events per second. I don't think I'm network constrained in the AWS
 environment (producer side) because when I add one more client, my
 MacBook
 Pro, I see a proportionate decrease in EC2 client throughput, and the
 net
 result is an identical 20,000 eps. Stated differently, my EC2 instance
 give
 up throughput when my local MacBook Pro joins the array of producers
 such
 that the throughput is exactly the same.
 
 Does anyone have any additional suggestions on what else I could tune to
 try and hit our goal, 50,000 eps with a 5 machine cluster? Based on the
 whitepapers published, LinkedIn describes a peak of 170,000 events per
 second across their cluster. My 20,000 seems so far away from their
 production figures.
 
 What is the relationship, in terms of performance, between ZK and Kafka?
 Do I need to have a more performant ZK cluster, the same, or does it
 really
 not matter in terms of maximizing throughput.
 
  Thanks for any suggestions – I've been pulling knobs and turning levers
 on
 this for several days now.
 
 
 Jason
 

--
Ken Krugler
+1 530-210-6378
http://www.scaleunlimited.com
custom big data solutions & training
Hadoop, Cascading, Cassandra & Solr







About Kafka Users Group around Hadoop Summit

2013-05-20 Thread Vaibhav Puranik
Jun and Neha,

Is there any plan for a Kafka users group meeting around Hadoop Summit?

It was done last year, and it really works well for people like me who don't
live in the SF Bay Area.
A session on 0.8 would be great.

Regards,
Vaibhav Puranik
GumGum


Re: About Kafka Users Group around Hadoop Summit

2013-05-20 Thread Jonathan Hodges
Great idea, Vaibhav!  I would also be interested in this, as I live in Denver
and don't get to the Bay Area too often.

-Jonathan



On Mon, May 20, 2013 at 2:35 PM, Vaibhav Puranik vpura...@gmail.com wrote:

 Jun and Neha,

 Is there any plan for Kafka Users group meeting around Hadoop Summit?

 It was done last year.  It really works well for  people like me who don't
 live in SF Bay Area.
 A session on 0.8 would be great.

 Regards,
 Vaibhav Puranik
 GumGum



RE: Update: RE: are commitOffsets botched to zookeeper?

2013-05-20 Thread Rob Withers
Yes, it looks spot on.

Thanks,
rob

 -Original Message-
 From: Alex Zuzin [mailto:carna...@gmail.com]
 Sent: Monday, May 20, 2013 11:37 AM
 To: users@kafka.apache.org
 Subject: Re: Update: RE: are commitOffsets botched to zookeeper?
 
 Did so. The proposal looks perfectly sensible on first reading.
 
 I understand that the patches in
 https://issues.apache.org/jira/browse/KAFKA-657 are already in the trunk
 and scheduled for 0.8.1? Are they going out with 0.8? If not, what's ETA for
 0.8.1?
 
 Either way, I'm going to try my hand at backing this with MySQL and report
 the results here shortly.
 
 --
 If you can't conceal it well, expose it with all your might
 Alex Zuzin
 
 
 On Monday, May 20, 2013 at 10:24 AM, Neha Narkhede wrote:
 
  No problem. You can take a look at some of the thoughts we had on
 improving
  the offset storage here -
  https://cwiki.apache.org/confluence/display/KAFKA/Offset+Management.
  Suggestions are welcome.
 
  Thanks,
  Neha
 
 
  On Fri, May 17, 2013 at 2:40 PM, Alex Zuzin carna...@gmail.com
 (mailto:carna...@gmail.com) wrote:
 
   Neha,
  
   apologies, I just re-read what I sent and realized my you wasn't
   specific enough - it meant the Kafka team ;).
  
   --
   If you can't conceal it well, expose it with all your might
   Alex Zuzin
  
  
   On Friday, May 17, 2013 at 2:25 PM, Alex Zuzin wrote:
  
Have you considered abstracting offset storage away so people could
   implement their own?
Would you take a patch if I'd stabbed at it, and if yes, what's the
  
   process (pardon the n00b)?
   
KCBO,
--
If you can't conceal it well, expose it with all your might
Alex Zuzin
   
   
On Friday, May 17, 2013 at 2:22 PM, Neha Narkhede wrote:
   
 There is no particular need for storing the offsets in zookeeper. In
   fact
 with Kafka 0.8, since partitions will be highly available, offsets
   
  
   could be
 stored in Kafka topics. However, we haven't ironed out the design for
   
  
   this
 yet.

 Thanks,
 Neha


 On Fri, May 17, 2013 at 2:19 PM, Scott Clasen sc...@heroku.com
 (mailto:sc...@heroku.com)(mailto:
   sc...@heroku.com (mailto:sc...@heroku.com)) wrote:

  afaik you dont 'have' to store the consumed offsets in zk right,
   this is
  only automatic with some of the clients?
 
  why not store them in a data store that can write at the rate that
   you
  require?
 
 
  On Fri, May 17, 2013 at 2:15 PM, Withers, Robert 
   robert.with...@dish.com (mailto:robert.with...@dish.com)
   wrote:
 
 
 
   Update from our OPS team, regarding zookeeper 3.4.x. Given
   stability,
   adoption of offset batching would be the only remaining bit of
 

   
  
   work to
  
 
 
  do.
   Still, I totally understand the restraint for 0.8...
  
  
   As exercise in upgradability of zookeeper, I did a
   out-of-thebox
   upgrade on Zookeeper. I downloaded a generic distribution of
 Apache
   Zookeeper and used it for the upgrade.
  
   Kafka included version of Zookeeper 3.3.3.
   Out of the box Apache Zookeeper 3.4.5 (which I upgraded to)
  
   Running, working great. I did *not* have to wipe out the
 zookeeper
   databases. All data stayed intact.
  
   I got a new feature, which allows auto-purging of logs. This keeps
   OPS
   maintenance to a minimum.
  
  
   thanks,
   rob
  
  
   -Original Message-
   From: Withers, Robert [mailto:robert.with...@dish.com]
   Sent: Friday, May 17, 2013 7:38 AM
   To: users@kafka.apache.org (mailto:users@kafka.apache.org)
   Subject: RE: are commitOffsets botched to zookeeper?
  
   Fair enough, this is something to look forward to. I appreciate 
   the
   restraint you show to stay out of troubled waters. :)
  
   thanks,
   rob
  
   
   From: Neha Narkhede [neha.narkh...@gmail.com
 (mailto:neha.narkh...@gmail.com) (mailto:
  
 

   
  
   neha.narkh...@gmail.com (mailto:neha.narkh...@gmail.com))]
   Sent: Friday, May 17, 2013 7:35 AM
   To: users@kafka.apache.org (mailto:users@kafka.apache.org)
   Subject: RE: are commitOffsets botched to zookeeper?
  
   Upgrading to a new zookeeper version is not an easy change. Also
  zookeeper
   3.3.4 is much more stable compared to 3.4.x. We think it is better
 
 

   
  
   not to
   club 2 big changes together. So most likely this will be a post 08
 

   
  
   item
  
 
 
  for
   stability purposes.
  
   Thanks,
   Neha
   On May 17, 2013 6:31 AM, Withers, Robert 
  
 
 

   
  
   robert.with...@dish.com (mailto:robert.with...@dish.com)
   wrote:
  
Awesome! 

RE: only-once consumer groups

2013-05-20 Thread Rob Withers
That page is packed full of great design ideas!  Many of these features we
would find useful, I think.  One thing I realized I don't know is what a
consumer rebalance actually is.

Is a rebalance when the thread that is consuming a particular partition
dies, this is detected, and the partition is reassigned to a new thread, thus
violating manual partitioning?

Or is a rebalance when a broker dies and new partition leaders are elected?
The simple consumer must be told of a leader election, and that has nothing
to do with a rebalance, does it?

Thanks,
rob


 -Original Message-
 From: Neha Narkhede [mailto:neha.narkh...@gmail.com]
 Sent: Friday, May 17, 2013 7:32 AM
 To: users@kafka.apache.org
 Subject: RE: only-once consumer groups
 
 We spent some time thinking about consolidating the high level and low
level
 consumer APIs. It will be great if you can read the wiki and provide
feedback
 - https://cwiki.apache.org/confluence/display/KAFKA/Consumer+Client+Re-
 Design
 
 Thanks,
 Neha
 On May 16, 2013 10:29 PM, Rob Withers reefed...@gmail.com wrote:
 
  We want to ensure only-once message processing, but we also want the
  benefit of rebalancing.  commitOffsets updates all partitions from out
  of a connector instance.  We want to commit the offset for just the
  partition that delivered a message to the iterator, even if several
  fetchers are feeding a thread.  Perhaps the message metadata contains
  the partition a msg came from; could you not just update the offset
  for that partition if the property only.once=true is sent to the
  consumer connector?
 
  Thanks,
  rob
 
   -Original Message-
   From: Neha Narkhede [mailto:neha.narkh...@gmail.com]
   Sent: Thursday, May 16, 2013 10:03 PM
   To: users@kafka.apache.org
   Subject: Re: only-once consumer groups
  
   Can you describe your requirements in a little more detail?
  
   Thanks,
   Neha
   On May 16, 2013 6:11 AM, Withers, Robert
 robert.with...@dish.com
   wrote:
  
is it technically feasible to use an only-once simple consumer
within a consumer group?
   
thanks,
rob
 
 



RE: only-once consumer groups

2013-05-20 Thread Neha Narkhede
Rob,

A consumer rebalances whenever a consumer process dies or a new consumer
process joins the group. The details of the algorithm can be found here
http://kafka.apache.org/07/design.html

Thanks,
Neha
On May 20, 2013 6:45 PM, Rob Withers reefed...@gmail.com wrote:

 That page is packed full of super design!  Many of these features we would
 find useful, I think.  One thing I found myself not knowing is what a
 consumer rebalance actually is.

 Is a rebalance when the thread that is consuming a particular partition
 dies, this is detected and the partition is reassigned to a new thread,
 thus
 violating manual partitioning?

 Or is a rebalance when a broker dies and new leader partitions are elected?
 The simple consumer must get told of a leader election and this is nothing
 to do with rebalance, is it so?

 Thanks,
 rob


  -Original Message-
  From: Neha Narkhede [mailto:neha.narkh...@gmail.com]
  Sent: Friday, May 17, 2013 7:32 AM
  To: users@kafka.apache.org
  Subject: RE: only-once consumer groups
 
  We spent some time thinking about consolidating the high level and low
 level
  consumer APIs. It will be great if you can read the wiki and provide
 feedback
  - https://cwiki.apache.org/confluence/display/KAFKA/Consumer+Client+Re-
  Design
 
  Thanks,
  Neha
  On May 16, 2013 10:29 PM, Rob Withers reefed...@gmail.com wrote:
 
   We want to ensure only-once message processing, but we also want the
   benefit of rebalancing.  commitOffsets updates all partitions from out
   of a connector instance.  We want to commit the offset for just the
   partition that delivered a message to the iterator, even if several
   fetchers are feeding a thread.  Perhaps the message metadata contains
   the partition a msg came from; could you not just update the offset
   for that partition if the property only.once=true is sent to the
   consumer connector?
  
   Thanks,
   rob
  
-Original Message-
From: Neha Narkhede [mailto:neha.narkh...@gmail.com]
Sent: Thursday, May 16, 2013 10:03 PM
To: users@kafka.apache.org
Subject: Re: only-once consumer groups
   
Can you describe your requirements in a little more detail?
   
Thanks,
Neha
On May 16, 2013 6:11 AM, Withers, Robert
  robert.with...@dish.com
wrote:
   
 is it technically feasible to use an only-once simple consumer
 within a consumer group?

 thanks,
 rob
  
  




RE: only-once consumer groups

2013-05-20 Thread Rob Withers
Right, Neha, consumer groups are from 0.7, while replicas are in 0.8.  Does
this mean the simple consumer in 0.8 can recognize a leader change?

Thanks,
rob

 -Original Message-
 From: Neha Narkhede [mailto:neha.narkh...@gmail.com]
 Sent: Monday, May 20, 2013 8:00 PM
 To: users@kafka.apache.org
 Subject: RE: only-once consumer groups
 
 Rob,
 
 A consumer rebalances whenever a consumer process dies or a new
 consumer process joins the group. The details of the algorithm can be
found
 here http://kafka.apache.org/07/design.html
 
 Thanks,
 Neha
 On May 20, 2013 6:45 PM, Rob Withers reefed...@gmail.com wrote:
 
  That page is packed full of super design!  Many of these features we
  would find useful, I think.  One thing I found myself not knowing is
  what a consumer rebalance actually is.
 
  Is a rebalance when the thread that is consuming a particular
  partition dies, this is detected and the partition is reassigned to a
  new thread, thus violating manual partitioning?
 
  Or is a rebalance when a broker dies and new leader partitions are
elected?
  The simple consumer must get told of a leader election and this is
  nothing to do with rebalance, is it so?
 
  Thanks,
  rob
 
 
   -Original Message-
   From: Neha Narkhede [mailto:neha.narkh...@gmail.com]
   Sent: Friday, May 17, 2013 7:32 AM
   To: users@kafka.apache.org
   Subject: RE: only-once consumer groups
  
   We spent some time thinking about consolidating the high level and
   low
  level
   consumer APIs. It will be great if you can read the wiki and provide
  feedback
   -
  
 https://cwiki.apache.org/confluence/display/KAFKA/Consumer+Client+Re
   -
   Design
  
   Thanks,
   Neha
   On May 16, 2013 10:29 PM, Rob Withers reefed...@gmail.com wrote:
  
We want to ensure only-once message processing, but we also want
the benefit of rebalancing.  commitOffsets updates all partitions
from out of a connector instance.  We want to commit the offset
for just the partition that delivered a message to the iterator,
even if several fetchers are feeding a thread.  Perhaps the
message metadata contains the partition a msg came from; could you
not just update the offset for that partition if the property
only.once=true is sent to the consumer connector?
   
Thanks,
rob
   
 -Original Message-
 From: Neha Narkhede [mailto:neha.narkh...@gmail.com]
 Sent: Thursday, May 16, 2013 10:03 PM
 To: users@kafka.apache.org
 Subject: Re: only-once consumer groups

 Can you describe your requirements in a little more detail?

 Thanks,
 Neha
 On May 16, 2013 6:11 AM, Withers, Robert
   robert.with...@dish.com
 wrote:

  is it technically feasible to use an only-once simple consumer
  within a consumer group?
 
  thanks,
  rob
   
   
 
 



RE: only-once consumer groups

2013-05-20 Thread Rob Withers
Sorry for being unclear, Neha.  I meant that I had forgotten that the
introduction of replicas is happening in 0.8, and I was confusing the two.

Thanks,
rob

 -Original Message-
 From: Rob Withers [mailto:reefed...@gmail.com]
 Sent: Monday, May 20, 2013 8:21 PM
 To: 'users@kafka.apache.org'
 Subject: RE: only-once consumer groups
 
 Right, Neha, consumer groups are from 7, while replicas are in 8.  Does
this
 mean the simple consumer in 8 can recognize a leader change?
 
 Thanks,
 rob
 
  -Original Message-
  From: Neha Narkhede [mailto:neha.narkh...@gmail.com]
  Sent: Monday, May 20, 2013 8:00 PM
  To: users@kafka.apache.org
  Subject: RE: only-once consumer groups
 
  Rob,
 
  A consumer rebalances whenever a consumer process dies or a new
  consumer process joins the group. The details of the algorithm can be
  found here http://kafka.apache.org/07/design.html
 
  Thanks,
  Neha
  On May 20, 2013 6:45 PM, Rob Withers reefed...@gmail.com wrote:
 
   That page is packed full of super design!  Many of these features we
   would find useful, I think.  One thing I found myself not knowing is
   what a consumer rebalance actually is.
  
   Is a rebalance when the thread that is consuming a particular
   partition dies, this is detected and the partition is reassigned to
   a new thread, thus violating manual partitioning?
  
   Or is a rebalance when a broker dies and new leader partitions are
 elected?
   The simple consumer must get told of a leader election and this is
   nothing to do with rebalance, is it so?
  
   Thanks,
   rob
  
  
-Original Message-
From: Neha Narkhede [mailto:neha.narkh...@gmail.com]
Sent: Friday, May 17, 2013 7:32 AM
To: users@kafka.apache.org
Subject: RE: only-once consumer groups
   
We spent some time thinking about consolidating the high level and
low
   level
consumer APIs. It will be great if you can read the wiki and
provide
   feedback
-
   
  https://cwiki.apache.org/confluence/display/KAFKA/Consumer+Client+Re
-
Design
   
Thanks,
Neha
On May 16, 2013 10:29 PM, Rob Withers reefed...@gmail.com
 wrote:
   
 We want to ensure only-once message processing, but we also want
 the benefit of rebalancing.  commitOffsets updates all
 partitions from out of a connector instance.  We want to commit
 the offset for just the partition that delivered a message to
 the iterator, even if several fetchers are feeding a thread.
 Perhaps the message metadata contains the partition a msg came
 from; could you not just update the offset for that partition if
 the property only.once=true is sent to the consumer connector?

 Thanks,
 rob

  -Original Message-
  From: Neha Narkhede [mailto:neha.narkh...@gmail.com]
  Sent: Thursday, May 16, 2013 10:03 PM
  To: users@kafka.apache.org
  Subject: Re: only-once consumer groups
 
  Can you describe your requirements in a little more detail?
 
  Thanks,
  Neha
  On May 16, 2013 6:11 AM, Withers, Robert
robert.with...@dish.com
  wrote:
 
   is it technically feasible to use an only-once simple
   consumer within a consumer group?
  
   thanks,
   rob


  
  



Re: About Kafka Users Group around Hadoop Summit

2013-05-20 Thread Jun Rao
Yes, we can have a Kafka user group meeting then. We could do this on one
of the evenings (Tue, Wed, or Thu). What would people prefer?

Also, there will be a Kafka talk at Hadoop Summit too.

Thanks,

Jun


On Mon, May 20, 2013 at 1:35 PM, Vaibhav Puranik vpura...@gmail.com wrote:

 Jun and Neha,

 Is there any plan for Kafka Users group meeting around Hadoop Summit?

 It was done last year.  It really works well for  people like me who don't
 live in SF Bay Area.
 A session on 0.8 would be great.

 Regards,
 Vaibhav Puranik
 GumGum