Re: can producers (from same system) send messages to separate broker systems?
Hi Neha,

No, I haven't experienced any noticeable latency as of now. The high-priority data is too critical for any sort of latency; that's why I wanted to optimize everything before deployment. I'm using 0.7.2 since my consumer is a Storm spout, and I read that Storm is most compatible with 0.7.2. Does 0.7.2 have that feature as well?

On Sat, May 18, 2013 at 2:55 AM, Neha Narkhede neha.narkh...@gmail.com wrote:

Do you have any tests that measure that your high-priority data is being delayed? Assuming you are using 0.8, the end-to-end latency can be reduced by tuning some configs on the consumer (fetch.min.bytes, fetch.wait.max.ms). The defaults for these configs are already tuned for low latency, though. Thanks, Neha

On Tue, May 14, 2013 at 11:46 AM, Chitra Raveendran chitra.raveend...@fluturasolutions.com wrote:

Thanks for the reply. I have a 3-node Kafka cluster and 2 topics (one of very high priority, the other normal data). I need to transmit the normal data to two brokers in the cluster, and the high-priority data directly to the 3rd broker, so that my high-priority data has a clear path and can be transmitted without any delay at all. What should I do to achieve this? Is specifying an appropriate broker.list for each producer enough?

On Tue, May 14, 2013 at 9:11 PM, Neha Narkhede neha.narkh...@gmail.com wrote:

Yes, there can be. You just need to make sure they are configured with separate broker.list. Thanks, Neha

On May 14, 2013 5:28 AM, Andrea Gazzarini andrea.gazzar...@gmail.com wrote:

Of course... what is the problem? Or maybe you're missing some other constraint?

On 05/14/2013 02:20 PM, Chitra Raveendran wrote:

Hi, can there be 2 producers on the same server, both sending their own separate topics to separate broker systems?

-- Chitra Raveendran, Data Scientist, Flutura Business Solutions Pvt. Ltd, Tel: +918197563660, email: chitra.raveend...@fluturasoutions.com
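For the routing discussed above (normal traffic to two brokers, high-priority traffic to a dedicated third broker), the 0.7 producer can simply be pointed at different broker subsets via a static broker.list. A minimal sketch; the broker ids, host names, and ports below are placeholders, not values from this thread:

```properties
# producer-normal.properties -- normal-priority topics go to brokers 1 and 2
# (the 0.7 broker.list format is brokerid:host:port, comma-separated)
broker.list=1:kafka1.example.com:9092,2:kafka2.example.com:9092

# producer-priority.properties -- the high-priority topic goes to broker 3 only
broker.list=3:kafka3.example.com:9092
```

Note that with a static broker.list the producer bypasses zookeeper-based broker discovery, so each producer only ever talks to the brokers it lists.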
Re: can producers (from same system) send messages to separate broker systems?
The feature I mentioned is only available in 0.8. In 0.7.2, you can tweak the producer batch size and the flush interval on the broker for the high-priority topics. Note that setting those too low will have performance implications. Thanks, Neha
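For 0.7.2, the knobs Neha mentions map roughly to the broker's log-flush settings and the async producer's batch size. A hedged sketch; these are 0.7-era property names and the values are illustrative starting points to experiment with, not recommendations (as noted above, flushing too aggressively hurts throughput):

```properties
# server.properties (broker) -- flush to disk more aggressively for lower
# end-to-end latency; log.flush.interval is a message count, so 1 means
# flush after every message (expensive):
log.flush.interval=1
log.default.flush.interval.ms=100

# producer.properties (async producer) -- keep batches small so messages
# are not held back waiting for a full batch:
batch.size=1
```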
Re: What happens if one broker goes down
0.7.2 does not support replication, so when a broker goes down there can be some data loss. If you are OK with duplicates, you can configure the producer-side retries to be higher. Thanks, Neha

On May 19, 2013 11:32 PM, Chitra Raveendran chitra.raveend...@fluturasolutions.com wrote:

Hi, when my broker went down, my producer just stopped reading the file, giving an exception saying the connection to the Kafka broker was refused. I tried overcoming the exception with a try/catch, but I'm seeing data loss. What am I doing wrong? Please help. For reference: I'm using Kafka 0.7.2 since my consumer is a Storm spout and I read that Storm is most compatible with Kafka 0.7.2. It doesn't support replication, right?

On Fri, May 17, 2013 at 6:56 PM, Neha Narkhede neha.narkh...@gmail.com wrote:

You can read the high-level design of Kafka replication here: http://www.slideshare.net/junrao/kafka-replication-apachecon2013 Generally, if your replication factor is more than 1, you shouldn't see data loss in your test. When a broker fails, the producer will get an exception and it will retry. Thanks, Neha

On May 16, 2013 10:21 PM, Chitra Raveendran chitra.raveend...@fluturasolutions.com wrote:

Hi, I was just doing some benchmarking with a 3-node cluster. If one broker goes down, what happens? Does the producer go down? That's what happened in my case. Is the data lost? Or does it get distributed amongst the other brokers?

-- Chitra Raveendran, Data Scientist, Flutura | Decision Sciences & Analytics, mail: chitra.raveend...@fluturasoutions.com
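The producer-side retry settings Neha refers to look roughly like this in 0.8 (these are 0.8-era property names; 0.7.2 exposes fewer of them, and the values here are illustrative, not recommendations):

```properties
# producer.properties -- retry harder when a broker fails; note that
# retries can produce duplicates, which this thread assumes are acceptable
message.send.max.retries=10
retry.backoff.ms=500
# wait for the leader to acknowledge each message before considering it sent
request.required.acks=1
```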
Re: hi plz reply me
You can do that using Kafka. Please read the design details here: http://kafka.apache.org/07/design.html Thanks, Neha

On May 20, 2013 6:57 AM, satya prakash satyacusa...@gmail.com wrote:

I am using Kafka. I need to create one message on the producer side and send it to multiple consumers...?
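In Kafka's model, every consumer group receives its own copy of each published message, so fanning one message out to multiple consumers just means giving each consumer connector a different group id. A sketch using the 0.7 property name (`groupid`; it became `group.id` in 0.8); the group names are made up for illustration:

```properties
# consumer-1.properties -- first independent reader of the topic
groupid=reader-one

# consumer-2.properties -- second independent reader; because the group id
# differs, this consumer also receives every message published to the topic
groupid=reader-two
```

Consumers sharing the same group id would instead split the partitions between them, each message going to only one of them.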
RE: are commitOffsets batched to zookeeper?
Hi Neha,

Is moving to zookeeper 3.4.x a big change? Can you please explain which parts it affects, the consumer API for example?

Thanks, Balaji

-----Original Message-----
From: Neha Narkhede [mailto:neha.narkh...@gmail.com]
Sent: Friday, May 17, 2013 7:35 AM
To: users@kafka.apache.org
Subject: RE: are commitOffsets batched to zookeeper?

Upgrading to a new zookeeper version is not an easy change. Also, zookeeper 3.3.4 is much more stable compared to 3.4.x. We think it is better not to club 2 big changes together, so most likely this will be a post-0.8 item, for stability purposes. Thanks, Neha

On May 17, 2013 6:31 AM, Withers, Robert robert.with...@dish.com wrote:

Awesome! Thanks for the clarification. I would like to offer my strong vote that this get tackled before a beta, to get it firmly into 0.8. Stabilize everything else to the existing use, but make offset updates batched. thanks, rob

From: Neha Narkhede [neha.narkh...@gmail.com]
Sent: Friday, May 17, 2013 7:17 AM
To: users@kafka.apache.org
Subject: RE: are commitOffsets batched to zookeeper?

Sorry, I wasn't clear. Zookeeper 3.4.x has this feature. As soon as 0.8 is stable and released, it will be worth looking into when we can use zookeeper 3.4.x. Thanks, Neha

On May 16, 2013 10:32 PM, Rob Withers reefed...@gmail.com wrote:

Can a request be made to zookeeper for this feature? Thanks, rob

-----Original Message-----
From: Neha Narkhede [mailto:neha.narkh...@gmail.com]
Sent: Thursday, May 16, 2013 9:53 PM
To: users@kafka.apache.org
Subject: Re: are commitOffsets batched to zookeeper?

Currently Kafka depends on zookeeper 3.3.4, which doesn't have a batch write API. So if you commit after every message at a high rate, it will be slow and inefficient. Besides, it will cause zookeeper performance to degrade. Thanks, Neha

On May 16, 2013 6:54 PM, Rob Withers reefed...@gmail.com wrote:

We are calling commitOffsets after every message consumption. It looks to be ~60% slower, with 29 partitions. If a single KafkaStream thread is from a connector, and there are 29 partitions, then commitOffsets sends 29 offset updates, correct? Are these offset updates batched in one send to zookeeper? thanks, rob
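The slowdown Rob observed comes from issuing one zookeeper write per consumed message. Pending a batch API, the usual workaround is to commit every N messages instead of after every message. A language-agnostic sketch of that pattern (this is not the Kafka consumer API; the class and names are invented for illustration):

```python
# Sketch: commit offsets every N messages instead of after each one,
# trading a small replay window on failure for far fewer zookeeper writes.
class OffsetTracker:
    def __init__(self, commit_interval=100):
        self.commit_interval = commit_interval
        self.uncommitted = 0  # messages consumed since the last commit
        self.commits = 0      # how many commit round-trips we issued

    def on_message(self):
        self.uncommitted += 1
        if self.uncommitted >= self.commit_interval:
            self.commit()

    def commit(self):
        # In a real consumer this is where commitOffsets() would be called.
        if self.uncommitted:
            self.commits += 1
            self.uncommitted = 0

tracker = OffsetTracker(commit_interval=100)
for _ in range(1000):
    tracker.on_message()
print(tracker.commits)  # 10 commits instead of 1000
```

The trade-off: if the consumer crashes, up to `commit_interval - 1` messages may be re-delivered on restart, so this suits at-least-once processing.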
RE: are commitOffsets batched to zookeeper?
Zookeeper 3.4.x is API-compatible. However, to get the full benefits, we will have to change the Kafka code to use the batch API that zookeeper 3.4.x provides. Also, we use the zkclient library to interface with zookeeper; we might have to patch that to use zookeeper 3.4.x APIs. Thanks, Neha

On May 20, 2013 9:36 AM, Seshadri, Balaji balaji.sesha...@dish.com wrote:

Hi Neha, Is moving to zookeeper 3.4.x a big change? Can you please explain which parts it affects, the consumer API for example? Thanks, Balaji
Re: Relationship between Zookeeper and Kafka
My guess: EBS is likely your bottleneck. Try running on instance local disks and compare your results. Is this 0.8? What replication factor are you using?

On Mon, May 20, 2013 at 8:11 AM, Jason Weiss jason_we...@rapid7.com wrote:

I'm trying to maximize my throughput and seem to have hit a ceiling. Everything described below is running in AWS. I have configured a Kafka cluster with 5 machines, M1.Large, with 600 provisioned IOPS storage for each EC2 instance. I have a Zookeeper server (we aren't in production yet, so I didn't take the time to set up a ZK cluster). Publishing to a single topic from 7 different clients, I seem to max out at around 20,000 eps with a fixed 2K message size. Each broker defines 10 file segments, with a 25000 message / 5 second flush configuration in server.properties. I have stuck with 8 threads. My producers (Java) are configured with batch.num.messages at 50 and queue.buffering.max.messages at 100. When I went from 4 servers in the cluster to 5 servers, I only saw an increase of about 500 events per second in throughput.

In sharp contrast, when I run a complete environment on my MacBook Pro, tuned as described above but with a single ZK and a single Kafka broker, I am seeing 61,000 events per second. I don't think I'm network-constrained in the AWS environment (producer side), because when I add one more client, my MacBook Pro, I see a proportionate decrease in EC2 client throughput, and the net result is an identical 20,000 eps. Stated differently, my EC2 instances give up throughput when my local MacBook Pro joins the array of producers, such that the throughput is exactly the same.

Does anyone have any additional suggestions on what else I could tune to try and hit our goal, 50,000 eps with a 5-machine cluster? Based on the whitepapers published, LinkedIn describes a peak of 170,000 events per second across their cluster. My 20,000 seems so far away from their production figures.

What is the relationship, in terms of performance, between ZK and Kafka? Do I need a more performant ZK cluster, the same, or does it really not matter in terms of maximizing throughput? Thanks for any suggestions; I've been pulling knobs and turning levers on this for several days now. Jason

This electronic message contains information which may be confidential or privileged. The information is intended for the use of the individual or entity named above. If you are not the intended recipient, be aware that any disclosure, copying, distribution or use of the contents of this information is prohibited. If you have received this electronic transmission in error, please notify us by e-mail at (postmas...@rapid7.com) immediately.
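On the producer side, the batching settings Jason lists are quite conservative for a throughput goal. One thing to experiment with is letting the async producer buffer and batch more aggressively; a hedged sketch reusing the property names from the thread, with illustrative values to sweep and measure rather than recommendations:

```properties
# producer.properties -- larger batches amortize per-request overhead at
# the cost of per-message latency; sweep these values and benchmark
producer.type=async
batch.num.messages=500
queue.buffering.max.messages=20000
```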
Re: Relationship between Zookeeper and Kafka
Ahh, yeah, PIOPS is definitely faster than standard EBS, but still much slower than local disk. You could try benchmarking local disk to see what the instances you are using are capable of, then try tweaking IOPS etc. to see where you get. M1.Larges aren't super fast, so your MacBook beating them isn't surprising to me.

On Mon, May 20, 2013 at 10:01 AM, Jason Weiss jason_we...@rapid7.com wrote:

Hi Scott. I'm using Kafka 0.7.2. I am using the default replication factor, since I don't recall changing that configuration at all. I'm using provisioned IOPS, which from attending the AWS event in NYC a few weeks ago was presented as the fastest storage option for EC2. A number of partners presented success stories in terms of throughput with provisioned IOPS. I've tried to follow that model. Jason
Re: Update: RE: are commitOffsets batched to zookeeper?
Did so. The proposal looks perfectly sensible on first reading. I understand that the patches in https://issues.apache.org/jira/browse/KAFKA-657 are already in the trunk and scheduled for 0.8.1? Are they going out with 0.8? If not, what's the ETA for 0.8.1? Either way, I'm going to try my hand at backing this with MySQL and report the results here shortly.

-- "If you can't conceal it well, expose it with all your might" Alex Zuzin

On Monday, May 20, 2013 at 10:24 AM, Neha Narkhede wrote:

No problem. You can take a look at some of the thoughts we had on improving the offset storage here: https://cwiki.apache.org/confluence/display/KAFKA/Offset+Management Suggestions are welcome. Thanks, Neha

On Fri, May 17, 2013 at 2:40 PM, Alex Zuzin carna...@gmail.com wrote:

Neha, apologies, I just re-read what I sent and realized my "you" wasn't specific enough - it meant the Kafka team ;).

On Friday, May 17, 2013 at 2:25 PM, Alex Zuzin wrote:

Have you considered abstracting offset storage away so people could implement their own? Would you take a patch if I'd stabbed at it, and if yes, what's the process (pardon the n00b)? KCBO,

On Friday, May 17, 2013 at 2:22 PM, Neha Narkhede wrote:

There is no particular need for storing the offsets in zookeeper. In fact, with Kafka 0.8, since partitions will be highly available, offsets could be stored in Kafka topics. However, we haven't ironed out the design for this yet. Thanks, Neha

On Fri, May 17, 2013 at 2:19 PM, Scott Clasen sc...@heroku.com wrote:

AFAIK you don't 'have' to store the consumed offsets in zk, right? This is only automatic with some of the clients. Why not store them in a data store that can write at the rate that you require?

On Fri, May 17, 2013 at 2:15 PM, Withers, Robert robert.with...@dish.com wrote:

Update from our OPS team, regarding zookeeper 3.4.x. Given stability, adoption of offset batching would be the only remaining bit of work to do. Still, I totally understand the restraint for 0.8... As an exercise in the upgradability of zookeeper, I did an out-of-the-box upgrade. I downloaded a generic distribution of Apache Zookeeper and used it for the upgrade. Kafka's included version of Zookeeper: 3.3.3. Out of the box: Apache Zookeeper 3.4.5 (which I upgraded to). Running, working great. I did *not* have to wipe out the zookeeper databases; all data stayed intact. I got a new feature which allows auto-purging of logs, which keeps OPS maintenance to a minimum. thanks, rob

-----Original Message-----
From: Withers, Robert [mailto:robert.with...@dish.com]
Sent: Friday, May 17, 2013 7:38 AM
To: users@kafka.apache.org
Subject: RE: are commitOffsets batched to zookeeper?

Fair enough, this is something to look forward to. I appreciate the restraint you show to stay out of troubled waters. :) thanks, rob
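Scott's suggestion above (store consumed offsets in a data store that can keep up with your write rate) and Alex's plan to back offset storage with MySQL can be sketched as a small pluggable store. The sketch below uses SQLite as a stand-in for MySQL so it is self-contained; the table, column, and class names are all invented for illustration and are not part of any Kafka API:

```python
# Sketch of externalized offset storage, SQLite standing in for MySQL.
# One transaction commits a whole batch of (topic, partition) offsets,
# which is exactly the batching zookeeper 3.3.4 lacks.
import sqlite3

class SqlOffsetStore:
    def __init__(self, conn):
        self.conn = conn
        conn.execute(
            "CREATE TABLE IF NOT EXISTS offsets ("
            "  consumer_group TEXT, topic TEXT, part INTEGER,"
            "  next_offset INTEGER,"
            "  PRIMARY KEY (consumer_group, topic, part))"
        )

    def commit(self, group, offsets):
        # offsets: {(topic, partition): next_offset} -- one transaction per batch
        with self.conn:
            self.conn.executemany(
                "INSERT OR REPLACE INTO offsets VALUES (?, ?, ?, ?)",
                [(group, t, p, o) for (t, p), o in offsets.items()],
            )

    def fetch(self, group, topic, partition):
        row = self.conn.execute(
            "SELECT next_offset FROM offsets"
            " WHERE consumer_group=? AND topic=? AND part=?",
            (group, topic, partition),
        ).fetchone()
        return row[0] if row else 0

store = SqlOffsetStore(sqlite3.connect(":memory:"))
store.commit("group-a", {("events", 0): 42, ("events", 1): 7})
print(store.fetch("group-a", "events", 0))  # 42
```

Swapping SQLite for MySQL here is mostly a driver change, since the batch-per-transaction shape is the same.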
Re: Relationship between Zookeeper and Kafka
Hi Jason,

In my experience, directly hitting an ephemeral drive on m1.large is faster than using EBS. I've seen some articles where RAIDing multiple EBS volumes can exceed the performance of ephemeral drives, but with high variability. If you want to maximize performance, set up a (smaller) cluster of SSD-backed instances with 10Gb Ethernet in the same cluster group, e.g. test with three cr1.8xlarge instances.

-- Ken Krugler, +1 530-210-6378, http://www.scaleunlimited.com, custom big data solutions & training: Hadoop, Cascading, Cassandra, Solr
About Kafka Users Group around Hadoop Summit
Jun and Neha,

Is there any plan for a Kafka users group meeting around Hadoop Summit? It was done last year, and it really works well for people like me who don't live in the SF Bay Area. A session on 0.8 would be great.

Regards, Vaibhav Puranik, GumGum
Re: About Kafka Users Group around Hadoop Summit
Great idea, Vaibhav! I would also be interested in this, as I live in Denver and don't get to the Bay Area too often. -Jonathan
RE: Update: RE: are commitOffsets batched to zookeeper?
Yes, it looks spot on. Thanks, rob

-----Original Message-----
From: Alex Zuzin [mailto:carna...@gmail.com]
Sent: Monday, May 20, 2013 11:37 AM
To: users@kafka.apache.org
Subject: Re: Update: RE: are commitOffsets batched to zookeeper?

Did so. The proposal looks perfectly sensible on first reading. I understand that the patches in https://issues.apache.org/jira/browse/KAFKA-657 are already in the trunk and scheduled for 0.8.1? Are they going out with 0.8? If not, what's the ETA for 0.8.1? Either way, I'm going to try my hand at backing this with MySQL and report the results here shortly.
RE: only-once consumer groups
That page is packed full of super design! Many of these features we would find useful, I think. One thing I found myself not knowing is what a consumer rebalance actually is. Is a rebalance when the thread that is consuming a particular partition dies, this is detected, and the partition is reassigned to a new thread, thus violating manual partitioning? Or is a rebalance when a broker dies and new leader partitions are elected? The simple consumer must get told of a leader election, and this has nothing to do with rebalance, is that so? Thanks, rob

-----Original Message-----
From: Neha Narkhede [mailto:neha.narkh...@gmail.com]
Sent: Friday, May 17, 2013 7:32 AM
To: users@kafka.apache.org
Subject: RE: only-once consumer groups

We spent some time thinking about consolidating the high-level and low-level consumer APIs. It would be great if you could read the wiki and provide feedback: https://cwiki.apache.org/confluence/display/KAFKA/Consumer+Client+Re-Design Thanks, Neha

On May 16, 2013 10:29 PM, Rob Withers reefed...@gmail.com wrote:

We want to ensure only-once message processing, but we also want the benefit of rebalancing. commitOffsets updates all partitions from out of a connector instance. We want to commit the offset for just the partition that delivered a message to the iterator, even if several fetchers are feeding a thread. Perhaps the message metadata contains the partition a msg came from; could you not just update the offset for that partition if the property only.once=true is sent to the consumer connector? Thanks, rob

-----Original Message-----
From: Neha Narkhede [mailto:neha.narkh...@gmail.com]
Sent: Thursday, May 16, 2013 10:03 PM
To: users@kafka.apache.org
Subject: Re: only-once consumer groups

Can you describe your requirements in a little more detail? Thanks, Neha

On May 16, 2013 6:11 AM, Withers, Robert robert.with...@dish.com wrote:

Is it technically feasible to use an only-once simple consumer within a consumer group? thanks, rob
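The per-partition commit Rob asks for (commit the offset of only the partition that delivered a message, instead of all partitions at once) can be sketched as a small tracker. This is not the Kafka consumer API and the hypothetical `only.once` property does not exist; the class and names below are invented purely to illustrate the bookkeeping:

```python
# Sketch: track per-partition consumption and commit one partition at a
# time, rather than committing every partition's offset in one call.
class PartitionOffsets:
    def __init__(self):
        self.pending = {}    # partition -> next offset to commit
        self.committed = {}  # partition -> last committed "next offset"

    def on_message(self, partition, offset):
        # Record that everything up to and including `offset` was processed.
        self.pending[partition] = offset + 1

    def commit_partition(self, partition):
        # Commit only this partition's offset; others stay pending.
        if partition in self.pending:
            self.committed[partition] = self.pending.pop(partition)

offsets = PartitionOffsets()
offsets.on_message(partition=3, offset=41)
offsets.on_message(partition=5, offset=9)
offsets.commit_partition(3)
print(offsets.committed)  # {3: 42}; partition 5 is still uncommitted
```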
RE: only-once consumer groups
Rob, A consumer rebalances whenever a consumer process dies or a new consumer process joins the group. The details of the algorithm can be found here: http://kafka.apache.org/07/design.html

Thanks,
Neha

On May 20, 2013 6:45 PM, Rob Withers reefed...@gmail.com wrote:
[earlier messages quoted in full above; trimmed]
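The rebalance Neha describes uses a deterministic range assignment: each consumer independently computes the same partition split from the same sorted views of group membership, so all members agree without coordinating directly. A sketch of that assignment, roughly following the 0.7 design doc (function name and signature are mine, not Kafka's):

```python
# Sketch of the deterministic range assignment a consumer computes during
# a rebalance (per the 0.7 design doc). Every consumer runs the same
# computation over the same sorted membership data, so all agree.

def range_assign(partitions, consumers):
    """Map each consumer id to its slice of a topic's partitions."""
    parts = sorted(partitions)
    members = sorted(consumers)
    per_member, extra = divmod(len(parts), len(members))
    assignment, start = {}, 0
    for i, member in enumerate(members):
        # The first `extra` consumers each take one additional partition
        # when partitions don't divide evenly.
        count = per_member + (1 if i < extra else 0)
        assignment[member] = parts[start:start + count]
        start += count
    return assignment
```

For example, 5 partitions across 2 consumers gives the first consumer partitions [0, 1, 2] and the second [3, 4]. This also shows why manual partition ownership and rebalancing conflict, as Rob worried: any membership change recomputes the whole assignment.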
RE: only-once consumer groups
Right, Neha, consumer groups are from 0.7, while replicas are in 0.8. Does this mean the simple consumer in 0.8 can recognize a leader change?

Thanks,
rob

-----Original Message-----
From: Neha Narkhede [mailto:neha.narkh...@gmail.com]
Sent: Monday, May 20, 2013 8:00 PM
To: users@kafka.apache.org
Subject: RE: only-once consumer groups
[earlier messages quoted in full above; trimmed]
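On Rob's leader-change question: in 0.8 a SimpleConsumer-style client is not pushed a notification when a new leader is elected. It discovers the change by hitting a not-leader error on a fetch and re-requesting topic metadata to find the new leader. A hedged sketch of that loop (`fetch_from` and `lookup_leader` are stand-ins for the real fetch and metadata requests, not actual client calls):

```python
# Sketch of leader-failover handling in a SimpleConsumer-style client.
# `fetch_from` and `lookup_leader` are hypothetical stand-ins for the
# real fetch request and topic-metadata request.

NOT_LEADER = "NotLeaderForPartition"

def fetch_with_leader_failover(fetch_from, lookup_leader, topic, partition,
                               offset, max_retries=3):
    leader = lookup_leader(topic, partition)
    for _ in range(max_retries):
        result = fetch_from(leader, topic, partition, offset)
        if result != NOT_LEADER:
            return result
        # The broker says it no longer leads this partition: refresh
        # metadata and retry against the newly elected leader.
        leader = lookup_leader(topic, partition)
    raise RuntimeError("leader not found after retries")
```

The point is that leader election is handled client-side by reacting to errors, which is separate from the group rebalance logic discussed above.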
RE: only-once consumer groups
Sorry for being unclear, Neha. I meant that I had forgotten that the introduction of replicas is happening in 0.8, and I was confusing the two.

Thanks,
rob

-----Original Message-----
From: Rob Withers [mailto:reefed...@gmail.com]
Sent: Monday, May 20, 2013 8:21 PM
To: 'users@kafka.apache.org'
Subject: RE: only-once consumer groups
[earlier messages quoted in full above; trimmed]
Re: About Kafka Users Group around Hadoop Summit
Yes, we can have a Kafka user group meeting then. We could do it on one of the evenings (Tue, Wed, or Thu). Which would people prefer? Also, there will be a Kafka talk at Hadoop Summit too.

Thanks,
Jun

On Mon, May 20, 2013 at 1:35 PM, Vaibhav Puranik vpura...@gmail.com wrote:

Jun and Neha, Is there any plan for a Kafka users group meeting around Hadoop Summit? It was done last year. It really works well for people like me who don't live in the SF Bay Area. A session on 0.8 would be great.

Regards,
Vaibhav Puranik
GumGum