Re: consumer groups in python
We set the partitions the python consumers needs manually for now, I'm looking into a solution using zookeeper (possibly) to balance them out automatically though. On Tue, Mar 17, 2015 at 2:51 PM, Todd Palino tpal...@gmail.com wrote: Yeah, this is exactly correct. The python client does not implement the Zookeeper logic that would be needed to do a balanced consumer. While it's certainly possible to do it (for example, Joe implemented it in Go), the logic is non-trivial and nobody has bothered to this point. I don't think anyone will, as the new consumer will make it much easier to implement clients without needing to do it. In the past, we've used an internal python module that calls a C library underneath that does the balancing. Now we're moving to one that calls our REST interface to Kafka, which is easier to work with. Another option that some consumers use is to pipe messages in from the kafka-console-consumer. This works well, but if you're not careful with stopping it you can easily lose messages. -Todd On Tue, Mar 17, 2015 at 6:47 AM, Sloot, Hans-Peter hans-peter.sl...@atos.net wrote: Thanks I just came across this https://github.com/mumrah/kafka-python/issues/112 It says: That contract of one message per consumer group only works for the coordinated consumers which are implemented for the JVM only (i.e., Scala and Java clients). -Original Message- From: Steve Miller [mailto:st...@idrathernotsay.com] Sent: Tuesday, March 17, 2015 2:18 PM To: users@kafka.apache.org Subject: Re: consumer groups in python It's possible that I just haven't used it but I am reasonably sure that the python API doesn't have a way to store offsets in ZK. You would need to implement something more or less compatible with what the Scala/Java API does, presumably. On the plus side the python API -- possibly just because in python, nothing is truly private (: -- exposes offsets and offset management in ways that those other APIs seem not to. Seeking, say, to approximately 1000 messages before the current offset is no big deal in python, nor is fetching oldest and newest offsets for topics (e.g., if you want to alert if nothing is being produced, without having to fire up a consumer). I have close to zero experience with anything other than the python API and librdkafka but judging from questions I see here those seem to be difficult to do in Scala or Java. I hope to do more with those APIs soon (and in fact am at ScalaDays right now in part so I can attend some intro Scala training (-: ). -Steve On Mar 17, 2015, at 3:54 AM, Sloot, Hans-Peter hans-peter.sl...@atos.net wrote: Hi, I wrote a small python script to consume messages from kafka. The consumer is defined as follows: kafka = KafkaConsumer('my-replicated-topic', metadata_broker_list=['localhost:9092'], group_id='my_consumer_group', auto_commit_enable=True, auto_commit_interval_ms=30 * 1000, auto_offset_reset='smallest') But when I start 2 consumers simultaneously both receive all messages from the topic. I would expect to have 1 consumer about half the number of messages and the other the rest. How can I arrange this? Regards Hans-Peter This e-mail and the documents attached are confidential and intended solely for the addressee; it may also be privileged. If you receive this e-mail in error, please notify the sender immediately and destroy it. As its integrity cannot be secured on the Internet, Atos’ liability cannot be triggered for the message content. Although the sender endeavours to maintain a computer virus-free network, the sender does not warrant that this transmission is virus-free and will not be liable for any damages resulting from any virus transmitted. On all offers and agreements under which Atos Nederland B.V. supplies goods and/or services of whatever nature, the Terms of Delivery from Atos Nederland B.V. exclusively apply. The Terms of Delivery shall be promptly submitted to you on your request. This e-mail and the documents attached are confidential and intended solely for the addressee; it may also be privileged. If you receive this e-mail in error, please notify the sender immediately and destroy it. As its integrity cannot be secured on the Internet, Atos’ liability cannot be triggered for the message content. Although the sender endeavours to maintain a computer virus-free network, the sender does not warrant that this transmission is virus-free and will not be liable for any damages resulting from any virus transmitted. On all offers and agreements under which Atos Nederland B.V. supplies goods and/or services of whatever nature, the Terms of Delivery from Atos Nederland B.V. exclusively apply
Re: consumer groups in python
Yeah, this is exactly correct. The python client does not implement the Zookeeper logic that would be needed to do a balanced consumer. While it's certainly possible to do it (for example, Joe implemented it in Go), the logic is non-trivial and nobody has bothered to this point. I don't think anyone will, as the new consumer will make it much easier to implement clients without needing to do it. In the past, we've used an internal python module that calls a C library underneath that does the balancing. Now we're moving to one that calls our REST interface to Kafka, which is easier to work with. Another option that some consumers use is to pipe messages in from the kafka-console-consumer. This works well, but if you're not careful with stopping it you can easily lose messages. -Todd On Tue, Mar 17, 2015 at 6:47 AM, Sloot, Hans-Peter hans-peter.sl...@atos.net wrote: Thanks I just came across this https://github.com/mumrah/kafka-python/issues/112 It says: That contract of one message per consumer group only works for the coordinated consumers which are implemented for the JVM only (i.e., Scala and Java clients). -Original Message- From: Steve Miller [mailto:st...@idrathernotsay.com] Sent: Tuesday, March 17, 2015 2:18 PM To: users@kafka.apache.org Subject: Re: consumer groups in python It's possible that I just haven't used it but I am reasonably sure that the python API doesn't have a way to store offsets in ZK. You would need to implement something more or less compatible with what the Scala/Java API does, presumably. On the plus side the python API -- possibly just because in python, nothing is truly private (: -- exposes offsets and offset management in ways that those other APIs seem not to. Seeking, say, to approximately 1000 messages before the current offset is no big deal in python, nor is fetching oldest and newest offsets for topics (e.g., if you want to alert if nothing is being produced, without having to fire up a consumer). I have close to zero experience with anything other than the python API and librdkafka but judging from questions I see here those seem to be difficult to do in Scala or Java. I hope to do more with those APIs soon (and in fact am at ScalaDays right now in part so I can attend some intro Scala training (-: ). -Steve On Mar 17, 2015, at 3:54 AM, Sloot, Hans-Peter hans-peter.sl...@atos.net wrote: Hi, I wrote a small python script to consume messages from kafka. The consumer is defined as follows: kafka = KafkaConsumer('my-replicated-topic', metadata_broker_list=['localhost:9092'], group_id='my_consumer_group', auto_commit_enable=True, auto_commit_interval_ms=30 * 1000, auto_offset_reset='smallest') But when I start 2 consumers simultaneously both receive all messages from the topic. I would expect to have 1 consumer about half the number of messages and the other the rest. How can I arrange this? Regards Hans-Peter This e-mail and the documents attached are confidential and intended solely for the addressee; it may also be privileged. If you receive this e-mail in error, please notify the sender immediately and destroy it. As its integrity cannot be secured on the Internet, Atos’ liability cannot be triggered for the message content. Although the sender endeavours to maintain a computer virus-free network, the sender does not warrant that this transmission is virus-free and will not be liable for any damages resulting from any virus transmitted. On all offers and agreements under which Atos Nederland B.V. supplies goods and/or services of whatever nature, the Terms of Delivery from Atos Nederland B.V. exclusively apply. The Terms of Delivery shall be promptly submitted to you on your request. This e-mail and the documents attached are confidential and intended solely for the addressee; it may also be privileged. If you receive this e-mail in error, please notify the sender immediately and destroy it. As its integrity cannot be secured on the Internet, Atos’ liability cannot be triggered for the message content. Although the sender endeavours to maintain a computer virus-free network, the sender does not warrant that this transmission is virus-free and will not be liable for any damages resulting from any virus transmitted. On all offers and agreements under which Atos Nederland B.V. supplies goods and/or services of whatever nature, the Terms of Delivery from Atos Nederland B.V. exclusively apply. The Terms of Delivery shall be promptly submitted to you on your request.
RE: consumer groups in python
Thanks I just came across this https://github.com/mumrah/kafka-python/issues/112 It says: That contract of one message per consumer group only works for the coordinated consumers which are implemented for the JVM only (i.e., Scala and Java clients). -Original Message- From: Steve Miller [mailto:st...@idrathernotsay.com] Sent: Tuesday, March 17, 2015 2:18 PM To: users@kafka.apache.org Subject: Re: consumer groups in python It's possible that I just haven't used it but I am reasonably sure that the python API doesn't have a way to store offsets in ZK. You would need to implement something more or less compatible with what the Scala/Java API does, presumably. On the plus side the python API -- possibly just because in python, nothing is truly private (: -- exposes offsets and offset management in ways that those other APIs seem not to. Seeking, say, to approximately 1000 messages before the current offset is no big deal in python, nor is fetching oldest and newest offsets for topics (e.g., if you want to alert if nothing is being produced, without having to fire up a consumer). I have close to zero experience with anything other than the python API and librdkafka but judging from questions I see here those seem to be difficult to do in Scala or Java. I hope to do more with those APIs soon (and in fact am at ScalaDays right now in part so I can attend some intro Scala training (-: ). -Steve On Mar 17, 2015, at 3:54 AM, Sloot, Hans-Peter hans-peter.sl...@atos.net wrote: Hi, I wrote a small python script to consume messages from kafka. The consumer is defined as follows: kafka = KafkaConsumer('my-replicated-topic', metadata_broker_list=['localhost:9092'], group_id='my_consumer_group', auto_commit_enable=True, auto_commit_interval_ms=30 * 1000, auto_offset_reset='smallest') But when I start 2 consumers simultaneously both receive all messages from the topic. I would expect to have 1 consumer about half the number of messages and the other the rest. How can I arrange this? Regards Hans-Peter This e-mail and the documents attached are confidential and intended solely for the addressee; it may also be privileged. If you receive this e-mail in error, please notify the sender immediately and destroy it. As its integrity cannot be secured on the Internet, Atos’ liability cannot be triggered for the message content. Although the sender endeavours to maintain a computer virus-free network, the sender does not warrant that this transmission is virus-free and will not be liable for any damages resulting from any virus transmitted. On all offers and agreements under which Atos Nederland B.V. supplies goods and/or services of whatever nature, the Terms of Delivery from Atos Nederland B.V. exclusively apply. The Terms of Delivery shall be promptly submitted to you on your request. This e-mail and the documents attached are confidential and intended solely for the addressee; it may also be privileged. If you receive this e-mail in error, please notify the sender immediately and destroy it. As its integrity cannot be secured on the Internet, Atos’ liability cannot be triggered for the message content. Although the sender endeavours to maintain a computer virus-free network, the sender does not warrant that this transmission is virus-free and will not be liable for any damages resulting from any virus transmitted. On all offers and agreements under which Atos Nederland B.V. supplies goods and/or services of whatever nature, the Terms of Delivery from Atos Nederland B.V. exclusively apply. The Terms of Delivery shall be promptly submitted to you on your request.