Re: a few questions from high level consumer documentation.
Your understanding is correct. There should be no message loss, unless the
number of correlated failures is larger than the replication factor.

Thanks,
Jun

On Mon, May 13, 2013 at 8:46 AM, Yu, Libo <libo...@citi.com> wrote:
> Thanks for answering my questions. Now I know why the offset is saved in
> ZooKeeper. If a consumer group has only one consumer, when it fails and
> restarts, I assume it starts consuming from the offset saved in ZooKeeper.
> Is that right? If that is the case, then the consumer client does not need
> to worry about duplicate messages. Is there any chance that messages will
> be lost?
>
> Regards,
> Libo
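Jun's point about resuming from the checkpointed offset can be sketched as
follows. This is a hypothetical simulation, not the Kafka API: a shared map
stands in for ZooKeeper, and the consumer commits its offset after each
processed message, so a restarted consumer carries on from where the
previous run stopped and no message is skipped.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only (not the Kafka API): a single consumer that
// checkpoints its offset to a central store after every processed message.
public class ResumeFromCheckpoint {
    // Stand-in for ZooKeeper: consumer group -> next offset to read.
    static final Map<String, Long> zk = new HashMap<>();

    // Consume up to maxMessages from `log`, starting at the checkpointed
    // offset for `group`, committing the offset after each message.
    static List<String> consume(String group, List<String> log, int maxMessages) {
        List<String> processed = new ArrayList<>();
        long offset = zk.getOrDefault(group, 0L);
        while (offset < log.size() && processed.size() < maxMessages) {
            processed.add(log.get((int) offset));
            offset++;
            zk.put(group, offset); // checkpoint: survives a consumer restart
        }
        return processed;
    }
}
```

If the consumer dies after processing three messages, a second run resumes
at the fourth: nothing is lost. (A crash *between* processing a message and
committing its offset is what produces the duplicates discussed below.)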
Re: a few questions from high level consumer documentation.
Thanks,
Neha

On May 9, 2013 5:28 AM, Chris Curtin <curtin.ch...@gmail.com> wrote:
> On Thu, May 9, 2013 at 12:36 AM, Rob Withers <reefed...@gmail.com> wrote:
>> -----Original Message-----
>> From: Chris Curtin [mailto:curtin.ch...@gmail.com]
>>
>>> 1. When you say the iterator may block, do you mean hasNext() may block?
>>> Yes.
>>
>> Is this due to a potential non-blocking fetch (broker/zookeeper returns
>> an empty block if the offset is current)? Yet this blocks the network
>> call of the consumer iterator, do I have that right? Are there other
>> reasons it could block, like the call failing and a backup call being
>> made?
>
> I'll let the Kafka team answer this. I don't know the low-level details.

It is because the consumer could be at the tail end and new data could
arrive at the server at a later time. The consumer is blocking by default
to handle a continuous stream of data.

>>> b. For client crash, what can the client do to avoid duplicate messages
>>> when restarted? What I can think of is to read the last message from
>>> the log file and ignore the first few received duplicate messages until
>>> receiving the last read message. But is it possible for the client to
>>> read the log file directly?
>>> If you can't tolerate the possibility of duplicates you need to look at
>>> the Simple Consumer example; there you control the offset storage.
>>
>> Do you have example code that manages only-once, even when a consumer
>> for a given partition goes away?
>
> No, but if you look at the Simple Consumer example where the read occurs
> (and the write to System.out), at that point you know the offset you just
> read, so you need to put it somewhere. Using the Simple Consumer, Kafka
> leaves all the offset management to you.
>
>> What does happen with rebalancing when a consumer goes away?
>
> Hmm, I can't find the link to the algorithm right now. Jun or Neha, can
> you?

You can find the algorithm on the design page:
http://kafka.apache.org/07/design.html

>> Is this behavior of the high-level consumer group?

Yes.

>> Is there a way to supply one's own simple consumer with only-once,
>> within a consumer group that rebalances?
>
> No. Simple Consumers don't have rebalancing steps. Basically you take
> control of what is requested from which topics and partitions. So you
> could ask for a specific offset in a topic/partition 100 times in a row
> and Kafka will happily return it to you. Nothing is written to ZooKeeper
> either; you control everything.
>
>> What happens if a producer goes away?
>
> Shouldn't matter to the consumers. The brokers are what the consumers
> talk to, so if nothing is writing, the broker won't have anything to
> send.
>
>> thanks much,
>> rob
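The manual offset management Chris describes for the Simple Consumer can be
sketched like this. All names here are illustrative, not Kafka classes: the
idea is that after each read you record the next expected offset together
with the processed result, so a replayed fetch of an already-processed
offset (e.g. after a restart) is detected and dropped.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of "store the offset yourself" with the Simple
// Consumer: process the message and advance the offset in one step, so a
// redelivered (offset, message) pair is recognized as a duplicate.
public class ManualOffsetStore {
    private long nextOffset = 0;           // offset we expect to process next
    private final List<String> output = new ArrayList<>();

    // Deliver one (offset, message) pair, possibly replayed after a crash.
    // Returns true if processed, false if dropped as a duplicate.
    public synchronized boolean deliver(long offset, String message) {
        if (offset < nextOffset) {
            return false;                  // already processed: drop it
        }
        // Process and advance the stored offset together, so the two
        // cannot diverge across a restart.
        output.add(message);
        nextOffset = offset + 1;
        return true;
    }

    public List<String> output() { return output; }
}
```

In a real deployment the offset and the processed result would go into the
same durable store (e.g. one database transaction), which is what makes the
deduplication survive a process crash.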
a few questions from high level consumer documentation.
Hi,

I read this link
https://cwiki.apache.org/KAFKA/consumer-group-example.html
and have a few questions (if not too many).

1. When you say the iterator may block, do you mean hasNext() may block?

2. "Remember, you can only use a single process per Consumer Group." Do you
mean we can only use a single process on one node of the cluster for a
consumer group? Or can there be only one process on the whole cluster for a
consumer group? Please clarify.

3. Why save the offset to ZooKeeper? Isn't it easier to save it to a local
file?

4. "When a client exits/crashes or the leader for a partition is changed,
duplicate messages may be replayed. To help avoid this (replayed duplicate
messages), make sure you provide a clean way for your client to exit
instead of assuming it can be 'kill -9'd."

a. For client exit, if the client is receiving data at the time, how does
it do a clean exit? How can the client tell the consumer to write the
offset to ZooKeeper before exiting?

b. For a client crash, what can the client do to avoid duplicate messages
when restarted? What I can think of is to read the last message from the
log file and ignore the first few received duplicate messages until
receiving the last read message. But is it possible for the client to read
the log file directly?

c. For a change of the partition leader, is there anything clients can do
to avoid duplicates?

Thanks.
Libo
Re: a few questions from high level consumer documentation.
I'll try to answer some; the Kafka team will need to answer the others:

On Wed, May 8, 2013 at 12:17 PM, Yu, Libo <libo...@citi.com> wrote:
> Hi,
>
> I read this link
> https://cwiki.apache.org/KAFKA/consumer-group-example.html
> and have a few questions (if not too many).
>
> 1. When you say the iterator may block, do you mean hasNext() may block?

Yes.

> 2. "Remember, you can only use a single process per Consumer Group." Do
> you mean we can only use a single process on one node of the cluster for
> a consumer group? Or can there be only one process on the whole cluster
> for a consumer group? Please clarify.

Bug. I'll change it. When I wrote this I misunderstood the re-balancing
step. I missed this reference but fixed the others. Sorry.

> 3. Why save the offset to ZooKeeper? Isn't it easier to save it to a
> local file?
>
> 4. "When a client exits/crashes or the leader for a partition is changed,
> duplicate messages may be replayed. To help avoid this (replayed
> duplicate messages), make sure you provide a clean way for your client to
> exit instead of assuming it can be 'kill -9'd."
>
> a. For client exit, if the client is receiving data at the time, how does
> it do a clean exit? How can the client tell the consumer to write the
> offset to ZooKeeper before exiting?

If you call the shutdown() method on the Consumer it will cleanly stop,
releasing any blocked iterators. In the example it goes to sleep for a few
seconds, then cleanly shuts down.

> b. For a client crash, what can the client do to avoid duplicate messages
> when restarted? What I can think of is to read the last message from the
> log file and ignore the first few received duplicate messages until
> receiving the last read message. But is it possible for the client to
> read the log file directly?

If you can't tolerate the possibility of duplicates you need to look at the
Simple Consumer example; there you control the offset storage.

> c. For a change of the partition leader, is there anything clients can do
> to avoid duplicates?
>
> Thanks.
> Libo
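The clean-exit pattern for (a) can be sketched as below. shutdown() on the
connector is the call the example uses; the ConsumerConnector interface
here is a minimal stand-in for the real Kafka class, and a JVM shutdown
hook is one common way to make sure it runs on a normal exit or Ctrl-C
(though, as the wiki warns, nothing can run on 'kill -9').

```java
// Illustrative sketch: ensure the consumer's shutdown() is called before
// the JVM exits, so blocked iterators are released and the offset is
// written out. ConsumerConnector is a stand-in, not the real Kafka class.
public class CleanShutdown {
    interface ConsumerConnector {
        void shutdown();
    }

    // Register a shutdown hook that cleanly stops the consumer; the hook
    // is returned so callers (or tests) can manage it directly.
    static Thread installHook(ConsumerConnector consumer) {
        Thread hook = new Thread(consumer::shutdown);
        Runtime.getRuntime().addShutdownHook(hook);
        return hook;
    }
}
```

After installing the hook, the main loop can simply iterate over the
stream; when the process receives SIGTERM, the hook fires, shutdown()
unblocks hasNext(), and the loop ends cleanly.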
Re: a few questions from high level consumer documentation.
For #3, we need to checkpoint offsets to a central place so that if a
consumer fails, another consumer in the same group can pick up from where
it left off.

For #4c, a leader change doesn't introduce duplicates.

Thanks,
Jun

On Wed, May 8, 2013 at 9:17 AM, Yu, Libo <libo...@citi.com> wrote:
> Hi,
>
> I read this link
> https://cwiki.apache.org/KAFKA/consumer-group-example.html
> and have a few questions (if not too many).
>
> 1. When you say the iterator may block, do you mean hasNext() may block?
>
> 2. "Remember, you can only use a single process per Consumer Group." Do
> you mean we can only use a single process on one node of the cluster for
> a consumer group? Or can there be only one process on the whole cluster
> for a consumer group? Please clarify.
>
> 3. Why save the offset to ZooKeeper? Isn't it easier to save it to a
> local file?
>
> 4. "When a client exits/crashes or the leader for a partition is changed,
> duplicate messages may be replayed. To help avoid this (replayed
> duplicate messages), make sure you provide a clean way for your client to
> exit instead of assuming it can be 'kill -9'd."
>
> a. For client exit, if the client is receiving data at the time, how does
> it do a clean exit? How can the client tell the consumer to write the
> offset to ZooKeeper before exiting?
>
> b. For a client crash, what can the client do to avoid duplicate messages
> when restarted? What I can think of is to read the last message from the
> log file and ignore the first few received duplicate messages until
> receiving the last read message. But is it possible for the client to
> read the log file directly?
>
> c. For a change of the partition leader, is there anything clients can do
> to avoid duplicates?
>
> Thanks.
> Libo
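Jun's answer to #3 can be sketched as follows, with a shared map standing
in for ZooKeeper (all names hypothetical). The key point is that offsets
are keyed by group and partition in a *central* store, so any surviving
consumer in the group can read the failed consumer's position; a file local
to the failed host could not provide that.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch of a central offset store: any consumer in the
// group, on any host, can fetch the last committed position for a
// partition, which is why a host-local file is not enough.
public class CentralOffsetStore {
    private final Map<String, Long> offsets = new HashMap<>();

    private static String key(String group, int partition) {
        return group + "/" + partition;
    }

    public synchronized void commit(String group, int partition, long offset) {
        offsets.put(key(group, partition), offset);
    }

    // Returns 0 for a partition the group has never consumed from.
    public synchronized long fetch(String group, int partition) {
        return offsets.getOrDefault(key(group, partition), 0L);
    }
}
```

If the consumer that committed offset 42 for partition 0 dies, the consumer
that is assigned partition 0 after rebalancing simply fetches 42 and
resumes there.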