Hey Nicolas, There are two topics in question here:
1. Checkpoint topic. 2. State topic. (1) is used to store the input offsets for every SystemStreamPartition that's consumed by your StreamTask. (2) is used to store the writes/deletes by your StreamTask to a state store. The first thing to establish is what you mean by "lost". In Kafka 0.8+, topics are assumed to be highly available. If a cluster knows about a topic (i.e. It has an entry in ZooKeeper), but the brokers that contain the data for the topic are not available, consumers/producers will get either a timeout (if they were talking to a dead broker) or a "no leader for partition" exception. Now, if the topic/partition exists, and a leader is available for the partition, but the leader has no data, then we might call it "lost". In this case, a consumer/producer just sees the topic/partition as empty. With this definition, we can "lose" either the checkpoint topic or the state store topic. If the checkpoint topic is lost, the Samza job will start using samza.offset.default. Likewise, if the state store topic is "lost", Samza would just treat the store as empty. Cheers, Chris On 7/21/14 11:25 AM, "Nicolas Bär" <[email protected]> wrote: >Hi All > >I'm looking into the case of Kafka broker failures with regard to offset >management and state checkpointing. As far as I understood the offsets and >state checkpoints are stored in separate Kafka topics and happen at the >same time. Therefore if a samza job fails it will restore the state from >the corresponding Kafka topics. >Now what happens if for example the offset topic is available, but the >data >of the checkpoint topic is lost due to Kafka broker failures? I'm guessing >Kafka restores the offset and continues as if there was never any state >stored. Is this correct? > > >Best >Nicolas
