On a single server, the retention window is always approximate and a lower bound on what is retained, since we only discard full log segments at a time. That is, if you say you want to retain 100GB and have a 1GB segment size, we will discard the oldest segment only when doing so would not bring the retained data below 100GB (and similarly with time).
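(For illustration, here is a minimal sketch of that size-based rule. It is not Kafka's actual cleanup code; the class name, the list of segment sizes, and the example numbers are hypothetical, chosen only to mirror the 100GB/1GB example above.)

import java.util.ArrayList;
import java.util.List;

public class RetentionSketch {

    // Returns how many of the oldest segments may be discarded. A segment is
    // dropped only if the data that remains would still be at or above the
    // retention target, so the configured retention is a lower bound.
    static int segmentsToDiscard(List<Long> segmentSizesOldestFirst, long retentionBytes) {
        long total = segmentSizesOldestFirst.stream().mapToLong(Long::longValue).sum();
        int discard = 0;
        for (long segmentSize : segmentSizesOldestFirst) {
            if (total - segmentSize >= retentionBytes) {
                total -= segmentSize;
                discard++;
            } else {
                break;
            }
        }
        return discard;
    }

    public static void main(String[] args) {
        // Hypothetical log: 105 segments of ~1GB each, 100GB retention target.
        List<Long> segments = new ArrayList<>();
        for (int i = 0; i < 105; i++) {
            segments.add(1_000_000_000L);
        }
        // Discards 5 segments, leaving exactly 100GB retained.
        System.out.println("segments to discard: "
                + segmentsToDiscard(segments, 100_000_000_000L));
    }
}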
Between servers, no attempt is made to synchronize the discard of data. That is, it is likely that all replicas will discard at roughly the same time, but this is purely a local computation for each of them. Since retention is approximate and a lower bound, it does not seem useful to try to synchronize this further. If your consumers are bumping up against the retention window so closely that they may actually be falling off, that is a problem. Indeed, even in the absence of a leader change, it is likely that if you are lagging this much you will eventually fall off the end of the retention window on the leader. So this is either a problem of the retention being too small (double it) or the consumer being fundamentally unable to keep up (in which case no amount of retention will help).

-Jay

On Wed, Aug 28, 2013 at 2:51 PM, Luke Forehand <[email protected]> wrote:

> I'm running into strange behavior when testing failure scenarios. I have
> 4 brokers and 8 partitions for a topic called "feed". I wrote a piece of
> code that prints out the partitionId, leaderId, and earliest offset for
> each partition.
>
> Here is the printed information about partition leader earliest offsets:
>
> partition:0 leader:0 offset: 1676913
> partition:1 leader:1 offset: 0
> partition:2 leader:2 offset: 0
> partition:3 leader:0 offset: 1676760
> partition:4 leader:0 offset: 1676635
> partition:5 leader:1 offset: 0
> partition:6 leader:2 offset: 0
> partition:7 leader:0 offset: 1676101
>
> I then kill broker 0 (using kill <pid>) and re-run my program:
>
> partition:0 leader:1 offset: 0
> partition:1 leader:1 offset: 0
> partition:2 leader:2 offset: 0
> partition:3 leader:3 offset: 0
> partition:4 leader:1 offset: 0
> partition:5 leader:1 offset: 0
> partition:6 leader:2 offset: 0
> partition:7 leader:1 offset: 0
>
> As you can see, the leaders have changed where the leader was broker 0.
> However, the earliest offset has also changed. I was under the impression
> that a replica must have the same offset range, otherwise it would confuse
> the consumer of the partition. For example, I run into an issue where,
> during a failover test, my consumer tries to request an offset in a
> partition on the new leader but the offset didn't exist (it was earlier
> than the earliest offset in that partition). Can anybody explain what is
> happening?
>
> Here is my code that prints the leader partition offset information:
> https://gist.github.com/lukeforehand/c37e22aea7192e00fff5
>
> Thanks,
> Luke
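(For reference, a rough sketch of how the per-partition leader and earliest offset in the quoted output could be printed. Luke's gist would have used the 0.8 SimpleConsumer API; this sketch uses the current Java client instead, and the broker address and class name are placeholder assumptions.)

import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Properties;

import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.PartitionInfo;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.StringDeserializer;

public class EarliestOffsets {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder broker address
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            // Look up the partitions and current leaders for the "feed" topic.
            List<PartitionInfo> infos = consumer.partitionsFor("feed");
            List<TopicPartition> partitions = new ArrayList<>();
            for (PartitionInfo info : infos) {
                partitions.add(new TopicPartition(info.topic(), info.partition()));
            }
            // Earliest (log start) offset for each partition.
            Map<TopicPartition, Long> earliest = consumer.beginningOffsets(partitions);
            for (PartitionInfo info : infos) {
                long offset = earliest.get(new TopicPartition(info.topic(), info.partition()));
                int leaderId = info.leader() != null ? info.leader().id() : -1;
                System.out.println("partition:" + info.partition()
                        + " leader:" + leaderId
                        + " offset: " + offset);
            }
        }
    }
}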
