On a single server, the retention window is always approximate and a lower bound on what is retained, since we only discard full log segments at a time. That is, if you say you want to retain 100GB and have a 1GB segment size, we will discard the oldest segment only when doing so would not bring the retained data below 100GB (and similarly with time).
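(For illustration, here is a minimal sketch of that size-based rule. It is not Kafka's actual cleanup code; the class name, the list of segment sizes, and the example numbers are hypothetical, chosen only to mirror the 100GB/1GB example above.)

import java.util.ArrayList;
import java.util.List;

public class RetentionSketch {

    // Returns how many of the oldest segments may be discarded. A segment is
    // dropped only if the data that remains would still be at or above the
    // retention target, so the configured retention is a lower bound.
    static int segmentsToDiscard(List<Long> segmentSizesOldestFirst, long retentionBytes) {
        long total = segmentSizesOldestFirst.stream().mapToLong(Long::longValue).sum();
        int discard = 0;
        for (long segmentSize : segmentSizesOldestFirst) {
            if (total - segmentSize >= retentionBytes) {
                total -= segmentSize;
                discard++;
            } else {
                break;
            }
        }
        return discard;
    }

    public static void main(String[] args) {
        // Hypothetical log: 105 segments of ~1GB each, 100GB retention target.
        List<Long> segments = new ArrayList<>();
        for (int i = 0; i < 105; i++) {
            segments.add(1_000_000_000L);
        }
        // Discards 5 segments, leaving exactly 100GB retained.
        System.out.println("segments to discard: "
                + segmentsToDiscard(segments, 100_000_000_000L));
    }
}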
Between servers, no attempt is made to synchronize the discard of data. That is, it is likely that all replicas will discard at roughly the same time, but this is purely a local computation for each of them. Since retention is approximate and a lower bound, it does not seem useful to try to synchronize this further. If your consumers are bumping up against the retention window so closely that they may actually be falling off, that is a problem. Indeed, even in the absence of a leader change, it is likely that if you are lagging this much you will eventually fall off the end of the retention window on the leader. So this is either a problem of the retention being too small (double it) or the consumer being fundamentally unable to keep up (in which case no amount of retention will help).

-Jay

On Wed, Aug 28, 2013 at 2:51 PM, Luke Forehand <[email protected]> wrote:

> I'm running into strange behavior when testing failure scenarios. I have
> 4 brokers and 8 partitions for a topic called "feed". I wrote a piece of
> code that prints out the partitionId, leaderId, and earliest offset for
> each partition.
>
> Here is the printed information about partition leader earliest offsets:
>
> partition:0 leader:0 offset: 1676913
> partition:1 leader:1 offset: 0
> partition:2 leader:2 offset: 0
> partition:3 leader:0 offset: 1676760
> partition:4 leader:0 offset: 1676635
> partition:5 leader:1 offset: 0
> partition:6 leader:2 offset: 0
> partition:7 leader:0 offset: 1676101
>
> I then kill broker 0 (using kill <pid>) and re-run my program:
>
> partition:0 leader:1 offset: 0
> partition:1 leader:1 offset: 0
> partition:2 leader:2 offset: 0
> partition:3 leader:3 offset: 0
> partition:4 leader:1 offset: 0
> partition:5 leader:1 offset: 0
> partition:6 leader:2 offset: 0
> partition:7 leader:1 offset: 0
>
> As you can see, the leaders have changed where the leader was broker 0.
> However, the earliest offset has also changed. I was under the impression
> that a replica must have the same offset range, otherwise it would confuse
> the consumer of the partition. For example, I run into an issue where,
> during a failover test, my consumer tries to request an offset in a
> partition on the new leader but the offset didn't exist (it was earlier
> than the earliest offset in that partition). Can anybody explain what is
> happening?
>
> Here is my code that prints the leader partition offset information:
> https://gist.github.com/lukeforehand/c37e22aea7192e00fff5
>
> Thanks,
> Luke
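(For reference, a rough sketch of how the per-partition leader and earliest offset in the quoted output could be printed. Luke's gist would have used the 0.8 SimpleConsumer API; this sketch uses the current Java client instead, and the broker address and class name are placeholder assumptions.)

import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Properties;

import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.PartitionInfo;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.StringDeserializer;

public class EarliestOffsets {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder broker address
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            // Look up the partitions and current leaders for the "feed" topic.
            List<PartitionInfo> infos = consumer.partitionsFor("feed");
            List<TopicPartition> partitions = new ArrayList<>();
            for (PartitionInfo info : infos) {
                partitions.add(new TopicPartition(info.topic(), info.partition()));
            }
            // Earliest (log start) offset for each partition.
            Map<TopicPartition, Long> earliest = consumer.beginningOffsets(partitions);
            for (PartitionInfo info : infos) {
                long offset = earliest.get(new TopicPartition(info.topic(), info.partition()));
                int leaderId = info.leader() != null ? info.leader().id() : -1;
                System.out.println("partition:" + info.partition()
                        + " leader:" + leaderId
                        + " offset: " + offset);
            }
        }
    }
}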
