Hi Jun,

Please find my comments inline.


On Thu, Apr 18, 2024 at 3:26 AM Jun Rao <j...@confluent.io.invalid> wrote:

> Hi, Abhijeet,
>
> Thanks for the reply.
>
> 1. I am wondering if we could achieve the same result by just lowering
> local.retention.ms and local.retention.bytes. This also allows the newly
> started follower to build up the local data before serving the consumer
> traffic.
>

I am not sure I fully followed this. Do you mean we could lower
local.retention.ms and local.retention.bytes so that there is little data
on the leader's local storage, allowing the follower to catch up with the
leader quickly?

In that case, we would need to set a small local retention across all
brokers in the cluster. That has the undesired effect of increasing
remote log fetches when serving consumer fetch requests, which can cause
performance degradation. Also, this behavior (increased remote fetches)
would occur on all brokers at all times, whereas the KIP restricts it to
newly bootstrapped brokers, and only until they have fully built their
local logs as per the retention defined at the cluster level.
(Deprioritization of the broker could help reduce the impact even
further.)
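
For illustration, lowering local retention for a topic would look
something like the following (topic name and values here are
hypothetical); note that this applies to every replica of the topic at
all times, not just to newly bootstrapped brokers:

    kafka-configs.sh --bootstrap-server localhost:9092 --alter \
      --entity-type topics --entity-name my-topic \
      --add-config local.retention.ms=3600000,local.retention.bytes=1073741824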


>
> 2. Have you updated the KIP?
>

The KIP has been updated now.
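
For anyone following the thread, here is a rough sketch of what fetching
the new offset could look like from a client, assuming the proposed -6
sentinel were eventually surfaced through the Admin API as a hypothetical
OffsetSpec.earliestPendingUpload() (the released API only has specs such
as earliest(), latest() and forTimestamp()):

    import java.util.Map;
    import java.util.Properties;
    import org.apache.kafka.clients.admin.Admin;
    import org.apache.kafka.clients.admin.AdminClientConfig;
    import org.apache.kafka.clients.admin.ListOffsetsResult;
    import org.apache.kafka.clients.admin.OffsetSpec;
    import org.apache.kafka.common.TopicPartition;

    public class EarliestPendingUploadLookup {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG,
                "localhost:9092");
            try (Admin admin = Admin.create(props)) {
                TopicPartition tp = new TopicPartition("my-topic", 0);
                // Hypothetical spec mapping to the proposed -6 timestamp.
                ListOffsetsResult result = admin.listOffsets(
                    Map.of(tp, OffsetSpec.earliestPendingUpload()));
                ListOffsetsResult.ListOffsetsResultInfo info =
                    result.partitionResult(tp).get();
                // The response carries both the offset and its leader
                // epoch, so no second call to the leader is needed.
                System.out.println("offset=" + info.offset()
                    + " epoch=" + info.leaderEpoch());
            }
        }
    }

Since the ListOffsets response carries the leader epoch alongside the
offset, a single round trip suffices; this is also why we preferred this
approach over deriving last-tiered-offset + 1.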


>
> Thanks,
>
> Jun
>
> On Tue, Apr 9, 2024 at 3:36 AM Satish Duggana <satish.dugg...@gmail.com>
> wrote:
>
> > +1 to Jun on adding the consumer-fetching-from-a-follower scenario to
> > the existing section that describes the drawback when a node built
> > with last-tiered-offset becomes a leader. As Abhijeet mentioned, we
> > plan to have a follow-up KIP that will address these issues by
> > deprioritizing these brokers. The deprioritization of those brokers
> > can be removed once they catch up to the local log retention.
> >
> > Thanks,
> > Satish.
> >
> > On Tue, 9 Apr 2024 at 14:08, Luke Chen <show...@gmail.com> wrote:
> > >
> > > Hi Abhijeet,
> > >
> > > Thanks for the KIP to improve the tiered storage feature!
> > >
> > > Questions:
> > > 1. We could also get the "pending-upload-offset" and epoch via
> > > remote log metadata, instead of adding a new API to fetch from the
> > > leader. Could you explain why you chose the latter approach, instead
> > > of the former?
> > > 2.
> > > > We plan to have a follow-up KIP that will address both the
> > > > deprioritization of these brokers from leadership, as well as for
> > > > consumption (when fetching from followers is allowed).
> > >
> > > I agree with Jun that we might need to make clear all the possible
> > > drawbacks this could have. So, could we add the drawbacks that Jun
> > > mentioned about the performance issue when consumers fetch from
> > > followers?
> > >
> > > 3. Could we add a "Rejected Alternatives" section at the end of the
> > > KIP to cover some of them?
> > > Like the "ListOffsetRequest" approach vs. the
> > > "Earliest-Pending-Upload-Offset" approach, or getting the
> > > "Earliest-Pending-Upload-Offset" from remote log metadata... etc.
> > >
> > > Thanks.
> > > Luke
> > >
> > >
> > > On Tue, Apr 9, 2024 at 2:25 PM Abhijeet Kumar <
> > > abhijeet.cse....@gmail.com> wrote:
> > >
> > > > Hi Christo,
> > > >
> > > > Please find my comments inline.
> > > >
> > > > On Fri, Apr 5, 2024 at 12:36 PM Christo Lolov <
> > > > christolo...@gmail.com> wrote:
> > > >
> > > > > Hello Abhijeet and Jun,
> > > > >
> > > > > I have been mulling this KIP over a bit more in recent days!
> > > > >
> > > > > re: Jun
> > > > >
> > > > > I wasn't aware we apply 2.1 and 2.2 for reserving new timestamps
> > > > > - in retrospect it should have been fairly obvious. I would need
> > > > > to go and update KIP-1005 myself then, thank you for giving the
> > > > > useful reference!
> > > > >
> > > > > 4. I think Abhijeet wants to rebuild state from
> > > > > latest-tiered-offset and fetch from latest-tiered-offset + 1 only
> > > > > for new replicas (or replicas which experienced a disk failure)
> > > > > to decrease the time a partition spends in the under-replicated
> > > > > state. In other words, a follower which has just fallen out of
> > > > > ISR, but has local data, will continue using today's Tiered
> > > > > Storage replication protocol (i.e. fetching from earliest-local).
> > > > > I further believe he has taken this approach so that the local
> > > > > state of replicas which have just fallen out of ISR isn't
> > > > > forcefully wiped, thus leading to situation 1. Abhijeet, have I
> > > > > understood (and summarised) what you are proposing correctly?
> > > > >
> > > >
> > > > Yes, your understanding is correct. We want to limit the behavior
> > > > changes only to new replicas.
> > > >
> > > >
> > > > > 5. I think in today's Tiered Storage we know the leader's
> > > > > log-start-offset from the FetchResponse, and we can learn its
> > > > > local-log-start-offset from ListOffsets by asking for the
> > > > > earliest-local timestamp (-4). But granted, this ought to be
> > > > > added as an additional API call in the KIP.
> > > > >
> > > >
> > > > Yes, I clarified this in my reply to Jun. I will add this missing
> > > > detail in the KIP.
> > > >
> > > >
> > > > > re: Abhijeet
> > > > >
> > > > > 101. I am still a bit confused as to why you want to include a
> > > > > new offset (i.e. pending-upload-offset) when you yourself mention
> > > > > that you could use an already existing offset (i.e.
> > > > > last-tiered-offset + 1). In essence, you end your Motivation with
> > > > > "In this KIP, we will focus only on the follower fetch protocol
> > > > > using the *last-tiered-offset*" and then in the following
> > > > > sections you talk about pending-upload-offset. I understand this
> > > > > might be classified as an implementation detail, but if you
> > > > > introduce a new offset (i.e. pending-upload-offset) you have to
> > > > > make a change to the ListOffsets API (i.e. introduce -6) and thus
> > > > > document it in this KIP as such. However, the last-tiered-offset
> > > > > ought to already be exposed as part of KIP-1005 (under
> > > > > implementation). Am I misunderstanding something here?
> > > > >
> > > >
> > > > I have tried to clarify this in my reply to Jun.
> > > >
> > > > > The follower needs to build the local data starting from the
> > > > > offset Earliest-Pending-Upload-Offset. Hence it needs the offset
> > > > > and the corresponding leader-epoch.
> > > > > There are two ways to do this:
> > > > >    1. We add support in ListOffsetRequest to be able to fetch
> > > > > this offset (and leader epoch) from the leader.
> > > > >    2. Or, fetch the tiered-offset (which is already supported).
> > > > > From this offset, we can get the Earliest-Pending-Upload-Offset
> > > > > by just adding 1 to the tiered-offset.
> > > > >       However, we still need the leader epoch for that offset,
> > > > > since there is no guarantee that the leader epoch for
> > > > > Earliest-Pending-Upload-Offset will be the same as the leader
> > > > > epoch for tiered-offset. We may need another API call to the
> > > > > leader for this.
> > > > > I prefer the first approach. The only problem with the first
> > > > > approach is that it introduces one more offset. The second
> > > > > approach avoids this problem but is a little complicated.
> > > >
> > > >
> > > >
> > > > >
> > > > > Best,
> > > > > Christo
> > > > >
> > > > > On Thu, 4 Apr 2024 at 19:37, Jun Rao <j...@confluent.io.invalid>
> > > > > wrote:
> > > > >
> > > > > > Hi, Abhijeet,
> > > > > >
> > > > > > Thanks for the KIP. Left a few comments.
> > > > > >
> > > > > > 1. "A drawback of using the last-tiered-offset is that this new
> > > > follower
> > > > > > would possess only a limited number of locally stored segments.
> > Should
> > > > it
> > > > > > ascend to the role of leader, there is a risk of needing to fetch
> > these
> > > > > > segments from the remote storage, potentially impacting broker
> > > > > > performance."
> > > > > > Since we support consumers fetching from followers, this is a
> > potential
> > > > > > issue on the follower side too. In theory, it's possible for a
> > segment
> > > > to
> > > > > > be tiered immediately after rolling. In that case, there could be
> > very
> > > > > > little data after last-tiered-offset. It would be useful to think
> > > > through
> > > > > > how to address this issue.
> > > > > >
> > > > > > 2. ListOffsetsRequest:
> > > > > > 2.1 Typically, we need to bump up the version of the request
> > > > > > if we add a new value for timestamp. See:
> > > > > > https://github.com/apache/kafka/pull/10760/files#diff-fac7080d67da905a80126d58fc1745c9a1409de7ef7d093c2ac66a888b134633
> > > > > > 2.2 Since this changes the inter-broker request protocol, it
> > > > > > would be useful to have a section on the upgrade path (e.g. a
> > > > > > new IBP/metadata.version).
> > > > > >
> > > > > > 3. "Instead of fetching Earliest-Pending-Upload-Offset, it could
> > fetch
> > > > > the
> > > > > > last-tiered-offset from the leader, and make a separate leader
> > call to
> > > > > > fetch leader epoch for the following offset."
> > > > > > Why do we need to make a separate call for the leader epoch?
> > > > > > ListOffsetsResponse include both the offset and the corresponding
> > > > epoch.
> > > > > >
> > > > > > 4. "Check if the follower replica is empty and if the feature to
> > use
> > > > > > last-tiered-offset is enabled."
> > > > > > Why do we need to check if the follower replica is empty?
> > > > > >
> > > > > > 5. "It can be confirmed by checking if the leader's
> > > > > > Log-Start-Offset is the same as the leader's
> > > > > > Local-Log-Start-Offset."
> > > > > > How does the follower know the Local-Log-Start-Offset?
> > > > > >
> > > > > > Jun
> > > > > >
> > > > > > On Sat, Mar 30, 2024 at 5:51 AM Abhijeet Kumar <
> > > > > > abhijeet.cse....@gmail.com> wrote:
> > > > > >
> > > > > > > Hi Christo,
> > > > > > >
> > > > > > > Thanks for reviewing the KIP.
> > > > > > >
> > > > > > > The follower needs the earliest-pending-upload-offset (and the
> > > > > > > corresponding leader epoch) from the leader.
> > > > > > > This is the first offset the follower will have locally.
> > > > > > >
> > > > > > > Regards,
> > > > > > > Abhijeet.
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > On Fri, Mar 29, 2024 at 1:14 PM Christo Lolov <
> > > > > > > christolo...@gmail.com> wrote:
> > > > > > >
> > > > > > > > Heya!
> > > > > > > >
> > > > > > > > First of all, thank you very much for the proposal, you
> > > > > > > > have explained the problem you want solved very well - I
> > > > > > > > think a faster bootstrap of an empty replica is definitely
> > > > > > > > an improvement!
> > > > > > > >
> > > > > > > > For my understanding, which concrete offset do you want
> > > > > > > > the leader to give back to a follower -
> > > > > > > > earliest-pending-upload-offset or the latest-tiered-offset?
> > > > > > > > If it is the second, then I believe KIP-1005 ought to
> > > > > > > > already be exposing that offset as part of the ListOffsets
> > > > > > > > API, no?
> > > > > > > >
> > > > > > > > Best,
> > > > > > > > Christo
> > > > > > > >
> > > > > > > > On Wed, 27 Mar 2024 at 18:23, Abhijeet Kumar <
> > > > > > > > abhijeet.cse....@gmail.com> wrote:
> > > > > > > >
> > > > > > > > > Hi All,
> > > > > > > > >
> > > > > > > > > I have created KIP-1023 to introduce follower fetch from
> > > > > > > > > tiered offset. This feature will be helpful in
> > > > > > > > > significantly reducing Kafka rebalance/rebuild times when
> > > > > > > > > the cluster is enabled with tiered storage.
> > > > > > > > >
> > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-1023%3A+Follower+fetch+from+tiered+offset
> > > > > > > > >
> > > > > > > > > Feedback and suggestions are welcome.
> > > > > > > > >
> > > > > > > > > Regards,
> > > > > > > > > Abhijeet.
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> >
>
