Hi, Abhijeet,

Thanks for the update.

I have no more comments.

Luke

On Thu, Apr 25, 2024 at 4:21 AM Jun Rao <j...@confluent.io.invalid> wrote:

> Hi, Abhijeet,
>
> Thanks for the updated KIP. It looks good to me.
>
> Jun
>
> On Mon, Apr 22, 2024 at 12:08 PM Abhijeet Kumar <abhijeet.cse....@gmail.com> wrote:
>
> > Hi Jun,
> >
> > Please find my comments inline.
> >
> >
> > On Thu, Apr 18, 2024 at 3:26 AM Jun Rao <j...@confluent.io.invalid> wrote:
> >
> > > Hi, Abhijeet,
> > >
> > > Thanks for the reply.
> > >
> > > 1. I am wondering if we could achieve the same result by just lowering
> > > local.retention.ms and local.retention.bytes. This also allows the newly
> > > started follower to build up the local data before serving the consumer
> > > traffic.
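> > >
> > > For illustration, this is roughly what lowering the two configs for a
> > > topic could look like via the Admin client (a minimal sketch; the topic
> > > name and the values are placeholders):
> > >
> > > import java.util.Collection;
> > > import java.util.List;
> > > import java.util.Map;
> > > import org.apache.kafka.clients.admin.Admin;
> > > import org.apache.kafka.clients.admin.AdminClientConfig;
> > > import org.apache.kafka.clients.admin.AlterConfigOp;
> > > import org.apache.kafka.clients.admin.ConfigEntry;
> > > import org.apache.kafka.common.config.ConfigResource;
> > >
> > > public class LowerLocalRetention {
> > >     public static void main(String[] args) throws Exception {
> > >         try (Admin admin = Admin.create(Map.<String, Object>of(
> > >                 AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "broker:9092"))) {
> > >             ConfigResource topic =
> > >                 new ConfigResource(ConfigResource.Type.TOPIC, "orders");
> > >             // Keep only ~10 minutes / 1 GiB locally; older segments
> > >             // are then served from remote storage.
> > >             admin.incrementalAlterConfigs(Map.<ConfigResource, Collection<AlterConfigOp>>of(
> > >                 topic, List.of(
> > >                     new AlterConfigOp(new ConfigEntry("local.retention.ms", "600000"),
> > >                                       AlterConfigOp.OpType.SET),
> > >                     new AlterConfigOp(new ConfigEntry("local.retention.bytes", "1073741824"),
> > >                                       AlterConfigOp.OpType.SET)))).all().get();
> > >         }
> > >     }
> > > }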
> > >
> >
> > I am not sure I fully followed this. Do you mean we could lower the
> > local.retention (by size and time) so that there is little data on the
> > leader's local storage, and the follower can quickly catch up with the
> > leader?
> >
> > In that case, we would need to set a small local retention across all
> > brokers in the cluster. That has the undesired effect of increased
> > remote log fetches for serving consumer fetch requests, which can cause
> > degradations. Also, this behaviour (of increased remote fetches) would
> > happen on all brokers at all times, whereas in the KIP we restrict the
> > behavior to newly bootstrapped brokers only, and only until they fully
> > build the local logs as per the retention defined at the cluster level.
> > (Deprioritization of the broker could help reduce the impact even
> > further.)
> >
> >
> > >
> > > 2. Have you updated the KIP?
> > >
> >
> > The KIP has been updated now.
> >
> >
> > >
> > > Thanks,
> > >
> > > Jun
> > >
> > > On Tue, Apr 9, 2024 at 3:36 AM Satish Duggana <satish.dugg...@gmail.com> wrote:
> > >
> > > > +1 to Jun for adding the consumer-fetching-from-a-follower scenario
> > > > to the existing section that describes the drawback when a node
> > > > built with last-tiered-offset becomes the leader. As Abhijeet
> > > > mentioned, we plan to have a follow-up KIP that will address these
> > > > cases by deprioritizing such brokers. The deprioritization of those
> > > > brokers can be removed once they catch up to the local log
> > > > retention.
> > > >
> > > > Thanks,
> > > > Satish.
> > > >
> > > > On Tue, 9 Apr 2024 at 14:08, Luke Chen <show...@gmail.com> wrote:
> > > > >
> > > > > Hi Abhijeet,
> > > > >
> > > > > Thanks for the KIP to improve the tiered storage feature!
> > > > >
> > > > > Questions:
> > > > > 1. We could also get the "pending-upload-offset" and epoch via
> > > > > remote log metadata, instead of adding a new API to fetch it from
> > > > > the leader. Could you explain why you chose the latter approach
> > > > > instead of the former?
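> > > > >
> > > > > Just to make (1) concrete, I am thinking of something roughly like
> > > > > this on the broker side (only a sketch; rlmm, partition,
> > > > > leaderEpoch and logStartOffset are assumed to be in scope):
> > > > >
> > > > > // Derive the first not-yet-uploaded offset from the
> > > > > // RemoteLogMetadataManager plugin instead of asking the leader.
> > > > > Optional<Long> highestTiered =
> > > > >     rlmm.highestOffsetForEpoch(partition, leaderEpoch);
> > > > > // The next offset after the highest tiered one, or the log start
> > > > > // offset if nothing has been tiered yet.
> > > > > long pendingUploadOffset =
> > > > >     highestTiered.map(o -> o + 1).orElse(logStartOffset);
> > > > >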
> > > > > 2.
> > > > > > We plan to have a follow-up KIP that will address both the
> > > > > > deprioritization of these brokers from leadership, as well as
> > > > > > for consumption (when fetching from followers is allowed).
> > > > >
> > > > > I agree with Jun that we should make clear all the possible
> > > > > drawbacks this could have. So, could we add the drawbacks Jun
> > > > > mentioned about the performance issue when consumers fetch from a
> > > > > follower?
> > > > >
> > > > > 3. Could we add a "Rejected Alternatives" section at the end of
> > > > > the KIP to record some of them? For example, the
> > > > > "ListOffsetRequest" approach vs. the
> > > > > "Earliest-Pending-Upload-Offset" approach, or getting the
> > > > > "Earliest-Pending-Upload-Offset" from remote log metadata, etc.
> > > > >
> > > > > Thanks.
> > > > > Luke
> > > > >
> > > > >
> > > > > On Tue, Apr 9, 2024 at 2:25 PM Abhijeet Kumar <abhijeet.cse....@gmail.com> wrote:
> > > > >
> > > > > > Hi Christo,
> > > > > >
> > > > > > Please find my comments inline.
> > > > > >
> > > > > > On Fri, Apr 5, 2024 at 12:36 PM Christo Lolov <christolo...@gmail.com> wrote:
> > > > > >
> > > > > > > Hello Abhijeet and Jun,
> > > > > > >
> > > > > > > I have been mulling this KIP over a bit more in recent days!
> > > > > > >
> > > > > > > re: Jun
> > > > > > >
> > > > > > > I wasn't aware we apply 2.1 and 2.2 for reserving new
> > > > > > > timestamps - in retrospect it should have been fairly obvious.
> > > > > > > I will need to go and update KIP-1005 myself then, thank you
> > > > > > > for giving the useful reference!
> > > > > > >
> > > > > > > 4. I think Abhijeet wants to rebuild state from
> > > > > > > latest-tiered-offset and fetch from latest-tiered-offset + 1
> > > > > > > only for new replicas (or replicas which experienced a disk
> > > > > > > failure), to decrease the time a partition spends in an
> > > > > > > under-replicated state. In other words, a follower which has
> > > > > > > just fallen out of ISR but has local data will continue using
> > > > > > > today's Tiered Storage replication protocol (i.e. fetching
> > > > > > > from earliest-local). I further believe he has taken this
> > > > > > > approach so that the local state of replicas which have just
> > > > > > > fallen out of ISR isn't forcefully wiped, thus leading to
> > > > > > > situation 1. Abhijeet, have I understood (and summarised) what
> > > > > > > you are proposing correctly?
> > > > > > >
> > > > > > Yes, your understanding is correct. We want to limit the
> > > > > > behavior changes only to new replicas.
> > > > > >
> > > > > >
> > > > > > > 5. I think in today's Tiered Storage we know the leader's
> > > > > > > log-start-offset from the FetchResponse, and we can learn its
> > > > > > > local-log-start-offset from ListOffsets by asking for the
> > > > > > > earliest-local timestamp (-4). But granted, this ought to be
> > > > > > > added as an additional API call in the KIP.
> > > > > > >
> > > > > >
> > > > > > Yes, I clarified this in my reply to Jun. I will add this
> > > > > > missing detail in the KIP.
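> > > > > >
> > > > > > For reference, once KIP-1005's OffsetSpec additions land as
> > > > > > proposed, fetching the earliest-local offset (and its epoch)
> > > > > > should look roughly like this (a sketch; admin and tp are
> > > > > > assumed to be in scope):
> > > > > >
> > > > > > // Ask for the earliest-local timestamp (-4) via ListOffsets.
> > > > > > ListOffsetsResult result =
> > > > > >     admin.listOffsets(Map.of(tp, OffsetSpec.earliestLocal()));
> > > > > > ListOffsetsResult.ListOffsetsResultInfo info =
> > > > > >     result.partitionResult(tp).get();
> > > > > > long localLogStartOffset = info.offset();
> > > > > > Optional<Integer> leaderEpoch = info.leaderEpoch();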
> > > > > >
> > > > > >
> > > > > > > re: Abhijeet
> > > > > > >
> > > > > > > 101. I am still a bit confused as to why you want to include
> > > > > > > a new offset (i.e. pending-upload-offset) when you yourself
> > > > > > > mention that you could use an already existing offset (i.e.
> > > > > > > last-tiered-offset + 1). In essence, you end your Motivation
> > > > > > > with "In this KIP, we will focus only on the follower fetch
> > > > > > > protocol using the *last-tiered-offset*" and then in the
> > > > > > > following sections you talk about pending-upload-offset. I
> > > > > > > understand this might be classified as an implementation
> > > > > > > detail, but if you introduce a new offset (i.e.
> > > > > > > pending-upload-offset) you have to make a change to the
> > > > > > > ListOffsets API (i.e. introduce -6) and thus document it in
> > > > > > > this KIP as such. However, the last-tiered-offset ought to
> > > > > > > already be exposed as part of KIP-1005 (under
> > > > > > > implementation). Am I misunderstanding something here?
> > > > > > >
> > > > > >
> > > > > > I have tried to clarify this in my reply to Jun.
> > > > > >
> > > > > > > The follower needs to build the local data starting from the
> > > > > > > offset Earliest-Pending-Upload-Offset. Hence it needs the
> > > > > > > offset and the corresponding leader-epoch.
> > > > > > > There are two ways to do this:
> > > > > > >    1. We add support in ListOffsetRequest to be able to fetch
> > > > > > > this offset (and leader epoch) from the leader.
> > > > > > >    2. Or, fetch the tiered-offset (which is already
> > > > > > > supported). From this offset, we can get the
> > > > > > > Earliest-Pending-Upload-Offset by just adding 1 to the
> > > > > > > tiered-offset. However, we still need the leader epoch for
> > > > > > > that offset, since there is no guarantee that the leader epoch
> > > > > > > for the Earliest-Pending-Upload-Offset will be the same as the
> > > > > > > leader epoch for the tiered-offset. We may need another API
> > > > > > > call to the leader for this.
> > > > > > > I prefer the first approach. The only problem with the first
> > > > > > > approach is that it introduces one more offset. The second
> > > > > > > approach avoids this problem but is a little complicated.
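> > > > > >
> > > > > > To make the comparison concrete, approach 2 would look roughly
> > > > > > like this on the follower (just a sketch; both helper names are
> > > > > > made up for illustration, not existing APIs):
> > > > > >
> > > > > > // Approach 2: derive the offset locally, then make a second
> > > > > > // call for its epoch, since the epoch may change exactly at
> > > > > > // the tiered/non-tiered boundary.
> > > > > > long tieredOffset = fetchLatestTieredOffset(leader, partition);
> > > > > > long earliestPendingUpload = tieredOffset + 1;
> > > > > > int epoch = fetchEpochForOffset(leader, partition, earliestPendingUpload);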
> > > > > >
> > > > > >
> > > > > >
> > > > > > >
> > > > > > > Best,
> > > > > > > Christo
> > > > > > >
> > > > > > > On Thu, 4 Apr 2024 at 19:37, Jun Rao <j...@confluent.io.invalid> wrote:
> > > > > > >
> > > > > > > > Hi, Abhijeet,
> > > > > > > >
> > > > > > > > Thanks for the KIP. Left a few comments.
> > > > > > > >
> > > > > > > > 1. "A drawback of using the last-tiered-offset is that this
> new
> > > > > > follower
> > > > > > > > would possess only a limited number of locally stored
> segments.
> > > > Should
> > > > > > it
> > > > > > > > ascend to the role of leader, there is a risk of needing to
> > fetch
> > > > these
> > > > > > > > segments from the remote storage, potentially impacting
> broker
> > > > > > > > performance."
> > > > > > > > Since we support consumers fetching from followers, this is a
> > > > potential
> > > > > > > > issue on the follower side too. In theory, it's possible for
> a
> > > > segment
> > > > > > to
> > > > > > > > be tiered immediately after rolling. In that case, there
> could
> > be
> > > > very
> > > > > > > > little data after last-tiered-offset. It would be useful to
> > think
> > > > > > through
> > > > > > > > how to address this issue.
> > > > > > > >
> > > > > > > > 2. ListOffsetsRequest:
> > > > > > > > 2.1 Typically, we need to bump up the version of the request
> > > > > > > > if we add a new value for timestamp. See
> > > > > > > > https://github.com/apache/kafka/pull/10760/files#diff-fac7080d67da905a80126d58fc1745c9a1409de7ef7d093c2ac66a888b134633
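> > > > > > > >
> > > > > > > > For context, these are the sentinel timestamps ListOffsets
> > > > > > > > already reserves (from memory, so worth double-checking),
> > > > > > > > which is why a new value such as -6 needs a version bump:
> > > > > > > >
> > > > > > > > // org.apache.kafka.common.requests.ListOffsetsRequest
> > > > > > > > public static final long LATEST_TIMESTAMP = -1L;
> > > > > > > > public static final long EARLIEST_TIMESTAMP = -2L;
> > > > > > > > public static final long MAX_TIMESTAMP = -3L;
> > > > > > > > public static final long EARLIEST_LOCAL_TIMESTAMP = -4L;
> > > > > > > > public static final long LATEST_TIERED_TIMESTAMP = -5L; // KIP-1005
> > > > > > > > // A new earliest-pending-upload spec would presumably take -6L.
> > > > > > > >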
> > > > > > > > 2.2 Since this changes the inter-broker request protocol,
> > > > > > > > it would be useful to have a section on upgrade (e.g. a new
> > > > > > > > IBP/metadata.version).
> > > > > > > >
> > > > > > > > 3. "Instead of fetching Earliest-Pending-Upload-Offset, it
> > could
> > > > fetch
> > > > > > > the
> > > > > > > > last-tiered-offset from the leader, and make a separate
> leader
> > > > call to
> > > > > > > > fetch leader epoch for the following offset."
> > > > > > > > Why do we need to make a separate call for the leader epoch?
> > > > > > > > ListOffsetsResponse include both the offset and the
> > corresponding
> > > > > > epoch.
> > > > > > > >
> > > > > > > > 4. "Check if the follower replica is empty and if the feature
> > to
> > > > use
> > > > > > > > last-tiered-offset is enabled."
> > > > > > > > Why do we need to check if the follower replica is empty?
> > > > > > > >
> > > > > > > > 5. "It can be confirmed by checking if the leader's
> > > > > > > > Log-Start-Offset is the same as the Leader's
> > > > > > > > Local-Log-Start-Offset."
> > > > > > > > How does the follower know the Local-Log-Start-Offset?
> > > > > > > >
> > > > > > > > Jun
> > > > > > > >
> > > > > > > > On Sat, Mar 30, 2024 at 5:51 AM Abhijeet Kumar <abhijeet.cse....@gmail.com> wrote:
> > > > > > > >
> > > > > > > > > Hi Christo,
> > > > > > > > >
> > > > > > > > > Thanks for reviewing the KIP.
> > > > > > > > >
> > > > > > > > > The follower needs the earliest-pending-upload-offset
> > > > > > > > > (and the corresponding leader epoch) from the leader.
> > > > > > > > > This is the first offset the follower will have locally.
> > > > > > > > >
> > > > > > > > > Regards,
> > > > > > > > > Abhijeet.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > On Fri, Mar 29, 2024 at 1:14 PM Christo Lolov <christolo...@gmail.com> wrote:
> > > > > > > > >
> > > > > > > > > > Heya!
> > > > > > > > > >
> > > > > > > > > > First of all, thank you very much for the proposal - you
> > > > > > > > > > have explained the problem you want solved very well. I
> > > > > > > > > > think a faster bootstrap of an empty replica is
> > > > > > > > > > definitely an improvement!
> > > > > > > > > >
> > > > > > > > > > For my understanding, which concrete offset do you want
> > > > > > > > > > the leader to give back to a follower -
> > > > > > > > > > earliest-pending-upload-offset or the
> > > > > > > > > > latest-tiered-offset? If it is the second, then I
> > > > > > > > > > believe KIP-1005 ought to already be exposing that
> > > > > > > > > > offset as part of the ListOffsets API, no?
> > > > > > > > > >
> > > > > > > > > > Best,
> > > > > > > > > > Christo
> > > > > > > > > >
> > > > > > > > > > On Wed, 27 Mar 2024 at 18:23, Abhijeet Kumar <abhijeet.cse....@gmail.com> wrote:
> > > > > > > > > >
> > > > > > > > > > > Hi All,
> > > > > > > > > > >
> > > > > > > > > > > I have created KIP-1023 to introduce follower fetch
> > > > > > > > > > > from tiered offset. This feature will be helpful in
> > > > > > > > > > > significantly reducing Kafka rebalance/rebuild times
> > > > > > > > > > > when the cluster is enabled with tiered storage.
> > > > > > > > > > >
> > > > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-1023%3A+Follower+fetch+from+tiered+offset
> > > > > > > > > > > Feedback and suggestions are welcome.
> > > > > > > > > > >
> > > > > > > > > > > Regards,
> > > > > > > > > > > Abhijeet.
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > >
> > >
> >
>
