Hi Abhijeet,

Thanks for the KIP to improve the tiered storage feature!

Questions:
1. We could also get the "pending-upload-offset" and epoch via remote log
metadata, instead of adding a new API to fetch from the leader. Could you
explain why you chose the latter approach instead of the former?
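
For context, this is roughly what I had in mind (just a sketch;
highestOffsetForEpoch is from the RemoteLogMetadataManager plugin
interface, while the surrounding wiring here is assumed):

import java.util.Optional;
import org.apache.kafka.common.TopicIdPartition;
import org.apache.kafka.server.log.remote.storage.RemoteLogMetadataManager;
import org.apache.kafka.server.log.remote.storage.RemoteStorageException;

// Derive the pending-upload offset from remote log metadata instead of a
// new leader API. The caller supplies the RLMM plugin and partition info.
static long pendingUploadOffset(RemoteLogMetadataManager rlmm,
                                TopicIdPartition tip,
                                int leaderEpoch,
                                long logStartOffset) throws RemoteStorageException {
    // Highest offset already uploaded to remote storage for this epoch.
    Optional<Long> highestUploaded = rlmm.highestOffsetForEpoch(tip, leaderEpoch);
    // The first offset still pending upload is one past it; fall back to
    // the log start offset if nothing has been uploaded for this epoch yet.
    return highestUploaded.map(o -> o + 1).orElse(logStartOffset);
}
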
2.
> We plan to have a follow-up KIP that will address both the
> deprioritization of these brokers from leadership, as well as for
> consumption (when fetching from followers is allowed).

I agree with Jun that we need to make clear all the possible drawbacks this
could have. So, could we add the drawbacks Jun mentioned about the
performance issue when consumers fetch from followers?

3. Could we add a "Rejected Alternatives" section at the end of the KIP to
capture some of them?
For example, the "ListOffsetsRequest" approach vs. the
"Earliest-Pending-Upload-Offset" approach, or getting the
"Earliest-Pending-Upload-Offset" from remote log metadata... etc.

Thanks.
Luke


On Tue, Apr 9, 2024 at 2:25 PM Abhijeet Kumar <abhijeet.cse....@gmail.com>
wrote:

> Hi Christo,
>
> Please find my comments inline.
>
> On Fri, Apr 5, 2024 at 12:36 PM Christo Lolov <christolo...@gmail.com>
> wrote:
>
> > Hello Abhijeet and Jun,
> >
> > I have been mulling this KIP over a bit more in recent days!
> >
> > re: Jun
> >
> > I wasn't aware we apply 2.1 and 2.2 for reserving new timestamps - in
> > retrospect it should have been fairly obvious. I would need to go and
> > update KIP-1005 myself then, thank you for the useful reference!
> >
> > 4. I think Abhijeet wants to rebuild state from latest-tiered-offset and
> > fetch from latest-tiered-offset + 1 only for new replicas (or replicas
> > which experienced a disk failure) to decrease the time a partition
> > spends in an under-replicated state. In other words, a follower which
> > has just fallen out of ISR, but has local data, will continue using
> > today's Tiered Storage replication protocol (i.e. fetching from
> > earliest-local). I further believe he has taken this approach so that
> > the local state of replicas which have just fallen out of ISR isn't
> > forcefully wiped, thus leading to situation 1. Abhijeet, have I
> > understood (and summarised) what you are proposing correctly?
> >
> Yes, your understanding is correct. We want to limit the behavior changes
> only to new replicas.
>
>
> > 5. I think in today's Tiered Storage we know the leader's
> > log-start-offset from the FetchResponse, and we can learn its
> > local-log-start-offset from ListOffsets by asking for the earliest-local
> > timestamp (-4). But granted, this ought to be added as an additional API
> > call in the KIP.
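> >
> > As an illustration of that extra call (a sketch using the Admin API
> > shape for brevity; earliestLocal() is the OffsetSpec that KIP-1005
> > proposes, so the exact name may change, and the follower itself would
> > issue the inter-broker ListOffsets request with timestamp -4):
> >
> > import java.util.Map;
> > import org.apache.kafka.clients.admin.Admin;
> > import org.apache.kafka.clients.admin.ListOffsetsResult.ListOffsetsResultInfo;
> > import org.apache.kafka.clients.admin.OffsetSpec;
> > import org.apache.kafka.common.TopicPartition;
> >
> > // Ask the leader for its earliest-local offset (-4).
> > static long leaderLocalLogStartOffset(Admin admin, TopicPartition tp)
> >         throws Exception {
> >     ListOffsetsResultInfo info =
> >         admin.listOffsets(Map.of(tp, OffsetSpec.earliestLocal())) // KIP-1005
> >              .partitionResult(tp)
> >              .get();
> >     return info.offset(); // the leader's local-log-start-offset
> > }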
> >
> >
> Yes, I clarified this in my reply to Jun. I will add this missing detail
> to the KIP.
>
>
> > re: Abhijeet
> >
> > 101. I am still a bit confused as to why you want to include a new
> > offset (i.e. pending-upload-offset) when you yourself mention that you
> > could use an already existing offset (i.e. last-tiered-offset + 1). In
> > essence, you end your Motivation with "In this KIP, we will focus only
> > on the follower fetch protocol using the *last-tiered-offset*" and then
> > in the following sections you talk about pending-upload-offset. I
> > understand this might be classified as an implementation detail, but if
> > you introduce a new offset (i.e. pending-upload-offset) you have to make
> > a change to the ListOffsets API (i.e. introduce -6) and thus document it
> > in this KIP as such. However, the last-tiered-offset ought to already be
> > exposed as part of KIP-1005 (under implementation). Am I
> > misunderstanding something here?
> >
>
> I have tried to clarify this in my reply to Jun.
>
> > The follower needs to build the local data starting from the offset
> > Earliest-Pending-Upload-Offset. Hence it needs the offset and the
> > corresponding leader epoch.
> > There are two ways to do this:
> >    1. We add support in ListOffsetsRequest to be able to fetch this
> > offset (and leader epoch) from the leader.
> >    2. Or, fetch the tiered-offset (which is already supported) and add 1
> > to it to get the Earliest-Pending-Upload-Offset. However, we still need
> > the leader epoch for that offset, since there is no guarantee that the
> > leader epoch for Earliest-Pending-Upload-Offset will be the same as the
> > leader epoch for tiered-offset. We may need another API call to the
> > leader for this.
> > I prefer the first approach (see the sketch below). The only problem
> > with it is that it introduces one more offset. The second approach
> > avoids this problem but is a little complicated.
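> >
> > A sketch of the difference between the two (listOffsets(tp, ts) is a
> > hypothetical helper standing in for one ListOffsets round trip to the
> > leader; -5 is the tiered-offset timestamp from KIP-1005 and -6 would be
> > the new timestamp this KIP adds):
> >
> > static void resolveFetchStart(TopicPartition tp) {
> >     // Approach 1: one call; the new -6 timestamp returns both fields.
> >     ListOffsetsResultInfo pending = listOffsets(tp, -6L);
> >     long startOffset = pending.offset();
> >     int startEpoch = pending.leaderEpoch().orElseThrow(); // same response
> >
> >     // Approach 2: derive the offset from the tiered-offset instead.
> >     ListOffsetsResultInfo tiered = listOffsets(tp, -5L);
> >     long startOffset2 = tiered.offset() + 1;
> >     // tiered.leaderEpoch() is the epoch of the tiered-offset itself, not
> >     // necessarily of tiered-offset + 1, so the epoch still needs a
> >     // second call to the leader.
> > }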
>
>
>
> >
> > Best,
> > Christo
> >
> > On Thu, 4 Apr 2024 at 19:37, Jun Rao <j...@confluent.io.invalid> wrote:
> >
> > > Hi, Abhijeet,
> > >
> > > Thanks for the KIP. Left a few comments.
> > >
> > > 1. "A drawback of using the last-tiered-offset is that this new
> follower
> > > would possess only a limited number of locally stored segments. Should
> it
> > > ascend to the role of leader, there is a risk of needing to fetch these
> > > segments from the remote storage, potentially impacting broker
> > > performance."
> > > Since we support consumers fetching from followers, this is a potential
> > > issue on the follower side too. In theory, it's possible for a segment
> to
> > > be tiered immediately after rolling. In that case, there could be very
> > > little data after last-tiered-offset. It would be useful to think
> through
> > > how to address this issue.
> > >
> > > 2. ListOffsetsRequest:
> > > 2.1 Typically, we need to bump up the version of the request if we add
> > > a new value for timestamp. See
> > > https://github.com/apache/kafka/pull/10760/files#diff-fac7080d67da905a80126d58fc1745c9a1409de7ef7d093c2ac66a888b134633
> > > .
> > > 2.2 Since this changes the inter-broker request protocol, it would be
> > > useful to have a section on upgrade (e.g. new IBP/metadata.version).
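> > >
> > > For example (illustrative only; the exact version numbers depend on
> > > where trunk is when this lands, and the MetadataVersion name is
> > > hypothetical):
> > >
> > > // ListOffsetsRequest.json (sketch): bump "validVersions" from "0-8"
> > > // to "0-9", and only accept the new -6 timestamp at v9+.
> > > // Broker side, gate follower usage on the new metadata.version:
> > > if (metadataVersion.isAtLeast(MetadataVersion.IBP_3_8_IV0)) {
> > >     // followers may request the Earliest-Pending-Upload-Offset (-6)
> > > }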
> > >
> > > 3. "Instead of fetching Earliest-Pending-Upload-Offset, it could fetch
> > the
> > > last-tiered-offset from the leader, and make a separate leader call to
> > > fetch leader epoch for the following offset."
> > > Why do we need to make a separate call for the leader epoch?
> > > ListOffsetsResponse include both the offset and the corresponding
> epoch.
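> > >
> > > For reference, both fields come back in the same partition response (a
> > > sketch against the generated ListOffsetsResponseData classes):
> > >
> > > import org.apache.kafka.common.message.ListOffsetsResponseData.ListOffsetsPartitionResponse;
> > >
> > > // One ListOffsets call yields the offset and its leader epoch together.
> > > static void readOffsetAndEpoch(ListOffsetsPartitionResponse part) {
> > >     long offset = part.offset();    // resolved offset
> > >     int epoch = part.leaderEpoch(); // its epoch; no separate call needed
> > > }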
> > >
> > > 4. "Check if the follower replica is empty and if the feature to use
> > > last-tiered-offset is enabled."
> > > Why do we need to check if the follower replica is empty?
> > >
> > > 5. "It can be confirmed by checking if the leader's Log-Start-Offset
> > > is the same as the Leader's Local-Log-Start-Offset."
> > > How does the follower know the Local-Log-Start-Offset?
> > >
> > > Jun
> > >
> > > On Sat, Mar 30, 2024 at 5:51 AM Abhijeet Kumar <abhijeet.cse....@gmail.com>
> > > wrote:
> > >
> > > > Hi Christo,
> > > >
> > > > Thanks for reviewing the KIP.
> > > >
> > > > The follower needs the earliest-pending-upload-offset (and the
> > > > corresponding leader epoch) from the leader.
> > > > This is the first offset the follower will have locally.
> > > >
> > > > Regards,
> > > > Abhijeet.
> > > >
> > > >
> > > >
> > > > On Fri, Mar 29, 2024 at 1:14 PM Christo Lolov <christolo...@gmail.com>
> > > > wrote:
> > > >
> > > > > Heya!
> > > > >
> > > > > First of all, thank you very much for the proposal, you have
> > > > > explained the problem you want solved very well - I think a faster
> > > > > bootstrap of an empty replica is definitely an improvement!
> > > > >
> > > > > For my understanding, which concrete offset do you want the leader
> > > > > to give back to a follower - earliest-pending-upload-offset or the
> > > > > latest-tiered-offset? If it is the second, then I believe KIP-1005
> > > > > ought to already be exposing that offset as part of the ListOffsets
> > > > > API, no?
> > > > >
> > > > > Best,
> > > > > Christo
> > > > >
> > > > > On Wed, 27 Mar 2024 at 18:23, Abhijeet Kumar <abhijeet.cse....@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > Hi All,
> > > > > >
> > > > > > I have created KIP-1023 to introduce follower fetch from tiered
> > > > > > offset. This feature will be helpful in significantly reducing
> > > > > > Kafka rebalance/rebuild times when the cluster is enabled with
> > > > > > tiered storage.
> > > > > >
> > > > > >
> > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-1023%3A+Follower+fetch+from+tiered+offset
> > > > > >
> > > > > > Feedback and suggestions are welcome.
> > > > > >
> > > > > > Regards,
> > > > > > Abhijeet.
> > > > > >
> > > > >
> > > >
> > >
> >
>
