Thank you all for your comments. As all the comments in the thread are addressed, I am starting a Vote thread for the KIP. Please have a look.
Regards. Abhijeet. On Thu, Apr 25, 2024 at 6:08 PM Luke Chen <show...@gmail.com> wrote: > Hi, Abhijeet, > > Thanks for the update. > > I have no more comments. > > Luke > > On Thu, Apr 25, 2024 at 4:21 AM Jun Rao <j...@confluent.io.invalid> wrote: > > > Hi, Abhijeet, > > > > Thanks for the updated KIP. It looks good to me. > > > > Jun > > > > On Mon, Apr 22, 2024 at 12:08 PM Abhijeet Kumar < > > abhijeet.cse....@gmail.com> > > wrote: > > > > > Hi Jun, > > > > > > Please find my comments inline. > > > > > > > > > On Thu, Apr 18, 2024 at 3:26 AM Jun Rao <j...@confluent.io.invalid> > > wrote: > > > > > > > Hi, Abhijeet, > > > > > > > > Thanks for the reply. > > > > > > > > 1. I am wondering if we could achieve the same result by just > lowering > > > > local.retention.ms and local.retention.bytes. This also allows the > > newly > > > > started follower to build up the local data before serving the > consumer > > > > traffic. > > > > > > > > > > I am not sure I fully followed this. Do you mean we could lower the > > > local.retention (by size and time) > > > so that there is little data on the leader's local storage so that the > > > follower can quickly catch up with the leader? > > > > > > In that case, we will need to set small local retention across brokers > in > > > the cluster. It will have the undesired > > > effect where there will be increased remote log fetches for serving > > consume > > > requests, and this can cause > > > degradations. Also, this behaviour (of increased remote fetches) will > > > happen on all brokers at all times, whereas in > > > the KIP we are restricting the behavior only to the newly bootstrapped > > > brokers and only until the time it fully builds > > > the local logs as per retention defined at the cluster level. > > > (Deprioritization of the broker could help reduce the impact > > > even further) > > > > > > > > > > > > > > 2. Have you updated the KIP? > > > > > > > > > > The KIP has been updated now. > > > > > > > > > > > > > > Thanks, > > > > > > > > Jun > > > > > > > > On Tue, Apr 9, 2024 at 3:36 AM Satish Duggana < > > satish.dugg...@gmail.com> > > > > wrote: > > > > > > > > > +1 to Jun for adding the consumer fetching from a follower scenario > > > > > also to the existing section that talked about the drawback when a > > > > > node built with last-tiered-offset has become a leader. As Abhijeet > > > > > mentioned, we plan to have a follow-up KIP that will address these > by > > > > > having a deprioritzation of these brokers. The deprioritization of > > > > > those brokers can be removed once they catchup until the local log > > > > > retention. > > > > > > > > > > Thanks, > > > > > Satish. > > > > > > > > > > On Tue, 9 Apr 2024 at 14:08, Luke Chen <show...@gmail.com> wrote: > > > > > > > > > > > > Hi Abhijeet, > > > > > > > > > > > > Thanks for the KIP to improve the tiered storage feature! > > > > > > > > > > > > Questions: > > > > > > 1. We could also get the "pending-upload-offset" and epoch via > > remote > > > > log > > > > > > metadata, instead of adding a new API to fetch from the leader. > > Could > > > > you > > > > > > explain why you choose the later approach, instead of the former? > > > > > > 2. > > > > > > > We plan to have a follow-up KIP that will address both the > > > > > > deprioritization > > > > > > of these brokers from leadership, as well as > > > > > > for consumption (when fetching from followers is allowed). > > > > > > > > > > > > I agree with Jun that we might need to make it clear all possible > > > > > drawbacks > > > > > > that could have. So, could we add the drawbacks that Jun > mentioned > > > > about > > > > > > the performance issue when consumer fetch from follower? > > > > > > > > > > > > 3. Could we add "Rejected Alternatives" section to the end of the > > KIP > > > > to > > > > > > add some of them? > > > > > > Like the "ListOffsetRequest" approach VS > > > > "Earliest-Pending-Upload-Offset" > > > > > > approach, or getting the "Earliest-Pending-Upload-Offset" from > > remote > > > > log > > > > > > metadata... etc. > > > > > > > > > > > > Thanks. > > > > > > Luke > > > > > > > > > > > > > > > > > > On Tue, Apr 9, 2024 at 2:25 PM Abhijeet Kumar < > > > > > abhijeet.cse....@gmail.com> > > > > > > wrote: > > > > > > > > > > > > > Hi Christo, > > > > > > > > > > > > > > Please find my comments inline. > > > > > > > > > > > > > > On Fri, Apr 5, 2024 at 12:36 PM Christo Lolov < > > > > christolo...@gmail.com> > > > > > > > wrote: > > > > > > > > > > > > > > > Hello Abhijeet and Jun, > > > > > > > > > > > > > > > > I have been mulling this KIP over a bit more in recent days! > > > > > > > > > > > > > > > > re: Jun > > > > > > > > > > > > > > > > I wasn't aware we apply 2.1 and 2.2 for reserving new > > timestamps > > > - > > > > in > > > > > > > > retrospect it should have been fairly obvious. I would need > to > > go > > > > an > > > > > > > update > > > > > > > > KIP-1005 myself then, thank you for giving the useful > > reference! > > > > > > > > > > > > > > > > 4. I think Abhijeet wants to rebuild state from > > > > latest-tiered-offset > > > > > and > > > > > > > > fetch from latest-tiered-offset + 1 only for new replicas (or > > > > > replicas > > > > > > > > which experienced a disk failure) to decrease the time a > > > partition > > > > > spends > > > > > > > > in under-replicated state. In other words, a follower which > has > > > > just > > > > > > > fallen > > > > > > > > out of ISR, but has local data will continue using today's > > Tiered > > > > > Storage > > > > > > > > replication protocol (i.e. fetching from earliest-local). I > > > further > > > > > > > believe > > > > > > > > he has taken this approach so that local state of replicas > > which > > > > have > > > > > > > just > > > > > > > > fallen out of ISR isn't forcefully wiped thus leading to > > > situation > > > > 1. > > > > > > > > Abhijeet, have I understood (and summarised) what you are > > > proposing > > > > > > > > correctly? > > > > > > > > > > > > > > > > Yes, your understanding is correct. We want to limit the > > behavior > > > > > changes > > > > > > > only to new replicas. > > > > > > > > > > > > > > > > > > > > > > 5. I think in today's Tiered Storage we know the leader's > > > > > > > log-start-offset > > > > > > > > from the FetchResponse and we can learn its > > > local-log-start-offset > > > > > from > > > > > > > the > > > > > > > > ListOffsets by asking for earliest-local timestamp (-4). But > > > > granted, > > > > > > > this > > > > > > > > ought to be added as an additional API call in the KIP. > > > > > > > > > > > > > > > > > > > > > > > Yes, I clarified this in my reply to Jun. I will add this > missing > > > > > detail in > > > > > > > the KIP. > > > > > > > > > > > > > > > > > > > > > > re: Abhijeet > > > > > > > > > > > > > > > > 101. I am still a bit confused as to why you want to include > a > > > new > > > > > offset > > > > > > > > (i.e. pending-upload-offset) when you yourself mention that > you > > > > > could use > > > > > > > > an already existing offset (i.e. last-tiered-offset + 1). In > > > > > essence, you > > > > > > > > end your Motivation with "In this KIP, we will focus only on > > the > > > > > follower > > > > > > > > fetch protocol using the *last-tiered-offset*" and then in > the > > > > > following > > > > > > > > sections you talk about pending-upload-offset. I understand > > this > > > > > might be > > > > > > > > classified as an implementation detail, but if you introduce > a > > > new > > > > > offset > > > > > > > > (i.e. pending-upload-offset) you have to make a change to the > > > > > ListOffsets > > > > > > > > API (i.e. introduce -6) and thus document it in this KIP as > > such. > > > > > > > However, > > > > > > > > the last-tiered-offset ought to already be exposed as part of > > > > > KIP-1005 > > > > > > > > (under implementation). Am I misunderstanding something here? > > > > > > > > > > > > > > > > > > > > > > I have tried to clarify this in my reply to Jun. > > > > > > > > > > > > > > > The follower needs to build the local data starting from the > > > offset > > > > > > > > Earliest-Pending-Upload-Offset. Hence it needs the offset and > > the > > > > > > > > corresponding leader-epoch. > > > > > > > > There are two ways to do this: > > > > > > > > 1. We add support in ListOffsetRequest to be able to fetch > > > this > > > > > offset > > > > > > > > (and leader epoch) from the leader. > > > > > > > > 2. Or, fetch the tiered-offset (which is already > supported). > > > > From > > > > > this > > > > > > > > offset, we can get the Earliest-Pending-Upload-Offset. We can > > > just > > > > > add 1 > > > > > > > to > > > > > > > > the tiered-offset. > > > > > > > > However, we still need the leader epoch for offset, > since > > > > > there is > > > > > > > > no guarantee that the leader epoch for > > > > Earliest-Pending-Upload-Offset > > > > > > > will > > > > > > > > be the same as the > > > > > > > > leader epoch for tiered-offset. We may need another API > > > call > > > > > to the > > > > > > > > leader for this. > > > > > > > > I prefer the first approach. The only problem with the first > > > > > approach is > > > > > > > > that it introduces one more offset. The second approach > avoids > > > this > > > > > > > problem > > > > > > > > but is a little complicated. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Best, > > > > > > > > Christo > > > > > > > > > > > > > > > > On Thu, 4 Apr 2024 at 19:37, Jun Rao > <j...@confluent.io.invalid > > > > > > > > wrote: > > > > > > > > > > > > > > > > > Hi, Abhijeet, > > > > > > > > > > > > > > > > > > Thanks for the KIP. Left a few comments. > > > > > > > > > > > > > > > > > > 1. "A drawback of using the last-tiered-offset is that this > > new > > > > > > > follower > > > > > > > > > would possess only a limited number of locally stored > > segments. > > > > > Should > > > > > > > it > > > > > > > > > ascend to the role of leader, there is a risk of needing to > > > fetch > > > > > these > > > > > > > > > segments from the remote storage, potentially impacting > > broker > > > > > > > > > performance." > > > > > > > > > Since we support consumers fetching from followers, this > is a > > > > > potential > > > > > > > > > issue on the follower side too. In theory, it's possible > for > > a > > > > > segment > > > > > > > to > > > > > > > > > be tiered immediately after rolling. In that case, there > > could > > > be > > > > > very > > > > > > > > > little data after last-tiered-offset. It would be useful to > > > think > > > > > > > through > > > > > > > > > how to address this issue. > > > > > > > > > > > > > > > > > > 2. ListOffsetsRequest: > > > > > > > > > 2.1 Typically, we need to bump up the version of the > request > > if > > > > we > > > > > add > > > > > > > a > > > > > > > > > new value for timestamp. See > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > https://github.com/apache/kafka/pull/10760/files#diff-fac7080d67da905a80126d58fc1745c9a1409de7ef7d093c2ac66a888b134633 > > > > > > > > > . > > > > > > > > > 2.2 Since this changes the inter broker request protocol, > it > > > > would > > > > > be > > > > > > > > > useful to have a section on upgrade (e.g. new > > > > > IBP/metadata.version). > > > > > > > > > > > > > > > > > > 3. "Instead of fetching Earliest-Pending-Upload-Offset, it > > > could > > > > > fetch > > > > > > > > the > > > > > > > > > last-tiered-offset from the leader, and make a separate > > leader > > > > > call to > > > > > > > > > fetch leader epoch for the following offset." > > > > > > > > > Why do we need to make a separate call for the leader > epoch? > > > > > > > > > ListOffsetsResponse include both the offset and the > > > corresponding > > > > > > > epoch. > > > > > > > > > > > > > > > > > > 4. "Check if the follower replica is empty and if the > feature > > > to > > > > > use > > > > > > > > > last-tiered-offset is enabled." > > > > > > > > > Why do we need to check if the follower replica is empty? > > > > > > > > > > > > > > > > > > 5. It can be confirmed by checking if the leader's > > > > > Log-Start-Offset is > > > > > > > > the > > > > > > > > > same as the Leader's Local-Log-Start-Offset. > > > > > > > > > How does the follower know Local-Log-Start-Offset? > > > > > > > > > > > > > > > > > > Jun > > > > > > > > > > > > > > > > > > On Sat, Mar 30, 2024 at 5:51 AM Abhijeet Kumar < > > > > > > > > abhijeet.cse....@gmail.com > > > > > > > > > > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > Hi Christo, > > > > > > > > > > > > > > > > > > > > Thanks for reviewing the KIP. > > > > > > > > > > > > > > > > > > > > The follower needs the earliest-pending-upload-offset > (and > > > the > > > > > > > > > > corresponding leader epoch) from the leader. > > > > > > > > > > This is the first offset the follower will have locally. > > > > > > > > > > > > > > > > > > > > Regards, > > > > > > > > > > Abhijeet. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > On Fri, Mar 29, 2024 at 1:14 PM Christo Lolov < > > > > > > > christolo...@gmail.com> > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > Heya! > > > > > > > > > > > > > > > > > > > > > > First of all, thank you very much for the proposal, you > > > have > > > > > > > > explained > > > > > > > > > > the > > > > > > > > > > > problem you want solved very well - I think a faster > > > > bootstrap > > > > > of > > > > > > > an > > > > > > > > > > empty > > > > > > > > > > > replica is definitely an improvement! > > > > > > > > > > > > > > > > > > > > > > For my understanding, which concrete offset do you want > > the > > > > > leader > > > > > > > to > > > > > > > > > > give > > > > > > > > > > > back to a follower - earliest-pending-upload-offset or > > the > > > > > > > > > > > latest-tiered-offset? If it is the second, then I > believe > > > > > KIP-1005 > > > > > > > > > ought > > > > > > > > > > to > > > > > > > > > > > already be exposing that offset as part of the > > ListOffsets > > > > > API, no? > > > > > > > > > > > > > > > > > > > > > > Best, > > > > > > > > > > > Christo > > > > > > > > > > > > > > > > > > > > > > On Wed, 27 Mar 2024 at 18:23, Abhijeet Kumar < > > > > > > > > > abhijeet.cse....@gmail.com > > > > > > > > > > > > > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > > > Hi All, > > > > > > > > > > > > > > > > > > > > > > > > I have created KIP-1023 to introduce follower fetch > > from > > > > > tiered > > > > > > > > > offset. > > > > > > > > > > > > This feature will be helpful in significantly > reducing > > > > Kafka > > > > > > > > > > > > rebalance/rebuild times when the cluster is enabled > > with > > > > > tiered > > > > > > > > > > storage. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-1023%3A+Follower+fetch+from+tiered+offset > > > > > > > > > > > > > > > > > > > > > > > > Feedback and suggestions are welcome. > > > > > > > > > > > > > > > > > > > > > > > > Regards, > > > > > > > > > > > > Abhijeet. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >