David 03. Fixed as well, to remove ignorable. In MetadataResponse, Rack came a version later, hence was marked ignorable.
Thanks. On Fri, Jul 21, 2023 at 1:38 PM Mayank Shekhar Narula < [email protected]> wrote: > Hi David > > 01. My reasoning noted in the KIP is that CurrentLeader was first added > in version 12, so 12 is the least version where clients could get these > optimisations. So any client can now choose to implement this with version > 12 of the protocol itself. If the version is bumped to X(say 16), then all > clients(including non Java ones) will need to implement all versions from > 13-16 to get these optimisations. And since the scenario of leader changing > is not common, so bytes wasted on the network won't be much. What do you > think? > > 02. For ProduceRequest, same reasoning. And corrected the taggedVersion. > > 03. Regarding making rack ignorable, my decision was inspired by a similar > field in MetadataResponse. I must admit I am not clear, why was it made > ignorable there originally? > > > On Fri, Jul 21, 2023 at 11:14 AM David Jacot <[email protected]> > wrote: > >> Hi Mayank, >> >> Thanks for the KIP. This is an interesting idea that I have been thinking >> about for a long time so I am happy to see it. The gain is smaller than I >> expected but still worth it in my opinion. >> >> 01. In the FetchResponse, what's the reason for using version `12+` for >> the >> new tagged field `NodeEndpoints`? Old clients won't use this field because >> they are not aware of it. It seems to me that we are wasting bytes on the >> wire by sending those unnecessarily. Is there a reason that I may have >> missed? Intuitively, I would have bumped the version of the FetchRequest >> to >> 16 and I would have populated and sent `NodeEndpoints` only if version 16 >> is used by the client. >> >> 02. I have the very same question for the ProduceRequest's `CurrentLeader` >> and `NodeEndpoints` fields. Note that the `taggedVerions` of >> `NodeEndpoints` is incorrect. >> >> 03. For both, does the field `Rack` have to really be ignorable? That does >> not seem necessary to me but I may be wrong. >> >> Best, >> David >> >> On Fri, Jul 21, 2023 at 12:32 AM Crispin Bernier >> <[email protected]> wrote: >> >> > Benchmark numbers have been posted on the KIP, please review. >> > >> > On 2023/07/20 13:03:00 Mayank Shekhar Narula wrote: >> > > Jun >> > > >> > > Thanks for the feedback. >> > > >> > > Numbers to follow. >> > > >> > > If we don't plan to >> > > > bump up the FetchResponse version, we could just remove the >> reference >> > to >> > > > version 16. >> > > >> > > Fixed. >> > > >> > > On Thu, Jul 20, 2023 at 1:28 AM Jun Rao <[email protected]> >> > wrote: >> > > >> > > > Hi, Mayank, >> > > > >> > > > Thanks for the KIP. I agree with others that it would be useful to >> see >> > the >> > > > performance results. Otherwise, just a minor comment. If we don't >> plan >> > to >> > > > bump up the FetchResponse version, we could just remove the >> reference >> > to >> > > > version 16. >> > > > >> > > > Jun >> > > > >> > > > On Wed, Jul 19, 2023 at 2:31 PM Mayank Shekhar Narula < >> > > > [email protected]> wrote: >> > > > >> > > > > Luke >> > > > > >> > > > > Thanks for the interest in the KIP. >> > > > > >> > > > > But what if the consumer was fetching from the follower? >> > > > > >> > > > > We already include `PreferredReadReplica` in the fetch response. >> > > > > > Should we put the node info of PreferredReadReplica under this >> > case, >> > > > > > instead of the leader's info? >> > > > > > >> > > > > >> > > > > PreferredReadReplica is the decided on the leader. Looking at the >> > Java >> > > > > client code, AbstractFetch::selectReadReplica, first fetch request >> > goes >> > > > to >> > > > > Leader of the partition -> Sends back PreferredReadReplica -> Next >> > fetch >> > > > > uses PreferredReadReplica. So as long as leader is available, >> > > > > PreferredReadReplica would be found in subsequent fetches. >> > > > > >> > > > > Also, under this case, should we include the leader's info in the >> > > > response? >> > > > > >> > > > > >> > > > > In this case, I think the follower would fail the fetch if it >> knows a >> > > > > different leader. If the follower knows a newer leader, it would >> > return >> > > > new >> > > > > leader information in the response, for the client to act on. >> > > > > >> > > > > >> > > > > Will we include the leader/node info in the response when having >> > > > > > `UNKNOWN_LEADER_EPOCH` error? >> > > > > >> > > > > >> > > > > My understanding is UNKNOWN_LEADER_EPOCH when a request from a >> client >> > > > has a >> > > > > newer epoch than the broker. So the client is already up to date >> on >> > new >> > > > > leader information, it's the broker that has the catching up to >> do. I >> > > > think >> > > > > there might be some optimisations to make sure the broker >> refreshes >> > its >> > > > > metadata quickly, so it can quickly recover to handle requests >> that >> > > > > previously returned UNKNOWN_LEADER_EPOCH. But this work is outside >> > the >> > > > > scope of this KIP, as for now this KIP focusses on client-side >> > > > > optimisations. >> > > > > >> > > > > Mayank >> > > > > >> > > > > On Tue, Jul 18, 2023 at 8:51 AM Luke Chen <[email protected]> >> wrote: >> > > > > >> > > > > > Hi Mayank, >> > > > > > >> > > > > > Thanks for the KIP! >> > > > > > >> > > > > > Some questions: >> > > > > > 1. I can see most of the cases we only care about consumer fetch >> > from >> > > > the >> > > > > > leader. >> > > > > > But what if the consumer was fetching from the follower? >> > > > > > We already include `PreferredReadReplica` in the fetch response. >> > > > > > Should we put the node info of PreferredReadReplica under this >> > case, >> > > > > > instead of the leader's info? >> > > > > > Also, under this case, should we include the leader's info in >> the >> > > > > response? >> > > > > > >> > > > > > 2. Will we include the leader/node info in the response when >> having >> > > > > > `UNKNOWN_LEADER_EPOCH` error? >> > > > > > I think it's fine we ignore the `UNKNOWN_LEADER_EPOCH` error >> since >> > when >> > > > > > this happens, the node might have some error which should >> refresh >> > the >> > > > > > metadata. On the other hand, it might also be good if we can >> heal >> > the >> > > > > node >> > > > > > soon to do produce/consume works. >> > > > > > >> > > > > > >> > > > > > Thank you. >> > > > > > Luke >> > > > > > >> > > > > > On Tue, Jul 18, 2023 at 2:00 AM Philip Nee <[email protected]> >> > > > wrote: >> > > > > > >> > > > > > > Hey Mayank: >> > > > > > > >> > > > > > > For #1: I think fetch and produce behave a bit differently on >> > > > metadata. >> > > > > > > Maybe it is worth highlighting the changes for each client in >> > detail. >> > > > > In >> > > > > > > producer did you mean by the metadata timeout before sending >> out >> > > > > produce >> > > > > > > requests? For consumer: I think for fetches it requires user >> to >> > retry >> > > > > if >> > > > > > > the position does not exist on the leader. I don't have the >> > detail on >> > > > > top >> > > > > > > of my head, but I think we should lay out these behavioral >> > changes. >> > > > > > > >> > > > > > > For #3: Thanks for the clarification. >> > > > > > > >> > > > > > > On Mon, Jul 17, 2023 at 10:39 AM Mayank Shekhar Narula < >> > > > > > > [email protected]> wrote: >> > > > > > > >> > > > > > > > Philip >> > > > > > > > >> > > > > > > > 1. Good call out about "poll" behaviour, my understanding is >> > the >> > > > > same. >> > > > > > I >> > > > > > > am >> > > > > > > > assuming it's about the motivation of the KIP. There with >> > async, my >> > > > > > > > intention was to convey that the client doesn't wait for the >> > > > > > > > metadata-refresh before a subsequent retry of the produce or >> > fetch >> > > > > > > request >> > > > > > > > that failed due to stale metadata(i.e. going to an old >> > leader). The >> > > > > > only >> > > > > > > > wait client has is the configured retry-delay. >> > > > > > > > >> > > > > > > > 2. Yes, in theory other APIs could benefit from this too. >> But >> > that >> > > > is >> > > > > > > > outside of the scope of the KIP. >> > > > > > > > >> > > > > > > > 3. Do you mean the response for the Metadata RPC? I think >> > brokers >> > > > > > always >> > > > > > > > have a view of the cluster, although it can be stale,it >> would >> > > > always >> > > > > > > return >> > > > > > > > a leader(whether old or new). >> > > > > > > > >> > > > > > > > Mayank >> > > > > > > > >> > > > > > > > On Fri, Jul 14, 2023 at 8:53 PM Philip Nee <[email protected] >> > >> > > > > > wrote: >> > > > > > > > >> > > > > > > > > Hey Mayank, >> > > > > > > > > >> > > > > > > > > Thanks for the KIP. I think this is a great proposal, and >> > I'm in >> > > > > > favor >> > > > > > > > > of this idea. A few comments: >> > > > > > > > > >> > > > > > > > > 1. Claiming metadata refresh is done asynchronously is >> > > > misleading. >> > > > > > The >> > > > > > > > > metadata refresh requires Network Client to be physically >> > polled, >> > > > > > which >> > > > > > > > is >> > > > > > > > > done in a separate thread in Producer and Admin Client >> > (IIRC!) >> > > > but >> > > > > > not >> > > > > > > > > Consumer. >> > > > > > > > > 2. There are other API calls that might result in >> > > > > > > NOT_LEADER_OR_FOLLOWER >> > > > > > > > > response, but it seems like this KIP only wants to update >> on >> > > > fetch >> > > > > > and >> > > > > > > > > produce. Do we want to make the leader information >> available >> > for >> > > > > > other >> > > > > > > > API >> > > > > > > > > calls? >> > > > > > > > > 3. Do you know what would happen during a leader election? >> > I'm >> > > > not >> > > > > > sure >> > > > > > > > > about this process and I wonder if the current metadata >> > response >> > > > > uses >> > > > > > > the >> > > > > > > > > old leader or null as the leader isn't readily available >> yet. >> > > > > > > > > >> > > > > > > > > Thanks, >> > > > > > > > > P >> > > > > > > > > >> > > > > > > > > On Fri, Jul 14, 2023 at 11:30 AM Kirk True < >> > [email protected]> >> > > > > > wrote: >> > > > > > > > > >> > > > > > > > > > Hi Mayank, >> > > > > > > > > > >> > > > > > > > > > > On Jul 14, 2023, at 11:25 AM, Mayank Shekhar Narula < >> > > > > > > > > > [email protected]> wrote: >> > > > > > > > > > > >> > > > > > > > > > > Kirk >> > > > > > > > > > > >> > > > > > > > > > > >> > > > > > > > > > >> Is the requested restructuring of the response >> “simply” >> > to >> > > > > > > preserve >> > > > > > > > > > bytes, >> > > > > > > > > > >> or is it possible that the fetch response >> > could/should/would >> > > > > > > return >> > > > > > > > > > >> leadership changes for partitions that we’re >> > specifically >> > > > > > > requested? >> > > > > > > > > > >> >> > > > > > > > > > > >> > > > > > > > > > > Moving endpoints to top-level fields would preserve >> > bytes, >> > > > > > > otherwise >> > > > > > > > > the >> > > > > > > > > > > endpoint-information would be duplicated if included >> > with the >> > > > > > > > > > > partition-data in the response. Right now, the >> top-level >> > > > field >> > > > > > will >> > > > > > > > > only >> > > > > > > > > > be >> > > > > > > > > > > set in case leader changes for any requested >> partitions. >> > But >> > > > it >> > > > > > can >> > > > > > > > be >> > > > > > > > > > > re-used in the future, for which Jose has a use-case >> in >> > mind >> > > > > > shared >> > > > > > > > up >> > > > > > > > > in >> > > > > > > > > > > the thread. KIP is now upto date with endpoint info >> > being at >> > > > > > > > top-level. >> > > > > > > > > > >> > > > > > > > > > >> > > > > > > > > > I didn’t catch before that there was a separate section >> > for the >> > > > > > full >> > > > > > > > node >> > > > > > > > > > information, not just the ID and epoch. >> > > > > > > > > > >> > > > > > > > > > Thanks! >> > > > > > > > > > >> > > > > > > > > > >>> 3. In the future, I may use this information in the >> > > > > > > KRaft/Metadata >> > > > > > > > > > >>> implementation of FETCH. In that implementation not >> > all of >> > > > > the >> > > > > > > > > > >>> replicas are brokers. >> > > > > > > > > > >> >> > > > > > > > > > >> Side point: any references to the change you’re >> > referring >> > > > to? >> > > > > > The >> > > > > > > > idea >> > > > > > > > > > of >> > > > > > > > > > >> non-brokers serving as replicas is blowing my mind a >> > bit :) >> > > > > > > > > > >> >> > > > > > > > > > >> >> > > > > > > > > > > Jose, I missed this as well, would love to know more >> > about >> > > > > > > non-broker >> > > > > > > > > > > serving as replica! >> > > > > > > > > > > -- >> > > > > > > > > > > Regards, >> > > > > > > > > > > Mayank Shekhar Narula >> > > > > > > > > > >> > > > > > > > > > >> > > > > > > > > >> > > > > > > > >> > > > > > > > >> > > > > > > > -- >> > > > > > > > Regards, >> > > > > > > > Mayank Shekhar Narula >> > > > > > > > >> > > > > > > >> > > > > > >> > > > > >> > > > > >> > > > > -- >> > > > > Regards, >> > > > > Mayank Shekhar Narula >> > > > > >> > > > >> > > >> > > >> > > -- >> > > Regards, >> > > Mayank Shekhar Narula >> > > >> > > > -- > Regards, > Mayank Shekhar Narula > -- Regards, Mayank Shekhar Narula
