Re: Re: [DISCUSS] KIP-951: Leader discovery optimisations for the client

Mayank Shekhar Narula Fri, 21 Jul 2023 09:40:52 -0700

David

03. Fixed as well, to remove ignorable. In MetadataResponse, Rack came a
version later, hence was marked ignorable.


Thanks.

On Fri, Jul 21, 2023 at 1:38 PM Mayank Shekhar Narula <
[email protected]> wrote:

> Hi David
>
>  01. My reasoning noted in the KIP is that CurrentLeader was first added
> in version 12, so 12 is the least version where clients could get these
> optimisations. So any client can now choose to implement this with version
> 12 of the protocol itself. If the version is bumped to X(say 16), then all
> clients(including non Java ones) will need to implement all versions from
> 13-16 to get these optimisations. And since the scenario of leader changing
> is not common, so bytes wasted on the network won't be much. What do you
> think?
>
> 02. For ProduceRequest, same reasoning. And corrected the taggedVersion.
>
> 03. Regarding making rack ignorable, my decision was inspired by a similar
> field in MetadataResponse. I must admit I am not clear, why was it made
> ignorable there originally?
>
>
> On Fri, Jul 21, 2023 at 11:14 AM David Jacot <[email protected]>
> wrote:
>
>> Hi Mayank,
>>
>> Thanks for the KIP. This is an interesting idea that I have been thinking
>> about for a long time so I am happy to see it. The gain is smaller than I
>> expected but still worth it in my opinion.
>>
>> 01. In the FetchResponse, what's the reason for using version `12+` for
>> the
>> new tagged field `NodeEndpoints`? Old clients won't use this field because
>> they are not aware of it. It seems to me that we are wasting bytes on the
>> wire by sending those unnecessarily. Is there a reason that I may have
>> missed? Intuitively, I would have bumped the version of the FetchRequest
>> to
>> 16 and I would have populated and sent `NodeEndpoints` only if version 16
>> is used by the client.
>>
>> 02. I have the very same question for the ProduceRequest's `CurrentLeader`
>> and `NodeEndpoints` fields. Note that the `taggedVerions` of
>> `NodeEndpoints` is incorrect.
>>
>> 03. For both, does the field `Rack` have to really be ignorable? That does
>> not seem necessary to me but I may be wrong.
>>
>> Best,
>> David
>>
>> On Fri, Jul 21, 2023 at 12:32 AM Crispin Bernier
>> <[email protected]> wrote:
>>
>> > Benchmark numbers have been posted on the KIP, please review.
>> >
>> > On 2023/07/20 13:03:00 Mayank Shekhar Narula wrote:
>> > > Jun
>> > >
>> > > Thanks for the feedback.
>> > >
>> > > Numbers to follow.
>> > >
>> > > If we don't plan to
>> > > > bump up the FetchResponse version, we could just remove the
>> reference
>> > to
>> > > > version 16.
>> > >
>> > > Fixed.
>> > >
>> > > On Thu, Jul 20, 2023 at 1:28 AM Jun Rao <[email protected]>
>> > wrote:
>> > >
>> > > > Hi, Mayank,
>> > > >
>> > > > Thanks for the KIP. I agree with others that it would be useful to
>> see
>> > the
>> > > > performance results. Otherwise, just a minor comment. If we don't
>> plan
>> > to
>> > > > bump up the FetchResponse version, we could just remove the
>> reference
>> > to
>> > > > version 16.
>> > > >
>> > > > Jun
>> > > >
>> > > > On Wed, Jul 19, 2023 at 2:31 PM Mayank Shekhar Narula <
>> > > > [email protected]> wrote:
>> > > >
>> > > > > Luke
>> > > > >
>> > > > > Thanks for the interest in the KIP.
>> > > > >
>> > > > > But what if the consumer was fetching from the follower?
>> > > > >
>> > > > > We already include `PreferredReadReplica` in the fetch response.
>> > > > > > Should we put the node info of PreferredReadReplica under this
>> > case,
>> > > > > > instead of the leader's info?
>> > > > > >
>> > > > >
>> > > > > PreferredReadReplica is the decided on the leader. Looking at the
>> > Java
>> > > > > client code, AbstractFetch::selectReadReplica, first fetch request
>> > goes
>> > > > to
>> > > > > Leader of the partition -> Sends back PreferredReadReplica -> Next
>> > fetch
>> > > > > uses PreferredReadReplica. So as long as leader is available,
>> > > > > PreferredReadReplica would be found in subsequent fetches.
>> > > > >
>> > > > > Also, under this case, should we include the leader's info in the
>> > > > response?
>> > > > >
>> > > > >
>> > > > > In this case, I think the follower would fail the fetch if it
>> knows a
>> > > > > different leader. If the follower knows a newer leader, it would
>> > return
>> > > > new
>> > > > > leader information in the response, for the client to act on.
>> > > > >
>> > > > >
>> > > > > Will we include the leader/node info in the response when having
>> > > > > > `UNKNOWN_LEADER_EPOCH` error?
>> > > > >
>> > > > >
>> > > > > My understanding is UNKNOWN_LEADER_EPOCH when a request from a
>> client
>> > > > has a
>> > > > > newer epoch than the broker. So the client is already up to date
>> on
>> > new
>> > > > > leader information, it's the broker that has the catching up to
>> do. I
>> > > > think
>> > > > > there might be some optimisations to make sure the broker
>> refreshes
>> > its
>> > > > > metadata quickly, so it can quickly recover to handle requests
>> that
>> > > > > previously returned UNKNOWN_LEADER_EPOCH. But this work is outside
>> > the
>> > > > > scope of this KIP, as for now this KIP focusses on client-side
>> > > > > optimisations.
>> > > > >
>> > > > > Mayank
>> > > > >
>> > > > > On Tue, Jul 18, 2023 at 8:51 AM Luke Chen <[email protected]>
>> wrote:
>> > > > >
>> > > > > > Hi Mayank,
>> > > > > >
>> > > > > > Thanks for the KIP!
>> > > > > >
>> > > > > > Some questions:
>> > > > > > 1. I can see most of the cases we only care about consumer fetch
>> > from
>> > > > the
>> > > > > > leader.
>> > > > > > But what if the consumer was fetching from the follower?
>> > > > > > We already include `PreferredReadReplica` in the fetch response.
>> > > > > > Should we put the node info of PreferredReadReplica under this
>> > case,
>> > > > > > instead of the leader's info?
>> > > > > > Also, under this case, should we include the leader's info in
>> the
>> > > > > response?
>> > > > > >
>> > > > > > 2. Will we include the leader/node info in the response when
>> having
>> > > > > > `UNKNOWN_LEADER_EPOCH` error?
>> > > > > > I think it's fine we ignore the `UNKNOWN_LEADER_EPOCH` error
>> since
>> > when
>> > > > > > this happens, the node might have some error which should
>> refresh
>> > the
>> > > > > > metadata. On the other hand, it might also be good if we can
>> heal
>> > the
>> > > > > node
>> > > > > > soon to do produce/consume works.
>> > > > > >
>> > > > > >
>> > > > > > Thank you.
>> > > > > > Luke
>> > > > > >
>> > > > > > On Tue, Jul 18, 2023 at 2:00 AM Philip Nee <[email protected]>
>> > > > wrote:
>> > > > > >
>> > > > > > > Hey Mayank:
>> > > > > > >
>> > > > > > > For #1: I think fetch and produce behave a bit differently on
>> > > > metadata.
>> > > > > > > Maybe it is worth highlighting the changes for each client in
>> > detail.
>> > > > > In
>> > > > > > > producer did you mean by the metadata timeout before sending
>> out
>> > > > > produce
>> > > > > > > requests? For consumer: I think for fetches it requires user
>> to
>> > retry
>> > > > > if
>> > > > > > > the position does not exist on the leader. I don't have the
>> > detail on
>> > > > > top
>> > > > > > > of my head, but I think we should lay out these behavioral
>> > changes.
>> > > > > > >
>> > > > > > > For #3: Thanks for the clarification.
>> > > > > > >
>> > > > > > > On Mon, Jul 17, 2023 at 10:39 AM Mayank Shekhar Narula <
>> > > > > > > [email protected]> wrote:
>> > > > > > >
>> > > > > > > > Philip
>> > > > > > > >
>> > > > > > > > 1. Good call out about "poll" behaviour, my understanding is
>> > the
>> > > > > same.
>> > > > > > I
>> > > > > > > am
>> > > > > > > > assuming it's about the motivation of the KIP. There with
>> > async, my
>> > > > > > > > intention was to convey that the client doesn't wait for the
>> > > > > > > > metadata-refresh before a subsequent retry of the produce or
>> > fetch
>> > > > > > > request
>> > > > > > > > that failed due to stale metadata(i.e. going to an old
>> > leader). The
>> > > > > > only
>> > > > > > > > wait client has is the configured retry-delay.
>> > > > > > > >
>> > > > > > > > 2. Yes, in theory other APIs could benefit from this too.
>> But
>> > that
>> > > > is
>> > > > > > > > outside of the scope of the KIP.
>> > > > > > > >
>> > > > > > > > 3. Do you mean the response for the Metadata RPC? I think
>> > brokers
>> > > > > > always
>> > > > > > > > have a view of the cluster, although it can be stale,it
>> would
>> > > > always
>> > > > > > > return
>> > > > > > > > a leader(whether old or new).
>> > > > > > > >
>> > > > > > > > Mayank
>> > > > > > > >
>> > > > > > > > On Fri, Jul 14, 2023 at 8:53 PM Philip Nee <[email protected]
>> >
>> > > > > > wrote:
>> > > > > > > >
>> > > > > > > > > Hey Mayank,
>> > > > > > > > >
>> > > > > > > > > Thanks for the KIP. I think this is a great proposal, and
>> > I'm in
>> > > > > > favor
>> > > > > > > > > of this idea.  A few comments:
>> > > > > > > > >
>> > > > > > > > > 1. Claiming metadata refresh is done asynchronously is
>> > > > misleading.
>> > > > > > The
>> > > > > > > > > metadata refresh requires Network Client to be physically
>> > polled,
>> > > > > > which
>> > > > > > > > is
>> > > > > > > > > done in a separate thread in Producer and Admin Client
>> > (IIRC!)
>> > > > but
>> > > > > > not
>> > > > > > > > > Consumer.
>> > > > > > > > > 2. There are other API calls that might result in
>> > > > > > > NOT_LEADER_OR_FOLLOWER
>> > > > > > > > > response, but it seems like this KIP only wants to update
>> on
>> > > > fetch
>> > > > > > and
>> > > > > > > > > produce. Do we want to make the leader information
>> available
>> > for
>> > > > > > other
>> > > > > > > > API
>> > > > > > > > > calls?
>> > > > > > > > > 3. Do you know what would happen during a leader election?
>> > I'm
>> > > > not
>> > > > > > sure
>> > > > > > > > > about this process and I wonder if the current metadata
>> > response
>> > > > > uses
>> > > > > > > the
>> > > > > > > > > old leader or null as the leader isn't readily available
>> yet.
>> > > > > > > > >
>> > > > > > > > > Thanks,
>> > > > > > > > > P
>> > > > > > > > >
>> > > > > > > > > On Fri, Jul 14, 2023 at 11:30 AM Kirk True <
>> > [email protected]>
>> > > > > > wrote:
>> > > > > > > > >
>> > > > > > > > > > Hi Mayank,
>> > > > > > > > > >
>> > > > > > > > > > > On Jul 14, 2023, at 11:25 AM, Mayank Shekhar Narula <
>> > > > > > > > > > [email protected]> wrote:
>> > > > > > > > > > >
>> > > > > > > > > > > Kirk
>> > > > > > > > > > >
>> > > > > > > > > > >
>> > > > > > > > > > >> Is the requested restructuring of the response
>> “simply”
>> > to
>> > > > > > > preserve
>> > > > > > > > > > bytes,
>> > > > > > > > > > >> or is it possible that the fetch response
>> > could/should/would
>> > > > > > > return
>> > > > > > > > > > >> leadership changes for partitions that we’re
>> > specifically
>> > > > > > > requested?
>> > > > > > > > > > >>
>> > > > > > > > > > >
>> > > > > > > > > > > Moving endpoints to top-level fields would preserve
>> > bytes,
>> > > > > > > otherwise
>> > > > > > > > > the
>> > > > > > > > > > > endpoint-information would be duplicated if included
>> > with the
>> > > > > > > > > > > partition-data in the response. Right now, the
>> top-level
>> > > > field
>> > > > > > will
>> > > > > > > > > only
>> > > > > > > > > > be
>> > > > > > > > > > > set in case leader changes for any requested
>> partitions.
>> > But
>> > > > it
>> > > > > > can
>> > > > > > > > be
>> > > > > > > > > > > re-used in the future, for which Jose has a use-case
>> in
>> > mind
>> > > > > > shared
>> > > > > > > > up
>> > > > > > > > > in
>> > > > > > > > > > > the thread. KIP is now upto date with endpoint info
>> > being at
>> > > > > > > > top-level.
>> > > > > > > > > >
>> > > > > > > > > >
>> > > > > > > > > > I didn’t catch before that there was a separate section
>> > for the
>> > > > > > full
>> > > > > > > > node
>> > > > > > > > > > information, not just the ID and epoch.
>> > > > > > > > > >
>> > > > > > > > > > Thanks!
>> > > > > > > > > >
>> > > > > > > > > > >>> 3. In the future, I may use this information in the
>> > > > > > > KRaft/Metadata
>> > > > > > > > > > >>> implementation of FETCH. In that implementation not
>> > all of
>> > > > > the
>> > > > > > > > > > >>> replicas are brokers.
>> > > > > > > > > > >>
>> > > > > > > > > > >> Side point: any references to the change you’re
>> > referring
>> > > > to?
>> > > > > > The
>> > > > > > > > idea
>> > > > > > > > > > of
>> > > > > > > > > > >> non-brokers serving as replicas is blowing my mind a
>> > bit :)
>> > > > > > > > > > >>
>> > > > > > > > > > >>
>> > > > > > > > > > > Jose, I missed this as well, would love to know more
>> > about
>> > > > > > > non-broker
>> > > > > > > > > > > serving as replica!
>> > > > > > > > > > > --
>> > > > > > > > > > > Regards,
>> > > > > > > > > > > Mayank Shekhar Narula
>> > > > > > > > > >
>> > > > > > > > > >
>> > > > > > > > >
>> > > > > > > >
>> > > > > > > >
>> > > > > > > > --
>> > > > > > > > Regards,
>> > > > > > > > Mayank Shekhar Narula
>> > > > > > > >
>> > > > > > >
>> > > > > >
>> > > > >
>> > > > >
>> > > > > --
>> > > > > Regards,
>> > > > > Mayank Shekhar Narula
>> > > > >
>> > > >
>> > >
>> > >
>> > > --
>> > > Regards,
>> > > Mayank Shekhar Narula
>> > >
>>
>
>
> --
> Regards,
> Mayank Shekhar Narula
>


-- 
Regards,
Mayank Shekhar Narula

Re: Re: [DISCUSS] KIP-951: Leader discovery optimisations for the client

Reply via email to