Re: Re: [DISCUSS] KIP-951: Leader discovery optimisations for the client

David Jacot Fri, 21 Jul 2023 03:14:31 -0700

Hi Mayank,

Thanks for the KIP. This is an interesting idea that I have been thinking
about for a long time so I am happy to see it. The gain is smaller than I
expected but still worth it in my opinion.


01. In the FetchResponse, what's the reason for using version `12+` for the
new tagged field `NodeEndpoints`? Old clients won't use this field because
they are not aware of it. It seems to me that we are wasting bytes on the
wire by sending those unnecessarily. Is there a reason that I may have
missed? Intuitively, I would have bumped the version of the FetchRequest to
16 and I would have populated and sent `NodeEndpoints` only if version 16
is used by the client.

02. I have the very same question for the ProduceRequest's `CurrentLeader`
and `NodeEndpoints` fields. Note that the `taggedVerions` of
`NodeEndpoints` is incorrect.

03. For both, does the field `Rack` have to really be ignorable? That does
not seem necessary to me but I may be wrong.

Best,
David

On Fri, Jul 21, 2023 at 12:32 AM Crispin Bernier
<[email protected]> wrote:

> Benchmark numbers have been posted on the KIP, please review.
>
> On 2023/07/20 13:03:00 Mayank Shekhar Narula wrote:
> > Jun
> >
> > Thanks for the feedback.
> >
> > Numbers to follow.
> >
> > If we don't plan to
> > > bump up the FetchResponse version, we could just remove the reference
> to
> > > version 16.
> >
> > Fixed.
> >
> > On Thu, Jul 20, 2023 at 1:28 AM Jun Rao <[email protected]>
> wrote:
> >
> > > Hi, Mayank,
> > >
> > > Thanks for the KIP. I agree with others that it would be useful to see
> the
> > > performance results. Otherwise, just a minor comment. If we don't plan
> to
> > > bump up the FetchResponse version, we could just remove the reference
> to
> > > version 16.
> > >
> > > Jun
> > >
> > > On Wed, Jul 19, 2023 at 2:31 PM Mayank Shekhar Narula <
> > > [email protected]> wrote:
> > >
> > > > Luke
> > > >
> > > > Thanks for the interest in the KIP.
> > > >
> > > > But what if the consumer was fetching from the follower?
> > > >
> > > > We already include `PreferredReadReplica` in the fetch response.
> > > > > Should we put the node info of PreferredReadReplica under this
> case,
> > > > > instead of the leader's info?
> > > > >
> > > >
> > > > PreferredReadReplica is the decided on the leader. Looking at the
> Java
> > > > client code, AbstractFetch::selectReadReplica, first fetch request
> goes
> > > to
> > > > Leader of the partition -> Sends back PreferredReadReplica -> Next
> fetch
> > > > uses PreferredReadReplica. So as long as leader is available,
> > > > PreferredReadReplica would be found in subsequent fetches.
> > > >
> > > > Also, under this case, should we include the leader's info in the
> > > response?
> > > >
> > > >
> > > > In this case, I think the follower would fail the fetch if it knows a
> > > > different leader. If the follower knows a newer leader, it would
> return
> > > new
> > > > leader information in the response, for the client to act on.
> > > >
> > > >
> > > > Will we include the leader/node info in the response when having
> > > > > `UNKNOWN_LEADER_EPOCH` error?
> > > >
> > > >
> > > > My understanding is UNKNOWN_LEADER_EPOCH when a request from a client
> > > has a
> > > > newer epoch than the broker. So the client is already up to date on
> new
> > > > leader information, it's the broker that has the catching up to do. I
> > > think
> > > > there might be some optimisations to make sure the broker refreshes
> its
> > > > metadata quickly, so it can quickly recover to handle requests that
> > > > previously returned UNKNOWN_LEADER_EPOCH. But this work is outside
> the
> > > > scope of this KIP, as for now this KIP focusses on client-side
> > > > optimisations.
> > > >
> > > > Mayank
> > > >
> > > > On Tue, Jul 18, 2023 at 8:51 AM Luke Chen <[email protected]> wrote:
> > > >
> > > > > Hi Mayank,
> > > > >
> > > > > Thanks for the KIP!
> > > > >
> > > > > Some questions:
> > > > > 1. I can see most of the cases we only care about consumer fetch
> from
> > > the
> > > > > leader.
> > > > > But what if the consumer was fetching from the follower?
> > > > > We already include `PreferredReadReplica` in the fetch response.
> > > > > Should we put the node info of PreferredReadReplica under this
> case,
> > > > > instead of the leader's info?
> > > > > Also, under this case, should we include the leader's info in the
> > > > response?
> > > > >
> > > > > 2. Will we include the leader/node info in the response when having
> > > > > `UNKNOWN_LEADER_EPOCH` error?
> > > > > I think it's fine we ignore the `UNKNOWN_LEADER_EPOCH` error since
> when
> > > > > this happens, the node might have some error which should refresh
> the
> > > > > metadata. On the other hand, it might also be good if we can heal
> the
> > > > node
> > > > > soon to do produce/consume works.
> > > > >
> > > > >
> > > > > Thank you.
> > > > > Luke
> > > > >
> > > > > On Tue, Jul 18, 2023 at 2:00 AM Philip Nee <[email protected]>
> > > wrote:
> > > > >
> > > > > > Hey Mayank:
> > > > > >
> > > > > > For #1: I think fetch and produce behave a bit differently on
> > > metadata.
> > > > > > Maybe it is worth highlighting the changes for each client in
> detail.
> > > > In
> > > > > > producer did you mean by the metadata timeout before sending out
> > > > produce
> > > > > > requests? For consumer: I think for fetches it requires user to
> retry
> > > > if
> > > > > > the position does not exist on the leader. I don't have the
> detail on
> > > > top
> > > > > > of my head, but I think we should lay out these behavioral
> changes.
> > > > > >
> > > > > > For #3: Thanks for the clarification.
> > > > > >
> > > > > > On Mon, Jul 17, 2023 at 10:39 AM Mayank Shekhar Narula <
> > > > > > [email protected]> wrote:
> > > > > >
> > > > > > > Philip
> > > > > > >
> > > > > > > 1. Good call out about "poll" behaviour, my understanding is
> the
> > > > same.
> > > > > I
> > > > > > am
> > > > > > > assuming it's about the motivation of the KIP. There with
> async, my
> > > > > > > intention was to convey that the client doesn't wait for the
> > > > > > > metadata-refresh before a subsequent retry of the produce or
> fetch
> > > > > > request
> > > > > > > that failed due to stale metadata(i.e. going to an old
> leader). The
> > > > > only
> > > > > > > wait client has is the configured retry-delay.
> > > > > > >
> > > > > > > 2. Yes, in theory other APIs could benefit from this too. But
> that
> > > is
> > > > > > > outside of the scope of the KIP.
> > > > > > >
> > > > > > > 3. Do you mean the response for the Metadata RPC? I think
> brokers
> > > > > always
> > > > > > > have a view of the cluster, although it can be stale,it would
> > > always
> > > > > > return
> > > > > > > a leader(whether old or new).
> > > > > > >
> > > > > > > Mayank
> > > > > > >
> > > > > > > On Fri, Jul 14, 2023 at 8:53 PM Philip Nee <[email protected]>
> > > > > wrote:
> > > > > > >
> > > > > > > > Hey Mayank,
> > > > > > > >
> > > > > > > > Thanks for the KIP. I think this is a great proposal, and
> I'm in
> > > > > favor
> > > > > > > > of this idea.  A few comments:
> > > > > > > >
> > > > > > > > 1. Claiming metadata refresh is done asynchronously is
> > > misleading.
> > > > > The
> > > > > > > > metadata refresh requires Network Client to be physically
> polled,
> > > > > which
> > > > > > > is
> > > > > > > > done in a separate thread in Producer and Admin Client
> (IIRC!)
> > > but
> > > > > not
> > > > > > > > Consumer.
> > > > > > > > 2. There are other API calls that might result in
> > > > > > NOT_LEADER_OR_FOLLOWER
> > > > > > > > response, but it seems like this KIP only wants to update on
> > > fetch
> > > > > and
> > > > > > > > produce. Do we want to make the leader information available
> for
> > > > > other
> > > > > > > API
> > > > > > > > calls?
> > > > > > > > 3. Do you know what would happen during a leader election?
> I'm
> > > not
> > > > > sure
> > > > > > > > about this process and I wonder if the current metadata
> response
> > > > uses
> > > > > > the
> > > > > > > > old leader or null as the leader isn't readily available yet.
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > > P
> > > > > > > >
> > > > > > > > On Fri, Jul 14, 2023 at 11:30 AM Kirk True <
> [email protected]>
> > > > > wrote:
> > > > > > > >
> > > > > > > > > Hi Mayank,
> > > > > > > > >
> > > > > > > > > > On Jul 14, 2023, at 11:25 AM, Mayank Shekhar Narula <
> > > > > > > > > [email protected]> wrote:
> > > > > > > > > >
> > > > > > > > > > Kirk
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >> Is the requested restructuring of the response “simply”
> to
> > > > > > preserve
> > > > > > > > > bytes,
> > > > > > > > > >> or is it possible that the fetch response
> could/should/would
> > > > > > return
> > > > > > > > > >> leadership changes for partitions that we’re
> specifically
> > > > > > requested?
> > > > > > > > > >>
> > > > > > > > > >
> > > > > > > > > > Moving endpoints to top-level fields would preserve
> bytes,
> > > > > > otherwise
> > > > > > > > the
> > > > > > > > > > endpoint-information would be duplicated if included
> with the
> > > > > > > > > > partition-data in the response. Right now, the top-level
> > > field
> > > > > will
> > > > > > > > only
> > > > > > > > > be
> > > > > > > > > > set in case leader changes for any requested partitions.
> But
> > > it
> > > > > can
> > > > > > > be
> > > > > > > > > > re-used in the future, for which Jose has a use-case in
> mind
> > > > > shared
> > > > > > > up
> > > > > > > > in
> > > > > > > > > > the thread. KIP is now upto date with endpoint info
> being at
> > > > > > > top-level.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > I didn’t catch before that there was a separate section
> for the
> > > > > full
> > > > > > > node
> > > > > > > > > information, not just the ID and epoch.
> > > > > > > > >
> > > > > > > > > Thanks!
> > > > > > > > >
> > > > > > > > > >>> 3. In the future, I may use this information in the
> > > > > > KRaft/Metadata
> > > > > > > > > >>> implementation of FETCH. In that implementation not
> all of
> > > > the
> > > > > > > > > >>> replicas are brokers.
> > > > > > > > > >>
> > > > > > > > > >> Side point: any references to the change you’re
> referring
> > > to?
> > > > > The
> > > > > > > idea
> > > > > > > > > of
> > > > > > > > > >> non-brokers serving as replicas is blowing my mind a
> bit :)
> > > > > > > > > >>
> > > > > > > > > >>
> > > > > > > > > > Jose, I missed this as well, would love to know more
> about
> > > > > > non-broker
> > > > > > > > > > serving as replica!
> > > > > > > > > > --
> > > > > > > > > > Regards,
> > > > > > > > > > Mayank Shekhar Narula
> > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > --
> > > > > > > Regards,
> > > > > > > Mayank Shekhar Narula
> > > > > > >
> > > > > >
> > > > >
> > > >
> > > >
> > > > --
> > > > Regards,
> > > > Mayank Shekhar Narula
> > > >
> > >
> >
> >
> > --
> > Regards,
> > Mayank Shekhar Narula
> >

Re: Re: [DISCUSS] KIP-951: Leader discovery optimisations for the client

Reply via email to