Re: [DISCUSS] KIP-526: Reduce Producer Metadata Lookups for Large Number of Topics

Guozhang Wang Mon, 09 Dec 2019 11:58:03 -0800

Hello Brian,

Thanks for the explanation. Could you then update the algorithm section of
the wiki page? When I read it, I thought it differed from what you describe
above: e.g. a topic should not be added to the urgent set just because its
max.age has expired, but only if there is data pending to be sent for it.
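
For concreteness, the Urgent / Non-urgent rule as discussed in this thread
could be sketched roughly as below. This is only an illustration under
stated assumptions, not the producer's actual code: TopicState,
topics_for_request and their fields are hypothetical names, and picking the
oldest non-urgent topics first is an assumption the thread does not make.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TopicState:
    """Hypothetical per-topic bookkeeping; not the producer's real class."""
    name: str
    last_refresh_ms: Optional[int]   # None means no cached metadata
    has_pending_sends: bool = False  # data is buffered for this topic
    send_failed: bool = False        # e.g. a send failed with NOT_LEADER


def is_urgent(t: TopicState) -> bool:
    # Urgent: data is waiting but there is no cached metadata,
    # or a send for the topic failed and its metadata may be stale.
    return (t.last_refresh_ms is None and t.has_pending_sends) or t.send_failed


def is_non_urgent(t: TopicState, now_ms: int, max_age_ms: int) -> bool:
    # Non-urgent: not urgent, but the cached metadata is older than max.age.
    if is_urgent(t) or t.last_refresh_ms is None:
        return False
    return now_ms - t.last_refresh_ms >= max_age_ms


def topics_for_request(topics: List[TopicState], now_ms: int,
                       max_age_ms: int, target_size: int) -> List[str]:
    # All urgent topics are always included, ignoring the target size.
    urgent = [t.name for t in topics if is_urgent(t)]
    # Non-urgent topics only fill whatever room is left under the target.
    room = max(0, target_size - len(urgent))
    stale = sorted(
        (t for t in topics if is_non_urgent(t, now_ms, max_age_ms)),
        key=lambda t: t.last_refresh_ms,  # assumption: oldest first
    )
    return urgent + [t.name for t in stale[:room]]
```

Note that under this reading a topic with no cached metadata and no pending
data is in neither set, and max.age expiry alone never makes a topic urgent.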



Guozhang

On Mon, Dec 9, 2019 at 10:57 AM Brian Byrne <[email protected]> wrote:

> Hi Guozhang,
>
> Thanks for the feedback!
>
> On Sun, Dec 8, 2019 at 6:25 PM Guozhang Wang <[email protected]> wrote:
>
> > 1. The addition of metadata.expiry.ms should be included in the public
> > interface. Also, its semantics need more clarification (previously it was
> > hard-coded internally, so we did not need to explain it publicly, but now
> > that the value is configurable we do).
> >
>
> This was an oversight. Done.
>
>
> > 2. There are a couple of hard-coded parameters like 25 and 0.5 in the
> > proposal; maybe we need to explain why these magic values make sense in
> > common scenarios.
> >
>
> So these are pretty fuzzy numbers, and seemed to be a decent balance
> between trade-offs. I've updated the target size to account for setups with
> a large number of topics or a shorter refresh time, as well as added some
> light rationale.
>
>
> > 3. In the Urgent set condition, do you actually mean "with no cached
> > metadata AND there is existing data buffered for the topic"?
> >
>
> Yes, fixed.
>
>
>
> > One concern I have is whether we may introduce a regression, especially
> > during producer startup: since we only request up to 25 topics in each
> > request, send data may be buffered longer than it is today because
> > metadata is not available. I understand this is an acknowledged trade-off
> > in our design, but any regression that may surface to users needs to be
> > considered very carefully. I'm wondering, e.g., if we can tweak our
> > algorithm for the Urgent set so that topics with no cached metadata have
> > higher priority than those whose max.age has elapsed but which have not
> > yet been called for sending. More specifically:
> >
> > Urgent: topics that have been requested for sending but have no cached
> > metadata, and topics whose send request failed with e.g. NOT_LEADER.
> > Non-urgent: topics that are not in Urgent but have expired max.age.
> >
> > Then, when sending a metadata request, we always include ALL topics in
> > Urgent (i.e. ignore the size limit), and only when they do not exceed the
> > size limit do we consider filling in more topics from Non-urgent, up to
> > the size limit.
> >
>
> I think we're on the same page here. Urgent topics ignore the target
> metadata RPC size, and are not bound by it in any way, i.e. if there are 100
> urgent topics, we'll fetch all 100 in a single RPC. Like you say, however,
> if a topic transitions to urgent and there are several non-urgent ones,
> we'll piggyback the non-urgent updates up to the target size.
>
> Thanks,
>  Brian
>
>
> > On Wed, Nov 20, 2019 at 7:00 PM deng ziming <[email protected]> wrote:
> >
> > > I think it's ok, and you can add another issue about `asynchronous
> > > metadata` if `topic expiry` is not enough.
> > >
> > >
> > > On Thu, Nov 21, 2019 at 6:20 AM Brian Byrne <[email protected]> wrote:
> > >
> > > > Hello all,
> > > >
> > > > I've refactored the KIP to remove implementing asynchronous metadata
> > > > fetching in the producer during send(). It's now exclusively focused on
> > > > reducing the topic metadata fetch payload and proposes adding a new
> > > > configuration flag to control topic expiry behavior. Please take a look
> > > > when possible.
> > > >
> > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-526%3A+Reduce+Producer+Metadata+Lookups+for+Large+Number+of+Topics
> > > >
> > > > Thanks,
> > > > Brian
> > > >
> > > > On Fri, Oct 4, 2019 at 10:04 AM Brian Byrne <[email protected]> wrote:
> > > >
> > > > > Lucas, Guozhang,
> > > > >
> > > > > Thank you for the comments. Good point on METADATA_MAX_AGE_CONFIG - it
> > > > > looks like the ProducerMetadata was differentiating between expiry and
> > > > > refresh, but it should be unnecessary to do so once the cost of fetching
> > > > > a single topic's metadata is reduced.
> > > > >
> > > > > I've updated the rejected alternatives and removed the config variables.
> > > > >
> > > > > Brian
> > > > >
> > > > > On Fri, Oct 4, 2019 at 9:20 AM Guozhang Wang <[email protected]> wrote:
> > > > >
> > > > >> Hello Brian,
> > > > >>
> > > > >> Thanks for the KIP.
> > > > >>
> > > > > >> I think using asynchronous metadata updates addresses 1) metadata
> > > > > >> updates blocking send. For the other issues, we currently do have a
> > > > > >> configurable `METADATA_MAX_AGE_CONFIG` at the producer, similar to the
> > > > > >> consumer, which defaults to 5 min. So maybe we do not need to introduce
> > > > > >> new configs here, but only change the semantics of that config from
> > > > > >> global expiry (today we just enforce a full metadata update for the
> > > > > >> whole cluster) to single-topic expiry, and we can also extend a topic's
> > > > > >> expiry deadline whenever its metadata is successfully used to send a
> > > > > >> produce request.
> > > > >>
> > > > >>
> > > > >> Guozhang
> > > > >>
> > > > >>
> > > > >>
> > > > > >> On Thu, Oct 3, 2019 at 6:51 PM Lucas Bradstreet <[email protected]> wrote:
> > > > >>
> > > > >> > Hi Brian,
> > > > >> >
> > > > > >> > This looks great, and should help reduce blocking and high metadata
> > > > > >> > request volumes when the producer is sending to large numbers of
> > > > > >> > topics, especially at low volumes. I think the approach to make
> > > > > >> > metadata fetching asynchronous and batch metadata requests together
> > > > > >> > will help significantly.
> > > > >> >
> > > > > >> > The only other approach I can think of is to allow users to supply the
> > > > > >> > producer with the expected topics upfront, allowing the producer to
> > > > > >> > perform a single initial metadata request before any sends occur. I see
> > > > > >> > no real advantages to this approach compared to the async method you’ve
> > > > > >> > proposed, but maybe we could add it to the rejected alternatives
> > > > > >> > section?
> > > > >> >
> > > > >> > Thanks,
> > > > >> >
> > > > >> > Lucas
> > > > >> >
> > > > > >> > On Fri, 20 Sep 2019 at 11:46, Brian Byrne <[email protected]> wrote:
> > > > >> >
> > > > > >> > > I've updated the 'Proposed Changes' to include two new producer
> > > > > >> > > configuration variables: topic.expiry.ms and topic.refresh.ms.
> > > > > >> > > Please take a look.
> > > > >> > >
> > > > >> > > Thanks,
> > > > >> > > Brian
> > > > >> > >
> > > > > >> > > On Tue, Sep 17, 2019 at 12:59 PM Brian Byrne <[email protected]> wrote:
> > > > >> > >
> > > > >> > > > Dev team,
> > > > >> > > >
> > > > > >> > > > Requesting discussion for improvement to the producer when dealing
> > > > > >> > > > with a large number of topics.
> > > > >> > > >
> > > > > >> > > > KIP:
> > > > > >> > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-526%3A+Reduce+Producer+Metadata+Lookups+for+Large+Number+of+Topics
> > > > >> > > >
> > > > >> > > > JIRA: https://issues.apache.org/jira/browse/KAFKA-8904
> > > > >> > > >
> > > > >> > > > Thoughts and feedback would be appreciated.
> > > > >> > > >
> > > > >> > > > Thanks,
> > > > >> > > > Brian
> > > > >> > > >
> > > > >> > >
> > > > >> >
> > > > >>
> > > > >>
> > > > >> --
> > > > >> -- Guozhang
> > > > >>
> > > > >
> > > >
> > >
> >
> >
> > --
> > -- Guozhang
> >
>


-- 
-- Guozhang
