Hello Brian,

Thanks for the updated PR and sorry for the late reply. I reviewed the page
again and here are some more comments:

Minor:

1. The addition of `metadata.expiry.ms` should be included in the public
interfaces section. Its semantics also need more clarification: since the
value was previously hard-coded internally we did not need to explain it
publicly, but now that it is configurable we do (a small config sketch
follows these comments).
2. There are a couple of hard-coded parameters in the proposal, such as 25
and 0.5; maybe we should explain why these magic values make sense in
common scenarios.
3. In the Urgent set condition, do you actually mean "with no cached
metadata AND data already buffered for the topic"?
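For illustration only, here is roughly how I would expect the new config to
look from a user's point of view (a fragment, not the actual API change;
`metadata.expiry.ms` is the proposed name from the KIP, and the value shown
is just an example, not a decided default):

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.common.serialization.StringSerializer;

Properties props = new Properties();
props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
// Existing config: how often the producer forces a metadata refresh (default 5 min).
props.put(ProducerConfig.METADATA_MAX_AGE_CONFIG, "300000");
// Proposed config (name and default per the KIP discussion, not yet in the
// producer): how long a topic's metadata is retained after it was last used.
props.put("metadata.expiry.ms", "300000");
KafkaProducer<String, String> producer = new KafkaProducer<>(props);

Spelling this out in the public interfaces section would make the difference
between refresh and expiry obvious to users.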

Meta:

One concern I have is whether we may introduce a regression, especially
during producer startup: since we only request up to 25 topics per metadata
request, send data may be buffered longer than it is today because its
metadata is not yet available. I understand this is an acknowledged
trade-off in the design, but any regression that may surface to users needs
to be considered very carefully. I'm wondering, for example, whether we can
tweak the algorithm for the Urgent set so that topics with no cached
metadata get higher priority than topics whose metadata has exceeded
max.age but for which send has not yet been called. More specifically:

Urgent: topics that have been requested for sending but have no cached
metadata, and topics whose send requests have failed with e.g. NOT_LEADER.
Non-urgent: topics that are not in Urgent but whose metadata has exceeded
max.age.

Then when sending a metadata request, we always include ALL topics in the
Urgent set (i.e. ignore the size limit), and only if they do not already
exceed the size limit do we fill in more topics from the Non-urgent set, up
to the limit.
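To make sure I'm describing the same thing you have in mind, a rough sketch
of that selection logic (illustrative only, not the actual producer code;
the class and method names and the urgent/nonUrgent inputs are made up):

import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

// Sketch of choosing the topics for one metadata request.
final class MetadataTopicSelector {

    // topicLimit would be the per-request cap from the proposal (e.g. 25).
    static List<String> topicsForNextRequest(Collection<String> urgent,
                                             Collection<String> nonUrgent,
                                             int topicLimit) {
        // Always include every urgent topic, even if that alone exceeds the limit.
        List<String> result = new ArrayList<>(urgent);
        // Fill any remaining capacity with non-urgent topics, up to the limit.
        for (String topic : nonUrgent) {
            if (result.size() >= topicLimit) {
                break;
            }
            result.add(topic);
        }
        return result;
    }
}

That way a topic with data waiting but no metadata is never starved by
topics that merely need a background refresh.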


Guozhang



On Wed, Nov 20, 2019 at 7:00 PM deng ziming <dengziming1...@gmail.com>
wrote:

> I think it's ok, and you can add another issue about `asynchronous
> metadata` if `topic expiry` is not enough.
>
>
> On Thu, Nov 21, 2019 at 6:20 AM Brian Byrne <bby...@confluent.io> wrote:
>
> > Hello all,
> >
> > I've refactored the KIP to remove implementing asynchronous metadata
> > fetching in the producer during send(). It's now exclusively focused on
> > reducing the topic metadata fetch payload and proposes adding a new
> > configuration flag to control topic expiry behavior. Please take a look
> > when possible.
> >
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-526%3A+Reduce+Producer+Metadata+Lookups+for+Large+Number+of+Topics
> >
> > Thanks,
> > Brian
> >
> > On Fri, Oct 4, 2019 at 10:04 AM Brian Byrne <bby...@confluent.io> wrote:
> >
> > > Lucas, Guozhang,
> > >
> > > Thank you for the comments. Good point on METADATA_MAX_AGE_CONFIG - it
> > > looks like the ProducerMetadata was differentiating between expiry and
> > > refresh, but it should be unnecessary to do so once the cost of fetching
> > > a single topic's metadata is reduced.
> > >
> > > I've updated the rejected alternatives and removed the config variables.
> > >
> > > Brian
> > >
> > > On Fri, Oct 4, 2019 at 9:20 AM Guozhang Wang <wangg...@gmail.com> wrote:
> > >
> > >> Hello Brian,
> > >>
> > >> Thanks for the KIP.
> > >>
> > >> I think using asynchronous metadata update to address 1) metadata update
> > >> blocking send, but for other issues, currently at producer we do have a
> > >> configurable `METADATA_MAX_AGE_CONFIG` similar to consumer, by default is
> > >> 5min. So maybe we do not need to introduce new configs here, but only
> > >> change the semantics of that config from global expiry (today we just
> > >> enforce a full metadata update for the whole cluster) to single-topic
> > >> expiry, and we can also extend its expiry deadline whenever that metadata
> > >> is successfully used to send a produce request.
> > >>
> > >>
> > >> Guozhang
> > >>
> > >>
> > >>
> > >> On Thu, Oct 3, 2019 at 6:51 PM Lucas Bradstreet <lu...@confluent.io>
> > >> wrote:
> > >>
> > >> > Hi Brian,
> > >> >
> > >> > This looks great, and should help reduce blocking and high metadata
> > >> > request volumes when the producer is sending to large numbers of topics,
> > >> > especially at low volumes. I think the approach to make metadata fetching
> > >> > asynchronous and batch metadata requests together will help significantly.
> > >> >
> > >> > The only other approach I can think of is to allow users to supply the
> > >> > producer with the expected topics upfront, allowing the producer to perform
> > >> > a single initial metadata request before any sends occur. I see no real
> > >> > advantages to this approach compared to the async method you’ve proposed,
> > >> > but maybe we could add it to the rejected alternatives section?
> > >> >
> > >> > Thanks,
> > >> >
> > >> > Lucas
> > >> >
> > >> > On Fri, 20 Sep 2019 at 11:46, Brian Byrne <bby...@confluent.io> wrote:
> > >> >
> > >> > > I've updated the 'Proposed Changes' to include two new producer
> > >> > > configuration variables: topic.expiry.ms and topic.refresh.ms. Please
> > >> > > take a look.
> > >> > >
> > >> > > Thanks,
> > >> > > Brian
> > >> > >
> > >> > > On Tue, Sep 17, 2019 at 12:59 PM Brian Byrne <bby...@confluent.io> wrote:
> > >> > >
> > >> > > > Dev team,
> > >> > > >
> > >> > > > Requesting discussion for improvement to the producer when dealing
> > >> > > > with a large number of topics.
> > >> > > >
> > >> > > > KIP:
> > >> > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-526%3A+Reduce+Producer+Metadata+Lookups+for+Large+Number+of+Topics
> > >> > > >
> > >> > > > JIRA: https://issues.apache.org/jira/browse/KAFKA-8904
> > >> > > >
> > >> > > > Thoughts and feedback would be appreciated.
> > >> > > >
> > >> > > > Thanks,
> > >> > > > Brian
> > >> > > >
> > >> > >
> > >> >
> > >>
> > >>
> > >> --
> > >> -- Guozhang
> > >>
> > >
> >
>


-- 
-- Guozhang
