+1

On Tue, Jan 28, 2020 at 10:52 AM Rajini Sivaram <rajinisiva...@gmail.com>
wrote:

> +1 (binding)
>
> Thanks for the KIP, Brian!
>
> Regards,
>
> Rajini
>
> On Thu, Jan 23, 2020 at 7:34 PM Jason Gustafson <ja...@confluent.io>
> wrote:
>
> > Sounds good. +1 from me.
> >
> > On Thu, Jan 23, 2020 at 9:00 AM Brian Byrne <bby...@confluent.io> wrote:
> >
> > > Thanks Jason,
> > >
> > > I'm in favor of the latter: metadata.max.idle.ms. I agree that describing
> > > it as a "period" is inaccurate. With metadata.max.idle.ms, it also aligns
> > > with metadata.max.age.ms for determining refresh period (which is an actual
> > > period).
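> > >
> > > (A minimal sketch of how the two settings sit side by side in a producer
> > > config; illustrative only, and the values are just the defaults discussed
> > > in this thread.)
> > >
> > > import java.util.Properties;
> > > import org.apache.kafka.clients.producer.KafkaProducer;
> > >
> > > public class ProducerConfigSketch {
> > >     public static void main(String[] args) {
> > >         Properties props = new Properties();
> > >         props.put("bootstrap.servers", "localhost:9092");
> > >         props.put("key.serializer",
> > >             "org.apache.kafka.common.serialization.StringSerializer");
> > >         props.put("value.serializer",
> > >             "org.apache.kafka.common.serialization.StringSerializer");
> > >         // Existing setting: refresh cached metadata at least every 5 minutes.
> > >         props.put("metadata.max.age.ms", "300000");
> > >         // New setting from this KIP: drop topics not produced to for 5 minutes.
> > >         props.put("metadata.max.idle.ms", "300000");
> > >         try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
> > >             // ... send records as usual ...
> > >         }
> > >     }
> > > }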
> > >
> > > I've updated the docs.
> > >
> > > Thanks,
> > > Brian
> > >
> > > On Wed, Jan 22, 2020 at 6:19 PM Jason Gustafson <ja...@confluent.io>
> > > wrote:
> > >
> > > > Thanks for the proposal. Looks good overall. I wanted to suggest a
> > > possible
> > > > name change. I was considering something like `
> > > idle.metadata.expiration.ms
> > > > `
> > > > or maybe `metadata.max.idle.ms`. Thoughts?
> > > >
> > > > -Jason
> > > >
> > > >
> > > > On Tue, Jan 21, 2020 at 11:38 AM Guozhang Wang <wangg...@gmail.com>
> > > wrote:
> > > >
> > > > > Got it.
> > > > >
> > > > > I was proposing that we do the "delayed async batch", but I think your
> > > > > argument about the complexity and about pushing it out of scope is
> > > > > convincing. So instead I propose we keep the synchronous mini-batching,
> > > > > which is obviously already there :)  I'm +1 on the current proposal
> > > > > scope.
> > > > >
> > > > > Guozhang
> > > > >
> > > > > On Tue, Jan 21, 2020 at 10:16 AM Brian Byrne <bby...@confluent.io>
> > > > wrote:
> > > > >
> > > > > > Hi Guozhang,
> > > > > >
> > > > > > Ah, sorry, I misunderstood. Actually, this is solved for us today. How
> > > > > > the producer works is that it maintains at most one inflight metadata
> > > > > > fetch request at any time, where each request is tagged with the current
> > > > > > (monotonically increasing) request version. This version is bumped
> > > > > > whenever a new topic is encountered, and metadata fetching will continue
> > > > > > to process while the latest metadata response's version is below the
> > > > > > current version.
> > > > > >
> > > > > > So if a metadata request is in flight, and a number of threads produce
> > > > > > to new topics, they'll be added to the working set but the next metadata
> > > > > > request won't take place until the outstanding one returns. So their
> > > > > > updates will be batched together. As you suggest, we can have a simple
> > > > > > list that tracks unknown topics to isolate new vs. old topics.
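> > > > > >
> > > > > > (A rough sketch of that request-version idea; illustrative only, it does
> > > > > > not mirror the real Metadata/ProducerMetadata classes and all names here
> > > > > > are invented.)
> > > > > >
> > > > > > import java.util.HashSet;
> > > > > > import java.util.Set;
> > > > > >
> > > > > > class MetadataState {
> > > > > >     private int requestVersion = 0;   // bumped whenever a new topic is seen
> > > > > >     private int responseVersion = 0;  // version the latest response covered
> > > > > >     private final Set<String> unknownTopics = new HashSet<>();
> > > > > >
> > > > > >     synchronized void addNewTopic(String topic) {
> > > > > >         unknownTopics.add(topic);
> > > > > >         requestVersion++;
> > > > > >     }
> > > > > >
> > > > > >     // At most one request is in flight; this is only consulted once
> > > > > >     // that slot is free again.
> > > > > >     synchronized boolean needsAnotherFetch() {
> > > > > >         return responseVersion < requestVersion;
> > > > > >     }
> > > > > >
> > > > > >     // Everything that accumulated while the previous request was in
> > > > > >     // flight gets batched into the next one.
> > > > > >     synchronized Set<String> drainTopicsForNextRequest() {
> > > > > >         Set<String> batch = new HashSet<>(unknownTopics);
> > > > > >         unknownTopics.clear();
> > > > > >         return batch;
> > > > > >     }
> > > > > >
> > > > > >     synchronized void handleResponse(int versionAtSendTime) {
> > > > > >         responseVersion = Math.max(responseVersion, versionAtSendTime);
> > > > > >     }
> > > > > > }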
> > > > > >
> > > > > > Thanks,
> > > > > > Brian
> > > > > >
> > > > > >
> > > > > >
> > > > > > On Tue, Jan 21, 2020 at 10:04 AM Guozhang Wang <
> wangg...@gmail.com
> > >
> > > > > wrote:
> > > > > >
> > > > > > > Hi Brian,
> > > > > > >
> > > > > > > I think I buy the complexity and extra end-to-end-latency argument :)
> > > > > > > I'm fine with delaying the asynchronous topic fetching to future work
> > > > > > > and keeping the current KIP's scope as-is for now. In that case, can we
> > > > > > > consider just a minor implementation detail (since it is not affecting
> > > > > > > public APIs we probably do not even need to list it, but just thinking
> > > > > > > out loud here):
> > > > > > >
> > > > > > > In your proposal, when we request metadata for an unknown topic, we are
> > > > > > > going to directly set the topic name as that singleton in the request.
> > > > > > > I'm wondering, for the scenario that KAFKA-8904 described, whether the
> > > > > > > producer#send calls for thousands of new topics are triggered
> > > > > > > sequentially by a single thread or by concurrent threads? If it's the
> > > > > > > latter, and we expect that in such scenarios we may have multiple topics
> > > > > > > being requested within a very short time, then we can probably do sth.
> > > > > > > like this internally in a synchronized manner:
> > > > > > >
> > > > > > > 1) put the topic name into a list, as "unknown topics", then
> > > > > > > 2) exhaust the list, and put all topics from that list into the request;
> > > > > > > if the list is empty, it means it has been emptied by another thread, so
> > > > > > > we skip sending a new request and just wait for the returned metadata
> > > > > > > refresh.
> > > > > > >
> > > > > > > In most cases the list would just be a singleton with the one that
> > > > > > > thread has just enqueued, but under extreme scenarios it can probably
> > > > > > > help batch a few topic names (of course, I'm thinking about very extreme
> > > > > > > cases here, assuming that's what we've seen in 8904). Since these two
> > > > > > > steps are very lightweight, doing that in a synchronized block would not
> > > > > > > hurt the concurrency too much.
> > > > > > >
> > > > > > >
> > > > > > > Guozhang
> > > > > > >
> > > > > > >
> > > > > > > On Tue, Jan 21, 2020 at 9:39 AM Brian Byrne <
> bby...@confluent.io
> > >
> > > > > wrote:
> > > > > > >
> > > > > > > > Hi Guozhang,
> > > > > > > >
> > > > > > > > Your understanding of the rationale is accurate, and what you
> > > > suggest
> > > > > > is
> > > > > > > > completely plausible, however I have a slightly different
> take
> > on
> > > > the
> > > > > > > > situation.
> > > > > > > >
> > > > > > > > When the KIP was originally drafted, making
> KafkaProducer#send
> > > > > > > asynchronous
> > > > > > > > was one element of the changes (this is a little more general
> > > than
> > > > > (a),
> > > > > > > but
> > > > > > > > has similar implications). As you're aware, doing so would
> > allow
> > > > new
> > > > > > > topics
> > > > > > > > to aggregate since the producer could continue to push new
> > > records,
> > > > > > > whereas
> > > > > > > > today the producer thread is blocked waiting for resolution.
> > > > > > > >
> > > > > > > > However, there were concerns about changing client behavior
> > > > > > unexpectedly
> > > > > > > in
> > > > > > > > this manner, and the change isn't as trivial as one would
> hope.
> > > For
> > > > > > > > example, we'd have to introduce an intermediate queue of
> > records
> > > > for
> > > > > > > topics
> > > > > > > > without metadata, and have that play well with the buffer
> pool
> > > > which
> > > > > > > > ensures the memory limit isn't exceeded. A side effect is
> that
> > a
> > > > > > producer
> > > > > > > > could hit 'memory full' conditions more easily, which could have
> > > > > unintended
> > > > > > > > consequences if, say, the model was set up such that different
> > > > > producer
> > > > > > > > threads produced to a disjoint set of topics. Where one
> > producer
> > > > > thread
> > > > > > > was
> > > > > > > > blocked waiting for new metadata, it could now push enough
> data
> > > to
> > > > > > block
> > > > > > > > all producer threads due to memory limits, so we'd need to be
> > > > careful
> > > > > > > here.
> > > > > > > >
> > > > > > > > For case (a) described, another concern would be adding a new source
> > > > > > > > of latency (possibly seconds) for new topics. Not a huge issue, but it
> > > > > > > > is new behavior to existing clients and adds to the complexity of
> > > > > > > > verifying no major regressions.
> > > > > > > >
> > > > > > > > It also wouldn't resolve all cases we're interested in. One
> > > > behavior
> > > > > > > we're
> > > > > > > > witnessing is the following: a producer generates to a very
> > large
> > > > > > number
> > > > > > > of
> > > > > > > > topics (several thousand), however the period of consecutive
> > > > records
> > > > > > for
> > > > > > > a
> > > > > > > > topic can often be beyond the current hard-coded expiry of 5
> > > > minutes.
> > > > > > > > Therefore, when the producer does submit a request for this
> > topic
> > > > > > after 5
> > > > > > > > minutes, it has to request all of the topic metadata again.
> > While
> > > > > > > batching
> > > > > > > > new topics in the start-up case would definitely help, here
> it
> > > > would
> > > > > > > likely
> > > > > > > > be lost effort.
> > > > > > > >
> > > > > > > > Being able to increase the metadata eviction for the above
> case
> > > > would
> > > > > > > > improve things, but there's no perfect value when consecutive
> > > > produce
> > > > > > > times
> > > > > > > > can be modelled as a probability distribution. Rather, the
> > better
> > > > > > > solution
> > > > > > > > could be what we discussed earlier, where we'd lower the
> > eviction
> > > > > > > timeout,
> > > > > > > > but make the cost of fetching uncached topic metadata much
> > > smaller.
> > > > > > > >
> > > > > > > > The changes in the KIP were meant to improve the general case
> > > > without
> > > > > > > > affecting external client behavior, and then the plan was to
> > fix
> > > > the
> > > > > > > > asynchronous send in the next iteration, if necessary. Point
> > (b)
> > > is
> > > > > > along
> > > > > > > > the lines of the latest revision: only send requests for
> > uncached
> > > > > > topics,
> > > > > > > > if they exist, otherwise request the full working set.
> > > > Piggy-backing
> > > > > > was
> > > > > > > > originally included, but removed because its utility was in
> > > doubt.
> > > > > > > >
> > > > > > > > So to summarize, you're correct in that asynchronous topic
> > > fetching
> > > > > > would
> > > > > > > > be a big improvement, and should be an item of future work.
> > > However
> > > > > > what
> > > > > > > > this KIP proposes should be the safest/easiest set of changes
> > to
> > > > > > resolve
> > > > > > > > the current pain points. Please let me know if you
> > agree/disagree
> > > > > with
> > > > > > > this
> > > > > > > > assessment.
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > > Brian
> > > > > > > >
> > > > > > > >
> > > > > > > > On Mon, Jan 20, 2020 at 10:52 AM Guozhang Wang <
> > > wangg...@gmail.com
> > > > >
> > > > > > > wrote:
> > > > > > > >
> > > > > > > > > Hello Brian,
> > > > > > > > >
> > > > > > > > > I looked at the new proposal again, and I'd like to reason
> > > about
> > > > > its
> > > > > > > > > rationale from the listed motivations in your wiki:
> > > > > > > > >
> > > > > > > > > 1) more RPCs: we may send metadata requests more frequently than
> > > > > > > > > appropriate. This is especially the case during producer start-up,
> > > > > > > > > where the more topics it needs to send to, the more metadata requests
> > > > > > > > > it needs to send. This is the original reported issue in KAFKA-8904.
> > > > > > > > >
> > > > > > > > > 2) large RPCs: we include all topics in the work set when sending a
> > > > > > > > > metadata request. But I think our conjecture (as Colin has pointed
> > > > > > > > > out) is that this alone is fine most of the time, assuming e.g. you
> > > > > > > > > are sending such a large RPC only once every 10 minutes. It only
> > > > > > > > > becomes a common issue because of 1), where you are sending the large
> > > > > > > > > RPC too frequently.
> > > > > > > > >
> > > > > > > > > 3) we want to have a configurable eviction period rather than
> > > > > > > > > hard-coded values. I consider it a semi-orthogonal motivation
> > > > > > > > > compared with 1) / 2), but we wanted to piggy-back this fix along
> > > > > > > > > with the KIP.
> > > > > > > > >
> > > > > > > > > So from there, 1) and 2) do not contradict each other, since our
> > > > > > > > > belief is that large RPCs are usually okay as long as they are not
> > > > > > > > > large-and-frequent RPCs, and we actually prefer large-infrequent
> > > > > > > > > RPCs > smaller-frequent RPCs > large-and-frequent RPCs (of course).
> > > > > > > > >
> > > > > > > > > The current proposal basically tries to un-tangle 2) from 1), i.e.
> > > > > > > > > for the scenario of KAFKA-8904 it would result in smaller-frequent
> > > > > > > > > RPCs during startup rather than large-and-frequent RPCs. But I'm
> > > > > > > > > wondering why we don't just do even better and make it
> > > > > > > > > large-infrequent RPCs? More specifically, just like Lucas suggested
> > > > > > > > > in the ticket:
> > > > > > > > >
> > > > > > > > > a. when there's a new topic with unknown metadata enqueued, instead
> > > > > > > > > of requesting metadata immediately, just delay it for a short period
> > > > > > > > > (no more than seconds), hoping that more unknown topics would be
> > > > > > > > > requested in the period; during this period we would not know which
> > > > > > > > > partition it would go to, of course, so we buffer it in a different
> > > > > > > > > manner.
> > > > > > > > >
> > > > > > > > > b. when we are about to send a metadata request, if there are
> > > > > > > > > unknown topic(s) -- consider them "urgent topics" -- just send them
> > > > > > > > > without other topics; otherwise, send the work set in the request.
> > > > > > > > > If we want to go even fancier, we can still piggy-back some
> > > > > > > > > non-urgent topics along with urgent ones, but it is more complicated
> > > > > > > > > to reason about the trade-off, so a simpler approach is fine too.
> > > > > > > > >
> > > > > > > > > c. fixing 3) with a new config, which is relatively orthogonal to
> > > > > > > > > a) and b).
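> > > > > > > > >
> > > > > > > > > (A tiny sketch, purely to illustrate the selection rule in (b); the
> > > > > > > > > class and method names are invented and this is not the producer's
> > > > > > > > > actual code.)
> > > > > > > > >
> > > > > > > > > import java.util.HashSet;
> > > > > > > > > import java.util.Set;
> > > > > > > > >
> > > > > > > > > final class MetadataRequestPlanner {
> > > > > > > > >     // Urgent (unknown) topics are requested alone; otherwise fall
> > > > > > > > >     // back to refreshing the full work set.
> > > > > > > > >     static Set<String> topicsForNextRequest(Set<String> unknownTopics,
> > > > > > > > >                                             Set<String> workSet) {
> > > > > > > > >         if (!unknownTopics.isEmpty()) {
> > > > > > > > >             return new HashSet<>(unknownTopics);
> > > > > > > > >         }
> > > > > > > > >         return new HashSet<>(workSet);
> > > > > > > > >     }
> > > > > > > > > }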
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > Guozhang
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > On Tue, Jan 14, 2020 at 10:39 AM Brian Byrne <
> > > > bby...@confluent.io>
> > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > Hello all,
> > > > > > > > > >
> > > > > > > > > > After further offline discussion, I've removed any
> efforts
> > to
> > > > > > control
> > > > > > > > > > metadata RPC sizes. There are now only two items proposed
> > in
> > > > this
> > > > > > > KIP:
> > > > > > > > > >
> > > > > > > > > > (1) When encountering a new topic, only issue a metadata
> > > > request
> > > > > > for
> > > > > > > > that
> > > > > > > > > > particular topic. For all other cases, continue as it
> does
> > > > today
> > > > > > > with a
> > > > > > > > > > full working set refresh.
> > > > > > > > > >
> > > > > > > > > > (2) Introduces client configuration flag "
> > > > > > > metadata.eviction.period.ms"
> > > > > > > > > to
> > > > > > > > > > control cache eviction duration. I've reset the default
> > back
> > > to
> > > > > the
> > > > > > > > > current
> > > > > > > > > > (hard-coded) value of 5 minutes since we can identify
> cases
> > > > where
> > > > > > > > > changing
> > > > > > > > > > it would cause surprises.
> > > > > > > > > >
> > > > > > > > > > The votes have been cleared. My apologies for continually
> > > > > > > interrupting
> > > > > > > > > and
> > > > > > > > > > making changes to the KIP, but hopefully this is an
> > agreeable
> > > > > > minimum
> > > > > > > > > > solution to move forward.
> > > > > > > > > >
> > > > > > > > > > Thanks,
> > > > > > > > > > Brian
> > > > > > > > > >
> > > > > > > > > > On Mon, Jan 6, 2020 at 5:23 PM Colin McCabe <
> > > > cmcc...@apache.org>
> > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > > On Mon, Jan 6, 2020, at 14:40, Brian Byrne wrote:
> > > > > > > > > > > > So the performance of a metadata RPC that occurs once every 10
> > > > > > > > > > > > seconds should not be measured strictly in CPU cost, but rather
> > > > > > > > > > > > the effect on the 95-99%. The larger the request is, the more
> > > > > > > > > > > > opportunity there is to put a burst stress on the producer and
> > > > > > > > > > > > broker, and the larger the response payload to push through the
> > > > > > > > > > > > control plane socket. Maybe that's not at 5k topics, but there
> > > > > > > > > > > > are groups that are at 10k+ topics and pushing further.
> > > > > > > > > > > >
> > > > > > > > > > > KAFKA-7019 made reading the metadata lock-free.  There is no a
> > > > > > > > > > > priori reason to prefer lots of small requests to a few big
> > > > > > > > > > > requests (within reason!)  In fact, it's quite the opposite: when
> > > > > > > > > > > we make lots of small requests, it uses more network bandwidth
> > > > > > > > > > > than when we make a few big ones.  There are a few reasons for
> > > > > > > > > > > this: the request and response headers have a fixed overhead, one
> > > > > > > > > > > big array takes less space when serialized than several small
> > > > > > > > > > > ones, etc.  There is also TCP and IP overhead, etc.
> > > > > > > > > > >
> > > > > > > > > > > The broker can only push a few tens of thousands of metadata
> > > > > > > > > > > requests a second, due to the overhead of message processing.
> > > > > > > > > > > This is why almost all of the admin commands support batching.
> > > > > > > > > > > So if you need to create 1,000 topics, you make one request, not
> > > > > > > > > > > 1,000 requests, for example.
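> > > > > > > > > > >
> > > > > > > > > > > (A sketch of what that batching looks like through the admin
> > > > > > > > > > > client; the topic names and settings here are made up.)
> > > > > > > > > > >
> > > > > > > > > > > import java.util.ArrayList;
> > > > > > > > > > > import java.util.List;
> > > > > > > > > > > import java.util.Properties;
> > > > > > > > > > > import org.apache.kafka.clients.admin.AdminClient;
> > > > > > > > > > > import org.apache.kafka.clients.admin.NewTopic;
> > > > > > > > > > >
> > > > > > > > > > > public class BatchedTopicCreation {
> > > > > > > > > > >     public static void main(String[] args) throws Exception {
> > > > > > > > > > >         Properties props = new Properties();
> > > > > > > > > > >         props.put("bootstrap.servers", "localhost:9092");
> > > > > > > > > > >         try (AdminClient admin = AdminClient.create(props)) {
> > > > > > > > > > >             List<NewTopic> topics = new ArrayList<>();
> > > > > > > > > > >             for (int i = 0; i < 1000; i++) {
> > > > > > > > > > >                 topics.add(new NewTopic("topic-" + i, 1, (short) 1));
> > > > > > > > > > >             }
> > > > > > > > > > >             // All 1,000 topics go out in a single CreateTopics request.
> > > > > > > > > > >             admin.createTopics(topics).all().get();
> > > > > > > > > > >         }
> > > > > > > > > > >     }
> > > > > > > > > > > }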
> > > > > > > > > > >
> > > > > > > > > > > It's definitely reasonable to limit the number of
> topics
> > > made
> > > > > per
> > > > > > > > > > metadata
> > > > > > > > > > > request.  But the reason for this is not improving
> > > > performance,
> > > > > > but
> > > > > > > > > > > preventing certain bad corner cases that happen when
> RPCs
> > > get
> > > > > too
> > > > > > > > big.
> > > > > > > > > > For
> > > > > > > > > > > example, one problem that can happen when a metadata
> > > response
> > > > > > gets
> > > > > > > > too
> > > > > > > > > > big
> > > > > > > > > > > is that the client could time out before it finishes
> > > reading
> > > > > the
> > > > > > > > > > response.
> > > > > > > > > > > Or if the response got way too big, it could even
> exceed
> > > the
> > > > > > > maximum
> > > > > > > > > > > response size.
> > > > > > > > > > >
> > > > > > > > > > > So I think the limit should be pretty high here.  We
> > might
> > > > also
> > > > > > > > > consider
> > > > > > > > > > > putting the limit in terms of number of partitions
> rather
> > > > than
> > > > > > > number
> > > > > > > > > of
> > > > > > > > > > > topics, since that's what really matters here (this is
> > > harder
> > > > > to
> > > > > > > > > > implement,
> > > > > > > > > > > I realize...)  If I had to put a rough number on it,
> I'd
> > > say
> > > > we
> > > > > > > don't
> > > > > > > > > > want
> > > > > > > > > > > more than like 50 MB of response data.  This is vaguely
> > in
> > > > line
> > > > > > > with
> > > > > > > > > how
> > > > > > > > > > we
> > > > > > > > > > > do fetch responses as well (although I think the limit
> > > there
> > > > is
> > > > > > > > > higher).
> > > > > > > > > > >
> > > > > > > > > > > We should also keep in mind that anyone with a wildcard
> > > > > > > subscription
> > > > > > > > is
> > > > > > > > > > > making full metadata requests, which will return back
> > > > > information
> > > > > > > > about
> > > > > > > > > > > every topic in the system.
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > There's definitely weight to the metadata RPCs. Looking at a
> > > > > > > > > > > > previous local, non-loaded test I ran, I calculate about 2
> > > > > > > > > > > > microseconds per partition latency to the producer. At 10,000
> > > > > > > > > > > > topics with 100 partitions each, that's a full 2-second bubble
> > > > > > > > > > > > in the best case. I can rerun a more targeted performance test,
> > > > > > > > > > > > but I feel that's missing the point.
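> > > > > > > > > > > >
> > > > > > > > > > > > (For the record, the arithmetic behind that estimate: 10,000
> > > > > > > > > > > > topics x 100 partitions x 2 us per partition = 2,000,000 us =
> > > > > > > > > > > > 2 seconds.)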
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > If the metadata is fetched in the background, there
> > should
> > > be
> > > > > no
> > > > > > > > impact
> > > > > > > > > > on
> > > > > > > > > > > producer latency, right?
> > > > > > > > > > >
> > > > > > > > > > > It would be good to talk more about the importance of
> > > > > background
> > > > > > > > > metadata
> > > > > > > > > > > fetching in the KIP.  The fact that we don't do this is
> > > > > actually
> > > > > > a
> > > > > > > > big
> > > > > > > > > > > problem with the current implementation.  As I
> understand
> > > it,
> > > > > > when
> > > > > > > > the
> > > > > > > > > > > metadata gets too old, we slam on the brakes and wait
> > for a
> > > > > > > metadata
> > > > > > > > > > fetch
> > > > > > > > > > > to complete, rather than starting the metadata fetch
> > BEFORE
> > > > we
> > > > > > need
> > > > > > > > it.
> > > > > > > > > > > It's just bad scheduling.
> > > > > > > > > > >
> > > > > > > > > > > best,
> > > > > > > > > > > Colin
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > Brian
> > > > > > > > > > > >
> > > > > > > > > > > > On Mon, Jan 6, 2020 at 1:31 PM Colin McCabe <
> > > > > > cmcc...@apache.org>
> > > > > > > > > > wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > On Mon, Jan 6, 2020, at 13:07, Brian Byrne wrote:
> > > > > > > > > > > > > > Hi Colin,
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Thanks again for the feedback!
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On Mon, Jan 6, 2020 at 12:07 PM Colin McCabe <
> > > > > > > > cmcc...@apache.org
> > > > > > > > > >
> > > > > > > > > > > wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Metadata requests don't (always) go to the
> > > > controller,
> > > > > > > right?
> > > > > > > > > We
> > > > > > > > > > > > > should
> > > > > > > > > > > > > > > fix the wording in this section.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > You're correct, s/controller/broker(s)/.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > I feel like "Proposed Changes" should come before
> > > > "Public
> > > > > > > > > > Interfaces"
> > > > > > > > > > > > > > > here.  The new configuration won't make sense
> to
> > > the
> > > > > > reader
> > > > > > > > > until
> > > > > > > > > > > he
> > > > > > > > > > > > > or she
> > > > > > > > > > > > > > > has read the "changes" section.  Also, it's not
> > > clear
> > > > > > from
> > > > > > > > the
> > > > > > > > > > name
> > > > > > > > > > > > > that
> > > > > > > > > > > > > > > "metadata evict" refers to a span of time.
> What
> > do
> > > > you
> > > > > > > think
> > > > > > > > > > > about "
> > > > > > > > > > > > > > > metadata.eviction.period.ms" as a
> configuration
> > > > name?
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Sure, makes sense. Updated order and config name.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Where is the "10 seconds" coming from here?
> The
> > > > > current
> > > > > > > > > default
> > > > > > > > > > > for
> > > > > > > > > > > > > > > metadata.max.age.ms is 5 * 60 * 1000 ms, which
> > > > implies
> > > > > > > that
> > > > > > > > we
> > > > > > > > > > > want to
> > > > > > > > > > > > > > > refresh every 5 minutes.  Definitely not every
> 10
> > > > > > seconds.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > The 10 seconds is another arbitrary value to prevent a large
> > > > > > > > > > > > > > number of RPCs if the target batch size were fixed at 20. For
> > > > > > > > > > > > > > example, if there are 5,000 topics with a 5-minute interval,
> > > > > > > > > > > > > > then instead of issuing an RPC every 1.2 seconds with a batch
> > > > > > > > > > > > > > size of 20, it would issue an RPC every 10 seconds with a
> > > > > > > > > > > > > > batch size of 167.
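> > > > > > > > > > > > > >
> > > > > > > > > > > > > > (Working that out: 5,000 topics / 20 per batch = 250 RPCs per
> > > > > > > > > > > > > > 300-second refresh interval, i.e. one every 1.2 seconds;
> > > > > > > > > > > > > > spacing them 10 seconds apart instead gives 300 / 10 = 30
> > > > > > > > > > > > > > RPCs of roughly 5,000 / 30 = 167 topics each.)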
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > Hmm.  This will lead to many more RPCs compared to
> > the
> > > > > > current
> > > > > > > > > > > situation
> > > > > > > > > > > > > of issuing an RPC every 5 minutes with 5,000
> topics,
> > > > right?
> > > > > > > See
> > > > > > > > > > below
> > > > > > > > > > > for
> > > > > > > > > > > > > more discussion.
> > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Stepping back a little bit, it seems like the
> big
> > > > > problem
> > > > > > > you
> > > > > > > > > > > > > identified
> > > > > > > > > > > > > > > is the O(N^2) behavior of producing to X, then
> Y,
> > > > then
> > > > > Z,
> > > > > > > > etc.
> > > > > > > > > > etc.
> > > > > > > > > > > > > where
> > > > > > > > > > > > > > > each new produce to a fresh topic triggers a
> > > metadata
> > > > > > > request
> > > > > > > > > > with
> > > > > > > > > > > all
> > > > > > > > > > > > > the
> > > > > > > > > > > > > > > preceding topics included.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Of course we need to send out a metadata
> request
> > > > before
> > > > > > > > > producing
> > > > > > > > > > > to X,
> > > > > > > > > > > > > > > then Y, then Z, but that request could just
> > specify
> > > > X,
> > > > > or
> > > > > > > > just
> > > > > > > > > > > specify
> > > > > > > > > > > > > Y,
> > > > > > > > > > > > > > > etc. etc.  In other words, we could decouple the routine
> > > > > > > > > > > > > > > metadata fetch which happens on a 5 minute timer from the
> > > > > > > > > > > > > > > need to fetch metadata for something specific right now.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > I guess my question is, do we really need to
> > allow
> > > > > > routine
> > > > > > > > > > metadata
> > > > > > > > > > > > > > > fetches to "piggyback" on the emergency
> metadata
> > > > > fetches?
> > > > > > > It
> > > > > > > > > > adds
> > > > > > > > > > > a
> > > > > > > > > > > > > lot of
> > > > > > > > > > > > > > > complexity, and we don't have any benchmarks
> > > proving
> > > > > that
> > > > > > > > it's
> > > > > > > > > > > better.
> > > > > > > > > > > > > > > Also, as I understand it, whether we piggyback
> or
> > > > not,
> > > > > > the
> > > > > > > > > number
> > > > > > > > > > > of
> > > > > > > > > > > > > > > metadata fetches is the same, right?
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > So it's possible to do as you suggest, but I would argue that
> > > > > > > > > > > > > > it'd be more complex given how the code is structured, and
> > > > > > > > > > > > > > that keeping the piggybacking doesn't add any extra
> > > > > > > > > > > > > > complexity. The derived metadata classes effectively respond
> > > > > > > > > > > > > > to a notification that a metadata RPC is going to be issued,
> > > > > > > > > > > > > > where they return the metadata request structure with topics
> > > > > > > > > > > > > > to refresh, which is decoupled from what generated the event
> > > > > > > > > > > > > > (new topic, stale metadata, periodic refresh alarm). There is
> > > > > > > > > > > > > > also a strict implementation detail that only one metadata
> > > > > > > > > > > > > > request can be outstanding at any time, which lends itself to
> > > > > > > > > > > > > > consolidating complexity in the base metadata class and having
> > > > > > > > > > > > > > the derived classes use the "listen for next update" model.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > By maintaining an ordered list of topics by their last
> > > > > > > > > > > > > > metadata refresh time (0 for new topics), it's a matter of
> > > > > > > > > > > > > > pulling from the front of the list to see which topics should
> > > > > > > > > > > > > > be included in the next update. Always include all urgent
> > > > > > > > > > > > > > topics, then include non-urgent ones (i.e. those that need to
> > > > > > > > > > > > > > be refreshed soon-ish) up to the target batch size.
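> > > > > > > > > > > > > >
> > > > > > > > > > > > > > (A rough sketch of that selection logic, just to make the idea
> > > > > > > > > > > > > > concrete; the names and the batch-size constant are invented,
> > > > > > > > > > > > > > this is not the actual producer code.)
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > import java.util.ArrayList;
> > > > > > > > > > > > > > import java.util.Comparator;
> > > > > > > > > > > > > > import java.util.List;
> > > > > > > > > > > > > > import java.util.Map;
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > final class RefreshSelector {
> > > > > > > > > > > > > >     static final int TARGET_BATCH_SIZE = 20;
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >     // lastRefreshMs: topic -> last refresh time (0 = new/urgent)
> > > > > > > > > > > > > >     static List<String> nextBatch(Map<String, Long> lastRefreshMs) {
> > > > > > > > > > > > > >         List<String> ordered = new ArrayList<>(lastRefreshMs.keySet());
> > > > > > > > > > > > > >         ordered.sort(Comparator.comparing(lastRefreshMs::get));
> > > > > > > > > > > > > >         List<String> batch = new ArrayList<>();
> > > > > > > > > > > > > >         for (String topic : ordered) {
> > > > > > > > > > > > > >             boolean urgent = lastRefreshMs.get(topic) == 0L;
> > > > > > > > > > > > > >             if (urgent || batch.size() < TARGET_BATCH_SIZE) {
> > > > > > > > > > > > > >                 batch.add(topic);  // urgent always; others up to the target
> > > > > > > > > > > > > >             }
> > > > > > > > > > > > > >         }
> > > > > > > > > > > > > >         return batch;
> > > > > > > > > > > > > >     }
> > > > > > > > > > > > > > }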
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > The number of metadata fetches could be reduced.
> > > > > Assuming a
> > > > > > > > > target
> > > > > > > > > > > batch
> > > > > > > > > > > > > > size of 20, a new topic might also refresh 19
> other
> > > > > > "refresh
> > > > > > > > > soon"
> > > > > > > > > > > topics
> > > > > > > > > > > > > > in the same RPC, as opposed to those 19 being
> > handled
> > > > in
> > > > > a
> > > > > > > > > > subsequent
> > > > > > > > > > > > > RPC.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Although to counter the above, the batching/piggybacking logic
> > > > > > > > > > > > > > isn't necessarily about reducing the total number of RPCs, but
> > > > > > > > > > > > > > rather to distribute the load more evenly over time. For
> > > > > > > > > > > > > > example, if a large number of topics need to be refreshed at
> > > > > > > > > > > > > > approximately the same time (common for startup cases that hit
> > > > > > > > > > > > > > a large number of topics), the updates are more evenly
> > > > > > > > > > > > > > distributed to avoid a flood.
> > > > > > > > > > > > >
> > > > > > > > > > > > > It wouldn't be a flood in the current case, right?
> > It
> > > > > would
> > > > > > > just
> > > > > > > > > be
> > > > > > > > > > a
> > > > > > > > > > > > > single metadata request for a lot of topics.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Let's compare the two cases.  In the current
> > scenario,
> > > we
> > > > > > have
> > > > > > > 1
> > > > > > > > > > > metadata
> > > > > > > > > > > > > request every 5 minutes.  This request is for 5,000
> > > > topics
> > > > > > > (let's
> > > > > > > > > > > say).  In
> > > > > > > > > > > > > the new scenario, we have a request every 10
> seconds
> > > for
> > > > > 167
> > > > > > > > topics
> > > > > > > > > > > each.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Which do you think will be more expensive?  I think the second
> > > > > > > > > > > > > scenario certainly will, because of the overhead of 30x as many
> > > > > > > > > > > > > requests sent over the wire.  Metadata accesses are now
> > > > > > > > > > > > > lockless, so the big metadata request just isn't that much of a
> > > > > > > > > > > > > problem.  I bet if you benchmark it, sending back metadata for
> > > > > > > > > > > > > 167 topics won't be that much cheaper than sending back metadata
> > > > > > > > > > > > > for 5k.  Certainly not 30x cheaper.  There will eventually be a
> > > > > > > > > > > > > point where we need to split metadata requests, but it's
> > > > > > > > > > > > > definitely not at 5,000 topics.
> > > > > > > > > > > > >
> > > > > > > > > > > > > regards,
> > > > > > > > > > > > > Colin
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Brian
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > On Mon, Jan 6, 2020, at 10:26, Lucas Bradstreet
> > > > wrote:
> > > > > > > > > > > > > > > > +1 (non-binding)
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > On Thu, Jan 2, 2020 at 11:15 AM Brian Byrne <
> > > > > > > > > > bby...@confluent.io
> > > > > > > > > > > >
> > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Hello all,
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > After further discussion and improvements, I'd like to reinstate the
> > > > > > > > > > > > > > > > > voting process.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > The updated KIP:
> > > > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-526%3A+Reduce+Producer+Metadata+Lookups+for+Large+Number+of+Topics
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > The continued discussion:
> > > > > > > > > > > > > > > > > https://lists.apache.org/thread.html/b2f8f830ef04587144cf0840c7d4811bbf0a14f3c459723dbc5acf9e@%3Cdev.kafka.apache.org%3E
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > I'd be happy to address any further comments/feedback.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > > > > > Brian
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > On Mon, Dec 9, 2019 at 11:02 PM Guozhang Wang <wangg...@gmail.com> wrote:
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > With the concluded summary on the other discussion thread, I'm +1 on the
> > > > > > > > > > > > > > > > > > proposal.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Thanks Brian!
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > On Tue, Nov 19, 2019 at 8:00 PM deng ziming <dengziming1...@gmail.com> wrote:
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > For new (uncached) topics, one problem here is that we don't know which
> > > > > > > > > > > > > > > > > > > > partition to map a record to in the event that it has a key or custom
> > > > > > > > > > > > > > > > > > > > partitioner, so the RecordAccumulator wouldn't know which batch/broker it
> > > > > > > > > > > > > > > > > > > > belongs to. We'd need an intermediate record queue that subsequently
> > > > > > > > > > > > > > > > > > > > moved the records into RecordAccumulators once metadata resolution was
> > > > > > > > > > > > > > > > > > > > complete. For known topics, we don't currently block at all in
> > > > > > > > > > > > > > > > > > > > waitOnMetadata.
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > You are right, I forgot this fact, and the intermediate record queue will
> > > > > > > > > > > > > > > > > > > help, but I have some questions.
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > If we add an intermediate record queue in KafkaProducer, when should we
> > > > > > > > > > > > > > > > > > > move the records into RecordAccumulators? Only NetworkClient is aware of
> > > > > > > > > > > > > > > > > > > the MetadataResponse; here is the hierarchical structure of the related
> > > > > > > > > > > > > > > > > > > classes:
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > KafkaProducer
> > > > > > > > > > > > > > > > > > >     Accumulator
> > > > > > > > > > > > > > > > > > >     Sender
> > > > > > > > > > > > > > > > > > >         NetworkClient
> > > > > > > > > > > > > > > > > > >             metadataUpdater.handleCompletedMetadataResponse
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > So:
> > > > > > > > > > > > > > > > > > > 1. should we also add a metadataUpdater to KafkaProducer?
> > > > > > > > > > > > > > > > > > > 2. what if the topic really does not exist? will the intermediate record
> > > > > > > > > > > > > > > > > > > queue become too large?
> > > > > > > > > > > > > > > > > > > 3. and should we `block` when the intermediate record queue is too large?
> > > > > > > > > > > > > > > > > > > that would again bring the blocking problem.
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > On Wed, Nov 20, 2019 at 12:40 AM Brian Byrne <bby...@confluent.io> wrote:
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > Hi Deng,
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > Thanks for the feedback.
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > On Mon, Nov 18, 2019 at 6:56 PM deng ziming <dengziming1...@gmail.com>
> > > > > > > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > hi, I reviewed the current code. The ProducerMetadata maintains an
> > > > > > > > > > > > > > > > > > > > > expiry threshold for every topic, and every time we write to a topic we
> > > > > > > > > > > > > > > > > > > > > set the expiry time to -1 to indicate it should be updated. This does
> > > > > > > > > > > > > > > > > > > > > work to reduce the size of the topic working set, but the producer will
> > > > > > > > > > > > > > > > > > > > > continue fetching metadata for these topics in every metadata request
> > > > > > > > > > > > > > > > > > > > > for the full expiry duration.
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > Indeed, you are correct, I terribly misread the code here. Fortunately
> > > > > > > > > > > > > > > > > > > > this was only a minor optimization in the KIP that's no longer necessary.
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > and we can improve the situation by 2 means:
> > > > > > > > > > > > > > > > > > > > >     1. we maintain a refresh threshold for every topic which is, for
> > > > > > > > > > > > > > > > > > > > > example, 0.8 * expiry_threshold, and when we send `MetadataRequest` to
> > > > > > > > > > > > > > > > > > > > > brokers we just request unknownLeaderTopics + unknownPartitionTopics +
> > > > > > > > > > > > > > > > > > > > > topics that reach the refresh threshold.
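> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > (For concreteness: with the current hard-coded expiry of 5 minutes, a
> > > > > > > > > > > > > > > > > > > > > 0.8 refresh threshold would mean any topic idle for more than 4 minutes
> > > > > > > > > > > > > > > > > > > > > gets included in the next MetadataRequest.)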
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > Right, this is similar to what I suggested, with a larger window on the
> > > > > > > > > > > > > > > > > > > > "staleness" that permits batching to an appropriate size (except that if
> > > > > > > > > > > > > > > > > > > > there are any unknown topics, you'd want to issue the request
> > > > > > > > > > > > > > > > > > > > immediately).
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > >     2. we don't invoke KafkaProducer#waitOnMetadata when we call
> > > > > > > > > > > > > > > > > > > > > KafkaProducer#send, because we just send data to the RecordAccumulator,
> > > > > > > > > > > > > > > > > > > > > and before we send data to brokers we will invoke
> > > > > > > > > > > > > > > > > > > > > RecordAccumulator#ready(), so we only need to invoke waitOnMetadata to
> > > > > > > > > > > > > > > > > > > > > block when (number of topics that reach the refresh threshold) >
> > > > > > > > > > > > > > > > > > > > > (number of all known topics) * 0.2.
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > For new (uncached) topics, one problem here is that we don't know which
> > > > > > > > > > > > > > > > > > > > partition to map a record to in the event that it has a key or custom
> > > > > > > > > > > > > > > > > > > > partitioner, so the RecordAccumulator wouldn't know which batch/broker it
> > > > > > > > > > > > > > > > > > > > belongs to. We'd need an intermediate record queue that subsequently
> > > > > > > > > > > > > > > > > > > > moved the records into RecordAccumulators once metadata resolution was
> > > > > > > > > > > > > > > > > > > > complete. For known topics, we don't currently block at all in
> > > > > > > > > > > > > > > > > > > > waitOnMetadata.
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > The last major point of minimizing producer startup metadata RPCs may
> > > > > > > > > > > > > > > > > > > > still need to be improved, but this would be a large improvement on the
> > > > > > > > > > > > > > > > > > > > current situation.
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > > > > > > > > Brian
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > I think the above 2 ways are enough to solve the current problem.
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > On Tue, Nov 19, 2019 at 3:20 AM Colin McCabe <cmcc...@apache.org> wrote:
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > On Mon, Nov 18, 2019, at 10:05, Brian Byrne wrote:
> > > > > > > > > > > > > > > > > > > > > > > On Fri, Nov 15, 2019 at 5:08 PM Colin McCabe <cmcc...@apache.org> wrote:
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > Two seconds doesn't seem like a reasonable amount of time to leave
> > > > > > > > > > > > > > > > > > > > > > > > for the metadata fetch.  Fetching halfway through the expiration
> > > > > > > > > > > > > > > > > > > > > > > > period seems more reasonable.  It also doesn't require us to create
> > > > > > > > > > > > > > > > > > > > > > > > a new configuration key, which is nice.
> > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > Another option is to just do the metadata fetch every
> > > > > > > > > > > > > > > > > > > > > > > > metadata.max.age.ms, but not expire the topic until we can't fetch
> > > > > > > > > > > > > > > > > > > > > > > > the metadata for 2 * metadata.max.age.ms.
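> > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > (Putting numbers on that second option: with metadata.max.age.ms at
> > > > > > > > > > > > > > > > > > > > > > > > its default of 5 minutes, the producer would keep attempting a
> > > > > > > > > > > > > > > > > > > > > > > > refresh every 5 minutes, but would only drop a topic's metadata
> > > > > > > > > > > > > > > > > > > > > > > > after 10 minutes without a successful fetch.)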
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > I'd expect two seconds to be reasonable in the common case. Keep in
> > > > > > > > > > > > > > > > > > > > > > > mind that this doesn't affect correctness, and a control operation
> > > > > > > > > > > > > > > > > > > > > > > returning cached metadata should be on the order of milliseconds.
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > Hi Brian,
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > Thanks again for the KIP.
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > I think the issue here is not the common case, but the uncommon case
> > > > > > > > > > > > > > > > > > > > > > where the metadata fetch takes longer than expected.  In that case, we
> > > > > > > > > > > > > > > > > > > > > > don't want to be in the position of having our metadata expire because
> > > > > > > > > > > > > > > > > > > > > > we waited too long to renew it.
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > This is one reason why I think that the metadata expiration time should
> > > > > > > > > > > > > > > > > > > > > > be longer than the metadata refresh time.  In fact, it might be worth
> > > > > > > > > > > > > > > > > > > > > > having two separate configuration keys for these two values.  I could
> > > > > > > > > > > > > > > > > > > > > > imagine a user who is having trouble with metadata expiration wanting
> > > > > > > > > > > > > > > > > > > > > > to increase the metadata expiration time, but without increasing the
> > > > > > > > > > > > > > > > > > > > > > metadata refresh period.  In a sense, the metadata expiration time is
> > > > > > > > > > > > > > > > > > > > > > like the ZK session expiration time.  You might want to turn it up if
> > > > > > > > > > > > > > > > > > > > > > the cluster is experiencing load spikes.
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > But to the general point, defining the algorithm would mean enforcing
> > > > > > > > > > > > > > > > > > > > > > > it to fair accuracy, whereas if the suggestion is that it'll be
> > > > > > > > > > > > > > > > > > > > > > > performed at a reasonable time, it allows for batching and other
> > > > > > > > > > > > > > > > > > > > > > > optimizations. Perhaps I shouldn't be regarding what's defined in a
> > > > > > > > > > > > > > > > > > > > > > > KIP to be contractual in these cases, but you could consider a first
> > > > > > > > > > > > > > > > > > > > > > > implementation to collect topics whose metadata has exceeded
> > > > > > > > > > > > > > > > > > > > > > > (metadata.max.age.ms / 2), and sending the batch once a constituent
> > > > > > > > > > > > > > > > > > > > > > > topic's metadata is near the expiry, or a sufficient number of topics
> > > > > > > > > > > > > > > > > > > > > > > have been collected (10? 100? 1000?).
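> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > (Numerically, with metadata.max.age.ms at its 5-minute default, that
> > > > > > > > > > > > > > > > > > > > > > > means collecting topics whose metadata is older than 2.5 minutes and
> > > > > > > > > > > > > > > > > > > > > > > flushing the batch when one of them approaches the 5-minute mark, or
> > > > > > > > > > > > > > > > > > > > > > > when enough topics have accumulated.)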
> > > > > > > > > > > > > > > > > > > > > >
> I'm concerned that if we change the metadata caching strategy without
> discussing it first, it may improve certain workloads but make others
> worse. We need to be concrete about what the proposed strategy is so that
> we can really evaluate it.
>
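
For concreteness, a rough sketch of the kind of batched refresh Brian
describes above (collect topics past metadata.max.age.ms / 2, send one
request once a member nears expiry or the batch grows large) could look like
the following. This is illustrative only; the MetadataBatcher name and the
90% "near expiry" cutoff are made up, not part of the KIP or the producer
code.

    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    // Illustrative bookkeeping for batched metadata refreshes.
    public class MetadataBatcher {
        private final long maxAgeMs;     // analogous to metadata.max.age.ms
        private final int maxBatchSize;  // e.g. 10? 100? 1000?
        private final Map<String, Long> lastRefreshMs = new HashMap<>();

        public MetadataBatcher(long maxAgeMs, int maxBatchSize) {
            this.maxAgeMs = maxAgeMs;
            this.maxBatchSize = maxBatchSize;
        }

        public void recordRefresh(String topic, long nowMs) {
            lastRefreshMs.put(topic, nowMs);
        }

        // Topics older than maxAge/2 become candidates; one request is issued
        // when any candidate nears expiry or enough candidates accumulate.
        public List<String> maybeBuildBatch(long nowMs) {
            List<String> batch = new ArrayList<>();
            boolean nearExpiry = false;
            for (Map.Entry<String, Long> e : lastRefreshMs.entrySet()) {
                long age = nowMs - e.getValue();
                if (age >= maxAgeMs / 2) {
                    batch.add(e.getKey());
                    if (age >= maxAgeMs * 9 / 10) {
                        nearExpiry = true;  // arbitrary 90% threshold
                    }
                }
            }
            if (nearExpiry || batch.size() >= maxBatchSize) {
                return batch;
            }
            return new ArrayList<>();  // not yet worth a request
        }
    }

The thresholds are arbitrary knobs; the point is only that a single request
can carry many topics.
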
> > > We should be specific about what happens if the first few metadata
> > > fetches fail. Do we use exponential backoff to decide when to resend?
> > > It seems like we really should, for all the usual reasons (reduce the
> > > load on brokers, ride out temporary service disruptions, etc.) Maybe we
> > > could have an exponential retry backoff for each broker (in other
> > > words, we should try to contact a different broker before applying the
> > > backoff.) I think this already sort of happens with the disconnect
> > > timeout, but we might need a more general solution.
> >
> > I don't plan to change this behavior. Currently it retries after a fixed
> > value of 'retry.backoff.ms' (defaults to 100 ms). It's possible that
> > different brokers are attempted, but I haven't dug into it.
>
> I think it's critical to understand what the current behavior is before we
> try to change it. The difference between retrying the same broker and
> trying a different one has a large impact on cluster load and latency. For
> what it's worth, I believe the behavior is the second one, but it has been
> a while since I checked. Let's figure this out.
>
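
A sketch of the per-broker exponential backoff idea discussed above, where a
broker that is not currently backed off is tried before any waiting is
applied. This is not the producer's actual NetworkClient logic; the class and
method names are hypothetical.

    import java.util.HashMap;
    import java.util.Map;

    // Illustrative per-broker backoff state for metadata fetch retries.
    public class BrokerBackoff {
        private final long baseBackoffMs;  // e.g. retry.backoff.ms = 100
        private final long maxBackoffMs;
        private final Map<Integer, Integer> failures = new HashMap<>();
        private final Map<Integer, Long> notBeforeMs = new HashMap<>();

        public BrokerBackoff(long baseBackoffMs, long maxBackoffMs) {
            this.baseBackoffMs = baseBackoffMs;
            this.maxBackoffMs = maxBackoffMs;
        }

        public void onFailure(int brokerId, long nowMs) {
            int n = failures.merge(brokerId, 1, Integer::sum);
            long backoff = Math.min(maxBackoffMs,
                                    baseBackoffMs << Math.min(n - 1, 20));
            notBeforeMs.put(brokerId, nowMs + backoff);
        }

        public void onSuccess(int brokerId) {
            failures.remove(brokerId);
            notBeforeMs.remove(brokerId);
        }

        // Prefer any broker that is not currently backed off; -1 means all
        // brokers are backed off and the caller should wait.
        public int chooseBroker(int[] brokerIds, long nowMs) {
            for (int id : brokerIds) {
                if (notBeforeMs.getOrDefault(id, 0L) <= nowMs) {
                    return id;
                }
            }
            return -1;
        }
    }
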
> > >
> > > Thanks for the clarification. Fully asynchronous is the way to go, I
> > > agree. I'm having trouble understanding how timeouts are handled in the
> > > KIP. It seems like if we can't fetch the metadata within the designated
> > > metadata timeout, the future / callback should receive a
> > > TimeoutException right? We do not want the send call to be deferred
> > > forever if metadata can't be fetched. Eventually it should fail if it
> > > can't be performed.
> > >
> > > I do think this is something that will have to be mentioned in the
> > > compatibility section. There is some code out there that is probably
> > > prepared to handle a timeout exception from the send function, which
> > > may need to be updated to check for a timeout from the future or
> > > callback.
> >
> > Correct, a timeout exception would be delivered in the future. Sure, I
> > can add that note to the KIP.
>
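
For reference, client code that checks for a timeout via the returned future
(rather than expecting send() to throw synchronously) might look like this.
The broker address and topic name are placeholders, and the example assumes
the kafka-clients dependency is on the classpath.

    import java.util.Properties;
    import java.util.concurrent.ExecutionException;
    import java.util.concurrent.Future;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.clients.producer.RecordMetadata;
    import org.apache.kafka.common.errors.TimeoutException;

    public class SendTimeoutExample {
        public static void main(String[] args) throws InterruptedException {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");
            props.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");

            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                Future<RecordMetadata> future =
                    producer.send(new ProducerRecord<>("my-topic", "key", "value"));
                try {
                    // Metadata/delivery timeouts surface here rather than from send().
                    RecordMetadata md = future.get();
                    System.out.println("offset=" + md.offset());
                } catch (ExecutionException e) {
                    if (e.getCause() instanceof TimeoutException) {
                        System.err.println("timed out: " + e.getCause().getMessage());
                    }
                }
            }
        }
    }
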
> Thanks.
>
> best,
> Colin
>
> > > It seems like this is an existing problem. You may fire off a lot of
> > > send calls that get blocked because the broker that is the leader for a
> > > certain partition is not responding. I'm not sure that we need to do
> > > anything special here. On the other hand, we could make the case for a
> > > generic "max number of outstanding sends" configuration to prevent
> > > surprise OOMs in the existing cases, plus the new one we're adding. But
> > > this feels like a bit of a scope expansion.
> >
> > Right, this is an existing problem; however, the asynchronous send could
> > cause unexpected behavior. For example, if a client pinned
> > topics/partitions to individual send threads, then memory couldn't be
> > exhausted by a single topic, since a blocking send would prevent further
> > records from being buffered on that topic. The compromise could be that
> > we only ever permit one outstanding record batch for a topic, which would
> > keep the code simple and wouldn't permit a single topic to consume all
> > available memory.
>
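
A minimal sketch of the "at most one outstanding record batch per topic"
compromise mentioned above could be a per-topic permit. The names here are
hypothetical and this is not taken from the producer's implementation.

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.Semaphore;

    // Illustrative guard: at most one in-flight batch per topic, so a topic
    // with missing metadata cannot buffer unbounded data.
    public class PerTopicPermit {
        private final Map<String, Semaphore> permits = new ConcurrentHashMap<>();

        public boolean tryAcquire(String topic) {
            return permits.computeIfAbsent(topic, t -> new Semaphore(1)).tryAcquire();
        }

        public void release(String topic) {
            Semaphore s = permits.get(topic);
            if (s != null) {
                s.release();
            }
        }
    }
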
> > > They may be connected, but I'm not sure they should be the same.
> > > Perhaps expiry should be 4x the typical fetch rate, for example.
> >
> > That's true. You could also make the case for a faster expiry than
> > refresh as well. Makes sense to separate this out.
>
> > > Hmm.... are you sure this is an N^2 problem? If you have T1 and T2, you
> > > fetch metadata for T1 and T2, right? I guess you could argue that we
> > > often fetch metadata for partitions we don't care about, but that
> > > doesn't make it O(N^2). Maybe there's something about the
> > > implementation that I'm missing.
> >
> > My apologies, I left out the context. One issue the KIP is trying to
> > resolve is the metadata storm that's caused by producers starting up. In
> > the worst case, where the send call is only performed from a single
> > thread (i.e. no possible batching), fetching metadata for 1K topics will
> > generate 1K RPCs, with payload 1+2+...+1K topics' metadata. Being smart
> > about the topics being refreshed would still generate 1K RPCs for 1 topic
> > each, and asynchronous behavior would permit batching (note steady-state
> > refreshing doesn't require the asynchronous behavior to batch).
>
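
As a quick worked example of the startup storm described above: if each of N
sends to a new topic triggers a refresh that includes every topic seen so
far, the total number of topic entries transferred is 1 + 2 + ... + N =
N(N+1)/2, even though only N RPCs are issued.

    // Worked arithmetic for the metadata storm payload.
    public class MetadataStormMath {
        public static void main(String[] args) {
            long n = 1000;                              // 1K topics
            long totalTopicEntries = n * (n + 1) / 2;   // 500,500 entries
            System.out.println(n + " RPCs carrying " + totalTopicEntries
                + " topic entries in total");
        }
    }
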
> > > In general, we need to take advantage of batching to do this well (one
> > > reason why I think we should steer clear of ultra-granular timeout
> > > tracking). It's best to do 1 RPC asking for 10 topics, not 10 RPCs
> > > asking for a single topic each, even if that means some of the topic
> > > timeouts are not *exactly* aligned with the configured value.
> >
> > Absolutely, agreed.
> >
> > Thanks,
> > Brian
> >
> > > best,
> > > Colin
> > >
> > > > Thanks,
> > > > Brian
> > > >
> > > > On Mon, Nov 11, 2019 at 11:47 AM Colin McCabe <cmcc...@apache.org> wrote:
> > > >
> > > > > Hi Brian,
> > > > >
> > > > > Thanks for the KIP.
> > > > >
> > > > > Starting the metadata fetch before we need the result is definitely
> > > > > a great idea. This way, the metadata fetch can be done in parallel
> > > > > with all the other stuff the producer is doing, rather than forcing
> > > > > the producer to periodically come to a halt while metadata is
> > > > > fetched.
> > > > >
> > > > > Maybe I missed it, but there seemed to be some details missing
> > > > > here. When do we start the metadata fetch? For example, if topic
> > > > > metadata expires every 5 minutes, perhaps we should wait 4 minutes,
> > > > > then start fetching the new metadata, which we would expect to
> > > > > arrive by the 5 minute deadline. Or perhaps we should start the
> > > > > fetch even earlier, around the 2.5 minute mark. In any case, there
> > > > > should be some discussion about what the actual policy is. Given
> > > > > that metadata.max.age.ms is configurable, maybe that policy needs
> > > > > to be expressed in terms of a percentage of the refresh period
> > > > > rather than in terms of an absolute delay.
> > > > >
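
A sketch of expressing the "refresh early" policy as a fraction of
metadata.max.age.ms rather than an absolute delay. The 0.8 factor is an
arbitrary example value, not something the KIP specifies.

    // Illustrative prefetch scheduling as a percentage of the refresh period.
    public class PrefetchPolicy {
        public static long nextPrefetchTimeMs(long lastRefreshMs, long maxAgeMs,
                                              double fraction) {
            return lastRefreshMs + (long) (maxAgeMs * fraction);
        }

        public static void main(String[] args) {
            long lastRefresh = System.currentTimeMillis();
            long maxAge = 5 * 60 * 1000L;   // 5 minutes
            long startAt = nextPrefetchTimeMs(lastRefresh, maxAge, 0.8);
            System.out.println("begin refresh at t+" + (startAt - lastRefresh) + " ms");
        }
    }
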
> > > > > The KIP correctly points out that the current metadata fetching
> > > > > policy causes us to "[block] in a function that's advertised as
> > > > > asynchronous." However, the KIP doesn't seem to spell out whether
> > > > > we will continue to block if metadata can't be found, or if this
> > > > > will be abolished. Clearly, starting the metadata fetch early will
> > > > > reduce blocking in the common case, but will there still be
> > > > > blocking in the uncommon case where the early fetch doesn't succeed
> > > > > in time?
> > > > >
> > > > > > To address (2), the producer currently maintains an expiry
> > > > > > threshold for every topic, which is used to remove a topic from
> > > > > > the working set at a future time (currently hard-coded to 5
> > > > > > minutes, this should be modified to use metadata.max.age.ms).
> > > > > > While this does work to reduce the size of the topic working set,
> > > > > > the producer will continue fetching metadata for these topics in
> > > > > > every metadata request for the full expiry duration. This logic
> > > > > > can be made more intelligent by managing the expiry from when the
> > > > > > topic was last used, enabling the expiry duration to be reduced
> > > > > > to improve cases where a large number of topics are touched
> > > > > > intermittently.
> > > > >
> > > > > Can you clarify this part a bit? It seems like we have a metadata
> > > > > expiration policy now for topics, and we will have one after this
> > > > > KIP, but they will be somewhat different. But it's not clear to me
> > > > > what the differences are.
> > > > >
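
As an illustration of "managing the expiry from when the topic was last
used", a working set that evicts topics by last-use time could look roughly
like this. The names are made up and do not reflect the producer's internal
classes.

    import java.util.HashMap;
    import java.util.Iterator;
    import java.util.Map;

    // Illustrative working set keyed by last produce time.
    public class TopicWorkingSet {
        private final long idleMs;  // the configured expiry duration
        private final Map<String, Long> lastUsedMs = new HashMap<>();

        public TopicWorkingSet(long idleMs) {
            this.idleMs = idleMs;
        }

        public void touch(String topic, long nowMs) {
            lastUsedMs.put(topic, nowMs);
        }

        // Drop topics not produced to within the idle window, so they stop
        // being included in subsequent metadata requests.
        public void expireIdleTopics(long nowMs) {
            Iterator<Map.Entry<String, Long>> it = lastUsedMs.entrySet().iterator();
            while (it.hasNext()) {
                if (nowMs - it.next().getValue() > idleMs) {
                    it.remove();
                }
            }
        }
    }
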
> > > > > In general, if load is a problem, we should probably consider
> > > > > adding some kind of jitter on the client side. There are definitely
> > > > > cases where people start up a lot of clients at the same time in
> > > > > parallel and there is a thundering herd problem with metadata
> > > > > updates. Adding jitter would spread the load across time rather
> > > > > than creating a spike every 5 minutes in this case.
> > > > >
> > > > > best,
> > > > > Colin
> > > > >
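
A client-side jitter sketch, spreading periodic refreshes over a window
instead of every producer refreshing on the exact same 5 minute boundary.
The +/-20% window is an arbitrary example value.

    import java.util.concurrent.ThreadLocalRandom;

    // Illustrative jitter around the metadata refresh interval.
    public class RefreshJitter {
        public static long jitteredIntervalMs(long baseIntervalMs, double jitterFraction) {
            double lo = 1.0 - jitterFraction;
            double hi = 1.0 + jitterFraction;
            return (long) (baseIntervalMs * ThreadLocalRandom.current().nextDouble(lo, hi));
        }

        public static void main(String[] args) {
            long base = 5 * 60 * 1000L;  // e.g. a 5 minute refresh period
            System.out.println("next refresh in " + jitteredIntervalMs(base, 0.2) + " ms");
        }
    }
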
> > > > > On Fri, Nov 8, 2019, at 08:59, Ismael Juma wrote:
> > > > > > I think this KIP affects when we block, which is actually user
> > > > > > visible behavior. Right?
> > > > > >
> > > > > > Ismael
> > > > > >
> > > > > > On Fri, Nov 8, 2019, 8:50 AM Brian Byrne <bby...@confluent.io> wrote:
> > > > > > >
> > > > > > > Hi Guozhang,
> > > > > > >
> > > > > > > Regarding metadata expiry, no access times other than the
> > > > > > > initial lookup[1] are used for determining when to expire
> > > > > > > producer metadata. Therefore, frequently used topics' metadata
> > > > > > > will be aged out and subsequently refreshed (in a blocking
> > > > > > > manner) every five minutes, and infrequently used topics will
> > > > > > > be retained for a minimum of five minutes and currently
> > > > > > > refetched on every metadata update during that time period. The
> > > > > > > sentence is suggesting that we could reduce the expiry time to
> > > > > > > improve the latter without affecting (rather slightly
> > > > > > > improving) the former.
> > > > > > >
> > > > > > > Keep in mind that in almost all cases, I wouldn't anticipate
> > > > > > > much of a difference in producer behavior, and the extra logic
> > > > > > > can be implemented to have insignificant cost. It's the
> > > > > > > large/dynamic topic corner cases that we're trying to improve.
> > > > > > >
> > > > > > > It'd be convenient if the KIP is no longer necessary. You're
> > > > > > > right in that there are no public API changes and the
> > > > > > > behavioral changes are entirely internal. I'd be happy to
> > > > > > > continue the discussion around the KIP, but unless otherwise
> > > > > > > objected, it can be retired.
> > > > > > >
> > > > > > > [1] Not entirely accurate, it's actually the first time when
> > > > > > > the client calculates whether to retain the topic in its
> > > > > > > metadata.
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Brian
> > > > > > >
> > > > > > > On Thu, Nov 7, 2019 at 4:48 PM Guozhang Wang <wangg...@gmail.com> wrote:
> > > > > > > >
> > > > > > > > Hello Brian,
> > > > > > > >
> > > > > > > > Could you elaborate a bit more on this sentence: "This logic
> > > > > > > > can be made more intelligent by managing the expiry from when
> > > > > > > > the topic was last used, enabling the expiry duration to be
> > > > > > > > reduced to improve cases where a large number of topics are
> > > > > > > > touched intermittently." Not sure I fully understand the
> > > > > > > > proposal.
> > > > > > > >
> > > > > > > > Also, since this KIP did not make any public API changes and
> > > > > > > > the behavioral changes are not considered a public API
> > > > > > > > contract (i.e. how we maintain the topic metadata in the
> > > > > > > > producer cache is never committed to users), I wonder if we
> > > > > > > > still need a KIP for the proposed change anymore?
> > > > > > > >
> > > > > > > > Guozhang
> > > > > > > >
> > > > > > > > On Thu, Nov 7, 2019 at 12:43 PM Brian Byrne <bby...@confluent.io> wrote:
> > > > > > > > >
> > > > > > > > > Hello all,
> > > > > > > > >
> > > > > > > > > I'd like to propose a vote for a producer change to improve
> > > > > > > > > producer behavior when dealing with a large number of
> > > > > > > > > topics, in part by reducing the amount of metadata fetching
> > > > > > > > > performed.
> > > > > > > > >
> > > > > > > > > The full KIP is provided here:
> > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-526%3A+Reduce+Producer+Metadata+Lookups+for+Large+Number+of+Topics
> > > > > > > > >
> > > > > > > > > And the discussion thread:
> > > > > > > > > https://lists.apache.org/thread.html/b2f8f830ef04587144cf0840c7d4811bbf0a14f3c459723dbc5acf9e@%3Cdev.kafka.apache.org%3E
> > > > > > > > >
> > > > > > > > > Thanks,
> > > > > > > > > Brian
> > > > > > > >
> > > > > > > > --
> > > > > > > > -- Guozhang