Hi Ivan,

I believe the logic you've linked is only applicable for the producer and
consumer clients; the admin client does something different (see [1]).

Either way, it'd be nice to have a definition of when re-bootstrapping
would occur that doesn't rely on internal implementation details. What
user-visible phenomena can we identify that would lead to a
re-bootstrapping? I also believe that if someone has
"reconnect.backoff.max.ms" set to a low-enough value,
NetworkClient::leastLoadedNode may never return null. In that case,
shouldn't we still attempt a re-bootstrap at some point (if the user has
enabled this feature)? Would it make sense to re-bootstrap only after
"metadata.max.age.ms" has elapsed since the last metadata update, and when
at least one request has been made to contact each known server and been
met with failure?
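
To make that condition concrete, here is a rough sketch. All of the names
below are hypothetical -- none of them exist in the Kafka codebase -- and
this only illustrates the trigger I have in mind, not an implementation:

```java
// Hypothetical sketch of the proposed re-bootstrap trigger. These names do
// not exist in the Kafka codebase; they only make the condition concrete.
class RebootstrapCondition {
    private final long metadataMaxAgeMs; // mirrors "metadata.max.age.ms"

    RebootstrapCondition(long metadataMaxAgeMs) {
        this.metadataMaxAgeMs = metadataMaxAgeMs;
    }

    // Re-bootstrap only when (a) the metadata is stale, i.e. older than
    // metadata.max.age.ms, and (b) every known server has been contacted
    // at least once since the last successful update and has failed.
    boolean shouldRebootstrap(long nowMs,
                              long lastSuccessfulUpdateMs,
                              int knownServers,
                              int serversFailedSinceUpdate) {
        boolean metadataStale =
                nowMs - lastSuccessfulUpdateMs > metadataMaxAgeMs;
        boolean allServersFailed =
                knownServers > 0 && serversFailedSinceUpdate >= knownServers;
        return metadataStale && allServersFailed;
    }
}
```

The real implementation would presumably read this state out of
NetworkClient/MetadataUpdater rather than take it as parameters; the point
is just that neither staleness nor failures alone would trigger a
re-bootstrap.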

[1] -
https://github.com/apache/kafka/blob/c9a42c85e2c903329b3550181d230527e90e3646/clients/src/main/java/org/apache/kafka/clients/admin/internals/AdminMetadataManager.java#L100

Cheers,

Chris

On Sun, Feb 19, 2023 at 3:39 PM Ivan Yurchenko <ivan0yurche...@gmail.com>
wrote:

> Hi Chris,
>
> Thank you for your question. As a part of various lifecycle phases
> (including node disconnect), NetworkClient can request a metadata update
> eagerly (the `Metadata.requestUpdate` method), which results in
> `MetadataUpdater.maybeUpdate` being called during the next poll. Inside,
> it has a way to find a known node it can connect to for fresh metadata.
> If no such node is found, it backs off. (Code [1].) I'm thinking of
> piggybacking on this logic and injecting the rebootstrap attempt before
> the backoff.
>
> As for the second part of your question: re-bootstrapping means replacing
> the node addresses in the client with the original bootstrap addresses, so
> if the first bootstrap attempt fails, the client will continue using the
> bootstrap addresses until success -- pretty much as if it were recreated
> from scratch.
>
> Best,
> Ivan
>
> [1]
>
> https://github.com/apache/kafka/blob/trunk/clients/src/main/java/org/apache/kafka/clients/NetworkClient.java#L1045-L1049
>
> On Thu, 16 Feb 2023 at 17:18, Chris Egerton <chr...@aiven.io.invalid>
> wrote:
>
> > Hi Ivan,
> >
> > I'm not very familiar with the clients side of things but the proposal
> > seems reasonable.
> >
> > I like the flexibility of the "metadata.recovery.strategy" property as a
> > string instead of, e.g., a "rebootstrap.enabled" boolean. We may want to
> > adopt a different approach in the future, like the background thread
> > mentioned in the rejected alternatives section.
> >
> > I also like handling this via a configuration property instead of adding
> > a Java-level API or suggesting that users re-instantiate their clients,
> > since we may want to enable this new behavior by default in the future,
> > and it also reduces the level of effort required for users to benefit
> > from this improvement.
> >
> > One question I have--that may have an obvious answer to anyone more
> > familiar with client internals--is under which conditions we will
> > determine a re-bootstrap is appropriate. Taking the admin client as an
> > example, the "default.api.timeout.ms" property gives us a limit on the
> > time an operation will be allowed to take before it completes or fails
> > (with optional per-request overrides in the various *Options classes),
> > and the "request.timeout.ms" property gives us a limit on the time each
> > request issued for that operation will be allowed to take before it
> > completes, is retried, or causes the operation to fail (if no more
> > retries can be performed). If all of the known servers (i.e., bootstrap
> > servers for the first operation, or discovered brokers if bootstrapping
> > has already been completed) are unavailable, the admin client will keep
> > (re)trying to fetch metadata until the API timeout is exhausted, issuing
> > multiple requests to the same server if necessary. When would a
> > re-bootstrapping occur here? Ideally we could find some approach that
> > minimizes false positives (where a re-bootstrapping is performed even
> > though the current set of known brokers is only temporarily unavailable,
> > as opposed to permanently moved). Of course, given the opt-in nature of
> > the re-bootstrapping feature, we can always shoot for "good enough" on
> > that front, but it'd be nice to understand some of the potential
> > pitfalls of enabling it.
> >
> > Following up on the above, would we cache the need to perform a
> > re-bootstrap across separate operations? For example, suppose I try to
> > describe a cluster, a re-bootstrapping takes place and fails, and then I
> > try to describe the cluster a second time. With that second attempt,
> > would we immediately resort to the bootstrap servers for any initial
> > metadata updates, or would we still try to go through the last-known set
> > of brokers first?
> >
> > Cheers,
> >
> > Chris
> >
> > On Mon, Feb 6, 2023 at 4:32 AM Ivan Yurchenko <ivan0yurche...@gmail.com>
> > wrote:
> >
> > > Hi!
> > >
> > > There doesn't seem to be much more discussion going on, so I'm
> > > planning to start the vote in a couple of days.
> > >
> > > Thanks,
> > >
> > > Ivan
> > >
> > > On Wed, 18 Jan 2023 at 12:06, Ivan Yurchenko <ivan0yurche...@gmail.com>
> > > wrote:
> > >
> > > > Hello!
> > > > I would like to start the discussion thread on KIP-899: Allow
> > > > clients to rebootstrap.
> > > > This KIP proposes to allow Kafka clients to repeat the bootstrap
> > > > process when fetching metadata if none of the known nodes are
> > > > available.
> > > >
> > > >
> > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-899%3A+Allow+clients+to+rebootstrap
> > > >
> > > > A question right away: should we eventually change the default
> > > > behavior, or can it remain configurable "forever"? The latter is
> > > > proposed in the KIP.
> > > >
> > > > Thank you!
> > > >
> > > > Ivan
