Re: [DISCUSS] KIP-411: Add option to make Kafka Connect task client ID values unique

Ryanne Dolan Wed, 20 Feb 2019 13:18:33 -0800

Thanks Paul, this is great. This will make monitoring Connect a ton easier.


Ryanne

On Wed, Feb 20, 2019 at 1:24 PM Paul Davidson
<[email protected]> wrote:

> I have updated KIP-411 to propose changing the default client id - see:
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-411%3A+Make+default+Kafka+Connect+worker+task+client+IDs+distinct
>
>
> There is also an PR ready to go here:
> https://github.com/apache/kafka/pull/6097
>
> On Fri, Jan 11, 2019 at 3:39 PM Paul Davidson <[email protected]>
> wrote:
>
> > Hi everyone.  We seem to have agreement that the ideal approach is to
> > alter the default client ids. Now I'm wondering about the best process to
> > proceed. Will the change in default behaviour require a new KIP, given it
> > will affect existing deployments?  Would I be best to repurpose this
> > KIP-411, or am I best to  create a new KIP? Thanks!
> >
> > Paul
> >
> > On Tue, Jan 8, 2019 at 7:16 PM Randall Hauch <[email protected]> wrote:
> >
> >> Hi, Paul.
> >>
> >> I concur with the others, and I like the new approach that avoids a new
> >> configuration, especially because it does not change the behavior for
> >> anyone already using `producer.client.id` and/or `consumer.client.id`.
> I
> >> did leave a few comments on the PR. Perhaps the biggest one is whether
> the
> >> producer used for the sink task error reporter (for dead letter queue)
> >> should be `connector-producer-<sink-task-id>`, and whether that is
> >> distinct
> >> enough from source tasks, which will be of the form
> >> `connector-producer-<source-task-id>`. Maybe it is fine. (The other
> >> comments were minor.)
> >>
> >> Best regards,
> >>
> >> Randall
> >>
> >> On Mon, Jan 7, 2019 at 1:19 PM Paul Davidson <[email protected]>
> >> wrote:
> >>
> >> > Thanks all. I've submitted a new PR with a possible implementation:
> >> > https://github.com/apache/kafka/pull/6097. Note I did not include the
> >> > group
> >> > ID as part of the default client ID, mainly to avoid the connector
> name
> >> > appearing twice by default. As noted in the original Jira (
> >> > https://issues.apache.org/jira/browse/KAFKA-5061), leaving out the
> >> group
> >> > ID
> >> > could lead to naming conflicts if multiple clusters run the same Kafka
> >> > cluster. This would probably not be a problem for many (including us)
> as
> >> > metrics exporters can usually be configured to include a cluster ID
> and
> >> > guarantee uniqueness. Will be interested to hear your thoughts on
> this.
> >> >
> >> > Paul
> >> >
> >> >
> >> >
> >> > On Mon, Jan 7, 2019 at 10:27 AM Ryanne Dolan <[email protected]>
> >> > wrote:
> >> >
> >> > > I'd also prefer to avoid the new configuration property if possible.
> >> > Seems
> >> > > like a lighter touch without it.
> >> > >
> >> > > Ryanne
> >> > >
> >> > > On Sun, Jan 6, 2019 at 7:25 PM Paul Davidson <
> >> [email protected]>
> >> > > wrote:
> >> > >
> >> > > > Hi Konstantine,
> >> > > >
> >> > > > Thanks for your feedback!  I think my reply to Ewen covers most of
> >> your
> >> > > > points, and I mostly agree.  If there is general agreement that
> >> > changing
> >> > > > the default behavior is preferable to a config change I will
> update
> >> my
> >> > PR
> >> > > > to use  that approach.
> >> > > >
> >> > > > Paul
> >> > > >
> >> > > > On Fri, Jan 4, 2019 at 5:55 PM Konstantine Karantasis <
> >> > > > [email protected]> wrote:
> >> > > >
> >> > > > > Hi Paul.
> >> > > > >
> >> > > > > I second Ewen and I intended to give similar feedback:
> >> > > > >
> >> > > > > 1) Can we avoid a config altogether?
> >> > > > > 2) If we prefer to add a config anyways, can we use a set of
> >> allowed
> >> > > > values
> >> > > > > instead of a boolean, even if initially these values are only
> >> two? As
> >> > > the
> >> > > > > discussion on Jira highlights, there is a potential for more
> >> naming
> >> > > > > conventions in the future, even if now the extra functionality
> >> > doesn't
> >> > > > seem
> >> > > > > essential. It's not optimal to have to deprecate a config
> instead
> >> of
> >> > > just
> >> > > > > extending its set of values.
> >> > > > > 3) I agree, the config name sounds too general. How about
> >> > > > > "client.ids.naming.policy" or "client.ids.naming" if you want
> two
> >> > more
> >> > > > > options?
> >> > > > >
> >> > > > > Konstantine
> >> > > > >
> >> > > > > On Fri, Jan 4, 2019 at 7:38 AM Ewen Cheslack-Postava <
> >> > > [email protected]>
> >> > > > > wrote:
> >> > > > >
> >> > > > > > Hi Paul,
> >> > > > > >
> >> > > > > > Thanks for the KIP. A few comments.
> >> > > > > >
> >> > > > > > To me, biggest question here is if we can fix this behavior
> >> without
> >> > > > > adding
> >> > > > > > a config. In particular, today, we don't even set the
> client.id
> >> > for
> >> > > > the
> >> > > > > > producer and consumer at all, right? The *only* way it is set
> >> is if
> >> > > you
> >> > > > > > include an override in the worker config, but in that case you
> >> need
> >> > > to
> >> > > > be
> >> > > > > > explicitly opting in with a `producer.` or `consumer.` prefix,
> >> i.e.
> >> > > the
> >> > > > > > settings are `producer.client.id` and `consumer.client.id`.
> >> > > > Otherwise, I
> >> > > > > > think we're getting the default behavior where we generate
> >> unique,
> >> > > > > > per-process IDs, i.e. via this logic
> >> > > > > >
> >> > > > > >
> >> > > > >
> >> > > >
> >> > >
> >> >
> >>
> https://github.com/apache/kafka/blob/trunk/clients/src/main/java/org/apache/kafka/clients/consumer/KafkaConsumer.java#L662-L664
> >> > > > > >
> >> > > > > > If that's the case, would it maybe be possible to compatibly
> >> change
> >> > > the
> >> > > > > > default to use task IDs in the client ID, but only if we don't
> >> see
> >> > an
> >> > > > > > existing override from the worker config? This would only
> change
> >> > the
> >> > > > > > behavior when someone is using the default, but since the
> >> default
> >> > > would
> >> > > > > > just use what is effectively a random ID that is useless for
> >> > > monitoring
> >> > > > > > metrics, presumably this wouldn't affect any existing users. I
> >> > think
> >> > > > that
> >> > > > > > would avoid having to introduce the config, give better out of
> >> the
> >> > > box
> >> > > > > > behavior, and still be a safe, compatible change to make.
> >> > > > > >
> >> > > > > >
> >> > > > > > Other than that, just two minor comments. On the config
> naming,
> >> not
> >> > > > sure
> >> > > > > > about a better name, but I think the config name could be a
> bit
> >> > > clearer
> >> > > > > if
> >> > > > > > we need to have it. Maybe something including "task" like
> >> > > > > > "task.based.client.ids" or something like that (or change the
> >> type
> >> > to
> >> > > > be
> >> > > > > an
> >> > > > > > enum and make it something like task.client.ids=[default|task]
> >> and
> >> > > > leave
> >> > > > > it
> >> > > > > > open for extension in the future if needed).
> >> > > > > >
> >> > > > > > Finally, you have this:
> >> > > > > >
> >> > > > > > *"Allow overriding client.id <http://client.id/> on a
> >> > per-connector
> >> > > > > > basis"*
> >> > > > > > >
> >> > > > > > > This is a much more complex change, and would require
> >> individual
> >> > > > > > > connectors to be updated to support the change. In contrast,
> >> the
> >> > > > > proposed
> >> > > > > > > approach would immediately allow detailed consumer/producer
> >> > > > monitoring
> >> > > > > > for
> >> > > > > > > all existing connectors.
> >> > > > > > >
> >> > > > > >
> >> > > > > > I don't think this is quite accurate. I think the reason to
> >> reject
> >> > is
> >> > > > > that
> >> > > > > > for your particular requirement for metrics, it simply doesn't
> >> give
> >> > > > > enough
> >> > > > > > granularity (there's only one value per entire connector), but
> >> it
> >> > > > doesn't
> >> > > > > > require any changes to connectors. The framework allocates all
> >> of
> >> > > these
> >> > > > > and
> >> > > > > > there are already framework-defined config values that all
> >> > connectors
> >> > > > > share
> >> > > > > > (some for only sinks or sources), so the framework can handle
> >> all
> >> > of
> >> > > > this
> >> > > > > > without changes to connectors. Further, with
> connector-specific
> >> > > > > overrides,
> >> > > > > > you could get task-specific values if interpolation were
> >> supported
> >> > in
> >> > > > the
> >> > > > > > config value (as we now do with managed secrets). For example,
> >> it
> >> > > could
> >> > > > > > support something like client.id=connector-${taskId} and the
> >> task
> >> > ID
> >> > > > > would
> >> > > > > > be substituted automatically into the string.
> >> > > > > >
> >> > > > > > I don't necessarily like that solution (seems complicated and
> >> not a
> >> > > > great
> >> > > > > > user experience), but it could work.
> >> > > > > >
> >> > > > > > -Ewen
> >> > > > > >
> >> > > > > >
> >> > > > > >
> >> > > > > >
> >> > > > > > On Thu, Dec 20, 2018 at 5:05 PM Paul Davidson <
> >> > > > [email protected]>
> >> > > > > > wrote:
> >> > > > > >
> >> > > > > > > Hi everyone,
> >> > > > > > >
> >> > > > > > > I would like to start a discussion around the following KIP:
> >> > > > > > > *
> >> > > > > > >
> >> > > > > >
> >> > > > >
> >> > > >
> >> > >
> >> >
> >>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-411%3A+Add+option+to+make+Kafka+Connect+task+client+ID+values+unique
> >> > > > > > > <
> >> > > > > > >
> >> > > > > >
> >> > > > >
> >> > > >
> >> > >
> >> >
> >>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-411%3A+Add+option+to+make+Kafka+Connect+task+client+ID+values+unique
> >> > > > > > > >*
> >> > > > > > >
> >> > > > > > > This proposes a small change to allow Kafka Connect the
> >> option to
> >> > > > > > > auto-generate unique client IDs for each task. This enables
> >> > > granular
> >> > > > > > > monitoring of the producer / consumer client in each task.
> >> > > > > > >
> >> > > > > > > Feedback is appreciated, thanks in advance!
> >> > > > > > >
> >> > > > > > > Paul
> >> > > > > > >
> >> > > > > >
> >> > > > >
> >> > > >
> >> > >
> >> >
> >>
> >
>

Re: [DISCUSS] KIP-411: Add option to make Kafka Connect task client ID values unique

Reply via email to