Re: [DISCUSS] KIP-411: Add option to make Kafka Connect task client ID values unique

Paul Davidson Wed, 20 Feb 2019 11:26:19 -0800

I have updated KIP-411 to propose changing the default client id - see:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-411%3A+Make+default+Kafka+Connect+worker+task+client+IDs+distinct



There is also an PR ready to go here:
https://github.com/apache/kafka/pull/6097

On Fri, Jan 11, 2019 at 3:39 PM Paul Davidson <[email protected]>
wrote:

> Hi everyone.  We seem to have agreement that the ideal approach is to
> alter the default client ids. Now I'm wondering about the best process to
> proceed. Will the change in default behaviour require a new KIP, given it
> will affect existing deployments?  Would I be best to repurpose this
> KIP-411, or am I best to  create a new KIP? Thanks!
>
> Paul
>
> On Tue, Jan 8, 2019 at 7:16 PM Randall Hauch <[email protected]> wrote:
>
>> Hi, Paul.
>>
>> I concur with the others, and I like the new approach that avoids a new
>> configuration, especially because it does not change the behavior for
>> anyone already using `producer.client.id` and/or `consumer.client.id`. I
>> did leave a few comments on the PR. Perhaps the biggest one is whether the
>> producer used for the sink task error reporter (for dead letter queue)
>> should be `connector-producer-<sink-task-id>`, and whether that is
>> distinct
>> enough from source tasks, which will be of the form
>> `connector-producer-<source-task-id>`. Maybe it is fine. (The other
>> comments were minor.)
>>
>> Best regards,
>>
>> Randall
>>
>> On Mon, Jan 7, 2019 at 1:19 PM Paul Davidson <[email protected]>
>> wrote:
>>
>> > Thanks all. I've submitted a new PR with a possible implementation:
>> > https://github.com/apache/kafka/pull/6097. Note I did not include the
>> > group
>> > ID as part of the default client ID, mainly to avoid the connector name
>> > appearing twice by default. As noted in the original Jira (
>> > https://issues.apache.org/jira/browse/KAFKA-5061), leaving out the
>> group
>> > ID
>> > could lead to naming conflicts if multiple clusters run the same Kafka
>> > cluster. This would probably not be a problem for many (including us) as
>> > metrics exporters can usually be configured to include a cluster ID and
>> > guarantee uniqueness. Will be interested to hear your thoughts on this.
>> >
>> > Paul
>> >
>> >
>> >
>> > On Mon, Jan 7, 2019 at 10:27 AM Ryanne Dolan <[email protected]>
>> > wrote:
>> >
>> > > I'd also prefer to avoid the new configuration property if possible.
>> > Seems
>> > > like a lighter touch without it.
>> > >
>> > > Ryanne
>> > >
>> > > On Sun, Jan 6, 2019 at 7:25 PM Paul Davidson <
>> [email protected]>
>> > > wrote:
>> > >
>> > > > Hi Konstantine,
>> > > >
>> > > > Thanks for your feedback!  I think my reply to Ewen covers most of
>> your
>> > > > points, and I mostly agree.  If there is general agreement that
>> > changing
>> > > > the default behavior is preferable to a config change I will update
>> my
>> > PR
>> > > > to use  that approach.
>> > > >
>> > > > Paul
>> > > >
>> > > > On Fri, Jan 4, 2019 at 5:55 PM Konstantine Karantasis <
>> > > > [email protected]> wrote:
>> > > >
>> > > > > Hi Paul.
>> > > > >
>> > > > > I second Ewen and I intended to give similar feedback:
>> > > > >
>> > > > > 1) Can we avoid a config altogether?
>> > > > > 2) If we prefer to add a config anyways, can we use a set of
>> allowed
>> > > > values
>> > > > > instead of a boolean, even if initially these values are only
>> two? As
>> > > the
>> > > > > discussion on Jira highlights, there is a potential for more
>> naming
>> > > > > conventions in the future, even if now the extra functionality
>> > doesn't
>> > > > seem
>> > > > > essential. It's not optimal to have to deprecate a config instead
>> of
>> > > just
>> > > > > extending its set of values.
>> > > > > 3) I agree, the config name sounds too general. How about
>> > > > > "client.ids.naming.policy" or "client.ids.naming" if you want two
>> > more
>> > > > > options?
>> > > > >
>> > > > > Konstantine
>> > > > >
>> > > > > On Fri, Jan 4, 2019 at 7:38 AM Ewen Cheslack-Postava <
>> > > [email protected]>
>> > > > > wrote:
>> > > > >
>> > > > > > Hi Paul,
>> > > > > >
>> > > > > > Thanks for the KIP. A few comments.
>> > > > > >
>> > > > > > To me, biggest question here is if we can fix this behavior
>> without
>> > > > > adding
>> > > > > > a config. In particular, today, we don't even set the client.id
>> > for
>> > > > the
>> > > > > > producer and consumer at all, right? The *only* way it is set
>> is if
>> > > you
>> > > > > > include an override in the worker config, but in that case you
>> need
>> > > to
>> > > > be
>> > > > > > explicitly opting in with a `producer.` or `consumer.` prefix,
>> i.e.
>> > > the
>> > > > > > settings are `producer.client.id` and `consumer.client.id`.
>> > > > Otherwise, I
>> > > > > > think we're getting the default behavior where we generate
>> unique,
>> > > > > > per-process IDs, i.e. via this logic
>> > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>> https://github.com/apache/kafka/blob/trunk/clients/src/main/java/org/apache/kafka/clients/consumer/KafkaConsumer.java#L662-L664
>> > > > > >
>> > > > > > If that's the case, would it maybe be possible to compatibly
>> change
>> > > the
>> > > > > > default to use task IDs in the client ID, but only if we don't
>> see
>> > an
>> > > > > > existing override from the worker config? This would only change
>> > the
>> > > > > > behavior when someone is using the default, but since the
>> default
>> > > would
>> > > > > > just use what is effectively a random ID that is useless for
>> > > monitoring
>> > > > > > metrics, presumably this wouldn't affect any existing users. I
>> > think
>> > > > that
>> > > > > > would avoid having to introduce the config, give better out of
>> the
>> > > box
>> > > > > > behavior, and still be a safe, compatible change to make.
>> > > > > >
>> > > > > >
>> > > > > > Other than that, just two minor comments. On the config naming,
>> not
>> > > > sure
>> > > > > > about a better name, but I think the config name could be a bit
>> > > clearer
>> > > > > if
>> > > > > > we need to have it. Maybe something including "task" like
>> > > > > > "task.based.client.ids" or something like that (or change the
>> type
>> > to
>> > > > be
>> > > > > an
>> > > > > > enum and make it something like task.client.ids=[default|task]
>> and
>> > > > leave
>> > > > > it
>> > > > > > open for extension in the future if needed).
>> > > > > >
>> > > > > > Finally, you have this:
>> > > > > >
>> > > > > > *"Allow overriding client.id <http://client.id/> on a
>> > per-connector
>> > > > > > basis"*
>> > > > > > >
>> > > > > > > This is a much more complex change, and would require
>> individual
>> > > > > > > connectors to be updated to support the change. In contrast,
>> the
>> > > > > proposed
>> > > > > > > approach would immediately allow detailed consumer/producer
>> > > > monitoring
>> > > > > > for
>> > > > > > > all existing connectors.
>> > > > > > >
>> > > > > >
>> > > > > > I don't think this is quite accurate. I think the reason to
>> reject
>> > is
>> > > > > that
>> > > > > > for your particular requirement for metrics, it simply doesn't
>> give
>> > > > > enough
>> > > > > > granularity (there's only one value per entire connector), but
>> it
>> > > > doesn't
>> > > > > > require any changes to connectors. The framework allocates all
>> of
>> > > these
>> > > > > and
>> > > > > > there are already framework-defined config values that all
>> > connectors
>> > > > > share
>> > > > > > (some for only sinks or sources), so the framework can handle
>> all
>> > of
>> > > > this
>> > > > > > without changes to connectors. Further, with connector-specific
>> > > > > overrides,
>> > > > > > you could get task-specific values if interpolation were
>> supported
>> > in
>> > > > the
>> > > > > > config value (as we now do with managed secrets). For example,
>> it
>> > > could
>> > > > > > support something like client.id=connector-${taskId} and the
>> task
>> > ID
>> > > > > would
>> > > > > > be substituted automatically into the string.
>> > > > > >
>> > > > > > I don't necessarily like that solution (seems complicated and
>> not a
>> > > > great
>> > > > > > user experience), but it could work.
>> > > > > >
>> > > > > > -Ewen
>> > > > > >
>> > > > > >
>> > > > > >
>> > > > > >
>> > > > > > On Thu, Dec 20, 2018 at 5:05 PM Paul Davidson <
>> > > > [email protected]>
>> > > > > > wrote:
>> > > > > >
>> > > > > > > Hi everyone,
>> > > > > > >
>> > > > > > > I would like to start a discussion around the following KIP:
>> > > > > > > *
>> > > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-411%3A+Add+option+to+make+Kafka+Connect+task+client+ID+values+unique
>> > > > > > > <
>> > > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-411%3A+Add+option+to+make+Kafka+Connect+task+client+ID+values+unique
>> > > > > > > >*
>> > > > > > >
>> > > > > > > This proposes a small change to allow Kafka Connect the
>> option to
>> > > > > > > auto-generate unique client IDs for each task. This enables
>> > > granular
>> > > > > > > monitoring of the producer / consumer client in each task.
>> > > > > > >
>> > > > > > > Feedback is appreciated, thanks in advance!
>> > > > > > >
>> > > > > > > Paul
>> > > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>>
>

Re: [DISCUSS] KIP-411: Add option to make Kafka Connect task client ID values unique

Reply via email to