Paul, Thanks for updating the KIP with the proposal. I do think the KIP should at least mention that the prior behavior is to allow the worker to override the `producer.client.id` or `consumer.client.id`, which is entirely possible (though unlikely since there would be an MBean conflict, as pointed out in the discussion). It might be sufficient to just add a sentence to the "Compatibility, Deprecation, and Migration Plan" section, like "Any client IDs specified in the worker configuration via ` producer.client.id` or `consumer.client.id` properties will be unchanged, as those will take precedence." Thoughts?
Ryanne, IIUC your last message, I think the latest KIP proposal will align pretty closely with your suggestion. Can you review and confirm? Best regards, Randall On Fri, Mar 1, 2019 at 3:04 PM Ryanne Dolan <ryannedo...@gmail.com> wrote: > Paul, Randall, I don't think most people will care to exercise so much > control over the client IDs, so long as they are filled in automatically in > a way that eliminates duplicate metrics and remains somewhat legible. If we > let the user specify a pattern or something, we're really just making the > user worry about these requirements. > > For example, if they specify "foo" as the client.id, they'll get a bunch > of > exceptions about that MBean already existing. So they'll try > "${connectorName}-foo", which won't work because connectors that get > restarted will re-use the same client ID and the same MBean again. And so > on, until they end up solving the same problem we are trying to solve here. > > I think you at least need something like "connect-<taskId>-producer-dlq" to > avoid MBeans being re-registered within the same JVM. I believe the task ID > is based on the connector name, so you'd get e.g. > "connect-myconnector-1-producer". > > Ryanne > > > On Fri, Mar 1, 2019 at 12:44 PM Paul Davidson > <pdavid...@salesforce.com.invalid> wrote: > > > Thanks Randall. I like your suggestion: as you say, this would make it > > possible to usefully override the default client id properties. > > > > I'm not sure how we would handle the dead-letter queue case though - > maybe > > we could automatically add a "dlq-" prefix to the producer client id? > > > > If there is agreement on this change I will update the KIP and the PR > (when > > I find some time). > > > > > > On Thu, Feb 21, 2019 at 8:12 AM Randall Hauch <rha...@gmail.com> wrote: > > > > > Hi, Paul. Thanks for the update to KIP-411 to reflect adding defaults, > > and > > > creating/updating https://github.com/apache/kafka/pull/6097 to reflect > > > this > > > approach. > > > > > > Now that we've avoided adding a new config and have changed the > default ` > > > client.id` to include some context, the connector name, and task > > number, I > > > think it makes overriding the client ID via worker config ` > > > producer.client.id` or `consumer.client.id` properties less valuable > > > because those overridden client IDs will be exactly the same for all > > > connectors and tasks. > > > > > > One one hand, we can leave this as-is, and any users that include ` > > > producer.client.id` and `consumer.client.id` in their worker configs > > keep > > > the same (sort of useless) behavior. In fact, most users would probably > > be > > > better off by removing these worker config properties and instead > relying > > > upon the defaults. > > > > > > On the other, similar to what Ewen suggested earlier (in a different > > > context), we could add support for users to optionally use > > > "${connectorName}" and ${task}" in their overridden client ID property > > and > > > have Connect replace these (if found) with the connector name and task > > > number. Any existing properties that don't use these variables would > > behave > > > as-is, but this way the users could define their own client IDs yet > still > > > get the benefit of uniquely identifying each of the clients. For > example, > > > if my worker config contained the following: > > > > > > producer.client.id > > =connect-cluster-A-${connectorName}-${task}-producer > > > consumer.client.id > > =connect-cluster-A-${connectorName}-${task}-consumer > > > > > > Thoughts? > > > > > > Randall > > > > > > On Wed, Feb 20, 2019 at 3:18 PM Ryanne Dolan <ryannedo...@gmail.com> > > > wrote: > > > > > > > Thanks Paul, this is great. This will make monitoring Connect a ton > > > easier. > > > > > > > > Ryanne > > > > > > > > On Wed, Feb 20, 2019 at 1:24 PM Paul Davidson > > > > <pdavid...@salesforce.com.invalid> wrote: > > > > > > > > > I have updated KIP-411 to propose changing the default client id - > > see: > > > > > > > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-411%3A+Make+default+Kafka+Connect+worker+task+client+IDs+distinct > > > > > > > > > > > > > > > There is also an PR ready to go here: > > > > > https://github.com/apache/kafka/pull/6097 > > > > > > > > > > On Fri, Jan 11, 2019 at 3:39 PM Paul Davidson < > > > pdavid...@salesforce.com> > > > > > wrote: > > > > > > > > > > > Hi everyone. We seem to have agreement that the ideal approach > is > > to > > > > > > alter the default client ids. Now I'm wondering about the best > > > process > > > > to > > > > > > proceed. Will the change in default behaviour require a new KIP, > > > given > > > > it > > > > > > will affect existing deployments? Would I be best to repurpose > > this > > > > > > KIP-411, or am I best to create a new KIP? Thanks! > > > > > > > > > > > > Paul > > > > > > > > > > > > On Tue, Jan 8, 2019 at 7:16 PM Randall Hauch <rha...@gmail.com> > > > wrote: > > > > > > > > > > > >> Hi, Paul. > > > > > >> > > > > > >> I concur with the others, and I like the new approach that > avoids > > a > > > > new > > > > > >> configuration, especially because it does not change the > behavior > > > for > > > > > >> anyone already using `producer.client.id` and/or ` > > > consumer.client.id > > > > `. > > > > > I > > > > > >> did leave a few comments on the PR. Perhaps the biggest one is > > > whether > > > > > the > > > > > >> producer used for the sink task error reporter (for dead letter > > > queue) > > > > > >> should be `connector-producer-<sink-task-id>`, and whether that > is > > > > > >> distinct > > > > > >> enough from source tasks, which will be of the form > > > > > >> `connector-producer-<source-task-id>`. Maybe it is fine. (The > > other > > > > > >> comments were minor.) > > > > > >> > > > > > >> Best regards, > > > > > >> > > > > > >> Randall > > > > > >> > > > > > >> On Mon, Jan 7, 2019 at 1:19 PM Paul Davidson < > > > > pdavid...@salesforce.com> > > > > > >> wrote: > > > > > >> > > > > > >> > Thanks all. I've submitted a new PR with a possible > > > implementation: > > > > > >> > https://github.com/apache/kafka/pull/6097. Note I did not > > include > > > > the > > > > > >> > group > > > > > >> > ID as part of the default client ID, mainly to avoid the > > connector > > > > > name > > > > > >> > appearing twice by default. As noted in the original Jira ( > > > > > >> > https://issues.apache.org/jira/browse/KAFKA-5061), leaving > out > > > the > > > > > >> group > > > > > >> > ID > > > > > >> > could lead to naming conflicts if multiple clusters run the > same > > > > Kafka > > > > > >> > cluster. This would probably not be a problem for many > > (including > > > > us) > > > > > as > > > > > >> > metrics exporters can usually be configured to include a > cluster > > > ID > > > > > and > > > > > >> > guarantee uniqueness. Will be interested to hear your thoughts > > on > > > > > this. > > > > > >> > > > > > > >> > Paul > > > > > >> > > > > > > >> > > > > > > >> > > > > > > >> > On Mon, Jan 7, 2019 at 10:27 AM Ryanne Dolan < > > > ryannedo...@gmail.com > > > > > > > > > > >> > wrote: > > > > > >> > > > > > > >> > > I'd also prefer to avoid the new configuration property if > > > > possible. > > > > > >> > Seems > > > > > >> > > like a lighter touch without it. > > > > > >> > > > > > > > >> > > Ryanne > > > > > >> > > > > > > > >> > > On Sun, Jan 6, 2019 at 7:25 PM Paul Davidson < > > > > > >> pdavid...@salesforce.com> > > > > > >> > > wrote: > > > > > >> > > > > > > > >> > > > Hi Konstantine, > > > > > >> > > > > > > > > >> > > > Thanks for your feedback! I think my reply to Ewen covers > > > most > > > > of > > > > > >> your > > > > > >> > > > points, and I mostly agree. If there is general agreement > > > that > > > > > >> > changing > > > > > >> > > > the default behavior is preferable to a config change I > will > > > > > update > > > > > >> my > > > > > >> > PR > > > > > >> > > > to use that approach. > > > > > >> > > > > > > > > >> > > > Paul > > > > > >> > > > > > > > > >> > > > On Fri, Jan 4, 2019 at 5:55 PM Konstantine Karantasis < > > > > > >> > > > konstant...@confluent.io> wrote: > > > > > >> > > > > > > > > >> > > > > Hi Paul. > > > > > >> > > > > > > > > > >> > > > > I second Ewen and I intended to give similar feedback: > > > > > >> > > > > > > > > > >> > > > > 1) Can we avoid a config altogether? > > > > > >> > > > > 2) If we prefer to add a config anyways, can we use a > set > > of > > > > > >> allowed > > > > > >> > > > values > > > > > >> > > > > instead of a boolean, even if initially these values are > > > only > > > > > >> two? As > > > > > >> > > the > > > > > >> > > > > discussion on Jira highlights, there is a potential for > > more > > > > > >> naming > > > > > >> > > > > conventions in the future, even if now the extra > > > functionality > > > > > >> > doesn't > > > > > >> > > > seem > > > > > >> > > > > essential. It's not optimal to have to deprecate a > config > > > > > instead > > > > > >> of > > > > > >> > > just > > > > > >> > > > > extending its set of values. > > > > > >> > > > > 3) I agree, the config name sounds too general. How > about > > > > > >> > > > > "client.ids.naming.policy" or "client.ids.naming" if you > > > want > > > > > two > > > > > >> > more > > > > > >> > > > > options? > > > > > >> > > > > > > > > > >> > > > > Konstantine > > > > > >> > > > > > > > > > >> > > > > On Fri, Jan 4, 2019 at 7:38 AM Ewen Cheslack-Postava < > > > > > >> > > e...@confluent.io> > > > > > >> > > > > wrote: > > > > > >> > > > > > > > > > >> > > > > > Hi Paul, > > > > > >> > > > > > > > > > > >> > > > > > Thanks for the KIP. A few comments. > > > > > >> > > > > > > > > > > >> > > > > > To me, biggest question here is if we can fix this > > > behavior > > > > > >> without > > > > > >> > > > > adding > > > > > >> > > > > > a config. In particular, today, we don't even set the > > > > > client.id > > > > > >> > for > > > > > >> > > > the > > > > > >> > > > > > producer and consumer at all, right? The *only* way it > > is > > > > set > > > > > >> is if > > > > > >> > > you > > > > > >> > > > > > include an override in the worker config, but in that > > case > > > > you > > > > > >> need > > > > > >> > > to > > > > > >> > > > be > > > > > >> > > > > > explicitly opting in with a `producer.` or `consumer.` > > > > prefix, > > > > > >> i.e. > > > > > >> > > the > > > > > >> > > > > > settings are `producer.client.id` and ` > > consumer.client.id > > > `. > > > > > >> > > > Otherwise, I > > > > > >> > > > > > think we're getting the default behavior where we > > generate > > > > > >> unique, > > > > > >> > > > > > per-process IDs, i.e. via this logic > > > > > >> > > > > > > > > > > >> > > > > > > > > > > >> > > > > > > > > > >> > > > > > > > > >> > > > > > > > >> > > > > > > >> > > > > > > > > > > > > > > > https://github.com/apache/kafka/blob/trunk/clients/src/main/java/org/apache/kafka/clients/consumer/KafkaConsumer.java#L662-L664 > > > > > >> > > > > > > > > > > >> > > > > > If that's the case, would it maybe be possible to > > > compatibly > > > > > >> change > > > > > >> > > the > > > > > >> > > > > > default to use task IDs in the client ID, but only if > we > > > > don't > > > > > >> see > > > > > >> > an > > > > > >> > > > > > existing override from the worker config? This would > > only > > > > > change > > > > > >> > the > > > > > >> > > > > > behavior when someone is using the default, but since > > the > > > > > >> default > > > > > >> > > would > > > > > >> > > > > > just use what is effectively a random ID that is > useless > > > for > > > > > >> > > monitoring > > > > > >> > > > > > metrics, presumably this wouldn't affect any existing > > > > users. I > > > > > >> > think > > > > > >> > > > that > > > > > >> > > > > > would avoid having to introduce the config, give > better > > > out > > > > of > > > > > >> the > > > > > >> > > box > > > > > >> > > > > > behavior, and still be a safe, compatible change to > > make. > > > > > >> > > > > > > > > > > >> > > > > > > > > > > >> > > > > > Other than that, just two minor comments. On the > config > > > > > naming, > > > > > >> not > > > > > >> > > > sure > > > > > >> > > > > > about a better name, but I think the config name could > > be > > > a > > > > > bit > > > > > >> > > clearer > > > > > >> > > > > if > > > > > >> > > > > > we need to have it. Maybe something including "task" > > like > > > > > >> > > > > > "task.based.client.ids" or something like that (or > > change > > > > the > > > > > >> type > > > > > >> > to > > > > > >> > > > be > > > > > >> > > > > an > > > > > >> > > > > > enum and make it something like > > > > task.client.ids=[default|task] > > > > > >> and > > > > > >> > > > leave > > > > > >> > > > > it > > > > > >> > > > > > open for extension in the future if needed). > > > > > >> > > > > > > > > > > >> > > > > > Finally, you have this: > > > > > >> > > > > > > > > > > >> > > > > > *"Allow overriding client.id <http://client.id/> on a > > > > > >> > per-connector > > > > > >> > > > > > basis"* > > > > > >> > > > > > > > > > > > >> > > > > > > This is a much more complex change, and would > require > > > > > >> individual > > > > > >> > > > > > > connectors to be updated to support the change. In > > > > contrast, > > > > > >> the > > > > > >> > > > > proposed > > > > > >> > > > > > > approach would immediately allow detailed > > > > consumer/producer > > > > > >> > > > monitoring > > > > > >> > > > > > for > > > > > >> > > > > > > all existing connectors. > > > > > >> > > > > > > > > > > > >> > > > > > > > > > > >> > > > > > I don't think this is quite accurate. I think the > reason > > > to > > > > > >> reject > > > > > >> > is > > > > > >> > > > > that > > > > > >> > > > > > for your particular requirement for metrics, it simply > > > > doesn't > > > > > >> give > > > > > >> > > > > enough > > > > > >> > > > > > granularity (there's only one value per entire > > connector), > > > > but > > > > > >> it > > > > > >> > > > doesn't > > > > > >> > > > > > require any changes to connectors. The framework > > allocates > > > > all > > > > > >> of > > > > > >> > > these > > > > > >> > > > > and > > > > > >> > > > > > there are already framework-defined config values that > > all > > > > > >> > connectors > > > > > >> > > > > share > > > > > >> > > > > > (some for only sinks or sources), so the framework can > > > > handle > > > > > >> all > > > > > >> > of > > > > > >> > > > this > > > > > >> > > > > > without changes to connectors. Further, with > > > > > connector-specific > > > > > >> > > > > overrides, > > > > > >> > > > > > you could get task-specific values if interpolation > were > > > > > >> supported > > > > > >> > in > > > > > >> > > > the > > > > > >> > > > > > config value (as we now do with managed secrets). For > > > > example, > > > > > >> it > > > > > >> > > could > > > > > >> > > > > > support something like client.id=connector-${taskId} > > and > > > > the > > > > > >> task > > > > > >> > ID > > > > > >> > > > > would > > > > > >> > > > > > be substituted automatically into the string. > > > > > >> > > > > > > > > > > >> > > > > > I don't necessarily like that solution (seems > > complicated > > > > and > > > > > >> not a > > > > > >> > > > great > > > > > >> > > > > > user experience), but it could work. > > > > > >> > > > > > > > > > > >> > > > > > -Ewen > > > > > >> > > > > > > > > > > >> > > > > > > > > > > >> > > > > > > > > > > >> > > > > > > > > > > >> > > > > > On Thu, Dec 20, 2018 at 5:05 PM Paul Davidson < > > > > > >> > > > pdavid...@salesforce.com> > > > > > >> > > > > > wrote: > > > > > >> > > > > > > > > > > >> > > > > > > Hi everyone, > > > > > >> > > > > > > > > > > > >> > > > > > > I would like to start a discussion around the > > following > > > > KIP: > > > > > >> > > > > > > * > > > > > >> > > > > > > > > > > > >> > > > > > > > > > > >> > > > > > > > > > >> > > > > > > > > >> > > > > > > > >> > > > > > > >> > > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-411%3A+Add+option+to+make+Kafka+Connect+task+client+ID+values+unique > > > > > >> > > > > > > < > > > > > >> > > > > > > > > > > > >> > > > > > > > > > > >> > > > > > > > > > >> > > > > > > > > >> > > > > > > > >> > > > > > > >> > > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-411%3A+Add+option+to+make+Kafka+Connect+task+client+ID+values+unique > > > > > >> > > > > > > >* > > > > > >> > > > > > > > > > > > >> > > > > > > This proposes a small change to allow Kafka Connect > > the > > > > > >> option to > > > > > >> > > > > > > auto-generate unique client IDs for each task. This > > > > enables > > > > > >> > > granular > > > > > >> > > > > > > monitoring of the producer / consumer client in each > > > task. > > > > > >> > > > > > > > > > > > >> > > > > > > Feedback is appreciated, thanks in advance! > > > > > >> > > > > > > > > > > > >> > > > > > > Paul > > > > > >> > > > > > > > > > > > >> > > > > > > > > > > >> > > > > > > > > > >> > > > > > > > > >> > > > > > > > >> > > > > > > >> > > > > > > > > > > > > > > > > > > > > >