[ 
https://issues.apache.org/jira/browse/KAFKA-7509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17509303#comment-17509303
 ] 

Chris Egerton edited comment on KAFKA-7509 at 3/19/22, 4:38 PM:
----------------------------------------------------------------

RE backwards compatibility: That's a fair point about feature flag gating. You 
could have that gating at multiple levels in Connect:
 # Gating for the top-level worker config (either on an all-or-nothing basis, 
with one property per case, or something else entirely), which would affect 
the following cases that are currently configured using the entire worker 
config:
 ## The Kafka clients that the worker creates for managing internal topics, 
fetching the Kafka cluster ID, etc. (possibly even one property for each type 
of client: admin, producer, and consumer)
 ## REST extensions brought up on the worker
 ## The connector client config override policy used by the worker
 ## Admin clients that are brought up on behalf of connectors (see KAFKA-9046 
for more context on this special case)
 # Gating for interceptors used by each Kafka client that's configured in the 
worker config
 ## The Kafka clients used for internal topics, etc. described in section 1.1
 ## The admin clients brought up on behalf of connectors described in section 
1.4
 ## The Kafka clients used for connectors (configured using the 
{{{}consumer.{}}}, {{{}producer.{}}}, and {{admin.}} prefixes)
 # Gating for interceptors used by each Kafka client that's configured in a 
connector config
 ## There's just one case for this (configured using the 
{{{}consumer.override.{}}}, {{{}producer.override.{}}}, and {{admin.override.}} 
prefixes)
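To make the three levels concrete, here's a rough sketch of where each kind of 
property lives today (the prefixes are Connect's existing conventions; the 
interceptor class names are hypothetical stand-ins):

```properties
# Worker config: top-level properties configure the worker's own clients (cases 1.*)
bootstrap.servers=localhost:9092

# Worker config: "consumer."/"producer."/"admin." prefixes configure the clients
# created for connectors, including their interceptors (case 2.3)
consumer.interceptor.classes=com.example.MetricsInterceptor

# Connector config: "*.override." prefixes configure per-connector clients (case 3.1)
consumer.override.interceptor.classes=com.example.OtherInterceptor
```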

We could also theoretically introduce namespacing for connector configs so that 
{{Connector}} instances don't receive the {{{}key.converter.*{}}}, 
{{{}value.converter.*{}}}, {{{}header.converter.*{}}}, {{{}transforms.*{}}}, 
and {{predicates.*}} config properties, although given the 
currently-mostly-benign KAFKA-9228, we might want to hold off on that. It would 
allow {{Connector}} (and probably {{{}Task{}}}) implementations to log warnings 
on unrecognized config properties though, which they can't realistically do 
right now without producing a ton of false positives.

At the Kafka client level, this approach seems pretty simple, but given all of 
the nesting that can happen in Kafka Connect with fairly standard usage, it 
does get more complicated. I think a reasonable way to make this happen could 
be to use the same config property in the Connect worker config that we would 
in Kafka clients: something like {{config.namespacing.enabled}} (name obviously 
subject to change), which could be specified at the top level of the worker 
config and would both control namespacing behavior for all 1.* cases outlined 
above and establish a default for all 2.* and 3.* cases, which could then be 
overridden using Connect's existing mechanisms for overriding worker defaults. 
For example, to disable namespacing by default for all connector producers on 
your worker, you could specify {{{}producer.config.namespacing.enabled = 
false{}}}.
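As a sketch of how that hypothetical property and its per-client overrides 
might look in a worker config (both names are placeholders, per the caveat 
above):

```properties
# Hypothetical: enable namespacing for all 1.* cases and set the default
# for the 2.* and 3.* cases
config.namespacing.enabled=true

# Hypothetical: override the default just for connector producers
producer.config.namespacing.enabled=false
```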

 

RE encapsulation: There are a few cases I've encountered where it's been 
useful for Connect worker plugins to be aware of the worker config:
 * REST extensions:
 ** Monitoring the internal topics of the Connect worker in order to expose 
extra metadata about connector health, like injecting a "last modified" time 
into the response from the {{GET /connector/\{connector}/status}} endpoint
 ** Modifying key/value/header converter configurations before a connector is 
created, which requires knowledge of the {{{}key.converter.*{}}}, 
{{{}value.converter.*{}}}, and {{{}header.converter.*{}}} properties in the 
worker config in order to handle the case where the connector config doesn't 
contain overrides for the given converter type
 * Config providers (which don't receive the full worker config at the moment, 
but could possibly benefit from receiving it):
 ** Store and retrieve secrets using a Kafka topic on the same cluster that the 
Connect worker uses for its internal topics

I also wonder if a consumer interceptor might benefit from knowing how to 
access the Kafka cluster that the consumer is reading from in order to write 
custom metrics to a Kafka topic, although I haven't personally done too much 
work in this space so I don't know for sure if it's a realistic use case.

All of these cases can be addressed by requiring users to duplicate top-level 
configs into namespaced configs (e.g., specify the {{{}bootstrap.servers{}}}, 
{{{}sasl.jaas.config{}}}, {{{}security.protocol{}}}, etc. properties twice in 
your worker config: once at the top level, and once under something like a 
{{rest.extension.foo.*}} namespace). But that'd be awful for end users, who 
will inevitably ask why we couldn't come up with something a little more 
sophisticated to save them the work of copy+pasting.
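For illustration, here's the duplication that workaround would force on users 
(the {{rest.extension.foo.*}} namespace is hypothetical):

```properties
# Top level: used by the worker itself
bootstrap.servers=broker1:9092
security.protocol=SASL_SSL

# Duplicated under a (hypothetical) REST extension namespace
rest.extension.foo.bootstrap.servers=broker1:9092
rest.extension.foo.security.protocol=SASL_SSL
```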

Introducing a {{forward.}} prefix might help, but the more I think about it, 
the likelier it seems to cause confusion in some cases. What would the 
behavior be for a property in the top-level worker config that comes with the 
{{forward.}} prefix w/r/t case 2.3 (interceptors for Kafka clients configured 
for connectors in the worker config)?

 

RE disabling useful warnings: I don't really understand this point. If 
disabling these warnings is an opt-in feature in every context except the ones 
where the signal-to-noise ratio is already so low as to render these messages 
useless at best and actively harmful at worst, what's the actual risk here? For 
a concrete example, see the attachments I've provided here: 
[^connect-distributed.log] and [^connect-spurious-warnings.log].

The first is the raw output of running {{bin/connect-distributed.sh 
config/connect-distributed.properties}} after building the latest trunk (commit 
{{8d6968e}} at the time), up until the Kafka Connect worker [finished 
startup|https://github.com/apache/kafka/blob/8d6968e8322d74ebb0fde513113d42bef69fb72b/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/Connect.java#L57].
 The second is just the result of running {{grep ' WARN The configuration'}} on 
that file.
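The count itself is simple to reproduce; here's a small sketch with a stand-in 
log (run the same filter against the real [^connect-distributed.log] to get 
the figures below):

```python
# Count how many startup log lines are spurious "unknown config" warnings.
# These lines are stand-ins; read the real connect-distributed.log instead.
log_lines = [
    "[2022-03-19 12:00:00,000] INFO Kafka Connect starting",
    "[2022-03-19 12:00:00,001] WARN The configuration 'config.storage.topic' "
    "was supplied but isn't a known config.",
    "[2022-03-19 12:00:00,002] INFO Kafka Connect started",
]

# Same filter as the grep above
spurious = [line for line in log_lines if " WARN The configuration" in line]
ratio = 100 * len(spurious) / len(log_lines)
print(f"{len(spurious)} of {len(log_lines)} startup lines ({ratio:.1f}%) are spurious warnings")
```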

The first file is 1838 lines long. The second is 179 lines long. 9.7% of the 
log lines emitted during startup were spurious warnings about unrecognized 
config properties. This volume of spurious warnings is basically unavoidable 
with any released version of Connect today and completely destroys any 
potential they have to provide value to users. In fact, they're actively 
confusing users who worry that something is going wrong; see 
[here|https://stackoverflow.com/questions/55455709/kafka-connect-the-configuration-xxx-was-supplied-but-isnt-a-known-config-in-ad],
 [here|https://github.com/confluentinc/cp-helm-charts/issues/481], and 
[here|https://groups.google.com/g/confluent-platform/c/10SjqweFFmk/m/acoFmN_-AQAJ]
 for a few examples I was able to find after a 30-second Google search.



> Kafka Connect logs unnecessary warnings about unused configurations
> -------------------------------------------------------------------
>
>                 Key: KAFKA-7509
>                 URL: https://issues.apache.org/jira/browse/KAFKA-7509
>             Project: Kafka
>          Issue Type: Improvement
>          Components: clients, KafkaConnect
>    Affects Versions: 0.10.2.0
>            Reporter: Randall Hauch
>            Priority: Major
>
> When running Connect, the logs contain quite a few warnings about "The 
> configuration '{}' was supplied but isn't a known config." This occurs when 
> Connect creates producers, consumers, and admin clients, because the 
> AbstractConfig is logging unused configuration properties upon construction. 
> It's complicated by the fact that the Producer, Consumer, and AdminClient all 
> create their own AbstractConfig instances within the constructor, so we can't 
> even call its {{ignore(String key)}} method.
> See also KAFKA-6793 for a similar issue with Streams.
> There are no arguments in the Producer, Consumer, or AdminClient constructors 
> to control whether the configs log these warnings, so a simpler workaround 
> is to only pass those configuration properties to the Producer, Consumer, and 
> AdminClient that the ProducerConfig, ConsumerConfig, and AdminClientConfig 
> configdefs know about.
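As a footnote on the workaround described in the issue: it amounts to 
filtering each client's properties down to its known keys before construction. 
A generic sketch (in the real Java code the known-key sets would come from 
something like {{ProducerConfig.configNames()}}; the set here is a stand-in):

```python
# Sketch of the workaround: only pass a client the properties its ConfigDef
# recognizes, so AbstractConfig never sees (or warns about) the rest.
def filter_to_known(props, known_keys):
    """Keep only properties the target client's ConfigDef knows about."""
    return {k: v for k, v in props.items() if k in known_keys}

worker_props = {
    "bootstrap.servers": "localhost:9092",
    "acks": "all",
    "config.storage.topic": "connect-configs",  # Connect-only; unknown to producers
}
# Stand-in for ProducerConfig.configNames() on the Java side
producer_known = {"bootstrap.servers", "acks"}

producer_props = filter_to_known(worker_props, producer_known)
print(producer_props)  # the Connect-only key is dropped
```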


