[
https://issues.apache.org/jira/browse/KAFKA-8177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matthias J. Sax reopened KAFKA-8177:
------------------------------------
> Allow for separate connect instances to have sink connectors with the same
> name
> -------------------------------------------------------------------------------
>
> Key: KAFKA-8177
> URL: https://issues.apache.org/jira/browse/KAFKA-8177
> Project: Kafka
> Issue Type: Improvement
> Components: KafkaConnect
> Reporter: Paul Whalen
> Priority: Minor
> Labels: connect
>
> If you have multiple Connect instances (either a single standalone worker or
> a distributed group of workers) running against the same Kafka cluster, the
> connect instances cannot each have a sink connector with the same name and
> still operate independently. This is because the consumer group ID used
> internally for reading from the source topic(s) is entirely derived from the
> connector's name:
> [https://github.com/apache/kafka/blob/d0e436c471ba4122ddcc0f7a1624546f97c4a517/connect/runtime/src/main/java/org/apache/kafka/connect/util/SinkUtils.java#L24]
> The documentation of Connect implies to me that it supports "multi-tenancy,"
> that is, as long as...
> * In standalone mode, the {{offset.storage.file.filename}} is not shared
> between instances
> * In distributed mode, {{group.id}} and {{config.storage.topic}},
> {{offset.storage.topic}}, and {{status.storage.topic}} are not the same
> between instances
> ... then the connect instances can operate completely independently without
> fear of conflict. But the sink connector consumer group naming policy makes
> this untrue. Obviously this can be achieved by uniquely naming connectors
> across instances, but in some environments that could be a bit of a nuisance,
> or a challenging policy to enforce. For instance, imagine a large group of
> developers or data analysts all running their own standalone Connect to load
> into a SQL database for their own analysis, or replicating or mirroring to
> their own local cluster for testing.
> The obvious solution is to allow supplying config that gives a Connect instance
> some notion of identity, and to use that when creating the sink task consumer
> group. Distributed mode already has this obviously ({{group.id}}), but it
> would need to be added for standalone mode. Maybe {{instance.id}}? Given that
> solution it seems like this would need a small KIP.
> I could also imagine solving this problem through better documentation
> ("ensure your connector names are unique!"), but having that subtlety doesn't
> seem worth it to me. (Optionally) assigning identity to every Connect
> instance seems strictly clearer, without any downside.
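
The collision described above, and the shape of the proposed fix, can be
sketched roughly as follows. This is an illustration only: currentGroupId
mirrors the "connect-" + connector name derivation in the linked SinkUtils,
while scopedGroupId, the instance.id setting it stands for, and the
"connect-<instance>-<connector>" format are assumptions, not an existing
Connect API or an agreed design.

```java
public class SinkGroupIds {

    // Mirrors the linked SinkUtils: the consumer group ID depends only on
    // the connector name, so two Connect instances with a connector of the
    // same name end up in the same consumer group.
    static String currentGroupId(String connectorName) {
        return "connect-" + connectorName;
    }

    // Hypothetical: scope the group ID by an optional instance identity
    // (group.id in distributed mode, or a new setting such as instance.id
    // in standalone mode). With no identity, fall back to today's naming.
    static String scopedGroupId(String instanceId, String connectorName) {
        if (instanceId == null || instanceId.isEmpty()) {
            return currentGroupId(connectorName);
        }
        return "connect-" + instanceId + "-" + connectorName;
    }

    public static void main(String[] args) {
        // Two independent instances, same connector name: identical group IDs.
        System.out.println(currentGroupId("jdbc-sink"));
        System.out.println(currentGroupId("jdbc-sink"));

        // With a per-instance identity the group IDs no longer collide.
        System.out.println(scopedGroupId("alice", "jdbc-sink"));
        System.out.println(scopedGroupId("bob", "jdbc-sink"));
    }
}
```

Falling back to the current name when no identity is configured would keep
existing deployments on their existing consumer groups, which is why the
identity is treated as optional in this sketch.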
--
This message was sent by Atlassian Jira
(v8.20.10#820010)