Paul Whalen created KAFKA-8177:
----------------------------------

             Summary: Allow for separate connect instances to have sink 
connectors with the same name
                 Key: KAFKA-8177
                 URL: https://issues.apache.org/jira/browse/KAFKA-8177
             Project: Kafka
          Issue Type: Improvement
          Components: KafkaConnect
            Reporter: Paul Whalen


If you have multiple Connect instances (each either a single standalone worker 
or a distributed group of workers) running against the same Kafka cluster, the 
Connect instances cannot each have a sink connector with the same name and 
still operate independently. This is because the consumer group ID used 
internally for reading from the source topic(s) is entirely derived from the 
connector's name: 
[https://github.com/apache/kafka/blob/d0e436c471ba4122ddcc0f7a1624546f97c4a517/connect/runtime/src/main/java/org/apache/kafka/connect/util/SinkUtils.java#L24]
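A minimal sketch of the collision, assuming the linked helper simply prepends a fixed {{connect-}} prefix to the connector name (my reading of the linked code). The class name {{GroupIdCollision}} and the connector name {{jdbc-sink}} are hypothetical, chosen only for illustration:

```java
// Illustration only: mirrors the naming rule in the linked SinkUtils code
// as read above ("connect-" + connector name). The class name is hypothetical.
final class GroupIdCollision {

    // Assumed equivalent of the linked helper: the group ID depends on
    // nothing but the connector name.
    static String consumerGroupId(String connectorName) {
        return "connect-" + connectorName;
    }

    public static void main(String[] args) {
        // Two unrelated Connect instances on the same Kafka cluster,
        // each running a sink connector named "jdbc-sink".
        String groupOnInstanceA = consumerGroupId("jdbc-sink");
        String groupOnInstanceB = consumerGroupId("jdbc-sink");

        // Both resolve to the same consumer group, so the tasks of the two
        // connectors share partitions and offsets instead of each reading
        // the full topic.
        System.out.println(groupOnInstanceA.equals(groupOnInstanceB)); // prints "true"
    }
}
```

Nothing about the instance (offset file, {{group.id}}, storage topics) enters the derivation, which is why separating those settings does not separate the sink consumers.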

The documentation of Connect implies to me that it supports "multi-tenancy," 
that is, as long as...
 * In standalone mode, the {{offset.storage.file.filename}} is not shared 
between instances
 * In distributed mode, {{group.id}} and {{config.storage.topic}}, 
{{offset.storage.topic}}, and {{status.storage.topic}} are not the same between 
instances

... then the Connect instances can operate completely independently without 
fear of conflict. But the sink connector consumer group naming policy makes 
this untrue. Obviously independence can be restored by uniquely naming 
connectors across instances, but in some environments that could be a nuisance, 
or a challenging policy to enforce. For instance, imagine a large group of 
developers or data analysts all running their own standalone Connect to load 
into a SQL database for their own analysis, or replicating/mirroring to 
their own local cluster for testing.

The obvious solution is to allow supplying config that gives a Connect instance 
some notion of identity, and to use that when creating the sink task consumer 
group. Distributed mode obviously already has this ({{group.id}}), but it would 
need to be added for standalone mode. Maybe {{instance.id}}? Given that 
solution, it seems like this would need a small KIP.
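One way the proposal could look, as a rough sketch only. Nothing here exists in Kafka: the class {{InstanceScopedGroupId}}, the two-argument method, and the {{connect-<instance>-<connector>}} format are all assumptions, and {{instanceId}} stands for either the distributed {{group.id}} or the suggested standalone {{instance.id}}:

```java
// Hypothetical sketch of the proposal; not an existing Kafka API.
final class InstanceScopedGroupId {

    // instanceId: the Connect instance's identity (assumed to come from
    // group.id in distributed mode or a new instance.id in standalone mode).
    // When no identity is configured, fall back to the name-only rule so
    // existing deployments keep their consumer groups and offsets.
    static String consumerGroupId(String instanceId, String connectorName) {
        if (instanceId == null || instanceId.isEmpty()) {
            return "connect-" + connectorName;
        }
        return "connect-" + instanceId + "-" + connectorName;
    }

    public static void main(String[] args) {
        // Same connector name on two instances now yields distinct groups.
        System.out.println(consumerGroupId("alice", "jdbc-sink")); // prints "connect-alice-jdbc-sink"
        System.out.println(consumerGroupId("bob", "jdbc-sink"));   // prints "connect-bob-jdbc-sink"
        // No identity configured: unchanged from the name-only behavior.
        System.out.println(consumerGroupId(null, "jdbc-sink"));    // prints "connect-jdbc-sink"
    }
}
```

Keeping the identity optional is what would make the change backward compatible; a plain {{-}} separator can still be ambiguous (instance {{a-b}} with connector {{c}} versus instance {{a}} with connector {{b-c}}), so the exact format is one of the details a KIP would need to settle.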

I could also imagine solving this problem through better documentation 
("ensure your connector names are unique!"), but having that subtlety doesn't 
seem worth it to me. (Optionally) assigning identity to every Connect instance 
seems strictly clearer, without any downside.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
