[ https://issues.apache.org/jira/browse/KAFKA-15841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17825454#comment-17825454 ]
Greg Harris commented on KAFKA-15841:
-------------------------------------

[~henriquemota] Okay, I think I understand better what you're trying to achieve.

> ... one topic per table...
> We have a JDBC Sink for each table.

Okay, you're using scenario (1), one connector per topic, which should come to at most 90 * 100 = 9000 connectors per Connect cluster. That is certainly too many to fit on a single machine, and certainly needs a cluster to distribute the work. In this scenario, Connect should be able to distribute approximately 9000/M connectors and 9000/M tasks to each of the M workers in a distributed cluster, barring any other practical limits/timeouts that I'm not aware of, so check for ERROR messages.

> We tried to change the 'topics' property in the configurations using the
> 'taskConfigs(int maxTasks)' method, but Kafka Connect ignores this property
> when it is returned by 'taskConfigs(int maxTasks)'.

The reason it does this is that the `topics` property is passed to the consumers to have them subscribe to the input topics, and the Consumer/Connect processing model requires this subscription to be the same for all consumers. This doesn't mean that every consumer is consuming every topic, however. Having a uniform subscription across all of the consumers in a group tells the consumers to divide the work among themselves, assigning the topic-partitions to each of the consumers according to the configured assignor.

As an example, say your connector config had `topics=a,b`, these two topics each had 2 partitions, and tasks.max=2. The `topics` config for both task-0 and task-1 would be `a,b`, but the 4 partitions could be distributed like this by the consumer partition assignor:

task-0: a-0, b-0
task-1: a-1, b-1

Or any permutation.
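The assignment above can be reproduced with a toy simulation of the RangeAssignor's per-topic logic (a deliberate simplification in Python, not the actual Java implementation in org.apache.kafka.clients.consumer.RangeAssignor). It also shows why many single-partition topics, as in the one-topic-per-tenant layout, assign badly under range assignment:

```python
# Toy model of RangeAssignor: each topic's partitions are split into
# contiguous ranges, one range per consumer, topic by topic.
def range_assign(partitions_per_topic, consumers):
    consumers = sorted(consumers)
    assignment = {c: [] for c in consumers}
    for topic, n_parts in partitions_per_topic.items():
        per, extra = divmod(n_parts, len(consumers))
        start = 0
        # The first `extra` consumers receive one partition more than the rest.
        for i, c in enumerate(consumers):
            count = per + (1 if i < extra else 0)
            assignment[c] += [f"{topic}-{p}" for p in range(start, start + count)]
            start += count
    return assignment

# topics=a,b with 2 partitions each and tasks.max=2: a balanced outcome.
print(range_assign({"a": 2, "b": 2}, ["task-0", "task-1"]))
# {'task-0': ['a-0', 'b-0'], 'task-1': ['a-1', 'b-1']}

# Many single-partition tenant topics: task-0 gets every partition
# (it is always the "first" consumer for each topic) and task-1 idles.
print(range_assign({"tenant-1": 1, "tenant-2": 1, "tenant-3": 1},
                   ["task-0", "task-1"]))
```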
This is where the partitioner I mentioned is important; the RangeAssignor can generate some pretty unbalanced assignments:

https://kafka.apache.org/37/javadoc/org/apache/kafka/clients/consumer/RangeAssignor.html

If you choose a different assignor (RoundRobin, Sticky, etc.), then you can switch to scenario (2), with one connector per client and a tasks.max of around 10. This would give you ~90 connectors with 900 tasks, each working on 10 topics. You can tune tasks.max up and down if you need more throughput or want less consumer/task overhead.

> Add Support for Topic-Level Partitioning in Kafka Connect
> ---------------------------------------------------------
>
> Key: KAFKA-15841
> URL: https://issues.apache.org/jira/browse/KAFKA-15841
> Project: Kafka
> Issue Type: Improvement
> Components: connect
> Reporter: Henrique Mota
> Priority: Trivial
> Attachments: image-2024-02-19-13-48-55-875.png
>
>
> In our organization, we utilize JDBC sink connectors to consume data from various topics, where each topic is dedicated to a specific tenant with a single partition. Recently, we developed a custom sink based on the standard JDBC sink, enabling us to pause consumption of a topic when encountering problematic records.
> However, we face limitations within Kafka Connect, as it doesn't allow for appropriate partitioning of topics among workers. We attempted a workaround by breaking down the topics list within the 'topics' parameter. Unfortunately, Kafka Connect overrides this parameter after invoking the {{taskConfigs(int maxTasks)}} method from the {{org.apache.kafka.connect.connector.Connector}} class.
> We request the addition of support in Kafka Connect to enable the partitioning of topics among workers without requiring a fork. This enhancement would facilitate better load distribution and allow for more flexible configurations, particularly in scenarios where topics are dedicated to different tenants.
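For reference, the scenario (2) setup described in the comment above might look roughly like this connector config (a sketch only: the connector name, class, and topic names are hypothetical, and the consumer.override.* line assumes the worker's connector.client.config.override.policy permits consumer overrides):

```properties
# Sketch of a scenario (2) connector config; all names are hypothetical.
name=client01-jdbc-sink
# The stock JDBC sink, or the custom sink described in the issue.
connector.class=io.confluent.connect.jdbc.JdbcSinkConnector
# ~100 single-partition table topics for one client; with tasks.max=10,
# each task ends up consuming roughly 10 of them.
tasks.max=10
topics=client01.table001,client01.table002,client01.table003
# Spread single-partition topics evenly instead of using RangeAssignor's
# per-topic ranges. Requires the worker to allow consumer overrides
# (connector.client.config.override.policy=All).
consumer.override.partition.assignment.strategy=org.apache.kafka.clients.consumer.RoundRobinAssignor
```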
--
This message was sent by Atlassian Jira
(v8.20.10#820010)