[ https://issues.apache.org/jira/browse/KAFKA-15841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17825454#comment-17825454 ]

Greg Harris commented on KAFKA-15841:
-------------------------------------

[~henriquemota] Okay, I think I understand better what you're trying to achieve.

> ... one topic per table...
> We have a JDBC Sink for each table.

Okay, you're using scenario (1), one connector per topic, which comes to at 
most 90 * 100 = 9000 connectors per Connect cluster. That is far too many to 
fit on a single machine, and certainly needs a cluster to distribute the work.

In this scenario, Connect should be able to distribute approximately 9000/M 
connectors and 9000/M tasks to each of the M workers in a distributed cluster, 
barring any practical limits/timeouts that I'm not aware of, so check the logs 
for ERROR messages.

> We tried to change the 'topics' property in the configurations using the 
> 'taskConfigs(int maxTasks)' method, but Kafka Connect ignores this property 
> when it is returned by 'taskConfigs(int maxTasks)'.

The reason is that the `topics` property is passed to the consumers to have 
them subscribe to the input topics, and the Consumer/Connect processing model 
requires this subscription to be the same for every consumer in the group.
That doesn't mean every consumer consumes every topic, however. A uniform 
subscription across all of the consumers in a group tells them to divide the 
work among themselves, with the topic-partitions distributed to the individual 
consumers by the configured assignor.
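
To illustrate with a bare consumer (a minimal sketch of the same group 
mechanics Connect builds on, not Connect's internals; the broker address, 
group id, and topic names are made up):

    import java.time.Duration;
    import java.util.List;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerConfig;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.serialization.StringDeserializer;

    public class UniformSubscriptionDemo {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            // Run this class twice to get two group members, analogous to
            // two sink tasks of one connector sharing a consumer group.
            props.put(ConsumerConfig.GROUP_ID_CONFIG, "uniform-subscription-demo");
            props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
                    StringDeserializer.class.getName());
            props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
                    StringDeserializer.class.getName());

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                // Every member passes the SAME subscription (`topics` in Connect)...
                consumer.subscribe(List.of("a", "b"));
                while (true) {
                    // ...but poll() only returns records from the partitions
                    // the group's assignor handed to this particular member.
                    ConsumerRecords<String, String> records =
                            consumer.poll(Duration.ofSeconds(1));
                    records.forEach(r -> System.out.printf(
                            "%s-%d: %s%n", r.topic(), r.partition(), r.value()));
                }
            }
        }
    }

Run two copies and each instance prints only its share of the partitions, even 
though both subscribed to both topics.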

As an example, say your connector config had `topics=a,b`, each of these two 
topics had 2 partitions, and tasks.max=2.

The `topics` config for both task-0 and task-1 would be `a,b`, but the 4 
partitions could be distributed like this by the consumer partition assignor:

task-0: a-0, b-0

task-1: a-1, b-1

Or any permutation. This is where the partition assignor I mentioned is 
important; the RangeAssignor can generate some pretty unbalanced assignments: 
[https://kafka.apache.org/37/javadoc/org/apache/kafka/clients/consumer/RangeAssignor.html]
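
To make that concrete for single-partition topics like yours: RangeAssignor 
assigns each topic independently, so partition 0 of every topic lands on the 
member that sorts first. With four single-partition topics a, b, c, d, and 
assuming task-0's member id sorts first, you'd get:

task-0: a-0, b-0, c-0, d-0

task-1: (nothing)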

If you choose a different assignor (RoundRobin, Sticky, etc.), then you can 
switch to scenario (2), with one connector per client and tasks.max around 10. 
That would give you ~90 connectors with ~900 tasks, each task working on ~10 
topics.
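
A per-client connector could look roughly like this (a sketch only: the 
connector class, name, and topic pattern are placeholders for your custom 
sink):

    name=client42-jdbc-sink
    connector.class=com.example.CustomJdbcSinkConnector
    tasks.max=10
    # All of client 42's per-table topics, matched by pattern.
    topics.regex=client42\..*
    # Per-connector consumer override (KIP-458); the worker's
    # connector.client.config.override.policy must allow it.
    consumer.override.partition.assignment.strategy=org.apache.kafka.clients.consumer.RoundRobinAssignor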

You can tune tasks.max up and down if you need more throughput or want less 
consumer/task overhead.

> Add Support for Topic-Level Partitioning in Kafka Connect
> ---------------------------------------------------------
>
>                 Key: KAFKA-15841
>                 URL: https://issues.apache.org/jira/browse/KAFKA-15841
>             Project: Kafka
>          Issue Type: Improvement
>          Components: connect
>            Reporter: Henrique Mota
>            Priority: Trivial
>         Attachments: image-2024-02-19-13-48-55-875.png
>
>
> In our organization, we utilize JDBC sink connectors to consume data from 
> various topics, where each topic is dedicated to a specific tenant with a 
> single partition. Recently, we developed a custom sink based on the standard 
> JDBC sink, enabling us to pause consumption of a topic when encountering 
> problematic records.
> However, we face limitations within Kafka Connect, as it doesn't allow for 
> appropriate partitioning of topics among workers. We attempted a workaround 
> by breaking down the topics list within the 'topics' parameter. 
> Unfortunately, Kafka Connect overrides this parameter after invoking the 
> {{taskConfigs(int maxTasks)}} method from the 
> {{org.apache.kafka.connect.connector.Connector}} class.
> We request the addition of support in Kafka Connect to enable the 
> partitioning of topics among workers without requiring a fork. This 
> enhancement would facilitate better load distribution and allow for more 
> flexible configurations, particularly in scenarios where topics are dedicated 
> to different tenants.


