[jira] [Commented] (KAFKA-15575) Prevent Connectors from exceeding tasks.max configuration

2023-10-11 Thread Mickael Maison (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-15575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17773988#comment-17773988
 ] 

Mickael Maison commented on KAFKA-15575:


Thanks for opening this item. Regardless of KIP-987, I think this should be 
fixed. As you said in the thread on the mailing list, I can't think of a case 
where taskConfigs() did not correctly use tasks.max and it was not a bug.
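
For illustration only, here is a minimal sketch of what "correctly using 
tasks.max" could look like on the connector side. It is not taken from any 
real connector: the TableSplitter class, the list of table names, and the 
"tables" config key are all hypothetical. The point is just that the number of 
returned configs is capped at maxTasks.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of the Kafka Connect API.
final class TableSplitter {

    // Distributes table names round-robin over at most maxTasks task configs.
    // Assumes maxTasks >= 1, which is what a valid tasks.max value would give.
    static List<Map<String, String>> taskConfigs(List<String> tables, int maxTasks) {
        // Never create more groups than there are tables or than maxTasks allows.
        int numGroups = Math.min(tables.size(), maxTasks);

        List<List<String>> groups = new ArrayList<>();
        for (int i = 0; i < numGroups; i++) {
            groups.add(new ArrayList<>());
        }
        for (int i = 0; i < tables.size(); i++) {
            groups.get(i % numGroups).add(tables.get(i));
        }

        List<Map<String, String>> configs = new ArrayList<>();
        for (List<String> group : groups) {
            // "tables" is an illustrative key, not a real connector property.
            configs.add(Map.of("tables", String.join(",", group)));
        }
        return configs;
    }
}
```

With five tables and maxTasks set to 2, this returns two configs; with three 
tables and maxTasks set to 10, it returns three.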

> Prevent Connectors from exceeding tasks.max configuration
> -
>
> Key: KAFKA-15575
> URL: https://issues.apache.org/jira/browse/KAFKA-15575
> Project: Kafka
>  Issue Type: Task
>  Components: KafkaConnect
>Reporter: Greg Harris
>Priority: Minor
>
> The Connector::taskConfigs(int maxTasks) function is used by Connectors to 
> enumerate tasks configurations. This takes an argument which comes from the 
> tasks.max connector config. This is the Javadoc for that method:
> {noformat}
> /**
>  * Returns a set of configurations for Tasks based on the current configuration,
>  * producing at most {@code maxTasks} configurations.
>  *
>  * @param maxTasks maximum number of configurations to generate
>  * @return configurations for Tasks
>  */
> public abstract List<Map<String, String>> taskConfigs(int maxTasks);
> {noformat}
> This includes the constraint that the number of tasks is at most maxTasks, 
> but this constraint is not enforced by the framework.
>  
> We should begin enforcing this constraint by dropping configs that exceed the 
> limit, and logging a warning. For sink connectors this should harmlessly 
> rebalance the consumer subscriptions onto the remaining tasks. For source 
> connectors that distribute their work via task configs, this may result in an 
> interruption in data transfer.
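
The drop-and-warn behavior proposed in the description could be sketched 
roughly as follows. This is an assumption about the shape of such a check, not 
the framework's actual code: TaskConfigLimiter and its truncate method are 
hypothetical names, and java.util.logging stands in for whatever logger the 
runtime uses.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.logging.Logger;

// Hypothetical helper, not part of the Kafka Connect runtime.
final class TaskConfigLimiter {
    private static final Logger LOG = Logger.getLogger(TaskConfigLimiter.class.getName());

    // Returns the configs unchanged when they fit within maxTasks; otherwise
    // logs a warning and keeps only the first maxTasks entries.
    // Assumes maxTasks >= 0.
    static List<Map<String, String>> truncate(String connectorName,
                                              List<Map<String, String>> taskConfigs,
                                              int maxTasks) {
        if (taskConfigs.size() <= maxTasks) {
            return taskConfigs;
        }
        LOG.warning(String.format(
                "Connector %s generated %d task configs but tasks.max is %d; dropping the last %d",
                connectorName, taskConfigs.size(), maxTasks, taskConfigs.size() - maxTasks));
        // Copy the sublist so the result does not depend on the connector's list.
        return new ArrayList<>(taskConfigs.subList(0, maxTasks));
    }
}
```

Note that the dropped configs are simply the trailing ones, so a source 
connector that assigned distinct work to each config would stop processing 
whatever the dropped tasks were responsible for.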



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-15575) Prevent Connectors from exceeding tasks.max configuration

2023-10-18 Thread Chris Egerton (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-15575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17776849#comment-17776849
 ] 

Chris Egerton commented on KAFKA-15575:
---

I hate to say it, but I think this may need a KIP. I can see the motivation for 
silently dropping extra task configs, but I'd be more in favor of failing the 
connector by default, with an optional override.

 

While it's true that most sink connectors will be able to transparently adjust 
to a reduced set of task configs, it's not guaranteed that all will: some sink 
connectors may assign special responsibilities to the Nth task, like 
periodically triggering flushes from a shared buffer to the external system. 
And silently halting the flow of data for a subset of source tasks, with no 
notice besides a warning message in the logs, seems likely to lead to headaches 
for the poor on-call engineer who sees the flow of data suddenly stop but no 
other obvious indications of failure.

 

If we fail a connector that attempts to generate more than the permitted 
maximum number of tasks, we can immediately surface to the user that something 
is wrong, and suggest in the failure message a remediation step of increasing 
the value for the {{tasks.max}} property. Of course, this is risky if the 
connector is programmed to always generate more than the permitted maximum 
number of tasks, but in that case, we can allow the user to disable enforcement 
of the {{tasks.max}} property in order to run the full set of tasks.
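
As a rough sketch of the fail-by-default alternative described above: the 
TaskLimitEnforcer class, its check method, and the boolean enforce flag are 
all hypothetical, and the comment does not say how the override would be 
exposed to users. The sketch only shows the two branches: fail with a 
remediation hint, or warn and run everything when enforcement is disabled.

```java
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of the Kafka Connect runtime.
final class TaskLimitEnforcer {

    // Throws when the connector generated too many task configs and
    // enforcement is on; otherwise returns the configs unchanged.
    static List<Map<String, String>> check(List<Map<String, String>> taskConfigs,
                                           int maxTasks,
                                           boolean enforce) {
        int generated = taskConfigs.size();
        if (generated <= maxTasks) {
            return taskConfigs;
        }
        String message = String.format(
                "Connector generated %d task configs, which exceeds tasks.max (%d).",
                generated, maxTasks);
        if (enforce) {
            // Failing loudly surfaces the problem and points at a remediation.
            throw new IllegalStateException(
                    message + " Consider increasing the value of tasks.max.");
        }
        // Enforcement disabled: keep the full set of tasks but leave a trace.
        System.err.println("WARN: " + message + " Enforcement is disabled; running all tasks.");
        return taskConfigs;
    }
}
```

Compared with silently dropping configs, this makes the failure visible in 
the connector's status instead of only in the logs.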






[jira] [Commented] (KAFKA-15575) Prevent Connectors from exceeding tasks.max configuration

2023-11-10 Thread Chris Egerton (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-15575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17785147#comment-17785147
 ] 

Chris Egerton commented on KAFKA-15575:
---

I've published 
[KIP-1004|https://cwiki.apache.org/confluence/display/KAFKA/KIP-1004%3A+Enforce+tasks.max+property+in+Kafka+Connect]
 to try to address this gap.

> Prevent Connectors from exceeding tasks.max configuration
> -
>
> Key: KAFKA-15575
> URL: https://issues.apache.org/jira/browse/KAFKA-15575
> Project: Kafka
>  Issue Type: Task
>  Components: KafkaConnect
>Reporter: Greg Harris
>Assignee: Chris Egerton
>Priority: Minor
>  Labels: kip
>
> The Connector::taskConfigs(int maxTasks) function is used by Connectors to 
> enumerate tasks configurations. This takes an argument which comes from the 
> tasks.max connector config. This is the Javadoc for that method:
> {noformat}
> /**
>  * Returns a set of configurations for Tasks based on the current configuration,
>  * producing at most {@code maxTasks} configurations.
>  *
>  * @param maxTasks maximum number of configurations to generate
>  * @return configurations for Tasks
>  */
> public abstract List<Map<String, String>> taskConfigs(int maxTasks);
> {noformat}
> This includes the constraint that the number of tasks is at most maxTasks, 
> but this constraint is not enforced by the framework.
>  
> To enforce this constraint, we could begin dropping configs that exceed the 
> limit, and log a warning. For sink connectors this should harmlessly 
> rebalance the consumer subscriptions onto the remaining tasks. For source 
> connectors that distribute their work via task configs, this may result in an 
> interruption in data transfer.


