[ 
https://issues.apache.org/jira/browse/KAFKA-9468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17022251#comment-17022251
 ] 

Randall Hauch commented on KAFKA-9468:
--------------------------------------

Thanks, [~EeveeB]. I think it makes sense that the Connect worker should fail 
upon startup if the number of partitions on the config topic is not 1, as this 
can lead to serious problems such as the endless rebalancing described below.
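
To make the proposal concrete, here is a rough sketch of the kind of check I 
have in mind. It is not the actual Connect code: the class name, method name, 
and error message are all hypothetical, and the sketch assumes the worker has 
already looked up the partition count for the config topic at startup (for 
example from the topic description returned by the admin client).

```java
// Hypothetical helper for illustration only; not part of Kafka Connect.
final class ConfigTopicCheck {
    private ConfigTopicCheck() {}

    /**
     * Fails fast if the config topic does not have exactly one partition.
     * partitionCount is assumed to come from topic metadata fetched at
     * worker startup; how it is fetched is outside this sketch.
     */
    static void requireSinglePartition(String topic, int partitionCount) {
        if (partitionCount != 1) {
            throw new IllegalStateException(
                "Topic '" + topic + "' given by config.storage.topic must have "
                + "exactly 1 partition, but has " + partitionCount
                + "; refusing to start the Connect worker.");
        }
    }

    public static void main(String[] args) {
        // A correctly created config topic passes silently.
        requireSinglePartition("connect-configs", 1);

        // A topic with several partitions aborts startup with a clear reason.
        try {
            requireSinglePartition("connect-configs", 3);
        } catch (IllegalStateException e) {
            System.out.println("Startup aborted: " + e.getMessage());
        }
    }
}
```

Throwing at startup, rather than only logging a warning, means the 
misconfiguration shows up immediately instead of as a rebalance loop later on.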

[~ewencp], do you think this requires a KIP? Technically it changes behavior, 
but distributed Connect is not really functional if the number of config 
topic partitions is not 1. The fact that we're not already checking this is 
probably a bug that can be fixed without a KIP. WDYT?

> config.storage.topic partition count issue is hard to debug
> -----------------------------------------------------------
>
>                 Key: KAFKA-9468
>                 URL: https://issues.apache.org/jira/browse/KAFKA-9468
>             Project: Kafka
>          Issue Type: Improvement
>          Components: KafkaConnect
>    Affects Versions: 1.0.2, 1.1.1, 2.0.1, 2.1.1, 2.2.2, 2.4.0, 2.3.1
>            Reporter: Evelyn Bayes
>            Priority: Minor
>
> When you run Connect in distributed mode with 2 or more workers and 
> config.storage.topic has more than 1 partition, you can end up with one of 
> the workers rebalancing endlessly:
> [2020-01-13 12:53:23,535] INFO [Worker clientId=connect-1, 
> groupId=connect-cluster] Current config state offset 37 is behind group 
> assignment 63, reading to end of config log 
> (org.apache.kafka.connect.runtime.distributed.DistributedHerder)
>  [2020-01-13 12:53:23,584] INFO [Worker clientId=connect-1, 
> groupId=connect-cluster] Finished reading to end of log and updated config 
> snapshot, new config log offset: 37 
> (org.apache.kafka.connect.runtime.distributed.DistributedHerder)
>  [2020-01-13 12:53:23,584] INFO [Worker clientId=connect-1, 
> groupId=connect-cluster] Current config state offset 37 does not match group 
> assignment 63. Forcing rebalance. 
> (org.apache.kafka.connect.runtime.distributed.DistributedHerder)
>  
> In case anyone viewing this doesn't know: you are only ever meant to 
> create this topic with one partition.
>  
> *Suggested Solution*
> Make the Connect worker check the partition count when it starts. If the 
> partition count is > 1, Kafka Connect stops and logs the reason why.
> I think this is reasonable, as it would stop users who are just starting out 
> from creating the topic incorrectly, and the mistake would be easy to fix 
> early. For those upgrading, this would easily be caught in a PRE-PROD 
> environment. And even if they upgraded directly in PROD, they would only be 
> impacted if they upgraded all Connect workers at the same time.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
