[ 
https://issues.apache.org/jira/browse/FLINK-34995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17833850#comment-17833850
 ] 

Martijn Visser commented on FLINK-34995:
----------------------------------------

Why do you think this is a Flink bug rather than an issue on the Kafka side, 
given that no Kafka partition leader is elected? 

> flink kafka connector source stuck when partition leader invalid
> ----------------------------------------------------------------
>
>                 Key: FLINK-34995
>                 URL: https://issues.apache.org/jira/browse/FLINK-34995
>             Project: Flink
>          Issue Type: Bug
>          Components: Connectors / Kafka
>    Affects Versions: 1.17.0, 1.19.0, 1.18.1
>            Reporter: yansuopeng
>            Priority: Major
>
> When a partition leader is invalid (leader=-1), a Flink streaming job using 
> KafkaSource cannot restart, and a new instance cannot start with a new 
> group id. The job gets stuck and throws the following exception:
> "org.apache.kafka.common.errors.TimeoutException: Timeout of 60000ms 
> expired before the position for partition aaa-1 could be determined"
> When leader=-1, Kafka APIs such as KafkaConsumer.position() block until 
> either the position can be determined or an unrecoverable error is 
> encountered.
> In fact, leader=-1 is not easy to avoid: even with replica=3, three disks 
> going offline together will trigger the problem, especially when the 
> cluster is relatively large. Recovery relies on the Kafka administrator 
> fixing it in time, which is risky during the Kafka cluster's peak period.
> I have solved this problem and would like to create a PR.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
