yansuopeng created FLINK-34995:
----------------------------------

             Summary: flink kafka connector source stuck when partition leader 
invalid
                 Key: FLINK-34995
                 URL: https://issues.apache.org/jira/browse/FLINK-34995
             Project: Flink
          Issue Type: Bug
          Components: Connectors / Kafka
    Affects Versions: 1.18.1, 1.19.0, 1.17.0
            Reporter: yansuopeng


When a partition's leader is invalid (leader=-1), a Flink streaming job that 
uses KafkaSource cannot restart, and a new instance cannot start even with a 
new group id. The job gets stuck and fails with the following exception:

"org.apache.kafka.common.errors.TimeoutException: Timeout of 60000ms expired 
before the position for partition aaa-1 could be determined"

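When leader=-1, Kafka API calls such as KafkaConsumer.position() block until 
either the position can be determined or an unrecoverable error is 
encountered. With no leader to answer, the call waits until the timeout 
(60000 ms in the exception above) expires.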

In fact, leader=-1 is not easy to avoid. Even with replica=3, three disks 
going offline together will trigger the problem, especially when the cluster 
is relatively large. Recovery relies on the Kafka administrator fixing the 
partition in time, which is risky during the Kafka cluster's peak period.
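
To make the failing condition concrete, here is a minimal sketch of a pre-flight check that lists the partitions whose leader id is -1. LeaderCheck and leaderlessPartitions are hypothetical names, not part of Flink or Kafka. How the partition-to-leader map is filled in is left out; it is assumed to come from whatever topic metadata the Kafka client or tooling exposes.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration only; not a Flink or Kafka class.
final class LeaderCheck {
    // Leader id reported for a partition without a valid leader (leader=-1).
    static final int NO_LEADER = -1;

    // Returns the partition ids whose leader id equals NO_LEADER,
    // in the iteration order of the input map.
    static List<Integer> leaderlessPartitions(Map<Integer, Integer> leaderByPartition) {
        List<Integer> result = new ArrayList<>();
        for (Map.Entry<Integer, Integer> entry : leaderByPartition.entrySet()) {
            if (entry.getValue() == NO_LEADER) {
                result.add(entry.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Assumed metadata for topic "aaa": partition id -> leader broker id.
        Map<Integer, Integer> leaders = new LinkedHashMap<>();
        leaders.put(0, 2);
        leaders.put(1, NO_LEADER); // aaa-1, the partition named in the exception
        leaders.put(2, 3);
        System.out.println("Leaderless partitions: " + leaderlessPartitions(leaders));
    }
}
```

A check like this only reports the condition. It does not change how KafkaSource behaves when it meets such a partition.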

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
