[ https://issues.apache.org/jira/browse/FLINK-34995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17833850#comment-17833850 ]
Martijn Visser commented on FLINK-34995:
----------------------------------------

Why do you think this is a Flink bug rather than an issue on the Kafka side, given that no Kafka leader is elected?

> flink kafka connector source stuck when partition leader invalid
> ----------------------------------------------------------------
>
>                 Key: FLINK-34995
>                 URL: https://issues.apache.org/jira/browse/FLINK-34995
>             Project: Flink
>          Issue Type: Bug
>          Components: Connectors / Kafka
>    Affects Versions: 1.17.0, 1.19.0, 1.18.1
>            Reporter: yansuopeng
>            Priority: Major
>
> When a partition leader is invalid (leader=-1), a Flink streaming job using
> KafkaSource can't restart or start a new instance with a new group ID. It
> gets stuck and throws the following exception:
> "org.apache.kafka.common.errors.TimeoutException: Timeout of 60000ms
> expired before the position for partition aaa-1 could be determined"
> When leader=-1, Kafka APIs such as KafkaConsumer.position() block until
> either the position can be determined or an unrecoverable error is
> encountered.
> In fact, leader=-1 is not easy to avoid: even with replica=3, three disks
> going offline together will trigger the problem, especially when the
> cluster is relatively large. Recovery relies on the Kafka administrator
> fixing it in time, which is risky during the Kafka cluster's peak period.
> I have solved this problem and want to create a PR.
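
For illustration only, here is a minimal sketch of the kind of fail-fast check the report hints at: finding partitions reported with leader=-1 before anything calls a blocking API such as KafkaConsumer.position(). It is not the reporter's fix and not Flink or Kafka code. The class LeaderCheck, the method leaderless, and the map of partition name to leader id are all hypothetical, and the sketch assumes the leader ids have already been fetched from cluster metadata (not shown).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Flink or the Kafka client.
public class LeaderCheck {
    // The issue describes a partition without an elected leader as leader=-1.
    static final int NO_LEADER = -1;

    /** Returns the partitions with no usable leader, in map iteration order. */
    static List<String> leaderless(Map<String, Integer> leaderByPartition) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : leaderByPartition.entrySet()) {
            Integer leader = entry.getValue();
            // Treat missing leader information the same as leader=-1.
            if (leader == null || leader == NO_LEADER) {
                result.add(entry.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Assumed input: partition name -> leader broker id from metadata.
        Map<String, Integer> leaders = new LinkedHashMap<>();
        leaders.put("aaa-0", 2);
        leaders.put("aaa-1", NO_LEADER);
        System.out.println("Leaderless partitions: " + leaderless(leaders));
    }
}
```

A caller could use a non-empty result to report the affected partitions right away instead of waiting for the 60000ms timeout. Whether such a check belongs in the connector or on the Kafka side is the open question in this thread.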