[ https://issues.apache.org/jira/browse/FLINK-26726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17509555#comment-17509555 ]
zoucao commented on FLINK-26726:
--------------------------------

Gentle ping [~lzljs3620320], could you help me confirm it? The exception message is in the attachment; please correct me if I missed anything.

> Remove the unregistered task from readersAwaitingSplit
> -------------------------------------------------------
>
>                 Key: FLINK-26726
>                 URL: https://issues.apache.org/jira/browse/FLINK-26726
>             Project: Flink
>          Issue Type: Improvement
>          Components: Table SQL / Ecosystem
>            Reporter: zoucao
>            Priority: Major
>         Attachments: stack.txt
>
>
> Recently, we faced a problem caused by an unregistered task when using a
> Hive table as a source for streaming reads.
> I think the problem is that we do not remove unregistered tasks from
> `readersAwaitingSplit` in `ContinuousHiveSplitEnumerator` and
> `ContinuousFileSplitEnumerator`.
> Assume we have two tasks, 0 and 1, and both are in `readersAwaitingSplit`,
> which happens when no new file appears in the path for a long time. Then a
> new split is generated and assigned to task-1. Unfortunately, task-1 cannot
> consume the split successfully, and the exception it throws causes all tasks
> to restart. The failover does not affect `readersAwaitingSplit`, but it does
> clear `SourceCoordinatorContext#registeredReaders`.
> After restarting, task-0 exists in `readersAwaitingSplit` but not in
> `registeredReaders`. If task-1 registers first and sends a split request,
> the SplitEnumerator will assign splits to both task-1 and task-0, even
> though task-0 has not been registered.
> The stack trace is in the attachment.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
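The failure sequence described in the issue, and the proposed fix, can be modeled with a small self-contained Java sketch. It does not use any Flink classes: `AwaitingReadersSketch`, `registerReader`, `requestSplit`, `failover`, `pendingSplits`, and `assigned` are hypothetical names, and `registeredReaders` is simplified to a `Set` of subtask ids. Only the names `readersAwaitingSplit` and `registeredReaders` come from the issue text. The assumed fix is that split assignment drops any awaiting reader that is not currently registered instead of handing it a split.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical, simplified model of a continuous split enumerator (not the Flink API). */
class AwaitingReadersSketch {
    // subtask id -> requester hostname, kept in request order (mirrors `readersAwaitingSplit`)
    final LinkedHashMap<Integer, String> readersAwaitingSplit = new LinkedHashMap<>();
    // stands in for SourceCoordinatorContext#registeredReaders; cleared on failover
    final Set<Integer> registeredReaders = new HashSet<>();
    // splits discovered but not yet handed out
    final List<String> pendingSplits = new ArrayList<>();
    // record of assignments made: subtask id -> split
    final Map<Integer, String> assigned = new LinkedHashMap<>();

    void registerReader(int subtaskId) {
        registeredReaders.add(subtaskId);
    }

    void requestSplit(int subtaskId, String hostname) {
        readersAwaitingSplit.put(subtaskId, hostname);
        assignSplits();
    }

    /** A global failover clears registrations but, as in the issue, not the awaiting readers. */
    void failover() {
        registeredReaders.clear();
    }

    void assignSplits() {
        Iterator<Map.Entry<Integer, String>> it = readersAwaitingSplit.entrySet().iterator();
        while (it.hasNext() && !pendingSplits.isEmpty()) {
            int subtaskId = it.next().getKey();
            if (!registeredReaders.contains(subtaskId)) {
                // Stale request from a reader that has not re-registered: drop it
                // instead of assigning a split to an unregistered subtask.
                it.remove();
                continue;
            }
            assigned.put(subtaskId, pendingSplits.remove(0));
            it.remove();
        }
    }

    public static void main(String[] args) {
        AwaitingReadersSketch e = new AwaitingReadersSketch();
        e.registerReader(0);
        e.registerReader(1);
        e.requestSplit(0, "host-0"); // no new files yet, so both tasks wait
        e.requestSplit(1, "host-1");
        e.failover();                // registrations cleared, awaiting map untouched
        e.pendingSplits.add("split-a");
        e.pendingSplits.add("split-b");
        e.registerReader(1);         // task-1 registers first after the restart
        e.requestSplit(1, "host-1");
        // prints: assigned={1=split-a}, stillAwaiting={}
        System.out.println("assigned=" + e.assigned + ", stillAwaiting=" + e.readersAwaitingSplit);
    }
}
```

Without the `registeredReaders` check, task-0 would come first in the `LinkedHashMap` iteration and receive `split-a` despite not being registered, which corresponds to the situation reported in the issue.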