[ https://issues.apache.org/jira/browse/FLINK-31008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17686917#comment-17686917 ]
Jingsong Lee commented on FLINK-31008:
--------------------------------------

[~Ming Li] Thanks for reporting!

Wow, you're right. This should be a blocker issue.

Do you want to contribute this jira?

> [Flink][Table Store] The Split allocation of the same bucket in ContinuousFileSplitEnumerator may be out of order
> -----------------------------------------------------------------------------------------------------------------
>
>                 Key: FLINK-31008
>                 URL: https://issues.apache.org/jira/browse/FLINK-31008
>             Project: Flink
>          Issue Type: Bug
>          Components: Table Store
>            Reporter: ming li
>            Priority: Major
>
> There are two places in {{ContinuousFileSplitEnumerator}} that add
> {{FileStoreSourceSplit}} to {{bucketSplits}}: {{addSplitsBack}} and
> {{processDiscoveredSplits}}. {{processDiscoveredSplits}} continuously
> checks for new splits and appends them to the queue. At this point the
> splits in the queue are in order.
> {code:java}
> private void addSplits(Collection<FileStoreSourceSplit> splits) {
>     splits.forEach(this::addSplit);
> }
>
> private void addSplit(FileStoreSourceSplit split) {
>     bucketSplits
>             .computeIfAbsent(((DataSplit) split.split()).bucket(), i -> new LinkedList<>())
>             .add(split);
> }{code}
> However, when a task fails over, the splits that were previously assigned
> to it are returned. These returned splits are also appended to the end of
> the queue, behind any newer splits discovered in the meantime, so the
> splits of a bucket are then allocated out of order.
>
> I think these returned splits should be added to the head of the queue to
> ensure the order of allocation.
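The proposal in the description (put returned splits at the head of the per-bucket queue) can be sketched roughly as follows. This is only an illustration, not the actual Table Store code: {{BucketSplitQueues}}, {{addDiscovered}}, {{addBack}}, {{poll}} and {{pending}} are hypothetical names, splits are reduced to plain String ids, and it assumes the returned splits arrive in the order in which they were originally assigned.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for the enumerator's per-bucket queues; a split is just a String id. */
public class BucketSplitQueues {
    private final Map<Integer, Deque<String>> bucketSplits = new HashMap<>();

    /** Newly discovered splits are appended to the tail, as addSplit does today. */
    public void addDiscovered(int bucket, String splitId) {
        bucketSplits.computeIfAbsent(bucket, b -> new ArrayDeque<>()).addLast(splitId);
    }

    /**
     * Splits returned after a failover are pushed to the head.
     * Iterating in reverse keeps their original relative order
     * (assumption: the list is in original assignment order).
     */
    public void addBack(int bucket, List<String> returned) {
        Deque<String> queue = bucketSplits.computeIfAbsent(bucket, b -> new ArrayDeque<>());
        for (int i = returned.size() - 1; i >= 0; i--) {
            queue.addFirst(returned.get(i));
        }
    }

    /** Hands out the next split of the bucket, or null if none is pending. */
    public String poll(int bucket) {
        Deque<String> queue = bucketSplits.get(bucket);
        return queue == null ? null : queue.poll();
    }

    /** Snapshot of the pending splits of a bucket, head first. */
    public List<String> pending(int bucket) {
        Deque<String> queue = bucketSplits.get(bucket);
        return queue == null ? new ArrayList<>() : new ArrayList<>(queue);
    }
}
```

With this shape, if s1 and s2 were assigned and then returned while s3 was still queued, the bucket would again hand out s1, s2, s3 rather than s3, s1, s2. A {{Deque}} is used here instead of the plain {{List}} view of {{LinkedList}} only to make the head insertion explicit.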