ming li created FLINK-31008:
-------------------------------

             Summary: [Flink][Table Store] The Split allocation of the same 
bucket in ContinuousFileSplitEnumerator may be out of order
                 Key: FLINK-31008
                 URL: https://issues.apache.org/jira/browse/FLINK-31008
             Project: Flink
          Issue Type: Bug
          Components: Table Store
            Reporter: ming li


There are two places in {{ContinuousFileSplitEnumerator}} that add 
{{FileStoreSourceSplit}} to {{bucketSplits}}: {{addSplitsBack}} and 
{{processDiscoveredSplits}}. {{processDiscoveredSplits}} continuously 
checks for new splits and appends them to the queue. At that point, the splits 
of each bucket are queued in the correct order.
{code:java}
private void addSplits(Collection<FileStoreSourceSplit> splits) {
    splits.forEach(this::addSplit);
}

private void addSplit(FileStoreSourceSplit split) {
    bucketSplits
            .computeIfAbsent(((DataSplit) split.split()).bucket(), i -> new LinkedList<>())
            .add(split);
}{code}
However, when a task fails over, the splits that were already assigned to it 
are returned to the enumerator. These returned splits are also appended to the 
end of the queue, behind splits that were discovered later, so the splits of a 
bucket are then assigned out of order.

 

I think these returned splits should be added to the head of the queue to 
ensure the order of allocation.
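
A minimal sketch of the idea, not the actual Table Store code: {{Split}}, 
{{BucketSplitQueueSketch}}, {{poll}} and {{pendingIds}} are hypothetical 
stand-ins, and {{addSplitsBack}} is simplified to take only the returned 
splits. It assumes the returned splits arrive in their original allocation 
order, so they are pushed to the head in reverse to keep that order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;

// Simplified, hypothetical model of the enumerator's per-bucket queues.
class BucketSplitQueueSketch {

    // Hypothetical stand-in for FileStoreSourceSplit: a bucket and an id only.
    static final class Split {
        final int bucket;
        final String id;

        Split(int bucket, String id) {
            this.bucket = bucket;
            this.id = id;
        }
    }

    private final Map<Integer, LinkedList<Split>> bucketSplits = new HashMap<>();

    // Newly discovered splits are appended to the tail of their bucket's queue.
    void addSplits(Collection<Split> splits) {
        for (Split split : splits) {
            bucketSplits.computeIfAbsent(split.bucket, b -> new LinkedList<>()).addLast(split);
        }
    }

    // Returned splits go to the head of their bucket's queue. Iterating in
    // reverse keeps the returned splits in their original relative order.
    void addSplitsBack(List<Split> splits) {
        for (int i = splits.size() - 1; i >= 0; i--) {
            Split split = splits.get(i);
            bucketSplits.computeIfAbsent(split.bucket, b -> new LinkedList<>()).addFirst(split);
        }
    }

    // Hands out the next split of a bucket, or null if none is pending.
    Split poll(int bucket) {
        LinkedList<Split> queue = bucketSplits.get(bucket);
        return queue == null ? null : queue.poll();
    }

    // Ids of the splits still pending for a bucket, in allocation order.
    List<String> pendingIds(int bucket) {
        List<String> ids = new ArrayList<>();
        for (Split split : bucketSplits.getOrDefault(bucket, new LinkedList<>())) {
            ids.add(split.id);
        }
        return ids;
    }

    public static void main(String[] args) {
        BucketSplitQueueSketch queue = new BucketSplitQueueSketch();
        Split s1 = new Split(0, "s1");
        Split s2 = new Split(0, "s2");
        queue.addSplits(Arrays.asList(s1, s2, new Split(0, "s3")));
        queue.poll(0); // s1 is assigned to a reader
        queue.poll(0); // s2 is assigned to a reader
        queue.addSplits(Arrays.asList(new Split(0, "s4"))); // discovered later
        queue.addSplitsBack(Arrays.asList(s1, s2)); // the reader failed over
        System.out.println(queue.pendingIds(0)); // prints [s1, s2, s3, s4]
    }
}
```

With the current tail-append behavior, the same sequence would leave the 
queue as s3, s4, s1, s2.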



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
