Yaroslav Tkachenko created FLINK-31285:
------------------------------------------

             Summary: FileSource should support reading files in order
                 Key: FLINK-31285
                 URL: https://issues.apache.org/jira/browse/FLINK-31285
             Project: Flink
          Issue Type: New Feature
          Components: Connectors / FileSystem
    Affects Versions: 1.18.0
            Reporter: Yaroslav Tkachenko


Currently, Flink's *FileSource* uses *LocalityAwareSplitAssigner* as the default
*FileSplitAssigner*, which doesn't guarantee any order. In many scenarios
involving historical data processing, reading files in order can be a
requirement, especially when using event-time processing.

I believe a new FileSplitAssigner that supports ordering should be implemented,
and FileSourceBuilder should be extended to allow choosing a different
FileSplitAssigner.
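As a rough illustration of such an assigner, here is a minimal, standalone Java
sketch (Java 16+ for records). It assumes splits should be handed out in
lexicographic path order, then by byte offset within a file. The class
OrderedSplitAssigner and the Split record are hypothetical stand-ins: the class
loosely mirrors the method names of Flink's FileSplitAssigner (getNext,
addSplits, remainingSplits) but does not implement that interface.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;
import java.util.PriorityQueue;

/**
 * Hypothetical, order-preserving split assigner. Standalone sketch only; it is
 * not an implementation of Flink's FileSplitAssigner interface.
 */
class OrderedSplitAssigner {

    /** Hypothetical stand-in for a file split: a file path plus a byte offset. */
    record Split(String path, long offset) {}

    // Assumed ordering: by path (lexicographically), then by offset within a file.
    private static final Comparator<Split> ORDER =
            Comparator.comparing(Split::path).thenComparingLong(Split::offset);

    // Min-heap keyed on ORDER, so poll() always yields the earliest remaining split.
    private final PriorityQueue<Split> pending = new PriorityQueue<>(ORDER);

    OrderedSplitAssigner(Collection<Split> initialSplits) {
        pending.addAll(initialSplits);
    }

    /** Returns the earliest remaining split; the hostname (locality) is ignored. */
    synchronized Optional<Split> getNext(String hostname) {
        return Optional.ofNullable(pending.poll());
    }

    /** Adds splits back, e.g. splits returned after a reader failure. */
    synchronized void addSplits(Collection<Split> splits) {
        pending.addAll(splits);
    }

    /** Returns a snapshot of the unassigned splits, sorted in assignment order. */
    synchronized List<Split> remainingSplits() {
        List<Split> result = new ArrayList<>(pending);
        result.sort(ORDER);
        return result;
    }
}
```

Because getNext always returns the earliest remaining split, a single reader
(parallelism 1) consumes files in order. With several readers, splits are still
handed out in order, but the readers process them concurrently, so records can
interleave. Lexicographic path order also matches time order only when file
names sort chronologically, e.g. names prefixed with ISO-8601 dates.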

It's also clear that files may not be read in _perfect_ order with
parallelism > 1. However, in some cases, using a parallelism of 1 might be fine.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
