Yaroslav Tkachenko created FLINK-31285:
------------------------------------------
Summary: FileSource should support reading files in order
Key: FLINK-31285
URL: https://issues.apache.org/jira/browse/FLINK-31285
Project: Flink
Issue Type: New Feature
Components: Connectors / FileSystem
Affects Versions: 1.18.0
Reporter: Yaroslav Tkachenko
Currently, Flink's *FileSource* uses *LocalityAwareSplitAssigner* as its default
*FileSplitAssigner*, which does not guarantee any order in which splits are
handed out. In many scenarios that involve processing historical data, reading
files in order is a requirement, especially when using event-time processing.
I believe a new FileSplitAssigner should be implemented that supports ordering.
FileSourceBuilder should be extended to allow choosing a different
FileSplitAssigner.
Note that files may not be read in _perfect_ order with parallelism > 1, since
splits assigned to different readers are processed concurrently. However, in
some cases, using a parallelism of 1 might be fine.
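To make the proposal more concrete, here is a minimal, dependency-free sketch of what an ordering assigner could look like. The class name OrderedSplitAssigner and the generic split type T are hypothetical and not part of Flink; the three methods only mirror the shape of FileSplitAssigner (getNext, addSplits, remainingSplits). A real implementation would work on FileSourceSplit and would have to pick a sort key such as the file path or modification time, which is an assumption here.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;
import java.util.PriorityQueue;

/**
 * Hypothetical sketch, not Flink code: hands out splits in the order
 * defined by a caller-supplied comparator.
 */
class OrderedSplitAssigner<T> {

    private final Comparator<? super T> order;
    private final PriorityQueue<T> pending;

    OrderedSplitAssigner(Collection<T> initialSplits, Comparator<? super T> order) {
        this.order = order;
        this.pending = new PriorityQueue<>(order);
        this.pending.addAll(initialSplits);
    }

    /** Returns the smallest pending split; the hostname is ignored (no locality). */
    Optional<T> getNext(String hostname) {
        return Optional.ofNullable(pending.poll());
    }

    /** Puts splits back (for example after a reader failure); they are re-sorted. */
    void addSplits(Collection<T> splits) {
        pending.addAll(splits);
    }

    /** Snapshot of the unassigned splits, in assignment order. */
    List<T> remainingSplits() {
        List<T> copy = new ArrayList<>(pending);
        copy.sort(order);
        return copy;
    }
}
```

With a single reader, splits come out strictly in comparator order. With several readers, the assignment order is still sorted, but the readers consume their splits concurrently, so the resulting record order is only approximate.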
--
This message was sent by Atlassian Jira
(v8.20.10#820010)