Martijn Visser created FLINK-25672:
--------------------------------------

             Summary: FileSource enumerator remembers paths of all already processed files which can result in large state
                 Key: FLINK-25672
                 URL: https://issues.apache.org/jira/browse/FLINK-25672
             Project: Flink
          Issue Type: Improvement
          Components: Connectors / FileSystem
            Reporter: Martijn Visser

As mentioned in the Filesystem documentation, for Unbounded File Sources the {{FileEnumerator}} currently remembers the paths of all already processed files, so its state can grow rather large in some cases.

We should look into possibilities to reduce this. One option is a compressed form of tracking already processed files, for example by keeping a lower bound on modification timestamps instead of every path.

When fixed, this should also be reflected in the documentation, as mentioned in https://github.com/apache/flink/pull/18288#discussion_r785707311
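
To make the lower-bound idea concrete, here is a rough sketch of what such tracking could look like. It is not Flink code and does not use any Flink API: the class {{ModificationTimeTracker}}, its method names, and the use of a plain path-to-timestamp map as the directory listing are all assumptions made for illustration. It also assumes that new files always carry a modification time at or above the current bound; a file that shows up late with an older timestamp would be skipped, which is the main trade-off of this approach.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical helper (not part of Flink): tracks processed files with a single
 * modification-time lower bound plus the paths that sit exactly on that bound.
 */
final class ModificationTimeTracker {

    // Everything strictly older than this bound is treated as already processed.
    private long lowerBound = Long.MIN_VALUE;

    // Paths whose modification time equals the bound; needed to tell ties apart.
    private final Set<String> pathsAtBound = new HashSet<>();

    /**
     * Takes one directory listing (path -> modification time in millis),
     * returns the paths not seen before, and advances the bound.
     */
    List<String> selectUnprocessed(Map<String, Long> listing) {
        List<String> fresh = new ArrayList<>();
        long newBound = lowerBound;

        for (Map.Entry<String, Long> entry : listing.entrySet()) {
            long modTime = entry.getValue();
            if (modTime < lowerBound) {
                continue; // older than the bound: assumed processed
            }
            if (modTime == lowerBound && pathsAtBound.contains(entry.getKey())) {
                continue; // on the bound and explicitly remembered
            }
            fresh.add(entry.getKey());
            newBound = Math.max(newBound, modTime);
        }

        if (newBound > lowerBound) {
            // The bound moved, so paths remembered for the old bound are now implied by it.
            lowerBound = newBound;
            pathsAtBound.clear();
        }
        for (String path : fresh) {
            if (listing.get(path) == lowerBound) {
                pathsAtBound.add(path);
            }
        }
        return fresh;
    }

    /** Number of paths held in state; stays small compared to remembering every path. */
    int trackedPathCount() {
        return pathsAtBound.size();
    }
}
```

With this scheme the state consists of one timestamp plus the paths that share the newest modification time, rather than every path ever seen. Whether that is acceptable depends on how reliable modification times are on the target file system, which would need to be evaluated before anything like this is adopted.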