Adam Binford created SPARK-48314:
------------------------------------

             Summary: FileStreamSource shouldn't double cache files for availableNow
                 Key: SPARK-48314
                 URL: https://issues.apache.org/jira/browse/SPARK-48314
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 3.5.1
            Reporter: Adam Binford

FileStreamSource loads and saves all files at initialization for Trigger.AvailableNow. However, files are also cached in unreadFiles, which is wasteful and causes the issues identified in https://issues.apache.org/jira/browse/SPARK-44924 for streams that read more than 10k files per batch. We should always skip the unreadFiles cache when using the AvailableNow trigger, since the full file list is already saved and the cache serves no purpose.
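
To illustrate the proposal, here is a minimal sketch in Python of a toy file source. It is not Spark's FileStreamSource code: the class ToyFileSource and every field and method name in it are hypothetical, chosen only to mirror the description above. It assumes the source keeps a one-time snapshot of all files when running with the AvailableNow trigger, and a leftover cache (standing in for unreadFiles) otherwise.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class ToyFileSource:
    """Toy model of a file stream source (hypothetical, not Spark's class)."""
    listing: List[str]            # files currently visible in the input directory
    max_files_per_batch: int      # cap on files handed out per batch
    available_now: bool = False   # True models Trigger.AvailableNow
    all_files_snapshot: Optional[List[str]] = None         # saved once at init
    unread_files: List[str] = field(default_factory=list)  # leftover cache
    seen: Set[str] = field(default_factory=set)            # files already handed out

    def __post_init__(self) -> None:
        # For AvailableNow, every file is loaded and saved up front.
        if self.available_now:
            self.all_files_snapshot = list(self.listing)

    def next_batch(self) -> List[str]:
        if self.unread_files:
            # Serve leftovers from the cache before listing again.
            candidates = self.unread_files
        elif self.available_now:
            # The snapshot already holds every file this run will process.
            candidates = [f for f in self.all_files_snapshot if f not in self.seen]
        else:
            candidates = [f for f in self.listing if f not in self.seen]

        batch = candidates[: self.max_files_per_batch]
        leftover = candidates[self.max_files_per_batch:]

        # Proposed behavior: never fill the leftover cache under AvailableNow,
        # because the snapshot already retains the same files.
        self.unread_files = [] if self.available_now else leftover

        self.seen.update(batch)
        return batch


files = [f"f{i}" for i in range(5)]

source = ToyFileSource(listing=files, max_files_per_batch=2, available_now=True)
batches = [source.next_batch() for _ in range(4)]
print(batches)
print(source.unread_files)
```

In this model the AvailableNow run still hands out every file in order, while the leftover cache stays empty throughout, so each file is held in memory only once. With available_now=False, the same class keeps the unprocessed remainder in unread_files between batches.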