Adam Binford created SPARK-48314:
------------------------------------

             Summary: FileStreamSource shouldn't double cache files for availableNow
                 Key: SPARK-48314
                 URL: https://issues.apache.org/jira/browse/SPARK-48314
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 3.5.1
            Reporter: Adam Binford


FileStreamSource loads and saves all files at initialization for 
Trigger.AvailableNow. However, the same files are also cached in unreadFiles, 
which is wasteful and causes the issues identified in 
https://issues.apache.org/jira/browse/SPARK-44924 for streams that read 
more than 10k files per batch. We should always skip the unreadFiles cache 
when using Trigger.AvailableNow, since the files saved at initialization 
already make it unnecessary.
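
To make the double caching concrete, here is a minimal toy model in plain 
Python. It is not Spark's implementation: ToyFileSource, its fields, and the 
skip_cache_for_available_now flag are hypothetical names invented for this 
sketch, and only the general shape (a saved snapshot of all files plus a 
leftover cache) follows the description above.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class ToyFileSource:
    """Toy stand-in for a file stream source (hypothetical, not Spark code)."""
    listing: List[str]                # what a directory listing would return
    max_files_per_batch: int
    available_now: bool = False
    # Hypothetical switch modelling the change proposed in this issue.
    skip_cache_for_available_now: bool = True
    # Snapshot of all files, saved once at initialization for available-now.
    all_files_snapshot: Optional[List[str]] = None
    # Leftover files cached between batches (models the unreadFiles cache).
    unread_files: List[str] = field(default_factory=list)
    seen: Set[str] = field(default_factory=set)

    def __post_init__(self) -> None:
        if self.available_now:
            self.all_files_snapshot = list(self.listing)

    def next_batch(self) -> List[str]:
        # Pick candidates from the leftover cache, the snapshot, or a listing.
        if self.unread_files:
            candidates = self.unread_files
        elif self.available_now:
            candidates = [f for f in self.all_files_snapshot if f not in self.seen]
        else:
            candidates = [f for f in self.listing if f not in self.seen]

        batch = candidates[: self.max_files_per_batch]
        rest = candidates[self.max_files_per_batch:]

        # With the snapshot already held, caching `rest` stores every pending
        # file a second time. Skipping it leaves a single copy.
        use_cache = not (self.available_now and self.skip_cache_for_available_now)
        self.unread_files = rest if use_cache else []
        self.seen.update(batch)
        return batch


def drain(source: ToyFileSource) -> List[List[str]]:
    """Pull batches until the source has nothing left to return."""
    batches = []
    while True:
        batch = source.next_batch()
        if not batch:
            return batches
        batches.append(batch)
```

In this sketch both settings return the same batches in the same order. The 
only difference is whether the pending files are held once (in the snapshot) 
or twice (in the snapshot and in the leftover cache).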



