Prashant Sharma created SPARK-31371:
---------------------------------------

             Summary: FileStreamSource: Decide seen files on the checksum, 
instead of filename.
                 Key: SPARK-31371
                 URL: https://issues.apache.org/jira/browse/SPARK-31371
             Project: Spark
          Issue Type: Improvement
          Components: Structured Streaming
    Affects Versions: 2.4.5, 3.0.0
            Reporter: Prashant Sharma


At the moment, Structured Streaming's file source ignores updates to a file 
it has already processed. However, for reasons beyond our control, software 
might update the same file with new data. A case in point is rolling logs, 
where the latest log file is always named e.g. log.txt and the rolled logs 
are named log-1.txt, etc. 
Supporting this would therefore not be special-casing, but support for a 
genuine use case. 
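
To illustrate the idea, here is a minimal sketch of deciding "seen" files by 
content checksum rather than by filename. It is not Spark's FileStreamSource 
code. The names ChecksumSeenFiles, file_checksum and demo are hypothetical, 
and SHA-256 is an assumed choice of checksum that the issue does not specify.

```python
import hashlib
import os
import tempfile


def file_checksum(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file's contents (assumed checksum)."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in chunks so large log files are not loaded into memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


class ChecksumSeenFiles:
    """Hypothetical tracker that remembers the last checksum seen per path."""

    def __init__(self):
        self._seen = {}  # path -> checksum recorded on the last check

    def is_new(self, path):
        """Return True if the path is unseen or its contents have changed.

        The current checksum is recorded as a side effect.
        """
        checksum = file_checksum(path)
        if self._seen.get(path) == checksum:
            return False
        self._seen[path] = checksum
        return True


def demo():
    """Simulate a log file that is updated in place under the same name."""
    seen = ChecksumSeenFiles()
    with tempfile.TemporaryDirectory() as d:
        log = os.path.join(d, "log.txt")
        with open(log, "w") as f:
            f.write("line 1\n")
        first = seen.is_new(log)   # never seen before
        second = seen.is_new(log)  # same name, same contents
        with open(log, "a") as f:
            f.write("line 2\n")
        third = seen.is_new(log)   # same name, new contents
    return first, second, third
```

With a filename-only check, the third call would report the file as already 
seen. Note that this sketch re-reads the whole file on every check, and it 
does not address whether the entire file or only the new data should be 
reprocessed after a change.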




--
This message was sent by Atlassian Jira
(v8.3.4#803005)
