Prashant Sharma created SPARK-31371:
---------------------------------------

             Summary: FileStreamSource: Decide seen files on the checksum, instead of filename.
                 Key: SPARK-31371
                 URL: https://issues.apache.org/jira/browse/SPARK-31371
             Project: Spark
          Issue Type: Improvement
          Components: Structured Streaming
    Affects Versions: 2.4.5, 3.0.0
            Reporter: Prashant Sharma


At the moment, Structured Streaming's file source ignores updates to a file it has already processed. However, for reasons beyond our control, software might update the same file with new data. A case in point is rolling logs, where the latest log file is always named, e.g., log.txt, and the rolled logs are named log-1.txt, etc.

Supporting this would therefore not be special-casing, but support for a genuine use case.
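
To make the proposal concrete, here is a minimal sketch of deciding "seen" by content checksum rather than by filename. It is not FileStreamSource's actual code: the names `file_checksum` and `SeenFiles`, and the choice of SHA-256, are assumptions made purely for illustration.

```python
import hashlib
from pathlib import Path


def file_checksum(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


class SeenFiles:
    """Hypothetical tracker that remembers the last checksum seen per path."""

    def __init__(self):
        self._last_checksum = {}

    def is_new(self, path):
        """Return True if the path is unseen or its contents have changed.

        The current checksum is recorded as a side effect, so a second call
        on an unchanged file returns False.
        """
        key = str(Path(path))
        checksum = file_checksum(key)
        if self._last_checksum.get(key) == checksum:
            return False
        self._last_checksum[key] = checksum
        return True
```

With a tracker like this, a rewritten log.txt would be reported as new again, whereas a filename-only check would skip it. The cost is that every candidate file has to be read in full to compute its checksum on each listing.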