[ https://issues.apache.org/jira/browse/SPARK-31371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17077048#comment-17077048 ]
Prashant Sharma commented on SPARK-31371:
-----------------------------------------

[~tdas] What do you think?

> FileStreamSource: Decide seen files on the checksum, instead of filename.
> -------------------------------------------------------------------------
>
>                 Key: SPARK-31371
>                 URL: https://issues.apache.org/jira/browse/SPARK-31371
>             Project: Spark
>          Issue Type: Improvement
>          Components: Structured Streaming
>    Affects Versions: 2.4.5, 3.0.0
>            Reporter: Prashant Sharma
>            Priority: Major
>
> At the moment, Structured Streaming's file source ignores updates to any
> file it has already processed. However, for reasons beyond our control,
> software might write new data to the same file. Rolling logs are a case in
> point: the latest log file is always named, e.g., log.txt, and the rolled
> logs are named log-1.txt, etc.
> So supporting this would not really be special-casing; it would support a
> genuine use case.