[ https://issues.apache.org/jira/browse/SPARK-30281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Marcelo Masiero Vanzin resolved SPARK-30281.
--------------------------------------------
    Fix Version/s: 3.0.0
       Resolution: Fixed

Issue resolved by pull request 26920
[https://github.com/apache/spark/pull/26920]

> 'archive' option in FileStreamSource misses to consider partitioned and recursive option
> ----------------------------------------------------------------------------------------
>
>                 Key: SPARK-30281
>                 URL: https://issues.apache.org/jira/browse/SPARK-30281
>             Project: Spark
>          Issue Type: Bug
>          Components: Structured Streaming
>    Affects Versions: 3.0.0
>            Reporter: Jungtaek Lim
>            Assignee: Jungtaek Lim
>            Priority: Major
>             Fix For: 3.0.0
>
>
> The cleanup option for FileStreamSource was introduced in SPARK-20568.
> To simplify verification of the archive path, it relied on the assumption
> that FileStreamSource only reads files that meet one of these conditions:
> 1) the parent directory matches the source pattern, or 2) the file itself
> matches the source pattern.
> During post-hoc review we found other cases that invalidate this
> assumption: the partitioned and recursive options. With these options,
> FileStreamSource can read arbitrary files in subdirectories that match the
> source pattern, so simply checking the depth of the archive path does not
> work.
> We need to restore the path check logic, though it will not be easy to
> explain to end users.