[ 
https://issues.apache.org/jira/browse/SPARK-30281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Masiero Vanzin resolved SPARK-30281.
--------------------------------------------
    Fix Version/s: 3.0.0
       Resolution: Fixed

Issue resolved by pull request 26920
[https://github.com/apache/spark/pull/26920]

> 'archive' option in FileStreamSource misses to consider partitioned and 
> recursive option
> ----------------------------------------------------------------------------------------
>
>                 Key: SPARK-30281
>                 URL: https://issues.apache.org/jira/browse/SPARK-30281
>             Project: Spark
>          Issue Type: Bug
>          Components: Structured Streaming
>    Affects Versions: 3.0.0
>            Reporter: Jungtaek Lim
>            Assignee: Jungtaek Lim
>            Priority: Major
>             Fix For: 3.0.0
>
>
> The cleanup option for FileStreamSource was introduced in SPARK-20568.
> To simplify verification of the archive path, it relied on the assumption 
> that FileStreamSource reads only files that meet one of two conditions: 1) 
> the parent directory matches the source pattern, or 2) the file itself 
> matches the source pattern.
> During post-hoc review we found cases that invalidate this assumption: 
> partitioned sources and the recursive option. With these, FileStreamSource 
> can read arbitrary files in subdirectories that match the source pattern, 
> so simply checking the depth of the archive path does not work.
> We need to restore the path check logic, though it will not be easy to 
> explain to end users.
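
As a rough illustration of why a depth-only check falls short, the Python
sketch below models the two read rules with component-wise glob matching.
The function names, the example paths, and the min-depth comparison in
archive_conflicts are assumptions made for illustration; this is not the
code from pull request 26920.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath


def parts(path):
    # Split an absolute POSIX path into components, dropping the root "/".
    return [p for p in PurePosixPath(path).parts if p != "/"]


def matches(path_parts, pattern_parts):
    # Component-wise glob match: same depth and every component matches.
    return len(path_parts) == len(pattern_parts) and all(
        fnmatchcase(p, g) for p, g in zip(path_parts, pattern_parts)
    )


def read_without_options(file, pattern):
    # Simplified rule from the issue: the file or its parent matches.
    f, g = parts(file), parts(pattern)
    return matches(f, g) or matches(f[:-1], g)


def read_with_recursion(file, pattern):
    # Assumed model of partitioned/recursive lookup: any ancestor may match.
    f, g = parts(file), parts(pattern)
    return any(matches(f[:i], g) for i in range(1, len(f) + 1))


def archive_conflicts(archive_dir, pattern):
    # Hypothetical path check: compare both paths at the shallower depth.
    a, g = parts(archive_dir), parts(pattern)
    depth = min(len(a), len(g))
    return matches(a[:depth], g[:depth])


pattern = "/data/in/*"
# Assumed layout: an archived file keeps its original path under the archive dir.
archived = "/data/in/archive/deep/data/in/x.json"

print(read_without_options(archived, pattern))   # too deep to be re-read
print(read_with_recursion(archived, pattern))    # re-read via /data/in/archive
print(archive_conflicts("/data/in/archive/deep", pattern))
print(archive_conflicts("/data/archive", pattern))
```

In this model the archived file sits deep enough that the simplified rule
never re-reads it, yet the recursive rule picks it up again because the
ancestor /data/in/archive matches the source pattern. Comparing the paths
themselves rejects that archive directory, at the cost of being harder to
describe than a plain depth limit.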



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
