[ https://issues.apache.org/jira/browse/SPARK-19524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15861869#comment-15861869 ]

Egor Pahomov commented on SPARK-19524:
--------------------------------------

[~sowen], probably yes, I don't know. If you really think about the wording 
"Should process only new files and ignore existing files in the directory", 
then I agree that setting this field to false does not promise to process old 
files. IMHO, everything around this field is poorly documented or poorly 
designed. Since spark.streaming.minRememberDuration is not documented in 
http://spark.apache.org/docs/2.0.2/configuration.html#spark-streaming, I do 
not feel comfortable changing it. More than that, it would be strange to 
change it just to process old files, when the purpose of that setting is quite 
different. Nevertheless, I was given an API with newFilesOnly, and I made a 
false, but not totally unreasonable, assumption about it based on all the 
accessible documentation. I was wrong, but it still feels like a trap I walked 
into, one that could easily not be there. 
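To make the trap concrete, here is a minimal sketch of how the remember window interacts with newFilesOnly. This is not the actual FileInputDStream code; the names and the 60-second default (spark.streaming.minRememberDuration) are illustrative assumptions based on my reading of the source and the Stack Overflow answer linked below.

```python
import time

# Assumed default of spark.streaming.minRememberDuration, in seconds.
MIN_REMEMBER_DURATION = 60

def initial_ignore_threshold(new_files_only, stream_start):
    # newFilesOnly=True: ignore files modified before the stream started.
    # newFilesOnly=False: no *initial* modification-time threshold.
    return stream_start if new_files_only else 0

def is_selected(mod_time, now, new_files_only, stream_start):
    """Sketch of whether a file would be picked up in the current batch."""
    threshold = initial_ignore_threshold(new_files_only, stream_start)
    # Regardless of newFilesOnly, files older than the remember window are
    # dropped -- this is the trap: newFilesOnly=False does NOT guarantee
    # that pre-existing old files get processed.
    window_start = now - MIN_REMEMBER_DURATION
    return mod_time >= max(threshold, window_start)

now = time.time()
# A file modified 10 minutes ago is skipped even with newFilesOnly=False:
print(is_selected(now - 600, now, False, now))
# A file modified 30 seconds ago is picked up:
print(is_selected(now - 30, now, False, now))
```

Under this sketch, flipping newFilesOnly to false only removes the initial threshold; the remember window still silently filters anything older than about a minute, which matches the behavior I observed.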

> newFilesOnly does not work according to docs. 
> ----------------------------------------------
>
>                 Key: SPARK-19524
>                 URL: https://issues.apache.org/jira/browse/SPARK-19524
>             Project: Spark
>          Issue Type: Bug
>          Components: DStreams
>    Affects Versions: 2.0.2
>            Reporter: Egor Pahomov
>
> Docs say:
> newFilesOnly
> Should process only new files and ignore existing files in the directory
> It's not working. 
> http://stackoverflow.com/questions/29852249/how-spark-streaming-identifies-new-files
>  explains that it should not be expected to work that way. 
> https://github.com/apache/spark/blob/master/streaming/src/main/scala/org/apache/spark/streaming/dstream/FileInputDStream.scala
>  is not clear at all about what the code is trying to do



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
