I guess it goes through those 500k files the first time
(https://github.com/apache/spark/blob/master/streaming/src/main/scala/org/apache/spark/streaming/dstream/FileInputDStream.scala#L193)
and then uses a filter from the next time onward.
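To make the "list everything once, then filter" idea concrete, here is a rough sketch in plain Python. It is not the FileInputDStream code and makes no claim about how Spark implements its filter; the class name NewFileTracker and its poll method are made up for illustration. It simply remembers which paths it has already reported, so the first poll returns every file and later polls return only files it has not seen before. Note that this sketch still scans the directory on each poll; only the reported result shrinks.

```python
import os
import tempfile


class NewFileTracker:
    """Hypothetical helper: reports files not returned by an earlier poll."""

    def __init__(self, directory):
        self.directory = directory
        self.seen = set()  # paths already reported by a previous poll

    def poll(self):
        # Walk the directory and keep only entries that were not seen before.
        new_files = []
        with os.scandir(self.directory) as entries:
            for entry in entries:
                if entry.is_file() and entry.path not in self.seen:
                    self.seen.add(entry.path)
                    new_files.append(entry.path)
        return sorted(new_files)


if __name__ == "__main__":
    # Small demo in a throwaway directory.
    demo_dir = tempfile.mkdtemp()
    open(os.path.join(demo_dir, "a.txt"), "w").close()
    tracker = NewFileTracker(demo_dir)
    print(tracker.poll())  # first poll: every existing file
    open(os.path.join(demo_dir, "b.txt"), "w").close()
    print(tracker.poll())  # later poll: only the file added since
```

With 500k files the set of seen paths grows just as large, so a real implementation would probably bound it somehow, for example by file modification time; that part is an assumption and is left out here.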
Thanks
Best Regards
On Fri, Jul 31, 2015 at 4:39 AM, Tathagata Das
Is this a known bottleneck for Spark Streaming textFileStream? Does it
need to list all the current files in a directory before it gets the new
files? Say I have 500k files in a directory, does it list them all in order
to get the new files?
The first time, it needs to list them all. After that, the list should be
cached by the file stream implementation (as far as I remember).
On Thu, Jul 30, 2015 at 3:55 PM, Brandon White bwwintheho...@gmail.com
wrote:
Is this a known bottleneck for Spark Streaming textFileStream? Does it
need