Re: Spark Streaming-- for each new file in HDFS

2016-09-16 Thread Steve Loughran
On 16 Sep 2016, at 01:03, Peyman Mohajerian wrote: > You can listen to files in a specific directory using streamingContext.fileStream. Take a look at: http://spark.apache.org/docs/latest/streaming-programming-guide.html

Yes, this works. Here's

Re: Spark Streaming-- for each new file in HDFS

2016-09-15 Thread Peyman Mohajerian
You can listen to files in a specific directory using streamingContext.fileStream. Take a look at: http://spark.apache.org/docs/latest/streaming-programming-guide.html

On Thu, Sep 15, 2016 at 10:31 AM, Jörn Franke wrote: > Hi, > I recommend that the third party

Re: Spark Streaming-- for each new file in HDFS

2016-09-15 Thread Jörn Franke
Hi, I recommend that the third-party application put an empty file with the same filename as the original file, but with the extension ".uploaded". This is an indicator that the file has been fully (!) written to the filesystem. Otherwise you risk reading only parts of the file. Then, you can have a file

Spark Streaming-- for each new file in HDFS

2016-09-15 Thread Kappaganthu, Sivaram (ES)
Hello, I am a newbie to Spark and have the following requirement. Problem statement: A third-party application is continuously dumping files on a server. Typically about 100 files arrive per hour, each smaller than 50 MB. My application has to process those files. Here