StreamingContext.textFileStream(...)

muthu Wed, 07 Dec 2016 16:33:54 -0800

Hello there,

I am trying to find a way to get the file-name of the current file being
processed from the monitored directory for HDFS...


Meaning, 

Let's say...
val lines = ssc.textFileStream("my_hdfs_location")
lines.map { (row: String) => ... } //No access to file-name here
Also, let's say I have some RDD created in time
t1    t2    t3    t3...
Let's say in every unit of time, I would like to perform foldLeft() like
operation these DStreams as they arrive. Meaning, let's say before t1, I
would have an Inital/Zero value (or load value from some persistence store)
and after every interval t1, t2, ... I would like this value to update into
the accumulator of the fold (as commonly seen in Scala collection's
foldLeft()).  
reduceByKeyAndWindow() allows me to do something similar, but I have to
perform reduce during every iteration. But with foldLeft() like semantics I
have to only combine the previously combined result with the result from the
current time tn.    

Please advice,
Muthu



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/StreamingContext-textFileStream-tp28183.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscr...@spark.apache.org

StreamingContext.textFileStream(...)

Reply via email to