Hello there, I am trying to find a way to get the file-name of the current file being processed from the monitored directory for HDFS...
Meaning, Let's say... val lines = ssc.textFileStream("my_hdfs_location") lines.map { (row: String) => ... } //No access to file-name here Also, let's say I have some RDD created in time t1 t2 t3 t3... Let's say in every unit of time, I would like to perform foldLeft() like operation these DStreams as they arrive. Meaning, let's say before t1, I would have an Inital/Zero value (or load value from some persistence store) and after every interval t1, t2, ... I would like this value to update into the accumulator of the fold (as commonly seen in Scala collection's foldLeft()). reduceByKeyAndWindow() allows me to do something similar, but I have to perform reduce during every iteration. But with foldLeft() like semantics I have to only combine the previously combined result with the result from the current time tn. Please advice, Muthu -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/StreamingContext-textFileStream-tp28183.html Sent from the Apache Spark User List mailing list archive at Nabble.com. --------------------------------------------------------------------- To unsubscribe e-mail: user-unsubscr...@spark.apache.org