If each file is small, you can try SparkContext.wholeTextFiles, which gives you (filename, content) pairs directly. Otherwise you can try something like this:

val filenames: Seq[String] = ...

// Tag every line with the name of the file it came from,
// then union all the per-file RDDs into one.
val combined: RDD[(String, String)] =
  filenames
    .map(name => sc.textFile(name).map(line => name -> line))
    .reduce(_ ++ _)  // note: reduce requires filenames to be non-empty
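For the small-files case, wholeTextFiles already pairs each file's path with its full content, so the timestamp in the filename comes along for free. A minimal sketch, assuming local mode and a placeholder glob path (`data/*.log` and the app name are illustrative, not from the original post):

import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical setup; in the context of the thread, `sc` already exists.
val sc = new SparkContext(
  new SparkConf().setAppName("tag-lines").setMaster("local[*]"))

// wholeTextFiles returns an RDD[(String, String)] of (path, fileContent),
// so each line can be tagged with its source path in one pass.
val tagged: org.apache.spark.rdd.RDD[(String, String)] =
  sc.wholeTextFiles("data/*.log").flatMap { case (path, content) =>
    content.split("\n").map(line => path -> line)
  }

Keep in mind that wholeTextFiles loads each file as a single string, so it is only suitable when every individual file fits comfortably in memory.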

On 9/26/14 6:45 PM, Shekhar Bansal wrote:

Hi
In one of our use cases, the filename contains a timestamp and we have to append it to each record for aggregation.
How can I access the filename in a map function?

Thanks!
