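Did you try:

val data = indexed_files.groupByKey
val modified_data = data.map { a =>
  // Join the file names in each group into one comma-separated path string
  val name = a._2.mkString(",")
  (a._1, name)
}
// sc is only usable on the driver, so collect the (key, paths) pairs first
modified_data.collect().foreach { a =>
  val file = sc.textFile(a._2)
  println(file.count)
}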
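
If it helps, the grouping step can be tried in plain Scala without a cluster. This is only a rough sketch: the object name and both helpers (prefixOf, pathsByPrefix) are made up for illustration, and it assumes the grouping key is the part of the file name before the first underscore, which may not be exactly what you need.

```scala
object GroupFilesByPrefix {
  // Hypothetical helper: take everything before the first underscore as the key
  // (assumption: names look like 20140101_1.csv).
  def prefixOf(fileName: String): String = fileName.takeWhile(_ != '_')

  // Group the names by key and join each group into one comma-separated string.
  def pathsByPrefix(fileNames: Seq[String]): Map[String, String] =
    fileNames.groupBy(prefixOf).map { case (key, names) =>
      (key, names.sorted.mkString(","))
    }

  def main(args: Array[String]): Unit = {
    val files = Seq(
      "20140101_1.csv", "20140101_3.csv", "20140201_2.csv", "20140301_1.csv",
      "20140301_3.csv", "20140101_2.csv", "20140201_1.csv", "20140201_3.csv")

    // Print one line per key with the paths that belong to it.
    pathsByPrefix(files).toSeq.sortBy(_._1).foreach { case (key, paths) =>
      println(s"$key -> $paths")
    }
  }
}
```

Each joined string can then be passed to sc.textFile on the driver, since textFile accepts a comma-separated list of paths.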
Thanks
Best Regards
On Wed, Jul 22, 2015 at 2:18 AM, MorEru wrote:
I have a number of CSV files and need to combine them into an RDD by part of
their filenames.
For example, for the files below:
$ ls
20140101_1.csv 20140101_3.csv 20140201_2.csv 20140301_1.csv
20140301_3.csv 20140101_2.csv 20140201_1.csv 20140201_3.csv
I need to combine files with names