Re: sc.textFileGroupByPath(*/*.txt)

2014-06-01 Thread Anwar Rizal
I presume that you need access to the path of each file you are reading. I don't know whether there is a good way to do that for HDFS; I need to read the files myself, with something like: def openWithPath(inputPath: String, sc: SparkContext) = { val fs = (new

Re: sc.textFileGroupByPath(*/*.txt)

2014-06-01 Thread Oleg Proudnikov
Anwar, I will try this, as it might do exactly what I need. I will follow your pattern but use sc.textFile() for each file. I am now thinking that I could start with an RDD of file paths and map it into (path, content) pairs, provided I could read a file on the server. Thank you, Oleg On 1 June
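
For readers following the thread, here is a minimal sketch of the "sc.textFile() for each file" idea described above. It is not the code from either message: the object name TextFileByPath and the helpers listPaths and textFileWithPath are hypothetical, and the sketch assumes the glob can be expanded on the driver through Hadoop's FileSystem API.

```scala
import org.apache.hadoop.fs.Path
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

// Hypothetical helper object, for illustration only.
object TextFileByPath {

  // Expand a glob such as "*/*.txt" into concrete paths on the driver.
  def listPaths(sc: SparkContext, pattern: String): Seq[String] = {
    val globPath = new Path(pattern)
    val fs = globPath.getFileSystem(sc.hadoopConfiguration)
    // globStatus can return null, so wrap the result in Option.
    Option(fs.globStatus(globPath))
      .map(_.toSeq)
      .getOrElse(Seq.empty)
      .map(_.getPath.toString)
  }

  // One sc.textFile() per matched file; every line is tagged with its source path.
  def textFileWithPath(sc: SparkContext, pattern: String): RDD[(String, String)] = {
    val perFile: Seq[RDD[(String, String)]] = listPaths(sc, pattern).map { p =>
      sc.textFile(p).map(line => (p, line))
    }
    // Combine the per-file RDDs into a single RDD of (path, line) pairs.
    sc.union(perFile)
  }
}
```

Calling TextFileByPath.textFileWithPath(sc, "*/*.txt") would give (path, line) pairs that can then be grouped by key. Creating one RDD per file may scale poorly when there are many files; in Spark 1.0 and later, sc.wholeTextFiles(path) returns (path, content) pairs directly and may be a simpler fit when each file is small enough to be held as a single string.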