Re: get and append file name in record being reading

2016-06-02 Thread Sun Rui
You can use SparkContext.wholeTextFiles().

For example, suppose all your files are under /tmp/ABC_input/:

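// Each element is (full path of the file, entire content of the file).
val rdd  = sc.wholeTextFiles("file:///tmp/ABC_input")
val rdd1 = rdd.flatMap { case (path, content) =>
  // Keep only the base name, e.g. ABC_input_0528.txt.
  val fileName = new java.io.File(path).getName
  // Pair every line of the file with the name of the file it came from.
  content.split("\n").map { line => (line, fileName) }
}
val df = sqlContext.createDataFrame(rdd1).toDF("line", "file")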
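
If the goal is the exact output format from the question (the file name appended as a last comma-separated field), the per-file step can be pulled out into a small plain-Scala function and checked without a cluster. This is only a sketch: TagLines and tagLines are hypothetical names, not part of Spark, and it assumes the path handed over by wholeTextFiles uses '/' as separator.

```scala
// Hypothetical helper (not part of Spark): appends the base file name
// to every non-empty line of one file's content.
object TagLines {
  def tagLines(path: String, content: String): Seq[String] = {
    // Assumption: the path looks like file:/tmp/ABC_input/ABC_input_0528.txt,
    // so everything after the last '/' is the file name.
    val fileName = path.substring(path.lastIndexOf('/') + 1)
    content
      .split("\r?\n")      // accept both Unix and Windows line endings
      .filter(_.nonEmpty)  // skip blank lines
      .map(line => s"$line,$fileName")
      .toSeq
  }

  def main(args: Array[String]): Unit = {
    val out = tagLines(
      "file:/tmp/ABC_input/ABC_input_0528.txt",
      "111,abc,234\n222,xyz,456\n")
    // Prints:
    // 111,abc,234,ABC_input_0528.txt
    // 222,xyz,456,ABC_input_0528.txt
    out.foreach(println)
  }
}
```

In the snippet above it would presumably be called as rdd.flatMap { case (path, content) => TagLines.tagLines(path, content) }, which yields an RDD[String] that can be written out with saveAsTextFile. Keep in mind that wholeTextFiles reads each file into memory as a single string, so this approach suits many small files rather than a few very large ones.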

> On Jun 2, 2016, at 03:13, Vikash Kumar wrote:
> 
> 100,abc,299
> 200,xyz,499



get and append file name in record being reading

2016-06-01 Thread Vikash Kumar
How can I get the file name for each record being read?

Suppose input file ABC_input_0528.txt contains
111,abc,234
222,xyz,456

Suppose input file ABC_input_0531.txt contains
100,abc,299
200,xyz,499

I need to create one final output, using DataFrames, with the file name
appended to each record.
My output file should look like this:
111,abc,234,ABC_input_0528.txt
222,xyz,456,ABC_input_0528.txt
100,abc,299,ABC_input_0531.txt
200,xyz,499,ABC_input_0531.txt

I need some working code.