Hi Vikash,

These are my thoughts: read the input directory using wholeTextFiles(),
which gives you a pair RDD with the file path as the key and the file
content as the value. Then you can apply a flatMap to split each file's
content into lines and append the key (the file name) to each record.
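
Something like this (untested, just a rough sketch; it assumes a
SparkContext named sc, and the output path and column names below are
placeholders):

// Read every file under the directory as (filePath, fileContent) pairs
val files = sc.wholeTextFiles("/input/files/")

// Split each file's content into lines and append the bare file name
val withFileName = files.flatMap { case (path, content) =>
  val fileName = path.split("/").last
  content.split("\n").filter(_.nonEmpty).map(line => s"$line,$fileName")
}

withFileName.saveAsTextFile("/output/files/")  // placeholder output path

If you need a DataFrame at the end, you can split the records into
columns and call toDF:

import sqlContext.implicits._  // assumes an existing SQLContext (Spark 1.6)

val df = withFileName
  .map(_.split(","))
  .map(a => (a(0), a(1), a(2), a(3)))
  .toDF("col1", "col2", "col3", "file_name")  // placeholder column names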

Thank you,
Aj

On Tuesday, May 31, 2016, Vikash Kumar <vikashsp...@gmail.com> wrote:

> I have a requirement in which I need to read the input files from a
> directory and append the file name in each record while output.
>
> e.g. I have a directory /input/files/ which has the following files:
> ABC_input_0528.txt
> ABC_input_0531.txt
>
> Suppose input file ABC_input_0528.txt contains:
> 111,abc,234
> 222,xyz,456
>
> Suppose input file ABC_input_0531.txt contains:
> 100,abc,299
> 200,xyz,499
>
> and I need to create one final output with the file name in each record,
> using DataFrames.
> My output file should look like this:
> 111,abc,234,ABC_input_0528.txt
> 222,xyz,456,ABC_input_0528.txt
> 100,abc,299,ABC_input_0531.txt
> 200,xyz,499,ABC_input_0531.txt
>
> I am trying to use this inputFileName function, but it returns blank values.
>
> https://spark.apache.org/docs/1.6.1/api/java/org/apache/spark/sql/functions.html#inputFileName()
>
> Can anybody help me?
>
>
