> This looks like a Cloudera packaging issue, so you may get better help from
> the Cloudera forums / support line instead of the Apache group.
>
> On Thu, Jul 27, 2017 at 10:54 AM, Vikash Kumar
> <vikash.ku...@oneconvergence.com> wrote:
> > I have installed the Spark2 parcel through Cloudera CDH 12.0. I see some
> > issues there. It looks like it didn't get configured properly.
I have installed the Spark2 parcel through Cloudera CDH 12.0. I see some
issues there. It looks like it didn't get configured properly.
$ spark2-shell
Exception in thread "main" java.lang.NoClassDefFoundError:
org/apache/hadoop/fs/FSDataInputStream
at
I need to split an RDD[(Key, Iterable[Value])] to save each key into a
different file.
e.g. I have records like: customerId, name, age, sex
111,abc,34,M
122,xyz,32,F
111,def,31,F
122,trp,30,F
133,jkl,35,M
I need to write 3 different files based on customerId
file1:
111,abc,34,M
111,def,31,F
file2:
122,xyz,32,F
122,trp,30,F
file3:
133,jkl,35,M
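The grouping step above can be sketched with plain Scala collections (a minimal sketch, not Spark itself; in a real job the same shape would come from rdd.groupBy on the key, or from partitioning the output by the key column on write — the helper name splitByCustomer is mine):

```scala
// Sketch: group CSV records by their first field (customerId).
// Each resulting group would then be written out as one file.
def splitByCustomer(lines: List[String]): Map[String, List[String]] =
  lines.groupBy(line => line.split(",")(0).trim)

val records = List(
  "111,abc,34,M",
  "122,xyz,32,F",
  "111,def,31,F",
  "122,trp,30,F",
  "133,jkl,35,M"
)

// groups("111") holds the two 111 records, in input order.
val groups = splitByCustomer(records)
```

groupBy keeps the records of each group in their original order, so the per-customer files come out in input order.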
How can I get the file name of each record being read?
suppose input file ABC_input_0528.txt contains
111,abc,234
222,xyz,456
suppose input file ABC_input_0531.txt contains
100,abc,299
200,xyz,499
and I need to create one final output with the file name in each record,
using DataFrames.
My desired output:
111,abc,234,ABC_input_0528.txt
222,xyz,456,ABC_input_0528.txt
100,abc,299,ABC_input_0531.txt
200,xyz,499,ABC_input_0531.txt
Can anybody suggest a different solution using inputFileName or
input_file_name?
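The shape of the transformation can be sketched with plain Scala collections (a minimal sketch, assuming the (fileName -> contents) pair shape that wholeTextFiles() gives you; in a DataFrame pipeline the equivalent is adding a column from the input_file_name() function — tagWithFileName is a hypothetical helper name):

```scala
// Sketch: simulate two input files as (fileName -> contents) pairs,
// the pair shape that wholeTextFiles() returns.
val files = List(
  "ABC_input_0528.txt" -> "111,abc,234\n222,xyz,456",
  "ABC_input_0531.txt" -> "100,abc,299\n200,xyz,499"
)

// Append the source file name to every record of every file.
def tagWithFileName(files: List[(String, String)]): List[String] =
  files.flatMap { case (name, content) =>
    content.split("\n").toList.map(line => s"$line,$name")
  }

// First record becomes "111,abc,234,ABC_input_0528.txt"
val tagged = tagWithFileName(files)
```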
On Tue, May 31, 2016 at 11:43 PM, Vikash Kumar <vikashsp...@gmail.com>
wrote:
> Thanks Ajay, but I have the code below to generate the DataFrames, so I
> wanted to change only the df to achieve this. I thought i
file(s)...")
*val df: DataFrame = readTextFile(sqlContext)*
On Tue, May 31, 2016 at 11:26 PM, Ajay Chander <itsche...@gmail.com> wrote:
> Hi Vikash,
>
> These are my thoughts: read the input directory using wholeTextFiles(),
> which would give a paired RDD with the key as
I have a requirement in which I need to read the input files from a
directory and append the file name to each record in the output.
e.g. I have a directory /input/files/ which has the following files:
ABC_input_0528.txt
ABC_input_0531.txt
suppose input file ABC_input_0528.txt contains
111,abc,234
Can we implement nested for/while loops in Spark? I have to convert some SQL
procedure code into Spark. It has multiple loops and processing steps, and I
want to implement this in Spark. How can this be done?
1. Open a cursor and fetch for personType
2. Open a cursor and fetch for personGroup
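Cursor-style nested loops usually translate into a cross (cartesian) product of the two datasets followed by per-pair processing, rather than literal loops on the driver. A minimal sketch with plain Scala collections (the values and the nestedFetch helper are hypothetical; in Spark the same shape would come from rdd1.cartesian(rdd2) or df1.crossJoin(df2)):

```scala
// Hypothetical cursor contents.
val personTypes  = List("employee", "contractor")
val personGroups = List("groupA", "groupB", "groupC")

// The two nested fetch loops collapse into one cartesian product:
// for each personType (outer cursor), for each personGroup (inner
// cursor), emit one pair to process.
def nestedFetch(types: List[String], groups: List[String]): List[(String, String)] =
  for {
    t <- types   // outer cursor
    g <- groups  // inner cursor
  } yield (t, g)

// 2 types x 3 groups => 6 pairs
val pairs = nestedFetch(personTypes, personGroups)
```

The per-pair processing body of the original procedure would then become a map or flatMap over these pairs, so the whole thing stays distributed instead of looping on the driver.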