Union of multiple RDDs

2016-06-21 Thread Apurva Nandan
Hello, I am trying to combine several small text files (each ranging from a few hundred MB to 2-3 GB) into one big Parquet file. I am loading each of them and trying to take a union; however, this leads to an enormous number of partitions, as union keeps on adding the partitions of the input

RDD generated from Dataframes

2016-04-21 Thread Apurva Nandan
Hello everyone, Generally speaking, I guess it's well known that DataFrames are much faster than RDDs when it comes to performance. My question is how you go about transforming a DataFrame using map. I mean, the DataFrame then gets converted into an RDD, hence now do you again