Re: What is the optimized way to combine multiple DataFrames into one DataFrame?

2016-11-16 Thread Deepak Sharma
Can you try caching the individual DataFrames and then union them? It may save you time.

Thanks,
Deepak

On Wed, Nov 16, 2016 at 12:35 PM, Devi P.V wrote:
> Hi all,
>
> I have 4 data frames with three columns,
>
> client_id,product_id,interest
>
> I want to combine these 4

RE: What is the optimized way to combine multiple DataFrames into one DataFrame?

2016-11-15 Thread Shreya Agarwal
If you are reading all these datasets from files in persistent storage, functions like sc.textFile can take folders or glob patterns as input and read all of the matching files into the same RDD. You can then convert that RDD to a DataFrame. When you say it is time-consuming with union, how are you measuring

What is the optimized way to combine multiple DataFrames into one DataFrame?

2016-11-15 Thread Devi P.V
Hi all,

I have 4 DataFrames with three columns:

client_id, product_id, interest

I want to combine these 4 DataFrames into one DataFrame. I used union like this:

df1.union(df2).union(df3).union(df4)

But it is time-consuming for big data. What is the optimized way to do this using Spark 2.0?