Hi there!
Let's imagine I have a large number of very small DataFrames with the
same schema (a list of DataFrames: allDFs),
and I want to create one large Dataset from them.
I have been trying this:
-> allDFs.reduce((a, b) => a.union(b))
And then this:
-> allDFs.reduce((a, b) => a.union(b).repartition(200))
to avoid ending up with a DataFrame that has a very large number of partitions.
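For context, here is a minimal, self-contained sketch of the setup described above, run against a local SparkSession; the input DataFrames and the `unionAll` helper name are illustrative assumptions, not part of any Spark API. Note that `union` is a lazy transformation, so the `reduce` only builds the logical plan on the driver:

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object UnionManyExample {
  // Hypothetical helper: chain-union a non-empty list of same-schema DataFrames.
  // union is lazy, so this fold only constructs the logical plan.
  def unionAll(dfs: Seq[DataFrame]): DataFrame =
    dfs.reduce((a, b) => a.union(b))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("union-many")
      .getOrCreate()
    import spark.implicits._

    // 100 tiny single-column DataFrames with the same schema
    val allDFs: Seq[DataFrame] = (0 until 100).map(i => Seq(i).toDF("id"))

    // Repartition once at the end of the fold rather than at every step,
    // avoiding a shuffle per union
    val big = unionAll(allDFs).repartition(200)

    println(big.count())                  // 100 rows total
    println(big.rdd.getNumPartitions)     // 200 partitions
    spark.stop()
  }
}
```

Repartitioning once after the full union keeps the same end result as repartitioning inside the reduce, but triggers a single shuffle instead of one per step.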
Two questions:
1) Will the reduce operation in the code above be done in parallel,
or should I replace my reduce with allDFs.par.reduce?
2) Is there a better way to concatenate them?
Thanks!
Julio