Dataframe Partitioning

2015-05-28 Thread Masf
Hi. I have 2 dataframe with 1 and 12 partitions respectively. When I do a inner join between these dataframes, the result contains 200 partitions. *Why?* df1.join(df2, df1(id) === df2(id), Inner) = returns 200 partitions Thanks!!! -- Regards. Miguel Ángel

Re: Dataframe Partitioning

2015-05-28 Thread Silvio Fiorito
That’s due to the config setting spark.sql.shuffle.partitions which defaults to 200 From: Masf Date: Thursday, May 28, 2015 at 10:02 AM To: user@spark.apache.orgmailto:user@spark.apache.org Subject: Dataframe Partitioning Hi. I have 2 dataframe with 1 and 12 partitions respectively. When I do