Re: Spark SQL, dataframe join questions.

2017-03-29 Thread vaquar khan
>> will shuffle, and following join COULD cause another shuffle.
>> So I am not sure if it is a smart way.
>>
>> Yong
>>
>> --
>> *From:* shyla deshpande <deshpandesh...@gmail.com>
>> *Sent:* Wednesday, March 29, 2017 12:33 PM

Re: Spark SQL, dataframe join questions.

2017-03-29 Thread Vidya Sujeet
> it is a smart way.
>
> Yong
>
> --
> *From:* shyla deshpande <deshpandesh...@gmail.com>
> *Sent:* Wednesday, March 29, 2017 12:33 PM
> *To:* user
> *Subject:* Re: Spark SQL, dataframe join questions.
>
> On Tue, Mar 28, 2017 at 2:57 PM, shyla deshpande <deshpa

Re: Spark SQL, dataframe join questions.

2017-03-29 Thread Yong Zhang
join COULD cause another shuffle. So I am not sure if it is a smart way.
Yong
From: shyla deshpande <deshpandesh...@gmail.com>
Sent: Wednesday, March 29, 2017 12:33 PM
To: user
Subject: Re: Spark SQL, dataframe join questions.
On Tue, Mar 28, 2017 at 2

Re: Spark SQL, dataframe join questions.

2017-03-29 Thread shyla deshpande
On Tue, Mar 28, 2017 at 2:57 PM, shyla deshpande wrote:
> Following are my questions. Thank you.
>
> 1. When joining dataframes, is it a good idea to repartition on the key column
> that is used in the join, or is the optimizer smart enough that I can skip it?
>
> 2. In RDD