Re: Subquery performance

2016-03-20 Thread Michael Armbrust
t? > y > > From: Michael Armbrust [mailto:mich...@databricks.com] > Sent: March-17-16 8:59 PM > To: Younes Naguib > Cc: user@spark.apache.org > Subject: Re: Subquery performance > > Try running EXPLAIN on both versions of the query.

Re: Subquery performance

2016-03-19 Thread Michael Armbrust
Try running EXPLAIN on both versions of the query. Most likely, when you cache the subquery we know that it's going to be small, so we use a broadcast join instead of shuffling the data. On Thu, Mar 17, 2016 at 5:53 PM, Younes Naguib < younes.nag...@tritondigital.com> wrote: > Hi all, > > > > I’m running
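
For readers unfamiliar with the two join strategies mentioned above, here is a rough sketch in plain Python (not Spark code, and not how Spark implements either join). The function names and the num_partitions parameter are made up for illustration. The first function builds a hash table from the small side once and streams the large side past it; the second repartitions both sides by key first, which is the step that corresponds to the shuffle.

```python
from collections import defaultdict


def broadcast_hash_join(large, small, key):
    """Build a hash table on the small side, then stream the large side past it."""
    lookup = defaultdict(list)
    for row in small:
        lookup[row[key]].append(row)
    # Each large-side row is matched locally; the large side is never moved.
    return [{**l, **s} for l in large for s in lookup.get(l[key], [])]


def shuffle_hash_join(left, right, key, num_partitions=4):
    """Repartition both sides by key (the 'shuffle'), then join partition by partition."""
    def partition(rows):
        parts = [[] for _ in range(num_partitions)]
        for row in rows:
            parts[hash(row[key]) % num_partitions].append(row)
        return parts

    out = []
    # Rows with equal keys land in the same partition index on both sides.
    for lpart, rpart in zip(partition(left), partition(right)):
        out.extend(broadcast_hash_join(lpart, rpart, key))
    return out
```

Both functions return the same rows (possibly in a different order). The difference is that the shuffle version has to move every row of both inputs, while the broadcast version only copies the small side, which is why knowing the subquery is small can change the plan.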

RE: Subquery performance

2016-03-19 Thread Younes Naguib
Any way to cache the subquery or force a broadcast join without persisting it? y From: Michael Armbrust [mailto:mich...@databricks.com] Sent: March-17-16 8:59 PM To: Younes Naguib Cc: user@spark.apache.org Subject: Re: Subquery performance Try running EXPLAIN on both versions of the query