Try running EXPLAIN on both version of the query. Likely when you cache the subquery we know that its going to be small so use a broadcast join instead of a shuffling the data.
On Thu, Mar 17, 2016 at 5:53 PM, Younes Naguib < younes.nag...@tritondigital.com> wrote: > Hi all, > > > > I’m running a query that looks like the following: > > Select col1, count(1) > > From (Select col2, count(1) from tab2 group by col2) > > Inner join tab1 on (col1=col2) > > Group by col1 > > > > This creates a very large shuffle, 10 times the data size, as if the > subquery was executed for each row. > > Anything can be done to tune to help tune this? > > When the subquery in persisted, it runs much faster, and the shuffle is 50 > times smaller! > > > > *Thanks,* > > *Younes* >