Try running EXPLAIN on both versions of the query.

Likely, when you cache the subquery, we know that it's going to be small,
so we use a broadcast join instead of shuffling the data.
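
For reference, here is a rough PySpark sketch of that comparison. It is
only an illustration under some assumptions: a Spark 2.x or later
SparkSession is passed in as `spark`, tab1 and tab2 are already
registered as tables or temp views, and the names `sub`, `sub_cached`,
`cnt`, `join_query` and `compare_plans` are made up for the example. If
the guess above is right, the first plan should show a shuffle-based
join (for example SortMergeJoin) and the second a BroadcastHashJoin.

```python
# Sketch only: assumes a Spark 2.x+ SparkSession and that tab1 and tab2
# are already registered as tables or temp views.

# The aggregated subquery from the original question (alias `cnt` added).
SUBQUERY = "SELECT col2, count(1) AS cnt FROM tab2 GROUP BY col2"


def join_query(source):
    """Build the outer query over either the inline subquery or a view name."""
    return (
        "SELECT col1, count(1) AS cnt "
        f"FROM {source} "
        "INNER JOIN tab1 ON (col1 = col2) "
        "GROUP BY col1"
    )


def compare_plans(spark):
    """Print the physical plan of the uncached and the cached variant."""
    # Variant 1: subquery inlined, nothing cached.
    # The alias `sub` is hypothetical; some Spark versions require one.
    spark.sql(join_query(f"({SUBQUERY}) sub")).explain()

    # Variant 2: cache the subquery result, then join against it by name.
    sub = spark.sql(SUBQUERY).cache()
    sub.count()  # materialize the cache so its size is known to the planner
    sub.createOrReplaceTempView("sub_cached")  # hypothetical view name
    spark.sql(join_query("sub_cached")).explain()
```

Calling `compare_plans(spark)` from a pyspark shell prints both plans one
after the other, so the join operators can be compared directly.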

On Thu, Mar 17, 2016 at 5:53 PM, Younes Naguib <
younes.nag...@tritondigital.com> wrote:

> Hi all,
>
>
>
> I’m running a query that looks like the following:
>
> Select col1, count(1)
> From (Select col2, count(1) from tab2 group by col2)
> Inner join tab1 on (col1=col2)
> Group by col1
>
>
>
> This creates a very large shuffle, 10 times the data size, as if the
> subquery was executed for each row.
>
> Can anything be done to help tune this?
>
> When the subquery is persisted, it runs much faster, and the shuffle is 50
> times smaller!
>
>
>
> *Thanks,*
>
> *Younes*
>
