[jira] [Commented] (HIVE-14257) CBO: Push Join through Groupby to trigger shuffle reductions

Gopal V (JIRA) Sun, 17 Jul 2016 13:35:36 -0700

    [ 
https://issues.apache.org/jira/browse/HIVE-14257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15381512#comment-15381512
 ]


Gopal V commented on HIVE-14257:
--------------------------------

Yes, that does help: the stats-based PPD rewrites the store_sales scan into a 
"Predicate: false", and null-scan detection then reduces it to 1 split + 0 
rows.

The d_date_sk = 1 example is sort of a meaningless query, though. The Query59 
d_month_seq case doesn't filter to "false" (btw, it is one of the queries for 
which the store_sales CTE *should* not be merged between sub-queries, since 
they scan different partitions without overlap).

{code}
with wss as 
 (select d_week_seq,
        ss_store_sk,
        sum(case when (d_day_name='Sunday') then ss_sales_price else null end) sun_sales,
        sum(case when (d_day_name='Monday') then ss_sales_price else null end) mon_sales,
        sum(case when (d_day_name='Tuesday') then ss_sales_price else null end) tue_sales,
        sum(case when (d_day_name='Wednesday') then ss_sales_price else null end) wed_sales,
        sum(case when (d_day_name='Thursday') then ss_sales_price else null end) thu_sales,
        sum(case when (d_day_name='Friday') then ss_sales_price else null end) fri_sales,
        sum(case when (d_day_name='Saturday') then ss_sales_price else null end) sat_sales
 from store_sales,date_dim
 where d_date_sk = ss_sold_date_sk
 group by d_week_seq,ss_store_sk
 )
  select  s_store_name1,s_store_id1,d_week_seq1
       ,sun_sales1/sun_sales2,mon_sales1/mon_sales2
       ,tue_sales1/tue_sales1,wed_sales1/wed_sales2,thu_sales1/thu_sales2
       ,fri_sales1/fri_sales2,sat_sales1/sat_sales2
 from
 (select s_store_name s_store_name1,wss.d_week_seq d_week_seq1
        ,s_store_id s_store_id1,sun_sales sun_sales1
        ,mon_sales mon_sales1,tue_sales tue_sales1
        ,wed_sales wed_sales1,thu_sales thu_sales1
        ,fri_sales fri_sales1,sat_sales sat_sales1
  from wss,store,date_dim d
  where d.d_week_seq = wss.d_week_seq and
        ss_store_sk = s_store_sk and 
        d_month_seq between 1185 and 1185 + 11) y,
 (select s_store_name s_store_name2,wss.d_week_seq d_week_seq2
        ,s_store_id s_store_id2,sun_sales sun_sales2
        ,mon_sales mon_sales2,tue_sales tue_sales2
        ,wed_sales wed_sales2,thu_sales thu_sales2
        ,fri_sales fri_sales2,sat_sales sat_sales2
  from wss,store,date_dim d
  where d.d_week_seq = wss.d_week_seq and
        ss_store_sk = s_store_sk and 
        d_month_seq between 1185+ 12 and 1185 + 23) x
 where s_store_id1=s_store_id2
   and d_week_seq1=d_week_seq2-52
 order by s_store_name1,s_store_id1,d_week_seq1
limit 100;
{code}
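
For illustration only, here is a rough hand-written sketch of what the pushdown 
could look like for the "y" branch, assuming the d_month_seq filter is applied 
as a semi-join on d_week_seq inside the aggregate. The wss_y and dd names are 
made up for this example, and this is not what the optimizer emits today. The 
"x" branch would need its own copy with the 1185 + 12 to 1185 + 23 range, and 
whether store_sales partitions actually get pruned would still depend on 
dynamic partition pruning kicking in.

{code}
-- Hypothetical manual rewrite of the "y" branch only (wss_y and dd are made-up names).
with wss_y as
 (select d_week_seq,
         ss_store_sk,
         sum(case when (d_day_name='Sunday') then ss_sales_price else null end) sun_sales,
         sum(case when (d_day_name='Monday') then ss_sales_price else null end) mon_sales,
         sum(case when (d_day_name='Tuesday') then ss_sales_price else null end) tue_sales,
         sum(case when (d_day_name='Wednesday') then ss_sales_price else null end) wed_sales,
         sum(case when (d_day_name='Thursday') then ss_sales_price else null end) thu_sales,
         sum(case when (d_day_name='Friday') then ss_sales_price else null end) fri_sales,
         sum(case when (d_day_name='Saturday') then ss_sales_price else null end) sat_sales
  from store_sales, date_dim
  where d_date_sk = ss_sold_date_sk
    -- Semi-join on the group-by key: only weeks the outer d_month_seq filter
    -- can match are aggregated, so the other groups are never built.
    and d_week_seq in (select dd.d_week_seq
                       from date_dim dd
                       where dd.d_month_seq between 1185 and 1185 + 11)
  group by d_week_seq, ss_store_sk
 )
select s_store_name s_store_name1, wss_y.d_week_seq d_week_seq1,
       s_store_id s_store_id1, sun_sales sun_sales1
from wss_y, store, date_dim d
-- The outer join is kept as-is because it still decides how many rows come out.
where d.d_week_seq = wss_y.d_week_seq and
      ss_store_sk = s_store_sk and
      d_month_seq between 1185 and 1185 + 11;
{code}

Since d_week_seq is a group-by key, filtering on it before the aggregate drops 
whole groups and leaves the surviving sums unchanged.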

> CBO: Push Join through Groupby to trigger shuffle reductions
> ------------------------------------------------------------
>
>                 Key: HIVE-14257
>                 URL: https://issues.apache.org/jira/browse/HIVE-14257
>             Project: Hive
>          Issue Type: Improvement
>          Components: CBO
>            Reporter: Gopal V
>
> This is similar to the existing optimization in Hive which pushes aggregates 
> through a join (hive.transpose.aggr.join=true).
> {code}
> select count(v)
> from (select d_year, count(ss_item_sk) as v
>       from store_sales, date_dim
>       where ss_sold_date_sk = d_date_sk
>       group by d_year) w,
>      date_dim d
> where d.d_year = w.d_year and d_date_sk = 1;
> {code}
> currently produces an entire aggregate of all years before discarding all of 
> that (because, obviously, there's no data for d_date_sk=1).
> This particular example is a simplified version of TPC-DS Query59's join 
> condition, where scans can be reduced by applying the d_month_seq between 
> 1185 and 1185 + 11 filter to the wss alias.
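>
> As a rough sketch of the intended effect (hand-written, not optimizer output, 
> and the dd alias is made up for this example), the simplified query could be 
> read as if the d_date_sk = 1 filter were applied inside the aggregate as a 
> semi-join on the group-by key:
> {code}
> select count(v)
> from (select d_year, count(ss_item_sk) as v
>       from store_sales, date_dim
>       where ss_sold_date_sk = d_date_sk
>         -- hypothetical pushed-down filter: only years the outer join can match
>         and d_year in (select dd.d_year from date_dim dd where dd.d_date_sk = 1)
>       group by d_year) w,
>      date_dim d
> where d.d_year = w.d_year and d.d_date_sk = 1;
> {code}
> With no date_dim row for d_date_sk = 1, the inner filter matches nothing, so 
> no per-year aggregate has to be built before the join throws it away.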


