[jira] [Commented] (HIVE-22262) Aggregate pushdown through join may generate additional rewriting opportunities

2020-03-23 Thread Vineet Garg (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17065259#comment-17065259
 ] 

Vineet Garg commented on HIVE-22262:


Of course! I don't know why I was thinking that count(*) could be computed over 
already grouped set of data

> Aggregate pushdown through join may generate additional rewriting 
> opportunities
> ---
>
> Key: HIVE-22262
> URL: https://issues.apache.org/jira/browse/HIVE-22262
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO, Materialized views
>Affects Versions: 3.1.2
>Reporter: Steve Carlin
>Assignee: Vineet Garg
>Priority: Major
> Attachments: eager-v2.sql
>
>
> In this case, there is a function used in the query and materialized view, 
> but the aggregate is not being pushed down.  Script is attached.
> Example query and materialized view:
>  create materialized view av1 stored as orc as select fk1, fk2, fk3, 
> to_date(fk4), sum(1) from fact group by 1, 2, 3, 4;
> explain cbo select pk1, dim2.fk4, sum(1), count(c1)
> from fact, dim2
> where to_date(fact.fk4) = dim2.fk4
> group by 1, 2
> order by 1, 2;



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-22262) Aggregate pushdown through join may generate additional rewriting opportunities

2020-03-23 Thread Jesus Camacho Rodriguez (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17065257#comment-17065257
 ] 

Jesus Camacho Rodriguez commented on HIVE-22262:


The count in the query is computing the number of rows in each group; without 
storing the count per group in the MV, we will not be able to compute the 
former (in fact, the rollup in the rewriting would be done with a sum).

> Aggregate pushdown through join may generate additional rewriting 
> opportunities
> ---
>
> Key: HIVE-22262
> URL: https://issues.apache.org/jira/browse/HIVE-22262
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO, Materialized views
>Affects Versions: 3.1.2
>Reporter: Steve Carlin
>Assignee: Vineet Garg
>Priority: Major
> Attachments: eager-v2.sql
>
>
> In this case, there is a function used in the query and materialized view, 
> but the aggregate is not being pushed down.  Script is attached.
> Example query and materialized view:
>  create materialized view av1 stored as orc as select fk1, fk2, fk3, 
> to_date(fk4), sum(1) from fact group by 1, 2, 3, 4;
> explain cbo select pk1, dim2.fk4, sum(1), count(c1)
> from fact, dim2
> where to_date(fact.fk4) = dim2.fk4
> group by 1, 2
> order by 1, 2;



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-22262) Aggregate pushdown through join may generate additional rewriting opportunities

2020-03-23 Thread Vineet Garg (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17065249#comment-17065249
 ] 

Vineet Garg commented on HIVE-22262:


[~jcamachorodriguez] My thinking was that while introducing aggregate on top of 
view we keep this count beside adding necessary aggregations like sum(fk3). 
Since this count doesn' have any aggregate column it should be logically 
equivalent.
What do you think?

> Aggregate pushdown through join may generate additional rewriting 
> opportunities
> ---
>
> Key: HIVE-22262
> URL: https://issues.apache.org/jira/browse/HIVE-22262
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO, Materialized views
>Affects Versions: 3.1.2
>Reporter: Steve Carlin
>Assignee: Vineet Garg
>Priority: Major
> Attachments: eager-v2.sql
>
>
> In this case, there is a function used in the query and materialized view, 
> but the aggregate is not being pushed down.  Script is attached.
> Example query and materialized view:
>  create materialized view av1 stored as orc as select fk1, fk2, fk3, 
> to_date(fk4), sum(1) from fact group by 1, 2, 3, 4;
> explain cbo select pk1, dim2.fk4, sum(1), count(c1)
> from fact, dim2
> where to_date(fact.fk4) = dim2.fk4
> group by 1, 2
> order by 1, 2;



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-22262) Aggregate pushdown through join may generate additional rewriting opportunities

2020-03-23 Thread Jesus Camacho Rodriguez (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17065232#comment-17065232
 ] 

Jesus Camacho Rodriguez commented on HIVE-22262:


{quote}
More specific repro
{quote}

Can you compute the {{count}} in the query if you do not store it in the MV?

> Aggregate pushdown through join may generate additional rewriting 
> opportunities
> ---
>
> Key: HIVE-22262
> URL: https://issues.apache.org/jira/browse/HIVE-22262
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO, Materialized views
>Affects Versions: 3.1.2
>Reporter: Steve Carlin
>Assignee: Vineet Garg
>Priority: Major
> Attachments: eager-v2.sql
>
>
> In this case, there is a function used in the query and materialized view, 
> but the aggregate is not being pushed down.  Script is attached.
> Example query and materialized view:
>  create materialized view av1 stored as orc as select fk1, fk2, fk3, 
> to_date(fk4), sum(1) from fact group by 1, 2, 3, 4;
> explain cbo select pk1, dim2.fk4, sum(1), count(c1)
> from fact, dim2
> where to_date(fact.fk4) = dim2.fk4
> group by 1, 2
> order by 1, 2;



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-22262) Aggregate pushdown through join may generate additional rewriting opportunities

2020-03-23 Thread Vineet Garg (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17065207#comment-17065207
 ] 

Vineet Garg commented on HIVE-22262:


Once we have improvement for the above, AggregateJoinTransposeRule should help 
us trigger more rewritings

> Aggregate pushdown through join may generate additional rewriting 
> opportunities
> ---
>
> Key: HIVE-22262
> URL: https://issues.apache.org/jira/browse/HIVE-22262
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO, Materialized views
>Affects Versions: 3.1.2
>Reporter: Steve Carlin
>Assignee: Vineet Garg
>Priority: Major
> Attachments: eager-v2.sql
>
>
> In this case, there is a function used in the query and materialized view, 
> but the aggregate is not being pushed down.  Script is attached.
> Example query and materialized view:
>  create materialized view av1 stored as orc as select fk1, fk2, fk3, 
> to_date(fk4), sum(1) from fact group by 1, 2, 3, 4;
> explain cbo select pk1, dim2.fk4, sum(1), count(c1)
> from fact, dim2
> where to_date(fact.fk4) = dim2.fk4
> group by 1, 2
> order by 1, 2;



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-22262) Aggregate pushdown through join may generate additional rewriting opportunities

2020-03-23 Thread Vineet Garg (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17065206#comment-17065206
 ] 

Vineet Garg commented on HIVE-22262:


Unfortunately rewriting still isn't triggered for above query. 
AggregateJoinTransposeRule does group by pushdown (along with eager count) and 
adds an extra count() on both side generating the following plan
{noformat}
HiveAggregate(group=[{3, 4}], agg#0=[sum($8)], agg#1=[$SUM0($9)])
  HiveProject(TO_DATE=[$0], $f1=[$1], count=[$2], pk1=[$3], fk4=[$4], 
CAST=[$5], count0=[$6], $f4=[$7], $f8=[*($1, $6)], $f9=[*($2, $7)])
HiveJoin(condition=[=($0, $5)], joinType=[inner], algorithm=[none], 
cost=[not available])
  HiveAggregate(group=[{1}], agg#0=[sum($0)], count=[count()])
HiveProject(subset=[rel#500:Subset#2.HIVE.[]], fk3=[$2], 
TO_DATE=[TO_DATE($3)])
  HiveFilter(subset=[rel#498:Subset#1.HIVE.[]], condition=[IS NOT 
NULL(TO_DATE($3))])
HiveTableScan(subset=[rel#496:Subset#0.HIVE.[]], table=[[default, 
fact]], table:alias=[fact])
  HiveAggregate(group=[{0, 1, 3}], count=[count()], agg#1=[count($2)])
HiveProject(subset=[rel#505:Subset#5.HIVE.[]], pk1=[$0], fk4=[$1], 
c1=[$2], CAST=[CAST($1):DATE])
  HiveFilter(subset=[rel#503:Subset#4.HIVE.[]], condition=[IS NOT 
NULL(CAST($1):DATE)])
HiveTableScan(subset=[rel#501:Subset#3.HIVE.[]], table=[[default, 
dim2]], table:alias=[dim2]) {noformat}
Currently MV rewriting isn't capable of rewriting if there is an aggregate on 
query which isn't in view

 

More specific repro

{code:sql}
create materialized view av3 stored as orc as select fk1, fk2, fk3, fk4, 
sum(fk3) from fact group by fk1,fk2,fk3,fk4;
explain cbo select sum(fk3), count(*) from fact group by fk4;
{code}

> Aggregate pushdown through join may generate additional rewriting 
> opportunities
> ---
>
> Key: HIVE-22262
> URL: https://issues.apache.org/jira/browse/HIVE-22262
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO, Materialized views
>Affects Versions: 3.1.2
>Reporter: Steve Carlin
>Assignee: Vineet Garg
>Priority: Major
> Attachments: eager-v2.sql
>
>
> In this case, there is a function used in the query and materialized view, 
> but the aggregate is not being pushed down.  Script is attached.
> Example query and materialized view:
>  create materialized view av1 stored as orc as select fk1, fk2, fk3, 
> to_date(fk4), sum(1) from fact group by 1, 2, 3, 4;
> explain cbo select pk1, dim2.fk4, sum(1), count(c1)
> from fact, dim2
> where to_date(fact.fk4) = dim2.fk4
> group by 1, 2
> order by 1, 2;



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-22262) Aggregate pushdown through join may generate additional rewriting opportunities

2020-03-23 Thread Vineet Garg (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17065205#comment-17065205
 ] 

Vineet Garg commented on HIVE-22262:


The reason {{HiveAggregateProjectMergeRule}} isn't able to merge is because 
Project is introducing a constant expression in this case.


 Following query better encapsulate case where {{AggregateJoinTransposeRule}} 
should help trigger rewriting
{code:sql}
explain cbo select pk1, dim2.fk4, sum(fk3), count(c1)
from fact, dim2
where to_date(fact.fk4) = dim2.fk4
group by pk1,dim2.fk4
order by pk1,dim2.fk4;
{code}

> Aggregate pushdown through join may generate additional rewriting 
> opportunities
> ---
>
> Key: HIVE-22262
> URL: https://issues.apache.org/jira/browse/HIVE-22262
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO, Materialized views
>Affects Versions: 3.1.2
>Reporter: Steve Carlin
>Assignee: Vineet Garg
>Priority: Major
> Attachments: eager-v2.sql
>
>
> In this case, there is a function used in the query and materialized view, 
> but the aggregate is not being pushed down.  Script is attached.
> Example query and materialized view:
>  create materialized view av1 stored as orc as select fk1, fk2, fk3, 
> to_date(fk4), sum(1) from fact group by 1, 2, 3, 4;
> explain cbo select pk1, dim2.fk4, sum(1), count(c1)
> from fact, dim2
> where to_date(fact.fk4) = dim2.fk4
> group by 1, 2
> order by 1, 2;



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-22262) Aggregate pushdown through join may generate additional rewriting opportunities

2020-03-23 Thread Vineet Garg (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17065099#comment-17065099
 ] 

Vineet Garg commented on HIVE-22262:


[~jcamachorodriguez] This is due to {{HiveAggregateJoinTransposeRule}} not 
matching because there is a Project b/w Aggregate and Join. I am trying 
{{HiveAggregateProjectMergeRule}} but it still doesn't match.

> Aggregate pushdown through join may generate additional rewriting 
> opportunities
> ---
>
> Key: HIVE-22262
> URL: https://issues.apache.org/jira/browse/HIVE-22262
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO, Materialized views
>Affects Versions: 3.1.2
>Reporter: Steve Carlin
>Assignee: Vineet Garg
>Priority: Major
> Attachments: eager-v2.sql
>
>
> In this case, there is a function used in the query and materialized view, 
> but the aggregate is not being pushed down.  Script is attached.
> Example query and materialized view:
>  create materialized view av1 stored as orc as select fk1, fk2, fk3, 
> to_date(fk4), sum(1) from fact group by 1, 2, 3, 4;
> explain cbo select pk1, dim2.fk4, sum(1), count(c1)
> from fact, dim2
> where to_date(fact.fk4) = dim2.fk4
> group by 1, 2
> order by 1, 2;



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-22262) Aggregate pushdown through join may generate additional rewriting opportunities

2020-03-23 Thread Jesus Camacho Rodriguez (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17065096#comment-17065096
 ] 

Jesus Camacho Rodriguez commented on HIVE-22262:


[~vgarg], iirc {{HiveAggregateJoinTransposeRule}} has a cost check before 
triggering the transformation; you may want to check whether using the original 
{{AggregateJoinTransposeRule}} makes a difference.

> Aggregate pushdown through join may generate additional rewriting 
> opportunities
> ---
>
> Key: HIVE-22262
> URL: https://issues.apache.org/jira/browse/HIVE-22262
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO, Materialized views
>Affects Versions: 3.1.2
>Reporter: Steve Carlin
>Assignee: Vineet Garg
>Priority: Major
> Attachments: eager-v2.sql
>
>
> In this case, there is a function used in the query and materialized view, 
> but the aggregate is not being pushed down.  Script is attached.
> Example query and materialized view:
>  create materialized view av1 stored as orc as select fk1, fk2, fk3, 
> to_date(fk4), sum(1) from fact group by 1, 2, 3, 4;
> explain cbo select pk1, dim2.fk4, sum(1), count(c1)
> from fact, dim2
> where to_date(fact.fk4) = dim2.fk4
> group by 1, 2
> order by 1, 2;



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-22262) Aggregate pushdown through join may generate additional rewriting opportunities

2020-03-23 Thread Vineet Garg (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17065043#comment-17065043
 ] 

Vineet Garg commented on HIVE-22262:


{code:sql}
explain cbo select pk1, dim2.fk4, sum(1), count(c1)
from fact, dim2
where to_date(fact.fk4) = dim2.fk4
group by pk1,dim2.fk4
order by pk1,dim2.fk4;
{code}

This doesn't fail anymore but we aren't able to rewrite it. I tried adding 
{{HiveAggregateJoinTransposeRule}} before rewriting but it still doesn't kick 
in rewriting.

Plan after the introduction of rule
{code}
HiveSortLimit(sort0=[$0], sort1=[$1], dir0=[ASC], dir1=[ASC])
  HiveProject($f0=[$0], $f1=[$1], $f2=[$2], $f3=[$3])
HiveAggregate(group=[{0, 1}], agg#0=[sum($2)], agg#1=[count($3)])
  HiveProject($f0=[$1], $f1=[$2], $f2=[1], $f3=[$3])
HiveJoin(condition=[=($0, $4)], joinType=[inner], algorithm=[none], 
cost=[not available])
  HiveProject(TO_DATE=[TO_DATE($3)])
HiveFilter(condition=[IS NOT NULL(TO_DATE($3))])
  HiveTableScan(table=[[default, fact]], table:alias=[fact])
  HiveProject(pk1=[$0], fk4=[$1], c1=[$2], CAST=[CAST($1):DATE])
HiveFilter(condition=[IS NOT NULL(CAST($1):DATE)])
  HiveTableScan(table=[[default, dim2]], table:alias=[dim2])
{code}

> Aggregate pushdown through join may generate additional rewriting 
> opportunities
> ---
>
> Key: HIVE-22262
> URL: https://issues.apache.org/jira/browse/HIVE-22262
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO, Materialized views
>Affects Versions: 3.1.2
>Reporter: Steve Carlin
>Priority: Major
> Attachments: eager-v2.sql
>
>
> In this case, there is a function used in the query and materialized view, 
> but the aggregate is not being pushed down.  Script is attached.
> Example query and materialized view:
>  create materialized view av1 stored as orc as select fk1, fk2, fk3, 
> to_date(fk4), sum(1) from fact group by 1, 2, 3, 4;
> explain cbo select pk1, dim2.fk4, sum(1), count(c1)
> from fact, dim2
> where to_date(fact.fk4) = dim2.fk4
> group by 1, 2
> order by 1, 2;



--
This message was sent by Atlassian Jira
(v8.3.4#803005)