[jira] [Commented] (HIVE-22262) Aggregate pushdown through join may generate additional rewriting opportunities
[ https://issues.apache.org/jira/browse/HIVE-22262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17065259#comment-17065259 ] Vineet Garg commented on HIVE-22262: Of course! I don't know why I was thinking that count(*) could be computed over already grouped set of data > Aggregate pushdown through join may generate additional rewriting > opportunities > --- > > Key: HIVE-22262 > URL: https://issues.apache.org/jira/browse/HIVE-22262 > Project: Hive > Issue Type: Sub-task > Components: CBO, Materialized views >Affects Versions: 3.1.2 >Reporter: Steve Carlin >Assignee: Vineet Garg >Priority: Major > Attachments: eager-v2.sql > > > In this case, there is a function used in the query and materialized view, > but the aggregate is not being pushed down. Script is attached. > Example query and materialized view: > create materialized view av1 stored as orc as select fk1, fk2, fk3, > to_date(fk4), sum(1) from fact group by 1, 2, 3, 4; > explain cbo select pk1, dim2.fk4, sum(1), count(c1) > from fact, dim2 > where to_date(fact.fk4) = dim2.fk4 > group by 1, 2 > order by 1, 2; -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-22262) Aggregate pushdown through join may generate additional rewriting opportunities
[ https://issues.apache.org/jira/browse/HIVE-22262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17065257#comment-17065257 ] Jesus Camacho Rodriguez commented on HIVE-22262: The count in the query is computing the number of rows in each group; without storing the count per group in the MV, we will not be able to compute the former (in fact, the rollup in the rewriting would be done with a sum). > Aggregate pushdown through join may generate additional rewriting > opportunities > --- > > Key: HIVE-22262 > URL: https://issues.apache.org/jira/browse/HIVE-22262 > Project: Hive > Issue Type: Sub-task > Components: CBO, Materialized views >Affects Versions: 3.1.2 >Reporter: Steve Carlin >Assignee: Vineet Garg >Priority: Major > Attachments: eager-v2.sql > > > In this case, there is a function used in the query and materialized view, > but the aggregate is not being pushed down. Script is attached. > Example query and materialized view: > create materialized view av1 stored as orc as select fk1, fk2, fk3, > to_date(fk4), sum(1) from fact group by 1, 2, 3, 4; > explain cbo select pk1, dim2.fk4, sum(1), count(c1) > from fact, dim2 > where to_date(fact.fk4) = dim2.fk4 > group by 1, 2 > order by 1, 2; -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-22262) Aggregate pushdown through join may generate additional rewriting opportunities
[ https://issues.apache.org/jira/browse/HIVE-22262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17065249#comment-17065249 ] Vineet Garg commented on HIVE-22262: [~jcamachorodriguez] My thinking was that while introducing aggregate on top of view we keep this count beside adding necessary aggregations like sum(fk3). Since this count doesn' have any aggregate column it should be logically equivalent. What do you think? > Aggregate pushdown through join may generate additional rewriting > opportunities > --- > > Key: HIVE-22262 > URL: https://issues.apache.org/jira/browse/HIVE-22262 > Project: Hive > Issue Type: Sub-task > Components: CBO, Materialized views >Affects Versions: 3.1.2 >Reporter: Steve Carlin >Assignee: Vineet Garg >Priority: Major > Attachments: eager-v2.sql > > > In this case, there is a function used in the query and materialized view, > but the aggregate is not being pushed down. Script is attached. > Example query and materialized view: > create materialized view av1 stored as orc as select fk1, fk2, fk3, > to_date(fk4), sum(1) from fact group by 1, 2, 3, 4; > explain cbo select pk1, dim2.fk4, sum(1), count(c1) > from fact, dim2 > where to_date(fact.fk4) = dim2.fk4 > group by 1, 2 > order by 1, 2; -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-22262) Aggregate pushdown through join may generate additional rewriting opportunities
[ https://issues.apache.org/jira/browse/HIVE-22262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17065232#comment-17065232 ] Jesus Camacho Rodriguez commented on HIVE-22262: {quote} More specific repro {quote} Can you compute the {{count}} in the query if you do not store it in the MV? > Aggregate pushdown through join may generate additional rewriting > opportunities > --- > > Key: HIVE-22262 > URL: https://issues.apache.org/jira/browse/HIVE-22262 > Project: Hive > Issue Type: Sub-task > Components: CBO, Materialized views >Affects Versions: 3.1.2 >Reporter: Steve Carlin >Assignee: Vineet Garg >Priority: Major > Attachments: eager-v2.sql > > > In this case, there is a function used in the query and materialized view, > but the aggregate is not being pushed down. Script is attached. > Example query and materialized view: > create materialized view av1 stored as orc as select fk1, fk2, fk3, > to_date(fk4), sum(1) from fact group by 1, 2, 3, 4; > explain cbo select pk1, dim2.fk4, sum(1), count(c1) > from fact, dim2 > where to_date(fact.fk4) = dim2.fk4 > group by 1, 2 > order by 1, 2; -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-22262) Aggregate pushdown through join may generate additional rewriting opportunities
[ https://issues.apache.org/jira/browse/HIVE-22262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17065207#comment-17065207 ] Vineet Garg commented on HIVE-22262: Once we have improvement for the above, AggregateJoinTransposeRule should help us trigger more rewritings > Aggregate pushdown through join may generate additional rewriting > opportunities > --- > > Key: HIVE-22262 > URL: https://issues.apache.org/jira/browse/HIVE-22262 > Project: Hive > Issue Type: Sub-task > Components: CBO, Materialized views >Affects Versions: 3.1.2 >Reporter: Steve Carlin >Assignee: Vineet Garg >Priority: Major > Attachments: eager-v2.sql > > > In this case, there is a function used in the query and materialized view, > but the aggregate is not being pushed down. Script is attached. > Example query and materialized view: > create materialized view av1 stored as orc as select fk1, fk2, fk3, > to_date(fk4), sum(1) from fact group by 1, 2, 3, 4; > explain cbo select pk1, dim2.fk4, sum(1), count(c1) > from fact, dim2 > where to_date(fact.fk4) = dim2.fk4 > group by 1, 2 > order by 1, 2; -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-22262) Aggregate pushdown through join may generate additional rewriting opportunities
[ https://issues.apache.org/jira/browse/HIVE-22262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17065206#comment-17065206 ] Vineet Garg commented on HIVE-22262: Unfortunately rewriting still isn't triggered for above query. AggregateJoinTransposeRule does group by pushdown (along with eager count) and adds an extra count() on both side generating the following plan {noformat} HiveAggregate(group=[{3, 4}], agg#0=[sum($8)], agg#1=[$SUM0($9)]) HiveProject(TO_DATE=[$0], $f1=[$1], count=[$2], pk1=[$3], fk4=[$4], CAST=[$5], count0=[$6], $f4=[$7], $f8=[*($1, $6)], $f9=[*($2, $7)]) HiveJoin(condition=[=($0, $5)], joinType=[inner], algorithm=[none], cost=[not available]) HiveAggregate(group=[{1}], agg#0=[sum($0)], count=[count()]) HiveProject(subset=[rel#500:Subset#2.HIVE.[]], fk3=[$2], TO_DATE=[TO_DATE($3)]) HiveFilter(subset=[rel#498:Subset#1.HIVE.[]], condition=[IS NOT NULL(TO_DATE($3))]) HiveTableScan(subset=[rel#496:Subset#0.HIVE.[]], table=[[default, fact]], table:alias=[fact]) HiveAggregate(group=[{0, 1, 3}], count=[count()], agg#1=[count($2)]) HiveProject(subset=[rel#505:Subset#5.HIVE.[]], pk1=[$0], fk4=[$1], c1=[$2], CAST=[CAST($1):DATE]) HiveFilter(subset=[rel#503:Subset#4.HIVE.[]], condition=[IS NOT NULL(CAST($1):DATE)]) HiveTableScan(subset=[rel#501:Subset#3.HIVE.[]], table=[[default, dim2]], table:alias=[dim2]) {noformat} Currently MV rewriting isn't capable of rewriting if there is an aggregate on query which isn't in view More specific repro {code:sql} create materialized view av3 stored as orc as select fk1, fk2, fk3, fk4, sum(fk3) from fact group by fk1,fk2,fk3,fk4; explain cbo select sum(fk3), count(*) from fact group by fk4; {code} > Aggregate pushdown through join may generate additional rewriting > opportunities > --- > > Key: HIVE-22262 > URL: https://issues.apache.org/jira/browse/HIVE-22262 > Project: Hive > Issue Type: Sub-task > Components: CBO, Materialized views >Affects Versions: 3.1.2 >Reporter: Steve Carlin >Assignee: Vineet Garg >Priority: Major > Attachments: eager-v2.sql > > > In this case, there is a function used in the query and materialized view, > but the aggregate is not being pushed down. Script is attached. > Example query and materialized view: > create materialized view av1 stored as orc as select fk1, fk2, fk3, > to_date(fk4), sum(1) from fact group by 1, 2, 3, 4; > explain cbo select pk1, dim2.fk4, sum(1), count(c1) > from fact, dim2 > where to_date(fact.fk4) = dim2.fk4 > group by 1, 2 > order by 1, 2; -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-22262) Aggregate pushdown through join may generate additional rewriting opportunities
[ https://issues.apache.org/jira/browse/HIVE-22262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17065205#comment-17065205 ] Vineet Garg commented on HIVE-22262: The reason {{HiveAggregateProjectMergeRule}} isn't able to merge is because Project is introducing a constant expression in this case. Following query better encapsulate case where {{AggregateJoinTransposeRule}} should help trigger rewriting {code:sql} explain cbo select pk1, dim2.fk4, sum(fk3), count(c1) from fact, dim2 where to_date(fact.fk4) = dim2.fk4 group by pk1,dim2.fk4 order by pk1,dim2.fk4; {code} > Aggregate pushdown through join may generate additional rewriting > opportunities > --- > > Key: HIVE-22262 > URL: https://issues.apache.org/jira/browse/HIVE-22262 > Project: Hive > Issue Type: Sub-task > Components: CBO, Materialized views >Affects Versions: 3.1.2 >Reporter: Steve Carlin >Assignee: Vineet Garg >Priority: Major > Attachments: eager-v2.sql > > > In this case, there is a function used in the query and materialized view, > but the aggregate is not being pushed down. Script is attached. > Example query and materialized view: > create materialized view av1 stored as orc as select fk1, fk2, fk3, > to_date(fk4), sum(1) from fact group by 1, 2, 3, 4; > explain cbo select pk1, dim2.fk4, sum(1), count(c1) > from fact, dim2 > where to_date(fact.fk4) = dim2.fk4 > group by 1, 2 > order by 1, 2; -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-22262) Aggregate pushdown through join may generate additional rewriting opportunities
[ https://issues.apache.org/jira/browse/HIVE-22262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17065099#comment-17065099 ] Vineet Garg commented on HIVE-22262: [~jcamachorodriguez] This is due to {{HiveAggregateJoinTransposeRule}} not matching because there is a Project b/w Aggregate and Join. I am trying {{HiveAggregateProjectMergeRule}} but it still doesn't match. > Aggregate pushdown through join may generate additional rewriting > opportunities > --- > > Key: HIVE-22262 > URL: https://issues.apache.org/jira/browse/HIVE-22262 > Project: Hive > Issue Type: Sub-task > Components: CBO, Materialized views >Affects Versions: 3.1.2 >Reporter: Steve Carlin >Assignee: Vineet Garg >Priority: Major > Attachments: eager-v2.sql > > > In this case, there is a function used in the query and materialized view, > but the aggregate is not being pushed down. Script is attached. > Example query and materialized view: > create materialized view av1 stored as orc as select fk1, fk2, fk3, > to_date(fk4), sum(1) from fact group by 1, 2, 3, 4; > explain cbo select pk1, dim2.fk4, sum(1), count(c1) > from fact, dim2 > where to_date(fact.fk4) = dim2.fk4 > group by 1, 2 > order by 1, 2; -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-22262) Aggregate pushdown through join may generate additional rewriting opportunities
[ https://issues.apache.org/jira/browse/HIVE-22262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17065096#comment-17065096 ] Jesus Camacho Rodriguez commented on HIVE-22262: [~vgarg], iirc {{HiveAggregateJoinTransposeRule}} has a cost check before triggering the transformation; you may want to check whether using the original {{AggregateJoinTransposeRule}} makes a difference. > Aggregate pushdown through join may generate additional rewriting > opportunities > --- > > Key: HIVE-22262 > URL: https://issues.apache.org/jira/browse/HIVE-22262 > Project: Hive > Issue Type: Sub-task > Components: CBO, Materialized views >Affects Versions: 3.1.2 >Reporter: Steve Carlin >Assignee: Vineet Garg >Priority: Major > Attachments: eager-v2.sql > > > In this case, there is a function used in the query and materialized view, > but the aggregate is not being pushed down. Script is attached. > Example query and materialized view: > create materialized view av1 stored as orc as select fk1, fk2, fk3, > to_date(fk4), sum(1) from fact group by 1, 2, 3, 4; > explain cbo select pk1, dim2.fk4, sum(1), count(c1) > from fact, dim2 > where to_date(fact.fk4) = dim2.fk4 > group by 1, 2 > order by 1, 2; -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-22262) Aggregate pushdown through join may generate additional rewriting opportunities
[ https://issues.apache.org/jira/browse/HIVE-22262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17065043#comment-17065043 ] Vineet Garg commented on HIVE-22262: {code:sql} explain cbo select pk1, dim2.fk4, sum(1), count(c1) from fact, dim2 where to_date(fact.fk4) = dim2.fk4 group by pk1,dim2.fk4 order by pk1,dim2.fk4; {code} This doesn't fail anymore but we aren't able to rewrite it. I tried adding {{HiveAggregateJoinTransposeRule}} before rewriting but it still doesn't kick in rewriting. Plan after the introduction of rule {code} HiveSortLimit(sort0=[$0], sort1=[$1], dir0=[ASC], dir1=[ASC]) HiveProject($f0=[$0], $f1=[$1], $f2=[$2], $f3=[$3]) HiveAggregate(group=[{0, 1}], agg#0=[sum($2)], agg#1=[count($3)]) HiveProject($f0=[$1], $f1=[$2], $f2=[1], $f3=[$3]) HiveJoin(condition=[=($0, $4)], joinType=[inner], algorithm=[none], cost=[not available]) HiveProject(TO_DATE=[TO_DATE($3)]) HiveFilter(condition=[IS NOT NULL(TO_DATE($3))]) HiveTableScan(table=[[default, fact]], table:alias=[fact]) HiveProject(pk1=[$0], fk4=[$1], c1=[$2], CAST=[CAST($1):DATE]) HiveFilter(condition=[IS NOT NULL(CAST($1):DATE)]) HiveTableScan(table=[[default, dim2]], table:alias=[dim2]) {code} > Aggregate pushdown through join may generate additional rewriting > opportunities > --- > > Key: HIVE-22262 > URL: https://issues.apache.org/jira/browse/HIVE-22262 > Project: Hive > Issue Type: Sub-task > Components: CBO, Materialized views >Affects Versions: 3.1.2 >Reporter: Steve Carlin >Priority: Major > Attachments: eager-v2.sql > > > In this case, there is a function used in the query and materialized view, > but the aggregate is not being pushed down. Script is attached. > Example query and materialized view: > create materialized view av1 stored as orc as select fk1, fk2, fk3, > to_date(fk4), sum(1) from fact group by 1, 2, 3, 4; > explain cbo select pk1, dim2.fk4, sum(1), count(c1) > from fact, dim2 > where to_date(fact.fk4) = dim2.fk4 > group by 1, 2 > order by 1, 2; -- This message was sent by Atlassian Jira (v8.3.4#803005)