[jira] [Commented] (HIVE-21382) Group by keys reduction optimization - keys are not reduced in query23

2019-03-15 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16793983#comment-16793983
 ] 

Hive QA commented on HIVE-21382:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12962639/HIVE-21382.6.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:green}SUCCESS:{color} +1 due to 15832 tests passed

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-Build/16533/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/16533/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-16533/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12962639 - PreCommit-HIVE-Build

> Group by keys reduction optimization - keys are not reduced in query23
> --
>
> Key: HIVE-21382
> URL: https://issues.apache.org/jira/browse/HIVE-21382
> Project: Hive
>  Issue Type: Improvement
>Reporter: Vineet Garg
>Assignee: Vineet Garg
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21382.1.patch, HIVE-21382.2.patch, 
> HIVE-21382.2.patch, HIVE-21382.3.patch, HIVE-21382.4.patch, 
> HIVE-21382.5.patch, HIVE-21382.6.patch
>
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> {code:sql}
> explain cbo with frequent_ss_items as 
>  (select substr(i_item_desc,1,30) itemdesc,i_item_sk item_sk,d_date 
> solddate,count(*) cnt
>   from store_sales
>   ,date_dim 
>   ,item
>   where ss_sold_date_sk = d_date_sk
> and ss_item_sk = i_item_sk 
> and d_year in (1999,1999+1,1999+2,1999+3)
>   group by substr(i_item_desc,1,30),i_item_sk,d_date
>   having count(*) >4)
> select  sum(sales)
>  from ((select cs_quantity*cs_list_price sales
>from catalog_sales
>,date_dim 
>where d_year = 1999 
>  and d_moy = 1 
>  and cs_sold_date_sk = d_date_sk 
>  and cs_item_sk in (select item_sk from frequent_ss_items))) subq 
> limit 100;
> {code}
> {code:sql}
> HiveSortLimit(fetch=[100])
>   HiveProject($f0=[$0])
> HiveAggregate(group=[{}], agg#0=[sum($0)])
>   HiveProject(sales=[*(CAST($2):DECIMAL(10, 0), $3)])
> HiveSemiJoin(condition=[=($1, $5)], joinType=[inner])
>   HiveJoin(condition=[=($0, $4)], joinType=[inner], algorithm=[none], 
> cost=[{2.0 rows, 0.0 cpu, 0.0 io}])
> HiveProject(cs_sold_date_sk=[$0], cs_item_sk=[$15], 
> cs_quantity=[$18], cs_list_price=[$20])
>   HiveFilter(condition=[IS NOT NULL($0)])
> HiveTableScan(table=[[perf_constraints, catalog_sales]], 
> table:alias=[catalog_sales])
> HiveProject(d_date_sk=[$0])
>   HiveFilter(condition=[AND(=($6, 1999), =($8, 1))])
> HiveTableScan(table=[[perf_constraints, date_dim]], 
> table:alias=[date_dim])
>   HiveProject(i_item_sk=[$1])
> HiveFilter(condition=[>($3, 4)])
>   HiveProject(substr=[$2], i_item_sk=[$1], d_date=[$0], $f3=[$3])
> HiveAggregate(group=[{3, 4, 5}], agg#0=[count()])
>   HiveJoin(condition=[=($1, $4)], joinType=[inner], 
> algorithm=[none], cost=[{2.0 rows, 0.0 cpu, 0.0 io}])
> HiveJoin(condition=[=($0, $2)], joinType=[inner], 
> algorithm=[none], cost=[{2.0 rows, 0.0 cpu, 0.0 io}])
>   HiveProject(ss_sold_date_sk=[$0], ss_item_sk=[$2])
> HiveFilter(condition=[IS NOT NULL($0)])
>   HiveTableScan(table=[[perf_constraints, 
> store_sales]], table:alias=[store_sales])
>   HiveProject(d_date_sk=[$0], d_date=[$2])
> HiveFilter(condition=[IN($6, 1999, 2000, 2001, 2002)])
>   HiveTableScan(table=[[perf_constraints, date_dim]], 
> table:alias=[date_dim])
> HiveProject(i_item_sk=[$0], substr=[substr($4, 1, 30)])
>   HiveTableScan(table=[[perf_constraints, item]], 
> table:alias=[item])
> {code}
> Right side of HiveSemiJoin has an aggregate which could be reduce to have 
> only {{i_item_sk}} as group by key since {{i_item_sk}} is primary key.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-21382) Group by keys reduction optimization - keys are not reduced in query23

2019-03-15 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16793952#comment-16793952
 ] 

Hive QA commented on HIVE-21382:


| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  8m 
49s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
15s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
42s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  4m 
20s{color} | {color:blue} ql in master has 2257 extant Findbugs warnings. 
{color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
5s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
37s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
13s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m 
13s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
43s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m 
30s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
5s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
14s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 26m  8s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests |  asflicense  javac  javadoc  findbugs  checkstyle  compile  |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 
3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/data/hiveptest/working/yetus_PreCommit-HIVE-Build-16533/dev-support/hive-personality.sh
 |
| git revision | master / 677eef8 |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.0 |
| modules | C: ql U: ql |
| Console output | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-16533/yetus.txt |
| Powered by | Apache Yetushttp://yetus.apache.org |


This message was automatically generated.



> Group by keys reduction optimization - keys are not reduced in query23
> --
>
> Key: HIVE-21382
> URL: https://issues.apache.org/jira/browse/HIVE-21382
> Project: Hive
>  Issue Type: Improvement
>Reporter: Vineet Garg
>Assignee: Vineet Garg
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21382.1.patch, HIVE-21382.2.patch, 
> HIVE-21382.2.patch, HIVE-21382.3.patch, HIVE-21382.4.patch, 
> HIVE-21382.5.patch, HIVE-21382.6.patch
>
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> {code:sql}
> explain cbo with frequent_ss_items as 
>  (select substr(i_item_desc,1,30) itemdesc,i_item_sk item_sk,d_date 
> solddate,count(*) cnt
>   from store_sales
>   ,date_dim 
>   ,item
>   where ss_sold_date_sk = d_date_sk
> and ss_item_sk = i_item_sk 
> and d_year in (1999,1999+1,1999+2,1999+3)
>   group by substr(i_item_desc,1,30),i_item_sk,d_date
>   having count(*) >4)
> select  sum(sales)
>  from ((select cs_quantity*cs_list_price sales
>from catalog_sales
>,date_dim 
>where d_year = 1999 
>  and d_moy = 1 
>  and cs_sold_date_sk = d_date_sk 
>  and cs_item_sk in (select item_sk from frequent_ss_items))) subq 
> limit 100;
> {code}
> {code:sql}
> HiveSortLimit(fetch=[100])
>   HiveProject($f0=[$0])
> HiveAggregate(group=[{}], agg#0=[sum($0)])
>   HiveProject(sales=[*(CAST($2):DECIMAL(10, 0), $3)])
> HiveSemiJoin(condition=[=($1, $5)], joinType=[inner])
>   HiveJoin(condition=[=($0, $4)], joinType=[inner], 

[jira] [Commented] (HIVE-21382) Group by keys reduction optimization - keys are not reduced in query23

2019-03-15 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16793372#comment-16793372
 ] 

Hive QA commented on HIVE-21382:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12962522/HIVE-21382.5.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 15820 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.ql.parse.TestReplicationWithTableMigrationEx.org.apache.hadoop.hive.ql.parse.TestReplicationWithTableMigrationEx
 (batchId=252)
{noformat}

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-Build/16520/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/16520/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-16520/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12962522 - PreCommit-HIVE-Build

> Group by keys reduction optimization - keys are not reduced in query23
> --
>
> Key: HIVE-21382
> URL: https://issues.apache.org/jira/browse/HIVE-21382
> Project: Hive
>  Issue Type: Improvement
>Reporter: Vineet Garg
>Assignee: Vineet Garg
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21382.1.patch, HIVE-21382.2.patch, 
> HIVE-21382.2.patch, HIVE-21382.3.patch, HIVE-21382.4.patch, HIVE-21382.5.patch
>
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> {code:sql}
> explain cbo with frequent_ss_items as 
>  (select substr(i_item_desc,1,30) itemdesc,i_item_sk item_sk,d_date 
> solddate,count(*) cnt
>   from store_sales
>   ,date_dim 
>   ,item
>   where ss_sold_date_sk = d_date_sk
> and ss_item_sk = i_item_sk 
> and d_year in (1999,1999+1,1999+2,1999+3)
>   group by substr(i_item_desc,1,30),i_item_sk,d_date
>   having count(*) >4)
> select  sum(sales)
>  from ((select cs_quantity*cs_list_price sales
>from catalog_sales
>,date_dim 
>where d_year = 1999 
>  and d_moy = 1 
>  and cs_sold_date_sk = d_date_sk 
>  and cs_item_sk in (select item_sk from frequent_ss_items))) subq 
> limit 100;
> {code}
> {code:sql}
> HiveSortLimit(fetch=[100])
>   HiveProject($f0=[$0])
> HiveAggregate(group=[{}], agg#0=[sum($0)])
>   HiveProject(sales=[*(CAST($2):DECIMAL(10, 0), $3)])
> HiveSemiJoin(condition=[=($1, $5)], joinType=[inner])
>   HiveJoin(condition=[=($0, $4)], joinType=[inner], algorithm=[none], 
> cost=[{2.0 rows, 0.0 cpu, 0.0 io}])
> HiveProject(cs_sold_date_sk=[$0], cs_item_sk=[$15], 
> cs_quantity=[$18], cs_list_price=[$20])
>   HiveFilter(condition=[IS NOT NULL($0)])
> HiveTableScan(table=[[perf_constraints, catalog_sales]], 
> table:alias=[catalog_sales])
> HiveProject(d_date_sk=[$0])
>   HiveFilter(condition=[AND(=($6, 1999), =($8, 1))])
> HiveTableScan(table=[[perf_constraints, date_dim]], 
> table:alias=[date_dim])
>   HiveProject(i_item_sk=[$1])
> HiveFilter(condition=[>($3, 4)])
>   HiveProject(substr=[$2], i_item_sk=[$1], d_date=[$0], $f3=[$3])
> HiveAggregate(group=[{3, 4, 5}], agg#0=[count()])
>   HiveJoin(condition=[=($1, $4)], joinType=[inner], 
> algorithm=[none], cost=[{2.0 rows, 0.0 cpu, 0.0 io}])
> HiveJoin(condition=[=($0, $2)], joinType=[inner], 
> algorithm=[none], cost=[{2.0 rows, 0.0 cpu, 0.0 io}])
>   HiveProject(ss_sold_date_sk=[$0], ss_item_sk=[$2])
> HiveFilter(condition=[IS NOT NULL($0)])
>   HiveTableScan(table=[[perf_constraints, 
> store_sales]], table:alias=[store_sales])
>   HiveProject(d_date_sk=[$0], d_date=[$2])
> HiveFilter(condition=[IN($6, 1999, 2000, 2001, 2002)])
>   HiveTableScan(table=[[perf_constraints, date_dim]], 
> table:alias=[date_dim])
> HiveProject(i_item_sk=[$0], substr=[substr($4, 1, 30)])
>   HiveTableScan(table=[[perf_constraints, item]], 
> table:alias=[item])
> {code}
> Right side of HiveSemiJoin has an aggregate which could be reduce to have 
> only {{i_item_sk}} as group by key since {{i_item_sk}} is primary key.



--
This message was sent by Atlassian JIRA

[jira] [Commented] (HIVE-21382) Group by keys reduction optimization - keys are not reduced in query23

2019-03-15 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16793344#comment-16793344
 ] 

Hive QA commented on HIVE-21382:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
1s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  8m 
30s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
12s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
40s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  4m 
12s{color} | {color:blue} ql in master has 2257 extant Findbugs warnings. 
{color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
3s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
35s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
11s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m 
11s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
42s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m 
27s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
4s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red}  0m 
14s{color} | {color:red} The patch generated 2 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 25m 27s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests |  asflicense  javac  javadoc  findbugs  checkstyle  compile  |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 
3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/data/hiveptest/working/yetus_PreCommit-HIVE-Build-16520/dev-support/hive-personality.sh
 |
| git revision | master / 3f02ce4 |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.0 |
| asflicense | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-16520/yetus/patch-asflicense-problems.txt
 |
| modules | C: ql U: ql |
| Console output | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-16520/yetus.txt |
| Powered by | Apache Yetushttp://yetus.apache.org |


This message was automatically generated.



> Group by keys reduction optimization - keys are not reduced in query23
> --
>
> Key: HIVE-21382
> URL: https://issues.apache.org/jira/browse/HIVE-21382
> Project: Hive
>  Issue Type: Improvement
>Reporter: Vineet Garg
>Assignee: Vineet Garg
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21382.1.patch, HIVE-21382.2.patch, 
> HIVE-21382.2.patch, HIVE-21382.3.patch, HIVE-21382.4.patch, HIVE-21382.5.patch
>
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> {code:sql}
> explain cbo with frequent_ss_items as 
>  (select substr(i_item_desc,1,30) itemdesc,i_item_sk item_sk,d_date 
> solddate,count(*) cnt
>   from store_sales
>   ,date_dim 
>   ,item
>   where ss_sold_date_sk = d_date_sk
> and ss_item_sk = i_item_sk 
> and d_year in (1999,1999+1,1999+2,1999+3)
>   group by substr(i_item_desc,1,30),i_item_sk,d_date
>   having count(*) >4)
> select  sum(sales)
>  from ((select cs_quantity*cs_list_price sales
>from catalog_sales
>,date_dim 
>where d_year = 1999 
>  and d_moy = 1 
>  and cs_sold_date_sk = d_date_sk 
>  and cs_item_sk in (select item_sk from frequent_ss_items))) subq 
> limit 100;
> {code}
> {code:sql}
> HiveSortLimit(fetch=[100])
>   HiveProject($f0=[$0])
> HiveAggregate(group=[{}], agg#0=[sum($0)])
>   HiveProject(sales=[*(CAST($2):DECIMAL(10, 0), $3)])
> HiveSemiJoin(condition=[=($1, $5)], joinType=[inner])
>   

[jira] [Commented] (HIVE-21382) Group by keys reduction optimization - keys are not reduced in query23

2019-03-14 Thread Vineet Garg (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16792997#comment-16792997
 ] 

Vineet Garg commented on HIVE-21382:


Thanks for reviewing [~jcamachorodriguez]. I have addressed the comment and I 
have uploaded a new patch.

> Group by keys reduction optimization - keys are not reduced in query23
> --
>
> Key: HIVE-21382
> URL: https://issues.apache.org/jira/browse/HIVE-21382
> Project: Hive
>  Issue Type: Improvement
>Reporter: Vineet Garg
>Assignee: Vineet Garg
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21382.1.patch, HIVE-21382.2.patch, 
> HIVE-21382.2.patch, HIVE-21382.3.patch, HIVE-21382.4.patch, HIVE-21382.5.patch
>
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> {code:sql}
> explain cbo with frequent_ss_items as 
>  (select substr(i_item_desc,1,30) itemdesc,i_item_sk item_sk,d_date 
> solddate,count(*) cnt
>   from store_sales
>   ,date_dim 
>   ,item
>   where ss_sold_date_sk = d_date_sk
> and ss_item_sk = i_item_sk 
> and d_year in (1999,1999+1,1999+2,1999+3)
>   group by substr(i_item_desc,1,30),i_item_sk,d_date
>   having count(*) >4)
> select  sum(sales)
>  from ((select cs_quantity*cs_list_price sales
>from catalog_sales
>,date_dim 
>where d_year = 1999 
>  and d_moy = 1 
>  and cs_sold_date_sk = d_date_sk 
>  and cs_item_sk in (select item_sk from frequent_ss_items))) subq 
> limit 100;
> {code}
> {code:sql}
> HiveSortLimit(fetch=[100])
>   HiveProject($f0=[$0])
> HiveAggregate(group=[{}], agg#0=[sum($0)])
>   HiveProject(sales=[*(CAST($2):DECIMAL(10, 0), $3)])
> HiveSemiJoin(condition=[=($1, $5)], joinType=[inner])
>   HiveJoin(condition=[=($0, $4)], joinType=[inner], algorithm=[none], 
> cost=[{2.0 rows, 0.0 cpu, 0.0 io}])
> HiveProject(cs_sold_date_sk=[$0], cs_item_sk=[$15], 
> cs_quantity=[$18], cs_list_price=[$20])
>   HiveFilter(condition=[IS NOT NULL($0)])
> HiveTableScan(table=[[perf_constraints, catalog_sales]], 
> table:alias=[catalog_sales])
> HiveProject(d_date_sk=[$0])
>   HiveFilter(condition=[AND(=($6, 1999), =($8, 1))])
> HiveTableScan(table=[[perf_constraints, date_dim]], 
> table:alias=[date_dim])
>   HiveProject(i_item_sk=[$1])
> HiveFilter(condition=[>($3, 4)])
>   HiveProject(substr=[$2], i_item_sk=[$1], d_date=[$0], $f3=[$3])
> HiveAggregate(group=[{3, 4, 5}], agg#0=[count()])
>   HiveJoin(condition=[=($1, $4)], joinType=[inner], 
> algorithm=[none], cost=[{2.0 rows, 0.0 cpu, 0.0 io}])
> HiveJoin(condition=[=($0, $2)], joinType=[inner], 
> algorithm=[none], cost=[{2.0 rows, 0.0 cpu, 0.0 io}])
>   HiveProject(ss_sold_date_sk=[$0], ss_item_sk=[$2])
> HiveFilter(condition=[IS NOT NULL($0)])
>   HiveTableScan(table=[[perf_constraints, 
> store_sales]], table:alias=[store_sales])
>   HiveProject(d_date_sk=[$0], d_date=[$2])
> HiveFilter(condition=[IN($6, 1999, 2000, 2001, 2002)])
>   HiveTableScan(table=[[perf_constraints, date_dim]], 
> table:alias=[date_dim])
> HiveProject(i_item_sk=[$0], substr=[substr($4, 1, 30)])
>   HiveTableScan(table=[[perf_constraints, item]], 
> table:alias=[item])
> {code}
> Right side of HiveSemiJoin has an aggregate which could be reduce to have 
> only {{i_item_sk}} as group by key since {{i_item_sk}} is primary key.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-21382) Group by keys reduction optimization - keys are not reduced in query23

2019-03-14 Thread Jesus Camacho Rodriguez (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16792827#comment-16792827
 ] 

Jesus Camacho Rodriguez commented on HIVE-21382:


Left a minor comment on last PR.

+1 (pending clean run)

Thanks [~vgarg]

> Group by keys reduction optimization - keys are not reduced in query23
> --
>
> Key: HIVE-21382
> URL: https://issues.apache.org/jira/browse/HIVE-21382
> Project: Hive
>  Issue Type: Improvement
>Reporter: Vineet Garg
>Assignee: Vineet Garg
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21382.1.patch, HIVE-21382.2.patch, 
> HIVE-21382.2.patch, HIVE-21382.3.patch, HIVE-21382.4.patch
>
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> {code:sql}
> explain cbo with frequent_ss_items as 
>  (select substr(i_item_desc,1,30) itemdesc,i_item_sk item_sk,d_date 
> solddate,count(*) cnt
>   from store_sales
>   ,date_dim 
>   ,item
>   where ss_sold_date_sk = d_date_sk
> and ss_item_sk = i_item_sk 
> and d_year in (1999,1999+1,1999+2,1999+3)
>   group by substr(i_item_desc,1,30),i_item_sk,d_date
>   having count(*) >4)
> select  sum(sales)
>  from ((select cs_quantity*cs_list_price sales
>from catalog_sales
>,date_dim 
>where d_year = 1999 
>  and d_moy = 1 
>  and cs_sold_date_sk = d_date_sk 
>  and cs_item_sk in (select item_sk from frequent_ss_items))) subq 
> limit 100;
> {code}
> {code:sql}
> HiveSortLimit(fetch=[100])
>   HiveProject($f0=[$0])
> HiveAggregate(group=[{}], agg#0=[sum($0)])
>   HiveProject(sales=[*(CAST($2):DECIMAL(10, 0), $3)])
> HiveSemiJoin(condition=[=($1, $5)], joinType=[inner])
>   HiveJoin(condition=[=($0, $4)], joinType=[inner], algorithm=[none], 
> cost=[{2.0 rows, 0.0 cpu, 0.0 io}])
> HiveProject(cs_sold_date_sk=[$0], cs_item_sk=[$15], 
> cs_quantity=[$18], cs_list_price=[$20])
>   HiveFilter(condition=[IS NOT NULL($0)])
> HiveTableScan(table=[[perf_constraints, catalog_sales]], 
> table:alias=[catalog_sales])
> HiveProject(d_date_sk=[$0])
>   HiveFilter(condition=[AND(=($6, 1999), =($8, 1))])
> HiveTableScan(table=[[perf_constraints, date_dim]], 
> table:alias=[date_dim])
>   HiveProject(i_item_sk=[$1])
> HiveFilter(condition=[>($3, 4)])
>   HiveProject(substr=[$2], i_item_sk=[$1], d_date=[$0], $f3=[$3])
> HiveAggregate(group=[{3, 4, 5}], agg#0=[count()])
>   HiveJoin(condition=[=($1, $4)], joinType=[inner], 
> algorithm=[none], cost=[{2.0 rows, 0.0 cpu, 0.0 io}])
> HiveJoin(condition=[=($0, $2)], joinType=[inner], 
> algorithm=[none], cost=[{2.0 rows, 0.0 cpu, 0.0 io}])
>   HiveProject(ss_sold_date_sk=[$0], ss_item_sk=[$2])
> HiveFilter(condition=[IS NOT NULL($0)])
>   HiveTableScan(table=[[perf_constraints, 
> store_sales]], table:alias=[store_sales])
>   HiveProject(d_date_sk=[$0], d_date=[$2])
> HiveFilter(condition=[IN($6, 1999, 2000, 2001, 2002)])
>   HiveTableScan(table=[[perf_constraints, date_dim]], 
> table:alias=[date_dim])
> HiveProject(i_item_sk=[$0], substr=[substr($4, 1, 30)])
>   HiveTableScan(table=[[perf_constraints, item]], 
> table:alias=[item])
> {code}
> Right side of HiveSemiJoin has an aggregate which could be reduce to have 
> only {{i_item_sk}} as group by key since {{i_item_sk}} is primary key.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-21382) Group by keys reduction optimization - keys are not reduced in query23

2019-03-14 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16792599#comment-16792599
 ] 

Hive QA commented on HIVE-21382:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12962427/HIVE-21382.4.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 15827 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[dynamic_semijoin_reduction_sw]
 (batchId=161)
{noformat}

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-Build/16504/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/16504/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-16504/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12962427 - PreCommit-HIVE-Build

> Group by keys reduction optimization - keys are not reduced in query23
> --
>
> Key: HIVE-21382
> URL: https://issues.apache.org/jira/browse/HIVE-21382
> Project: Hive
>  Issue Type: Improvement
>Reporter: Vineet Garg
>Assignee: Vineet Garg
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21382.1.patch, HIVE-21382.2.patch, 
> HIVE-21382.2.patch, HIVE-21382.3.patch, HIVE-21382.4.patch
>
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> {code:sql}
> explain cbo with frequent_ss_items as 
>  (select substr(i_item_desc,1,30) itemdesc,i_item_sk item_sk,d_date 
> solddate,count(*) cnt
>   from store_sales
>   ,date_dim 
>   ,item
>   where ss_sold_date_sk = d_date_sk
> and ss_item_sk = i_item_sk 
> and d_year in (1999,1999+1,1999+2,1999+3)
>   group by substr(i_item_desc,1,30),i_item_sk,d_date
>   having count(*) >4)
> select  sum(sales)
>  from ((select cs_quantity*cs_list_price sales
>from catalog_sales
>,date_dim 
>where d_year = 1999 
>  and d_moy = 1 
>  and cs_sold_date_sk = d_date_sk 
>  and cs_item_sk in (select item_sk from frequent_ss_items))) subq 
> limit 100;
> {code}
> {code:sql}
> HiveSortLimit(fetch=[100])
>   HiveProject($f0=[$0])
> HiveAggregate(group=[{}], agg#0=[sum($0)])
>   HiveProject(sales=[*(CAST($2):DECIMAL(10, 0), $3)])
> HiveSemiJoin(condition=[=($1, $5)], joinType=[inner])
>   HiveJoin(condition=[=($0, $4)], joinType=[inner], algorithm=[none], 
> cost=[{2.0 rows, 0.0 cpu, 0.0 io}])
> HiveProject(cs_sold_date_sk=[$0], cs_item_sk=[$15], 
> cs_quantity=[$18], cs_list_price=[$20])
>   HiveFilter(condition=[IS NOT NULL($0)])
> HiveTableScan(table=[[perf_constraints, catalog_sales]], 
> table:alias=[catalog_sales])
> HiveProject(d_date_sk=[$0])
>   HiveFilter(condition=[AND(=($6, 1999), =($8, 1))])
> HiveTableScan(table=[[perf_constraints, date_dim]], 
> table:alias=[date_dim])
>   HiveProject(i_item_sk=[$1])
> HiveFilter(condition=[>($3, 4)])
>   HiveProject(substr=[$2], i_item_sk=[$1], d_date=[$0], $f3=[$3])
> HiveAggregate(group=[{3, 4, 5}], agg#0=[count()])
>   HiveJoin(condition=[=($1, $4)], joinType=[inner], 
> algorithm=[none], cost=[{2.0 rows, 0.0 cpu, 0.0 io}])
> HiveJoin(condition=[=($0, $2)], joinType=[inner], 
> algorithm=[none], cost=[{2.0 rows, 0.0 cpu, 0.0 io}])
>   HiveProject(ss_sold_date_sk=[$0], ss_item_sk=[$2])
> HiveFilter(condition=[IS NOT NULL($0)])
>   HiveTableScan(table=[[perf_constraints, 
> store_sales]], table:alias=[store_sales])
>   HiveProject(d_date_sk=[$0], d_date=[$2])
> HiveFilter(condition=[IN($6, 1999, 2000, 2001, 2002)])
>   HiveTableScan(table=[[perf_constraints, date_dim]], 
> table:alias=[date_dim])
> HiveProject(i_item_sk=[$0], substr=[substr($4, 1, 30)])
>   HiveTableScan(table=[[perf_constraints, item]], 
> table:alias=[item])
> {code}
> Right side of HiveSemiJoin has an aggregate which could be reduce to have 
> only {{i_item_sk}} as group by key since {{i_item_sk}} is primary key.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-21382) Group by keys reduction optimization - keys are not reduced in query23

2019-03-14 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16792566#comment-16792566
 ] 

Hive QA commented on HIVE-21382:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  8m 
41s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
14s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
42s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  4m 
11s{color} | {color:blue} ql in master has 2257 extant Findbugs warnings. 
{color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
2s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
42s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
14s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m 
14s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
43s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m 
17s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
5s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red}  0m 
15s{color} | {color:red} The patch generated 2 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 25m 44s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests |  asflicense  javac  javadoc  findbugs  checkstyle  compile  |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 
3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/data/hiveptest/working/yetus_PreCommit-HIVE-Build-16504/dev-support/hive-personality.sh
 |
| git revision | master / a2892cd |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.0 |
| asflicense | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-16504/yetus/patch-asflicense-problems.txt
 |
| modules | C: ql U: ql |
| Console output | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-16504/yetus.txt |
| Powered by | Apache Yetushttp://yetus.apache.org |


This message was automatically generated.



> Group by keys reduction optimization - keys are not reduced in query23
> --
>
> Key: HIVE-21382
> URL: https://issues.apache.org/jira/browse/HIVE-21382
> Project: Hive
>  Issue Type: Improvement
>Reporter: Vineet Garg
>Assignee: Vineet Garg
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21382.1.patch, HIVE-21382.2.patch, 
> HIVE-21382.2.patch, HIVE-21382.3.patch, HIVE-21382.4.patch
>
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> {code:sql}
> explain cbo with frequent_ss_items as 
>  (select substr(i_item_desc,1,30) itemdesc,i_item_sk item_sk,d_date 
> solddate,count(*) cnt
>   from store_sales
>   ,date_dim 
>   ,item
>   where ss_sold_date_sk = d_date_sk
> and ss_item_sk = i_item_sk 
> and d_year in (1999,1999+1,1999+2,1999+3)
>   group by substr(i_item_desc,1,30),i_item_sk,d_date
>   having count(*) >4)
> select  sum(sales)
>  from ((select cs_quantity*cs_list_price sales
>from catalog_sales
>,date_dim 
>where d_year = 1999 
>  and d_moy = 1 
>  and cs_sold_date_sk = d_date_sk 
>  and cs_item_sk in (select item_sk from frequent_ss_items))) subq 
> limit 100;
> {code}
> {code:sql}
> HiveSortLimit(fetch=[100])
>   HiveProject($f0=[$0])
> HiveAggregate(group=[{}], agg#0=[sum($0)])
>   HiveProject(sales=[*(CAST($2):DECIMAL(10, 0), $3)])
> HiveSemiJoin(condition=[=($1, $5)], joinType=[inner])
>   

[jira] [Commented] (HIVE-21382) Group by keys reduction optimization - keys are not reduced in query23

2019-03-12 Thread Vineet Garg (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16791025#comment-16791025
 ] 

Vineet Garg commented on HIVE-21382:


[~jcamachorodriguez] Can you take a look: 
https://github.com/apache/hive/pull/567

> Group by keys reduction optimization - keys are not reduced in query23
> --
>
> Key: HIVE-21382
> URL: https://issues.apache.org/jira/browse/HIVE-21382
> Project: Hive
>  Issue Type: Improvement
>Reporter: Vineet Garg
>Assignee: Vineet Garg
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21382.1.patch, HIVE-21382.2.patch, 
> HIVE-21382.2.patch, HIVE-21382.3.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> {code:sql}
> explain cbo with frequent_ss_items as 
>  (select substr(i_item_desc,1,30) itemdesc,i_item_sk item_sk,d_date 
> solddate,count(*) cnt
>   from store_sales
>   ,date_dim 
>   ,item
>   where ss_sold_date_sk = d_date_sk
> and ss_item_sk = i_item_sk 
> and d_year in (1999,1999+1,1999+2,1999+3)
>   group by substr(i_item_desc,1,30),i_item_sk,d_date
>   having count(*) >4)
> select  sum(sales)
>  from ((select cs_quantity*cs_list_price sales
>from catalog_sales
>,date_dim 
>where d_year = 1999 
>  and d_moy = 1 
>  and cs_sold_date_sk = d_date_sk 
>  and cs_item_sk in (select item_sk from frequent_ss_items))) subq 
> limit 100;
> {code}
> {code:sql}
> HiveSortLimit(fetch=[100])
>   HiveProject($f0=[$0])
> HiveAggregate(group=[{}], agg#0=[sum($0)])
>   HiveProject(sales=[*(CAST($2):DECIMAL(10, 0), $3)])
> HiveSemiJoin(condition=[=($1, $5)], joinType=[inner])
>   HiveJoin(condition=[=($0, $4)], joinType=[inner], algorithm=[none], 
> cost=[{2.0 rows, 0.0 cpu, 0.0 io}])
> HiveProject(cs_sold_date_sk=[$0], cs_item_sk=[$15], 
> cs_quantity=[$18], cs_list_price=[$20])
>   HiveFilter(condition=[IS NOT NULL($0)])
> HiveTableScan(table=[[perf_constraints, catalog_sales]], 
> table:alias=[catalog_sales])
> HiveProject(d_date_sk=[$0])
>   HiveFilter(condition=[AND(=($6, 1999), =($8, 1))])
> HiveTableScan(table=[[perf_constraints, date_dim]], 
> table:alias=[date_dim])
>   HiveProject(i_item_sk=[$1])
> HiveFilter(condition=[>($3, 4)])
>   HiveProject(substr=[$2], i_item_sk=[$1], d_date=[$0], $f3=[$3])
> HiveAggregate(group=[{3, 4, 5}], agg#0=[count()])
>   HiveJoin(condition=[=($1, $4)], joinType=[inner], 
> algorithm=[none], cost=[{2.0 rows, 0.0 cpu, 0.0 io}])
> HiveJoin(condition=[=($0, $2)], joinType=[inner], 
> algorithm=[none], cost=[{2.0 rows, 0.0 cpu, 0.0 io}])
>   HiveProject(ss_sold_date_sk=[$0], ss_item_sk=[$2])
> HiveFilter(condition=[IS NOT NULL($0)])
>   HiveTableScan(table=[[perf_constraints, 
> store_sales]], table:alias=[store_sales])
>   HiveProject(d_date_sk=[$0], d_date=[$2])
> HiveFilter(condition=[IN($6, 1999, 2000, 2001, 2002)])
>   HiveTableScan(table=[[perf_constraints, date_dim]], 
> table:alias=[date_dim])
> HiveProject(i_item_sk=[$0], substr=[substr($4, 1, 30)])
>   HiveTableScan(table=[[perf_constraints, item]], 
> table:alias=[item])
> {code}
> Right side of HiveSemiJoin has an aggregate which could be reduce to have 
> only {{i_item_sk}} as group by key since {{i_item_sk}} is primary key.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-21382) Group by keys reduction optimization - keys are not reduced in query23

2019-03-12 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16790288#comment-16790288
 ] 

Hive QA commented on HIVE-21382:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12962070/HIVE-21382.3.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:green}SUCCESS:{color} +1 due to 15824 tests passed

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-Build/16460/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/16460/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-16460/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12962070 - PreCommit-HIVE-Build

> Group by keys reduction optimization - keys are not reduced in query23
> --
>
> Key: HIVE-21382
> URL: https://issues.apache.org/jira/browse/HIVE-21382
> Project: Hive
>  Issue Type: Improvement
>Reporter: Vineet Garg
>Assignee: Vineet Garg
>Priority: Major
> Attachments: HIVE-21382.1.patch, HIVE-21382.2.patch, 
> HIVE-21382.2.patch, HIVE-21382.3.patch
>
>
> {code:sql}
> explain cbo with frequent_ss_items as 
>  (select substr(i_item_desc,1,30) itemdesc,i_item_sk item_sk,d_date 
> solddate,count(*) cnt
>   from store_sales
>   ,date_dim 
>   ,item
>   where ss_sold_date_sk = d_date_sk
> and ss_item_sk = i_item_sk 
> and d_year in (1999,1999+1,1999+2,1999+3)
>   group by substr(i_item_desc,1,30),i_item_sk,d_date
>   having count(*) >4)
> select  sum(sales)
>  from ((select cs_quantity*cs_list_price sales
>from catalog_sales
>,date_dim 
>where d_year = 1999 
>  and d_moy = 1 
>  and cs_sold_date_sk = d_date_sk 
>  and cs_item_sk in (select item_sk from frequent_ss_items))) subq 
> limit 100;
> {code}
> {code:sql}
> HiveSortLimit(fetch=[100])
>   HiveProject($f0=[$0])
> HiveAggregate(group=[{}], agg#0=[sum($0)])
>   HiveProject(sales=[*(CAST($2):DECIMAL(10, 0), $3)])
> HiveSemiJoin(condition=[=($1, $5)], joinType=[inner])
>   HiveJoin(condition=[=($0, $4)], joinType=[inner], algorithm=[none], 
> cost=[{2.0 rows, 0.0 cpu, 0.0 io}])
> HiveProject(cs_sold_date_sk=[$0], cs_item_sk=[$15], 
> cs_quantity=[$18], cs_list_price=[$20])
>   HiveFilter(condition=[IS NOT NULL($0)])
> HiveTableScan(table=[[perf_constraints, catalog_sales]], 
> table:alias=[catalog_sales])
> HiveProject(d_date_sk=[$0])
>   HiveFilter(condition=[AND(=($6, 1999), =($8, 1))])
> HiveTableScan(table=[[perf_constraints, date_dim]], 
> table:alias=[date_dim])
>   HiveProject(i_item_sk=[$1])
> HiveFilter(condition=[>($3, 4)])
>   HiveProject(substr=[$2], i_item_sk=[$1], d_date=[$0], $f3=[$3])
> HiveAggregate(group=[{3, 4, 5}], agg#0=[count()])
>   HiveJoin(condition=[=($1, $4)], joinType=[inner], 
> algorithm=[none], cost=[{2.0 rows, 0.0 cpu, 0.0 io}])
> HiveJoin(condition=[=($0, $2)], joinType=[inner], 
> algorithm=[none], cost=[{2.0 rows, 0.0 cpu, 0.0 io}])
>   HiveProject(ss_sold_date_sk=[$0], ss_item_sk=[$2])
> HiveFilter(condition=[IS NOT NULL($0)])
>   HiveTableScan(table=[[perf_constraints, 
> store_sales]], table:alias=[store_sales])
>   HiveProject(d_date_sk=[$0], d_date=[$2])
> HiveFilter(condition=[IN($6, 1999, 2000, 2001, 2002)])
>   HiveTableScan(table=[[perf_constraints, date_dim]], 
> table:alias=[date_dim])
> HiveProject(i_item_sk=[$0], substr=[substr($4, 1, 30)])
>   HiveTableScan(table=[[perf_constraints, item]], 
> table:alias=[item])
> {code}
> Right side of HiveSemiJoin has an aggregate which could be reduce to have 
> only {{i_item_sk}} as group by key since {{i_item_sk}} is primary key.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-21382) Group by keys reduction optimization - keys are not reduced in query23

2019-03-12 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16790265#comment-16790265
 ] 

Hive QA commented on HIVE-21382:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  8m 
20s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
11s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
39s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  4m 
10s{color} | {color:blue} ql in master has 2258 extant Findbugs warnings. 
{color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
2s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
33s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
12s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m 
12s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  0m 
40s{color} | {color:red} ql: The patch generated 4 new + 1 unchanged - 0 fixed 
= 5 total (was 1) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m 
23s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
1s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
15s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 25m  6s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests |  asflicense  javac  javadoc  findbugs  checkstyle  compile  |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 
3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/data/hiveptest/working/yetus_PreCommit-HIVE-Build-16460/dev-support/hive-personality.sh
 |
| git revision | master / 9f2f101 |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.0 |
| checkstyle | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-16460/yetus/diff-checkstyle-ql.txt
 |
| modules | C: ql U: ql |
| Console output | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-16460/yetus.txt |
| Powered by | Apache Yetushttp://yetus.apache.org |


This message was automatically generated.



> Group by keys reduction optimization - keys are not reduced in query23
> --
>
> Key: HIVE-21382
> URL: https://issues.apache.org/jira/browse/HIVE-21382
> Project: Hive
>  Issue Type: Improvement
>Reporter: Vineet Garg
>Assignee: Vineet Garg
>Priority: Major
> Attachments: HIVE-21382.1.patch, HIVE-21382.2.patch, 
> HIVE-21382.2.patch, HIVE-21382.3.patch
>
>
> {code:sql}
> explain cbo with frequent_ss_items as 
>  (select substr(i_item_desc,1,30) itemdesc,i_item_sk item_sk,d_date 
> solddate,count(*) cnt
>   from store_sales
>   ,date_dim 
>   ,item
>   where ss_sold_date_sk = d_date_sk
> and ss_item_sk = i_item_sk 
> and d_year in (1999,1999+1,1999+2,1999+3)
>   group by substr(i_item_desc,1,30),i_item_sk,d_date
>   having count(*) >4)
> select  sum(sales)
>  from ((select cs_quantity*cs_list_price sales
>from catalog_sales
>,date_dim 
>where d_year = 1999 
>  and d_moy = 1 
>  and cs_sold_date_sk = d_date_sk 
>  and cs_item_sk in (select item_sk from frequent_ss_items))) subq 
> limit 100;
> {code}
> {code:sql}
> HiveSortLimit(fetch=[100])
>   HiveProject($f0=[$0])
> HiveAggregate(group=[{}], agg#0=[sum($0)])
>   HiveProject(sales=[*(CAST($2):DECIMAL(10, 0), $3)])
> HiveSemiJoin(condition=[=($1, $5)], joinType=[inner])
>   HiveJoin(condition=[=($0, $4)], joinType=[inner], algorithm=[none], 
> 

[jira] [Commented] (HIVE-21382) Group by keys reduction optimization - keys are not reduced in query23

2019-03-11 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16790224#comment-16790224
 ] 

Hive QA commented on HIVE-21382:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12962042/HIVE-21382.3.patch

{color:red}ERROR:{color} -1 due to build exiting with an error

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-Build/16458/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/16458/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-16458/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Tests exited with: NonZeroExitCodeException
Command 'bash /data/hiveptest/working/scratch/source-prep.sh' failed with exit 
status 1 and output '+ date '+%Y-%m-%d %T.%3N'
2019-03-12 04:32:12.577
+ [[ -n /usr/lib/jvm/java-8-openjdk-amd64 ]]
+ export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ export 
PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ 
PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
+ export 'MAVEN_OPTS=-Xmx1g '
+ MAVEN_OPTS='-Xmx1g '
+ cd /data/hiveptest/working/
+ tee /data/hiveptest/logs/PreCommit-HIVE-Build-16458/source-prep.txt
+ [[ false == \t\r\u\e ]]
+ mkdir -p maven ivy
+ [[ git = \s\v\n ]]
+ [[ git = \g\i\t ]]
+ [[ -z master ]]
+ [[ -d apache-github-source-source ]]
+ [[ ! -d apache-github-source-source/.git ]]
+ [[ ! -d apache-github-source-source ]]
+ date '+%Y-%m-%d %T.%3N'
2019-03-12 04:32:12.580
+ cd apache-github-source-source
+ git fetch origin
+ git reset --hard HEAD
HEAD is now at 9f2f101 HIVE-21388: Constant UDF is not pushed to JDBCStorage 
Handler (Jesus Camacho Rodriguez, reviewed by Jason Dere)
+ git clean -f -d
Removing standalone-metastore/metastore-server/src/gen/
+ git checkout master
Already on 'master'
Your branch is up-to-date with 'origin/master'.
+ git reset --hard origin/master
HEAD is now at 9f2f101 HIVE-21388: Constant UDF is not pushed to JDBCStorage 
Handler (Jesus Camacho Rodriguez, reviewed by Jason Dere)
+ git merge --ff-only origin/master
Already up-to-date.
+ date '+%Y-%m-%d %T.%3N'
2019-03-12 04:32:13.469
+ rm -rf ../yetus_PreCommit-HIVE-Build-16458
+ mkdir ../yetus_PreCommit-HIVE-Build-16458
+ git gc
+ cp -R . ../yetus_PreCommit-HIVE-Build-16458
+ mkdir /data/hiveptest/logs/PreCommit-HIVE-Build-16458/yetus
+ patchCommandPath=/data/hiveptest/working/scratch/smart-apply-patch.sh
+ patchFilePath=/data/hiveptest/working/scratch/build.patch
+ [[ -f /data/hiveptest/working/scratch/build.patch ]]
+ chmod +x /data/hiveptest/working/scratch/smart-apply-patch.sh
+ /data/hiveptest/working/scratch/smart-apply-patch.sh 
/data/hiveptest/working/scratch/build.patch
error: 
a/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveRelFieldTrimmer.java:
 does not exist in index
error: a/ql/src/test/queries/clientpositive/constraints_optimization.q: does 
not exist in index
error: 
a/ql/src/test/results/clientpositive/llap/constraints_optimization.q.out: does 
not exist in index
error: 
a/ql/src/test/results/clientpositive/perf/tez/constraints/cbo_query23.q.out: 
does not exist in index
error: 
a/ql/src/test/results/clientpositive/perf/tez/constraints/cbo_query39.q.out: 
does not exist in index
error: 
a/ql/src/test/results/clientpositive/perf/tez/constraints/cbo_query64.q.out: 
does not exist in index
error: a/ql/src/test/results/clientpositive/perf/tez/constraints/query23.q.out: 
does not exist in index
error: a/ql/src/test/results/clientpositive/perf/tez/constraints/query39.q.out: 
does not exist in index
error: a/ql/src/test/results/clientpositive/perf/tez/constraints/query64.q.out: 
does not exist in index
Going to apply patch with: git apply -p1
+ [[ maven == \m\a\v\e\n ]]
+ rm -rf /data/hiveptest/working/maven/org/apache/hive
+ mvn -B clean install -DskipTests -T 4 -q 
-Dmaven.repo.local=/data/hiveptest/working/maven
protoc-jar: executing: [/tmp/protoc5885607783622954764.exe, --version]
protoc-jar: executing: [/tmp/protoc5885607783622954764.exe, 
-I/data/hiveptest/working/apache-github-source-source/standalone-metastore/metastore-common/src/main/protobuf/org/apache/hadoop/hive/metastore,
 
--java_out=/data/hiveptest/working/apache-github-source-source/standalone-metastore/metastore-common/target/generated-sources,
 
/data/hiveptest/working/apache-github-source-source/standalone-metastore/metastore-common/src/main/protobuf/org/apache/hadoop/hive/metastore/metastore.proto]
libprotoc 2.5.0
ANTLR Parser Generator  Version 3.5.2
[ERROR] Failed to execute goal 
org.apache.maven.plugins:maven-remote-resources-plugin:1.5:process 

[jira] [Commented] (HIVE-21382) Group by keys reduction optimization - keys are not reduced in query23

2019-03-08 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16787709#comment-16787709
 ] 

Hive QA commented on HIVE-21382:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12961631/HIVE-21382.2.patch

{color:red}ERROR:{color} -1 due to build exiting with an error

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-Build/16404/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/16404/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-16404/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Tests exited with: Exception: Patch URL 
https://issues.apache.org/jira/secure/attachment/12961631/HIVE-21382.2.patch 
was found in seen patch url's cache and a test was probably run already on it. 
Aborting...
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12961631 - PreCommit-HIVE-Build

> Group by keys reduction optimization - keys are not reduced in query23
> --
>
> Key: HIVE-21382
> URL: https://issues.apache.org/jira/browse/HIVE-21382
> Project: Hive
>  Issue Type: Improvement
>Reporter: Vineet Garg
>Assignee: Vineet Garg
>Priority: Major
> Attachments: HIVE-21382.1.patch, HIVE-21382.2.patch, 
> HIVE-21382.2.patch
>
>
> {code:sql}
> explain cbo with frequent_ss_items as 
>  (select substr(i_item_desc,1,30) itemdesc,i_item_sk item_sk,d_date 
> solddate,count(*) cnt
>   from store_sales
>   ,date_dim 
>   ,item
>   where ss_sold_date_sk = d_date_sk
> and ss_item_sk = i_item_sk 
> and d_year in (1999,1999+1,1999+2,1999+3)
>   group by substr(i_item_desc,1,30),i_item_sk,d_date
>   having count(*) >4)
> select  sum(sales)
>  from ((select cs_quantity*cs_list_price sales
>from catalog_sales
>,date_dim 
>where d_year = 1999 
>  and d_moy = 1 
>  and cs_sold_date_sk = d_date_sk 
>  and cs_item_sk in (select item_sk from frequent_ss_items))) subq 
> limit 100;
> {code}
> {code:sql}
> HiveSortLimit(fetch=[100])
>   HiveProject($f0=[$0])
> HiveAggregate(group=[{}], agg#0=[sum($0)])
>   HiveProject(sales=[*(CAST($2):DECIMAL(10, 0), $3)])
> HiveSemiJoin(condition=[=($1, $5)], joinType=[inner])
>   HiveJoin(condition=[=($0, $4)], joinType=[inner], algorithm=[none], 
> cost=[{2.0 rows, 0.0 cpu, 0.0 io}])
> HiveProject(cs_sold_date_sk=[$0], cs_item_sk=[$15], 
> cs_quantity=[$18], cs_list_price=[$20])
>   HiveFilter(condition=[IS NOT NULL($0)])
> HiveTableScan(table=[[perf_constraints, catalog_sales]], 
> table:alias=[catalog_sales])
> HiveProject(d_date_sk=[$0])
>   HiveFilter(condition=[AND(=($6, 1999), =($8, 1))])
> HiveTableScan(table=[[perf_constraints, date_dim]], 
> table:alias=[date_dim])
>   HiveProject(i_item_sk=[$1])
> HiveFilter(condition=[>($3, 4)])
>   HiveProject(substr=[$2], i_item_sk=[$1], d_date=[$0], $f3=[$3])
> HiveAggregate(group=[{3, 4, 5}], agg#0=[count()])
>   HiveJoin(condition=[=($1, $4)], joinType=[inner], 
> algorithm=[none], cost=[{2.0 rows, 0.0 cpu, 0.0 io}])
> HiveJoin(condition=[=($0, $2)], joinType=[inner], 
> algorithm=[none], cost=[{2.0 rows, 0.0 cpu, 0.0 io}])
>   HiveProject(ss_sold_date_sk=[$0], ss_item_sk=[$2])
> HiveFilter(condition=[IS NOT NULL($0)])
>   HiveTableScan(table=[[perf_constraints, 
> store_sales]], table:alias=[store_sales])
>   HiveProject(d_date_sk=[$0], d_date=[$2])
> HiveFilter(condition=[IN($6, 1999, 2000, 2001, 2002)])
>   HiveTableScan(table=[[perf_constraints, date_dim]], 
> table:alias=[date_dim])
> HiveProject(i_item_sk=[$0], substr=[substr($4, 1, 30)])
>   HiveTableScan(table=[[perf_constraints, item]], 
> table:alias=[item])
> {code}
> Right side of HiveSemiJoin has an aggregate which could be reduce to have 
> only {{i_item_sk}} as group by key since {{i_item_sk}} is primary key.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-21382) Group by keys reduction optimization - keys are not reduced in query23

2019-03-08 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16787706#comment-16787706
 ] 

Hive QA commented on HIVE-21382:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12961631/HIVE-21382.2.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 15820 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestTezPerfConstraintsCliDriver.testCliDriver[query23]
 (batchId=275)
org.apache.hive.hcatalog.mapreduce.TestHCatPartitioned.testHCatPartitionedTable[0]
 (batchId=209)
{noformat}

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-Build/16403/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/16403/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-16403/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12961631 - PreCommit-HIVE-Build

> Group by keys reduction optimization - keys are not reduced in query23
> --
>
> Key: HIVE-21382
> URL: https://issues.apache.org/jira/browse/HIVE-21382
> Project: Hive
>  Issue Type: Improvement
>Reporter: Vineet Garg
>Assignee: Vineet Garg
>Priority: Major
> Attachments: HIVE-21382.1.patch, HIVE-21382.2.patch, 
> HIVE-21382.2.patch
>
>
> {code:sql}
> explain cbo with frequent_ss_items as 
>  (select substr(i_item_desc,1,30) itemdesc,i_item_sk item_sk,d_date 
> solddate,count(*) cnt
>   from store_sales
>   ,date_dim 
>   ,item
>   where ss_sold_date_sk = d_date_sk
> and ss_item_sk = i_item_sk 
> and d_year in (1999,1999+1,1999+2,1999+3)
>   group by substr(i_item_desc,1,30),i_item_sk,d_date
>   having count(*) >4)
> select  sum(sales)
>  from ((select cs_quantity*cs_list_price sales
>from catalog_sales
>,date_dim 
>where d_year = 1999 
>  and d_moy = 1 
>  and cs_sold_date_sk = d_date_sk 
>  and cs_item_sk in (select item_sk from frequent_ss_items))) subq 
> limit 100;
> {code}
> {code:sql}
> HiveSortLimit(fetch=[100])
>   HiveProject($f0=[$0])
> HiveAggregate(group=[{}], agg#0=[sum($0)])
>   HiveProject(sales=[*(CAST($2):DECIMAL(10, 0), $3)])
> HiveSemiJoin(condition=[=($1, $5)], joinType=[inner])
>   HiveJoin(condition=[=($0, $4)], joinType=[inner], algorithm=[none], 
> cost=[{2.0 rows, 0.0 cpu, 0.0 io}])
> HiveProject(cs_sold_date_sk=[$0], cs_item_sk=[$15], 
> cs_quantity=[$18], cs_list_price=[$20])
>   HiveFilter(condition=[IS NOT NULL($0)])
> HiveTableScan(table=[[perf_constraints, catalog_sales]], 
> table:alias=[catalog_sales])
> HiveProject(d_date_sk=[$0])
>   HiveFilter(condition=[AND(=($6, 1999), =($8, 1))])
> HiveTableScan(table=[[perf_constraints, date_dim]], 
> table:alias=[date_dim])
>   HiveProject(i_item_sk=[$1])
> HiveFilter(condition=[>($3, 4)])
>   HiveProject(substr=[$2], i_item_sk=[$1], d_date=[$0], $f3=[$3])
> HiveAggregate(group=[{3, 4, 5}], agg#0=[count()])
>   HiveJoin(condition=[=($1, $4)], joinType=[inner], 
> algorithm=[none], cost=[{2.0 rows, 0.0 cpu, 0.0 io}])
> HiveJoin(condition=[=($0, $2)], joinType=[inner], 
> algorithm=[none], cost=[{2.0 rows, 0.0 cpu, 0.0 io}])
>   HiveProject(ss_sold_date_sk=[$0], ss_item_sk=[$2])
> HiveFilter(condition=[IS NOT NULL($0)])
>   HiveTableScan(table=[[perf_constraints, 
> store_sales]], table:alias=[store_sales])
>   HiveProject(d_date_sk=[$0], d_date=[$2])
> HiveFilter(condition=[IN($6, 1999, 2000, 2001, 2002)])
>   HiveTableScan(table=[[perf_constraints, date_dim]], 
> table:alias=[date_dim])
> HiveProject(i_item_sk=[$0], substr=[substr($4, 1, 30)])
>   HiveTableScan(table=[[perf_constraints, item]], 
> table:alias=[item])
> {code}
> Right side of HiveSemiJoin has an aggregate which could be reduce to have 
> only {{i_item_sk}} as group by key since {{i_item_sk}} is primary key.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-21382) Group by keys reduction optimization - keys are not reduced in query23

2019-03-08 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16787678#comment-16787678
 ] 

Hive QA commented on HIVE-21382:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  9m 
11s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
15s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
42s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  4m  
8s{color} | {color:blue} ql in master has 2258 extant Findbugs warnings. 
{color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
4s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
33s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
15s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m 
15s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  0m 
41s{color} | {color:red} ql: The patch generated 4 new + 1 unchanged - 0 fixed 
= 5 total (was 1) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  4m 
19s{color} | {color:red} ql generated 1 new + 2258 unchanged - 0 fixed = 2259 
total (was 2258) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
4s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
14s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 26m  2s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| FindBugs | module:ql |
|  |  Should 
org.apache.hadoop.hive.ql.optimizer.calcite.rules.HiveRelFieldTrimmer$TableRefFinder
 be a _static_ inner class?  At HiveRelFieldTrimmer.java:inner class?  At 
HiveRelFieldTrimmer.java:[lines 325-339] |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests |  asflicense  javac  javadoc  findbugs  checkstyle  compile  |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 
3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/data/hiveptest/working/yetus_PreCommit-HIVE-Build-16403/dev-support/hive-personality.sh
 |
| git revision | master / bd95a2f |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.0 |
| checkstyle | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-16403/yetus/diff-checkstyle-ql.txt
 |
| findbugs | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-16403/yetus/new-findbugs-ql.html
 |
| modules | C: ql U: ql |
| Console output | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-16403/yetus.txt |
| Powered by | Apache Yetushttp://yetus.apache.org |


This message was automatically generated.



> Group by keys reduction optimization - keys are not reduced in query23
> --
>
> Key: HIVE-21382
> URL: https://issues.apache.org/jira/browse/HIVE-21382
> Project: Hive
>  Issue Type: Improvement
>Reporter: Vineet Garg
>Assignee: Vineet Garg
>Priority: Major
> Attachments: HIVE-21382.1.patch, HIVE-21382.2.patch, 
> HIVE-21382.2.patch
>
>
> {code:sql}
> explain cbo with frequent_ss_items as 
>  (select substr(i_item_desc,1,30) itemdesc,i_item_sk item_sk,d_date 
> solddate,count(*) cnt
>   from store_sales
>   ,date_dim 
>   ,item
>   where ss_sold_date_sk = d_date_sk
> and ss_item_sk = i_item_sk 
> and d_year in (1999,1999+1,1999+2,1999+3)
>   group by substr(i_item_desc,1,30),i_item_sk,d_date
>   having count(*) >4)
> select  sum(sales)
>  from ((select cs_quantity*cs_list_price sales
>from catalog_sales
>,date_dim 
>where d_year = 1999 
>  and d_moy = 1 
>  and cs_sold_date_sk = d_date_sk 
>  and cs_item_sk in 

[jira] [Commented] (HIVE-21382) Group by keys reduction optimization - keys are not reduced in query23

2019-03-06 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16786050#comment-16786050
 ] 

Hive QA commented on HIVE-21382:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12961272/HIVE-21382.1.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 15819 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestTezPerfConstraintsCliDriver.testCliDriver[cbo_query39]
 (batchId=275)
org.apache.hadoop.hive.cli.TestTezPerfConstraintsCliDriver.testCliDriver[cbo_query64]
 (batchId=275)
org.apache.hadoop.hive.cli.TestTezPerfConstraintsCliDriver.testCliDriver[query39]
 (batchId=275)
org.apache.hadoop.hive.cli.TestTezPerfConstraintsCliDriver.testCliDriver[query64]
 (batchId=275)
org.apache.hive.hcatalog.mapreduce.TestHCatPartitioned.testHCatPartitionedTable[1]
 (batchId=209)
{noformat}

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-Build/16367/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/16367/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-16367/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 5 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12961272 - PreCommit-HIVE-Build

> Group by keys reduction optimization - keys are not reduced in query23
> --
>
> Key: HIVE-21382
> URL: https://issues.apache.org/jira/browse/HIVE-21382
> Project: Hive
>  Issue Type: Improvement
>Reporter: Vineet Garg
>Assignee: Vineet Garg
>Priority: Major
> Attachments: HIVE-21382.1.patch
>
>
> {code:sql}
> explain cbo with frequent_ss_items as 
>  (select substr(i_item_desc,1,30) itemdesc,i_item_sk item_sk,d_date 
> solddate,count(*) cnt
>   from store_sales
>   ,date_dim 
>   ,item
>   where ss_sold_date_sk = d_date_sk
> and ss_item_sk = i_item_sk 
> and d_year in (1999,1999+1,1999+2,1999+3)
>   group by substr(i_item_desc,1,30),i_item_sk,d_date
>   having count(*) >4)
> select  sum(sales)
>  from ((select cs_quantity*cs_list_price sales
>from catalog_sales
>,date_dim 
>where d_year = 1999 
>  and d_moy = 1 
>  and cs_sold_date_sk = d_date_sk 
>  and cs_item_sk in (select item_sk from frequent_ss_items))) subq 
> limit 100;
> {code}
> {code:sql}
> HiveSortLimit(fetch=[100])
>   HiveProject($f0=[$0])
> HiveAggregate(group=[{}], agg#0=[sum($0)])
>   HiveProject(sales=[*(CAST($2):DECIMAL(10, 0), $3)])
> HiveSemiJoin(condition=[=($1, $5)], joinType=[inner])
>   HiveJoin(condition=[=($0, $4)], joinType=[inner], algorithm=[none], 
> cost=[{2.0 rows, 0.0 cpu, 0.0 io}])
> HiveProject(cs_sold_date_sk=[$0], cs_item_sk=[$15], 
> cs_quantity=[$18], cs_list_price=[$20])
>   HiveFilter(condition=[IS NOT NULL($0)])
> HiveTableScan(table=[[perf_constraints, catalog_sales]], 
> table:alias=[catalog_sales])
> HiveProject(d_date_sk=[$0])
>   HiveFilter(condition=[AND(=($6, 1999), =($8, 1))])
> HiveTableScan(table=[[perf_constraints, date_dim]], 
> table:alias=[date_dim])
>   HiveProject(i_item_sk=[$1])
> HiveFilter(condition=[>($3, 4)])
>   HiveProject(substr=[$2], i_item_sk=[$1], d_date=[$0], $f3=[$3])
> HiveAggregate(group=[{3, 4, 5}], agg#0=[count()])
>   HiveJoin(condition=[=($1, $4)], joinType=[inner], 
> algorithm=[none], cost=[{2.0 rows, 0.0 cpu, 0.0 io}])
> HiveJoin(condition=[=($0, $2)], joinType=[inner], 
> algorithm=[none], cost=[{2.0 rows, 0.0 cpu, 0.0 io}])
>   HiveProject(ss_sold_date_sk=[$0], ss_item_sk=[$2])
> HiveFilter(condition=[IS NOT NULL($0)])
>   HiveTableScan(table=[[perf_constraints, 
> store_sales]], table:alias=[store_sales])
>   HiveProject(d_date_sk=[$0], d_date=[$2])
> HiveFilter(condition=[IN($6, 1999, 2000, 2001, 2002)])
>   HiveTableScan(table=[[perf_constraints, date_dim]], 
> table:alias=[date_dim])
> HiveProject(i_item_sk=[$0], substr=[substr($4, 1, 30)])
>   HiveTableScan(table=[[perf_constraints, item]], 
> table:alias=[item])
> {code}
> Right side of HiveSemiJoin has an aggregate which 

[jira] [Commented] (HIVE-21382) Group by keys reduction optimization - keys are not reduced in query23

2019-03-06 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16785996#comment-16785996
 ] 

Hive QA commented on HIVE-21382:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  8m 
51s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
19s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
44s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  4m 
33s{color} | {color:blue} ql in master has 2251 extant Findbugs warnings. 
{color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
3s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
40s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
19s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m 
19s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  0m 
41s{color} | {color:red} ql: The patch generated 3 new + 1 unchanged - 0 fixed 
= 4 total (was 1) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m 
33s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
5s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
14s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 26m 37s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests |  asflicense  javac  javadoc  findbugs  checkstyle  compile  |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 
3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/data/hiveptest/working/yetus_PreCommit-HIVE-Build-16367/dev-support/hive-personality.sh
 |
| git revision | master / 0413fec |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.0 |
| checkstyle | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-16367/yetus/diff-checkstyle-ql.txt
 |
| modules | C: ql U: ql |
| Console output | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-16367/yetus.txt |
| Powered by | Apache Yetushttp://yetus.apache.org |


This message was automatically generated.



> Group by keys reduction optimization - keys are not reduced in query23
> --
>
> Key: HIVE-21382
> URL: https://issues.apache.org/jira/browse/HIVE-21382
> Project: Hive
>  Issue Type: Improvement
>Reporter: Vineet Garg
>Assignee: Vineet Garg
>Priority: Major
> Attachments: HIVE-21382.1.patch
>
>
> {code:sql}
> explain cbo with frequent_ss_items as 
>  (select substr(i_item_desc,1,30) itemdesc,i_item_sk item_sk,d_date 
> solddate,count(*) cnt
>   from store_sales
>   ,date_dim 
>   ,item
>   where ss_sold_date_sk = d_date_sk
> and ss_item_sk = i_item_sk 
> and d_year in (1999,1999+1,1999+2,1999+3)
>   group by substr(i_item_desc,1,30),i_item_sk,d_date
>   having count(*) >4)
> select  sum(sales)
>  from ((select cs_quantity*cs_list_price sales
>from catalog_sales
>,date_dim 
>where d_year = 1999 
>  and d_moy = 1 
>  and cs_sold_date_sk = d_date_sk 
>  and cs_item_sk in (select item_sk from frequent_ss_items))) subq 
> limit 100;
> {code}
> {code:sql}
> HiveSortLimit(fetch=[100])
>   HiveProject($f0=[$0])
> HiveAggregate(group=[{}], agg#0=[sum($0)])
>   HiveProject(sales=[*(CAST($2):DECIMAL(10, 0), $3)])
> HiveSemiJoin(condition=[=($1, $5)], joinType=[inner])
>   HiveJoin(condition=[=($0, $4)], joinType=[inner], algorithm=[none], 
> cost=[{2.0 rows, 0.0 cpu, 0.0 io}])
> 

[jira] [Commented] (HIVE-21382) Group by keys reduction optimization - keys are not reduced in query23

2019-03-05 Thread Vineet Garg (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16785156#comment-16785156
 ] 

Vineet Garg commented on HIVE-21382:


This patch also fixes HIVE-21387

> Group by keys reduction optimization - keys are not reduced in query23
> --
>
> Key: HIVE-21382
> URL: https://issues.apache.org/jira/browse/HIVE-21382
> Project: Hive
>  Issue Type: Improvement
>Reporter: Vineet Garg
>Assignee: Vineet Garg
>Priority: Major
> Attachments: HIVE-21382.1.patch
>
>
> {code:sql}
> explain cbo with frequent_ss_items as 
>  (select substr(i_item_desc,1,30) itemdesc,i_item_sk item_sk,d_date 
> solddate,count(*) cnt
>   from store_sales
>   ,date_dim 
>   ,item
>   where ss_sold_date_sk = d_date_sk
> and ss_item_sk = i_item_sk 
> and d_year in (1999,1999+1,1999+2,1999+3)
>   group by substr(i_item_desc,1,30),i_item_sk,d_date
>   having count(*) >4)
> select  sum(sales)
>  from ((select cs_quantity*cs_list_price sales
>from catalog_sales
>,date_dim 
>where d_year = 1999 
>  and d_moy = 1 
>  and cs_sold_date_sk = d_date_sk 
>  and cs_item_sk in (select item_sk from frequent_ss_items))) subq 
> limit 100;
> {code}
> {code:sql}
> HiveSortLimit(fetch=[100])
>   HiveProject($f0=[$0])
> HiveAggregate(group=[{}], agg#0=[sum($0)])
>   HiveProject(sales=[*(CAST($2):DECIMAL(10, 0), $3)])
> HiveSemiJoin(condition=[=($1, $5)], joinType=[inner])
>   HiveJoin(condition=[=($0, $4)], joinType=[inner], algorithm=[none], 
> cost=[{2.0 rows, 0.0 cpu, 0.0 io}])
> HiveProject(cs_sold_date_sk=[$0], cs_item_sk=[$15], 
> cs_quantity=[$18], cs_list_price=[$20])
>   HiveFilter(condition=[IS NOT NULL($0)])
> HiveTableScan(table=[[perf_constraints, catalog_sales]], 
> table:alias=[catalog_sales])
> HiveProject(d_date_sk=[$0])
>   HiveFilter(condition=[AND(=($6, 1999), =($8, 1))])
> HiveTableScan(table=[[perf_constraints, date_dim]], 
> table:alias=[date_dim])
>   HiveProject(i_item_sk=[$1])
> HiveFilter(condition=[>($3, 4)])
>   HiveProject(substr=[$2], i_item_sk=[$1], d_date=[$0], $f3=[$3])
> HiveAggregate(group=[{3, 4, 5}], agg#0=[count()])
>   HiveJoin(condition=[=($1, $4)], joinType=[inner], 
> algorithm=[none], cost=[{2.0 rows, 0.0 cpu, 0.0 io}])
> HiveJoin(condition=[=($0, $2)], joinType=[inner], 
> algorithm=[none], cost=[{2.0 rows, 0.0 cpu, 0.0 io}])
>   HiveProject(ss_sold_date_sk=[$0], ss_item_sk=[$2])
> HiveFilter(condition=[IS NOT NULL($0)])
>   HiveTableScan(table=[[perf_constraints, 
> store_sales]], table:alias=[store_sales])
>   HiveProject(d_date_sk=[$0], d_date=[$2])
> HiveFilter(condition=[IN($6, 1999, 2000, 2001, 2002)])
>   HiveTableScan(table=[[perf_constraints, date_dim]], 
> table:alias=[date_dim])
> HiveProject(i_item_sk=[$0], substr=[substr($4, 1, 30)])
>   HiveTableScan(table=[[perf_constraints, item]], 
> table:alias=[item])
> {code}
> Right side of HiveSemiJoin has an aggregate which could be reduce to have 
> only {{i_item_sk}} as group by key since {{i_item_sk}} is primary key.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-21382) Group by keys reduction optimization - keys are not reduced in query23

2019-03-05 Thread Vineet Garg (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16785155#comment-16785155
 ] 

Vineet Garg commented on HIVE-21382:


Challenge with query23 is that group by has expressions (e.g. {{substr}}) which 
couldn't be tracked back to source table making it difficult for optimizer to 
determine if it is safe or not to remove such keys. I am attaching a patch 
which should otherwise work. It extends the existing logic to consider group by 
keys reduction even if columns originate from different source table.

> Group by keys reduction optimization - keys are not reduced in query23
> --
>
> Key: HIVE-21382
> URL: https://issues.apache.org/jira/browse/HIVE-21382
> Project: Hive
>  Issue Type: Improvement
>Reporter: Vineet Garg
>Assignee: Vineet Garg
>Priority: Major
> Attachments: HIVE-21382.1.patch
>
>
> {code:sql}
> explain cbo with frequent_ss_items as 
>  (select substr(i_item_desc,1,30) itemdesc,i_item_sk item_sk,d_date 
> solddate,count(*) cnt
>   from store_sales
>   ,date_dim 
>   ,item
>   where ss_sold_date_sk = d_date_sk
> and ss_item_sk = i_item_sk 
> and d_year in (1999,1999+1,1999+2,1999+3)
>   group by substr(i_item_desc,1,30),i_item_sk,d_date
>   having count(*) >4)
> select  sum(sales)
>  from ((select cs_quantity*cs_list_price sales
>from catalog_sales
>,date_dim 
>where d_year = 1999 
>  and d_moy = 1 
>  and cs_sold_date_sk = d_date_sk 
>  and cs_item_sk in (select item_sk from frequent_ss_items))) subq 
> limit 100;
> {code}
> {code:sql}
> HiveSortLimit(fetch=[100])
>   HiveProject($f0=[$0])
> HiveAggregate(group=[{}], agg#0=[sum($0)])
>   HiveProject(sales=[*(CAST($2):DECIMAL(10, 0), $3)])
> HiveSemiJoin(condition=[=($1, $5)], joinType=[inner])
>   HiveJoin(condition=[=($0, $4)], joinType=[inner], algorithm=[none], 
> cost=[{2.0 rows, 0.0 cpu, 0.0 io}])
> HiveProject(cs_sold_date_sk=[$0], cs_item_sk=[$15], 
> cs_quantity=[$18], cs_list_price=[$20])
>   HiveFilter(condition=[IS NOT NULL($0)])
> HiveTableScan(table=[[perf_constraints, catalog_sales]], 
> table:alias=[catalog_sales])
> HiveProject(d_date_sk=[$0])
>   HiveFilter(condition=[AND(=($6, 1999), =($8, 1))])
> HiveTableScan(table=[[perf_constraints, date_dim]], 
> table:alias=[date_dim])
>   HiveProject(i_item_sk=[$1])
> HiveFilter(condition=[>($3, 4)])
>   HiveProject(substr=[$2], i_item_sk=[$1], d_date=[$0], $f3=[$3])
> HiveAggregate(group=[{3, 4, 5}], agg#0=[count()])
>   HiveJoin(condition=[=($1, $4)], joinType=[inner], 
> algorithm=[none], cost=[{2.0 rows, 0.0 cpu, 0.0 io}])
> HiveJoin(condition=[=($0, $2)], joinType=[inner], 
> algorithm=[none], cost=[{2.0 rows, 0.0 cpu, 0.0 io}])
>   HiveProject(ss_sold_date_sk=[$0], ss_item_sk=[$2])
> HiveFilter(condition=[IS NOT NULL($0)])
>   HiveTableScan(table=[[perf_constraints, 
> store_sales]], table:alias=[store_sales])
>   HiveProject(d_date_sk=[$0], d_date=[$2])
> HiveFilter(condition=[IN($6, 1999, 2000, 2001, 2002)])
>   HiveTableScan(table=[[perf_constraints, date_dim]], 
> table:alias=[date_dim])
> HiveProject(i_item_sk=[$0], substr=[substr($4, 1, 30)])
>   HiveTableScan(table=[[perf_constraints, item]], 
> table:alias=[item])
> {code}
> Right side of HiveSemiJoin has an aggregate which could be reduce to have 
> only {{i_item_sk}} as group by key since {{i_item_sk}} is primary key.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-21382) Group by keys reduction optimization - keys are not reduced in query23

2019-03-04 Thread Vineet Garg (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16783947#comment-16783947
 ] 

Vineet Garg commented on HIVE-21382:


Current logic of Group by reduction only works if all the group by keys are 
originated from same table. In this case group by consists of column from item 
and store_sales table.

> Group by keys reduction optimization - keys are not reduced in query23
> --
>
> Key: HIVE-21382
> URL: https://issues.apache.org/jira/browse/HIVE-21382
> Project: Hive
>  Issue Type: Improvement
>Reporter: Vineet Garg
>Assignee: Vineet Garg
>Priority: Major
>
> {code:sql}
> explain cbo with frequent_ss_items as 
>  (select substr(i_item_desc,1,30) itemdesc,i_item_sk item_sk,d_date 
> solddate,count(*) cnt
>   from store_sales
>   ,date_dim 
>   ,item
>   where ss_sold_date_sk = d_date_sk
> and ss_item_sk = i_item_sk 
> and d_year in (1999,1999+1,1999+2,1999+3)
>   group by substr(i_item_desc,1,30),i_item_sk,d_date
>   having count(*) >4)
> select  sum(sales)
>  from ((select cs_quantity*cs_list_price sales
>from catalog_sales
>,date_dim 
>where d_year = 1999 
>  and d_moy = 1 
>  and cs_sold_date_sk = d_date_sk 
>  and cs_item_sk in (select item_sk from frequent_ss_items))) subq 
> limit 100;
> {code}
> {code:sql}
> HiveSortLimit(fetch=[100])
>   HiveProject($f0=[$0])
> HiveAggregate(group=[{}], agg#0=[sum($0)])
>   HiveProject(sales=[*(CAST($2):DECIMAL(10, 0), $3)])
> HiveSemiJoin(condition=[=($1, $5)], joinType=[inner])
>   HiveJoin(condition=[=($0, $4)], joinType=[inner], algorithm=[none], 
> cost=[{2.0 rows, 0.0 cpu, 0.0 io}])
> HiveProject(cs_sold_date_sk=[$0], cs_item_sk=[$15], 
> cs_quantity=[$18], cs_list_price=[$20])
>   HiveFilter(condition=[IS NOT NULL($0)])
> HiveTableScan(table=[[perf_constraints, catalog_sales]], 
> table:alias=[catalog_sales])
> HiveProject(d_date_sk=[$0])
>   HiveFilter(condition=[AND(=($6, 1999), =($8, 1))])
> HiveTableScan(table=[[perf_constraints, date_dim]], 
> table:alias=[date_dim])
>   HiveProject(i_item_sk=[$1])
> HiveFilter(condition=[>($3, 4)])
>   HiveProject(substr=[$2], i_item_sk=[$1], d_date=[$0], $f3=[$3])
> HiveAggregate(group=[{3, 4, 5}], agg#0=[count()])
>   HiveJoin(condition=[=($1, $4)], joinType=[inner], 
> algorithm=[none], cost=[{2.0 rows, 0.0 cpu, 0.0 io}])
> HiveJoin(condition=[=($0, $2)], joinType=[inner], 
> algorithm=[none], cost=[{2.0 rows, 0.0 cpu, 0.0 io}])
>   HiveProject(ss_sold_date_sk=[$0], ss_item_sk=[$2])
> HiveFilter(condition=[IS NOT NULL($0)])
>   HiveTableScan(table=[[perf_constraints, 
> store_sales]], table:alias=[store_sales])
>   HiveProject(d_date_sk=[$0], d_date=[$2])
> HiveFilter(condition=[IN($6, 1999, 2000, 2001, 2002)])
>   HiveTableScan(table=[[perf_constraints, date_dim]], 
> table:alias=[date_dim])
> HiveProject(i_item_sk=[$0], substr=[substr($4, 1, 30)])
>   HiveTableScan(table=[[perf_constraints, item]], 
> table:alias=[item])
> {code}
> Right side of HiveSemiJoin has an aggregate which could be reduce to have 
> only {{i_item_sk}} as group by key since {{i_item_sk}} is primary key.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)