[jira] [Commented] (HIVE-21382) Group by keys reduction optimization - keys are not reduced in query23
[ https://issues.apache.org/jira/browse/HIVE-21382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16793983#comment-16793983 ] Hive QA commented on HIVE-21382: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12962639/HIVE-21382.6.patch {color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified. {color:green}SUCCESS:{color} +1 due to 15832 tests passed Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/16533/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/16533/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-16533/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.YetusPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase {noformat} This message is automatically generated. ATTACHMENT ID: 12962639 - PreCommit-HIVE-Build > Group by keys reduction optimization - keys are not reduced in query23 > -- > > Key: HIVE-21382 > URL: https://issues.apache.org/jira/browse/HIVE-21382 > Project: Hive > Issue Type: Improvement >Reporter: Vineet Garg >Assignee: Vineet Garg >Priority: Major > Labels: pull-request-available > Attachments: HIVE-21382.1.patch, HIVE-21382.2.patch, > HIVE-21382.2.patch, HIVE-21382.3.patch, HIVE-21382.4.patch, > HIVE-21382.5.patch, HIVE-21382.6.patch > > Time Spent: 3h > Remaining Estimate: 0h > > {code:sql} > explain cbo with frequent_ss_items as > (select substr(i_item_desc,1,30) itemdesc,i_item_sk item_sk,d_date > solddate,count(*) cnt > from store_sales > ,date_dim > ,item > where ss_sold_date_sk = d_date_sk > and ss_item_sk = i_item_sk > and d_year in (1999,1999+1,1999+2,1999+3) > group by substr(i_item_desc,1,30),i_item_sk,d_date > having count(*) >4) > select sum(sales) > from ((select cs_quantity*cs_list_price sales >from catalog_sales >,date_dim >where d_year = 1999 > and d_moy = 1 > and cs_sold_date_sk = d_date_sk > and cs_item_sk in (select item_sk from frequent_ss_items))) subq > limit 100; > {code} > {code:sql} > HiveSortLimit(fetch=[100]) > HiveProject($f0=[$0]) > HiveAggregate(group=[{}], agg#0=[sum($0)]) > HiveProject(sales=[*(CAST($2):DECIMAL(10, 0), $3)]) > HiveSemiJoin(condition=[=($1, $5)], joinType=[inner]) > HiveJoin(condition=[=($0, $4)], joinType=[inner], algorithm=[none], > cost=[{2.0 rows, 0.0 cpu, 0.0 io}]) > HiveProject(cs_sold_date_sk=[$0], cs_item_sk=[$15], > cs_quantity=[$18], cs_list_price=[$20]) > HiveFilter(condition=[IS NOT NULL($0)]) > HiveTableScan(table=[[perf_constraints, catalog_sales]], > table:alias=[catalog_sales]) > HiveProject(d_date_sk=[$0]) > HiveFilter(condition=[AND(=($6, 1999), =($8, 1))]) > HiveTableScan(table=[[perf_constraints, date_dim]], > table:alias=[date_dim]) > HiveProject(i_item_sk=[$1]) > HiveFilter(condition=[>($3, 4)]) > HiveProject(substr=[$2], i_item_sk=[$1], d_date=[$0], $f3=[$3]) > HiveAggregate(group=[{3, 4, 5}], agg#0=[count()]) > HiveJoin(condition=[=($1, $4)], joinType=[inner], > algorithm=[none], cost=[{2.0 rows, 0.0 cpu, 0.0 io}]) > HiveJoin(condition=[=($0, $2)], joinType=[inner], > algorithm=[none], cost=[{2.0 rows, 0.0 cpu, 0.0 io}]) > HiveProject(ss_sold_date_sk=[$0], ss_item_sk=[$2]) > HiveFilter(condition=[IS NOT NULL($0)]) > HiveTableScan(table=[[perf_constraints, > store_sales]], table:alias=[store_sales]) > HiveProject(d_date_sk=[$0], d_date=[$2]) > HiveFilter(condition=[IN($6, 1999, 2000, 2001, 2002)]) > HiveTableScan(table=[[perf_constraints, date_dim]], > table:alias=[date_dim]) > HiveProject(i_item_sk=[$0], substr=[substr($4, 1, 30)]) > HiveTableScan(table=[[perf_constraints, item]], > table:alias=[item]) > {code} > Right side of HiveSemiJoin has an aggregate which could be reduce to have > only {{i_item_sk}} as group by key since {{i_item_sk}} is primary key. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HIVE-21382) Group by keys reduction optimization - keys are not reduced in query23
[ https://issues.apache.org/jira/browse/HIVE-21382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16793952#comment-16793952 ] Hive QA commented on HIVE-21382: | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | || || || || {color:brown} master Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 49s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 15s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 42s{color} | {color:green} master passed {color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 4m 20s{color} | {color:blue} ql in master has 2257 extant Findbugs warnings. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 5s{color} | {color:green} master passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 37s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 13s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 13s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 43s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 30s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 5s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 14s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 26m 8s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Optional Tests | asflicense javac javadoc findbugs checkstyle compile | | uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux | | Build tool | maven | | Personality | /data/hiveptest/working/yetus_PreCommit-HIVE-Build-16533/dev-support/hive-personality.sh | | git revision | master / 677eef8 | | Default Java | 1.8.0_111 | | findbugs | v3.0.0 | | modules | C: ql U: ql | | Console output | http://104.198.109.242/logs//PreCommit-HIVE-Build-16533/yetus.txt | | Powered by | Apache Yetushttp://yetus.apache.org | This message was automatically generated. > Group by keys reduction optimization - keys are not reduced in query23 > -- > > Key: HIVE-21382 > URL: https://issues.apache.org/jira/browse/HIVE-21382 > Project: Hive > Issue Type: Improvement >Reporter: Vineet Garg >Assignee: Vineet Garg >Priority: Major > Labels: pull-request-available > Attachments: HIVE-21382.1.patch, HIVE-21382.2.patch, > HIVE-21382.2.patch, HIVE-21382.3.patch, HIVE-21382.4.patch, > HIVE-21382.5.patch, HIVE-21382.6.patch > > Time Spent: 3h > Remaining Estimate: 0h > > {code:sql} > explain cbo with frequent_ss_items as > (select substr(i_item_desc,1,30) itemdesc,i_item_sk item_sk,d_date > solddate,count(*) cnt > from store_sales > ,date_dim > ,item > where ss_sold_date_sk = d_date_sk > and ss_item_sk = i_item_sk > and d_year in (1999,1999+1,1999+2,1999+3) > group by substr(i_item_desc,1,30),i_item_sk,d_date > having count(*) >4) > select sum(sales) > from ((select cs_quantity*cs_list_price sales >from catalog_sales >,date_dim >where d_year = 1999 > and d_moy = 1 > and cs_sold_date_sk = d_date_sk > and cs_item_sk in (select item_sk from frequent_ss_items))) subq > limit 100; > {code} > {code:sql} > HiveSortLimit(fetch=[100]) > HiveProject($f0=[$0]) > HiveAggregate(group=[{}], agg#0=[sum($0)]) > HiveProject(sales=[*(CAST($2):DECIMAL(10, 0), $3)]) > HiveSemiJoin(condition=[=($1, $5)], joinType=[inner]) > HiveJoin(condition=[=($0, $4)], joinType=[inner],
[jira] [Commented] (HIVE-21382) Group by keys reduction optimization - keys are not reduced in query23
[ https://issues.apache.org/jira/browse/HIVE-21382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16793372#comment-16793372 ] Hive QA commented on HIVE-21382: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12962522/HIVE-21382.5.patch {color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 15820 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.ql.parse.TestReplicationWithTableMigrationEx.org.apache.hadoop.hive.ql.parse.TestReplicationWithTableMigrationEx (batchId=252) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/16520/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/16520/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-16520/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.YetusPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 1 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12962522 - PreCommit-HIVE-Build > Group by keys reduction optimization - keys are not reduced in query23 > -- > > Key: HIVE-21382 > URL: https://issues.apache.org/jira/browse/HIVE-21382 > Project: Hive > Issue Type: Improvement >Reporter: Vineet Garg >Assignee: Vineet Garg >Priority: Major > Labels: pull-request-available > Attachments: HIVE-21382.1.patch, HIVE-21382.2.patch, > HIVE-21382.2.patch, HIVE-21382.3.patch, HIVE-21382.4.patch, HIVE-21382.5.patch > > Time Spent: 3h > Remaining Estimate: 0h > > {code:sql} > explain cbo with frequent_ss_items as > (select substr(i_item_desc,1,30) itemdesc,i_item_sk item_sk,d_date > solddate,count(*) cnt > from store_sales > ,date_dim > ,item > where ss_sold_date_sk = d_date_sk > and ss_item_sk = i_item_sk > and d_year in (1999,1999+1,1999+2,1999+3) > group by substr(i_item_desc,1,30),i_item_sk,d_date > having count(*) >4) > select sum(sales) > from ((select cs_quantity*cs_list_price sales >from catalog_sales >,date_dim >where d_year = 1999 > and d_moy = 1 > and cs_sold_date_sk = d_date_sk > and cs_item_sk in (select item_sk from frequent_ss_items))) subq > limit 100; > {code} > {code:sql} > HiveSortLimit(fetch=[100]) > HiveProject($f0=[$0]) > HiveAggregate(group=[{}], agg#0=[sum($0)]) > HiveProject(sales=[*(CAST($2):DECIMAL(10, 0), $3)]) > HiveSemiJoin(condition=[=($1, $5)], joinType=[inner]) > HiveJoin(condition=[=($0, $4)], joinType=[inner], algorithm=[none], > cost=[{2.0 rows, 0.0 cpu, 0.0 io}]) > HiveProject(cs_sold_date_sk=[$0], cs_item_sk=[$15], > cs_quantity=[$18], cs_list_price=[$20]) > HiveFilter(condition=[IS NOT NULL($0)]) > HiveTableScan(table=[[perf_constraints, catalog_sales]], > table:alias=[catalog_sales]) > HiveProject(d_date_sk=[$0]) > HiveFilter(condition=[AND(=($6, 1999), =($8, 1))]) > HiveTableScan(table=[[perf_constraints, date_dim]], > table:alias=[date_dim]) > HiveProject(i_item_sk=[$1]) > HiveFilter(condition=[>($3, 4)]) > HiveProject(substr=[$2], i_item_sk=[$1], d_date=[$0], $f3=[$3]) > HiveAggregate(group=[{3, 4, 5}], agg#0=[count()]) > HiveJoin(condition=[=($1, $4)], joinType=[inner], > algorithm=[none], cost=[{2.0 rows, 0.0 cpu, 0.0 io}]) > HiveJoin(condition=[=($0, $2)], joinType=[inner], > algorithm=[none], cost=[{2.0 rows, 0.0 cpu, 0.0 io}]) > HiveProject(ss_sold_date_sk=[$0], ss_item_sk=[$2]) > HiveFilter(condition=[IS NOT NULL($0)]) > HiveTableScan(table=[[perf_constraints, > store_sales]], table:alias=[store_sales]) > HiveProject(d_date_sk=[$0], d_date=[$2]) > HiveFilter(condition=[IN($6, 1999, 2000, 2001, 2002)]) > HiveTableScan(table=[[perf_constraints, date_dim]], > table:alias=[date_dim]) > HiveProject(i_item_sk=[$0], substr=[substr($4, 1, 30)]) > HiveTableScan(table=[[perf_constraints, item]], > table:alias=[item]) > {code} > Right side of HiveSemiJoin has an aggregate which could be reduce to have > only {{i_item_sk}} as group by key since {{i_item_sk}} is primary key. -- This message was sent by Atlassian JIRA
[jira] [Commented] (HIVE-21382) Group by keys reduction optimization - keys are not reduced in query23
[ https://issues.apache.org/jira/browse/HIVE-21382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16793344#comment-16793344 ] Hive QA commented on HIVE-21382: | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 1s{color} | {color:green} The patch does not contain any @author tags. {color} | || || || || {color:brown} master Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 30s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 12s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 40s{color} | {color:green} master passed {color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 4m 12s{color} | {color:blue} ql in master has 2257 extant Findbugs warnings. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 3s{color} | {color:green} master passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 35s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 11s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 11s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 42s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 27s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 4s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 0m 14s{color} | {color:red} The patch generated 2 ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 25m 27s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Optional Tests | asflicense javac javadoc findbugs checkstyle compile | | uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux | | Build tool | maven | | Personality | /data/hiveptest/working/yetus_PreCommit-HIVE-Build-16520/dev-support/hive-personality.sh | | git revision | master / 3f02ce4 | | Default Java | 1.8.0_111 | | findbugs | v3.0.0 | | asflicense | http://104.198.109.242/logs//PreCommit-HIVE-Build-16520/yetus/patch-asflicense-problems.txt | | modules | C: ql U: ql | | Console output | http://104.198.109.242/logs//PreCommit-HIVE-Build-16520/yetus.txt | | Powered by | Apache Yetushttp://yetus.apache.org | This message was automatically generated. > Group by keys reduction optimization - keys are not reduced in query23 > -- > > Key: HIVE-21382 > URL: https://issues.apache.org/jira/browse/HIVE-21382 > Project: Hive > Issue Type: Improvement >Reporter: Vineet Garg >Assignee: Vineet Garg >Priority: Major > Labels: pull-request-available > Attachments: HIVE-21382.1.patch, HIVE-21382.2.patch, > HIVE-21382.2.patch, HIVE-21382.3.patch, HIVE-21382.4.patch, HIVE-21382.5.patch > > Time Spent: 3h > Remaining Estimate: 0h > > {code:sql} > explain cbo with frequent_ss_items as > (select substr(i_item_desc,1,30) itemdesc,i_item_sk item_sk,d_date > solddate,count(*) cnt > from store_sales > ,date_dim > ,item > where ss_sold_date_sk = d_date_sk > and ss_item_sk = i_item_sk > and d_year in (1999,1999+1,1999+2,1999+3) > group by substr(i_item_desc,1,30),i_item_sk,d_date > having count(*) >4) > select sum(sales) > from ((select cs_quantity*cs_list_price sales >from catalog_sales >,date_dim >where d_year = 1999 > and d_moy = 1 > and cs_sold_date_sk = d_date_sk > and cs_item_sk in (select item_sk from frequent_ss_items))) subq > limit 100; > {code} > {code:sql} > HiveSortLimit(fetch=[100]) > HiveProject($f0=[$0]) > HiveAggregate(group=[{}], agg#0=[sum($0)]) > HiveProject(sales=[*(CAST($2):DECIMAL(10, 0), $3)]) > HiveSemiJoin(condition=[=($1, $5)], joinType=[inner]) >
[jira] [Commented] (HIVE-21382) Group by keys reduction optimization - keys are not reduced in query23
[ https://issues.apache.org/jira/browse/HIVE-21382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16792997#comment-16792997 ] Vineet Garg commented on HIVE-21382: Thanks for reviewing [~jcamachorodriguez]. I have addressed the comment and I have uploaded a new patch. > Group by keys reduction optimization - keys are not reduced in query23 > -- > > Key: HIVE-21382 > URL: https://issues.apache.org/jira/browse/HIVE-21382 > Project: Hive > Issue Type: Improvement >Reporter: Vineet Garg >Assignee: Vineet Garg >Priority: Major > Labels: pull-request-available > Attachments: HIVE-21382.1.patch, HIVE-21382.2.patch, > HIVE-21382.2.patch, HIVE-21382.3.patch, HIVE-21382.4.patch, HIVE-21382.5.patch > > Time Spent: 3h > Remaining Estimate: 0h > > {code:sql} > explain cbo with frequent_ss_items as > (select substr(i_item_desc,1,30) itemdesc,i_item_sk item_sk,d_date > solddate,count(*) cnt > from store_sales > ,date_dim > ,item > where ss_sold_date_sk = d_date_sk > and ss_item_sk = i_item_sk > and d_year in (1999,1999+1,1999+2,1999+3) > group by substr(i_item_desc,1,30),i_item_sk,d_date > having count(*) >4) > select sum(sales) > from ((select cs_quantity*cs_list_price sales >from catalog_sales >,date_dim >where d_year = 1999 > and d_moy = 1 > and cs_sold_date_sk = d_date_sk > and cs_item_sk in (select item_sk from frequent_ss_items))) subq > limit 100; > {code} > {code:sql} > HiveSortLimit(fetch=[100]) > HiveProject($f0=[$0]) > HiveAggregate(group=[{}], agg#0=[sum($0)]) > HiveProject(sales=[*(CAST($2):DECIMAL(10, 0), $3)]) > HiveSemiJoin(condition=[=($1, $5)], joinType=[inner]) > HiveJoin(condition=[=($0, $4)], joinType=[inner], algorithm=[none], > cost=[{2.0 rows, 0.0 cpu, 0.0 io}]) > HiveProject(cs_sold_date_sk=[$0], cs_item_sk=[$15], > cs_quantity=[$18], cs_list_price=[$20]) > HiveFilter(condition=[IS NOT NULL($0)]) > HiveTableScan(table=[[perf_constraints, catalog_sales]], > table:alias=[catalog_sales]) > HiveProject(d_date_sk=[$0]) > HiveFilter(condition=[AND(=($6, 1999), =($8, 1))]) > HiveTableScan(table=[[perf_constraints, date_dim]], > table:alias=[date_dim]) > HiveProject(i_item_sk=[$1]) > HiveFilter(condition=[>($3, 4)]) > HiveProject(substr=[$2], i_item_sk=[$1], d_date=[$0], $f3=[$3]) > HiveAggregate(group=[{3, 4, 5}], agg#0=[count()]) > HiveJoin(condition=[=($1, $4)], joinType=[inner], > algorithm=[none], cost=[{2.0 rows, 0.0 cpu, 0.0 io}]) > HiveJoin(condition=[=($0, $2)], joinType=[inner], > algorithm=[none], cost=[{2.0 rows, 0.0 cpu, 0.0 io}]) > HiveProject(ss_sold_date_sk=[$0], ss_item_sk=[$2]) > HiveFilter(condition=[IS NOT NULL($0)]) > HiveTableScan(table=[[perf_constraints, > store_sales]], table:alias=[store_sales]) > HiveProject(d_date_sk=[$0], d_date=[$2]) > HiveFilter(condition=[IN($6, 1999, 2000, 2001, 2002)]) > HiveTableScan(table=[[perf_constraints, date_dim]], > table:alias=[date_dim]) > HiveProject(i_item_sk=[$0], substr=[substr($4, 1, 30)]) > HiveTableScan(table=[[perf_constraints, item]], > table:alias=[item]) > {code} > Right side of HiveSemiJoin has an aggregate which could be reduce to have > only {{i_item_sk}} as group by key since {{i_item_sk}} is primary key. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HIVE-21382) Group by keys reduction optimization - keys are not reduced in query23
[ https://issues.apache.org/jira/browse/HIVE-21382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16792827#comment-16792827 ] Jesus Camacho Rodriguez commented on HIVE-21382: Left a minor comment on last PR. +1 (pending clean run) Thanks [~vgarg] > Group by keys reduction optimization - keys are not reduced in query23 > -- > > Key: HIVE-21382 > URL: https://issues.apache.org/jira/browse/HIVE-21382 > Project: Hive > Issue Type: Improvement >Reporter: Vineet Garg >Assignee: Vineet Garg >Priority: Major > Labels: pull-request-available > Attachments: HIVE-21382.1.patch, HIVE-21382.2.patch, > HIVE-21382.2.patch, HIVE-21382.3.patch, HIVE-21382.4.patch > > Time Spent: 3h > Remaining Estimate: 0h > > {code:sql} > explain cbo with frequent_ss_items as > (select substr(i_item_desc,1,30) itemdesc,i_item_sk item_sk,d_date > solddate,count(*) cnt > from store_sales > ,date_dim > ,item > where ss_sold_date_sk = d_date_sk > and ss_item_sk = i_item_sk > and d_year in (1999,1999+1,1999+2,1999+3) > group by substr(i_item_desc,1,30),i_item_sk,d_date > having count(*) >4) > select sum(sales) > from ((select cs_quantity*cs_list_price sales >from catalog_sales >,date_dim >where d_year = 1999 > and d_moy = 1 > and cs_sold_date_sk = d_date_sk > and cs_item_sk in (select item_sk from frequent_ss_items))) subq > limit 100; > {code} > {code:sql} > HiveSortLimit(fetch=[100]) > HiveProject($f0=[$0]) > HiveAggregate(group=[{}], agg#0=[sum($0)]) > HiveProject(sales=[*(CAST($2):DECIMAL(10, 0), $3)]) > HiveSemiJoin(condition=[=($1, $5)], joinType=[inner]) > HiveJoin(condition=[=($0, $4)], joinType=[inner], algorithm=[none], > cost=[{2.0 rows, 0.0 cpu, 0.0 io}]) > HiveProject(cs_sold_date_sk=[$0], cs_item_sk=[$15], > cs_quantity=[$18], cs_list_price=[$20]) > HiveFilter(condition=[IS NOT NULL($0)]) > HiveTableScan(table=[[perf_constraints, catalog_sales]], > table:alias=[catalog_sales]) > HiveProject(d_date_sk=[$0]) > HiveFilter(condition=[AND(=($6, 1999), =($8, 1))]) > HiveTableScan(table=[[perf_constraints, date_dim]], > table:alias=[date_dim]) > HiveProject(i_item_sk=[$1]) > HiveFilter(condition=[>($3, 4)]) > HiveProject(substr=[$2], i_item_sk=[$1], d_date=[$0], $f3=[$3]) > HiveAggregate(group=[{3, 4, 5}], agg#0=[count()]) > HiveJoin(condition=[=($1, $4)], joinType=[inner], > algorithm=[none], cost=[{2.0 rows, 0.0 cpu, 0.0 io}]) > HiveJoin(condition=[=($0, $2)], joinType=[inner], > algorithm=[none], cost=[{2.0 rows, 0.0 cpu, 0.0 io}]) > HiveProject(ss_sold_date_sk=[$0], ss_item_sk=[$2]) > HiveFilter(condition=[IS NOT NULL($0)]) > HiveTableScan(table=[[perf_constraints, > store_sales]], table:alias=[store_sales]) > HiveProject(d_date_sk=[$0], d_date=[$2]) > HiveFilter(condition=[IN($6, 1999, 2000, 2001, 2002)]) > HiveTableScan(table=[[perf_constraints, date_dim]], > table:alias=[date_dim]) > HiveProject(i_item_sk=[$0], substr=[substr($4, 1, 30)]) > HiveTableScan(table=[[perf_constraints, item]], > table:alias=[item]) > {code} > Right side of HiveSemiJoin has an aggregate which could be reduce to have > only {{i_item_sk}} as group by key since {{i_item_sk}} is primary key. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HIVE-21382) Group by keys reduction optimization - keys are not reduced in query23
[ https://issues.apache.org/jira/browse/HIVE-21382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16792599#comment-16792599 ] Hive QA commented on HIVE-21382: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12962427/HIVE-21382.4.patch {color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 15827 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[dynamic_semijoin_reduction_sw] (batchId=161) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/16504/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/16504/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-16504/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.YetusPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 1 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12962427 - PreCommit-HIVE-Build > Group by keys reduction optimization - keys are not reduced in query23 > -- > > Key: HIVE-21382 > URL: https://issues.apache.org/jira/browse/HIVE-21382 > Project: Hive > Issue Type: Improvement >Reporter: Vineet Garg >Assignee: Vineet Garg >Priority: Major > Labels: pull-request-available > Attachments: HIVE-21382.1.patch, HIVE-21382.2.patch, > HIVE-21382.2.patch, HIVE-21382.3.patch, HIVE-21382.4.patch > > Time Spent: 2h 50m > Remaining Estimate: 0h > > {code:sql} > explain cbo with frequent_ss_items as > (select substr(i_item_desc,1,30) itemdesc,i_item_sk item_sk,d_date > solddate,count(*) cnt > from store_sales > ,date_dim > ,item > where ss_sold_date_sk = d_date_sk > and ss_item_sk = i_item_sk > and d_year in (1999,1999+1,1999+2,1999+3) > group by substr(i_item_desc,1,30),i_item_sk,d_date > having count(*) >4) > select sum(sales) > from ((select cs_quantity*cs_list_price sales >from catalog_sales >,date_dim >where d_year = 1999 > and d_moy = 1 > and cs_sold_date_sk = d_date_sk > and cs_item_sk in (select item_sk from frequent_ss_items))) subq > limit 100; > {code} > {code:sql} > HiveSortLimit(fetch=[100]) > HiveProject($f0=[$0]) > HiveAggregate(group=[{}], agg#0=[sum($0)]) > HiveProject(sales=[*(CAST($2):DECIMAL(10, 0), $3)]) > HiveSemiJoin(condition=[=($1, $5)], joinType=[inner]) > HiveJoin(condition=[=($0, $4)], joinType=[inner], algorithm=[none], > cost=[{2.0 rows, 0.0 cpu, 0.0 io}]) > HiveProject(cs_sold_date_sk=[$0], cs_item_sk=[$15], > cs_quantity=[$18], cs_list_price=[$20]) > HiveFilter(condition=[IS NOT NULL($0)]) > HiveTableScan(table=[[perf_constraints, catalog_sales]], > table:alias=[catalog_sales]) > HiveProject(d_date_sk=[$0]) > HiveFilter(condition=[AND(=($6, 1999), =($8, 1))]) > HiveTableScan(table=[[perf_constraints, date_dim]], > table:alias=[date_dim]) > HiveProject(i_item_sk=[$1]) > HiveFilter(condition=[>($3, 4)]) > HiveProject(substr=[$2], i_item_sk=[$1], d_date=[$0], $f3=[$3]) > HiveAggregate(group=[{3, 4, 5}], agg#0=[count()]) > HiveJoin(condition=[=($1, $4)], joinType=[inner], > algorithm=[none], cost=[{2.0 rows, 0.0 cpu, 0.0 io}]) > HiveJoin(condition=[=($0, $2)], joinType=[inner], > algorithm=[none], cost=[{2.0 rows, 0.0 cpu, 0.0 io}]) > HiveProject(ss_sold_date_sk=[$0], ss_item_sk=[$2]) > HiveFilter(condition=[IS NOT NULL($0)]) > HiveTableScan(table=[[perf_constraints, > store_sales]], table:alias=[store_sales]) > HiveProject(d_date_sk=[$0], d_date=[$2]) > HiveFilter(condition=[IN($6, 1999, 2000, 2001, 2002)]) > HiveTableScan(table=[[perf_constraints, date_dim]], > table:alias=[date_dim]) > HiveProject(i_item_sk=[$0], substr=[substr($4, 1, 30)]) > HiveTableScan(table=[[perf_constraints, item]], > table:alias=[item]) > {code} > Right side of HiveSemiJoin has an aggregate which could be reduce to have > only {{i_item_sk}} as group by key since {{i_item_sk}} is primary key. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HIVE-21382) Group by keys reduction optimization - keys are not reduced in query23
[ https://issues.apache.org/jira/browse/HIVE-21382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16792566#comment-16792566 ] Hive QA commented on HIVE-21382: | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | || || || || {color:brown} master Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 41s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 14s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 42s{color} | {color:green} master passed {color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 4m 11s{color} | {color:blue} ql in master has 2257 extant Findbugs warnings. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 2s{color} | {color:green} master passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 42s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 14s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 14s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 43s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 17s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 5s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 0m 15s{color} | {color:red} The patch generated 2 ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 25m 44s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Optional Tests | asflicense javac javadoc findbugs checkstyle compile | | uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux | | Build tool | maven | | Personality | /data/hiveptest/working/yetus_PreCommit-HIVE-Build-16504/dev-support/hive-personality.sh | | git revision | master / a2892cd | | Default Java | 1.8.0_111 | | findbugs | v3.0.0 | | asflicense | http://104.198.109.242/logs//PreCommit-HIVE-Build-16504/yetus/patch-asflicense-problems.txt | | modules | C: ql U: ql | | Console output | http://104.198.109.242/logs//PreCommit-HIVE-Build-16504/yetus.txt | | Powered by | Apache Yetushttp://yetus.apache.org | This message was automatically generated. > Group by keys reduction optimization - keys are not reduced in query23 > -- > > Key: HIVE-21382 > URL: https://issues.apache.org/jira/browse/HIVE-21382 > Project: Hive > Issue Type: Improvement >Reporter: Vineet Garg >Assignee: Vineet Garg >Priority: Major > Labels: pull-request-available > Attachments: HIVE-21382.1.patch, HIVE-21382.2.patch, > HIVE-21382.2.patch, HIVE-21382.3.patch, HIVE-21382.4.patch > > Time Spent: 2h 50m > Remaining Estimate: 0h > > {code:sql} > explain cbo with frequent_ss_items as > (select substr(i_item_desc,1,30) itemdesc,i_item_sk item_sk,d_date > solddate,count(*) cnt > from store_sales > ,date_dim > ,item > where ss_sold_date_sk = d_date_sk > and ss_item_sk = i_item_sk > and d_year in (1999,1999+1,1999+2,1999+3) > group by substr(i_item_desc,1,30),i_item_sk,d_date > having count(*) >4) > select sum(sales) > from ((select cs_quantity*cs_list_price sales >from catalog_sales >,date_dim >where d_year = 1999 > and d_moy = 1 > and cs_sold_date_sk = d_date_sk > and cs_item_sk in (select item_sk from frequent_ss_items))) subq > limit 100; > {code} > {code:sql} > HiveSortLimit(fetch=[100]) > HiveProject($f0=[$0]) > HiveAggregate(group=[{}], agg#0=[sum($0)]) > HiveProject(sales=[*(CAST($2):DECIMAL(10, 0), $3)]) > HiveSemiJoin(condition=[=($1, $5)], joinType=[inner]) >
[jira] [Commented] (HIVE-21382) Group by keys reduction optimization - keys are not reduced in query23
[ https://issues.apache.org/jira/browse/HIVE-21382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16791025#comment-16791025 ] Vineet Garg commented on HIVE-21382: [~jcamachorodriguez] Can you take a look: https://github.com/apache/hive/pull/567 > Group by keys reduction optimization - keys are not reduced in query23 > -- > > Key: HIVE-21382 > URL: https://issues.apache.org/jira/browse/HIVE-21382 > Project: Hive > Issue Type: Improvement >Reporter: Vineet Garg >Assignee: Vineet Garg >Priority: Major > Labels: pull-request-available > Attachments: HIVE-21382.1.patch, HIVE-21382.2.patch, > HIVE-21382.2.patch, HIVE-21382.3.patch > > Time Spent: 10m > Remaining Estimate: 0h > > {code:sql} > explain cbo with frequent_ss_items as > (select substr(i_item_desc,1,30) itemdesc,i_item_sk item_sk,d_date > solddate,count(*) cnt > from store_sales > ,date_dim > ,item > where ss_sold_date_sk = d_date_sk > and ss_item_sk = i_item_sk > and d_year in (1999,1999+1,1999+2,1999+3) > group by substr(i_item_desc,1,30),i_item_sk,d_date > having count(*) >4) > select sum(sales) > from ((select cs_quantity*cs_list_price sales >from catalog_sales >,date_dim >where d_year = 1999 > and d_moy = 1 > and cs_sold_date_sk = d_date_sk > and cs_item_sk in (select item_sk from frequent_ss_items))) subq > limit 100; > {code} > {code:sql} > HiveSortLimit(fetch=[100]) > HiveProject($f0=[$0]) > HiveAggregate(group=[{}], agg#0=[sum($0)]) > HiveProject(sales=[*(CAST($2):DECIMAL(10, 0), $3)]) > HiveSemiJoin(condition=[=($1, $5)], joinType=[inner]) > HiveJoin(condition=[=($0, $4)], joinType=[inner], algorithm=[none], > cost=[{2.0 rows, 0.0 cpu, 0.0 io}]) > HiveProject(cs_sold_date_sk=[$0], cs_item_sk=[$15], > cs_quantity=[$18], cs_list_price=[$20]) > HiveFilter(condition=[IS NOT NULL($0)]) > HiveTableScan(table=[[perf_constraints, catalog_sales]], > table:alias=[catalog_sales]) > HiveProject(d_date_sk=[$0]) > HiveFilter(condition=[AND(=($6, 1999), =($8, 1))]) > HiveTableScan(table=[[perf_constraints, date_dim]], > table:alias=[date_dim]) > HiveProject(i_item_sk=[$1]) > HiveFilter(condition=[>($3, 4)]) > HiveProject(substr=[$2], i_item_sk=[$1], d_date=[$0], $f3=[$3]) > HiveAggregate(group=[{3, 4, 5}], agg#0=[count()]) > HiveJoin(condition=[=($1, $4)], joinType=[inner], > algorithm=[none], cost=[{2.0 rows, 0.0 cpu, 0.0 io}]) > HiveJoin(condition=[=($0, $2)], joinType=[inner], > algorithm=[none], cost=[{2.0 rows, 0.0 cpu, 0.0 io}]) > HiveProject(ss_sold_date_sk=[$0], ss_item_sk=[$2]) > HiveFilter(condition=[IS NOT NULL($0)]) > HiveTableScan(table=[[perf_constraints, > store_sales]], table:alias=[store_sales]) > HiveProject(d_date_sk=[$0], d_date=[$2]) > HiveFilter(condition=[IN($6, 1999, 2000, 2001, 2002)]) > HiveTableScan(table=[[perf_constraints, date_dim]], > table:alias=[date_dim]) > HiveProject(i_item_sk=[$0], substr=[substr($4, 1, 30)]) > HiveTableScan(table=[[perf_constraints, item]], > table:alias=[item]) > {code} > Right side of HiveSemiJoin has an aggregate which could be reduce to have > only {{i_item_sk}} as group by key since {{i_item_sk}} is primary key. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HIVE-21382) Group by keys reduction optimization - keys are not reduced in query23
[ https://issues.apache.org/jira/browse/HIVE-21382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16790288#comment-16790288 ] Hive QA commented on HIVE-21382: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12962070/HIVE-21382.3.patch {color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified. {color:green}SUCCESS:{color} +1 due to 15824 tests passed Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/16460/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/16460/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-16460/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.YetusPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase {noformat} This message is automatically generated. ATTACHMENT ID: 12962070 - PreCommit-HIVE-Build > Group by keys reduction optimization - keys are not reduced in query23 > -- > > Key: HIVE-21382 > URL: https://issues.apache.org/jira/browse/HIVE-21382 > Project: Hive > Issue Type: Improvement >Reporter: Vineet Garg >Assignee: Vineet Garg >Priority: Major > Attachments: HIVE-21382.1.patch, HIVE-21382.2.patch, > HIVE-21382.2.patch, HIVE-21382.3.patch > > > {code:sql} > explain cbo with frequent_ss_items as > (select substr(i_item_desc,1,30) itemdesc,i_item_sk item_sk,d_date > solddate,count(*) cnt > from store_sales > ,date_dim > ,item > where ss_sold_date_sk = d_date_sk > and ss_item_sk = i_item_sk > and d_year in (1999,1999+1,1999+2,1999+3) > group by substr(i_item_desc,1,30),i_item_sk,d_date > having count(*) >4) > select sum(sales) > from ((select cs_quantity*cs_list_price sales >from catalog_sales >,date_dim >where d_year = 1999 > and d_moy = 1 > and cs_sold_date_sk = d_date_sk > and cs_item_sk in (select item_sk from frequent_ss_items))) subq > limit 100; > {code} > {code:sql} > HiveSortLimit(fetch=[100]) > HiveProject($f0=[$0]) > HiveAggregate(group=[{}], agg#0=[sum($0)]) > HiveProject(sales=[*(CAST($2):DECIMAL(10, 0), $3)]) > HiveSemiJoin(condition=[=($1, $5)], joinType=[inner]) > HiveJoin(condition=[=($0, $4)], joinType=[inner], algorithm=[none], > cost=[{2.0 rows, 0.0 cpu, 0.0 io}]) > HiveProject(cs_sold_date_sk=[$0], cs_item_sk=[$15], > cs_quantity=[$18], cs_list_price=[$20]) > HiveFilter(condition=[IS NOT NULL($0)]) > HiveTableScan(table=[[perf_constraints, catalog_sales]], > table:alias=[catalog_sales]) > HiveProject(d_date_sk=[$0]) > HiveFilter(condition=[AND(=($6, 1999), =($8, 1))]) > HiveTableScan(table=[[perf_constraints, date_dim]], > table:alias=[date_dim]) > HiveProject(i_item_sk=[$1]) > HiveFilter(condition=[>($3, 4)]) > HiveProject(substr=[$2], i_item_sk=[$1], d_date=[$0], $f3=[$3]) > HiveAggregate(group=[{3, 4, 5}], agg#0=[count()]) > HiveJoin(condition=[=($1, $4)], joinType=[inner], > algorithm=[none], cost=[{2.0 rows, 0.0 cpu, 0.0 io}]) > HiveJoin(condition=[=($0, $2)], joinType=[inner], > algorithm=[none], cost=[{2.0 rows, 0.0 cpu, 0.0 io}]) > HiveProject(ss_sold_date_sk=[$0], ss_item_sk=[$2]) > HiveFilter(condition=[IS NOT NULL($0)]) > HiveTableScan(table=[[perf_constraints, > store_sales]], table:alias=[store_sales]) > HiveProject(d_date_sk=[$0], d_date=[$2]) > HiveFilter(condition=[IN($6, 1999, 2000, 2001, 2002)]) > HiveTableScan(table=[[perf_constraints, date_dim]], > table:alias=[date_dim]) > HiveProject(i_item_sk=[$0], substr=[substr($4, 1, 30)]) > HiveTableScan(table=[[perf_constraints, item]], > table:alias=[item]) > {code} > Right side of HiveSemiJoin has an aggregate which could be reduce to have > only {{i_item_sk}} as group by key since {{i_item_sk}} is primary key. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HIVE-21382) Group by keys reduction optimization - keys are not reduced in query23
[ https://issues.apache.org/jira/browse/HIVE-21382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16790265#comment-16790265 ] Hive QA commented on HIVE-21382: | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | || || || || {color:brown} master Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 20s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 11s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 39s{color} | {color:green} master passed {color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 4m 10s{color} | {color:blue} ql in master has 2258 extant Findbugs warnings. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 2s{color} | {color:green} master passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 33s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 12s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 12s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 40s{color} | {color:red} ql: The patch generated 4 new + 1 unchanged - 0 fixed = 5 total (was 1) {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 23s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 1s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 15s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 25m 6s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Optional Tests | asflicense javac javadoc findbugs checkstyle compile | | uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux | | Build tool | maven | | Personality | /data/hiveptest/working/yetus_PreCommit-HIVE-Build-16460/dev-support/hive-personality.sh | | git revision | master / 9f2f101 | | Default Java | 1.8.0_111 | | findbugs | v3.0.0 | | checkstyle | http://104.198.109.242/logs//PreCommit-HIVE-Build-16460/yetus/diff-checkstyle-ql.txt | | modules | C: ql U: ql | | Console output | http://104.198.109.242/logs//PreCommit-HIVE-Build-16460/yetus.txt | | Powered by | Apache Yetushttp://yetus.apache.org | This message was automatically generated. > Group by keys reduction optimization - keys are not reduced in query23 > -- > > Key: HIVE-21382 > URL: https://issues.apache.org/jira/browse/HIVE-21382 > Project: Hive > Issue Type: Improvement >Reporter: Vineet Garg >Assignee: Vineet Garg >Priority: Major > Attachments: HIVE-21382.1.patch, HIVE-21382.2.patch, > HIVE-21382.2.patch, HIVE-21382.3.patch > > > {code:sql} > explain cbo with frequent_ss_items as > (select substr(i_item_desc,1,30) itemdesc,i_item_sk item_sk,d_date > solddate,count(*) cnt > from store_sales > ,date_dim > ,item > where ss_sold_date_sk = d_date_sk > and ss_item_sk = i_item_sk > and d_year in (1999,1999+1,1999+2,1999+3) > group by substr(i_item_desc,1,30),i_item_sk,d_date > having count(*) >4) > select sum(sales) > from ((select cs_quantity*cs_list_price sales >from catalog_sales >,date_dim >where d_year = 1999 > and d_moy = 1 > and cs_sold_date_sk = d_date_sk > and cs_item_sk in (select item_sk from frequent_ss_items))) subq > limit 100; > {code} > {code:sql} > HiveSortLimit(fetch=[100]) > HiveProject($f0=[$0]) > HiveAggregate(group=[{}], agg#0=[sum($0)]) > HiveProject(sales=[*(CAST($2):DECIMAL(10, 0), $3)]) > HiveSemiJoin(condition=[=($1, $5)], joinType=[inner]) > HiveJoin(condition=[=($0, $4)], joinType=[inner], algorithm=[none], >
[jira] [Commented] (HIVE-21382) Group by keys reduction optimization - keys are not reduced in query23
[ https://issues.apache.org/jira/browse/HIVE-21382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16790224#comment-16790224 ] Hive QA commented on HIVE-21382: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12962042/HIVE-21382.3.patch {color:red}ERROR:{color} -1 due to build exiting with an error Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/16458/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/16458/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-16458/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Tests exited with: NonZeroExitCodeException Command 'bash /data/hiveptest/working/scratch/source-prep.sh' failed with exit status 1 and output '+ date '+%Y-%m-%d %T.%3N' 2019-03-12 04:32:12.577 + [[ -n /usr/lib/jvm/java-8-openjdk-amd64 ]] + export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64 + JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64 + export PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games + PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games + export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m ' + ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m ' + export 'MAVEN_OPTS=-Xmx1g ' + MAVEN_OPTS='-Xmx1g ' + cd /data/hiveptest/working/ + tee /data/hiveptest/logs/PreCommit-HIVE-Build-16458/source-prep.txt + [[ false == \t\r\u\e ]] + mkdir -p maven ivy + [[ git = \s\v\n ]] + [[ git = \g\i\t ]] + [[ -z master ]] + [[ -d apache-github-source-source ]] + [[ ! -d apache-github-source-source/.git ]] + [[ ! -d apache-github-source-source ]] + date '+%Y-%m-%d %T.%3N' 2019-03-12 04:32:12.580 + cd apache-github-source-source + git fetch origin + git reset --hard HEAD HEAD is now at 9f2f101 HIVE-21388: Constant UDF is not pushed to JDBCStorage Handler (Jesus Camacho Rodriguez, reviewed by Jason Dere) + git clean -f -d Removing standalone-metastore/metastore-server/src/gen/ + git checkout master Already on 'master' Your branch is up-to-date with 'origin/master'. + git reset --hard origin/master HEAD is now at 9f2f101 HIVE-21388: Constant UDF is not pushed to JDBCStorage Handler (Jesus Camacho Rodriguez, reviewed by Jason Dere) + git merge --ff-only origin/master Already up-to-date. + date '+%Y-%m-%d %T.%3N' 2019-03-12 04:32:13.469 + rm -rf ../yetus_PreCommit-HIVE-Build-16458 + mkdir ../yetus_PreCommit-HIVE-Build-16458 + git gc + cp -R . ../yetus_PreCommit-HIVE-Build-16458 + mkdir /data/hiveptest/logs/PreCommit-HIVE-Build-16458/yetus + patchCommandPath=/data/hiveptest/working/scratch/smart-apply-patch.sh + patchFilePath=/data/hiveptest/working/scratch/build.patch + [[ -f /data/hiveptest/working/scratch/build.patch ]] + chmod +x /data/hiveptest/working/scratch/smart-apply-patch.sh + /data/hiveptest/working/scratch/smart-apply-patch.sh /data/hiveptest/working/scratch/build.patch error: a/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveRelFieldTrimmer.java: does not exist in index error: a/ql/src/test/queries/clientpositive/constraints_optimization.q: does not exist in index error: a/ql/src/test/results/clientpositive/llap/constraints_optimization.q.out: does not exist in index error: a/ql/src/test/results/clientpositive/perf/tez/constraints/cbo_query23.q.out: does not exist in index error: a/ql/src/test/results/clientpositive/perf/tez/constraints/cbo_query39.q.out: does not exist in index error: a/ql/src/test/results/clientpositive/perf/tez/constraints/cbo_query64.q.out: does not exist in index error: a/ql/src/test/results/clientpositive/perf/tez/constraints/query23.q.out: does not exist in index error: a/ql/src/test/results/clientpositive/perf/tez/constraints/query39.q.out: does not exist in index error: a/ql/src/test/results/clientpositive/perf/tez/constraints/query64.q.out: does not exist in index Going to apply patch with: git apply -p1 + [[ maven == \m\a\v\e\n ]] + rm -rf /data/hiveptest/working/maven/org/apache/hive + mvn -B clean install -DskipTests -T 4 -q -Dmaven.repo.local=/data/hiveptest/working/maven protoc-jar: executing: [/tmp/protoc5885607783622954764.exe, --version] protoc-jar: executing: [/tmp/protoc5885607783622954764.exe, -I/data/hiveptest/working/apache-github-source-source/standalone-metastore/metastore-common/src/main/protobuf/org/apache/hadoop/hive/metastore, --java_out=/data/hiveptest/working/apache-github-source-source/standalone-metastore/metastore-common/target/generated-sources, /data/hiveptest/working/apache-github-source-source/standalone-metastore/metastore-common/src/main/protobuf/org/apache/hadoop/hive/metastore/metastore.proto] libprotoc 2.5.0 ANTLR Parser Generator Version 3.5.2 [ERROR] Failed to execute goal org.apache.maven.plugins:maven-remote-resources-plugin:1.5:process
[jira] [Commented] (HIVE-21382) Group by keys reduction optimization - keys are not reduced in query23
[ https://issues.apache.org/jira/browse/HIVE-21382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16787709#comment-16787709 ] Hive QA commented on HIVE-21382: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12961631/HIVE-21382.2.patch {color:red}ERROR:{color} -1 due to build exiting with an error Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/16404/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/16404/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-16404/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Tests exited with: Exception: Patch URL https://issues.apache.org/jira/secure/attachment/12961631/HIVE-21382.2.patch was found in seen patch url's cache and a test was probably run already on it. Aborting... {noformat} This message is automatically generated. ATTACHMENT ID: 12961631 - PreCommit-HIVE-Build > Group by keys reduction optimization - keys are not reduced in query23 > -- > > Key: HIVE-21382 > URL: https://issues.apache.org/jira/browse/HIVE-21382 > Project: Hive > Issue Type: Improvement >Reporter: Vineet Garg >Assignee: Vineet Garg >Priority: Major > Attachments: HIVE-21382.1.patch, HIVE-21382.2.patch, > HIVE-21382.2.patch > > > {code:sql} > explain cbo with frequent_ss_items as > (select substr(i_item_desc,1,30) itemdesc,i_item_sk item_sk,d_date > solddate,count(*) cnt > from store_sales > ,date_dim > ,item > where ss_sold_date_sk = d_date_sk > and ss_item_sk = i_item_sk > and d_year in (1999,1999+1,1999+2,1999+3) > group by substr(i_item_desc,1,30),i_item_sk,d_date > having count(*) >4) > select sum(sales) > from ((select cs_quantity*cs_list_price sales >from catalog_sales >,date_dim >where d_year = 1999 > and d_moy = 1 > and cs_sold_date_sk = d_date_sk > and cs_item_sk in (select item_sk from frequent_ss_items))) subq > limit 100; > {code} > {code:sql} > HiveSortLimit(fetch=[100]) > HiveProject($f0=[$0]) > HiveAggregate(group=[{}], agg#0=[sum($0)]) > HiveProject(sales=[*(CAST($2):DECIMAL(10, 0), $3)]) > HiveSemiJoin(condition=[=($1, $5)], joinType=[inner]) > HiveJoin(condition=[=($0, $4)], joinType=[inner], algorithm=[none], > cost=[{2.0 rows, 0.0 cpu, 0.0 io}]) > HiveProject(cs_sold_date_sk=[$0], cs_item_sk=[$15], > cs_quantity=[$18], cs_list_price=[$20]) > HiveFilter(condition=[IS NOT NULL($0)]) > HiveTableScan(table=[[perf_constraints, catalog_sales]], > table:alias=[catalog_sales]) > HiveProject(d_date_sk=[$0]) > HiveFilter(condition=[AND(=($6, 1999), =($8, 1))]) > HiveTableScan(table=[[perf_constraints, date_dim]], > table:alias=[date_dim]) > HiveProject(i_item_sk=[$1]) > HiveFilter(condition=[>($3, 4)]) > HiveProject(substr=[$2], i_item_sk=[$1], d_date=[$0], $f3=[$3]) > HiveAggregate(group=[{3, 4, 5}], agg#0=[count()]) > HiveJoin(condition=[=($1, $4)], joinType=[inner], > algorithm=[none], cost=[{2.0 rows, 0.0 cpu, 0.0 io}]) > HiveJoin(condition=[=($0, $2)], joinType=[inner], > algorithm=[none], cost=[{2.0 rows, 0.0 cpu, 0.0 io}]) > HiveProject(ss_sold_date_sk=[$0], ss_item_sk=[$2]) > HiveFilter(condition=[IS NOT NULL($0)]) > HiveTableScan(table=[[perf_constraints, > store_sales]], table:alias=[store_sales]) > HiveProject(d_date_sk=[$0], d_date=[$2]) > HiveFilter(condition=[IN($6, 1999, 2000, 2001, 2002)]) > HiveTableScan(table=[[perf_constraints, date_dim]], > table:alias=[date_dim]) > HiveProject(i_item_sk=[$0], substr=[substr($4, 1, 30)]) > HiveTableScan(table=[[perf_constraints, item]], > table:alias=[item]) > {code} > Right side of HiveSemiJoin has an aggregate which could be reduce to have > only {{i_item_sk}} as group by key since {{i_item_sk}} is primary key. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HIVE-21382) Group by keys reduction optimization - keys are not reduced in query23
[ https://issues.apache.org/jira/browse/HIVE-21382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16787706#comment-16787706 ] Hive QA commented on HIVE-21382: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12961631/HIVE-21382.2.patch {color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 15820 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestTezPerfConstraintsCliDriver.testCliDriver[query23] (batchId=275) org.apache.hive.hcatalog.mapreduce.TestHCatPartitioned.testHCatPartitionedTable[0] (batchId=209) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/16403/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/16403/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-16403/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.YetusPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 2 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12961631 - PreCommit-HIVE-Build > Group by keys reduction optimization - keys are not reduced in query23 > -- > > Key: HIVE-21382 > URL: https://issues.apache.org/jira/browse/HIVE-21382 > Project: Hive > Issue Type: Improvement >Reporter: Vineet Garg >Assignee: Vineet Garg >Priority: Major > Attachments: HIVE-21382.1.patch, HIVE-21382.2.patch, > HIVE-21382.2.patch > > > {code:sql} > explain cbo with frequent_ss_items as > (select substr(i_item_desc,1,30) itemdesc,i_item_sk item_sk,d_date > solddate,count(*) cnt > from store_sales > ,date_dim > ,item > where ss_sold_date_sk = d_date_sk > and ss_item_sk = i_item_sk > and d_year in (1999,1999+1,1999+2,1999+3) > group by substr(i_item_desc,1,30),i_item_sk,d_date > having count(*) >4) > select sum(sales) > from ((select cs_quantity*cs_list_price sales >from catalog_sales >,date_dim >where d_year = 1999 > and d_moy = 1 > and cs_sold_date_sk = d_date_sk > and cs_item_sk in (select item_sk from frequent_ss_items))) subq > limit 100; > {code} > {code:sql} > HiveSortLimit(fetch=[100]) > HiveProject($f0=[$0]) > HiveAggregate(group=[{}], agg#0=[sum($0)]) > HiveProject(sales=[*(CAST($2):DECIMAL(10, 0), $3)]) > HiveSemiJoin(condition=[=($1, $5)], joinType=[inner]) > HiveJoin(condition=[=($0, $4)], joinType=[inner], algorithm=[none], > cost=[{2.0 rows, 0.0 cpu, 0.0 io}]) > HiveProject(cs_sold_date_sk=[$0], cs_item_sk=[$15], > cs_quantity=[$18], cs_list_price=[$20]) > HiveFilter(condition=[IS NOT NULL($0)]) > HiveTableScan(table=[[perf_constraints, catalog_sales]], > table:alias=[catalog_sales]) > HiveProject(d_date_sk=[$0]) > HiveFilter(condition=[AND(=($6, 1999), =($8, 1))]) > HiveTableScan(table=[[perf_constraints, date_dim]], > table:alias=[date_dim]) > HiveProject(i_item_sk=[$1]) > HiveFilter(condition=[>($3, 4)]) > HiveProject(substr=[$2], i_item_sk=[$1], d_date=[$0], $f3=[$3]) > HiveAggregate(group=[{3, 4, 5}], agg#0=[count()]) > HiveJoin(condition=[=($1, $4)], joinType=[inner], > algorithm=[none], cost=[{2.0 rows, 0.0 cpu, 0.0 io}]) > HiveJoin(condition=[=($0, $2)], joinType=[inner], > algorithm=[none], cost=[{2.0 rows, 0.0 cpu, 0.0 io}]) > HiveProject(ss_sold_date_sk=[$0], ss_item_sk=[$2]) > HiveFilter(condition=[IS NOT NULL($0)]) > HiveTableScan(table=[[perf_constraints, > store_sales]], table:alias=[store_sales]) > HiveProject(d_date_sk=[$0], d_date=[$2]) > HiveFilter(condition=[IN($6, 1999, 2000, 2001, 2002)]) > HiveTableScan(table=[[perf_constraints, date_dim]], > table:alias=[date_dim]) > HiveProject(i_item_sk=[$0], substr=[substr($4, 1, 30)]) > HiveTableScan(table=[[perf_constraints, item]], > table:alias=[item]) > {code} > Right side of HiveSemiJoin has an aggregate which could be reduce to have > only {{i_item_sk}} as group by key since {{i_item_sk}} is primary key. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HIVE-21382) Group by keys reduction optimization - keys are not reduced in query23
[ https://issues.apache.org/jira/browse/HIVE-21382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16787678#comment-16787678 ] Hive QA commented on HIVE-21382: | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | || || || || {color:brown} master Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 9m 11s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 15s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 42s{color} | {color:green} master passed {color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 4m 8s{color} | {color:blue} ql in master has 2258 extant Findbugs warnings. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 4s{color} | {color:green} master passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 33s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 15s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 15s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 41s{color} | {color:red} ql: The patch generated 4 new + 1 unchanged - 0 fixed = 5 total (was 1) {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 4m 19s{color} | {color:red} ql generated 1 new + 2258 unchanged - 0 fixed = 2259 total (was 2258) {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 4s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 14s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 26m 2s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | FindBugs | module:ql | | | Should org.apache.hadoop.hive.ql.optimizer.calcite.rules.HiveRelFieldTrimmer$TableRefFinder be a _static_ inner class? At HiveRelFieldTrimmer.java:inner class? At HiveRelFieldTrimmer.java:[lines 325-339] | \\ \\ || Subsystem || Report/Notes || | Optional Tests | asflicense javac javadoc findbugs checkstyle compile | | uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux | | Build tool | maven | | Personality | /data/hiveptest/working/yetus_PreCommit-HIVE-Build-16403/dev-support/hive-personality.sh | | git revision | master / bd95a2f | | Default Java | 1.8.0_111 | | findbugs | v3.0.0 | | checkstyle | http://104.198.109.242/logs//PreCommit-HIVE-Build-16403/yetus/diff-checkstyle-ql.txt | | findbugs | http://104.198.109.242/logs//PreCommit-HIVE-Build-16403/yetus/new-findbugs-ql.html | | modules | C: ql U: ql | | Console output | http://104.198.109.242/logs//PreCommit-HIVE-Build-16403/yetus.txt | | Powered by | Apache Yetushttp://yetus.apache.org | This message was automatically generated. > Group by keys reduction optimization - keys are not reduced in query23 > -- > > Key: HIVE-21382 > URL: https://issues.apache.org/jira/browse/HIVE-21382 > Project: Hive > Issue Type: Improvement >Reporter: Vineet Garg >Assignee: Vineet Garg >Priority: Major > Attachments: HIVE-21382.1.patch, HIVE-21382.2.patch, > HIVE-21382.2.patch > > > {code:sql} > explain cbo with frequent_ss_items as > (select substr(i_item_desc,1,30) itemdesc,i_item_sk item_sk,d_date > solddate,count(*) cnt > from store_sales > ,date_dim > ,item > where ss_sold_date_sk = d_date_sk > and ss_item_sk = i_item_sk > and d_year in (1999,1999+1,1999+2,1999+3) > group by substr(i_item_desc,1,30),i_item_sk,d_date > having count(*) >4) > select sum(sales) > from ((select cs_quantity*cs_list_price sales >from catalog_sales >,date_dim >where d_year = 1999 > and d_moy = 1 > and cs_sold_date_sk = d_date_sk > and cs_item_sk in
[jira] [Commented] (HIVE-21382) Group by keys reduction optimization - keys are not reduced in query23
[ https://issues.apache.org/jira/browse/HIVE-21382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16786050#comment-16786050 ] Hive QA commented on HIVE-21382: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12961272/HIVE-21382.1.patch {color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 15819 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestTezPerfConstraintsCliDriver.testCliDriver[cbo_query39] (batchId=275) org.apache.hadoop.hive.cli.TestTezPerfConstraintsCliDriver.testCliDriver[cbo_query64] (batchId=275) org.apache.hadoop.hive.cli.TestTezPerfConstraintsCliDriver.testCliDriver[query39] (batchId=275) org.apache.hadoop.hive.cli.TestTezPerfConstraintsCliDriver.testCliDriver[query64] (batchId=275) org.apache.hive.hcatalog.mapreduce.TestHCatPartitioned.testHCatPartitionedTable[1] (batchId=209) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/16367/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/16367/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-16367/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.YetusPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 5 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12961272 - PreCommit-HIVE-Build > Group by keys reduction optimization - keys are not reduced in query23 > -- > > Key: HIVE-21382 > URL: https://issues.apache.org/jira/browse/HIVE-21382 > Project: Hive > Issue Type: Improvement >Reporter: Vineet Garg >Assignee: Vineet Garg >Priority: Major > Attachments: HIVE-21382.1.patch > > > {code:sql} > explain cbo with frequent_ss_items as > (select substr(i_item_desc,1,30) itemdesc,i_item_sk item_sk,d_date > solddate,count(*) cnt > from store_sales > ,date_dim > ,item > where ss_sold_date_sk = d_date_sk > and ss_item_sk = i_item_sk > and d_year in (1999,1999+1,1999+2,1999+3) > group by substr(i_item_desc,1,30),i_item_sk,d_date > having count(*) >4) > select sum(sales) > from ((select cs_quantity*cs_list_price sales >from catalog_sales >,date_dim >where d_year = 1999 > and d_moy = 1 > and cs_sold_date_sk = d_date_sk > and cs_item_sk in (select item_sk from frequent_ss_items))) subq > limit 100; > {code} > {code:sql} > HiveSortLimit(fetch=[100]) > HiveProject($f0=[$0]) > HiveAggregate(group=[{}], agg#0=[sum($0)]) > HiveProject(sales=[*(CAST($2):DECIMAL(10, 0), $3)]) > HiveSemiJoin(condition=[=($1, $5)], joinType=[inner]) > HiveJoin(condition=[=($0, $4)], joinType=[inner], algorithm=[none], > cost=[{2.0 rows, 0.0 cpu, 0.0 io}]) > HiveProject(cs_sold_date_sk=[$0], cs_item_sk=[$15], > cs_quantity=[$18], cs_list_price=[$20]) > HiveFilter(condition=[IS NOT NULL($0)]) > HiveTableScan(table=[[perf_constraints, catalog_sales]], > table:alias=[catalog_sales]) > HiveProject(d_date_sk=[$0]) > HiveFilter(condition=[AND(=($6, 1999), =($8, 1))]) > HiveTableScan(table=[[perf_constraints, date_dim]], > table:alias=[date_dim]) > HiveProject(i_item_sk=[$1]) > HiveFilter(condition=[>($3, 4)]) > HiveProject(substr=[$2], i_item_sk=[$1], d_date=[$0], $f3=[$3]) > HiveAggregate(group=[{3, 4, 5}], agg#0=[count()]) > HiveJoin(condition=[=($1, $4)], joinType=[inner], > algorithm=[none], cost=[{2.0 rows, 0.0 cpu, 0.0 io}]) > HiveJoin(condition=[=($0, $2)], joinType=[inner], > algorithm=[none], cost=[{2.0 rows, 0.0 cpu, 0.0 io}]) > HiveProject(ss_sold_date_sk=[$0], ss_item_sk=[$2]) > HiveFilter(condition=[IS NOT NULL($0)]) > HiveTableScan(table=[[perf_constraints, > store_sales]], table:alias=[store_sales]) > HiveProject(d_date_sk=[$0], d_date=[$2]) > HiveFilter(condition=[IN($6, 1999, 2000, 2001, 2002)]) > HiveTableScan(table=[[perf_constraints, date_dim]], > table:alias=[date_dim]) > HiveProject(i_item_sk=[$0], substr=[substr($4, 1, 30)]) > HiveTableScan(table=[[perf_constraints, item]], > table:alias=[item]) > {code} > Right side of HiveSemiJoin has an aggregate which
[jira] [Commented] (HIVE-21382) Group by keys reduction optimization - keys are not reduced in query23
[ https://issues.apache.org/jira/browse/HIVE-21382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16785996#comment-16785996 ] Hive QA commented on HIVE-21382: | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | || || || || {color:brown} master Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 51s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 19s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 44s{color} | {color:green} master passed {color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 4m 33s{color} | {color:blue} ql in master has 2251 extant Findbugs warnings. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 3s{color} | {color:green} master passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 40s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 19s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 19s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 41s{color} | {color:red} ql: The patch generated 3 new + 1 unchanged - 0 fixed = 4 total (was 1) {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 33s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 5s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 14s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 26m 37s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Optional Tests | asflicense javac javadoc findbugs checkstyle compile | | uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux | | Build tool | maven | | Personality | /data/hiveptest/working/yetus_PreCommit-HIVE-Build-16367/dev-support/hive-personality.sh | | git revision | master / 0413fec | | Default Java | 1.8.0_111 | | findbugs | v3.0.0 | | checkstyle | http://104.198.109.242/logs//PreCommit-HIVE-Build-16367/yetus/diff-checkstyle-ql.txt | | modules | C: ql U: ql | | Console output | http://104.198.109.242/logs//PreCommit-HIVE-Build-16367/yetus.txt | | Powered by | Apache Yetushttp://yetus.apache.org | This message was automatically generated. > Group by keys reduction optimization - keys are not reduced in query23 > -- > > Key: HIVE-21382 > URL: https://issues.apache.org/jira/browse/HIVE-21382 > Project: Hive > Issue Type: Improvement >Reporter: Vineet Garg >Assignee: Vineet Garg >Priority: Major > Attachments: HIVE-21382.1.patch > > > {code:sql} > explain cbo with frequent_ss_items as > (select substr(i_item_desc,1,30) itemdesc,i_item_sk item_sk,d_date > solddate,count(*) cnt > from store_sales > ,date_dim > ,item > where ss_sold_date_sk = d_date_sk > and ss_item_sk = i_item_sk > and d_year in (1999,1999+1,1999+2,1999+3) > group by substr(i_item_desc,1,30),i_item_sk,d_date > having count(*) >4) > select sum(sales) > from ((select cs_quantity*cs_list_price sales >from catalog_sales >,date_dim >where d_year = 1999 > and d_moy = 1 > and cs_sold_date_sk = d_date_sk > and cs_item_sk in (select item_sk from frequent_ss_items))) subq > limit 100; > {code} > {code:sql} > HiveSortLimit(fetch=[100]) > HiveProject($f0=[$0]) > HiveAggregate(group=[{}], agg#0=[sum($0)]) > HiveProject(sales=[*(CAST($2):DECIMAL(10, 0), $3)]) > HiveSemiJoin(condition=[=($1, $5)], joinType=[inner]) > HiveJoin(condition=[=($0, $4)], joinType=[inner], algorithm=[none], > cost=[{2.0 rows, 0.0 cpu, 0.0 io}]) >
[jira] [Commented] (HIVE-21382) Group by keys reduction optimization - keys are not reduced in query23
[ https://issues.apache.org/jira/browse/HIVE-21382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16785156#comment-16785156 ] Vineet Garg commented on HIVE-21382: This patch also fixes HIVE-21387 > Group by keys reduction optimization - keys are not reduced in query23 > -- > > Key: HIVE-21382 > URL: https://issues.apache.org/jira/browse/HIVE-21382 > Project: Hive > Issue Type: Improvement >Reporter: Vineet Garg >Assignee: Vineet Garg >Priority: Major > Attachments: HIVE-21382.1.patch > > > {code:sql} > explain cbo with frequent_ss_items as > (select substr(i_item_desc,1,30) itemdesc,i_item_sk item_sk,d_date > solddate,count(*) cnt > from store_sales > ,date_dim > ,item > where ss_sold_date_sk = d_date_sk > and ss_item_sk = i_item_sk > and d_year in (1999,1999+1,1999+2,1999+3) > group by substr(i_item_desc,1,30),i_item_sk,d_date > having count(*) >4) > select sum(sales) > from ((select cs_quantity*cs_list_price sales >from catalog_sales >,date_dim >where d_year = 1999 > and d_moy = 1 > and cs_sold_date_sk = d_date_sk > and cs_item_sk in (select item_sk from frequent_ss_items))) subq > limit 100; > {code} > {code:sql} > HiveSortLimit(fetch=[100]) > HiveProject($f0=[$0]) > HiveAggregate(group=[{}], agg#0=[sum($0)]) > HiveProject(sales=[*(CAST($2):DECIMAL(10, 0), $3)]) > HiveSemiJoin(condition=[=($1, $5)], joinType=[inner]) > HiveJoin(condition=[=($0, $4)], joinType=[inner], algorithm=[none], > cost=[{2.0 rows, 0.0 cpu, 0.0 io}]) > HiveProject(cs_sold_date_sk=[$0], cs_item_sk=[$15], > cs_quantity=[$18], cs_list_price=[$20]) > HiveFilter(condition=[IS NOT NULL($0)]) > HiveTableScan(table=[[perf_constraints, catalog_sales]], > table:alias=[catalog_sales]) > HiveProject(d_date_sk=[$0]) > HiveFilter(condition=[AND(=($6, 1999), =($8, 1))]) > HiveTableScan(table=[[perf_constraints, date_dim]], > table:alias=[date_dim]) > HiveProject(i_item_sk=[$1]) > HiveFilter(condition=[>($3, 4)]) > HiveProject(substr=[$2], i_item_sk=[$1], d_date=[$0], $f3=[$3]) > HiveAggregate(group=[{3, 4, 5}], agg#0=[count()]) > HiveJoin(condition=[=($1, $4)], joinType=[inner], > algorithm=[none], cost=[{2.0 rows, 0.0 cpu, 0.0 io}]) > HiveJoin(condition=[=($0, $2)], joinType=[inner], > algorithm=[none], cost=[{2.0 rows, 0.0 cpu, 0.0 io}]) > HiveProject(ss_sold_date_sk=[$0], ss_item_sk=[$2]) > HiveFilter(condition=[IS NOT NULL($0)]) > HiveTableScan(table=[[perf_constraints, > store_sales]], table:alias=[store_sales]) > HiveProject(d_date_sk=[$0], d_date=[$2]) > HiveFilter(condition=[IN($6, 1999, 2000, 2001, 2002)]) > HiveTableScan(table=[[perf_constraints, date_dim]], > table:alias=[date_dim]) > HiveProject(i_item_sk=[$0], substr=[substr($4, 1, 30)]) > HiveTableScan(table=[[perf_constraints, item]], > table:alias=[item]) > {code} > Right side of HiveSemiJoin has an aggregate which could be reduce to have > only {{i_item_sk}} as group by key since {{i_item_sk}} is primary key. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HIVE-21382) Group by keys reduction optimization - keys are not reduced in query23
[ https://issues.apache.org/jira/browse/HIVE-21382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16785155#comment-16785155 ] Vineet Garg commented on HIVE-21382: Challenge with query23 is that group by has expressions (e.g. {{substr}}) which couldn't be tracked back to source table making it difficult for optimizer to determine if it is safe or not to remove such keys. I am attaching a patch which should otherwise work. It extends the existing logic to consider group by keys reduction even if columns originate from different source table. > Group by keys reduction optimization - keys are not reduced in query23 > -- > > Key: HIVE-21382 > URL: https://issues.apache.org/jira/browse/HIVE-21382 > Project: Hive > Issue Type: Improvement >Reporter: Vineet Garg >Assignee: Vineet Garg >Priority: Major > Attachments: HIVE-21382.1.patch > > > {code:sql} > explain cbo with frequent_ss_items as > (select substr(i_item_desc,1,30) itemdesc,i_item_sk item_sk,d_date > solddate,count(*) cnt > from store_sales > ,date_dim > ,item > where ss_sold_date_sk = d_date_sk > and ss_item_sk = i_item_sk > and d_year in (1999,1999+1,1999+2,1999+3) > group by substr(i_item_desc,1,30),i_item_sk,d_date > having count(*) >4) > select sum(sales) > from ((select cs_quantity*cs_list_price sales >from catalog_sales >,date_dim >where d_year = 1999 > and d_moy = 1 > and cs_sold_date_sk = d_date_sk > and cs_item_sk in (select item_sk from frequent_ss_items))) subq > limit 100; > {code} > {code:sql} > HiveSortLimit(fetch=[100]) > HiveProject($f0=[$0]) > HiveAggregate(group=[{}], agg#0=[sum($0)]) > HiveProject(sales=[*(CAST($2):DECIMAL(10, 0), $3)]) > HiveSemiJoin(condition=[=($1, $5)], joinType=[inner]) > HiveJoin(condition=[=($0, $4)], joinType=[inner], algorithm=[none], > cost=[{2.0 rows, 0.0 cpu, 0.0 io}]) > HiveProject(cs_sold_date_sk=[$0], cs_item_sk=[$15], > cs_quantity=[$18], cs_list_price=[$20]) > HiveFilter(condition=[IS NOT NULL($0)]) > HiveTableScan(table=[[perf_constraints, catalog_sales]], > table:alias=[catalog_sales]) > HiveProject(d_date_sk=[$0]) > HiveFilter(condition=[AND(=($6, 1999), =($8, 1))]) > HiveTableScan(table=[[perf_constraints, date_dim]], > table:alias=[date_dim]) > HiveProject(i_item_sk=[$1]) > HiveFilter(condition=[>($3, 4)]) > HiveProject(substr=[$2], i_item_sk=[$1], d_date=[$0], $f3=[$3]) > HiveAggregate(group=[{3, 4, 5}], agg#0=[count()]) > HiveJoin(condition=[=($1, $4)], joinType=[inner], > algorithm=[none], cost=[{2.0 rows, 0.0 cpu, 0.0 io}]) > HiveJoin(condition=[=($0, $2)], joinType=[inner], > algorithm=[none], cost=[{2.0 rows, 0.0 cpu, 0.0 io}]) > HiveProject(ss_sold_date_sk=[$0], ss_item_sk=[$2]) > HiveFilter(condition=[IS NOT NULL($0)]) > HiveTableScan(table=[[perf_constraints, > store_sales]], table:alias=[store_sales]) > HiveProject(d_date_sk=[$0], d_date=[$2]) > HiveFilter(condition=[IN($6, 1999, 2000, 2001, 2002)]) > HiveTableScan(table=[[perf_constraints, date_dim]], > table:alias=[date_dim]) > HiveProject(i_item_sk=[$0], substr=[substr($4, 1, 30)]) > HiveTableScan(table=[[perf_constraints, item]], > table:alias=[item]) > {code} > Right side of HiveSemiJoin has an aggregate which could be reduce to have > only {{i_item_sk}} as group by key since {{i_item_sk}} is primary key. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HIVE-21382) Group by keys reduction optimization - keys are not reduced in query23
[ https://issues.apache.org/jira/browse/HIVE-21382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16783947#comment-16783947 ] Vineet Garg commented on HIVE-21382: Current logic of Group by reduction only works if all the group by keys are originated from same table. In this case group by consists of column from item and store_sales table. > Group by keys reduction optimization - keys are not reduced in query23 > -- > > Key: HIVE-21382 > URL: https://issues.apache.org/jira/browse/HIVE-21382 > Project: Hive > Issue Type: Improvement >Reporter: Vineet Garg >Assignee: Vineet Garg >Priority: Major > > {code:sql} > explain cbo with frequent_ss_items as > (select substr(i_item_desc,1,30) itemdesc,i_item_sk item_sk,d_date > solddate,count(*) cnt > from store_sales > ,date_dim > ,item > where ss_sold_date_sk = d_date_sk > and ss_item_sk = i_item_sk > and d_year in (1999,1999+1,1999+2,1999+3) > group by substr(i_item_desc,1,30),i_item_sk,d_date > having count(*) >4) > select sum(sales) > from ((select cs_quantity*cs_list_price sales >from catalog_sales >,date_dim >where d_year = 1999 > and d_moy = 1 > and cs_sold_date_sk = d_date_sk > and cs_item_sk in (select item_sk from frequent_ss_items))) subq > limit 100; > {code} > {code:sql} > HiveSortLimit(fetch=[100]) > HiveProject($f0=[$0]) > HiveAggregate(group=[{}], agg#0=[sum($0)]) > HiveProject(sales=[*(CAST($2):DECIMAL(10, 0), $3)]) > HiveSemiJoin(condition=[=($1, $5)], joinType=[inner]) > HiveJoin(condition=[=($0, $4)], joinType=[inner], algorithm=[none], > cost=[{2.0 rows, 0.0 cpu, 0.0 io}]) > HiveProject(cs_sold_date_sk=[$0], cs_item_sk=[$15], > cs_quantity=[$18], cs_list_price=[$20]) > HiveFilter(condition=[IS NOT NULL($0)]) > HiveTableScan(table=[[perf_constraints, catalog_sales]], > table:alias=[catalog_sales]) > HiveProject(d_date_sk=[$0]) > HiveFilter(condition=[AND(=($6, 1999), =($8, 1))]) > HiveTableScan(table=[[perf_constraints, date_dim]], > table:alias=[date_dim]) > HiveProject(i_item_sk=[$1]) > HiveFilter(condition=[>($3, 4)]) > HiveProject(substr=[$2], i_item_sk=[$1], d_date=[$0], $f3=[$3]) > HiveAggregate(group=[{3, 4, 5}], agg#0=[count()]) > HiveJoin(condition=[=($1, $4)], joinType=[inner], > algorithm=[none], cost=[{2.0 rows, 0.0 cpu, 0.0 io}]) > HiveJoin(condition=[=($0, $2)], joinType=[inner], > algorithm=[none], cost=[{2.0 rows, 0.0 cpu, 0.0 io}]) > HiveProject(ss_sold_date_sk=[$0], ss_item_sk=[$2]) > HiveFilter(condition=[IS NOT NULL($0)]) > HiveTableScan(table=[[perf_constraints, > store_sales]], table:alias=[store_sales]) > HiveProject(d_date_sk=[$0], d_date=[$2]) > HiveFilter(condition=[IN($6, 1999, 2000, 2001, 2002)]) > HiveTableScan(table=[[perf_constraints, date_dim]], > table:alias=[date_dim]) > HiveProject(i_item_sk=[$0], substr=[substr($4, 1, 30)]) > HiveTableScan(table=[[perf_constraints, item]], > table:alias=[item]) > {code} > Right side of HiveSemiJoin has an aggregate which could be reduce to have > only {{i_item_sk}} as group by key since {{i_item_sk}} is primary key. -- This message was sent by Atlassian JIRA (v7.6.3#76005)