[jira] [Commented] (HIVE-18008) Add optimization rule to remove gby from right side of left semi-join
[ https://issues.apache.org/jira/browse/HIVE-18008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16246853#comment-16246853 ]

Andrew Sherman commented on HIVE-18008:

Thanks

> Add optimization rule to remove gby from right side of left semi-join
> ---------------------------------------------------------------------
>
>                 Key: HIVE-18008
>                 URL: https://issues.apache.org/jira/browse/HIVE-18008
>             Project: Hive
>          Issue Type: Improvement
>          Components: Query Planning
>            Reporter: Vineet Garg
>            Assignee: Vineet Garg
>         Attachments: HIVE-18008.1.patch, HIVE-18008.2.patch
>
>
> Group by (on same keys as semi join) as right side of Left semi join is
> unnecessary and could be removed. We see this pattern in subqueries with
> explicit distinct keyword e.g.
> {code:sql}
> explain select * from src b where b.key in (select distinct key from src a
> where a.value = b.value)
> {code}

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
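The claim in the issue description above, that a group-by (or distinct) on the right side of a left semi-join is redundant, can be illustrated with a small stand-alone sketch. This is not Hive or Calcite code: the class and method names are hypothetical, rows are reduced to a single string key, and the correlated `a.value = b.value` predicate from the example query is left out for brevity. It only shows that a semi-join asks whether at least one match exists, so right-side duplicates cannot change the result.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;

// Hypothetical illustration only; none of these names come from Hive.
class SemiJoinDistinctDemo {

    // Left semi-join on a single key: keep each left row that has at
    // least one match on the right. A left row is emitted at most once,
    // no matter how many right rows match it.
    static List<String> leftSemiJoin(List<String> left, List<String> right) {
        List<String> out = new ArrayList<>();
        for (String key : left) {
            if (right.contains(key)) {
                out.add(key);
            }
        }
        return out;
    }

    // Stand-in for "select distinct key" (a group-by on the join key).
    static List<String> distinct(List<String> rows) {
        return new ArrayList<>(new LinkedHashSet<>(rows));
    }

    public static void main(String[] args) {
        List<String> left = Arrays.asList("a", "b", "b", "c");
        List<String> right = Arrays.asList("b", "b", "c", "c", "d");

        // Both calls print [b, b, c]: deduplicating the right input
        // first does not change which left rows survive.
        System.out.println(leftSemiJoin(left, right));
        System.out.println(leftSemiJoin(left, distinct(right)));
    }
}
```

The equivalence holds only when the distinct or group-by keys are the same columns the semi-join matches on, which is the condition the issue title refers to.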
[jira] [Commented] (HIVE-18008) Add optimization rule to remove gby from right side of left semi-join
[ https://issues.apache.org/jira/browse/HIVE-18008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16246839#comment-16246839 ]

Vineet Garg commented on HIVE-18008:

Should be fixed now
[jira] [Commented] (HIVE-18008) Add optimization rule to remove gby from right side of left semi-join
[ https://issues.apache.org/jira/browse/HIVE-18008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16246825#comment-16246825 ]

Vineet Garg commented on HIVE-18008:

Let me revert this, sorry about the inconvenience.
[jira] [Commented] (HIVE-18008) Add optimization rule to remove gby from right side of left semi-join
[ https://issues.apache.org/jira/browse/HIVE-18008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16246824#comment-16246824 ]

Andrew Sherman commented on HIVE-18008:

I see the same break, just FYI, not piling on :-)
I see the new file in the patch but it is not in the commit;
{noformat}
[~/git/asf/hive]$ git show --name-only ff3b327d322b04916e019fcec75d3fbd48e26bae
commit ff3b327d322b04916e019fcec75d3fbd48e26bae (HEAD -> master, origin/master, origin/HEAD)
Author: Vineet Garg
Date:   Thu Nov 9 15:54:11 2017 -0800

    HIVE-18008 : Add optimization rule to remove gby from right side of left semi-join (Vineet Garg, reviewed by Ashutosh Chauhan)

ql/src/java/org/apache/hadoop/hive/ql/parse/CalcitePlanner.java
ql/src/test/queries/clientpositive/subquery_in.q
ql/src/test/results/clientpositive/llap/subquery_in.q.out
ql/src/test/results/clientpositive/spark/subquery_in.q.out
ql/src/test/results/clientpositive/subquery_unqualcolumnrefs.q.out
{noformat}
[jira] [Commented] (HIVE-18008) Add optimization rule to remove gby from right side of left semi-join
[ https://issues.apache.org/jira/browse/HIVE-18008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16246823#comment-16246823 ]

Sergey Shelukhin commented on HIVE-18008:

Appears to have broken the build:
{noformat}
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.6.1:compile (default-compile) on project hive-exec: Compilation failure
[ERROR] /Users/sergey/git/hivegit/ql/src/java/org/apache/hadoop/hive/ql/parse/CalcitePlanner.java:[208,57] cannot find symbol
[ERROR]   symbol:   class HiveRemoveGBYSemiJoinRule
[ERROR]   location: package org.apache.hadoop.hive.ql.optimizer.calcite.rules
[ERROR] -> [Help 1]
{noformat}
[jira] [Commented] (HIVE-18008) Add optimization rule to remove gby from right side of left semi-join
[ https://issues.apache.org/jira/browse/HIVE-18008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16246592#comment-16246592 ]

Ashutosh Chauhan commented on HIVE-18008:

scratch that. Patch LGTM +1
[jira] [Commented] (HIVE-18008) Add optimization rule to remove gby from right side of left semi-join
[ https://issues.apache.org/jira/browse/HIVE-18008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16244259#comment-16244259 ]

Ashutosh Chauhan commented on HIVE-18008:

{code}
joinInfo.rightSet().equals(ImmutableBitSet.range(rightAggregate.getGroupCount()));
{code}
Just count check may not be sufficient, we should also check if they are same column.
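For readers following the review comment above, here is a minimal sketch of what a set-equality check of that shape compares. It uses `java.util.BitSet` as a stand-in for Calcite's `ImmutableBitSet`, and the helper names (`range`, `of`, `joinKeysAreGroupKeys`) are hypothetical. It also assumes that the group keys occupy the first `getGroupCount()` output positions of the aggregate, which the thread does not state. Under that assumption, equality against a range compares column positions, not only how many there are.

```java
import java.util.BitSet;

// Hypothetical illustration only; BitSet stands in for ImmutableBitSet.
class GroupKeyCheckDemo {

    // Stand-in for ImmutableBitSet.range(n): the set {0, 1, ..., n-1}.
    static BitSet range(int n) {
        BitSet bits = new BitSet();
        bits.set(0, n);
        return bits;
    }

    // Builds a set from explicit column positions.
    static BitSet of(int... positions) {
        BitSet bits = new BitSet();
        for (int p : positions) {
            bits.set(p);
        }
        return bits;
    }

    // Same shape as the quoted condition: the right-side join key
    // positions must equal exactly the first groupCount positions.
    static boolean joinKeysAreGroupKeys(BitSet rightJoinKeys, int groupCount) {
        return rightJoinKeys.equals(range(groupCount));
    }

    public static void main(String[] args) {
        // Join keys {0, 1} against two group keys: match.
        System.out.println(joinKeysAreGroupKeys(of(0, 1), 2)); // true
        // Same number of keys, but a different column: no match.
        System.out.println(joinKeysAreGroupKeys(of(0, 2), 2)); // false
        // Only a subset of the group keys: no match.
        System.out.println(joinKeysAreGroupKeys(of(0), 2));    // false
    }
}
```

The second case is the one the comment worries about: two join keys and two group keys, yet the sets differ because one position is different.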
[jira] [Commented] (HIVE-18008) Add optimization rule to remove gby from right side of left semi-join
[ https://issues.apache.org/jira/browse/HIVE-18008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16243999#comment-16243999 ]

Hive QA commented on HIVE-18008:

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12896535/HIVE-18008.1.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 15 failed/errored test(s), 11372 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[insert_values_orig_table_use_metadata] (batchId=62)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[unionDistinct_1] (batchId=146)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[llap_acid_fast] (batchId=157)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[sysdb] (batchId=156)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[bucketmapjoin7] (batchId=173)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[schemeAuthority] (batchId=174)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] (batchId=102)
org.apache.hadoop.hive.cli.TestNegativeMinimrCliDriver.testCliDriver[ct_noperm_loc] (batchId=94)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[bucketmapjoin7] (batchId=116)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[subquery_in] (batchId=131)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[subquery_multi] (batchId=111)
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[query14] (batchId=243)
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[query23] (batchId=243)
org.apache.hadoop.hive.cli.control.TestDanglingQOuts.checkDanglingQOut (batchId=206)
org.apache.hadoop.hive.ql.parse.TestReplicationScenarios.testConstraints (batchId=223)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/7704/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/7704/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-7704/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 15 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12896535 - PreCommit-HIVE-Build