[jira] [Commented] (HIVE-10901) Optimize mutli column distinct queries
[ https://issues.apache.org/jira/browse/HIVE-10901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15669316#comment-15669316 ]

Pengcheng Xiong commented on HIVE-10901:

[~gopalv], I am not sure how many reducers were used in jenkins, but it may be related to what you described last time.

> Optimize mutli column distinct queries
>
> Key: HIVE-10901
> URL: https://issues.apache.org/jira/browse/HIVE-10901
> Project: Hive
> Issue Type: New Feature
> Components: CBO, Logical Optimizer
> Affects Versions: 1.2.0
> Reporter: Mostafa Mokhtar
> Assignee: Pengcheng Xiong
> Attachments: HIVE-10901.02.patch, HIVE-10901.03.patch, HIVE-10901.patch
>
> HIVE-10568 is useful only when there is a distinct on one column. It can be
> expanded for multiple column cases too.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (HIVE-10901) Optimize mutli column distinct queries
[ https://issues.apache.org/jira/browse/HIVE-10901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15669134#comment-15669134 ]

Hive QA commented on HIVE-10901:

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12839085/HIVE-10901.03.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 7 failed/errored test(s), 10680 tests executed
*Failed tests:*
{noformat}
TestSparkCliDriver - did not produce a TEST-*.xml file (likely timed out) (batchId=102)
	[skewjoinopt3.q,smb_mapjoin_4.q,timestamp_comparison.q,union_remove_10.q,mapreduce2.q,bucketmapjoin_negative.q,udf_in_file.q,auto_join12.q,skewjoin.q,vector_left_outer_join.q,semijoin.q,skewjoinopt9.q,smb_mapjoin_3.q,stats10.q,nullgroup4.q]
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[transform_ppr2] (batchId=133)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[join_acid_non_acid] (batchId=150)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[union_fast_stats] (batchId=145)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] (batchId=91)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_4] (batchId=91)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[multi_count_distinct] (batchId=90)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/2142/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/2142/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-2142/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 7 tests failed
{noformat}

This message is automatically generated.
ATTACHMENT ID: 12839085 - PreCommit-HIVE-Build
[jira] [Commented] (HIVE-10901) Optimize mutli column distinct queries
[ https://issues.apache.org/jira/browse/HIVE-10901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15665710#comment-15665710 ]

Hive QA commented on HIVE-10901:

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12838854/HIVE-10901.02.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 15 failed/errored test(s), 10695 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[multi_count_distinct] (batchId=48)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_grouping_sets] (batchId=75)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[join_acid_non_acid] (batchId=150)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[limit_pushdown3] (batchId=141)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[limit_pushdown] (batchId=148)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[offset_limit_ppd_optimizer] (batchId=148)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[union_fast_stats] (batchId=145)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_grouping_sets] (batchId=150)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vectorized_distinct_gby] (batchId=149)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_1] (batchId=90)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_4] (batchId=91)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query70] (batchId=219)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[auto_join18_multi_distinct] (batchId=103)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[join18_multi_distinct] (batchId=104)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[limit_pushdown] (batchId=121)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/2118/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/2118/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-2118/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 15 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12838854 - PreCommit-HIVE-Build

> Optimize mutli column distinct queries
>
> Key: HIVE-10901
> URL: https://issues.apache.org/jira/browse/HIVE-10901
> Project: Hive
> Issue Type: New Feature
> Components: CBO, Logical Optimizer
> Affects Versions: 1.2.0
> Reporter: Mostafa Mokhtar
> Assignee: Pengcheng Xiong
> Attachments: HIVE-10901.02.patch, HIVE-10901.patch
>
> HIVE-10568 is useful only when there is a distinct on one column. It can be
> expanded for multiple column cases too.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (HIVE-10901) Optimize mutli column distinct queries
[ https://issues.apache.org/jira/browse/HIVE-10901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15616435#comment-15616435 ]

Ashutosh Chauhan commented on HIVE-10901:

We can use the old method implemented in AggregateExpandDistinctAggregatesRule, which computes the distinct count on each branch and then joins the results. The grouping-sets approach is likely more efficient, but the join approach may still be an improvement on the state of the art in certain cases.

> Optimize mutli column distinct queries
>
> Key: HIVE-10901
> URL: https://issues.apache.org/jira/browse/HIVE-10901
> Project: Hive
> Issue Type: New Feature
> Components: CBO, Logical Optimizer
> Affects Versions: 1.2.0
> Reporter: Mostafa Mokhtar
> Assignee: Ashutosh Chauhan
> Attachments: HIVE-10901.patch
>
> HIVE-10568 is useful only when there is a distinct on one column. It can be
> expanded for multiple column cases too.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
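
For readers following the thread, the join-based expansion mentioned in the comment above can be illustrated with a small Python sketch. It is not Hive or Calcite code: the function names and the (k, a, b) row layout are hypothetical, and NULL handling is ignored. It only shows that a query of the shape SELECT k, COUNT(DISTINCT a), COUNT(DISTINCT b) ... GROUP BY k can be answered by aggregating each distinct column on its own branch and joining the branches on the group key.

```python
from collections import defaultdict


def direct_multi_distinct(rows):
    """Reference result: per key k, (COUNT(DISTINCT a), COUNT(DISTINCT b))."""
    seen_a, seen_b = defaultdict(set), defaultdict(set)
    for k, a, b in rows:
        seen_a[k].add(a)
        seen_b[k].add(b)
    return {k: (len(seen_a[k]), len(seen_b[k])) for k in seen_a}


def distinct_count_branch(rows, col):
    """One branch of the rewrite: deduplicate (k, column) pairs, then count per k."""
    pairs = {(row[0], row[col]) for row in rows}
    counts = defaultdict(int)
    for k, _ in pairs:
        counts[k] += 1
    return dict(counts)


def join_rewrite(rows):
    """Hypothetical join-style plan: one branch per distinct column, joined on k."""
    branch_a = distinct_count_branch(rows, 1)
    branch_b = distinct_count_branch(rows, 2)
    # Every input row feeds both branches, so both contain the same keys
    # (assuming no NULLs) and an inner join on k loses nothing.
    return {k: (branch_a[k], branch_b[k]) for k in branch_a}


rows = [("x", 1, 10), ("x", 1, 20), ("x", 2, 20), ("y", 3, 30)]
print(join_rewrite(rows) == direct_multi_distinct(rows))
```

Each branch scans the input once, which is one reason the comment expects a single-pass grouping-sets plan to be cheaper in many cases.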
[jira] [Commented] (HIVE-10901) Optimize mutli column distinct queries
[ https://issues.apache.org/jira/browse/HIVE-10901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15603939#comment-15603939 ]

Hive QA commented on HIVE-10901:

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12737058/HIVE-10901.patch

{color:red}ERROR:{color} -1 due to build exiting with an error

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/1775/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/1775/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-1775/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Tests exited with: NonZeroExitCodeException
Command 'bash /data/hiveptest/working/scratch/source-prep.sh' failed with exit status 1 and output '+ date '+%Y-%m-%d %T.%3N'
2016-10-25 02:26:04.338
+ [[ -n /usr/lib/jvm/java-8-openjdk-amd64 ]]
+ export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ export PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
+ export 'M2_OPTS=-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128'
+ M2_OPTS='-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128'
+ cd /data/hiveptest/working/
+ tee /data/hiveptest/logs/PreCommit-HIVE-Build-1775/source-prep.txt
+ [[ false == \t\r\u\e ]]
+ mkdir -p maven ivy
+ [[ git = \s\v\n ]]
+ [[ git = \g\i\t ]]
+ [[ -z master ]]
+ [[ -d apache-github-source-source ]]
+ [[ ! -d apache-github-source-source/.git ]]
+ [[ ! -d apache-github-source-source ]]
+ date '+%Y-%m-%d %T.%3N'
2016-10-25 02:26:04.341
+ cd apache-github-source-source
+ git fetch origin
From https://github.com/apache/hive
   080de97..7968e1e  master     -> origin/master
+ git reset --hard HEAD
HEAD is now at 080de97 HIVE-14950 Support integer data type (Zoltan Haindrich via Alan Gates)
+ git clean -f -d
+ git checkout master
Already on 'master'
Your branch is behind 'origin/master' by 1 commit, and can be fast-forwarded.
  (use "git pull" to update your local branch)
+ git reset --hard origin/master
HEAD is now at 7968e1e HIVE-14837: JDBC: standalone jar is missing hadoop core dependencies (Tao Li, via Gopal V)
+ git merge --ff-only origin/master
Already up-to-date.
+ date '+%Y-%m-%d %T.%3N'
2016-10-25 02:26:05.441
+ patchCommandPath=/data/hiveptest/working/scratch/smart-apply-patch.sh
+ patchFilePath=/data/hiveptest/working/scratch/build.patch
+ [[ -f /data/hiveptest/working/scratch/build.patch ]]
+ chmod +x /data/hiveptest/working/scratch/smart-apply-patch.sh
+ /data/hiveptest/working/scratch/smart-apply-patch.sh /data/hiveptest/working/scratch/build.patch
error: a/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveExpandDistinctAggregatesRule.java: No such file or directory
error: a/ql/src/test/results/clientpositive/tez/limit_pushdown.q.out: No such file or directory
error: a/ql/src/test/results/clientpositive/tez/mrr.q.out: No such file or directory
error: a/ql/src/test/results/clientpositive/tez/vectorization_limit.q.out: No such file or directory
The patch does not appear to apply with p0, p1, or p2
+ exit 1
'
{noformat}

This message is automatically generated.
ATTACHMENT ID: 12737058 - PreCommit-HIVE-Build
[jira] [Commented] (HIVE-10901) Optimize mutli column distinct queries
[ https://issues.apache.org/jira/browse/HIVE-10901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15603097#comment-15603097 ]

Ashutosh Chauhan commented on HIVE-10901:

Calcite has a rewrite for this query pattern using grouping sets, which was introduced in CALCITE-732. However, while adapting it for Hive I ran into a Hive limitation, tracked as HIVE-15045: currently, Hive doesn't support grouping sets when a column is part of both the grouping set and an aggregation function, which is what the CALCITE-732 rewrite produces. This limitation is physical, since it seems the current GroupBy operator cannot handle this use case.
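
As a rough illustration of the grouping-sets idea discussed above, the following Python sketch emulates a two-stage plan for COUNT(DISTINCT a), COUNT(DISTINCT b) with no group key. It is an assumption-laden sketch, not the actual CALCITE-732 rewrite or Hive operator code: the function name, the grouping-id encoding (0 for the set (a), 1 for the set (b)), and the absence of NULLs in the input are all hypothetical simplifications.

```python
def grouping_sets_rewrite(rows):
    """Emulate a two-stage grouping-sets plan over (a, b) rows.

    Stage 1 mimics GROUP BY a, b GROUPING SETS ((a), (b)): each input row
    contributes one grouped row per set, and grouping deduplicates them.
    Stage 2 counts, per grouping id, the column that was also grouped on.
    """
    stage1 = set()
    for a, b in rows:
        stage1.add((a, None, 0))  # grouping set (a); b is nulled out
        stage1.add((None, b, 1))  # grouping set (b); a is nulled out

    # Stage 2: note that `a` and `b` are counted here although they were
    # grouping columns in stage 1. A column appearing both as a grouping
    # column and inside an aggregate is the shape the comment says Hive's
    # GroupBy operator could not handle at the time.
    count_a = sum(1 for a, _, gid in stage1 if gid == 0 and a is not None)
    count_b = sum(1 for _, b, gid in stage1 if gid == 1 and b is not None)
    return count_a, count_b


rows = [(1, 10), (1, 20), (2, 20), (3, 20)]
expected = (len({a for a, _ in rows}), len({b for _, b in rows}))
print(grouping_sets_rewrite(rows) == expected)
```

Unlike a join-based expansion, this shape reads the input once and produces all distinct counts from a single grouped intermediate result.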
[jira] [Commented] (HIVE-10901) Optimize mutli column distinct queries
[ https://issues.apache.org/jira/browse/HIVE-10901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14570191#comment-14570191 ]

Hive QA commented on HIVE-10901:

{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12737058/HIVE-10901.patch

{color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 8988 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udaf_histogram_numeric
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_count
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_groupby3
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_ql_rewrite_gbtoidx_cbo_2
org.apache.hive.hcatalog.hbase.TestPigHBaseStorageHandler.org.apache.hive.hcatalog.hbase.TestPigHBaseStorageHandler
org.apache.hive.hcatalog.streaming.TestStreaming.testTransactionBatchCommit_Json
{noformat}

Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4150/testReport
Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4150/console
Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4150/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 6 tests failed
{noformat}

This message is automatically generated.
ATTACHMENT ID: 12737058 - PreCommit-HIVE-TRUNK-Build