[jira] [Commented] (HIVE-10901) Optimize mutli column distinct queries

2016-11-15 Thread Pengcheng Xiong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15669316#comment-15669316
 ] 

Pengcheng Xiong commented on HIVE-10901:


[~gopalv], I am not sure how many reducers were used in Jenkins, but it may be 
related to what you described last time.

> Optimize mutli column distinct queries
>
> Key: HIVE-10901
> URL: https://issues.apache.org/jira/browse/HIVE-10901
> Project: Hive
> Issue Type: New Feature
> Components: CBO, Logical Optimizer
> Affects Versions: 1.2.0
> Reporter: Mostafa Mokhtar
> Assignee: Pengcheng Xiong
> Attachments: HIVE-10901.02.patch, HIVE-10901.03.patch, HIVE-10901.patch
>
> HIVE-10568 is useful only when there is a distinct on one column. It can be 
> expanded for multiple column cases too.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10901) Optimize mutli column distinct queries

2016-11-15 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15669134#comment-15669134
 ] 

Hive QA commented on HIVE-10901:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12839085/HIVE-10901.03.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 7 failed/errored test(s), 10680 tests 
executed
*Failed tests:*
{noformat}
TestSparkCliDriver - did not produce a TEST-*.xml file (likely timed out) (batchId=102)
[skewjoinopt3.q,smb_mapjoin_4.q,timestamp_comparison.q,union_remove_10.q,mapreduce2.q,bucketmapjoin_negative.q,udf_in_file.q,auto_join12.q,skewjoin.q,vector_left_outer_join.q,semijoin.q,skewjoinopt9.q,smb_mapjoin_3.q,stats10.q,nullgroup4.q]
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[transform_ppr2] (batchId=133)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[join_acid_non_acid] (batchId=150)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[union_fast_stats] (batchId=145)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] (batchId=91)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_4] (batchId=91)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[multi_count_distinct] (batchId=90)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/2142/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/2142/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-2142/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 7 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12839085 - PreCommit-HIVE-Build



[jira] [Commented] (HIVE-10901) Optimize mutli column distinct queries

2016-11-14 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15665710#comment-15665710
 ] 

Hive QA commented on HIVE-10901:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12838854/HIVE-10901.02.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 15 failed/errored test(s), 10695 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[multi_count_distinct] (batchId=48)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_grouping_sets] (batchId=75)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[join_acid_non_acid] (batchId=150)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[limit_pushdown3] (batchId=141)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[limit_pushdown] (batchId=148)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[offset_limit_ppd_optimizer] (batchId=148)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[union_fast_stats] (batchId=145)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_grouping_sets] (batchId=150)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vectorized_distinct_gby] (batchId=149)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_1] (batchId=90)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_4] (batchId=91)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query70] (batchId=219)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[auto_join18_multi_distinct] (batchId=103)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[join18_multi_distinct] (batchId=104)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[limit_pushdown] (batchId=121)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/2118/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/2118/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-2118/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 15 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12838854 - PreCommit-HIVE-Build

> Optimize mutli column distinct queries
>
> Key: HIVE-10901
> URL: https://issues.apache.org/jira/browse/HIVE-10901
> Project: Hive
> Issue Type: New Feature
> Components: CBO, Logical Optimizer
> Affects Versions: 1.2.0
> Reporter: Mostafa Mokhtar
> Assignee: Pengcheng Xiong
> Attachments: HIVE-10901.02.patch, HIVE-10901.patch
>
> HIVE-10568 is useful only when there is a distinct on one column. It can be 
> expanded for multiple column cases too.





[jira] [Commented] (HIVE-10901) Optimize mutli column distinct queries

2016-10-28 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15616435#comment-15616435
 ] 

Ashutosh Chauhan commented on HIVE-10901:

We can use the old method implemented in AggregateExpandDistinctAggregatesRule, 
which computes the distinct count on each branch and then joins the results. 
The grouping-set approach is likely more efficient, but the join approach may 
still improve on the current state of the art in certain cases.
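
To make the join approach concrete, here is a minimal sketch of the query shape 
it implies, run through SQLite from Python. It is an illustration only and not 
the plan that AggregateExpandDistinctAggregatesRule or Hive actually produces; 
the table t, its columns a and b, and the helper names are hypothetical.

```python
import sqlite3


def make_table(rows):
    # Hypothetical two-column table used only for this illustration.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (a INTEGER, b TEXT)")
    conn.executemany("INSERT INTO t VALUES (?, ?)", rows)
    return conn


def distinct_counts_direct(conn):
    # Original form: several DISTINCT aggregates in a single query.
    return conn.execute(
        "SELECT COUNT(DISTINCT a), COUNT(DISTINCT b) FROM t"
    ).fetchone()


def distinct_counts_via_join(conn):
    # Join-style form: one branch per distinct column, each computing a
    # single count over de-duplicated values, then a join of the two
    # one-row results. COUNT(col) skips NULL, like COUNT(DISTINCT col).
    return conn.execute(
        """
        SELECT x.cnt_a, y.cnt_b
        FROM (SELECT COUNT(a) AS cnt_a FROM (SELECT DISTINCT a FROM t)) AS x
        CROSS JOIN
             (SELECT COUNT(b) AS cnt_b FROM (SELECT DISTINCT b FROM t)) AS y
        """
    ).fetchone()


conn = make_table([(1, "x"), (1, "y"), (2, "z"), (None, "x"), (2, None)])
print(distinct_counts_direct(conn), distinct_counts_via_join(conn))
```

Both forms return (2, 3) on this sample data. The cost of the join form is that 
each branch scans the input again, which is presumably why the grouping-set 
approach is expected to be cheaper.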

> Optimize mutli column distinct queries
>
> Key: HIVE-10901
> URL: https://issues.apache.org/jira/browse/HIVE-10901
> Project: Hive
> Issue Type: New Feature
> Components: CBO, Logical Optimizer
> Affects Versions: 1.2.0
> Reporter: Mostafa Mokhtar
> Assignee: Ashutosh Chauhan
> Attachments: HIVE-10901.patch
>
> HIVE-10568 is useful only when there is a distinct on one column. It can be 
> expanded for multiple column cases too.





[jira] [Commented] (HIVE-10901) Optimize mutli column distinct queries

2016-10-24 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15603939#comment-15603939
 ] 

Hive QA commented on HIVE-10901:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12737058/HIVE-10901.patch

{color:red}ERROR:{color} -1 due to build exiting with an error

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/1775/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/1775/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-1775/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Tests exited with: NonZeroExitCodeException
Command 'bash /data/hiveptest/working/scratch/source-prep.sh' failed with exit 
status 1 and output '+ date '+%Y-%m-%d %T.%3N'
2016-10-25 02:26:04.338
+ [[ -n /usr/lib/jvm/java-8-openjdk-amd64 ]]
+ export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ export 
PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ 
PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
+ export 'M2_OPTS=-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost 
-Dhttp.proxyPort=3128'
+ M2_OPTS='-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost 
-Dhttp.proxyPort=3128'
+ cd /data/hiveptest/working/
+ tee /data/hiveptest/logs/PreCommit-HIVE-Build-1775/source-prep.txt
+ [[ false == \t\r\u\e ]]
+ mkdir -p maven ivy
+ [[ git = \s\v\n ]]
+ [[ git = \g\i\t ]]
+ [[ -z master ]]
+ [[ -d apache-github-source-source ]]
+ [[ ! -d apache-github-source-source/.git ]]
+ [[ ! -d apache-github-source-source ]]
+ date '+%Y-%m-%d %T.%3N'
2016-10-25 02:26:04.341
+ cd apache-github-source-source
+ git fetch origin
From https://github.com/apache/hive
   080de97..7968e1e  master -> origin/master
+ git reset --hard HEAD
HEAD is now at 080de97 HIVE-14950 Support integer data type (Zoltan Haindrich 
via Alan Gates)
+ git clean -f -d
+ git checkout master
Already on 'master'
Your branch is behind 'origin/master' by 1 commit, and can be fast-forwarded.
  (use "git pull" to update your local branch)
+ git reset --hard origin/master
HEAD is now at 7968e1e HIVE-14837: JDBC: standalone jar is missing hadoop core 
dependencies (Tao Li, via Gopal V)
+ git merge --ff-only origin/master
Already up-to-date.
+ date '+%Y-%m-%d %T.%3N'
2016-10-25 02:26:05.441
+ patchCommandPath=/data/hiveptest/working/scratch/smart-apply-patch.sh
+ patchFilePath=/data/hiveptest/working/scratch/build.patch
+ [[ -f /data/hiveptest/working/scratch/build.patch ]]
+ chmod +x /data/hiveptest/working/scratch/smart-apply-patch.sh
+ /data/hiveptest/working/scratch/smart-apply-patch.sh 
/data/hiveptest/working/scratch/build.patch
error: a/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveExpandDistinctAggregatesRule.java: No such file or directory
error: a/ql/src/test/results/clientpositive/tez/limit_pushdown.q.out: No such file or directory
error: a/ql/src/test/results/clientpositive/tez/mrr.q.out: No such file or directory
error: a/ql/src/test/results/clientpositive/tez/vectorization_limit.q.out: No such file or directory
The patch does not appear to apply with p0, p1, or p2
+ exit 1
'
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12737058 - PreCommit-HIVE-Build



[jira] [Commented] (HIVE-10901) Optimize mutli column distinct queries

2016-10-24 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15603097#comment-15603097
 ] 

Ashutosh Chauhan commented on HIVE-10901:

Calcite has a rewrite for this query pattern using grouping sets, which was 
introduced in CALCITE-732. However, while adapting it for Hive I ran into a 
limitation of Hive, which is HIVE-15045: currently, Hive doesn't support 
grouping sets when a column is part of both the grouping set and an 
aggregation function, which is what the rewrite from CALCITE-732 results in. 
This limitation is physical, since it seems the current GroupBy operator 
cannot handle this use case. 
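
As a rough sketch of the pattern described above, and not the exact output of 
the CALCITE-732 rewrite, the Python snippet below runs a grouping-sets style 
query in SQLite. SQLite has no GROUPING SETS, so the inner query emulates 
GROUPING SETS ((a), (b)) with a UNION ALL tagged by a grouping id. The table t, 
the columns a and b, the gid column, and the helper names are all hypothetical.

```python
import sqlite3


def make_table(rows):
    # Hypothetical two-column table used only for this illustration.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (a INTEGER, b TEXT)")
    conn.executemany("INSERT INTO t VALUES (?, ?)", rows)
    return conn


def distinct_counts_grouping_sets_style(conn):
    # Inner query: one branch per grouping set, each de-duplicating one
    # column and tagging its rows with a grouping id (gid).
    # Outer query: plain COUNTs over columns that were grouping keys in
    # the inner query; COUNT ignores the NULLs produced by CASE.
    return conn.execute(
        """
        SELECT COUNT(CASE WHEN gid = 0 THEN a END),
               COUNT(CASE WHEN gid = 1 THEN b END)
        FROM (
            SELECT a, NULL AS b, 0 AS gid FROM t GROUP BY a
            UNION ALL
            SELECT NULL AS a, b, 1 AS gid FROM t GROUP BY b
        )
        """
    ).fetchone()


conn = make_table([(1, "x"), (1, "y"), (2, "z"), (None, "x"), (2, None)])
print(distinct_counts_grouping_sets_style(conn))
```

On this sample data the result is (2, 3), the same as COUNT(DISTINCT a), 
COUNT(DISTINCT b). Note that a and b appear both as grouping keys and as 
aggregation arguments, which is the combination the comment says Hive could 
not handle at the time.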



[jira] [Commented] (HIVE-10901) Optimize mutli column distinct queries

2015-06-02 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14570191#comment-14570191
 ] 

Hive QA commented on HIVE-10901:




{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12737058/HIVE-10901.patch

{color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 8988 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udaf_histogram_numeric
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_count
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_groupby3
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_ql_rewrite_gbtoidx_cbo_2
org.apache.hive.hcatalog.hbase.TestPigHBaseStorageHandler.org.apache.hive.hcatalog.hbase.TestPigHBaseStorageHandler
org.apache.hive.hcatalog.streaming.TestStreaming.testTransactionBatchCommit_Json
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4150/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4150/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4150/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 6 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12737058 - PreCommit-HIVE-TRUNK-Build
