[jira] [Commented] (HIVE-19097) related equals and in operators may cause inaccurate stats estimations

2018-08-03 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-19097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16569093#comment-16569093
 ] 

Hive QA commented on HIVE-19097:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 53s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  8m 20s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 32s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m  5s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m 38s{color} | {color:blue} common in master has 64 extant Findbugs warnings. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  4m 26s{color} | {color:blue} ql in master has 2301 extant Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 22s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m  9s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 53s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 36s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m 36s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  0m 50s{color} | {color:red} ql: The patch generated 13 new + 637 unchanged - 19 fixed = 650 total (was 656) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m  0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  4m 41s{color} | {color:red} ql generated 4 new + 2297 unchanged - 4 fixed = 2301 total (was 2301) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 18s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 14s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 31m 13s{color} | {color:black} {color} |
\\
\\
|| Reason || Tests ||
| FindBugs | module:ql |
|  | org.apache.hadoop.hive.ql.parse.TypeCheckProcFactory$DefaultExprProcessor.interpretNodeAs(PrimitiveTypeInfo, ExprNodeDesc) invokes inefficient new Byte(String) constructor; use Byte.valueOf(String) instead  At TypeCheckProcFactory.java:[line 1259] |
|  | org.apache.hadoop.hive.ql.parse.TypeCheckProcFactory$DefaultExprProcessor.interpretNodeAs(PrimitiveTypeInfo, ExprNodeDesc) invokes inefficient new Integer(String) constructor; use Integer.valueOf(String) instead  At TypeCheckProcFactory.java:[line 1251] |
|  | org.apache.hadoop.hive.ql.parse.TypeCheckProcFactory$DefaultExprProcessor.interpretNodeAs(PrimitiveTypeInfo, ExprNodeDesc) invokes inefficient new Long(String) constructor; use Long.valueOf(String) instead  At TypeCheckProcFactory.java:[line 1253] |
|  | org.apache.hadoop.hive.ql.parse.TypeCheckProcFactory$DefaultExprProcessor.interpretNodeAs(PrimitiveTypeInfo, ExprNodeDesc) invokes inefficient new Short(String) constructor; use Short.valueOf(String) instead  At TypeCheckProcFactory.java:[line 1261] |
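
The four FindBugs warnings above all recommend the same change: replace the boxed-type `new X(String)` constructors with the `X.valueOf(String)` factory methods. The following is only a minimal sketch of that pattern, not the actual `interpretNodeAs` code; the class `ValueOfSketch`, the method `parseAs`, and the type-name strings are hypothetical names chosen for illustration.

```java
public class ValueOfSketch {

    // Hypothetical helper: converts a string literal to a boxed integral value.
    // It uses the valueOf(String) factories that FindBugs recommends instead of
    // the boxed-type constructors such as new Integer(String).
    static Number parseAs(String typeName, String literal) {
        switch (typeName) {
            case "tinyint":
                return Byte.valueOf(literal);     // instead of new Byte(literal)
            case "smallint":
                return Short.valueOf(literal);    // instead of new Short(literal)
            case "int":
                return Integer.valueOf(literal);  // instead of new Integer(literal)
            case "bigint":
                return Long.valueOf(literal);     // instead of new Long(literal)
            default:
                throw new IllegalArgumentException("unsupported type: " + typeName);
        }
    }

    public static void main(String[] args) {
        System.out.println(parseAs("int", "42"));
        System.out.println(parseAs("bigint", "1234567890123"));
    }
}
```

Both forms throw `NumberFormatException` on malformed input, so the switch to `valueOf` should not change error handling; the factory methods may additionally reuse cached instances for small values.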
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests |  asflicense  javac  javadoc  findbugs  checkstyle  compile  |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /data/hiveptest/working/yetus_PreCommit-HIVE-Build-13038/dev-support/hive-personali

[jira] [Commented] (HIVE-20301) Enable vectorization for materialized view rewriting tests

2018-08-03 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16569079#comment-16569079
 ] 

Hive QA commented on HIVE-20301:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12934355/HIVE-20301.patch

{color:green}SUCCESS:{color} +1 due to 16 test(s) being added or modified.

{color:green}SUCCESS:{color} +1 due to 14859 tests passed

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/13037/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/13037/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-13037/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12934355 - PreCommit-HIVE-Build

> Enable vectorization for materialized view rewriting tests
> --
>
> Key: HIVE-20301
> URL: https://issues.apache.org/jira/browse/HIVE-20301
> Project: Hive
>  Issue Type: Test
>  Components: Materialized views, Test
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
>Priority: Major
> Attachments: HIVE-20301.patch, HIVE-20301.patch, HIVE-20301.patch, 
> HIVE-20301.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-20118) SessionStateUserAuthenticator.getGroupNames() is always empty

2018-08-03 Thread Daniel Dai (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated HIVE-20118:
--
Attachment: HIVE-20118.2.patch

> SessionStateUserAuthenticator.getGroupNames() is always empty
> -
>
> Key: HIVE-20118
> URL: https://issues.apache.org/jira/browse/HIVE-20118
> Project: Hive
>  Issue Type: Improvement
>  Components: Authorization
>Reporter: Daniel Dai
>Assignee: Daniel Dai
>Priority: Major
> Fix For: 4.0.0, 3.2.0
>
> Attachments: HIVE-20118.1.patch, HIVE-20118.2.patch
>
>






[jira] [Commented] (HIVE-20301) Enable vectorization for materialized view rewriting tests

2018-08-03 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16569060#comment-16569060
 ] 

Hive QA commented on HIVE-20301:


| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m  0s{color} | {color:green} The patch has no whitespace issues. {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 52s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}  2m  2s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests |  asflicense  |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /data/hiveptest/working/yetus_PreCommit-HIVE-Build-13037/dev-support/hive-personality.sh |
| git revision | master / 16225d2 |
| modules | C: ql U: ql |
| Console output | http://104.198.109.242/logs//PreCommit-HIVE-Build-13037/yetus.txt |
| Powered by | Apache Yetus http://yetus.apache.org |


This message was automatically generated.





[jira] [Commented] (HIVE-20292) Bad join ordering in tpcds query93 with primary constraint defined

2018-08-03 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16569057#comment-16569057
 ] 

Hive QA commented on HIVE-20292:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12934377/HIVE-20292.5.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 13 failed/errored test(s), 14860 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[q93_with_constraints] (batchId=48)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[column_access_stats] (batchId=134)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[join28] (batchId=146)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[join32_lessSize] (batchId=111)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[mapjoin_subquery2] (batchId=111)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[mapjoin_subquery] (batchId=131)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[runtime_skewjoin_mapjoin_spark] (batchId=133)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[smb_mapjoin_25] (batchId=112)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[subquery_scalar] (batchId=126)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vectorized_nested_mapjoin] (batchId=116)
org.apache.hadoop.hive.cli.TestSparkPerfCliDriver.testCliDriver[query2] (batchId=265)
org.apache.hadoop.hive.cli.TestSparkPerfCliDriver.testCliDriver[query59] (batchId=265)
org.apache.hadoop.hive.cli.TestSparkPerfCliDriver.testCliDriver[query95] (batchId=265)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/13036/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/13036/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-13036/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 13 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12934377 - PreCommit-HIVE-Build

> Bad join ordering in tpcds query93 with primary constraint defined
> --
>
> Key: HIVE-20292
> URL: https://issues.apache.org/jira/browse/HIVE-20292
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Planning
>Reporter: Vineet Garg
>Assignee: Vineet Garg
>Priority: Major
> Attachments: HIVE-20292.1.patch, HIVE-20292.2.patch, 
> HIVE-20292.3.patch, HIVE-20292.4.patch, HIVE-20292.5.patch
>
>
> Query 93 joins (including an outer join) store_sales, store_returns and 
> reason. Without constraints, store_returns is joined with reason and then 
> with store_sales.
> But if a primary key is added on store_returns (alter table store_returns add 
> constraint tpcds_pk_sr primary key (sr_item_sk, sr_ticket_number) disable 
> novalidate rely), the join order becomes ((store_sales, store_returns), 
> reason), which is very inefficient.





[jira] [Commented] (HIVE-20292) Bad join ordering in tpcds query93 with primary constraint defined

2018-08-03 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16569044#comment-16569044
 ] 

Hive QA commented on HIVE-20292:


| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 46s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  8m 17s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 12s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 43s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  4m 27s{color} | {color:blue} ql in master has 2301 extant Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  4s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 10s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 39s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 13s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m 13s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 45s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m  0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m 37s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  3s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 14s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 26m 57s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests |  asflicense  javac  javadoc  findbugs  checkstyle  compile  |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /data/hiveptest/working/yetus_PreCommit-HIVE-Build-13036/dev-support/hive-personality.sh |
| git revision | master / 16225d2 |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.0 |
| modules | C: itests ql U: . |
| Console output | http://104.198.109.242/logs//PreCommit-HIVE-Build-13036/yetus.txt |
| Powered by | Apache Yetus http://yetus.apache.org |


This message was automatically generated.





[jira] [Commented] (HIVE-20311) add txn stats checks to some more paths

2018-08-03 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16569033#comment-16569033
 ] 

Hive QA commented on HIVE-20311:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12934343/HIVE-20311.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 8 failed/errored test(s), 14859 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[acid_stats5] (batchId=21)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[stats_partial_size] (batchId=19)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[dynpart_sort_optimization_acid] (batchId=167)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[tez_nway_join] (batchId=178)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr_2] (batchId=172)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_like_2] (batchId=177)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_udf2] (batchId=158)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vectorized_mapjoin3] (batchId=158)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/13035/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/13035/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-13035/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 8 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12934343 - PreCommit-HIVE-Build

> add txn stats checks to some more paths
> ---
>
> Key: HIVE-20311
> URL: https://issues.apache.org/jira/browse/HIVE-20311
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>Priority: Major
> Attachments: HIVE-20311.patch
>
>
> These were set to false in the original patch for no reason as far as I see.
> I later added notes but not TODOs to switch them over, so they remained as 
> non-txn.





[jira] [Updated] (HIVE-20292) Bad join ordering in tpcds query93 with primary constraint defined

2018-08-03 Thread Vineet Garg (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-20292:
---
Status: Open  (was: Patch Available)



[jira] [Updated] (HIVE-20292) Bad join ordering in tpcds query93 with primary constraint defined

2018-08-03 Thread Vineet Garg (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-20292:
---
Status: Patch Available  (was: Open)



[jira] [Updated] (HIVE-20292) Bad join ordering in tpcds query93 with primary constraint defined

2018-08-03 Thread Vineet Garg (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-20292:
---
Attachment: HIVE-20292.5.patch



[jira] [Commented] (HIVE-20311) add txn stats checks to some more paths

2018-08-03 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16569024#comment-16569024
 ] 

Hive QA commented on HIVE-20311:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  9m 12s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 16s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 44s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  4m 22s{color} | {color:blue} ql in master has 2301 extant Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  4s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 40s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 15s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m 15s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  0m 43s{color} | {color:red} ql: The patch generated 1 new + 59 unchanged - 0 fixed = 60 total (was 59) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m  0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m 41s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  9s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 14s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 26m 54s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests |  asflicense  javac  javadoc  findbugs  checkstyle  compile  |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /data/hiveptest/working/yetus_PreCommit-HIVE-Build-13035/dev-support/hive-personality.sh |
| git revision | master / 16225d2 |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.0 |
| checkstyle | http://104.198.109.242/logs//PreCommit-HIVE-Build-13035/yetus/diff-checkstyle-ql.txt |
| modules | C: ql U: ql |
| Console output | http://104.198.109.242/logs//PreCommit-HIVE-Build-13035/yetus.txt |
| Powered by | Apache Yetus http://yetus.apache.org |


This message was automatically generated.





[jira] [Commented] (HIVE-20109) get rid of COLUMN_STATS_ACCURATE

2018-08-03 Thread Sergey Shelukhin (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16569020#comment-16569020
 ] 

Sergey Shelukhin commented on HIVE-20109:
-

WIP patch. This turned out to be a much larger feature than I anticipated, since 
many paths require a different structure because the location of the flags 
changed (e.g. the flag is now per stat rather than for the entire partition), 
and there are some other non-trivial changes (non-trivial as in my brain hurts 
reading this code; all the changes are actually very simple logically).
Remaining areas (marked with TODO# comments):
1) Aggregate stats. A purely mechanical task to propagate and verify lists, but 
painful.
2) Cached store. Mostly mechanical.
3) Conversion script. The existing code is in a comment and needs to be 
converted into a tool.
4) Fixing failures in tests that alter stats via alter table, and the numerous 
small bugs that no doubt exist.

I may return to this patch the week after next...
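
To illustrate the idea of per-stat flags that are changed only through explicit, well-defined calls (rather than through a JSON string rewritten by generic alter_table paths), here is a minimal sketch. It is not taken from the patch: the class `StatsStateSketch` and all of its methods are hypothetical, and it models only in-memory state, not the metastore tables or the conversion script.

```java
import java.util.Collections;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical illustration only; not Hive's metastore schema or API. */
public class StatsStateSketch {

    // One flag for basic stats, plus one implicit flag per column:
    // a column is "accurate" exactly when its name is in this set.
    private boolean basicStatsAccurate;
    private final Set<String> accurateColumns = new TreeSet<>();

    /** Explicit, named transition for the basic stats flag. */
    public void setBasicStatsAccurate(boolean accurate) {
        basicStatsAccurate = accurate;
    }

    /** Marks the stats of a single column as accurate. */
    public void markColumnAccurate(String column) {
        accurateColumns.add(column);
    }

    /** Invalidates the stats of a single column without touching the others. */
    public void invalidateColumn(String column) {
        accurateColumns.remove(column);
    }

    /** Invalidates everything, e.g. after an operation that rewrites the data. */
    public void invalidateAll() {
        basicStatsAccurate = false;
        accurateColumns.clear();
    }

    public boolean isBasicStatsAccurate() {
        return basicStatsAccurate;
    }

    public boolean isColumnAccurate(String column) {
        return accurateColumns.contains(column);
    }

    /** Read-only view, so callers cannot change state behind the class's back. */
    public Set<String> accurateColumns() {
        return Collections.unmodifiableSet(accurateColumns);
    }

    public static void main(String[] args) {
        StatsStateSketch state = new StatsStateSketch();
        state.setBasicStatsAccurate(true);
        state.markColumnAccurate("sr_item_sk");
        state.invalidateColumn("sr_item_sk");
        System.out.println(state.isBasicStatsAccurate() + " " + state.accurateColumns());
    }
}
```

Because every state change goes through a named method, each call site is a place where transactional semantics could be checked, which is harder when any code path can overwrite a serialized string.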

> get rid of COLUMN_STATS_ACCURATE
> 
>
> Key: HIVE-20109
> URL: https://issues.apache.org/jira/browse/HIVE-20109
> Project: Hive
>  Issue Type: Bug
>  Components: Statistics
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>Priority: Major
> Attachments: HIVE-20109.nogen.patch, HIVE-20109.patch
>
>
> I don't know why anyone would come up with an idea of storing a set of 
> booleans in a database using JSON. This has caused various problems in the 
> past (text field limitations, perf issues when parsing a giant string; also 
> bugs because the way it is set is brittle).
> However, now that we are implementing transactional stats, it becomes 
> especially problematic and error-prone because the code in Hive sets C_S_A in 
> random places with reckless abandon, whereas we want to change the state of 
> the stats in well-defined places where txn semantics can be verified.
> Currently in HIVE-19416, we are handling random things that touch it (from 
> metastore itself to output committers, various stats tasks, commands like 
> truncate, etc.) via a pile of hacks, but the best solution would be to remove 
> it completely and replace with a DB table/columns in stats tables that would 
> need to be set explicitly, not via generic alter_table.





[jira] [Updated] (HIVE-20109) get rid of COLUMN_STATS_ACCURATE

2018-08-03 Thread Sergey Shelukhin (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-20109:

Attachment: HIVE-20109.patch
HIVE-20109.nogen.patch

> get rid of COLUMN_STATS_ACCURATE
> 
>
> Key: HIVE-20109
> URL: https://issues.apache.org/jira/browse/HIVE-20109
> Project: Hive
>  Issue Type: Bug
>  Components: Statistics
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>Priority: Major
> Attachments: HIVE-20109.nogen.patch, HIVE-20109.patch
>
>
> I don't know why anyone would come up with an idea of storing a set of 
> booleans in a database using JSON. This has caused various problems in the 
> past (text field limitations, perf issues when parsing a giant string; also 
> bugs because the way it is set is brittle).
> However, now that we are implementing transactional stats, it becomes 
> especially problematic and error prone because the code in Hive sets C_S_A in 
> random places with reckless abandon, whereas we want to change the state of 
> the stats in well defined places where txn semantics can be verified.
> Currently in HIVE-19416, we are handling random things that touch it (from 
> metastore itself to output committers, various stats tasks, commands like 
> truncate, etc.) via a pile of hacks, but the best solution would be to remove 
> it completely and replace with a DB table/columns in stats tables that would 
> need to be set explicitly, not via generic alter_table.
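
The parsing and brittleness concerns above can be made concrete with a minimal Python sketch. It assumes the COLUMN_STATS_ACCURATE table parameter holds a JSON string shaped like {"BASIC_STATS":"true","COLUMN_STATS":{"a":"true"}}. That layout, the `params` dictionary and the helper name `is_column_stats_accurate` are illustrative assumptions, not Hive's actual code.

```python
import json


def is_column_stats_accurate(params, column):
    """Return True only if the JSON blob marks stats for `column` as accurate.

    `params` is a hypothetical dict of table parameters (name -> string value).
    """
    raw = params.get("COLUMN_STATS_ACCURATE")
    if raw is None:
        return False
    try:
        state = json.loads(raw)
    except ValueError:
        # A truncated or malformed string (e.g. cut by a text field limit)
        # cannot be trusted, so treat the stats as not accurate.
        return False
    if not isinstance(state, dict):
        return False
    cols = state.get("COLUMN_STATS", {})
    if not isinstance(cols, dict):
        return False
    # Booleans are stored as strings in this assumed layout.
    return str(cols.get(column, "false")).lower() == "true"
```

Every reader of the flag has to repeat this kind of defensive parsing, and every writer has to re-serialize the whole string. An explicit column in a stats table, as proposed in the description, would reduce the check to a single typed read.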





[jira] [Updated] (HIVE-20109) get rid of COLUMN_STATS_ACCURATE

2018-08-03 Thread Sergey Shelukhin (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-20109:

Attachment: (was: HIVE-20109.nogen.patch)

> get rid of COLUMN_STATS_ACCURATE
> 
>
> Key: HIVE-20109
> URL: https://issues.apache.org/jira/browse/HIVE-20109
> Project: Hive
>  Issue Type: Bug
>  Components: Statistics
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>Priority: Major
>
> I don't know why anyone would come up with an idea of storing a set of 
> booleans in a database using JSON. This has caused various problems in the 
> past (text field limitations, perf issues when parsing a giant string; also 
> bugs because the way it is set is brittle).
> However, now that we are implementing transactional stats, it becomes 
> especially problematic and error prone because the code in Hive sets C_S_A in 
> random places with reckless abandon, whereas we want to change the state of 
> the stats in well defined places where txn semantics can be verified.
> Currently in HIVE-19416, we are handling random things that touch it (from 
> metastore itself to output committers, various stats tasks, commands like 
> truncate, etc.) via a pile of hacks, but the best solution would be to remove 
> it completely and replace with a DB table/columns in stats tables that would 
> need to be set explicitly, not via generic alter_table.





[jira] [Updated] (HIVE-20109) get rid of COLUMN_STATS_ACCURATE

2018-08-03 Thread Sergey Shelukhin (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-20109:

Attachment: HIVE-20109.nogen.patch

> get rid of COLUMN_STATS_ACCURATE
> 
>
> Key: HIVE-20109
> URL: https://issues.apache.org/jira/browse/HIVE-20109
> Project: Hive
>  Issue Type: Bug
>  Components: Statistics
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>Priority: Major
> Attachments: HIVE-20109.nogen.patch
>
>
> I don't know why anyone would come up with an idea of storing a set of 
> booleans in a database using JSON. This has caused various problems in the 
> past (text field limitations, perf issues when parsing a giant string; also 
> bugs because the way it is set is brittle).
> However, now that we are implementing transactional stats, it becomes 
> especially problematic and error prone because the code in Hive sets C_S_A in 
> random places with reckless abandon, whereas we want to change the state of 
> the stats in well defined places where txn semantics can be verified.
> Currently in HIVE-19416, we are handling random things that touch it (from 
> metastore itself to output committers, various stats tasks, commands like 
> truncate, etc.) via a pile of hacks, but the best solution would be to remove 
> it completely and replace with a DB table/columns in stats tables that would 
> need to be set explicitly, not via generic alter_table.





[jira] [Commented] (HIVE-17979) Tez: Improve ReduceRecordSource passDownKey copying

2018-08-03 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-17979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16569013#comment-16569013
 ] 

Hive QA commented on HIVE-17979:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12896290/HIVE-17979.2.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 14859 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.llap.security.TestLlapSignerImpl.testSigning 
(batchId=322)
{noformat}

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-Build/13034/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/13034/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-13034/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12896290 - PreCommit-HIVE-Build

> Tez: Improve ReduceRecordSource passDownKey copying
> ---
>
> Key: HIVE-17979
> URL: https://issues.apache.org/jira/browse/HIVE-17979
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 3.0.0
>Reporter: Gopal V
>Assignee: Gopal V
>Priority: Major
> Attachments: HIVE-17979.1.patch, HIVE-17979.2.patch
>
>
> Tez does not use a single Key stream for both sides of the join, so each 
> input gets its own ReduceRecordSource 
> {code}
> sources[tag] = new ReduceRecordSource();
> {code}
> As a result, each input stream has its own deserialized key (because the 
> tag is not part of the Key byte stream), so a 2-table join has 2 
> ReduceRecordSource objects.
> The passDownKey is therefore only an optimization when the Key, 
> List has more than 1 value in it. Otherwise the copy is entirely 
> wasted CPU cycles, because it deserializes the entire row to extract the key 
> and discards the row.





[jira] [Updated] (HIVE-20277) Vectorization: Case expressions that return BOOLEAN are not supported for FILTER

2018-08-03 Thread Matt McCline (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-20277:

Attachment: HIVE-20277.03.patch

> Vectorization: Case expressions that return BOOLEAN are not supported for 
> FILTER
> 
>
> Key: HIVE-20277
> URL: https://issues.apache.org/jira/browse/HIVE-20277
> Project: Hive
>  Issue Type: Bug
>  Components: Vectorization
>Affects Versions: 4.0.0
>Reporter: Gopal V
>Assignee: Matt McCline
>Priority: Major
> Attachments: HIVE-20277.02.patch, HIVE-20277.03.patch, 
> HIVE-20277.WIP.01.patch
>
>
> In cases like Query89, the vertex with the filter is not vectorized.
> {code}
>Filter Operator
>   predicate: CASE WHEN ((avg_window_0 <> 0.0D)) THEN 
> (((abs((_col6 - avg_window_0)) / avg_window_0) > 0.1D)) ELSE (null) END 
> (type: boolean)
> {code}
> {code}
> Reducer 3 
> Execution mode: llap
> Reduce Vectorization:
> enabled: true
> enableConditionsMet: hive.vectorized.execution.reduce.enabled 
> IS true, hive.execution.engine tez IN [tez, spark] IS true
> notVectorizedReason: FILTER operator: Unexpected hive type 
> name void
> vectorized: false
> {code}
> The query specifically has 
> {code}
> where case when (avg_monthly_sales <> 0) then (abs(sum_sales - 
> avg_monthly_sales) / avg_monthly_sales) else null end > 0.1
> {code}
> while rewriting it to 
> {code}
> where case when (avg_monthly_sales <> 0) then (abs(sum_sales - 
> avg_monthly_sales) / avg_monthly_sales) > 0.1 else false end
> {code}
> does vectorize into 
> {code}
> Filter Operator
>   Filter Vectorization:
>   className: VectorFilterOperator
>   native: true
>   predicateExpression: SelectColumnIsTrue(col 
> 12:boolean)(children: VectorUDFAdaptor(CASE WHEN ((avg_window_0 <> 0.0D)) 
> THEN (((abs((_col6 - avg_window_0)) / avg_window_0) > 0.1D)) ELSE (false) 
> END)(children: DoubleColNotEqualDoubleScalar(col 7:double, val 0.0) -> 
> 8:boolean, DoubleColGreaterDoubleScalar(col 9:double, val 0.1)(children: 
> DoubleColDivideDoubleColumn(col 10:double, col 7:double)(children: 
> FuncAbsDoubleToDouble(col 9:double)(children: 
> DoubleColSubtractDoubleColumn(col 6:double, col 7:double) -> 9:double) -> 
> 10:double) -> 9:double) -> 11:boolean) -> 12:boolean)
>   predicate: CASE WHEN ((avg_window_0 <> 0.0D)) THEN 
> (((abs((_col6 - avg_window_0)) / avg_window_0) > 0.1D)) ELSE (false) END 
> (type: boolean)
>   Statistics: Num rows: 11 Data size: 5291 Basic stats: 
> COMPLETE Column stats: COMPLETE
> {code}
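
Why the rewrite is safe for a filter can be sketched in Python, assuming standard SQL three-valued logic in which a WHERE clause keeps a row only when its predicate is TRUE. Here `None` stands in for NULL, and all function names are hypothetical rather than Hive code.

```python
def original_predicate(avg_monthly_sales, sum_sales):
    """CASE WHEN avg <> 0 THEN abs(sum - avg) / avg ELSE NULL END > 0.1"""
    if avg_monthly_sales is not None and avg_monthly_sales != 0:
        if sum_sales is None:
            return None  # arithmetic on NULL yields NULL
        return abs(sum_sales - avg_monthly_sales) / avg_monthly_sales > 0.1
    return None  # ELSE NULL, and NULL > 0.1 is NULL


def rewritten_predicate(avg_monthly_sales, sum_sales):
    """CASE WHEN avg <> 0 THEN abs(sum - avg) / avg > 0.1 ELSE FALSE END"""
    if avg_monthly_sales is not None and avg_monthly_sales != 0:
        if sum_sales is None:
            return None
        return abs(sum_sales - avg_monthly_sales) / avg_monthly_sales > 0.1
    return False  # typed boolean instead of an untyped NULL


def passes_filter(predicate_value):
    """A filter keeps the row only when the predicate is TRUE."""
    return predicate_value is True
```

The two forms differ only in whether a rejected row evaluates to NULL or FALSE, so they select the same rows. The rewritten form gives the ELSE branch a boolean type, which appears to be what avoids the "Unexpected hive type name void" reason reported in the plan above.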





[jira] [Updated] (HIVE-20277) Vectorization: Case expressions that return BOOLEAN are not supported for FILTER

2018-08-03 Thread Matt McCline (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-20277:

Status: Patch Available  (was: In Progress)

Again.

> Vectorization: Case expressions that return BOOLEAN are not supported for 
> FILTER
> 
>
> Key: HIVE-20277
> URL: https://issues.apache.org/jira/browse/HIVE-20277
> Project: Hive
>  Issue Type: Bug
>  Components: Vectorization
>Affects Versions: 4.0.0
>Reporter: Gopal V
>Assignee: Matt McCline
>Priority: Major
> Attachments: HIVE-20277.02.patch, HIVE-20277.03.patch, 
> HIVE-20277.WIP.01.patch
>
>
> In cases like Query89, the vertex with the filter is not vectorized.
> {code}
>Filter Operator
>   predicate: CASE WHEN ((avg_window_0 <> 0.0D)) THEN 
> (((abs((_col6 - avg_window_0)) / avg_window_0) > 0.1D)) ELSE (null) END 
> (type: boolean)
> {code}
> {code}
> Reducer 3 
> Execution mode: llap
> Reduce Vectorization:
> enabled: true
> enableConditionsMet: hive.vectorized.execution.reduce.enabled 
> IS true, hive.execution.engine tez IN [tez, spark] IS true
> notVectorizedReason: FILTER operator: Unexpected hive type 
> name void
> vectorized: false
> {code}
> The query specifically has 
> {code}
> where case when (avg_monthly_sales <> 0) then (abs(sum_sales - 
> avg_monthly_sales) / avg_monthly_sales) else null end > 0.1
> {code}
> while rewriting it to 
> {code}
> where case when (avg_monthly_sales <> 0) then (abs(sum_sales - 
> avg_monthly_sales) / avg_monthly_sales) > 0.1 else false end
> {code}
> does vectorize into 
> {code}
> Filter Operator
>   Filter Vectorization:
>   className: VectorFilterOperator
>   native: true
>   predicateExpression: SelectColumnIsTrue(col 
> 12:boolean)(children: VectorUDFAdaptor(CASE WHEN ((avg_window_0 <> 0.0D)) 
> THEN (((abs((_col6 - avg_window_0)) / avg_window_0) > 0.1D)) ELSE (false) 
> END)(children: DoubleColNotEqualDoubleScalar(col 7:double, val 0.0) -> 
> 8:boolean, DoubleColGreaterDoubleScalar(col 9:double, val 0.1)(children: 
> DoubleColDivideDoubleColumn(col 10:double, col 7:double)(children: 
> FuncAbsDoubleToDouble(col 9:double)(children: 
> DoubleColSubtractDoubleColumn(col 6:double, col 7:double) -> 9:double) -> 
> 10:double) -> 9:double) -> 11:boolean) -> 12:boolean)
>   predicate: CASE WHEN ((avg_window_0 <> 0.0D)) THEN 
> (((abs((_col6 - avg_window_0)) / avg_window_0) > 0.1D)) ELSE (false) END 
> (type: boolean)
>   Statistics: Num rows: 11 Data size: 5291 Basic stats: 
> COMPLETE Column stats: COMPLETE
> {code}





[jira] [Updated] (HIVE-20277) Vectorization: Case expressions that return BOOLEAN are not supported for FILTER

2018-08-03 Thread Matt McCline (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-20277:

Status: In Progress  (was: Patch Available)

> Vectorization: Case expressions that return BOOLEAN are not supported for 
> FILTER
> 
>
> Key: HIVE-20277
> URL: https://issues.apache.org/jira/browse/HIVE-20277
> Project: Hive
>  Issue Type: Bug
>  Components: Vectorization
>Affects Versions: 4.0.0
>Reporter: Gopal V
>Assignee: Matt McCline
>Priority: Major
> Attachments: HIVE-20277.02.patch, HIVE-20277.WIP.01.patch
>
>
> In cases like Query89, the vertex with the filter is not vectorized.
> {code}
>Filter Operator
>   predicate: CASE WHEN ((avg_window_0 <> 0.0D)) THEN 
> (((abs((_col6 - avg_window_0)) / avg_window_0) > 0.1D)) ELSE (null) END 
> (type: boolean)
> {code}
> {code}
> Reducer 3 
> Execution mode: llap
> Reduce Vectorization:
> enabled: true
> enableConditionsMet: hive.vectorized.execution.reduce.enabled 
> IS true, hive.execution.engine tez IN [tez, spark] IS true
> notVectorizedReason: FILTER operator: Unexpected hive type 
> name void
> vectorized: false
> {code}
> The query specifically has 
> {code}
> where case when (avg_monthly_sales <> 0) then (abs(sum_sales - 
> avg_monthly_sales) / avg_monthly_sales) else null end > 0.1
> {code}
> while rewriting it to 
> {code}
> where case when (avg_monthly_sales <> 0) then (abs(sum_sales - 
> avg_monthly_sales) / avg_monthly_sales) > 0.1 else false end
> {code}
> does vectorize into 
> {code}
> Filter Operator
>   Filter Vectorization:
>   className: VectorFilterOperator
>   native: true
>   predicateExpression: SelectColumnIsTrue(col 
> 12:boolean)(children: VectorUDFAdaptor(CASE WHEN ((avg_window_0 <> 0.0D)) 
> THEN (((abs((_col6 - avg_window_0)) / avg_window_0) > 0.1D)) ELSE (false) 
> END)(children: DoubleColNotEqualDoubleScalar(col 7:double, val 0.0) -> 
> 8:boolean, DoubleColGreaterDoubleScalar(col 9:double, val 0.1)(children: 
> DoubleColDivideDoubleColumn(col 10:double, col 7:double)(children: 
> FuncAbsDoubleToDouble(col 9:double)(children: 
> DoubleColSubtractDoubleColumn(col 6:double, col 7:double) -> 9:double) -> 
> 10:double) -> 9:double) -> 11:boolean) -> 12:boolean)
>   predicate: CASE WHEN ((avg_window_0 <> 0.0D)) THEN 
> (((abs((_col6 - avg_window_0)) / avg_window_0) > 0.1D)) ELSE (false) END 
> (type: boolean)
>   Statistics: Num rows: 11 Data size: 5291 Basic stats: 
> COMPLETE Column stats: COMPLETE
> {code}





[jira] [Updated] (HIVE-20302) LLAP: non-vectorized execution in IO ignores virtual columns, including ROW__ID

2018-08-03 Thread Jesus Camacho Rodriguez (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-20302:
---
   Resolution: Fixed
Fix Version/s: 3.2.0, 4.0.0
       Status: Resolved  (was: Patch Available)

Pushed to master, branch-3. Thanks for reviewing [~sershe]

> LLAP: non-vectorized execution in IO ignores virtual columns, including 
> ROW__ID
> ---
>
> Key: HIVE-20302
> URL: https://issues.apache.org/jira/browse/HIVE-20302
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
>Priority: Major
> Fix For: 4.0.0, 3.2.0
>
> Attachments: HIVE-20302.01.patch, HIVE-20302.patch
>
>






[jira] [Commented] (HIVE-17979) Tez: Improve ReduceRecordSource passDownKey copying

2018-08-03 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-17979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16568990#comment-16568990
 ] 

Hive QA commented on HIVE-17979:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  9m 
18s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
16s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
44s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  4m 
17s{color} | {color:blue} ql in master has 2301 extant Findbugs warnings. 
{color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
6s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
39s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
18s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m 
18s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  0m 
45s{color} | {color:red} ql: The patch generated 1 new + 136 unchanged - 0 
fixed = 137 total (was 136) {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red}  0m  
0s{color} | {color:red} The patch has 1 line(s) that end in whitespace. Use git 
apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply 
{color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m 
36s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
6s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
15s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 26m 58s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests |  asflicense  javac  javadoc  findbugs  checkstyle  compile  |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 
3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/data/hiveptest/working/yetus_PreCommit-HIVE-Build-13034/dev-support/hive-personality.sh
 |
| git revision | master / ce2754d |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.0 |
| checkstyle | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-13034/yetus/diff-checkstyle-ql.txt
 |
| whitespace | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-13034/yetus/whitespace-eol.txt
 |
| modules | C: ql U: ql |
| Console output | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-13034/yetus.txt |
| Powered by | Apache Yetus http://yetus.apache.org |


This message was automatically generated.



> Tez: Improve ReduceRecordSource passDownKey copying
> ---
>
> Key: HIVE-17979
> URL: https://issues.apache.org/jira/browse/HIVE-17979
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 3.0.0
>Reporter: Gopal V
>Assignee: Gopal V
>Priority: Major
> Attachments: HIVE-17979.1.patch, HIVE-17979.2.patch
>
>
> Tez does not use a single Key stream for both sides of the join, so each 
> input gets its own ReduceRecordSource 
> {code}
> sources[tag] = new ReduceRecordSource();
> {code}
> As a result, each input stream has its own deserialized key (because the 
> tag is not part of the Key byte stream), so a 2-table join has 2 
> ReduceRecordSource objects.
> The passDownKey is therefore only an optimization when the Key, 
> List has more than 1 value in it. Otherwise the copy is entirely 
> wasted CPU cycles, because it deserializes the entire row to extract the key 
> and discards the row.





[jira] [Commented] (HIVE-20304) When hive.optimize.skewjoin and hive.auto.convert.join are both set to true, and the execution engine is mr, same stage may launch twice due to the wrong generated plan

2018-08-03 Thread Hui Huang (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16568989#comment-16568989
 ] 

Hui Huang commented on HIVE-20304:
--

Hi [~belugabehr], I ran the SQL provided in HIVE-14557 in my local environment 
and got errors different from the ones in this issue.

> When hive.optimize.skewjoin and hive.auto.convert.join are both set to true, 
> and the execution engine is mr, same stage may launch twice due to the wrong 
> generated plan
> 
>
> Key: HIVE-20304
> URL: https://issues.apache.org/jira/browse/HIVE-20304
> Project: Hive
>  Issue Type: Bug
>  Components: CLI
>Affects Versions: 1.2.1, 2.3.3
>Reporter: Hui Huang
>Assignee: Hui Huang
>Priority: Major
> Fix For: 1.2.1, 2.3.3
>
> Attachments: HIVE-20304.1.patch, HIVE-20304.patch
>
>
> When hive.optimize.skewjoin and hive.auto.convert.join are both set to true, 
> and the execution engine is set to mr, the same stage of a query may launch 
> twice because of an incorrectly generated plan. If hive.exec.parallel is also 
> true, both launches of that stage run at the same time, and the job fails 
> because the first launch to complete clears the map.xml/reduce.xml file 
> stored in HDFS.
> Use the following SQL to reproduce the issue:
> {code:java}
> CREATE TABLE `tbl1`(
>   `fence` string);
> CREATE TABLE `tbl2`(
>   `order_id` string,
>   `phone` string,
>   `search_id` string
> )
> PARTITIONED BY (
>   `dt` string);
> CREATE TABLE `tbl3`(
>   `order_id` string,
>   `platform` string)
> PARTITIONED BY (
>   `dt` string);
> CREATE TABLE `tbl4`(
>   `groupname` string,
>   `phone` string)
> PARTITIONED BY (
>   `dt` string);
> CREATE TABLE `tbl5`(
>   `search_id` string,
>   `fence` string)
> PARTITIONED BY (
>   `dt` string);
> SET hive.exec.parallel = TRUE;
> SET hive.auto.convert.join = TRUE;
> SET hive.optimize.skewjoin = TRUE;
> SELECT dt,
>platform,
>groupname,
>count(1) as cnt
> FROM
> (SELECT dt,
> platform,
> groupname
>  FROM
>  (SELECT fence
>   FROM tbl1)ta
>JOIN
>(SELECT a0.dt,
>a1.platform,
>a2.groupname,
>a3.fence
> FROM
> (SELECT dt,
> order_id,
> phone,
> search_id
>  FROM tbl2
>  WHERE dt =20180703 )a0
>   JOIN
>   (SELECT order_id,
>   platform,
>   dt
>FROM tbl3
>WHERE dt =20180703 )a1 ON a0.order_id = a1.order_id
>   INNER JOIN
>   (SELECT groupname,
>   phone,
>   dt
>FROM tbl4
>WHERE dt =20180703 )a2 ON a0.phone = a2.phone
>   LEFT JOIN
>   (SELECT search_id,
>   fence,
>   dt
>FROM tbl5
>WHERE dt =20180703)a3 ON a0.search_id = a3.search_id)t0 ON 
> ta.fence = t0.fence)t11
> GROUP BY dt,
>  platform,
>  groupname;
> DROP TABLE tbl1;
> DROP TABLE tbl2;
> DROP TABLE tbl3;
> DROP TABLE tbl4;
> DROP TABLE tbl5;
> {code}
> This produces error messages like the following:
> Examining task ID: task_1531284442065_3637_m_00 (and more) from job 
> job_1531284442065_3637
> Task with the most failures(4):
> -
> Task ID:
>   task_1531284442065_3637_m_00
> URL:
>   
> http://0.0.0.0:8088/taskdetails.jsp?jobid=job_1531284442065_3637&tipid=task_1531284442065_3637_m_00
> -
> Diagnostic Messages for this Task:
> File does not exist: 
> hdfs://test/tmp/hive-hadoop/hadoop/fe5efa94-abb1-420f-b6ba-ec782e7b79ad/hive_2018-08-03_17-00-17_707_592882314975289971-5/-mr-10045/757eb1f7-7a37-4a7e-abc0-4a3b8b06510c/reduce.xml
> java.io.FileNotFoundException: File does not exist: 
> hdfs://test/tmp/hive-hadoop/hadoop/fe5efa94-abb1-420f-b6ba-ec782e7b79ad/hive_2018-08-03_17-00-17_707_592882314975289971-5/-mr-10045/757eb1f7-7a37-4a7e-abc0-4a3b8b06510c/reduce.xml
> Looking into the plan by executing explain, I found that Stage-4 and 
> Stage-5 can be reached from multiple root tasks.
> {code:java}
> Explain
> STAGE DEPENDENCIES:
>   Stage-21 is a root stage , consists of Stage-34, Stage-5
>   Stage-34 has a backup stage: Stage-5
>   Stage-20 depends on stages: Stage-34
>   Stage-17 depends on stages: Stage-5, Stage-18, Stage-20 , consists of 
> Stage-32, Stage-33, Stage-1
>   Stage-32 has a backup stage: Stage-1
>   Stage-15 depends on stages: Stage-32
>   Stag

[jira] [Updated] (HIVE-20314) Include partition pruning in materialized view rewriting

2018-08-03 Thread Jesus Camacho Rodriguez (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-20314:
---
Attachment: HIVE-20314.patch

> Include partition pruning in materialized view rewriting
> 
>
> Key: HIVE-20314
> URL: https://issues.apache.org/jira/browse/HIVE-20314
> Project: Hive
>  Issue Type: Improvement
>  Components: Materialized views
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
>Priority: Major
> Attachments: HIVE-20314.patch
>
>
> To be able to reduce the cost of the expression using the materialized view 
> when some of its partitions are pruned.





[jira] [Updated] (HIVE-20314) Include partition pruning in materialized view rewriting

2018-08-03 Thread Jesus Camacho Rodriguez (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-20314:
---
Status: Patch Available  (was: In Progress)

> Include partition pruning in materialized view rewriting
> 
>
> Key: HIVE-20314
> URL: https://issues.apache.org/jira/browse/HIVE-20314
> Project: Hive
>  Issue Type: Improvement
>  Components: Materialized views
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
>Priority: Major
>
> To be able to reduce the cost of the expression using the materialized view 
> when some of its partitions are pruned.





[jira] [Work started] (HIVE-20314) Include partition pruning in materialized view rewriting

2018-08-03 Thread Jesus Camacho Rodriguez (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-20314 started by Jesus Camacho Rodriguez.
--
> Include partition pruning in materialized view rewriting
> 
>
> Key: HIVE-20314
> URL: https://issues.apache.org/jira/browse/HIVE-20314
> Project: Hive
>  Issue Type: Improvement
>  Components: Materialized views
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
>Priority: Major
>
> To be able to reduce the cost of the expression using the materialized view 
> when some of its partitions are pruned.





[jira] [Assigned] (HIVE-20314) Include partition pruning in materialized view rewriting

2018-08-03 Thread Jesus Camacho Rodriguez (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez reassigned HIVE-20314:
--


> Include partition pruning in materialized view rewriting
> 
>
> Key: HIVE-20314
> URL: https://issues.apache.org/jira/browse/HIVE-20314
> Project: Hive
>  Issue Type: Improvement
>  Components: Materialized views
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
>Priority: Major
>
> To be able to reduce the cost of the expression using the materialized view 
> when some of its partitions are pruned.





[jira] [Commented] (HIVE-20302) LLAP: non-vectorized execution in IO ignores virtual columns, including ROW__ID

2018-08-03 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16568981#comment-16568981
 ] 

Hive QA commented on HIVE-20302:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12934340/HIVE-20302.01.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:green}SUCCESS:{color} +1 due to 14859 tests passed

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-Build/13033/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/13033/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-13033/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12934340 - PreCommit-HIVE-Build

> LLAP: non-vectorized execution in IO ignores virtual columns, including 
> ROW__ID
> ---
>
> Key: HIVE-20302
> URL: https://issues.apache.org/jira/browse/HIVE-20302
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
>Priority: Major
> Attachments: HIVE-20302.01.patch, HIVE-20302.patch
>
>






[jira] [Commented] (HIVE-20302) LLAP: non-vectorized execution in IO ignores virtual columns, including ROW__ID

2018-08-03 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16568974#comment-16568974
 ] 

Hive QA commented on HIVE-20302:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  9m 
13s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
16s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
39s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  4m 
32s{color} | {color:blue} ql in master has 2301 extant Findbugs warnings. 
{color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
4s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
34s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
10s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m 
10s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  0m 
38s{color} | {color:red} ql: The patch generated 3 new + 54 unchanged - 2 fixed 
= 57 total (was 56) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m 
44s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
7s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
14s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 26m 51s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests |  asflicense  javac  javadoc  findbugs  checkstyle  compile  |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /data/hiveptest/working/yetus_PreCommit-HIVE-Build-13033/dev-support/hive-personality.sh |
| git revision | master / b02012f |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.0 |
| checkstyle | http://104.198.109.242/logs//PreCommit-HIVE-Build-13033/yetus/diff-checkstyle-ql.txt |
| modules | C: ql U: ql |
| Console output | http://104.198.109.242/logs//PreCommit-HIVE-Build-13033/yetus.txt |
| Powered by | Apache Yetus http://yetus.apache.org |


This message was automatically generated.



> LLAP: non-vectorized execution in IO ignores virtual columns, including 
> ROW__ID
> ---
>
> Key: HIVE-20302
> URL: https://issues.apache.org/jira/browse/HIVE-20302
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
>Priority: Major
> Attachments: HIVE-20302.01.patch, HIVE-20302.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-20292) Bad join ordering in tpcds query93 with primary constraint defined

2018-08-03 Thread Vineet Garg (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16568971#comment-16568971
 ] 

Vineet Garg commented on HIVE-20292:


Review request is at: https://reviews.apache.org/r/68202/

> Bad join ordering in tpcds query93 with primary constraint defined
> --
>
> Key: HIVE-20292
> URL: https://issues.apache.org/jira/browse/HIVE-20292
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Planning
>Reporter: Vineet Garg
>Assignee: Vineet Garg
>Priority: Major
> Attachments: HIVE-20292.1.patch, HIVE-20292.2.patch, 
> HIVE-20292.3.patch, HIVE-20292.4.patch
>
>
> Query 93 joins store_sales, store_returns and reason (including an outer join). 
> Without constraints, store_returns is joined with reason first and then with 
> store_sales.
> But if a primary key is added on store_returns (alter table store_returns add 
> constraint tpcds_pk_sr primary key (sr_item_sk, sr_ticket_number) disable 
> novalidate rely), the join order becomes ((store_sales, store_returns), reason), 
> which is very inefficient.
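
To make the cost difference in the description concrete, here is a rough sketch that compares the size of the intermediate result for the two join orders. The row counts and the reason filter are invented for illustration (they are not TPC-DS figures), the joins are treated as inner joins, and each store_returns row is assumed to match exactly one store_sales row. It is not how Hive's planner computes cost.

```python
# Hypothetical cardinalities, chosen only for illustration.
STORE_RETURNS_ROWS = 100_000
REASON_ROWS = 50            # rows in the reason table
REASON_ROWS_SELECTED = 1    # rows left after a filter on the reason description


def returns_matching_reason(returns_rows, reason_rows, selected_reasons):
    """Rows of store_returns that survive a join with the filtered reason
    table, assuming returns are spread evenly over all reasons."""
    return returns_rows * selected_reasons // reason_rows


def intermediate_rows_reason_first():
    """(store_returns JOIN reason) first: the intermediate result is small
    before store_sales is touched."""
    return returns_matching_reason(
        STORE_RETURNS_ROWS, REASON_ROWS, REASON_ROWS_SELECTED
    )


def intermediate_rows_sales_first():
    """(store_sales JOIN store_returns) first: with one matching sale per
    return, the intermediate result has one row per return."""
    return STORE_RETURNS_ROWS


if __name__ == "__main__":
    print("reason first:", intermediate_rows_reason_first())
    print("sales first: ", intermediate_rows_sales_first())
```

Under these made-up numbers the sales-first order carries 50 times more rows into the second join, and it has to join the large store_sales table against the unfiltered store_returns table to get there.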





[jira] [Updated] (HIVE-19097) related equals and in operators may cause inaccurate stats estimations

2018-08-03 Thread Ashutosh Chauhan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-19097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-19097:

Attachment: HIVE-19097.13.patch

> related equals and in operators may cause inaccurate stats estimations
> --
>
> Key: HIVE-19097
> URL: https://issues.apache.org/jira/browse/HIVE-19097
> Project: Hive
>  Issue Type: Bug
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
> Attachments: HIVE-19097.01.patch, HIVE-19097.02.patch, 
> HIVE-19097.03.patch, HIVE-19097.04.patch, HIVE-19097.05.patch, 
> HIVE-19097.06.patch, HIVE-19097.06wip01.patch, HIVE-19097.06wip02.patch, 
> HIVE-19097.07.patch, HIVE-19097.08.patch, HIVE-19097.08.patch, 
> HIVE-19097.09.patch, HIVE-19097.10.patch, HIVE-19097.11.patch, 
> HIVE-19097.12.patch, HIVE-19097.13.patch, HIVE-19097.partial.patch
>
>
> tpcds#74 is optimized in a way that for date_dim the condition contains IN 
> and = for the same column
> {code:java}
> | Map Operator Tree: |
> | TableScan  |
> |   alias: date_dim  |
> |   filterExpr: (((d_year) IN (2001, 2002) and (d_year = 2002) and d_date_sk is not null) or ((d_year) IN (2001, 2002) and (d_year = 2001) and d_date_sk is not null)) (type: boolean) |
> |   Statistics: Num rows: 73049 Data size: 876588 Basic stats: COMPLETE Column stats: COMPLETE |
> |   Filter Operator  |
> | predicate: ((d_year) IN (2001, 2002) and (d_year = 2002) and d_date_sk is not null) (type: boolean) |
> | Statistics: Num rows: 4 Data size: 48 Basic stats: COMPLETE Column stats: COMPLETE |
> {code}
> The actual row count is 365, while the plan above estimates 4 rows.
> When {{IN}} and {{=}} appear separately the estimate is very good, but when both 
> are present on the same column the row count is (severely) underestimated.
> {code:java}
> set hive.query.results.cache.enabled=false;
> drop table if exists t1;
> drop table if exists t8;
> create table t1 (a integer,b integer);
> create table t8 like t1;
> insert into t1 values (1,1),(2,2),(3,3),(4,4),(5,5);
> insert into t8
> select * from t1 union all select * from t1 union all select * from t1 union 
> all select * from t1 union all
> select * from t1 union all select * from t1 union all select * from t1 union 
> all select * from t1
> ;
> analyze table t1 compute statistics for columns;
> analyze table t8 compute statistics for columns;
> explain analyze select sum(a) from t8 where b in (2,3) group by b;
> explain analyze select sum(a) from t8 where b=2 group by b;
> explain analyze select sum(a) from t1 where b in (2,3) and b=2 group by b;
> explain analyze select sum(a) from t8 where b in (2,3) and b=2 group by b;
> {code}
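
The underestimate can be worked out on paper for the t8 table from the script above (40 rows, 5 distinct values of b). The sketch below assumes a textbook estimator that gives `=` a selectivity of 1/NDV and `IN` a selectivity of (list size)/NDV, and multiplies the selectivities of AND-ed predicates as if they were independent. That estimator is an assumption for illustration, not a copy of Hive's code, and the function names are made up.

```python
import math


def eq_selectivity(ndv):
    """Assumed selectivity of `col = const`: one value out of ndv."""
    return 1.0 / ndv


def in_selectivity(list_size, ndv):
    """Assumed selectivity of `col IN (...)`: list_size values out of ndv."""
    return min(1.0, list_size / ndv)


def estimate_rows(total_rows, selectivities):
    """Multiply selectivities as if the predicates were independent."""
    return total_rows * math.prod(selectivities)


# Data from the repro script: t1 holds (1,1)..(5,5); t8 is eight copies of t1.
T8 = [(i, i) for i in range(1, 6)] * 8
NDV_B = len({b for _, b in T8})

# Ground truth for `b IN (2,3) AND b = 2`, counted directly on the data.
actual = sum(1 for _, b in T8 if b in (2, 3) and b == 2)

in_only = estimate_rows(len(T8), [in_selectivity(2, NDV_B)])
eq_only = estimate_rows(len(T8), [eq_selectivity(NDV_B)])
both = estimate_rows(len(T8), [in_selectivity(2, NDV_B), eq_selectivity(NDV_B)])

if __name__ == "__main__":
    print(f"actual rows:        {actual}")
    print(f"estimate, IN only:  {in_only:.1f}")
    print(f"estimate, = only:   {eq_only:.1f}")
    print(f"estimate, IN and =: {both:.1f}")
```

Under these assumptions the separate estimates (about 16 and 8) agree with the data, while the combined estimate drops to about 3.2 against 8 actual rows. The two predicates constrain the same column, so treating them as independent counts the same restriction twice.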





[jira] [Updated] (HIVE-19097) related equals and in operators may cause inaccurate stats estimations

2018-08-03 Thread Ashutosh Chauhan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-19097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-19097:

Status: Patch Available  (was: Open)

> related equals and in operators may cause inaccurate stats estimations
> --
>
> Key: HIVE-19097
> URL: https://issues.apache.org/jira/browse/HIVE-19097
> Project: Hive
>  Issue Type: Bug
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
> Attachments: HIVE-19097.01.patch, HIVE-19097.02.patch, 
> HIVE-19097.03.patch, HIVE-19097.04.patch, HIVE-19097.05.patch, 
> HIVE-19097.06.patch, HIVE-19097.06wip01.patch, HIVE-19097.06wip02.patch, 
> HIVE-19097.07.patch, HIVE-19097.08.patch, HIVE-19097.08.patch, 
> HIVE-19097.09.patch, HIVE-19097.10.patch, HIVE-19097.11.patch, 
> HIVE-19097.12.patch, HIVE-19097.13.patch, HIVE-19097.partial.patch
>
>
> tpcds#74 is optimized in a way that for date_dim the condition contains IN 
> and = for the same column
> {code:java}
> | Map Operator Tree: |
> | TableScan  |
> |   alias: date_dim  |
> |   filterExpr: (((d_year) IN (2001, 2002) and (d_year = 2002) and d_date_sk is not null) or ((d_year) IN (2001, 2002) and (d_year = 2001) and d_date_sk is not null)) (type: boolean) |
> |   Statistics: Num rows: 73049 Data size: 876588 Basic stats: COMPLETE Column stats: COMPLETE |
> |   Filter Operator  |
> | predicate: ((d_year) IN (2001, 2002) and (d_year = 2002) and d_date_sk is not null) (type: boolean) |
> | Statistics: Num rows: 4 Data size: 48 Basic stats: COMPLETE Column stats: COMPLETE |
> {code}
> The actual row count is 365, while the plan above estimates 4 rows.
> When {{IN}} and {{=}} appear separately the estimate is very good, but when both 
> are present on the same column the row count is (severely) underestimated.
> {code:java}
> set hive.query.results.cache.enabled=false;
> drop table if exists t1;
> drop table if exists t8;
> create table t1 (a integer,b integer);
> create table t8 like t1;
> insert into t1 values (1,1),(2,2),(3,3),(4,4),(5,5);
> insert into t8
> select * from t1 union all select * from t1 union all select * from t1 union 
> all select * from t1 union all
> select * from t1 union all select * from t1 union all select * from t1 union 
> all select * from t1
> ;
> analyze table t1 compute statistics for columns;
> analyze table t8 compute statistics for columns;
> explain analyze select sum(a) from t8 where b in (2,3) group by b;
> explain analyze select sum(a) from t8 where b=2 group by b;
> explain analyze select sum(a) from t1 where b in (2,3) and b=2 group by b;
> explain analyze select sum(a) from t8 where b in (2,3) and b=2 group by b;
> {code}





[jira] [Updated] (HIVE-19097) related equals and in operators may cause inaccurate stats estimations

2018-08-03 Thread Ashutosh Chauhan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-19097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-19097:

Status: Open  (was: Patch Available)

> related equals and in operators may cause inaccurate stats estimations
> --
>
> Key: HIVE-19097
> URL: https://issues.apache.org/jira/browse/HIVE-19097
> Project: Hive
>  Issue Type: Bug
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
> Attachments: HIVE-19097.01.patch, HIVE-19097.02.patch, 
> HIVE-19097.03.patch, HIVE-19097.04.patch, HIVE-19097.05.patch, 
> HIVE-19097.06.patch, HIVE-19097.06wip01.patch, HIVE-19097.06wip02.patch, 
> HIVE-19097.07.patch, HIVE-19097.08.patch, HIVE-19097.08.patch, 
> HIVE-19097.09.patch, HIVE-19097.10.patch, HIVE-19097.11.patch, 
> HIVE-19097.12.patch, HIVE-19097.partial.patch
>
>
> tpcds#74 is optimized in a way that for date_dim the condition contains IN 
> and = for the same column
> {code:java}
> | Map Operator Tree: |
> | TableScan  |
> |   alias: date_dim  |
> |   filterExpr: (((d_year) IN (2001, 2002) and (d_year = 2002) and d_date_sk is not null) or ((d_year) IN (2001, 2002) and (d_year = 2001) and d_date_sk is not null)) (type: boolean) |
> |   Statistics: Num rows: 73049 Data size: 876588 Basic stats: COMPLETE Column stats: COMPLETE |
> |   Filter Operator  |
> | predicate: ((d_year) IN (2001, 2002) and (d_year = 2002) and d_date_sk is not null) (type: boolean) |
> | Statistics: Num rows: 4 Data size: 48 Basic stats: COMPLETE Column stats: COMPLETE |
> {code}
> The actual row count is 365, while the plan above estimates 4 rows.
> When {{IN}} and {{=}} appear separately the estimate is very good, but when both 
> are present on the same column the row count is (severely) underestimated.
> {code:java}
> set hive.query.results.cache.enabled=false;
> drop table if exists t1;
> drop table if exists t8;
> create table t1 (a integer,b integer);
> create table t8 like t1;
> insert into t1 values (1,1),(2,2),(3,3),(4,4),(5,5);
> insert into t8
> select * from t1 union all select * from t1 union all select * from t1 union 
> all select * from t1 union all
> select * from t1 union all select * from t1 union all select * from t1 union 
> all select * from t1
> ;
> analyze table t1 compute statistics for columns;
> analyze table t8 compute statistics for columns;
> explain analyze select sum(a) from t8 where b in (2,3) group by b;
> explain analyze select sum(a) from t8 where b=2 group by b;
> explain analyze select sum(a) from t1 where b in (2,3) and b=2 group by b;
> explain analyze select sum(a) from t8 where b in (2,3) and b=2 group by b;
> {code}





[jira] [Reopened] (HIVE-20118) SessionStateUserAuthenticator.getGroupNames() is always empty

2018-08-03 Thread Jesus Camacho Rodriguez (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez reopened HIVE-20118:


[~daijy], I have reverted this patch from master since all fixes need a green 
run before they can be pushed. Please resubmit for a clean run and push it 
again. Thanks

> SessionStateUserAuthenticator.getGroupNames() is always empty
> -
>
> Key: HIVE-20118
> URL: https://issues.apache.org/jira/browse/HIVE-20118
> Project: Hive
>  Issue Type: Improvement
>  Components: Authorization
>Reporter: Daniel Dai
>Assignee: Daniel Dai
>Priority: Major
> Fix For: 4.0.0, 3.2.0
>
> Attachments: HIVE-20118.1.patch
>
>






[jira] [Commented] (HIVE-20292) Bad join ordering in tpcds query93 with primary constraint defined

2018-08-03 Thread Ashutosh Chauhan (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16568967#comment-16568967
 ] 

Ashutosh Chauhan commented on HIVE-20292:
-

Can you also create a Review Board (RB) request?

> Bad join ordering in tpcds query93 with primary constraint defined
> --
>
> Key: HIVE-20292
> URL: https://issues.apache.org/jira/browse/HIVE-20292
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Planning
>Reporter: Vineet Garg
>Assignee: Vineet Garg
>Priority: Major
> Attachments: HIVE-20292.1.patch, HIVE-20292.2.patch, 
> HIVE-20292.3.patch, HIVE-20292.4.patch
>
>
> Query 93 joins store_sales, store_returns and reason (including an outer join). 
> Without constraints, store_returns is joined with reason first and then with 
> store_sales.
> But if a primary key is added on store_returns (alter table store_returns add 
> constraint tpcds_pk_sr primary key (sr_item_sk, sr_ticket_number) disable 
> novalidate rely), the join order becomes ((store_sales, store_returns), reason), 
> which is very inefficient.





[jira] [Updated] (HIVE-20292) Bad join ordering in tpcds query93 with primary constraint defined

2018-08-03 Thread Vineet Garg (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-20292:
---
Status: Open  (was: Patch Available)

> Bad join ordering in tpcds query93 with primary constraint defined
> --
>
> Key: HIVE-20292
> URL: https://issues.apache.org/jira/browse/HIVE-20292
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Planning
>Reporter: Vineet Garg
>Assignee: Vineet Garg
>Priority: Major
> Attachments: HIVE-20292.1.patch, HIVE-20292.2.patch, 
> HIVE-20292.3.patch, HIVE-20292.4.patch
>
>
> Query 93 joins store_sales, store_returns and reason (including an outer join). 
> Without constraints, store_returns is joined with reason first and then with 
> store_sales.
> But if a primary key is added on store_returns (alter table store_returns add 
> constraint tpcds_pk_sr primary key (sr_item_sk, sr_ticket_number) disable 
> novalidate rely), the join order becomes ((store_sales, store_returns), reason), 
> which is very inefficient.





[jira] [Updated] (HIVE-20292) Bad join ordering in tpcds query93 with primary constraint defined

2018-08-03 Thread Vineet Garg (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-20292:
---
Status: Patch Available  (was: Open)

The latest patch (4) updates the golden files. I have verified that all of the 
changes are expected.

> Bad join ordering in tpcds query93 with primary constraint defined
> --
>
> Key: HIVE-20292
> URL: https://issues.apache.org/jira/browse/HIVE-20292
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Planning
>Reporter: Vineet Garg
>Assignee: Vineet Garg
>Priority: Major
> Attachments: HIVE-20292.1.patch, HIVE-20292.2.patch, 
> HIVE-20292.3.patch, HIVE-20292.4.patch
>
>
> Query 93 joins store_sales, store_returns and reason (including an outer join). 
> Without constraints, store_returns is joined with reason first and then with 
> store_sales.
> But if a primary key is added on store_returns (alter table store_returns add 
> constraint tpcds_pk_sr primary key (sr_item_sk, sr_ticket_number) disable 
> novalidate rely), the join order becomes ((store_sales, store_returns), reason), 
> which is very inefficient.





[jira] [Updated] (HIVE-20292) Bad join ordering in tpcds query93 with primary constraint defined

2018-08-03 Thread Vineet Garg (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-20292:
---
Attachment: HIVE-20292.4.patch

> Bad join ordering in tpcds query93 with primary constraint defined
> --
>
> Key: HIVE-20292
> URL: https://issues.apache.org/jira/browse/HIVE-20292
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Planning
>Reporter: Vineet Garg
>Assignee: Vineet Garg
>Priority: Major
> Attachments: HIVE-20292.1.patch, HIVE-20292.2.patch, 
> HIVE-20292.3.patch, HIVE-20292.4.patch
>
>
> Query 93 joins store_sales, store_returns and reason (including an outer join). 
> Without constraints, store_returns is joined with reason first and then with 
> store_sales.
> But if a primary key is added on store_returns (alter table store_returns add 
> constraint tpcds_pk_sr primary key (sr_item_sk, sr_ticket_number) disable 
> novalidate rely), the join order becomes ((store_sales, store_returns), reason), 
> which is very inefficient.





[jira] [Updated] (HIVE-20301) Enable vectorization for materialized view rewriting tests

2018-08-03 Thread Jesus Camacho Rodriguez (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-20301:
---
Attachment: HIVE-20301.patch

> Enable vectorization for materialized view rewriting tests
> --
>
> Key: HIVE-20301
> URL: https://issues.apache.org/jira/browse/HIVE-20301
> Project: Hive
>  Issue Type: Test
>  Components: Materialized views, Test
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
>Priority: Major
> Attachments: HIVE-20301.patch, HIVE-20301.patch, HIVE-20301.patch, 
> HIVE-20301.patch
>
>






[jira] [Updated] (HIVE-17683) Add explain locks command

2018-08-03 Thread Eugene Koifman (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-17683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-17683:
--
   Resolution: Fixed
Fix Version/s: 3.2.0
   Status: Resolved  (was: Patch Available)

> Add explain locks  command
> ---
>
> Key: HIVE-17683
> URL: https://issues.apache.org/jira/browse/HIVE-17683
> Project: Hive
>  Issue Type: New Feature
>  Components: Transactions
>Reporter: Eugene Koifman
>Assignee: Igor Kryvenko
>Priority: Critical
> Fix For: 3.2.0
>
> Attachments: HIVE-17683-branch-3.patch, HIVE-17683.01.patch, 
> HIVE-17683.02.patch, HIVE-17683.03.patch, HIVE-17683.04.patch, 
> HIVE-17683.05.patch, HIVE-17683.06.patch
>
>
> Explore if it's possible to add info about what locks will be asked for to 
> the query plan.
> Lock acquisition (for Acid Lock Manager) is done in 
> DbTxnManager.acquireLocks() which is called once the query starts running.  
> Would need to refactor that.





[jira] [Commented] (HIVE-17683) Add explain locks command

2018-08-03 Thread Eugene Koifman (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-17683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16568960#comment-16568960
 ] 

Eugene Koifman commented on HIVE-17683:
---

committed to branch-3
thanks Igor for the contribution

> Add explain locks  command
> ---
>
> Key: HIVE-17683
> URL: https://issues.apache.org/jira/browse/HIVE-17683
> Project: Hive
>  Issue Type: New Feature
>  Components: Transactions
>Reporter: Eugene Koifman
>Assignee: Igor Kryvenko
>Priority: Critical
> Attachments: HIVE-17683-branch-3.patch, HIVE-17683.01.patch, 
> HIVE-17683.02.patch, HIVE-17683.03.patch, HIVE-17683.04.patch, 
> HIVE-17683.05.patch, HIVE-17683.06.patch
>
>
> Explore if it's possible to add info about what locks will be asked for to 
> the query plan.
> Lock acquisition (for Acid Lock Manager) is done in 
> DbTxnManager.acquireLocks() which is called once the query starts running.  
> Would need to refactor that.





[jira] [Commented] (HIVE-20300) VectorFileSinkArrowOperator

2018-08-03 Thread Jason Dere (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16568956#comment-16568956
 ] 

Jason Dere commented on HIVE-20300:
---

Took a look at the use of the LlapOutputFormatService within 
VectorFileSinkArrowOperator and left one comment. [~mmccline]/[~teddy.choi] 
might be able to take a better look at the vectorization changes than I can.

> VectorFileSinkArrowOperator
> ---
>
> Key: HIVE-20300
> URL: https://issues.apache.org/jira/browse/HIVE-20300
> Project: Hive
>  Issue Type: Improvement
>Reporter: Eric Wohlstadter
>Assignee: Eric Wohlstadter
>Priority: Major
> Attachments: HIVE-20300.1.patch, HIVE-20300.2.patch
>
>
> Bypass the row-mode FileSinkOperator for pushing Arrow format to the 
> LlapOutputFormatService.





[jira] [Commented] (HIVE-20301) Enable vectorization for materialized view rewriting tests

2018-08-03 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16568953#comment-16568953
 ] 

Hive QA commented on HIVE-20301:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12934330/HIVE-20301.patch

{color:green}SUCCESS:{color} +1 due to 16 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 14829 tests executed
*Failed tests:*
{noformat}
TestMiniLlapLocalCliDriver - did not produce a TEST-*.xml file (likely timed out) (batchId=170)

[bucketsortoptimize_insert_7.q,cbo_windowing.q,groupby_rollup_empty.q,vector_case_when_2.q,strict_managed_tables_sysdb.q,subquery_corr.q,vector_decimal_expressions.q,merge1.q,cbo_rp_join.q,materialized_view_rewrite_ssb_2.q,windowing.q,vector_aggregate_without_gby.q,vector_windowing_streaming.q,materialized_view_create_rewrite_dummy.q,tez_smb_reduce_side.q,sample10_mm.q,orc_create.q,vector_partition_diff_num_cols.q,bucketmapjoin6.q,update_all_partitioned.q,schema_evol_text_vec_part.q,lineage2.q,auto_sortmerge_join_16.q,auto_join29.q,dynpart_sort_optimization.q,bucket_num_reducers_acid.q,tez_join_hash.q,multi_insert_lateral_view.q,ptf_streaming.q,non_native_window_udf.q]
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/13032/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/13032/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-13032/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12934330 - PreCommit-HIVE-Build

> Enable vectorization for materialized view rewriting tests
> --
>
> Key: HIVE-20301
> URL: https://issues.apache.org/jira/browse/HIVE-20301
> Project: Hive
>  Issue Type: Test
>  Components: Materialized views, Test
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
>Priority: Major
> Attachments: HIVE-20301.patch, HIVE-20301.patch, HIVE-20301.patch
>
>






[jira] [Commented] (HIVE-20311) add txn stats checks to some more paths

2018-08-03 Thread Eugene Koifman (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16568935#comment-16568935
 ] 

Eugene Koifman commented on HIVE-20311:
---

+1 pending tests

> add txn stats checks to some more paths
> ---
>
> Key: HIVE-20311
> URL: https://issues.apache.org/jira/browse/HIVE-20311
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>Priority: Major
> Attachments: HIVE-20311.patch
>
>
> These were set to false in the original patch for no reason as far as I see.
> I later added notes but not TODOs to switch them over, so they remained as 
> non-txn.





[jira] [Commented] (HIVE-20301) Enable vectorization for materialized view rewriting tests

2018-08-03 Thread Ashutosh Chauhan (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16568932#comment-16568932
 ] 

Ashutosh Chauhan commented on HIVE-20301:
-

+1

> Enable vectorization for materialized view rewriting tests
> --
>
> Key: HIVE-20301
> URL: https://issues.apache.org/jira/browse/HIVE-20301
> Project: Hive
>  Issue Type: Test
>  Components: Materialized views, Test
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
>Priority: Major
> Attachments: HIVE-20301.patch, HIVE-20301.patch, HIVE-20301.patch
>
>






[jira] [Commented] (HIVE-19097) related equals and in operators may cause inaccurate stats estimations

2018-08-03 Thread Ashutosh Chauhan (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-19097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16568930#comment-16568930
 ] 

Ashutosh Chauhan commented on HIVE-19097:
-

+1

> related equals and in operators may cause inaccurate stats estimations
> --
>
> Key: HIVE-19097
> URL: https://issues.apache.org/jira/browse/HIVE-19097
> Project: Hive
>  Issue Type: Bug
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
> Attachments: HIVE-19097.01.patch, HIVE-19097.02.patch, 
> HIVE-19097.03.patch, HIVE-19097.04.patch, HIVE-19097.05.patch, 
> HIVE-19097.06.patch, HIVE-19097.06wip01.patch, HIVE-19097.06wip02.patch, 
> HIVE-19097.07.patch, HIVE-19097.08.patch, HIVE-19097.08.patch, 
> HIVE-19097.09.patch, HIVE-19097.10.patch, HIVE-19097.11.patch, 
> HIVE-19097.12.patch, HIVE-19097.partial.patch
>
>
> tpcds#74 is optimized in a way that for date_dim the condition contains IN 
> and = for the same column
> {code:java}
> | Map Operator Tree: |
> | TableScan  |
> |   alias: date_dim  |
> |   filterExpr: (((d_year) IN (2001, 2002) and (d_year = 2002) and d_date_sk is not null) or ((d_year) IN (2001, 2002) and (d_year = 2001) and d_date_sk is not null)) (type: boolean) |
> |   Statistics: Num rows: 73049 Data size: 876588 Basic stats: COMPLETE Column stats: COMPLETE |
> |   Filter Operator  |
> | predicate: ((d_year) IN (2001, 2002) and (d_year = 2002) and d_date_sk is not null) (type: boolean) |
> | Statistics: Num rows: 4 Data size: 48 Basic stats: COMPLETE Column stats: COMPLETE |
> {code}
> The actual row count is 365, while the plan above estimates 4 rows.
> When {{IN}} and {{=}} appear separately the estimate is very good, but when both 
> are present on the same column the row count is (severely) underestimated.
> {code:java}
> set hive.query.results.cache.enabled=false;
> drop table if exists t1;
> drop table if exists t8;
> create table t1 (a integer,b integer);
> create table t8 like t1;
> insert into t1 values (1,1),(2,2),(3,3),(4,4),(5,5);
> insert into t8
> select * from t1 union all select * from t1 union all select * from t1 union 
> all select * from t1 union all
> select * from t1 union all select * from t1 union all select * from t1 union 
> all select * from t1
> ;
> analyze table t1 compute statistics for columns;
> analyze table t8 compute statistics for columns;
> explain analyze select sum(a) from t8 where b in (2,3) group by b;
> explain analyze select sum(a) from t8 where b=2 group by b;
> explain analyze select sum(a) from t1 where b in (2,3) and b=2 group by b;
> explain analyze select sum(a) from t8 where b in (2,3) and b=2 group by b;
> {code}





[jira] [Commented] (HIVE-20301) Enable vectorization for materialized view rewriting tests

2018-08-03 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16568918#comment-16568918
 ] 

Hive QA commented on HIVE-20301:


| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m  0s{color} | {color:green} The patch has no whitespace issues. {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  1m  4s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}  2m 15s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests |  asflicense  |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /data/hiveptest/working/yetus_PreCommit-HIVE-Build-13032/dev-support/hive-personality.sh |
| git revision | master / b02012f |
| modules | C: ql U: ql |
| Console output | http://104.198.109.242/logs//PreCommit-HIVE-Build-13032/yetus.txt |
| Powered by | Apache Yetus http://yetus.apache.org |


This message was automatically generated.



> Enable vectorization for materialized view rewriting tests
> --
>
> Key: HIVE-20301
> URL: https://issues.apache.org/jira/browse/HIVE-20301
> Project: Hive
>  Issue Type: Test
>  Components: Materialized views, Test
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
>Priority: Major
> Attachments: HIVE-20301.patch, HIVE-20301.patch, HIVE-20301.patch
>
>






[jira] [Commented] (HIVE-20277) Vectorization: Case expressions that return BOOLEAN are not supported for FILTER

2018-08-03 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16568908#comment-16568908
 ] 

Hive QA commented on HIVE-20277:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12934326/HIVE-20277.02.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 14 failed/errored test(s), 14859 tests executed
*Failed tests:*
{noformat}
org.apache.hive.jdbc.TestTriggersTezSessionPoolManager.testMultipleTriggers1 (batchId=251)
org.apache.hive.jdbc.TestTriggersTezSessionPoolManager.testMultipleTriggers2 (batchId=251)
org.apache.hive.jdbc.TestTriggersTezSessionPoolManager.testTriggerCustomCreatedDynamicPartitions (batchId=251)
org.apache.hive.jdbc.TestTriggersTezSessionPoolManager.testTriggerCustomCreatedDynamicPartitionsMultiInsert (batchId=251)
org.apache.hive.jdbc.TestTriggersTezSessionPoolManager.testTriggerCustomCreatedDynamicPartitionsUnionAll (batchId=251)
org.apache.hive.jdbc.TestTriggersTezSessionPoolManager.testTriggerCustomCreatedFiles (batchId=251)
org.apache.hive.jdbc.TestTriggersTezSessionPoolManager.testTriggerCustomNonExistent (batchId=251)
org.apache.hive.jdbc.TestTriggersTezSessionPoolManager.testTriggerHighBytesRead (batchId=251)
org.apache.hive.jdbc.TestTriggersTezSessionPoolManager.testTriggerHighShuffleBytes (batchId=251)
org.apache.hive.jdbc.TestTriggersTezSessionPoolManager.testTriggerShortQueryElapsedTime (batchId=251)
org.apache.hive.jdbc.TestTriggersTezSessionPoolManager.testTriggerSlowQueryElapsedTime (batchId=251)
org.apache.hive.jdbc.TestTriggersTezSessionPoolManager.testTriggerSlowQueryExecutionTime (batchId=251)
org.apache.hive.jdbc.TestTriggersTezSessionPoolManager.testTriggerTotalTasks (batchId=251)
org.apache.hive.jdbc.TestTriggersTezSessionPoolManager.testTriggerVertexRawInputSplitsNoKill (batchId=251)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/13031/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/13031/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-13031/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 14 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12934326 - PreCommit-HIVE-Build

> Vectorization: Case expressions that return BOOLEAN are not supported for 
> FILTER
> 
>
> Key: HIVE-20277
> URL: https://issues.apache.org/jira/browse/HIVE-20277
> Project: Hive
>  Issue Type: Bug
>  Components: Vectorization
>Affects Versions: 4.0.0
>Reporter: Gopal V
>Assignee: Matt McCline
>Priority: Major
> Attachments: HIVE-20277.02.patch, HIVE-20277.WIP.01.patch
>
>
> In cases like Query89, the vertex with the filter is not vectorized.
> {code}
>Filter Operator
>   predicate: CASE WHEN ((avg_window_0 <> 0.0D)) THEN 
> (((abs((_col6 - avg_window_0)) / avg_window_0) > 0.1D)) ELSE (null) END 
> (type: boolean)
> {code}
> {code}
> Reducer 3 
> Execution mode: llap
> Reduce Vectorization:
> enabled: true
> enableConditionsMet: hive.vectorized.execution.reduce.enabled 
> IS true, hive.execution.engine tez IN [tez, spark] IS true
> notVectorizedReason: FILTER operator: Unexpected hive type 
> name void
> vectorized: false
> {code}
> The query specifically has 
> {code}
> where case when (avg_monthly_sales <> 0) then (abs(sum_sales - 
> avg_monthly_sales) / avg_monthly_sales) else null end > 0.1
> {code}
> while rewriting it to 
> {code}
> where case when (avg_monthly_sales <> 0) then (abs(sum_sales - 
> avg_monthly_sales) / avg_monthly_sales) > 0.1 else false end
> {code}
> does vectorize into 
> {code}
> Filter Operator
>   Filter Vectorization:
>   className: VectorFilterOperator
>   native: true
>   predicateExpression: SelectColumnIsTrue(col 
> 12:boolean)(children: VectorUDFAdaptor(CASE WHEN ((avg_window_0 <> 0.0D)) 
> THEN (((abs((_col6 - avg_window_0)) / avg_window_0) > 0.1D)) ELSE (false) 
> END)(children: DoubleColNotEqualDoubleScalar(col 7:double, val 0.0) -> 
> 8:boolean, DoubleColGreaterDoubleScalar(col 9:double, val 0.1)(children: 
> DoubleColDivideDoubleColumn(col 10:double, col 7:double)(children: 
> FuncAbsDo
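
A note on why the ELSE false rewrite quoted above selects the same rows as the ELSE null form: a WHERE clause keeps a row only when its predicate evaluates to TRUE, so a NULL predicate rejects the row just as FALSE does. The sketch below models that with Java's nullable Boolean. The class and method names are hypothetical, the inputs are assumed to be non-NULL, and none of this is Hive code.

```java
// Minimal model of SQL three-valued logic; a null Boolean stands for SQL NULL.
final class FilterNullDemo {
    // Original form: CASE WHEN avg <> 0 THEN abs(sum - avg) / avg ELSE NULL END > 0.1
    static Boolean original(double sumSales, double avg) {
        Double ratio = (avg != 0) ? Double.valueOf(Math.abs(sumSales - avg) / avg) : null;
        // Comparing NULL with anything yields NULL.
        return (ratio == null) ? null : Boolean.valueOf(ratio > 0.1);
    }

    // Rewritten form: CASE WHEN avg <> 0 THEN abs(sum - avg) / avg > 0.1 ELSE FALSE END
    static Boolean rewritten(double sumSales, double avg) {
        return (avg != 0) ? Boolean.valueOf(Math.abs(sumSales - avg) / avg > 0.1) : Boolean.FALSE;
    }

    // A filter keeps a row only when the predicate is TRUE; NULL and FALSE both drop it.
    static boolean keepsRow(Boolean predicate) {
        return Boolean.TRUE.equals(predicate);
    }
}
```

The two forms differ in the value they produce when the average is zero (NULL versus FALSE), but `keepsRow` returns the same answer for both, which is why the rewrite is usable as a workaround in a filter.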

[jira] [Updated] (HIVE-20313) consider making ROW__ID a 1st class object

2018-08-03 Thread Eugene Koifman (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-20313:
--
Description: 
ROW_ID, which is a struct that represents a unique row ID within a partition of 
a full CRUD transactional table is currently modeled as a {{VirtualColumn}}.  
Acid metadata columns from which ROW_ID is built are actually stored in the 
data file.  

There is no end to special handling of acid metadata columns in the code to 
make this work.

Perhaps a better approach is to add a struct column to an acid table at creation 
time and make it a 1st class citizen visible in the metastore.  'select 
count(*) ' would need special handling to remove it.  There may need to be 
a way to make these columns read-only.

For data added via Load Data, Add Partition, etc (i.e. original files in a CRUD 
table), acid reader would have to fill in the values as it does today.

This would make schema evolution, PPD, projection pruning work seamlessly.
This should also make adding formats other than ORC in full CRUD tables easy.

This will likely be painful but should be investigated.



  was:
ROW__ID, which is a struct that represents a unique row ID within a partition 
of a full CRUD transactional table is currently modeled as a {{VirtualColumn}}. 
 Acid metadata columns from which ROW__ID is built are actually stored in the 
data file.  

There is no end to special handling of acid metadata columns in the code to 
make this work.

Perhaps a better approach is to add a struct column to an acid table at creation 
time and make it a 1st class citizen visible in the metastore.  'select 
count(*) ' would need special handling to remove it.  There may need to be 
a way to make these columns read-only.

For data added via Load Data, Add Partition, etc (i.e. original files in a CRUD 
table), acid reader would have to fill in the values as it does today.

This would make schema evolution, PPD, projection pruning work seamlessly.
This should also make adding formats other than ORC in full CRUD tables easy.

This will likely be painful but should be investigated.




> consider making ROW__ID a 1st class object
> --
>
> Key: HIVE-20313
> URL: https://issues.apache.org/jira/browse/HIVE-20313
> Project: Hive
>  Issue Type: Improvement
>  Components: Transactions
>Affects Versions: 0.11.0
>Reporter: Eugene Koifman
>Priority: Major
>
> ROW_ID, which is a struct that represents a unique row ID within a partition 
> of a full CRUD transactional table is currently modeled as a 
> {{VirtualColumn}}.  Acid metadata columns from which ROW_ID is built are 
> actually stored in the data file.  
> There is no end to special handling of acid metadata columns in the code to 
> make this work.
> Perhaps a better approach is to add a struct column to an acid table at 
> creation time and make it a 1st class citizen visible in the metastore.  
> 'select count(*) ' would need special handling to remove it.  There may 
> need to be a way to make these columns read-only.
> For data added via Load Data, Add Partition, etc (i.e. original files in a 
> CRUD table), acid reader would have to fill in the values as it does today.
> This would make schema evolution, PPD, projection pruning work seamlessly.
> This should also make adding formats other than ORC in full CRUD tables easy.
> This will likely be painful but should be investigated.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-20118) SessionStateUserAuthenticator.getGroupNames() is always empty

2018-08-03 Thread Daniel Dai (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated HIVE-20118:
--
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 3.2.0
   4.0.0
   Status: Resolved  (was: Patch Available)

Patch committed to both master and branch-3. Thanks Thejas for review!

> SessionStateUserAuthenticator.getGroupNames() is always empty
> -
>
> Key: HIVE-20118
> URL: https://issues.apache.org/jira/browse/HIVE-20118
> Project: Hive
>  Issue Type: Improvement
>  Components: Authorization
>Reporter: Daniel Dai
>Assignee: Daniel Dai
>Priority: Major
> Fix For: 4.0.0, 3.2.0
>
> Attachments: HIVE-20118.1.patch
>
>






[jira] [Commented] (HIVE-20277) Vectorization: Case expressions that return BOOLEAN are not supported for FILTER

2018-08-03 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16568878#comment-16568878
 ] 

Hive QA commented on HIVE-20277:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  9m 
33s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
14s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
48s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  4m 
17s{color} | {color:blue} ql in master has 2301 extant Findbugs warnings. 
{color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
7s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
37s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
17s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m 
17s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  0m 
48s{color} | {color:red} ql: The patch generated 7 new + 382 unchanged - 3 
fixed = 389 total (was 385) {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red}  0m  
0s{color} | {color:red} The patch has 1 line(s) that end in whitespace. Use git 
apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply 
{color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m 
31s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
3s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
15s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 27m  1s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests |  asflicense  javac  javadoc  findbugs  checkstyle  compile  |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 
3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/data/hiveptest/working/yetus_PreCommit-HIVE-Build-13031/dev-support/hive-personality.sh
 |
| git revision | master / a3cd496 |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.0 |
| checkstyle | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-13031/yetus/diff-checkstyle-ql.txt
 |
| whitespace | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-13031/yetus/whitespace-eol.txt
 |
| modules | C: ql U: ql |
| Console output | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-13031/yetus.txt |
| Powered by | Apache Yetus http://yetus.apache.org |


This message was automatically generated.



> Vectorization: Case expressions that return BOOLEAN are not supported for 
> FILTER
> 
>
> Key: HIVE-20277
> URL: https://issues.apache.org/jira/browse/HIVE-20277
> Project: Hive
>  Issue Type: Bug
>  Components: Vectorization
>Affects Versions: 4.0.0
>Reporter: Gopal V
>Assignee: Matt McCline
>Priority: Major
> Attachments: HIVE-20277.02.patch, HIVE-20277.WIP.01.patch
>
>
> In cases like Query89, the vertex with the filter is not vectorized.
> {code}
>Filter Operator
>   predicate: CASE WHEN ((avg_window_0 <> 0.0D)) THEN 
> (((abs((_col6 - avg_window_0)) / avg_window_0) > 0.1D)) ELSE (null) END 
> (type: boolean)
> {code}
> {code}
> Reducer 3 
> Execution mode: llap
> Reduce Vectorization:
> enabled: true
> enableConditionsMet: hive.vectorized.execution.reduce.enabled 
> IS true, hive.execution.engine tez IN [tez, spark] IS true
> notVectorizedReason: FILTER operator: Unexpected hive type 
> name void
> vectorized: false
> {code}
> The query specifically has 
> {code}
> where case when (avg_monthly_sales <> 0) then (abs(sum_sal

[jira] [Updated] (HIVE-17979) Tez: Improve ReduceRecordSource passDownKey copying

2018-08-03 Thread Ashutosh Chauhan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-17979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-17979:

Status: Open  (was: Patch Available)

+1

> Tez: Improve ReduceRecordSource passDownKey copying
> ---
>
> Key: HIVE-17979
> URL: https://issues.apache.org/jira/browse/HIVE-17979
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 3.0.0
>Reporter: Gopal V
>Assignee: Gopal V
>Priority: Major
> Attachments: HIVE-17979.1.patch, HIVE-17979.2.patch
>
>
> Tez does not use a single Key stream for both sides of the join, so each 
> input gets its own ReduceRecordSource 
> {code}
> sources[tag] = new ReduceRecordSource();
> {code}
> And this means for each input stream, there's a deserialized key (because the 
> tag is not part of the Key byte stream), this means for a 2-table join there 
> are 2 ReduceRecordSource objects.
> This means that the passDownKey is only an optimization when the Key, 
> List has more than 1 value in it. Otherwise the copy is entirely 
> wasted CPU cycles, because it deserializes the entire row to extract the key 
> and discards the row.
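
The description above says the key copy only pays off when a key has more than one value. One rough way to gauge how often that holds for a given input is to count key groups in a key-sorted stream and see how many contain more than one value. This is a sketch only: `KeyGroupStats` and `countGroups` are hypothetical names, keys are modeled as non-null strings, and nothing here reflects the actual ReduceRecordSource code.

```java
import java.util.List;

final class KeyGroupStats {
    // Returns {total groups, groups with more than one value} for a key-sorted stream.
    static int[] countGroups(List<String> sortedKeys) {
        int groups = 0;      // number of distinct consecutive key runs
        int multiValue = 0;  // runs that contain more than one value
        int runLength = 0;   // values seen so far for the current key
        String previous = null;
        for (String key : sortedKeys) {
            if (previous != null && key.equals(previous)) {
                runLength++;
            } else {
                if (runLength > 1) {
                    multiValue++;  // close out the previous run
                }
                groups++;
                runLength = 1;
                previous = key;
            }
        }
        if (runLength > 1) {
            multiValue++;  // close out the final run
        }
        return new int[] {groups, multiValue};
    }
}
```

If most groups hold a single value, the reasoning in the description suggests the copy is mostly overhead for that input.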





[jira] [Updated] (HIVE-17979) Tez: Improve ReduceRecordSource passDownKey copying

2018-08-03 Thread Ashutosh Chauhan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-17979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-17979:

Status: Patch Available  (was: Open)

> Tez: Improve ReduceRecordSource passDownKey copying
> ---
>
> Key: HIVE-17979
> URL: https://issues.apache.org/jira/browse/HIVE-17979
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 3.0.0
>Reporter: Gopal V
>Assignee: Gopal V
>Priority: Major
> Attachments: HIVE-17979.1.patch, HIVE-17979.2.patch
>
>
> Tez does not use a single Key stream for both sides of the join, so each 
> input gets its own ReduceRecordSource 
> {code}
> sources[tag] = new ReduceRecordSource();
> {code}
> And this means for each input stream, there's a deserialized key (because the 
> tag is not part of the Key byte stream), this means for a 2-table join there 
> are 2 ReduceRecordSource objects.
> This means that the passDownKey is only an optimization when the Key, 
> List has more than 1 value in it. Otherwise the copy is entirely 
> wasted CPU cycles, because it deserializes the entire row to extract the key 
> and discards the row.





[jira] [Commented] (HIVE-20311) add txn stats checks to some more paths

2018-08-03 Thread Sergey Shelukhin (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16568870#comment-16568870
 ] 

Sergey Shelukhin commented on HIVE-20311:
-

[~ekoifman] can you take a look? This might also require some out file changes, 
we'll see if they look valid.

> add txn stats checks to some more paths
> ---
>
> Key: HIVE-20311
> URL: https://issues.apache.org/jira/browse/HIVE-20311
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>Priority: Major
> Attachments: HIVE-20311.patch
>
>
> These were set to false in the original patch for no reason as far as I see.
> I later added notes but not TODOs to switch them over, so they remained as 
> non-txn.





[jira] [Updated] (HIVE-20311) add txn stats checks to some more paths

2018-08-03 Thread Sergey Shelukhin (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-20311:

Status: Patch Available  (was: Open)

> add txn stats checks to some more paths
> ---
>
> Key: HIVE-20311
> URL: https://issues.apache.org/jira/browse/HIVE-20311
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>Priority: Major
> Attachments: HIVE-20311.patch
>
>
> These were set to false in the original patch for no reason as far as I see.
> I later added notes but not TODOs to switch them over, so they remained as 
> non-txn.





[jira] [Updated] (HIVE-20311) add txn stats checks to some more paths

2018-08-03 Thread Sergey Shelukhin (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-20311:

Attachment: HIVE-20311.patch

> add txn stats checks to some more paths
> ---
>
> Key: HIVE-20311
> URL: https://issues.apache.org/jira/browse/HIVE-20311
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>Priority: Major
> Attachments: HIVE-20311.patch
>
>
> These were set to false in the original patch for no reason as far as I see.
> I later added notes but not TODOs to switch them over, so they remained as 
> non-txn.





[jira] [Updated] (HIVE-20311) add txn stats checks to some more paths

2018-08-03 Thread Sergey Shelukhin (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-20311:

Attachment: (was: HIVE-20311.patch)

> add txn stats checks to some more paths
> ---
>
> Key: HIVE-20311
> URL: https://issues.apache.org/jira/browse/HIVE-20311
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>Priority: Major
>
> These were set to false in the original patch for no reason as far as I see.
> I later added notes but not TODOs to switch them over, so they remained as 
> non-txn.





[jira] [Updated] (HIVE-20311) add txn stats checks to some more paths

2018-08-03 Thread Sergey Shelukhin (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-20311:

Attachment: HIVE-20311.patch

> add txn stats checks to some more paths
> ---
>
> Key: HIVE-20311
> URL: https://issues.apache.org/jira/browse/HIVE-20311
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>Priority: Major
> Attachments: HIVE-20311.patch
>
>
> These were set to false in the original patch for no reason as far as I see.
> I later added notes but not TODOs to switch them over, so they remained as 
> non-txn.





[jira] [Work started] (HIVE-20312) Allow arrow clients to use their own BufferAllocator with LlapOutputFormatService

2018-08-03 Thread Eric Wohlstadter (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-20312 started by Eric Wohlstadter.
---
> Allow arrow clients to use their own BufferAllocator with 
> LlapOutputFormatService
> -
>
> Key: HIVE-20312
> URL: https://issues.apache.org/jira/browse/HIVE-20312
> Project: Hive
>  Issue Type: Improvement
>Reporter: Eric Wohlstadter
>Assignee: Eric Wohlstadter
>Priority: Major
>
> Clients should be able to provide their own BufferAllocator to 
> LlapBaseInputFormat if allocator operations depend on client-side logic. For 
> example, clients may want to manage the allocator hierarchy per client-side 
> task, thread, etc.
> Currently the client is forced to use one global RootAllocator per process.





[jira] [Assigned] (HIVE-20312) Allow arrow clients to use their own BufferAllocator with LlapOutputFormatService

2018-08-03 Thread Eric Wohlstadter (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Wohlstadter reassigned HIVE-20312:
---


> Allow arrow clients to use their own BufferAllocator with 
> LlapOutputFormatService
> -
>
> Key: HIVE-20312
> URL: https://issues.apache.org/jira/browse/HIVE-20312
> Project: Hive
>  Issue Type: Improvement
>Reporter: Eric Wohlstadter
>Assignee: Eric Wohlstadter
>Priority: Major
>
> Clients should be able to provide their own BufferAllocator to 
> LlapBaseInputFormat if allocator operations depend on client-side logic. For 
> example, clients may want to manage the allocator hierarchy per client-side 
> task, thread, etc.
> Currently the client is forced to use one global RootAllocator per process.





[jira] [Commented] (HIVE-20302) LLAP: non-vectorized execution in IO ignores virtual columns, including ROW__ID

2018-08-03 Thread Jesus Camacho Rodriguez (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16568859#comment-16568859
 ] 

Jesus Camacho Rodriguez commented on HIVE-20302:


Uploaded a new patch fixing those issues. [~sershe], indeed ROW__ID virtual 
column is a struct comprising the three fields.
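
For readers unfamiliar with the struct mentioned in this comment, here is a simplified model of a three-field row identifier. It assumes the fields are a write ID, a bucket ID, and a row ID, compared in that order. The class name, field names, and ordering are assumptions made for illustration and are not copied from Hive's source.

```java
// Simplified, hypothetical model of a three-field row identifier; not Hive's class.
final class RowIdSketch implements Comparable<RowIdSketch> {
    final long writeId;  // assumed: ID of the write that created the row
    final int bucketId;  // assumed: bucket the row was written to
    final long rowId;    // assumed: position of the row within that write and bucket

    RowIdSketch(long writeId, int bucketId, long rowId) {
        this.writeId = writeId;
        this.bucketId = bucketId;
        this.rowId = rowId;
    }

    // Orders by writeId, then bucketId, then rowId (an assumed ordering).
    @Override
    public int compareTo(RowIdSketch other) {
        int c = Long.compare(writeId, other.writeId);
        if (c != 0) {
            return c;
        }
        c = Integer.compare(bucketId, other.bucketId);
        if (c != 0) {
            return c;
        }
        return Long.compare(rowId, other.rowId);
    }
}
```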

> LLAP: non-vectorized execution in IO ignores virtual columns, including 
> ROW__ID
> ---
>
> Key: HIVE-20302
> URL: https://issues.apache.org/jira/browse/HIVE-20302
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
>Priority: Major
> Attachments: HIVE-20302.01.patch, HIVE-20302.patch
>
>






[jira] [Assigned] (HIVE-20311) add txn stats checks to some more paths

2018-08-03 Thread Sergey Shelukhin (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin reassigned HIVE-20311:
---


> add txn stats checks to some more paths
> ---
>
> Key: HIVE-20311
> URL: https://issues.apache.org/jira/browse/HIVE-20311
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>Priority: Major
>
> These were set to false in the original patch for no reason as far as I see.
> I later added notes but not TODOs to switch them over, so they remained as 
> non-txn.





[jira] [Updated] (HIVE-20302) LLAP: non-vectorized execution in IO ignores virtual columns, including ROW__ID

2018-08-03 Thread Jesus Camacho Rodriguez (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-20302:
---
Attachment: HIVE-20302.01.patch

> LLAP: non-vectorized execution in IO ignores virtual columns, including 
> ROW__ID
> ---
>
> Key: HIVE-20302
> URL: https://issues.apache.org/jira/browse/HIVE-20302
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
>Priority: Major
> Attachments: HIVE-20302.01.patch, HIVE-20302.patch
>
>






[jira] [Updated] (HIVE-20310) Scalar subquery throws error when hive.optimize.remove.sq_count_check is on

2018-08-03 Thread Vineet Garg (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-20310:
---
Description: 
*Reproducer*
{code:sql}
> set hive.optimize.remove.sq_count_check=true;
> create table tempty(i int);
> create table t(c0 int);
> explain select * from t where c0 > (select count(*) from tempty group by 1);
{code}

  was:
*Reproducer*
{code:sql}
> create table tempty(i int);
> create table t(c0 int);
> explain select * from t where c0 > (select count(*) from tempty group by 1);
{code}


> Scalar subquery throws error when hive.optimize.remove.sq_count_check is on
> ---
>
> Key: HIVE-20310
> URL: https://issues.apache.org/jira/browse/HIVE-20310
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning
>Reporter: Vineet Garg
>Assignee: Vineet Garg
>Priority: Major
>
> *Reproducer*
> {code:sql}
> > set hive.optimize.remove.sq_count_check=true;
> > create table tempty(i int);
> > create table t(c0 int);
> > explain select * from t where c0 > (select count(*) from tempty group by 1);
> {code}





[jira] [Assigned] (HIVE-20310) Scalar subquery throws error when hive.optimize.remove.sq_count_check is on

2018-08-03 Thread Vineet Garg (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg reassigned HIVE-20310:
--


> Scalar subquery throws error when hive.optimize.remove.sq_count_check is on
> ---
>
> Key: HIVE-20310
> URL: https://issues.apache.org/jira/browse/HIVE-20310
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning
>Reporter: Vineet Garg
>Assignee: Vineet Garg
>Priority: Major
>
> *Reproducer*
> {code:sql}
> > create table tempty(i int);
> > create table t(c0 int);
> > explain select * from t where c0 > (select count(*) from tempty group by 1);
> {code}





[jira] [Commented] (HIVE-20291) Allow HiveStreamingConnection to receive a WriteId

2018-08-03 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16568845#comment-16568845
 ] 

Hive QA commented on HIVE-20291:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12934294/HIVE-20291.2.patch

{color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified.

{color:green}SUCCESS:{color} +1 due to 14860 tests passed

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-Build/13030/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/13030/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-13030/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12934294 - PreCommit-HIVE-Build

> Allow HiveStreamingConnection to receive a WriteId
> --
>
> Key: HIVE-20291
> URL: https://issues.apache.org/jira/browse/HIVE-20291
> Project: Hive
>  Issue Type: Improvement
>Reporter: Jaume M
>Assignee: Jaume M
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-20291.1.patch, HIVE-20291.2.patch
>
>
> If the writeId is received externally it won't need to open connections to 
> the metastore. It also won't be able to commit in this case, so the commit must 
> be done by the entity passing the writeId.





[jira] [Commented] (HIVE-20291) Allow HiveStreamingConnection to receive a WriteId

2018-08-03 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16568816#comment-16568816
 ] 

Hive QA commented on HIVE-20291:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
56s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  8m 
37s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
40s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
58s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  4m 
33s{color} | {color:blue} ql in master has 2301 extant Findbugs warnings. 
{color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m 
29s{color} | {color:blue} streaming in master has 2 extant Findbugs warnings. 
{color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
20s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
11s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
53s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
36s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m 
36s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  0m 
20s{color} | {color:red} streaming: The patch generated 4 new + 647 unchanged - 
86 fixed = 651 total (was 733) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  5m 
22s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
10s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
14s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 30m 48s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests |  asflicense  javac  javadoc  findbugs  checkstyle  compile  |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 
3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/data/hiveptest/working/yetus_PreCommit-HIVE-Build-13030/dev-support/hive-personality.sh
 |
| git revision | master / a3cd496 |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.0 |
| checkstyle | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-13030/yetus/diff-checkstyle-streaming.txt
 |
| modules | C: ql streaming U: . |
| Console output | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-13030/yetus.txt |
| Powered by | Apache Yetus http://yetus.apache.org |


This message was automatically generated.



> Allow HiveStreamingConnection to receive a WriteId
> --
>
> Key: HIVE-20291
> URL: https://issues.apache.org/jira/browse/HIVE-20291
> Project: Hive
>  Issue Type: Improvement
>Reporter: Jaume M
>Assignee: Jaume M
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-20291.1.patch, HIVE-20291.2.patch
>
>
> If the writeId is received externally it won't need to open connections to 
> the metastore. It also won't be able to commit in this case, so the commit must 
> be done by the entity passing the writeId.





[jira] [Updated] (HIVE-20301) Enable vectorization for materialized view rewriting tests

2018-08-03 Thread Jesus Camacho Rodriguez (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-20301:
---
Attachment: HIVE-20301.patch

> Enable vectorization for materialized view rewriting tests
> --
>
> Key: HIVE-20301
> URL: https://issues.apache.org/jira/browse/HIVE-20301
> Project: Hive
>  Issue Type: Test
>  Components: Materialized views, Test
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
>Priority: Major
> Attachments: HIVE-20301.patch, HIVE-20301.patch, HIVE-20301.patch
>
>






[jira] [Commented] (HIVE-20301) Enable vectorization for materialized view rewriting tests

2018-08-03 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16568747#comment-16568747
 ] 

Hive QA commented on HIVE-20301:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12934289/HIVE-20301.patch

{color:green}SUCCESS:{color} +1 due to 16 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 14854 tests 
executed
*Failed tests:*
{noformat}
TestMiniDruidCliDriver - did not produce a TEST-*.xml file (likely timed out) 
(batchId=192)

[druidmini_dynamic_partition.q,druidmini_test_ts.q,druidmini_expressions.q,druidmini_test_alter.q,druidmini_test_insert.q]
org.apache.hive.minikdc.TestJdbcWithMiniKdc.testTokenAuth (batchId=264)
{noformat}

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-Build/13029/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/13029/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-13029/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12934289 - PreCommit-HIVE-Build

> Enable vectorization for materialized view rewriting tests
> --
>
> Key: HIVE-20301
> URL: https://issues.apache.org/jira/browse/HIVE-20301
> Project: Hive
>  Issue Type: Test
>  Components: Materialized views, Test
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
>Priority: Major
> Attachments: HIVE-20301.patch, HIVE-20301.patch
>
>






[jira] [Updated] (HIVE-20277) Vectorization: Case expressions that return BOOLEAN are not supported for FILTER

2018-08-03 Thread Matt McCline (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-20277:

Attachment: HIVE-20277.02.patch

> Vectorization: Case expressions that return BOOLEAN are not supported for 
> FILTER
> 
>
> Key: HIVE-20277
> URL: https://issues.apache.org/jira/browse/HIVE-20277
> Project: Hive
>  Issue Type: Bug
>  Components: Vectorization
>Affects Versions: 4.0.0
>Reporter: Gopal V
>Assignee: Matt McCline
>Priority: Major
> Attachments: HIVE-20277.02.patch, HIVE-20277.WIP.01.patch
>
>
> In cases like Query89, the vertex with the filter is not vectorized.
> {code}
>Filter Operator
>   predicate: CASE WHEN ((avg_window_0 <> 0.0D)) THEN 
> (((abs((_col6 - avg_window_0)) / avg_window_0) > 0.1D)) ELSE (null) END 
> (type: boolean)
> {code}
> {code}
> Reducer 3 
> Execution mode: llap
> Reduce Vectorization:
> enabled: true
> enableConditionsMet: hive.vectorized.execution.reduce.enabled 
> IS true, hive.execution.engine tez IN [tez, spark] IS true
> notVectorizedReason: FILTER operator: Unexpected hive type 
> name void
> vectorized: false
> {code}
> The query specifically has 
> {code}
> where case when (avg_monthly_sales <> 0) then (abs(sum_sales - 
> avg_monthly_sales) / avg_monthly_sales) else null end > 0.1
> {code}
> while rewriting it to 
> {code}
> where case when (avg_monthly_sales <> 0) then (abs(sum_sales - 
> avg_monthly_sales) / avg_monthly_sales) > 0.1 else false end
> {code}
> does vectorize into 
> {code}
> Filter Operator
>   Filter Vectorization:
>   className: VectorFilterOperator
>   native: true
>   predicateExpression: SelectColumnIsTrue(col 
> 12:boolean)(children: VectorUDFAdaptor(CASE WHEN ((avg_window_0 <> 0.0D)) 
> THEN (((abs((_col6 - avg_window_0)) / avg_window_0) > 0.1D)) ELSE (false) 
> END)(children: DoubleColNotEqualDoubleScalar(col 7:double, val 0.0) -> 
> 8:boolean, DoubleColGreaterDoubleScalar(col 9:double, val 0.1)(children: 
> DoubleColDivideDoubleColumn(col 10:double, col 7:double)(children: 
> FuncAbsDoubleToDouble(col 9:double)(children: 
> DoubleColSubtractDoubleColumn(col 6:double, col 7:double) -> 9:double) -> 
> 10:double) -> 9:double) -> 11:boolean) -> 12:boolean)
>   predicate: CASE WHEN ((avg_window_0 <> 0.0D)) THEN 
> (((abs((_col6 - avg_window_0)) / avg_window_0) > 0.1D)) ELSE (false) END 
> (type: boolean)
>   Statistics: Num rows: 11 Data size: 5291 Basic stats: 
> COMPLETE Column stats: COMPLETE
> {code}
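
The rewrite quoted above relies on a filter keeping a row only when its predicate evaluates to TRUE, so a NULL result and a FALSE result both drop the row. A minimal Java sketch of that equivalence follows; it is not Hive code, the class and method names are hypothetical, NULL is modeled as a null {{Boolean}}, and the inputs are assumed to be non-null doubles.

```java
public class CaseFilterSketch {
    // Original form: case when (avg <> 0) then abs(sum - avg) / avg else null end > 0.1
    // The CASE yields a nullable double; comparing NULL with 0.1 yields NULL.
    static Boolean originalPredicate(double sumSales, double avgMonthlySales) {
        Double ratio = (avgMonthlySales != 0.0)
            ? Double.valueOf(Math.abs(sumSales - avgMonthlySales) / avgMonthlySales)
            : null;
        return (ratio == null) ? null : Boolean.valueOf(ratio > 0.1);
    }

    // Rewritten form: case when (avg <> 0) then abs(sum - avg) / avg > 0.1 else false end
    // The CASE itself is boolean-typed and never yields NULL for non-null inputs.
    static Boolean rewrittenPredicate(double sumSales, double avgMonthlySales) {
        return (avgMonthlySales != 0.0)
            ? Boolean.valueOf(Math.abs(sumSales - avgMonthlySales) / avgMonthlySales > 0.1)
            : Boolean.FALSE;
    }

    // A filter keeps a row only when the predicate is TRUE; NULL and FALSE both drop it.
    static boolean keepsRow(Boolean predicate) {
        return Boolean.TRUE.equals(predicate);
    }

    public static void main(String[] args) {
        double[][] rows = { {120.0, 100.0}, {105.0, 100.0}, {5.0, 0.0} };
        for (double[] row : rows) {
            System.out.println("sum_sales=" + row[0] + " avg_monthly_sales=" + row[1]
                + " original=" + keepsRow(originalPredicate(row[0], row[1]))
                + " rewritten=" + keepsRow(rewrittenPredicate(row[0], row[1])));
        }
    }
}
```

The two forms differ only in the value produced when avg_monthly_sales is 0 (NULL versus FALSE), which a filter treats the same way.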





[jira] [Updated] (HIVE-20277) Vectorization: Case expressions that return BOOLEAN are not supported for FILTER

2018-08-03 Thread Matt McCline (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-20277:

Status: Patch Available  (was: In Progress)

> Vectorization: Case expressions that return BOOLEAN are not supported for 
> FILTER
> 
>
> Key: HIVE-20277
> URL: https://issues.apache.org/jira/browse/HIVE-20277
> Project: Hive
>  Issue Type: Bug
>  Components: Vectorization
>Affects Versions: 4.0.0
>Reporter: Gopal V
>Assignee: Matt McCline
>Priority: Major
> Attachments: HIVE-20277.02.patch, HIVE-20277.WIP.01.patch
>
>





[jira] [Updated] (HIVE-20277) Vectorization: Case expressions that return BOOLEAN are not supported for FILTER

2018-08-03 Thread Matt McCline (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-20277:

Status: In Progress  (was: Patch Available)

> Vectorization: Case expressions that return BOOLEAN are not supported for 
> FILTER
> 
>
> Key: HIVE-20277
> URL: https://issues.apache.org/jira/browse/HIVE-20277
> Project: Hive
>  Issue Type: Bug
>  Components: Vectorization
>Affects Versions: 4.0.0
>Reporter: Gopal V
>Assignee: Matt McCline
>Priority: Major
> Attachments: HIVE-20277.WIP.01.patch
>
>





[jira] [Comment Edited] (HIVE-19985) ACID: Skip decoding the ROW__ID sections for read-only queries

2018-08-03 Thread Gopal V (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-19985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16568707#comment-16568707
 ] 

Gopal V edited comment on HIVE-19985 at 8/3/18 8:14 PM:


+1 (orc pending release)

Tested with/without flag=true, with {{select count(1), sum(ss_net_profit) from 
store_sales;}}.

with:  31.383 seconds (cold run), 15.278 seconds (hot run)
without: 35.525 seconds (cold run),  22.934 seconds (hot run)

Latest patch has a big impact on the cached runs.
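
For reference, the relative improvement implied by the timings above can be computed as follows. This is only an arithmetic sketch over the four numbers quoted in this comment; the class and method names are made up for illustration.

```java
public class RowIdSkipTimings {
    // Percentage of wall-clock time saved when the flag is enabled,
    // relative to the run without the flag.
    static double percentFaster(double withoutFlagSeconds, double withFlagSeconds) {
        return (withoutFlagSeconds - withFlagSeconds) / withoutFlagSeconds * 100.0;
    }

    public static void main(String[] args) {
        // Timings copied from the comment: without vs. with the flag.
        double cold = percentFaster(35.525, 31.383);
        double hot = percentFaster(22.934, 15.278);
        System.out.printf("cold run: %.1f%% faster%n", cold);
        System.out.printf("hot run:  %.1f%% faster%n", hot);
    }
}
```

The hot (cached) run improves by roughly a third, while the cold run improves by a little over a tenth, which is consistent with the remark about cached runs.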

I can see there's an L1 cache-miss hotspot in the double System.arraycopy to make 
AcidWrapper and then to copy it back in copyBase().

{code}
if (isAcidScan) {
+int acidColCount = acidReader.includeAcidColumns() ? 
OrcInputFormat.getRootColumn(false) - 1 : 0;
...
+  int ixInVrb = includes.getPhysicalColumnIds().get(ixInReadSet) -
+  (acidReader.includeAcidColumns() ? 0 : OrcRecordUpdater.ROW);
{code}

Can that be changed to if(isAcidScan && innerReader.includeAcidColumns()) to 
skip that entirely, because the offsets fall back the same way to the non-acid 
impl?


was (Author: gopalv):
Tested with/without flag=true, with {{select count(1), sum(ss_net_profit) from 
store_sales;}}.

with:  31.383 seconds (cold run), 15.278 seconds (hot run)
without: 35.525 seconds (cold run),  22.934 seconds (hot run)

Latest patch has a big impact on the cached runs.

I can see there's an L1 cache-miss hotspot in the double System.arraycopy to make 
AcidWrapper and then to copy it back in copyBase().

{code}
if (isAcidScan) {
+int acidColCount = acidReader.includeAcidColumns() ? 
OrcInputFormat.getRootColumn(false) - 1 : 0;
...
+  int ixInVrb = includes.getPhysicalColumnIds().get(ixInReadSet) -
+  (acidReader.includeAcidColumns() ? 0 : OrcRecordUpdater.ROW);
{code}

Can that be changed to if(isAcidScan && innerReader.includeAcidColumns()) to 
skip that entirely, because the offsets fall back the same way to the non-acid 
impl?

> ACID: Skip decoding the ROW__ID sections for read-only queries 
> ---
>
> Key: HIVE-19985
> URL: https://issues.apache.org/jira/browse/HIVE-19985
> Project: Hive
>  Issue Type: Improvement
>  Components: Transactions
>Reporter: Gopal V
>Assignee: Eugene Koifman
>Priority: Major
>  Labels: Branch3Candidate
> Attachments: HIVE-19985.01.patch, HIVE-19985.04.patch
>
>
> For a base_n file there are no aborted transactions within the file, and if 
> there are no pending delete deltas, the entire ACID ROW__ID can be skipped 
> for all read-only queries (i.e., SELECT), though it still needs to be projected 
> out for MERGE, UPDATE and DELETE queries.
> This patch tries to entirely ignore the ACID ROW__ID fields for all tables 
> where there are no possible deletes or aborted transactions for an ACID split.





[jira] [Commented] (HIVE-19985) ACID: Skip decoding the ROW__ID sections for read-only queries

2018-08-03 Thread Gopal V (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-19985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16568707#comment-16568707
 ] 

Gopal V commented on HIVE-19985:


Tested with/without flag=true, with {{select count(1), sum(ss_net_profit) from 
store_sales;}}.

with:  31.383 seconds (cold run), 15.278 seconds (hot run)
without: 35.525 seconds (cold run),  22.934 seconds (hot run)

Latest patch has a big impact on the cached runs.

I can see there's an L1 cache-miss hotspot in the double System.arraycopy to make 
AcidWrapper and then to copy it back in copyBase().

{code}
if (isAcidScan) {
+int acidColCount = acidReader.includeAcidColumns() ? 
OrcInputFormat.getRootColumn(false) - 1 : 0;
...
+  int ixInVrb = includes.getPhysicalColumnIds().get(ixInReadSet) -
+  (acidReader.includeAcidColumns() ? 0 : OrcRecordUpdater.ROW);
{code}

Can that be changed to if(isAcidScan && innerReader.includeAcidColumns()) to 
skip that entirely, because the offsets fall back the same way to the non-acid 
impl?

> ACID: Skip decoding the ROW__ID sections for read-only queries 
> ---
>
> Key: HIVE-19985
> URL: https://issues.apache.org/jira/browse/HIVE-19985
> Project: Hive
>  Issue Type: Improvement
>  Components: Transactions
>Reporter: Gopal V
>Assignee: Eugene Koifman
>Priority: Major
>  Labels: Branch3Candidate
> Attachments: HIVE-19985.01.patch, HIVE-19985.04.patch
>
>





[jira] [Commented] (HIVE-20301) Enable vectorization for materialized view rewriting tests

2018-08-03 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16568684#comment-16568684
 ] 

Hive QA commented on HIVE-20301:


| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
47s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}  2m  5s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests |  asflicense  |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 
3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/data/hiveptest/working/yetus_PreCommit-HIVE-Build-13029/dev-support/hive-personality.sh
 |
| git revision | master / a3cd496 |
| modules | C: ql U: ql |
| Console output | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-13029/yetus.txt |
| Powered by | Apache Yetus http://yetus.apache.org |


This message was automatically generated.



> Enable vectorization for materialized view rewriting tests
> --
>
> Key: HIVE-20301
> URL: https://issues.apache.org/jira/browse/HIVE-20301
> Project: Hive
>  Issue Type: Test
>  Components: Materialized views, Test
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
>Priority: Major
> Attachments: HIVE-20301.patch, HIVE-20301.patch
>
>






[jira] [Commented] (HIVE-20304) When hive.optimize.skewjoin and hive.auto.convert.join are both set to true, and the execution engine is mr, same stage may launch twice due to the wrong generated plan

2018-08-03 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16568671#comment-16568671
 ] 

Hive QA commented on HIVE-20304:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12934290/HIVE-20304.1.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 14860 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.metastore.TestRetryingHMSHandler.testRetryingHMSHandler 
(batchId=226)
{noformat}

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-Build/13028/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/13028/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-13028/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12934290 - PreCommit-HIVE-Build

> When hive.optimize.skewjoin and hive.auto.convert.join are both set to true, 
> and the execution engine is mr, same stage may launch twice due to the wrong 
> generated plan
> 
>
> Key: HIVE-20304
> URL: https://issues.apache.org/jira/browse/HIVE-20304
> Project: Hive
>  Issue Type: Bug
>  Components: CLI
>Affects Versions: 1.2.1, 2.3.3
>Reporter: Hui Huang
>Assignee: Hui Huang
>Priority: Major
> Fix For: 1.2.1, 2.3.3
>
> Attachments: HIVE-20304.1.patch, HIVE-20304.patch
>
>
> When hive.optimize.skewjoin and hive.auto.convert.join are both set to true 
> and the execution engine is set to mr, the same stage of a query may launch 
> twice because the generated plan is wrong. If hive.exec.parallel is also true, 
> both launches of the stage run at the same time, and the job fails because the 
> first launch to complete clears the map.xml/reduce.xml files stored in HDFS.
> Use the following SQL to reproduce the issue:
> {code:sql}
> CREATE TABLE `tbl1`(
>   `fence` string);
> CREATE TABLE `tbl2`(
>   `order_id` string,
>   `phone` string,
>   `search_id` string
> )
> PARTITIONED BY (
>   `dt` string);
> CREATE TABLE `tbl3`(
>   `order_id` string,
>   `platform` string)
> PARTITIONED BY (
>   `dt` string);
> CREATE TABLE `tbl4`(
>   `groupname` string,
>   `phone` string)
> PARTITIONED BY (
>   `dt` string);
> CREATE TABLE `tbl5`(
>   `search_id` string,
>   `fence` string)
> PARTITIONED BY (
>   `dt` string);
> SET hive.exec.parallel = TRUE;
> SET hive.auto.convert.join = TRUE;
> SET hive.optimize.skewjoin = TRUE;
> SELECT dt,
>        platform,
>        groupname,
>        count(1) as cnt
> FROM
>   (SELECT dt,
>           platform,
>           groupname
>    FROM
>      (SELECT fence
>       FROM tbl1)ta
>    JOIN
>      (SELECT a0.dt,
>              a1.platform,
>              a2.groupname,
>              a3.fence
>       FROM
>         (SELECT dt,
>                 order_id,
>                 phone,
>                 search_id
>          FROM tbl2
>          WHERE dt =20180703 )a0
>       JOIN
>         (SELECT order_id,
>                 platform,
>                 dt
>          FROM tbl3
>          WHERE dt =20180703 )a1 ON a0.order_id = a1.order_id
>       INNER JOIN
>         (SELECT groupname,
>                 phone,
>                 dt
>          FROM tbl4
>          WHERE dt =20180703 )a2 ON a0.phone = a2.phone
>       LEFT JOIN
>         (SELECT search_id,
>                 fence,
>                 dt
>          FROM tbl5
>          WHERE dt =20180703)a3 ON a0.search_id = a3.search_id)t0
>      ON ta.fence = t0.fence)t11
> GROUP BY dt,
>          platform,
>          groupname;
> DROP TABLE tbl1;
> DROP TABLE tbl2;
> DROP TABLE tbl3;
> DROP TABLE tbl4;
> DROP TABLE tbl5;
> {code}
> This produces an error message like the following:
> Examining task ID: task_1531284442065_3637_m_00 (and more) from job 
> job_1531284442065_3637
> Task with the most failures(4):
> -
> Task ID:
>   task_1531284442065_3637_m_00
> URL:
>   
> http://0.0.0.0:8088/taskdetails.jsp?jobid=job_1531284442065_3637&tipid=task_1531284442065_3637_m_00
> -
> D

[jira] [Commented] (HIVE-20304) When hive.optimize.skewjoin and hive.auto.convert.join are both set to true, and the execution engine is mr, same stage may launch twice due to the wrong generated plan

2018-08-03 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16568618#comment-16568618
 ] 

Hive QA commented on HIVE-20304:


| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  9m 
20s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
16s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
42s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  4m 
24s{color} | {color:blue} ql in master has 2301 extant Findbugs warnings. 
{color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
5s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
38s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
10s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m 
10s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
38s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m 
42s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
4s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
15s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 26m 45s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests |  asflicense  javac  javadoc  findbugs  checkstyle  compile  |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 
3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/data/hiveptest/working/yetus_PreCommit-HIVE-Build-13028/dev-support/hive-personality.sh
 |
| git revision | master / a3cd496 |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.0 |
| modules | C: ql U: ql |
| Console output | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-13028/yetus.txt |
| Powered by | Apache Yetus http://yetus.apache.org |


This message was automatically generated.



> When hive.optimize.skewjoin and hive.auto.convert.join are both set to true, 
> and the execution engine is mr, same stage may launch twice due to the wrong 
> generated plan
> 
>
> Key: HIVE-20304
> URL: https://issues.apache.org/jira/browse/HIVE-20304
> Project: Hive
>  Issue Type: Bug
>  Components: CLI
>Affects Versions: 1.2.1, 2.3.3
>Reporter: Hui Huang
>Assignee: Hui Huang
>Priority: Major
> Fix For: 1.2.1, 2.3.3
>
> Attachments: HIVE-20304.1.patch, HIVE-20304.patch
>
>
> When hive.optimize.skewjoin and hive.auto.convert.join are both set to true 
> and the execution engine is set to mr, the same stage of a query may launch 
> twice because the generated plan is wrong. If hive.exec.parallel is also true, 
> both launches of the stage run at the same time, and the job fails because the 
> first launch to complete clears the map.xml/reduce.xml files stored in HDFS.
> Use the following SQL to reproduce the issue:
> {code:sql}
> CREATE TABLE `tbl1`(
>   `fence` string);
> CREATE TABLE `tbl2`(
>   `order_id` string,
>   `phone` string,
>   `search_id` string
> )
> PARTITIONED BY (
>   `dt` string);
> CREATE TABLE `tbl3`(
>   `order_id` string,
>   `platform` string)
> PARTITIONED BY (
>   `dt` string);
> CREATE TABLE `tbl4`(
>   `groupname` string,
>   `phone` string)
> PARTITIONED BY (
>   `dt` string);
> CREATE TABLE `tbl5`(
>   `search_id` string,
>   `fence` string)
> PARTITIONED BY (
>   `dt` string)

[jira] [Assigned] (HIVE-20308) Add support for pagination

2018-08-03 Thread Vihang Karajgaonkar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vihang Karajgaonkar reassigned HIVE-20308:
--


> Add support for pagination
> --
>
> Key: HIVE-20308
> URL: https://issues.apache.org/jira/browse/HIVE-20308
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
>Priority: Major
>






[jira] [Assigned] (HIVE-20307) Add support for filterspec

2018-08-03 Thread Vihang Karajgaonkar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vihang Karajgaonkar reassigned HIVE-20307:
--

Assignee: Vihang Karajgaonkar

> Add support for filterspec
> --
>
> Key: HIVE-20307
> URL: https://issues.apache.org/jira/browse/HIVE-20307
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
>Priority: Major
>






[jira] [Assigned] (HIVE-20306) Add the thrift API to get partially filled Partition

2018-08-03 Thread Vihang Karajgaonkar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vihang Karajgaonkar reassigned HIVE-20306:
--


> Add the thrift API to get partially filled Partition
> 
>
> Key: HIVE-20306
> URL: https://issues.apache.org/jira/browse/HIVE-20306
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
>Priority: Major
>






[jira] [Commented] (HIVE-19715) Consolidated and flexible API for fetching partition metadata from HMS

2018-08-03 Thread Vihang Karajgaonkar (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-19715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16568590#comment-16568590
 ] 

Vihang Karajgaonkar commented on HIVE-19715:


When I started working on this I realized a few things that could change the 
design. By default, Thrift field requiredness is "default requiredness" 
[https://thrift.apache.org/docs/idl#field-requiredness], which is like a hybrid 
of {{optional}} and {{required}}. So in the write path Thrift attempts to write 
such fields whenever possible (null fields cannot be written, IIUC). On the 
read side, the reader always checks whether the field is set. This is exactly 
the behavior we want and, fortunately, the fields in the Partition Thrift 
definition have either default or optional requiredness, which works well for 
partially filled partitions. So in theory I could just return a List for this 
API, but I think using PartitionSpec still makes a lot of sense since it groups 
the partitions according to the {{table location, fieldSchema, deserializer 
class}}. I think that in the case of non-standard partition locations there is 
no harm in grouping them together, especially when there are a lot of such 
non-standard partitions.

I am planning to use {{PropertyUtils}} from the {{commons-beanutils}} package, 
which is already on the metastore classpath via the {{apache-commons}} 
dependency. It provides the {{setNestedProperty}} method, which can be used to 
set the fields. All the fields defined in Thrift have setter methods, so this 
should not cause any problems.

For setting the projected fields, in the JDO case we cannot set multi-valued 
fields in the {{setResult}} clause, which is a JDO limitation. In such a case 
the JDO version of the API will fall back to retrieving the full partitions. 
The directSQL version of the API, however, should be able to parse and set 
multi-valued fields as it does currently. I am currently looking at the 
directSQL implementation of setting partition fields and trying to come up with 
a more maintainable way to selectively fire the correct queries based on the 
projection field list, instead of introducing a bunch of if/else or case 
statements in that code. I am thinking of creating a PartitionFieldParser class 
which will generate the right queries for the given list of fields. We will 
have to take care of optimizing the field list as well. It should remove 
redundant fields: e.g., if {{sd}} is present we can safely remove the redundant 
{{sd.location}} or {{sd.serdeInfo.serializationClass}}. Similarly, if all the 
nested fields of {{sd}} are present individually we can combine them to form 
the single field {{sd}}. I am currently treating these as optional improvements 
which I will make later as needed.
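
The redundant-field removal described above could look roughly like the following. This is only a sketch of the idea, not the planned PartitionFieldParser: the class and method names are hypothetical, fields are assumed to be dot-separated paths, and only the first optimization (dropping a nested field when one of its ancestors is already requested) is shown.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class ProjectionFieldPruner {
    // Drops every field that has an ancestor in the requested list,
    // e.g. "sd.location" is dropped when "sd" is also requested.
    // Duplicates are removed and the original request order is preserved.
    static List<String> pruneRedundant(List<String> fields) {
        Set<String> requested = new LinkedHashSet<>(fields);
        List<String> kept = new ArrayList<>();
        for (String field : requested) {
            if (!hasRequestedAncestor(field, requested)) {
                kept.add(field);
            }
        }
        return kept;
    }

    // Walks up the dotted path ("a.b.c" -> "a.b" -> "a") looking for a requested ancestor.
    static boolean hasRequestedAncestor(String field, Set<String> requested) {
        int dot = field.lastIndexOf('.');
        while (dot > 0) {
            if (requested.contains(field.substring(0, dot))) {
                return true;
            }
            dot = field.lastIndexOf('.', dot - 1);
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> projection = List.of(
            "sd.location", "sd", "values", "sd.serdeInfo.serializationClass");
        System.out.println(pruneRedundant(projection));
    }
}
```

Combining a complete set of nested fields back into their parent would additionally need the list of all nested fields per parent, which is left out here.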

I plan to divide the work into sub-tasks, since each one of these could be a 
considerable code change.
 1. Expose thrift API with the support for projected fields
 2. Add support for filters
 3. Add support for pagination

I will update the design doc based on the above modifications once I am close 
to completing sub-task 1, in case there are more puzzles to solve.

> Consolidated and flexible API for fetching partition metadata from HMS
> --
>
> Key: HIVE-19715
> URL: https://issues.apache.org/jira/browse/HIVE-19715
> Project: Hive
>  Issue Type: New Feature
>  Components: Standalone Metastore
>Reporter: Todd Lipcon
>Assignee: Vihang Karajgaonkar
>Priority: Major
> Attachments: HIVE-19715-design-doc.pdf
>
>
> Currently, the HMS thrift API exposes 17 different APIs for fetching 
> partition-related information. There is somewhat of a combinatorial explosion 
> going on, where each API has variants with and without "auth" info, by pspecs 
> vs names, by filters, by exprs, etc. Having all of these separate APIs long 
> term is a maintenance burden and also more confusing for consumers.
> Additionally, even with all of these APIs, there is a lack of granularity in 
> fetching only the information needed for a particular use case. For example, 
> in some use cases it may be beneficial to only fetch the partition locations 
> without wasting effort fetching statistics, etc.
> This JIRA proposes that we add a new "one API to rule them all" for fetching 
> partition info. The request and response would be encapsulated in structs. 
> Some desirable properties:
> - the request should be able to specify which pieces of information are 
> required (eg location, properties, etc)
> - in the case of partition parameters, the request should be able to do 
> either whitelisting or blacklisting (eg to exclude large incremental column 
> stats HLL dumped in there by Impala)
> - the request should optionally specify auth info (to encompass the 
> "with_auth" variants)
> - the request should be able to designate the

[jira] [Commented] (HIVE-19097) related equals and in operators may cause inaccurate stats estimations

2018-08-03 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-19097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16568562#comment-16568562
 ] 

Hive QA commented on HIVE-19097:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12934279/HIVE-19097.12.patch

{color:green}SUCCESS:{color} +1 due to 4 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 14862 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_dynamic_partition_pruning]
 (batchId=186)
{noformat}

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-Build/13027/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/13027/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-13027/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12934279 - PreCommit-HIVE-Build

> related equals and in operators may cause inaccurate stats estimations
> --
>
> Key: HIVE-19097
> URL: https://issues.apache.org/jira/browse/HIVE-19097
> Project: Hive
>  Issue Type: Bug
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
> Attachments: HIVE-19097.01.patch, HIVE-19097.02.patch, 
> HIVE-19097.03.patch, HIVE-19097.04.patch, HIVE-19097.05.patch, 
> HIVE-19097.06.patch, HIVE-19097.06wip01.patch, HIVE-19097.06wip02.patch, 
> HIVE-19097.07.patch, HIVE-19097.08.patch, HIVE-19097.08.patch, 
> HIVE-19097.09.patch, HIVE-19097.10.patch, HIVE-19097.11.patch, 
> HIVE-19097.12.patch, HIVE-19097.partial.patch
>
>
> tpcds#74 is optimized in a way that for date_dim the condition contains IN 
> and = for the same column
> {code:java}
> | Map Operator Tree: |
> | TableScan  |
> |   alias: date_dim  |
> |   filterExpr: (((d_year) IN (2001, 2002) and (d_year = 
> 2002) and d_date_sk is not null) or ((d_year) IN (2001, 2002) and (d_year = 
> 2001) and d_date_sk is not null)) (type: boolean) |
> |   Statistics: Num rows: 73049 Data size: 876588 Basic 
> stats: COMPLETE Column stats: COMPLETE |
> |   Filter Operator  |
> | predicate: ((d_year) IN (2001, 2002) and (d_year = 
> 2002) and d_date_sk is not null) (type: boolean) |
> | Statistics: Num rows: 4 Data size: 48 Basic stats: 
> COMPLETE Column stats: COMPLETE |
> {code}
> the "real" row count will be 365
> for separate {{IN}} and {{=}} the estimation is very good; but if both are 
> present it becomes (very) underestimated.
> {code:java}
> set hive.query.results.cache.enabled=false;
> drop table if exists t1;
> drop table if exists t8;
> create table t1 (a integer,b integer);
> create table t8 like t1;
> insert into t1 values (1,1),(2,2),(3,3),(4,4),(5,5);
> insert into t8
> select * from t1 union all select * from t1 union all select * from t1 union 
> all select * from t1 union all
> select * from t1 union all select * from t1 union all select * from t1 union 
> all select * from t1
> ;
> analyze table t1 compute statistics for columns;
> analyze table t8 compute statistics for columns;
> explain analyze select sum(a) from t8 where b in (2,3) group by b;
> explain analyze select sum(a) from t8 where b=2 group by b;
> explain analyze select sum(a) from t1 where b in (2,3) and b=2 group by b;
> explain analyze select sum(a) from t8 where b in (2,3) and b=2 group by b;
> {code}
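
One plausible way to read the underestimation is that the selectivities of {{IN}} and {{=}} are multiplied as if the two predicates were independent, although they constrain the same column. The sketch below only illustrates that arithmetic for the {{t8}} table from the script (40 rows, 5 distinct values of {{b}}). The class and method names are hypothetical, and the uniform-distribution and independence assumptions are stated for illustration; this is not Hive's statistics code.

```java
// Illustration only: a naive estimator that multiplies per-predicate selectivities
// as if the predicates were independent. This is an assumption made to explain the
// symptom; it is not a copy of Hive's statistics code.
final class NaiveSelectivityDemo {

  private NaiveSelectivityDemo() {}

  // Selectivity of "col IN (v1..vk)" assuming uniformly distributed values.
  static double inSelectivity(int listSize, long ndv) {
    return Math.min(1.0, (double) listSize / ndv);
  }

  // Selectivity of "col = v" under the same assumption.
  static double eqSelectivity(long ndv) {
    return 1.0 / ndv;
  }

  // Estimated row count after applying the selectivities as independent factors.
  static long estimate(long rows, double... selectivities) {
    double result = rows;
    for (double s : selectivities) {
      result *= s;
    }
    return Math.round(result);
  }

  public static void main(String[] args) {
    long rows = 40; // t8 holds eight copies of the five rows in t1
    long ndv = 5;   // column b has five distinct values
    System.out.println("b in (2,3):         " + estimate(rows, inSelectivity(2, ndv)));
    System.out.println("b = 2:              " + estimate(rows, eqSelectivity(ndv)));
    System.out.println("b in (2,3) and b=2: "
        + estimate(rows, inSelectivity(2, ndv), eqSelectivity(ndv)));
  }
}
```

Under these assumptions the separate predicates give 16 and 8 rows, which match the data, while the combined predicate gives 3 even though {{b in (2,3) and b=2}} selects the same 8 rows as {{b=2}} alone.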



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-19937) Intern fields in MapWork on deserialization

2018-08-03 Thread Vihang Karajgaonkar (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-19937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16568535#comment-16568535
 ] 

Vihang Karajgaonkar commented on HIVE-19937:


Hi [~stakiar], sorry for the delay in responding. Can you please add a comment 
mentioning that the set method interns the duplicate strings, just before the 
{{mapWork.setPathToPartitionInfo}} and {{partitionDesc.setBaseFileName}} calls 
in the custom deserializer's read method, so that it's more obvious why we are 
calling the set method? I feel it is easier to understand that way.
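
As a rough illustration of the pattern under discussion (a setter that interns strings, called again after deserialization with a comment explaining why), consider the sketch below. {{InternedPathHolder}} and its methods are hypothetical stand-ins; the real {{MapWork}} and {{PartitionDesc}} setters and the patch's deserializer are not reproduced here.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for a plan object; it is not Hive's MapWork or PartitionDesc.
final class InternedPathHolder {

  private final Map<String, String> pathToBaseFileName = new LinkedHashMap<>();

  // Interns key and value so that equal strings held by different instances
  // in the same JVM share a single String object.
  void put(String path, String baseFileName) {
    pathToBaseFileName.put(path.intern(), baseFileName.intern());
  }

  String baseFileNameFor(String path) {
    return pathToBaseFileName.get(path);
  }

  // Sketch of a deserializer "read" step: the freshly decoded strings are passed
  // through the setter again purely for its interning side effect.
  static InternedPathHolder read(String decodedPath, String decodedBaseFileName) {
    InternedPathHolder holder = new InternedPathHolder();
    // The put method interns its arguments; calling it here de-duplicates
    // strings that deserialization created as fresh copies.
    holder.put(decodedPath, decodedBaseFileName);
    return holder;
  }

  public static void main(String[] args) {
    // new String(...) simulates the distinct copies produced by deserialization.
    InternedPathHolder a = read(new String("/warehouse/t/p=1"), new String("000000_0"));
    InternedPathHolder b = read(new String("/warehouse/t/p=1"), new String("000000_0"));
    System.out.println(a.baseFileNameFor("/warehouse/t/p=1")
        == b.baseFileNameFor("/warehouse/t/p=1"));
  }
}
```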

Rest looks good. Thanks for the patch. +1

> Intern fields in MapWork on deserialization
> ---
>
> Key: HIVE-19937
> URL: https://issues.apache.org/jira/browse/HIVE-19937
> Project: Hive
>  Issue Type: Improvement
>  Components: Spark
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Major
> Attachments: HIVE-19937.1.patch, HIVE-19937.2.patch, 
> HIVE-19937.3.patch, HIVE-19937.4.patch, HIVE-19937.5.patch, 
> post-patch-report.html, report.html
>
>
> When fixing HIVE-16395, we decided that each new Spark task should clone the 
> {{JobConf}} object to prevent any {{ConcurrentModificationException}} from 
> being thrown. However, setting this variable comes at a cost of storing a 
> duplicate {{JobConf}} object for each Spark task. These objects can take up a 
> significant amount of memory; we should intern them so that Spark tasks 
> running in the same JVM don't store duplicate copies.





[jira] [Updated] (HIVE-20277) Vectorization: Case expressions that return BOOLEAN are not supported for FILTER

2018-08-03 Thread Matt McCline (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-20277:

Summary: Vectorization: Case expressions that return BOOLEAN are not 
supported for FILTER  (was: Vectorization: Case expressions that return NULL in 
FILTER)

> Vectorization: Case expressions that return BOOLEAN are not supported for 
> FILTER
> 
>
> Key: HIVE-20277
> URL: https://issues.apache.org/jira/browse/HIVE-20277
> Project: Hive
>  Issue Type: Bug
>  Components: Vectorization
>Affects Versions: 4.0.0
>Reporter: Gopal V
>Assignee: Matt McCline
>Priority: Major
> Attachments: HIVE-20277.WIP.01.patch
>
>
> In cases like Query89, the vertex with the filter is not vectorized.
> {code}
>Filter Operator
>   predicate: CASE WHEN ((avg_window_0 <> 0.0D)) THEN 
> (((abs((_col6 - avg_window_0)) / avg_window_0) > 0.1D)) ELSE (null) END 
> (type: boolean)
> {code}
> {code}
> Reducer 3 
> Execution mode: llap
> Reduce Vectorization:
> enabled: true
> enableConditionsMet: hive.vectorized.execution.reduce.enabled 
> IS true, hive.execution.engine tez IN [tez, spark] IS true
> notVectorizedReason: FILTER operator: Unexpected hive type 
> name void
> vectorized: false
> {code}
> The query specifically has 
> {code}
> where case when (avg_monthly_sales <> 0) then (abs(sum_sales - 
> avg_monthly_sales) / avg_monthly_sales) else null end > 0.1
> {code}
> while rewriting it to 
> {code}
> where case when (avg_monthly_sales <> 0) then (abs(sum_sales - 
> avg_monthly_sales) / avg_monthly_sales) > 0.1 else false end
> {code}
> does vectorize into 
> {code}
> Filter Operator
>   Filter Vectorization:
>   className: VectorFilterOperator
>   native: true
>   predicateExpression: SelectColumnIsTrue(col 
> 12:boolean)(children: VectorUDFAdaptor(CASE WHEN ((avg_window_0 <> 0.0D)) 
> THEN (((abs((_col6 - avg_window_0)) / avg_window_0) > 0.1D)) ELSE (false) 
> END)(children: DoubleColNotEqualDoubleScalar(col 7:double, val 0.0) -> 
> 8:boolean, DoubleColGreaterDoubleScalar(col 9:double, val 0.1)(children: 
> DoubleColDivideDoubleColumn(col 10:double, col 7:double)(children: 
> FuncAbsDoubleToDouble(col 9:double)(children: 
> DoubleColSubtractDoubleColumn(col 6:double, col 7:double) -> 9:double) -> 
> 10:double) -> 9:double) -> 11:boolean) -> 12:boolean)
>   predicate: CASE WHEN ((avg_window_0 <> 0.0D)) THEN 
> (((abs((_col6 - avg_window_0)) / avg_window_0) > 0.1D)) ELSE (false) END 
> (type: boolean)
>   Statistics: Num rows: 11 Data size: 5291 Basic stats: 
> COMPLETE Column stats: COMPLETE
> {code}
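
The two forms of the WHERE clause quoted above select the same rows because a filter keeps a row only when its predicate is TRUE, so a NULL result and a FALSE result both drop the row. The sketch below models this with Java's {{Boolean}}, where {{null}} stands for SQL NULL. The class and method names are hypothetical, and the inputs are assumed non-null for simplicity; it does not model the vectorizer itself.

```java
// Models SQL three-valued logic with Boolean, where null stands for SQL NULL.
final class CaseFilterDemo {

  private CaseFilterDemo() {}

  // Original form:
  //   (CASE WHEN avg <> 0 THEN abs(sum - avg) / avg ELSE NULL END) > 0.1
  static Boolean originalPredicate(double sumSales, double avg) {
    Double ratio = (avg != 0) ? Double.valueOf(Math.abs(sumSales - avg) / avg) : null;
    // Comparing NULL with anything yields NULL.
    return ratio == null ? null : Boolean.valueOf(ratio > 0.1);
  }

  // Rewritten form:
  //   CASE WHEN avg <> 0 THEN abs(sum - avg) / avg > 0.1 ELSE FALSE END
  static Boolean rewrittenPredicate(double sumSales, double avg) {
    return (avg != 0) ? (Math.abs(sumSales - avg) / avg > 0.1) : false;
  }

  // A filter keeps a row only when the predicate is TRUE; NULL and FALSE both drop it.
  static boolean keeps(Boolean predicate) {
    return Boolean.TRUE.equals(predicate);
  }

  public static void main(String[] args) {
    double[][] rows = {{120, 100}, {105, 100}, {5, 0}};
    for (double[] row : rows) {
      Boolean original = originalPredicate(row[0], row[1]);
      Boolean rewritten = rewrittenPredicate(row[0], row[1]);
      System.out.println(original + " / " + rewritten
          + " -> kept: " + keeps(original) + " / " + keeps(rewritten));
    }
  }
}
```

For the row with a zero average the original form evaluates to NULL and the rewritten form to FALSE, yet neither keeps the row, which is why the rewrite is safe inside a filter.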





[jira] [Commented] (HIVE-20277) Vectorization: Case expressions that return NULL in FILTER

2018-08-03 Thread Matt McCline (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16568523#comment-16568523
 ] 

Matt McCline commented on HIVE-20277:
-

Making the JIRA title more general.

> Vectorization: Case expressions that return NULL in FILTER
> --
>
> Key: HIVE-20277
> URL: https://issues.apache.org/jira/browse/HIVE-20277
> Project: Hive
>  Issue Type: Bug
>  Components: Vectorization
>Affects Versions: 4.0.0
>Reporter: Gopal V
>Assignee: Matt McCline
>Priority: Major
> Attachments: HIVE-20277.WIP.01.patch
>
>
> In cases like Query89, the vertex with the filter is not vectorized.
> {code}
>Filter Operator
>   predicate: CASE WHEN ((avg_window_0 <> 0.0D)) THEN 
> (((abs((_col6 - avg_window_0)) / avg_window_0) > 0.1D)) ELSE (null) END 
> (type: boolean)
> {code}
> {code}
> Reducer 3 
> Execution mode: llap
> Reduce Vectorization:
> enabled: true
> enableConditionsMet: hive.vectorized.execution.reduce.enabled 
> IS true, hive.execution.engine tez IN [tez, spark] IS true
> notVectorizedReason: FILTER operator: Unexpected hive type 
> name void
> vectorized: false
> {code}
> The query specifically has 
> {code}
> where case when (avg_monthly_sales <> 0) then (abs(sum_sales - 
> avg_monthly_sales) / avg_monthly_sales) else null end > 0.1
> {code}
> while rewriting it to 
> {code}
> where case when (avg_monthly_sales <> 0) then (abs(sum_sales - 
> avg_monthly_sales) / avg_monthly_sales) > 0.1 else false end
> {code}
> does vectorize into 
> {code}
> Filter Operator
>   Filter Vectorization:
>   className: VectorFilterOperator
>   native: true
>   predicateExpression: SelectColumnIsTrue(col 
> 12:boolean)(children: VectorUDFAdaptor(CASE WHEN ((avg_window_0 <> 0.0D)) 
> THEN (((abs((_col6 - avg_window_0)) / avg_window_0) > 0.1D)) ELSE (false) 
> END)(children: DoubleColNotEqualDoubleScalar(col 7:double, val 0.0) -> 
> 8:boolean, DoubleColGreaterDoubleScalar(col 9:double, val 0.1)(children: 
> DoubleColDivideDoubleColumn(col 10:double, col 7:double)(children: 
> FuncAbsDoubleToDouble(col 9:double)(children: 
> DoubleColSubtractDoubleColumn(col 6:double, col 7:double) -> 9:double) -> 
> 10:double) -> 9:double) -> 11:boolean) -> 12:boolean)
>   predicate: CASE WHEN ((avg_window_0 <> 0.0D)) THEN 
> (((abs((_col6 - avg_window_0)) / avg_window_0) > 0.1D)) ELSE (false) END 
> (type: boolean)
>   Statistics: Num rows: 11 Data size: 5291 Basic stats: 
> COMPLETE Column stats: COMPLETE
> {code}





[jira] [Comment Edited] (HIVE-20277) Vectorization: Case expressions that return NULL in FILTER

2018-08-03 Thread Matt McCline (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16568523#comment-16568523
 ] 

Matt McCline edited comment on HIVE-20277 at 8/3/18 5:38 PM:
-

Making the JIRA title more general (was "Vectorization: Case expressions that 
return NULL in FILTER").


was (Author: mmccline):
Making the JIRA title more general.

> Vectorization: Case expressions that return NULL in FILTER
> --
>
> Key: HIVE-20277
> URL: https://issues.apache.org/jira/browse/HIVE-20277
> Project: Hive
>  Issue Type: Bug
>  Components: Vectorization
>Affects Versions: 4.0.0
>Reporter: Gopal V
>Assignee: Matt McCline
>Priority: Major
> Attachments: HIVE-20277.WIP.01.patch
>
>
> In cases like Query89, the vertex with the filter is not vectorized.
> {code}
>Filter Operator
>   predicate: CASE WHEN ((avg_window_0 <> 0.0D)) THEN 
> (((abs((_col6 - avg_window_0)) / avg_window_0) > 0.1D)) ELSE (null) END 
> (type: boolean)
> {code}
> {code}
> Reducer 3 
> Execution mode: llap
> Reduce Vectorization:
> enabled: true
> enableConditionsMet: hive.vectorized.execution.reduce.enabled 
> IS true, hive.execution.engine tez IN [tez, spark] IS true
> notVectorizedReason: FILTER operator: Unexpected hive type 
> name void
> vectorized: false
> {code}
> The query specifically has 
> {code}
> where case when (avg_monthly_sales <> 0) then (abs(sum_sales - 
> avg_monthly_sales) / avg_monthly_sales) else null end > 0.1
> {code}
> while rewriting it to 
> {code}
> where case when (avg_monthly_sales <> 0) then (abs(sum_sales - 
> avg_monthly_sales) / avg_monthly_sales) > 0.1 else false end
> {code}
> does vectorize into 
> {code}
> Filter Operator
>   Filter Vectorization:
>   className: VectorFilterOperator
>   native: true
>   predicateExpression: SelectColumnIsTrue(col 
> 12:boolean)(children: VectorUDFAdaptor(CASE WHEN ((avg_window_0 <> 0.0D)) 
> THEN (((abs((_col6 - avg_window_0)) / avg_window_0) > 0.1D)) ELSE (false) 
> END)(children: DoubleColNotEqualDoubleScalar(col 7:double, val 0.0) -> 
> 8:boolean, DoubleColGreaterDoubleScalar(col 9:double, val 0.1)(children: 
> DoubleColDivideDoubleColumn(col 10:double, col 7:double)(children: 
> FuncAbsDoubleToDouble(col 9:double)(children: 
> DoubleColSubtractDoubleColumn(col 6:double, col 7:double) -> 9:double) -> 
> 10:double) -> 9:double) -> 11:boolean) -> 12:boolean)
>   predicate: CASE WHEN ((avg_window_0 <> 0.0D)) THEN 
> (((abs((_col6 - avg_window_0)) / avg_window_0) > 0.1D)) ELSE (false) END 
> (type: boolean)
>   Statistics: Num rows: 11 Data size: 5291 Basic stats: 
> COMPLETE Column stats: COMPLETE
> {code}





[jira] [Commented] (HIVE-19097) related equals and in operators may cause inaccurate stats estimations

2018-08-03 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-19097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16568520#comment-16568520
 ] 

Hive QA commented on HIVE-19097:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
1s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
44s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  8m 
21s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
29s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
 5s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m 
35s{color} | {color:blue} common in master has 64 extant Findbugs warnings. 
{color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  4m 
21s{color} | {color:blue} ql in master has 2301 extant Findbugs warnings. 
{color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
16s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m  
9s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
41s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
31s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m 
31s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  0m 
47s{color} | {color:red} ql: The patch generated 13 new + 637 unchanged - 19 
fixed = 650 total (was 656) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  4m 
41s{color} | {color:red} ql generated 4 new + 2297 unchanged - 4 fixed = 2301 
total (was 2301) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
18s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
14s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 30m 27s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| FindBugs | module:ql |
|  |  
org.apache.hadoop.hive.ql.parse.TypeCheckProcFactory$DefaultExprProcessor.interpretNodeAs(PrimitiveTypeInfo,
 ExprNodeDesc) invokes inefficient new Byte(String) constructor; use 
Byte.valueOf(String) instead  At TypeCheckProcFactory.java:Byte(String) 
constructor; use Byte.valueOf(String) instead  At 
TypeCheckProcFactory.java:[line 1259] |
|  |  
org.apache.hadoop.hive.ql.parse.TypeCheckProcFactory$DefaultExprProcessor.interpretNodeAs(PrimitiveTypeInfo,
 ExprNodeDesc) invokes inefficient new Integer(String) constructor; use 
Integer.valueOf(String) instead  At TypeCheckProcFactory.java:Integer(String) 
constructor; use Integer.valueOf(String) instead  At 
TypeCheckProcFactory.java:[line 1251] |
|  |  
org.apache.hadoop.hive.ql.parse.TypeCheckProcFactory$DefaultExprProcessor.interpretNodeAs(PrimitiveTypeInfo,
 ExprNodeDesc) invokes inefficient new Long(String) constructor; use 
Long.valueOf(String) instead  At TypeCheckProcFactory.java:Long(String) 
constructor; use Long.valueOf(String) instead  At 
TypeCheckProcFactory.java:[line 1253] |
|  |  
org.apache.hadoop.hive.ql.parse.TypeCheckProcFactory$DefaultExprProcessor.interpretNodeAs(PrimitiveTypeInfo,
 ExprNodeDesc) invokes inefficient new Short(String) constructor; use 
Short.valueOf(String) instead  At TypeCheckProcFactory.java:Short(String) 
constructor; use Short.valueOf(String) instead  At 
TypeCheckProcFactory.java:[line 1261] |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests |  asflicense  javac  javadoc  findbugs  checkstyle  compile  |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 
3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/data/hiveptest/working/yetus_PreCommit-HIVE-Build-13027/dev-support/hive-personali

[jira] [Commented] (HIVE-20277) Vectorization: Case expressions that return NULL in FILTER

2018-08-03 Thread Matt McCline (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16568512#comment-16568512
 ] 

Matt McCline commented on HIVE-20277:
-

11 test failures are queries that are now vectorized and show EXPLAIN PLAN 
differences ("Execution mode: vectorized") and no wrong results.

A couple of failures look unrelated (but would still need to pass under new rules):
TestSSL.testSSLConnectionWithProperty
TestSSL.testMetastoreConnectionWrongCertCN

And a bad run (?) due to:
   TestMiniDruidCliDriver - did not produce a TEST-*.xml file 
   TestCliDriver - did not produce a TEST-*.xml file




> Vectorization: Case expressions that return NULL in FILTER
> --
>
> Key: HIVE-20277
> URL: https://issues.apache.org/jira/browse/HIVE-20277
> Project: Hive
>  Issue Type: Bug
>  Components: Vectorization
>Affects Versions: 4.0.0
>Reporter: Gopal V
>Assignee: Matt McCline
>Priority: Major
> Attachments: HIVE-20277.WIP.01.patch
>
>
> In cases like Query89, the vertex with the filter is not vectorized.
> {code}
>Filter Operator
>   predicate: CASE WHEN ((avg_window_0 <> 0.0D)) THEN 
> (((abs((_col6 - avg_window_0)) / avg_window_0) > 0.1D)) ELSE (null) END 
> (type: boolean)
> {code}
> {code}
> Reducer 3 
> Execution mode: llap
> Reduce Vectorization:
> enabled: true
> enableConditionsMet: hive.vectorized.execution.reduce.enabled 
> IS true, hive.execution.engine tez IN [tez, spark] IS true
> notVectorizedReason: FILTER operator: Unexpected hive type 
> name void
> vectorized: false
> {code}
> The query specifically has 
> {code}
> where case when (avg_monthly_sales <> 0) then (abs(sum_sales - 
> avg_monthly_sales) / avg_monthly_sales) else null end > 0.1
> {code}
> while rewriting it to 
> {code}
> where case when (avg_monthly_sales <> 0) then (abs(sum_sales - 
> avg_monthly_sales) / avg_monthly_sales) > 0.1 else false end
> {code}
> does vectorize into 
> {code}
> Filter Operator
>   Filter Vectorization:
>   className: VectorFilterOperator
>   native: true
>   predicateExpression: SelectColumnIsTrue(col 
> 12:boolean)(children: VectorUDFAdaptor(CASE WHEN ((avg_window_0 <> 0.0D)) 
> THEN (((abs((_col6 - avg_window_0)) / avg_window_0) > 0.1D)) ELSE (false) 
> END)(children: DoubleColNotEqualDoubleScalar(col 7:double, val 0.0) -> 
> 8:boolean, DoubleColGreaterDoubleScalar(col 9:double, val 0.1)(children: 
> DoubleColDivideDoubleColumn(col 10:double, col 7:double)(children: 
> FuncAbsDoubleToDouble(col 9:double)(children: 
> DoubleColSubtractDoubleColumn(col 6:double, col 7:double) -> 9:double) -> 
> 10:double) -> 9:double) -> 11:boolean) -> 12:boolean)
>   predicate: CASE WHEN ((avg_window_0 <> 0.0D)) THEN 
> (((abs((_col6 - avg_window_0)) / avg_window_0) > 0.1D)) ELSE (false) END 
> (type: boolean)
>   Statistics: Num rows: 11 Data size: 5291 Basic stats: 
> COMPLETE Column stats: COMPLETE
> {code}





[jira] [Updated] (HIVE-20291) Allow HiveStreamingConnection to receive a WriteId

2018-08-03 Thread Jaume M (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jaume M updated HIVE-20291:
---
Attachment: HIVE-20291.2.patch
Status: Patch Available  (was: Open)

> Allow HiveStreamingConnection to receive a WriteId
> --
>
> Key: HIVE-20291
> URL: https://issues.apache.org/jira/browse/HIVE-20291
> Project: Hive
>  Issue Type: Improvement
>Reporter: Jaume M
>Assignee: Jaume M
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-20291.1.patch, HIVE-20291.2.patch
>
>
> If the writeId is received externally, it won't need to open connections to 
> the metastore. It won't be able to commit in this case either, so the commit 
> must be done by the entity passing the writeId.





[jira] [Comment Edited] (HIVE-19985) ACID: Skip decoding the ROW__ID sections for read-only queries

2018-08-03 Thread Eugene Koifman (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-19985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16568479#comment-16568479
 ] 

Eugene Koifman edited comment on HIVE-19985 at 8/3/18 5:04 PM:
---

[~gopalv], patch 4 includes LLAP handling
cc [~ashutoshc]

includes hive.optimize.acid.meta.columns option so this feature can be disabled


was (Author: ekoifman):
[~gopalv], patch 4 includes LLAP handling
cc [~ashutoshc]

> ACID: Skip decoding the ROW__ID sections for read-only queries 
> ---
>
> Key: HIVE-19985
> URL: https://issues.apache.org/jira/browse/HIVE-19985
> Project: Hive
>  Issue Type: Improvement
>  Components: Transactions
>Reporter: Gopal V
>Assignee: Eugene Koifman
>Priority: Major
>  Labels: Branch3Candidate
> Attachments: HIVE-19985.01.patch, HIVE-19985.04.patch
>
>
> For a base_n file there are no aborted transactions within the file and if 
> there are no pending delete deltas, the entire ACID ROW__ID can be skipped 
> for all read-only queries (i.e. SELECT), though it still needs to be projected 
> out for MERGE, UPDATE and DELETE queries.
> This patch tries to entirely ignore the ACID ROW__ID fields for all tables 
> where there are no possible deletes or aborted transactions for an ACID split.





[jira] [Commented] (HIVE-19985) ACID: Skip decoding the ROW__ID sections for read-only queries

2018-08-03 Thread Eugene Koifman (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-19985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16568479#comment-16568479
 ] 

Eugene Koifman commented on HIVE-19985:
---

[~gopalv], patch 4 includes LLAP handling
cc [~ashutoshc]

> ACID: Skip decoding the ROW__ID sections for read-only queries 
> ---
>
> Key: HIVE-19985
> URL: https://issues.apache.org/jira/browse/HIVE-19985
> Project: Hive
>  Issue Type: Improvement
>  Components: Transactions
>Reporter: Gopal V
>Assignee: Eugene Koifman
>Priority: Major
>  Labels: Branch3Candidate
> Attachments: HIVE-19985.01.patch, HIVE-19985.04.patch
>
>
> For a base_n file there are no aborted transactions within the file and if 
> there are no pending delete deltas, the entire ACID ROW__ID can be skipped 
> for all read-only queries (i.e. SELECT), though it still needs to be projected 
> out for MERGE, UPDATE and DELETE queries.
> This patch tries to entirely ignore the ACID ROW__ID fields for all tables 
> where there are no possible deletes or aborted transactions for an ACID split.





[jira] [Updated] (HIVE-19985) ACID: Skip decoding the ROW__ID sections for read-only queries

2018-08-03 Thread Eugene Koifman (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-19985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-19985:
--
Attachment: HIVE-19985.04.patch

> ACID: Skip decoding the ROW__ID sections for read-only queries 
> ---
>
> Key: HIVE-19985
> URL: https://issues.apache.org/jira/browse/HIVE-19985
> Project: Hive
>  Issue Type: Improvement
>  Components: Transactions
>Reporter: Gopal V
>Assignee: Eugene Koifman
>Priority: Major
>  Labels: Branch3Candidate
> Attachments: HIVE-19985.01.patch, HIVE-19985.04.patch
>
>
> For a base_n file there are no aborted transactions within the file and if 
> there are no pending delete deltas, the entire ACID ROW__ID can be skipped 
> for all read-only queries (i.e. SELECT), though it still needs to be projected 
> out for MERGE, UPDATE and DELETE queries.
> This patch tries to entirely ignore the ACID ROW__ID fields for all tables 
> where there are no possible deletes or aborted transactions for an ACID split.





[jira] [Commented] (HIVE-14162) Allow disabling of long running job on Hive On Spark On YARN

2018-08-03 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-14162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16568478#comment-16568478
 ] 

Hive QA commented on HIVE-14162:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12934264/HIVE-14162.6.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 14862 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[parallel_orderby] 
(batchId=57)
org.apache.hadoop.hive.cli.TestMiniDruidCliDriver.testCliDriver[druid_timestamptz]
 (batchId=193)
org.apache.hadoop.hive.cli.TestMiniDruidCliDriver.testCliDriver[druidmini_joins]
 (batchId=193)
org.apache.hadoop.hive.cli.TestMiniDruidCliDriver.testCliDriver[druidmini_masking]
 (batchId=193)
org.apache.hadoop.hive.cli.TestMiniDruidCliDriver.testCliDriver[druidmini_test1]
 (batchId=193)
{noformat}

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-Build/13026/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/13026/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-13026/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 5 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12934264 - PreCommit-HIVE-Build

> Allow disabling of long running job on Hive On Spark On YARN
> 
>
> Key: HIVE-14162
> URL: https://issues.apache.org/jira/browse/HIVE-14162
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Thomas Scott
>Assignee: Sahil Takiar
>Priority: Major
> Attachments: HIVE-14162.1.patch, HIVE-14162.2.patch, 
> HIVE-14162.3.patch, HIVE-14162.4.patch, HIVE-14162.5.patch, HIVE-14162.6.patch
>
>
> Hive On Spark launches a long running process on the first query to handle 
> all queries for that user session. In some use cases this is not desired, for 
> instance when using Hue with large intervals between query executions.
> Could we have a property that would cause long running spark jobs to be 
> terminated after each query execution and started again for the next one?





[jira] [Commented] (HIVE-14557) Nullpointer When both SkewJoin and Mapjoin Enabled

2018-08-03 Thread BELUGA BEHR (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-14557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16568476#comment-16568476
 ] 

BELUGA BEHR commented on HIVE-14557:


Maybe related to [HIVE-20304]?

> Nullpointer When both SkewJoin  and Mapjoin Enabled
> ---
>
> Key: HIVE-14557
> URL: https://issues.apache.org/jira/browse/HIVE-14557
> Project: Hive
>  Issue Type: Bug
>  Components: Physical Optimizer
>Affects Versions: 1.1.0, 2.1.0
>Reporter: Nemon Lou
>Priority: Major
> Attachments: HIVE-14557.patch
>
>
> The following sql failed with return code 2 on mr.
> {noformat}
> create table a(id int,id1 int);
> create table b(id int,id1 int);
> create table c(id int,id1 int);
> set hive.optimize.skewjoin=true;
> select a.id,b.id,c.id1 from a,b,c where a.id=b.id and a.id1=c.id1;
> {noformat}
> Error log as follows:
> {noformat}
> 2016-08-17 21:13:42,081 INFO [main] 
> org.apache.hadoop.hive.ql.exec.mr.ExecMapper: 
> <MAP>Id =0
>   <Children>
> <TS>Id =21
>   <Children>
> <MAPJOIN>Id =28
>   <Children>
> <FS>Id =16
>   <Children>
>   <\Children>
>   <Parent>Id = 28 null<\Parent>
> <\FS>
>   <\Children>
>   <Parent>Id = 21 nullId = 33 
> <HASHTABLEDUMMY>Id =33
>   <Children>null
>   <\Children>
>   <Parent><\Parent>
> <\HASHTABLEDUMMY><\Parent>
> <\MAPJOIN>
>   <\Children>
>   <Parent>Id = 0 null<\Parent>
> <\TS>
>   <\Children>
>   <Parent><\Parent>
> <\MAP>
> 2016-08-17 21:13:42,084 INFO [main] 
> org.apache.hadoop.hive.ql.exec.TableScanOperator: Initializing operator TS[21]
> 2016-08-17 21:13:42,084 INFO [main] 
> org.apache.hadoop.hive.ql.exec.mr.ExecMapper: Initializing dummy operator
> 2016-08-17 21:13:42,086 INFO [main] 
> org.apache.hadoop.hive.ql.exec.MapOperator: DESERIALIZE_ERRORS:0, 
> RECORDS_IN:0, 
> 2016-08-17 21:13:42,087 ERROR [main] 
> org.apache.hadoop.hive.ql.exec.mr.ExecMapper: Hit error while closing 
> operators - failing tree
> 2016-08-17 21:13:42,088 WARN [main] org.apache.hadoop.mapred.YarnChild: 
> Exception running child : java.lang.RuntimeException: Hive Runtime Error 
> while closing operators
>   at 
> org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:207)
>   at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
>   at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
>   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
>   at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>   at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
> Caused by: java.lang.NullPointerException
>   at 
> org.apache.hadoop.hive.ql.exec.MapJoinOperator.closeOp(MapJoinOperator.java:474)
>   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:682)
>   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:696)
>   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:696)
>   at 
> org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:189)
>   ... 8 more
> {noformat}





[jira] [Commented] (HIVE-20304) When hive.optimize.skewjoin and hive.auto.convert.join are both set to true, and the execution engine is mr, same stage may launch twice due to the wrong generated plan

2018-08-03 Thread BELUGA BEHR (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16568475#comment-16568475
 ] 

BELUGA BEHR commented on HIVE-20304:


Is this related to [HIVE-14557]?

> When hive.optimize.skewjoin and hive.auto.convert.join are both set to true, 
> and the execution engine is mr, same stage may launch twice due to the wrong 
> generated plan
> 
>
> Key: HIVE-20304
> URL: https://issues.apache.org/jira/browse/HIVE-20304
> Project: Hive
> Issue Type: Bug
> Components: CLI
> Affects Versions: 1.2.1, 2.3.3
> Reporter: Hui Huang
> Assignee: Hui Huang
> Priority: Major
> Fix For: 1.2.1, 2.3.3
>
> Attachments: HIVE-20304.1.patch, HIVE-20304.patch
>
>
> When hive.optimize.skewjoin and hive.auto.convert.join are both set to true 
> and the execution engine is set to mr, the same stage of a query may launch 
> twice because of an incorrectly generated plan. If hive.exec.parallel is also 
> true, both launches run at the same time and the job fails because the first 
> one to complete clears the map.xml/reduce.xml file stored in HDFS.
> Use the following SQL to reproduce the issue:
> {code:java}
> CREATE TABLE `tbl1`(
>   `fence` string);
> CREATE TABLE `tbl2`(
>   `order_id` string,
>   `phone` string,
>   `search_id` string
> )
> PARTITIONED BY (
>   `dt` string);
> CREATE TABLE `tbl3`(
>   `order_id` string,
>   `platform` string)
> PARTITIONED BY (
>   `dt` string);
> CREATE TABLE `tbl4`(
>   `groupname` string,
>   `phone` string)
> PARTITIONED BY (
>   `dt` string);
> CREATE TABLE `tbl5`(
>   `search_id` string,
>   `fence` string)
> PARTITIONED BY (
>   `dt` string);
> SET hive.exec.parallel = TRUE;
> SET hive.auto.convert.join = TRUE;
> SET hive.optimize.skewjoin = TRUE;
> SELECT dt,
>        platform,
>        groupname,
>        count(1) AS cnt
> FROM
>   (SELECT dt,
>           platform,
>           groupname
>    FROM
>      (SELECT fence
>       FROM tbl1) ta
>    JOIN
>      (SELECT a0.dt,
>              a1.platform,
>              a2.groupname,
>              a3.fence
>       FROM
>         (SELECT dt,
>                 order_id,
>                 phone,
>                 search_id
>          FROM tbl2
>          WHERE dt = 20180703) a0
>       JOIN
>         (SELECT order_id,
>                 platform,
>                 dt
>          FROM tbl3
>          WHERE dt = 20180703) a1 ON a0.order_id = a1.order_id
>       INNER JOIN
>         (SELECT groupname,
>                 phone,
>                 dt
>          FROM tbl4
>          WHERE dt = 20180703) a2 ON a0.phone = a2.phone
>       LEFT JOIN
>         (SELECT search_id,
>                 fence,
>                 dt
>          FROM tbl5
>          WHERE dt = 20180703) a3 ON a0.search_id = a3.search_id) t0
>      ON ta.fence = t0.fence) t11
> GROUP BY dt,
>          platform,
>          groupname;
> DROP TABLE tbl1;
> DROP TABLE tbl2;
> DROP TABLE tbl3;
> DROP TABLE tbl4;
> DROP TABLE tbl5;
> {code}
> The job then fails with an error message like this:
> Examining task ID: task_1531284442065_3637_m_00 (and more) from job 
> job_1531284442065_3637
> Task with the most failures(4):
> -
> Task ID:
>   task_1531284442065_3637_m_00
> URL:
>   
> http://0.0.0.0:8088/taskdetails.jsp?jobid=job_1531284442065_3637&tipid=task_1531284442065_3637_m_00
> -
> Diagnostic Messages for this Task:
> File does not exist: 
> hdfs://test/tmp/hive-hadoop/hadoop/fe5efa94-abb1-420f-b6ba-ec782e7b79ad/hive_2018-08-03_17-00-17_707_592882314975289971-5/-mr-10045/757eb1f7-7a37-4a7e-abc0-4a3b8b06510c/reduce.xml
> java.io.FileNotFoundException: File does not exist: 
> hdfs://test/tmp/hive-hadoop/hadoop/fe5efa94-abb1-420f-b6ba-ec782e7b79ad/hive_2018-08-03_17-00-17_707_592882314975289971-5/-mr-10045/757eb1f7-7a37-4a7e-abc0-4a3b8b06510c/reduce.xml
> Looking at the plan produced by EXPLAIN, I found that Stage-4 and 
> Stage-5 can be reached from multiple root tasks.
> {code:java}
> Explain
> STAGE DEPENDENCIES:
>   Stage-21 is a root stage , consists of Stage-34, Stage-5
>   Stage-34 has a backup stage: Stage-5
>   Stage-20 depends on stages: Stage-34
>   Stage-17 depends on stages: Stage-5, Stage-18, Stage-20 , consists of 
> Stage-32, Stage-33, Stage-1
>   Stage-32 has a backup stage: Stage-1
>   Stage-15 depends on stages: Stage-32
>   Stage-10 depends on stages: Stage-1, Stage-15, Stage-16 , consists of 
> Stage-31, Stage-2
> 
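
The quoted plan is cut off by the archive, so the claim about Stage-4 and Stage-5 cannot be checked from this excerpt. As a rough aid for inspecting a complete STAGE DEPENDENCIES listing, the sketch below lists every stage that can be reached from more than one root stage. It is only a sketch under stated assumptions: the function names and the sample plan are hypothetical (the "Stage-18 is a root stage" line is invented for the demo), wrapped lines are assumed to have been rejoined first, and "consists of" and "has a backup stage" are both treated as parent-to-child edges. A stage that joins two branches also shows up in the result, so a hit is a starting point for review, not proof of the bug.

```python
import re
from collections import defaultdict

# Clause keywords as they appear in the STAGE DEPENDENCIES section.
CLAUSE = re.compile(r"(depends on stages:|consists of|has a backup stage:)")
STAGE = re.compile(r"Stage-\d+")


def parse_plan(lines):
    """Return (roots, children) for a list of unwrapped dependency lines."""
    roots, children = set(), defaultdict(set)
    for raw in lines:
        line = raw.strip()
        head = STAGE.match(line)
        if head is None:
            continue  # not a stage line
        stage = head.group(0)
        rest = line[head.end():]
        if "is a root stage" in rest:
            roots.add(stage)
        # re.split with a capturing group yields [prefix, kw, text, kw, text, ...]
        parts = CLAUSE.split(rest)
        for keyword, text in zip(parts[1::2], parts[2::2]):
            for other in STAGE.findall(text):
                if keyword == "depends on stages:":
                    children[other].add(stage)   # other must finish first
                else:
                    children[stage].add(other)   # assumption: treated as a child
    return roots, children


def stages_shared_by_roots(lines):
    """Map each stage reachable from more than one root to those roots."""
    roots, children = parse_plan(lines)
    reached_from = defaultdict(set)
    for root in roots:
        stack, seen = [root], set()
        while stack:
            node = stack.pop()
            for child in children.get(node, ()):
                if child not in seen:
                    seen.add(child)
                    stack.append(child)
        for stage in seen:
            reached_from[stage].add(root)
    return {s: sorted(r) for s, r in reached_from.items() if len(r) > 1}


# Hypothetical plan for illustration only; not the full plan from the report.
HYPOTHETICAL_PLAN = [
    "Stage-21 is a root stage , consists of Stage-34, Stage-5",
    "Stage-34 has a backup stage: Stage-5",
    "Stage-20 depends on stages: Stage-34",
    "Stage-18 is a root stage",
    "Stage-17 depends on stages: Stage-5, Stage-18, Stage-20",
]

print(stages_shared_by_roots(HYPOTHETICAL_PLAN))
```

For the hypothetical plan, only Stage-17 is reported, because it is reachable from both Stage-18 and Stage-21.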

[jira] [Updated] (HIVE-20304) When hive.optimize.skewjoin and hive.auto.convert.join are both set to true, and the execution engine is mr, same stage may launch twice due to the wrong generated plan

2018-08-03 Thread Hui Huang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hui Huang updated HIVE-20304:
-
Status: Patch Available  (was: In Progress)

Add one test .q file


[jira] [Updated] (HIVE-20304) When hive.optimize.skewjoin and hive.auto.convert.join are both set to true, and the execution engine is mr, same stage may launch twice due to the wrong generated plan

2018-08-03 Thread Hui Huang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hui Huang updated HIVE-20304:
-
Attachment: HIVE-20304.1.patch


[jira] [Updated] (HIVE-20301) Enable vectorization for materialized view rewriting tests

2018-08-03 Thread Jesus Camacho Rodriguez (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-20301:
---
Component/s: Test

> Enable vectorization for materialized view rewriting tests
> --
>
> Key: HIVE-20301
> URL: https://issues.apache.org/jira/browse/HIVE-20301
> Project: Hive
> Issue Type: Test
> Components: Materialized views, Test
> Reporter: Jesus Camacho Rodriguez
> Assignee: Jesus Camacho Rodriguez
> Priority: Major
> Attachments: HIVE-20301.patch, HIVE-20301.patch
>
>






[jira] [Commented] (HIVE-20301) Enable vectorization for materialized view rewriting tests

2018-08-03 Thread Jesus Camacho Rodriguez (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16568461#comment-16568461
 ] 

Jesus Camacho Rodriguez commented on HIVE-20301:


[~ashutoshc], could you take a look? Simple patch, only test changes. Thanks



[jira] [Updated] (HIVE-20301) Enable vectorization for materialized view rewriting tests

2018-08-03 Thread Jesus Camacho Rodriguez (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-20301:
---
Component/s: Materialized views



[jira] [Updated] (HIVE-20301) Enable vectorization for materialized view rewriting tests

2018-08-03 Thread Jesus Camacho Rodriguez (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-20301:
---
Attachment: HIVE-20301.patch


