[jira] [Updated] (HIVE-17139) Conditional expressions optimization: skip the expression evaluation if the condition is not satisfied for vectorization engine.
[ https://issues.apache.org/jira/browse/HIVE-17139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ke Jia updated HIVE-17139: -- Attachment: HIVE-17139.8.patch
> Conditional expressions optimization: skip the expression evaluation if the condition is not satisfied for vectorization engine.
>
> Key: HIVE-17139
> URL: https://issues.apache.org/jira/browse/HIVE-17139
> Project: Hive
> Issue Type: Improvement
> Reporter: Ke Jia
> Assignee: Ke Jia
> Attachments: HIVE-17139.1.patch, HIVE-17139.2.patch, HIVE-17139.3.patch, HIVE-17139.4.patch, HIVE-17139.5.patch, HIVE-17139.6.patch, HIVE-17139.7.patch, HIVE-17139.8.patch
>
> The CASE WHEN and IF expression evaluation in Hive's vectorization engine is not optimal: the current implementation evaluates all of the conditional and ELSE expressions for every row. The optimized approach is to update the selected array of the batch parameter after the conditional expression is executed, so that the ELSE expression is evaluated only over the selected rows instead of over all rows.
-- This message was sent by Atlassian JIRA (v6.4.14#64029)
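The idea in the description above can be sketched concretely. The following is a hypothetical Python model of selected-row evaluation, not Hive's actual implementation (which is Java inside the vectorized expression classes); the batch and expression representations here are illustrative only.

```python
# Hypothetical sketch of the HIVE-17139 idea: rather than evaluating the
# THEN and ELSE expressions over every row of a vectorized batch, narrow
# the set of "selected" rows so each branch touches only the rows it
# actually applies to.

def eval_case_when(batch, cond, then_expr, else_expr):
    """Evaluate CASE WHEN cond THEN then_expr ELSE else_expr over a batch."""
    selected = range(len(batch))                 # rows in play for this batch
    out = [None] * len(batch)

    matched = [i for i in selected if cond(batch[i])]
    rest = [i for i in selected if not cond(batch[i])]

    for i in matched:                            # THEN: only matching rows
        out[i] = then_expr(batch[i])
    for i in rest:                               # ELSE: only the remainder,
        out[i] = else_expr(batch[i])             # not the whole batch again
    return out

# CASE WHEN x % 2 = 0 THEN x * 10 ELSE -x END over a four-row batch
result = eval_case_when([1, 2, 3, 4],
                        lambda r: r % 2 == 0,
                        lambda r: r * 10,
                        lambda r: -r)
```

With the unoptimized approach both branch expressions would run over all four rows; here each branch runs over exactly the two rows it applies to.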
[jira] [Work started] (HIVE-17475) Disable mapjoin using hint
[ https://issues.apache.org/jira/browse/HIVE-17475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HIVE-17475 started by Deepak Jaiswal. -
> Disable mapjoin using hint
> --
>
> Key: HIVE-17475
> URL: https://issues.apache.org/jira/browse/HIVE-17475
> Project: Hive
> Issue Type: Bug
> Reporter: Deepak Jaiswal
> Assignee: Deepak Jaiswal
>
> Use a hint to disable mapjoin for a given query.
-- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Assigned] (HIVE-17475) Disable mapjoin using hint
[ https://issues.apache.org/jira/browse/HIVE-17475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Deepak Jaiswal reassigned HIVE-17475: - -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17456) Set current database for external LLAP interface
[ https://issues.apache.org/jira/browse/HIVE-17456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16156525#comment-16156525 ] Hive QA commented on HIVE-17456: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12885648/HIVE-17456.2.patch {color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 11028 tests executed *Failed tests:* {noformat} TestAccumuloCliDriver - did not produce a TEST-*.xml file (likely timed out) (batchId=230) TestDummy - did not produce a TEST-*.xml file (likely timed out) (batchId=230) TestTxnCommandsBase - did not produce a TEST-*.xml file (likely timed out) (batchId=280) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[insert_values_orig_table_use_metadata] (batchId=61) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query23] (batchId=234) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6705/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6705/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6705/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 5 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12885648 - PreCommit-HIVE-Build > Set current database for external LLAP interface > > > Key: HIVE-17456 > URL: https://issues.apache.org/jira/browse/HIVE-17456 > Project: Hive > Issue Type: Sub-task > Components: llap >Reporter: Jason Dere >Assignee: Jason Dere > Attachments: HIVE-17456.1.patch, HIVE-17456.2.patch > > > Currently the query passed in to external LLAP client has the default DB as > the current database. 
> Allow user to specify a different current database. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17474) Different logical plan of same query(TPC-DS/70) with same settings
[ https://issues.apache.org/jira/browse/HIVE-17474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] liyunzhang_intel updated HIVE-17474: Summary: Different logical plan of same query(TPC-DS/70) with same settings (was: Different physical plan of same query(TPC-DS/70) on HOS)
> Different logical plan of same query(TPC-DS/70) with same settings
> --
>
> Key: HIVE-17474
> URL: https://issues.apache.org/jira/browse/HIVE-17474
> Project: Hive
> Issue Type: Bug
> Reporter: liyunzhang_intel
>
> In [DS/query70|https://github.com/kellyzly/hive-testbench/blob/hive14/sample-queries-tpcds/query70.sql], on Hive version (d3b88f6), I found that the physical plan differs between runs even with the same settings.
> Sometimes the physical plan is:
> {code}
> TS[0]-FIL[63]-SEL[2]-RS[43]-JOIN[45]-RS[46]-JOIN[48]-SEL[49]-GBY[50]-RS[51]-GBY[52]-SEL[53]-RS[54]-SEL[55]-PTF[56]-SEL[57]-RS[59]-SEL[60]-LIM[61]-FS[62]
> TS[3]-FIL[64]-SEL[5]-RS[44]-JOIN[45]
> TS[6]-FIL[65]-SEL[8]-RS[39]-JOIN[41]-RS[47]-JOIN[48]
> TS[9]-FIL[67]-SEL[11]-RS[18]-JOIN[20]-RS[21]-JOIN[23]-SEL[24]-GBY[25]-RS[26]-GBY[27]-RS[29]-SEL[30]-PTF[31]-FIL[66]-SEL[32]-GBY[38]-RS[40]-JOIN[41]
> TS[12]-FIL[68]-SEL[14]-RS[19]-JOIN[20]
> TS[15]-FIL[69]-SEL[17]-RS[22]-JOIN[23]
> {code}
> TS\[6\] connects with TS\[9\] on JOIN\[41\] and connects with TS\[0\] on JOIN\[48\].
> Sometimes it is:
> {code}
> TS[0]-FIL[63]-RS[3]-JOIN[6]-RS[8]-JOIN[11]-RS[41]-JOIN[44]-SEL[46]-GBY[47]-RS[48]-GBY[49]-RS[50]-GBY[51]-RS[52]-SEL[53]-PTF[54]-SEL[55]-RS[57]-SEL[58]-LIM[59]-FS[60]
> TS[1]-FIL[64]-RS[5]-JOIN[6]
> TS[2]-FIL[65]-RS[10]-JOIN[11]
> TS[12]-FIL[68]-RS[16]-JOIN[19]-RS[20]-JOIN[23]-FIL[67]-SEL[25]-GBY[26]-RS[27]-GBY[28]-RS[29]-GBY[30]-RS[31]-SEL[32]-PTF[33]-FIL[66]-SEL[34]-GBY[39]-RS[43]-JOIN[44]
> TS[13]-FIL[69]-RS[18]-JOIN[19]
> TS[14]-FIL[70]-RS[22]-JOIN[23]
> {code}
> TS\[2\] connects with TS\[0\] on JOIN\[11\].
> Although TS\[2\] and TS\[6\] have different operator ids, they both correspond to the same table scan in the query.
> The difference causes a different Spark execution plan and different execution times. I'm confused about why the same settings produce different physical plans. Does anyone know where to investigate the root cause?
-- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17474) Different logical plan of same query(TPC-DS/70) with same settings
[ https://issues.apache.org/jira/browse/HIVE-17474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] liyunzhang_intel updated HIVE-17474: Description: in [DS/query70|https://github.com/kellyzly/hive-testbench/blob/hive14/sample-queries-tpcds/query70.sql]. On hive version(d3b88f6), i found that the logical plan is different in runtime with the same settings. sometimes the logical plan {code} TS[0]-FIL[63]-SEL[2]-RS[43]-JOIN[45]-RS[46]-JOIN[48]-SEL[49]-GBY[50]-RS[51]-GBY[52]-SEL[53]-RS[54]-SEL[55]-PTF[56]-SEL[57]-RS[59]-SEL[60]-LIM[61]-FS[62] TS[3]-FIL[64]-SEL[5]-RS[44]-JOIN[45] TS[6]-FIL[65]-SEL[8]-RS[39]-JOIN[41]-RS[47]-JOIN[48] TS[9]-FIL[67]-SEL[11]-RS[18]-JOIN[20]-RS[21]-JOIN[23]-SEL[24]-GBY[25]-RS[26]-GBY[27]-RS[29]-SEL[30]-PTF[31]-FIL[66]-SEL[32]-GBY[38]-RS[40]-JOIN[41] TS[12]-FIL[68]-SEL[14]-RS[19]-JOIN[20] TS[15]-FIL[69]-SEL[17]-RS[22]-JOIN[23] {code} TS\[6\] connects with TS\[9\] on JOIN\[41\] and connects with TS\[0\] on JOIN\[48\]. sometimes {code} TS[0]-FIL[63]-RS[3]-JOIN[6]-RS[8]-JOIN[11]-RS[41]-JOIN[44]-SEL[46]-GBY[47]-RS[48]-GBY[49]-RS[50]-GBY[51]-RS[52]-SEL[53]-PTF[54]-SEL[55]-RS[57]-SEL[58]-LIM[59]-FS[60] TS[1]-FIL[64]-RS[5]-JOIN[6] TS[2]-FIL[65]-RS[10]-JOIN[11] TS[12]-FIL[68]-RS[16]-JOIN[19]-RS[20]-JOIN[23]-FIL[67]-SEL[25]-GBY[26]-RS[27]-GBY[28]-RS[29]-GBY[30]-RS[31]-SEL[32]-PTF[33]-FIL[66]-SEL[34]-GBY[39]-RS[43]-JOIN[44] TS[13]-FIL[69]-RS[18]-JOIN[19] TS[14]-FIL[70]-RS[22]-JOIN[23] {code} TS\[2\] connects with TS\[0\] on JOIN\[11\] Although TS\[2\] and TS\[6\] has different operator id, they are table store in the query. The difference causes different spark execution plan and different execution time. I'm very confused why there are different logical plan with same setting. Can anyone know where to investigate the root cause? was: in [DS/query70|https://github.com/kellyzly/hive-testbench/blob/hive14/sample-queries-tpcds/query70.sql]. 
On hive version(d3b88f6), i found that the physical plan is different in runtime with the same settings. sometimes the physical plan {code} TS[0]-FIL[63]-SEL[2]-RS[43]-JOIN[45]-RS[46]-JOIN[48]-SEL[49]-GBY[50]-RS[51]-GBY[52]-SEL[53]-RS[54]-SEL[55]-PTF[56]-SEL[57]-RS[59]-SEL[60]-LIM[61]-FS[62] TS[3]-FIL[64]-SEL[5]-RS[44]-JOIN[45] TS[6]-FIL[65]-SEL[8]-RS[39]-JOIN[41]-RS[47]-JOIN[48] TS[9]-FIL[67]-SEL[11]-RS[18]-JOIN[20]-RS[21]-JOIN[23]-SEL[24]-GBY[25]-RS[26]-GBY[27]-RS[29]-SEL[30]-PTF[31]-FIL[66]-SEL[32]-GBY[38]-RS[40]-JOIN[41] TS[12]-FIL[68]-SEL[14]-RS[19]-JOIN[20] TS[15]-FIL[69]-SEL[17]-RS[22]-JOIN[23] {code} TS\[6\] connects with TS\[9\] on JOIN\[41\] and connects with TS\[0\] on JOIN\[48\]. sometimes {code} TS[0]-FIL[63]-RS[3]-JOIN[6]-RS[8]-JOIN[11]-RS[41]-JOIN[44]-SEL[46]-GBY[47]-RS[48]-GBY[49]-RS[50]-GBY[51]-RS[52]-SEL[53]-PTF[54]-SEL[55]-RS[57]-SEL[58]-LIM[59]-FS[60] TS[1]-FIL[64]-RS[5]-JOIN[6] TS[2]-FIL[65]-RS[10]-JOIN[11] TS[12]-FIL[68]-RS[16]-JOIN[19]-RS[20]-JOIN[23]-FIL[67]-SEL[25]-GBY[26]-RS[27]-GBY[28]-RS[29]-GBY[30]-RS[31]-SEL[32]-PTF[33]-FIL[66]-SEL[34]-GBY[39]-RS[43]-JOIN[44] TS[13]-FIL[69]-RS[18]-JOIN[19] TS[14]-FIL[70]-RS[22]-JOIN[23] {code} TS\[2\] connects with TS\[0\] on JOIN\[11\] Although TS\[2\] and TS\[6\] has different operator id, they are table store in the query. The difference causes different spark execution plan and different execution time. I'm very confused why there are different physical plan with same setting. Can anyone know where to investigate the root cause? > Different logical plan of same query(TPC-DS/70) with same settings > -- > > Key: HIVE-17474 > URL: https://issues.apache.org/jira/browse/HIVE-17474 > Project: Hive > Issue Type: Bug >Reporter: liyunzhang_intel > > in > [DS/query70|https://github.com/kellyzly/hive-testbench/blob/hive14/sample-queries-tpcds/query70.sql]. > On hive version(d3b88f6), i found that the logical plan is different in > runtime with the same settings. 
> sometimes the logical plan > {code} > TS[0]-FIL[63]-SEL[2]-RS[43]-JOIN[45]-RS[46]-JOIN[48]-SEL[49]-GBY[50]-RS[51]-GBY[52]-SEL[53]-RS[54]-SEL[55]-PTF[56]-SEL[57]-RS[59]-SEL[60]-LIM[61]-FS[62] > TS[3]-FIL[64]-SEL[5]-RS[44]-JOIN[45] > TS[6]-FIL[65]-SEL[8]-RS[39]-JOIN[41]-RS[47]-JOIN[48] > TS[9]-FIL[67]-SEL[11]-RS[18]-JOIN[20]-RS[21]-JOIN[23]-SEL[24]-GBY[25]-RS[26]-GBY[27]-RS[29]-SEL[30]-PTF[31]-FIL[66]-SEL[32]-GBY[38]-RS[40]-JOIN[41] > TS[12]-FIL[68]-SEL[14]-RS[19]-JOIN[20] > TS[15]-FIL[69]-SEL[17]-RS[22]-JOIN[23] > {code} > TS\[6\] connects with TS\[9\] on JOIN\[41\] and connects with TS\[0\] on > JOIN\[48\]. > sometimes > {code} > TS[0]-FIL[63]-RS[3]-JOIN[6]-RS[8]-JOIN[11]-RS[41]-JOIN[44]-SEL[46]-GBY[47]-RS[48]-GBY[49]-RS[50]-GBY[51]-RS[52]-SEL[53]-PTF[54]-SEL[55]-RS[57]-SEL[58]-LIM[59]-FS[60] > TS[1]-FIL[64]-RS[5]-JOIN[6] > TS[2]-FIL[65]-RS[10]-JOIN[11] > TS[12]-FIL[68]-RS[16]-JOIN[19]-RS[20]-JOIN[23]-FIL[67
[jira] [Comment Edited] (HIVE-17460) `insert overwrite` should support table schema evolution (e.g. add columns)
[ https://issues.apache.org/jira/browse/HIVE-17460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16156479#comment-16156479 ] Chaozhong Yang edited comment on HIVE-17460 at 9/7/17 5:14 AM: --- `CASCADE` works well for me, I will close this issue. Thanks [~mmccline] [~wzheng] was (Author: debugger87): `CASCADE` works for me, I will close this issue. Thanks [~mmccline] [~wzheng]
> `insert overwrite` should support table schema evolution (e.g. add columns)
> ---
>
> Key: HIVE-17460
> URL: https://issues.apache.org/jira/browse/HIVE-17460
> Project: Hive
> Issue Type: Bug
> Affects Versions: 2.1.0, 2.2.0
> Reporter: Chaozhong Yang
> Assignee: Chaozhong Yang
> Fix For: 3.0.0
>
> Attachments: HIVE-17460.2.patch, HIVE-17460.patch
>
> In Hive, adding columns to an existing table is a common use case. However, if we insert overwrite older partitions after adding columns, the added columns are not populated.
> ```
> create table src_table(
>   i int
> )
> PARTITIONED BY (`date` string);
> insert overwrite table src_table partition(`date`='20170905') values (3);
> select * from src_table where `date` = '20170905';
> alter table src_table add columns (bi bigint);
> insert overwrite table src_table partition(`date`='20170905') values (3, 5);
> select * from src_table where `date` = '20170905';
> ```
> The result will be as follows:
> ```
> 3, NULL, '20170905'
> ```
> Obviously, it doesn't meet our expectation. The expected result should be:
> ```
> 3, 5, '20170905'
> ```
-- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17460) `insert overwrite` should support table schema evolution (e.g. add columns)
[ https://issues.apache.org/jira/browse/HIVE-17460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chaozhong Yang updated HIVE-17460: -- Resolution: Not A Problem Status: Resolved (was: Patch Available) -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17460) `insert overwrite` should support table schema evolution (e.g. add columns)
[ https://issues.apache.org/jira/browse/HIVE-17460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16156479#comment-16156479 ] Chaozhong Yang commented on HIVE-17460: --- `CASCADE` works for me, I will close this issue. Thanks [~mmccline] [~wzheng] -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17460) `insert overwrite` should support table schema evolution (e.g. add columns)
[ https://issues.apache.org/jira/browse/HIVE-17460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16156457#comment-16156457 ] Chaozhong Yang commented on HIVE-17460: --- [~wzheng] [~mmccline] Thanks for your suggestion, I will try CASCADE in DDL. If everything goes right, I will close this issue. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17468) Shade and package appropriate jackson version for druid storage handler
[ https://issues.apache.org/jira/browse/HIVE-17468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16156449#comment-16156449 ] Hive QA commented on HIVE-17468: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12885683/HIVE-17468.patch {color:red}ERROR:{color} -1 due to no test(s) being added or modified. {color:red}ERROR:{color} -1 due to 7 failed/errored test(s), 11028 tests executed *Failed tests:* {noformat} TestAccumuloCliDriver - did not produce a TEST-*.xml file (likely timed out) (batchId=230) TestDummy - did not produce a TEST-*.xml file (likely timed out) (batchId=230) TestTxnCommandsBase - did not produce a TEST-*.xml file (likely timed out) (batchId=280) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[insert_values_orig_table_use_metadata] (batchId=61) org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] (batchId=100) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] (batchId=234) org.apache.hadoop.hive.druid.TestHiveDruidQueryBasedInputFormat.testTimeZone (batchId=247) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6704/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6704/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6704/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 7 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12885683 - PreCommit-HIVE-Build
> Shade and package appropriate jackson version for druid storage handler
> ---
>
> Key: HIVE-17468
> URL: https://issues.apache.org/jira/browse/HIVE-17468
> Project: Hive
> Issue Type: Bug
> Reporter: slim bouguerra
> Assignee: Jesus Camacho Rodriguez
> Fix For: 3.0.0
>
> Attachments: HIVE-17468.patch, hive-druid-deps.txt
>
> Currently we are excluding all the jackson core dependencies coming from druid. This is wrong in my opinion, since it leads to packaging an unwanted jackson library from other projects. As you can see in the file hive-druid-deps.txt, jackson core currently comes from Calcite, and its version is 2.6.3, which is very different from the 2.4.6 used by Druid. This patch excludes the unwanted jars and makes sure to bring in the jackson dependency from Druid itself.
-- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17460) `insert overwrite` should support table schema evolution (e.g. add columns)
[ https://issues.apache.org/jira/browse/HIVE-17460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16156423#comment-16156423 ] Wei Zheng commented on HIVE-17460: -- [~debugger87] I discussed with Matt regarding this issue as he is the domain expert for schema evolution. He's saying you can achieve what you want by adding CASCADE in your DDL. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
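The `ADD COLUMNS ... CASCADE` behaviour discussed in this thread can be modelled in a few lines. The following is a toy Python simulation, not Hive or metastore code: the `Table` class and its methods are invented for illustration, under the assumption (matching Hive's documented DDL semantics) that a schema is kept per partition as well as per table, and that without CASCADE an ALTER TABLE changes only the table-level schema, leaving existing partitions on their old column list.

```python
# Toy model (not metastore code) of why the insert-overwrite in this
# thread showed NULL for the added column: each partition carries its
# own schema, and ADD COLUMNS without CASCADE updates only the table.

class Table:
    def __init__(self, cols):
        self.cols = list(cols)
        self.partitions = {}                 # partition value -> column list

    def add_partition(self, value):
        # A new partition snapshots the table schema at creation time.
        self.partitions[value] = list(self.cols)

    def add_columns(self, new_cols, cascade=False):
        self.cols.extend(new_cols)
        if cascade:                          # CASCADE propagates the change
            for part_cols in self.partitions.values():
                part_cols.extend(new_cols)

no_cascade = Table(["i"])
no_cascade.add_partition("20170905")
no_cascade.add_columns(["bi"])               # partition schema stays stale

with_cascade = Table(["i"])
with_cascade.add_partition("20170905")
with_cascade.add_columns(["bi"], cascade=True)  # partition picks up "bi"
```

In the stale case a reader of partition `20170905` sees only column `i`, which is the `3, NULL, '20170905'` symptom from the issue description.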
[jira] [Commented] (HIVE-17429) Hive JDBC doesn't return rows when querying Impala
[ https://issues.apache.org/jira/browse/HIVE-17429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16156404#comment-16156404 ] Aihua Xu commented on HIVE-17429: - The tests don't look related to the change. +1. > Hive JDBC doesn't return rows when querying Impala > -- > > Key: HIVE-17429 > URL: https://issues.apache.org/jira/browse/HIVE-17429 > Project: Hive > Issue Type: Bug > Components: JDBC >Affects Versions: 2.1.0 >Reporter: Zach Amsden >Assignee: Zach Amsden > Fix For: 2.1.0 > > Attachments: HIVE-17429.1.patch, HIVE-17429.2.patch > > > The Hive JDBC driver used to return a result set when querying Impala. Now, > instead, it gets data back but interprets the data as query logs instead of a > resultSet. This causes many issues (we see complaints about beeline as well > as test failures). > This appears to be a regression introduced with asynchronous operation > against Hive. > Ideally, we could make both behaviors work. I have a simple patch that > should fix the problem. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17460) `insert overwrite` should support table schema evolution (e.g. add columns)
[ https://issues.apache.org/jira/browse/HIVE-17460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16156402#comment-16156402 ] Chaozhong Yang commented on HIVE-17460: --- [~mmccline] Maybe you are right. However, Spark SQL does the right thing and meets our expectation: after we added columns to the original table and insert-overwrote some existing partitions, Spark SQL fetches all column values, whereas Hive does not. Is there a proper solution? -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17466) Metastore API to list unique partition-key-value combinations
[ https://issues.apache.org/jira/browse/HIVE-17466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mithun Radhakrishnan updated HIVE-17466: Status: Patch Available (was: Open) > Metastore API to list unique partition-key-value combinations > - > > Key: HIVE-17466 > URL: https://issues.apache.org/jira/browse/HIVE-17466 > Project: Hive > Issue Type: New Feature > Components: Metastore >Affects Versions: 2.2.0, 3.0.0 >Reporter: Mithun Radhakrishnan >Assignee: Thiruvel Thirumoolan > Attachments: HIVE-17466.1.patch > > > Raising this on behalf of [~thiruvel], who wrote this initially as part of a > tangential "data-discovery" system. > Programs like Apache Oozie, Apache Falcon (or Yahoo GDM), etc. launch > workflows based on the availability of table/partitions. Partitions are > currently discovered by listing partitions using (what boils down to) > {{HiveMetaStoreClient.listPartitions()}}. This can be slow and cumbersome, > given that {{Partition}} objects are heavyweight and carry redundant > information. The alternative is to use partition-names, which will need > client-side parsing to extract part-key values. > When checking which hourly partitions for a particular day have been > published already, it would be preferable to have an API that pushed down > part-key extraction into the {{RawStore}} layer, and returned key-values as > the result. This would be similar to how {{SELECT DISTINCT part_key FROM > my_table;}} would run, but at the {{HiveMetaStoreClient}} level. > Here's what we've been using at Yahoo. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
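The client-side parsing that the description above wants to avoid can be illustrated with a small sketch. This is a hypothetical helper, not the proposed metastore API: it shows the per-name key-value extraction that, today, every caller of `listPartitionNames`-style APIs must do itself, and that HIVE-17466 proposes to push down into the `RawStore` layer (analogous to `SELECT DISTINCT part_key FROM my_table;`).

```python
# Hypothetical client-side helper: collect the distinct values of one
# partition key from Hive partition names of the form
# "date=20170906/hour=01". The proposed API would return these values
# directly from the metastore instead.

def distinct_part_key_values(partition_names, key):
    """Return the sorted distinct values of `key` across partition names."""
    values = set()
    for name in partition_names:
        for kv in name.split("/"):           # one "k=v" segment per part key
            k, _, v = kv.partition("=")
            if k == key:
                values.add(v)
    return sorted(values)

names = [
    "date=20170906/hour=00",
    "date=20170906/hour=01",
    "date=20170907/hour=00",
]
days = distinct_part_key_values(names, "date")
```

For the hourly-publication check described in the issue, a workflow scheduler would ask for the distinct `hour` values under one `date` rather than fetching heavyweight `Partition` objects.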
[jira] [Commented] (HIVE-17387) implement Tez AM registry in Hive
[ https://issues.apache.org/jira/browse/HIVE-17387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16156383#comment-16156383 ] Hive QA commented on HIVE-17387: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12885665/HIVE-17387.01.patch {color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 37 failed/errored test(s), 11028 tests executed *Failed tests:* {noformat} TestAccumuloCliDriver - did not produce a TEST-*.xml file (likely timed out) (batchId=230) TestDummy - did not produce a TEST-*.xml file (likely timed out) (batchId=230) TestTxnCommandsBase - did not produce a TEST-*.xml file (likely timed out) (batchId=280) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[insert_values_orig_table_use_metadata] (batchId=61) org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[unionDistinct_1] (batchId=143) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] (batchId=234) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query23] (batchId=234) org.apache.hadoop.hive.llap.tezplugins.TestLlapTaskSchedulerService.testChangeGuaranteedTotal (batchId=245) org.apache.hadoop.hive.llap.tezplugins.TestLlapTaskSchedulerService.testConcurrentUpdateWithError (batchId=245) org.apache.hadoop.hive.llap.tezplugins.TestLlapTaskSchedulerService.testConcurrentUpdates (batchId=245) org.apache.hadoop.hive.llap.tezplugins.TestLlapTaskSchedulerService.testConcurrentUpdatesBeforeMessage (batchId=245) org.apache.hadoop.hive.llap.tezplugins.TestLlapTaskSchedulerService.testDelayedLocalityDelayedAllocation (batchId=245) org.apache.hadoop.hive.llap.tezplugins.TestLlapTaskSchedulerService.testDelayedLocalityFallbackToNonLocal (batchId=245) org.apache.hadoop.hive.llap.tezplugins.TestLlapTaskSchedulerService.testDelayedLocalityNodeCommErrorDelayedAllocation (batchId=245) 
org.apache.hadoop.hive.llap.tezplugins.TestLlapTaskSchedulerService.testDelayedLocalityNodeCommErrorImmediateAllocation (batchId=245) org.apache.hadoop.hive.llap.tezplugins.TestLlapTaskSchedulerService.testDelayedQueeTaskSelectionAfterScheduled (batchId=245) org.apache.hadoop.hive.llap.tezplugins.TestLlapTaskSchedulerService.testForceLocalityTest1 (batchId=245) org.apache.hadoop.hive.llap.tezplugins.TestLlapTaskSchedulerService.testForcedLocalityMultiplePreemptionsSameHost1 (batchId=245) org.apache.hadoop.hive.llap.tezplugins.TestLlapTaskSchedulerService.testForcedLocalityMultiplePreemptionsSameHost2 (batchId=245) org.apache.hadoop.hive.llap.tezplugins.TestLlapTaskSchedulerService.testForcedLocalityNotInDelayedQueue (batchId=245) org.apache.hadoop.hive.llap.tezplugins.TestLlapTaskSchedulerService.testForcedLocalityPreemption (batchId=245) org.apache.hadoop.hive.llap.tezplugins.TestLlapTaskSchedulerService.testForcedLocalityUnknownHost (batchId=245) org.apache.hadoop.hive.llap.tezplugins.TestLlapTaskSchedulerService.testGuaranteedScheduling (batchId=245) org.apache.hadoop.hive.llap.tezplugins.TestLlapTaskSchedulerService.testGuaranteedTransfer (batchId=245) org.apache.hadoop.hive.llap.tezplugins.TestLlapTaskSchedulerService.testHostPreferenceMissesConsistentPartialAlive (batchId=245) org.apache.hadoop.hive.llap.tezplugins.TestLlapTaskSchedulerService.testHostPreferenceMissesConsistentRollover (batchId=245) org.apache.hadoop.hive.llap.tezplugins.TestLlapTaskSchedulerService.testHostPreferenceUnknownAndNotSpecified (batchId=245) org.apache.hadoop.hive.llap.tezplugins.TestLlapTaskSchedulerService.testNoForceLocalityCounterTest1 (batchId=245) org.apache.hadoop.hive.llap.tezplugins.TestLlapTaskSchedulerService.testNoLocalityNotInDelayedQueue (batchId=245) org.apache.hadoop.hive.llap.tezplugins.TestLlapTaskSchedulerService.testNodeDisabled (batchId=245) org.apache.hadoop.hive.llap.tezplugins.TestLlapTaskSchedulerService.testNodeReEnabled (batchId=245) 
org.apache.hadoop.hive.llap.tezplugins.TestLlapTaskSchedulerService.testPreemption (batchId=245) org.apache.hadoop.hive.llap.tezplugins.TestLlapTaskSchedulerService.testPreemptionChoiceTimeOrdering (batchId=245) org.apache.hadoop.hive.llap.tezplugins.TestLlapTaskSchedulerService.testSimpleLocalAllocation (batchId=245) org.apache.hadoop.hive.llap.tezplugins.TestLlapTaskSchedulerService.testSimpleNoLocalityAllocation (batchId=245) org.apache.hadoop.hive.llap.tezplugins.TestLlapTaskSchedulerService.testUpdateOnFinishingTask (batchId=245) org.apache.hadoop.hive.llap.tezplugins.TestLlapTaskSchedulerService.testUpdateWithError (batchId=245) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6702/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6702/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6702/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hi
[jira] [Updated] (HIVE-17466) Metastore API to list unique partition-key-value combinations
[ https://issues.apache.org/jira/browse/HIVE-17466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mithun Radhakrishnan updated HIVE-17466: Attachment: (was: HIVE-17466.1.patch) > Metastore API to list unique partition-key-value combinations > - > > Key: HIVE-17466 > URL: https://issues.apache.org/jira/browse/HIVE-17466 > Project: Hive > Issue Type: New Feature > Components: Metastore >Affects Versions: 2.2.0, 3.0.0 >Reporter: Mithun Radhakrishnan >Assignee: Thiruvel Thirumoolan > Attachments: HIVE-17466.1.patch > > > Raising this on behalf of [~thiruvel], who wrote this initially as part of a > tangential "data-discovery" system. > Programs like Apache Oozie, Apache Falcon (or Yahoo GDM), etc. launch > workflows based on the availability of table/partitions. Partitions are > currently discovered by listing partitions using (what boils down to) > {{HiveMetaStoreClient.listPartitions()}}. This can be slow and cumbersome, > given that {{Partition}} objects are heavyweight and carry redundant > information. The alternative is to use partition-names, which will need > client-side parsing to extract part-key values. > When checking which hourly partitions for a particular day have been > published already, it would be preferable to have an API that pushed down > part-key extraction into the {{RawStore}} layer, and returned key-values as > the result. This would be similar to how {{SELECT DISTINCT part_key FROM > my_table;}} would run, but at the {{HiveMetaStoreClient}} level. > Here's what we've been using at Yahoo. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17466) Metastore API to list unique partition-key-value combinations
[ https://issues.apache.org/jira/browse/HIVE-17466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mithun Radhakrishnan updated HIVE-17466: Attachment: HIVE-17466.1.patch > Metastore API to list unique partition-key-value combinations > - > > Key: HIVE-17466 > URL: https://issues.apache.org/jira/browse/HIVE-17466 > Project: Hive > Issue Type: New Feature > Components: Metastore >Affects Versions: 2.2.0, 3.0.0 >Reporter: Mithun Radhakrishnan >Assignee: Thiruvel Thirumoolan > Attachments: HIVE-17466.1.patch > > > Raising this on behalf of [~thiruvel], who wrote this initially as part of a > tangential "data-discovery" system. > Programs like Apache Oozie, Apache Falcon (or Yahoo GDM), etc. launch > workflows based on the availability of table/partitions. Partitions are > currently discovered by listing partitions using (what boils down to) > {{HiveMetaStoreClient.listPartitions()}}. This can be slow and cumbersome, > given that {{Partition}} objects are heavyweight and carry redundant > information. The alternative is to use partition-names, which will need > client-side parsing to extract part-key values. > When checking which hourly partitions for a particular day have been > published already, it would be preferable to have an API that pushed down > part-key extraction into the {{RawStore}} layer, and returned key-values as > the result. This would be similar to how {{SELECT DISTINCT part_key FROM > my_table;}} would run, but at the {{HiveMetaStoreClient}} level. > Here's what we've been using at Yahoo. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17466) Metastore API to list unique partition-key-value combinations
[ https://issues.apache.org/jira/browse/HIVE-17466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mithun Radhakrishnan updated HIVE-17466: Status: Open (was: Patch Available) > Metastore API to list unique partition-key-value combinations > - > > Key: HIVE-17466 > URL: https://issues.apache.org/jira/browse/HIVE-17466 > Project: Hive > Issue Type: New Feature > Components: Metastore >Affects Versions: 2.2.0, 3.0.0 >Reporter: Mithun Radhakrishnan >Assignee: Thiruvel Thirumoolan > Attachments: HIVE-17466.1.patch > > > Raising this on behalf of [~thiruvel], who wrote this initially as part of a > tangential "data-discovery" system. > Programs like Apache Oozie, Apache Falcon (or Yahoo GDM), etc. launch > workflows based on the availability of table/partitions. Partitions are > currently discovered by listing partitions using (what boils down to) > {{HiveMetaStoreClient.listPartitions()}}. This can be slow and cumbersome, > given that {{Partition}} objects are heavyweight and carry redundant > information. The alternative is to use partition-names, which will need > client-side parsing to extract part-key values. > When checking which hourly partitions for a particular day have been > published already, it would be preferable to have an API that pushed down > part-key extraction into the {{RawStore}} layer, and returned key-values as > the result. This would be similar to how {{SELECT DISTINCT part_key FROM > my_table;}} would run, but at the {{HiveMetaStoreClient}} level. > Here's what we've been using at Yahoo. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
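The client-side workaround alluded to in the description above, listing partition names and parsing the part-key values out of them, can be sketched as follows. This is an illustrative sketch only, not part of the proposed patch: the `key=value/key=value` partition-name layout is assumed (escaping of special characters in values is ignored), and the `DistinctPartKeys`/`distinctValues` names are hypothetical. The proposed API would push this extraction down into the {{RawStore}} layer instead of doing it on the client.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: emulate "SELECT DISTINCT part_key" on the client,
// given partition names of the form "dt=2017-09-07/hour=03" as returned
// by HiveMetaStoreClient.listPartitionNames().
public class DistinctPartKeys {

    static Set<String> distinctValues(List<String> partitionNames, String partKey) {
        Set<String> values = new TreeSet<>();
        for (String name : partitionNames) {
            for (String kv : name.split("/")) {
                int eq = kv.indexOf('=');
                if (eq > 0 && kv.substring(0, eq).equals(partKey)) {
                    values.add(kv.substring(eq + 1)); // the part-key value
                }
            }
        }
        return values;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList(
                "dt=2017-09-06/hour=23",
                "dt=2017-09-07/hour=00",
                "dt=2017-09-07/hour=01");
        // Which days have published partitions?
        System.out.println(distinctValues(names, "dt")); // prints [2017-09-06, 2017-09-07]
    }
}
```

This is exactly the "client-side parsing" cost the description wants to avoid: every partition name crosses the wire even though only the distinct day values are needed.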
[jira] [Comment Edited] (HIVE-17473) Hive WM: implement workload management pools
[ https://issues.apache.org/jira/browse/HIVE-17473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16156345#comment-16156345 ] Sergey Shelukhin edited comment on HIVE-17473 at 9/7/17 2:37 AM: - On top of the low level WM patch. Needs tests, and also to use real workload management schema instead of dummy classes eventually. cc [~prasanth_j] for reference this is what I'm using for now. WorkloadManager has some dummy classes I'm using where needed. was (Author: sershe): On top of the low level WM patch. > Hive WM: implement workload management pools > > > Key: HIVE-17473 > URL: https://issues.apache.org/jira/browse/HIVE-17473 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-17473.WIP.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17473) Hive WM: implement workload management pools
[ https://issues.apache.org/jira/browse/HIVE-17473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16156346#comment-16156346 ] Sergey Shelukhin commented on HIVE-17473: - Hmm, I realized we still don't have an umbrella and one pager. I was intending to create them after HIVE-17386 patch. Will do tomorrow. > Hive WM: implement workload management pools > > > Key: HIVE-17473 > URL: https://issues.apache.org/jira/browse/HIVE-17473 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-17473.WIP.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17473) Hive WM: implement workload management pools
[ https://issues.apache.org/jira/browse/HIVE-17473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-17473: Attachment: HIVE-17473.WIP.patch On top of the low level WM patch. > Hive WM: implement workload management pools > > > Key: HIVE-17473 > URL: https://issues.apache.org/jira/browse/HIVE-17473 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-17473.WIP.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Assigned] (HIVE-17473) Hive WM: implement workload management pools
[ https://issues.apache.org/jira/browse/HIVE-17473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin reassigned HIVE-17473: --- > Hive WM: implement workload management pools > > > Key: HIVE-17473 > URL: https://issues.apache.org/jira/browse/HIVE-17473 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17468) Shade and package appropriate jackson version for druid storage handler
[ https://issues.apache.org/jira/browse/HIVE-17468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16156319#comment-16156319 ] Sergey Shelukhin commented on HIVE-17468: - Hmm.. wouldn't this in turn break the calcite dependency, especially since it's using the newer version? > Shade and package appropriate jackson version for druid storage handler > --- > > Key: HIVE-17468 > URL: https://issues.apache.org/jira/browse/HIVE-17468 > Project: Hive > Issue Type: Bug >Reporter: slim bouguerra >Assignee: Jesus Camacho Rodriguez > Fix For: 3.0.0 > > Attachments: HIVE-17468.patch, hive-druid-deps.txt > > > Currently we are excluding all the jackson core dependencies coming from > druid. This is wrong in my opinion since this will lead to the packaging of > unwanted jackson libraries from other projects. > As you can see in the file hive-druid-deps.txt, jackson core is currently coming > from calcite, and the version is 2.6.3, which is very different from the 2.4.6 used > by druid. This patch excludes the unwanted jars and makes sure to bring in > the druid jackson dependency from druid itself. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17466) Metastore API to list unique partition-key-value combinations
[ https://issues.apache.org/jira/browse/HIVE-17466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16156318#comment-16156318 ] Hive QA commented on HIVE-17466: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12885664/HIVE-17466.1.patch {color:red}ERROR:{color} -1 due to build exiting with an error Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6701/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6701/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6701/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Tests exited with: NonZeroExitCodeException Command 'bash /data/hiveptest/working/scratch/source-prep.sh' failed with exit status 1 and output '+ date '+%Y-%m-%d %T.%3N' 2017-09-07 02:00:42.308 + [[ -n /usr/lib/jvm/java-8-openjdk-amd64 ]] + export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64 + JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64 + export PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games + PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games + export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m ' + ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m ' + export 'MAVEN_OPTS=-Xmx1g ' + MAVEN_OPTS='-Xmx1g ' + cd /data/hiveptest/working/ + tee /data/hiveptest/logs/PreCommit-HIVE-Build-6701/source-prep.txt + [[ false == \t\r\u\e ]] + mkdir -p maven ivy + [[ git = \s\v\n ]] + [[ git = \g\i\t ]] + [[ -z master ]] + [[ -d apache-github-source-source ]] + [[ ! -d apache-github-source-source/.git ]] + [[ ! 
-d apache-github-source-source ]] + date '+%Y-%m-%d %T.%3N' 2017-09-07 02:00:42.311 + cd apache-github-source-source + git fetch origin + git reset --hard HEAD HEAD is now at 849fa02 HIVE-17455: External LLAP client: connection to HS2 should be kept open until explicitly closed (Jason Dere, reviewed by Sergey Shelukhin) + git clean -f -d + git checkout master Already on 'master' Your branch is up-to-date with 'origin/master'. + git reset --hard origin/master HEAD is now at 849fa02 HIVE-17455: External LLAP client: connection to HS2 should be kept open until explicitly closed (Jason Dere, reviewed by Sergey Shelukhin) + git merge --ff-only origin/master Already up-to-date. + date '+%Y-%m-%d %T.%3N' 2017-09-07 02:00:47.758 + patchCommandPath=/data/hiveptest/working/scratch/smart-apply-patch.sh + patchFilePath=/data/hiveptest/working/scratch/build.patch + [[ -f /data/hiveptest/working/scratch/build.patch ]] + chmod +x /data/hiveptest/working/scratch/smart-apply-patch.sh + /data/hiveptest/working/scratch/smart-apply-patch.sh /data/hiveptest/working/scratch/build.patch Going to apply patch with: patch -p1 patching file metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java patching file metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClient.java patching file metastore/src/java/org/apache/hadoop/hive/metastore/IMetaStoreClient.java patching file metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java patching file metastore/src/java/org/apache/hadoop/hive/metastore/RawStore.java patching file metastore/src/java/org/apache/hadoop/hive/metastore/cache/CachedStore.java patching file metastore/src/test/org/apache/hadoop/hive/metastore/DummyRawStoreControlledCommit.java patching file metastore/src/test/org/apache/hadoop/hive/metastore/DummyRawStoreForJdoConnection.java patching file standalone-metastore/src/gen/thrift/gen-cpp/ThriftHiveMetastore.cpp patching file standalone-metastore/src/gen/thrift/gen-cpp/ThriftHiveMetastore.h 
patching file standalone-metastore/src/gen/thrift/gen-cpp/ThriftHiveMetastore_server.skeleton.cpp patching file standalone-metastore/src/gen/thrift/gen-cpp/hive_metastore_types.cpp patching file standalone-metastore/src/gen/thrift/gen-cpp/hive_metastore_types.h patching file standalone-metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/AbortTxnsRequest.java patching file standalone-metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/AddDynamicPartitions.java patching file standalone-metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/ClearFileMetadataRequest.java patching file standalone-metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/ClientCapabilities.java patching file standalone-metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/CompactionRequest.java patching file standalone-metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/FireEventRequest.java patching file standalone-metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/Function.java patching file standalone-metastore/src/gen/thrift/gen-javabe
[jira] [Commented] (HIVE-17459) View deletion operation failed to replicate on target cluster
[ https://issues.apache.org/jira/browse/HIVE-17459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16156316#comment-16156316 ] Hive QA commented on HIVE-17459: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12885660/HIVE-17459.1.patch {color:red}ERROR:{color} -1 due to no test(s) being added or modified. {color:red}ERROR:{color} -1 due to 8 failed/errored test(s), 11028 tests executed *Failed tests:* {noformat} TestAccumuloCliDriver - did not produce a TEST-*.xml file (likely timed out) (batchId=230) TestDummy - did not produce a TEST-*.xml file (likely timed out) (batchId=230) TestTxnCommandsBase - did not produce a TEST-*.xml file (likely timed out) (batchId=280) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[create_view] (batchId=39) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[insert_values_orig_table_use_metadata] (batchId=61) org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[drop_table_failure2] (batchId=89) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] (batchId=234) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query23] (batchId=234) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6700/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6700/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6700/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 8 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12885660 - PreCommit-HIVE-Build > View deletion operation failed to replicate on target cluster > - > > Key: HIVE-17459 > URL: https://issues.apache.org/jira/browse/HIVE-17459 > Project: Hive > Issue Type: Bug >Reporter: Tao Li >Assignee: Tao Li > Attachments: HIVE-17459.1.patch > > > View dropping is not replicated during incremental repl. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17414) HoS DPP + Vectorization generates invalid explain plan due to CombineEquivalentWorkResolver
[ https://issues.apache.org/jira/browse/HIVE-17414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16156308#comment-16156308 ] liyunzhang_intel commented on HIVE-17414: - thanks for [~lirui] and [~stakiar]'s review > HoS DPP + Vectorization generates invalid explain plan due to > CombineEquivalentWorkResolver > --- > > Key: HIVE-17414 > URL: https://issues.apache.org/jira/browse/HIVE-17414 > Project: Hive > Issue Type: Sub-task > Components: Spark >Reporter: Sahil Takiar >Assignee: liyunzhang_intel > Fix For: 3.0.0 > > Attachments: HIVE-17414.1.patch, HIVE-17414.2.patch, > HIVE-17414.3.patch, HIVE-17414.4.patch, HIVE-17414.5.patch, HIVE-17414.patch > > > Similar to HIVE-16948, the following query generates an invalid explain plan > when HoS DPP is enabled + vectorization: > {code:sql} > select ds from (select distinct(ds) as ds from srcpart union all select > distinct(ds) as ds from srcpart) s where s.ds in (select max(srcpart.ds) from > srcpart union all select min(srcpart.ds) from srcpart) > {code} > Explain Plan: > {code} > STAGE DEPENDENCIES: > Stage-2 is a root stage > Stage-1 depends on stages: Stage-2 > Stage-0 depends on stages: Stage-1 > STAGE PLANS: > Stage: Stage-2 > Spark > Edges: > Reducer 11 <- Map 10 (GROUP, 1) > Reducer 13 <- Map 12 (GROUP, 1) > A masked pattern was here > Vertices: > Map 10 > Map Operator Tree: > TableScan > alias: srcpart > Statistics: Num rows: 2000 Data size: 21248 Basic stats: > COMPLETE Column stats: NONE > Select Operator > expressions: ds (type: string) > outputColumnNames: ds > Statistics: Num rows: 2000 Data size: 21248 Basic stats: > COMPLETE Column stats: NONE > Group By Operator > aggregations: max(ds) > mode: hash > outputColumnNames: _col0 > Statistics: Num rows: 1 Data size: 184 Basic stats: > COMPLETE Column stats: NONE > Reduce Output Operator > sort order: > Statistics: Num rows: 1 Data size: 184 Basic stats: > COMPLETE Column stats: NONE > value expressions: _col0 (type: 
string) > Execution mode: vectorized > Map 12 > Map Operator Tree: > TableScan > alias: srcpart > Statistics: Num rows: 2000 Data size: 21248 Basic stats: > COMPLETE Column stats: NONE > Select Operator > expressions: ds (type: string) > outputColumnNames: ds > Statistics: Num rows: 2000 Data size: 21248 Basic stats: > COMPLETE Column stats: NONE > Group By Operator > aggregations: min(ds) > mode: hash > outputColumnNames: _col0 > Statistics: Num rows: 1 Data size: 184 Basic stats: > COMPLETE Column stats: NONE > Reduce Output Operator > sort order: > Statistics: Num rows: 1 Data size: 184 Basic stats: > COMPLETE Column stats: NONE > value expressions: _col0 (type: string) > Execution mode: vectorized > Reducer 11 > Execution mode: vectorized > Reduce Operator Tree: > Group By Operator > aggregations: max(VALUE._col0) > mode: mergepartial > outputColumnNames: _col0 > Statistics: Num rows: 1 Data size: 184 Basic stats: COMPLETE > Column stats: NONE > Filter Operator > predicate: _col0 is not null (type: boolean) > Statistics: Num rows: 1 Data size: 184 Basic stats: > COMPLETE Column stats: NONE > Group By Operator > keys: _col0 (type: string) > mode: hash > outputColumnNames: _col0 > Statistics: Num rows: 2 Data size: 368 Basic stats: > COMPLETE Column stats: NONE > Select Operator > expressions: _col0 (type: string) > outputColumnNames: _col0 > Statistics: Num rows: 2 Data size: 368 Basic stats: > COMPLETE Column stats: NONE > Group By Operator > keys: _col0 (type: string) > mode: hash > outputColu
[jira] [Commented] (HIVE-17261) Hive use deprecated ParquetInputSplit constructor which blocked parquet dictionary filter
[ https://issues.apache.org/jira/browse/HIVE-17261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16156286#comment-16156286 ] Junjie Chen commented on HIVE-17261: Thanks, [~Ferd]. As for 4: since jobconf is a member variable, it doesn't need to be passed explicitly. > Hive use deprecated ParquetInputSplit constructor which blocked parquet > dictionary filter > - > > Key: HIVE-17261 > URL: https://issues.apache.org/jira/browse/HIVE-17261 > Project: Hive > Issue Type: Improvement > Components: Database/Schema >Affects Versions: 2.2.0 >Reporter: Junjie Chen >Assignee: Junjie Chen > Attachments: HIVE-17261.2.patch, HIVE-17261.3.patch, > HIVE-17261.4.patch, HIVE-17261.5.patch, HIVE-17261.diff, HIVE-17261.patch > > > Hive uses the deprecated ParquetInputSplit in > [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/parquet/ParquetRecordReaderBase.java#L128] > Please see the interface definition in > [https://github.com/apache/parquet-mr/blob/master/parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetInputSplit.java#L80] > The old interface sets rowGroupOffsets values, which leads to skipping the dictionary > filter in parquet. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17261) Hive use deprecated ParquetInputSplit constructor which blocked parquet dictionary filter
[ https://issues.apache.org/jira/browse/HIVE-17261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junjie Chen updated HIVE-17261: --- Attachment: HIVE-17261.5.patch > Hive use deprecated ParquetInputSplit constructor which blocked parquet > dictionary filter > - > > Key: HIVE-17261 > URL: https://issues.apache.org/jira/browse/HIVE-17261 > Project: Hive > Issue Type: Improvement > Components: Database/Schema >Affects Versions: 2.2.0 >Reporter: Junjie Chen >Assignee: Junjie Chen > Attachments: HIVE-17261.2.patch, HIVE-17261.3.patch, > HIVE-17261.4.patch, HIVE-17261.5.patch, HIVE-17261.diff, HIVE-17261.patch > > > Hive uses the deprecated ParquetInputSplit in > [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/parquet/ParquetRecordReaderBase.java#L128] > Please see the interface definition in > [https://github.com/apache/parquet-mr/blob/master/parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetInputSplit.java#L80] > The old interface sets rowGroupOffsets values, which leads to skipping the dictionary > filter in parquet. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17472) Drop-partition for multi-level partition fails, if data does not exist.
[ https://issues.apache.org/jira/browse/HIVE-17472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mithun Radhakrishnan updated HIVE-17472: Status: Patch Available (was: Open) > Drop-partition for multi-level partition fails, if data does not exist. > --- > > Key: HIVE-17472 > URL: https://issues.apache.org/jira/browse/HIVE-17472 > Project: Hive > Issue Type: Bug > Components: Metastore >Affects Versions: 2.2.0, 3.0.0 >Reporter: Mithun Radhakrishnan >Assignee: Chris Drome > Attachments: HIVE-17472.1.patch, HIVE-17472.2.patch > > > Raising this on behalf of [~cdrome] and [~selinazh]. > Here's how to reproduce the problem: > {code:sql} > CREATE TABLE foobar ( foo STRING, bar STRING ) PARTITIONED BY ( dt STRING, > region STRING ) STORED AS RCFILE LOCATION '/tmp/foobar'; > ALTER TABLE foobar ADD PARTITION ( dt='1', region='A' ) ; > dfs -rm -R -skipTrash /tmp/foobar/dt=1; > ALTER TABLE foobar DROP PARTITION ( dt='1' ); > {code} > This causes a client-side error as follows: > {code} > 15/02/26 23:08:32 ERROR exec.DDLTask: > org.apache.hadoop.hive.ql.metadata.HiveException: Unknown error. Please check > logs. > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17472) Drop-partition for multi-level partition fails, if data does not exist.
[ https://issues.apache.org/jira/browse/HIVE-17472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mithun Radhakrishnan updated HIVE-17472: Attachment: HIVE-17472.2.patch And now, with tests. > Drop-partition for multi-level partition fails, if data does not exist. > --- > > Key: HIVE-17472 > URL: https://issues.apache.org/jira/browse/HIVE-17472 > Project: Hive > Issue Type: Bug > Components: Metastore >Affects Versions: 2.2.0, 3.0.0 >Reporter: Mithun Radhakrishnan >Assignee: Chris Drome > Attachments: HIVE-17472.1.patch, HIVE-17472.2.patch > > > Raising this on behalf of [~cdrome] and [~selinazh]. > Here's how to reproduce the problem: > {code:sql} > CREATE TABLE foobar ( foo STRING, bar STRING ) PARTITIONED BY ( dt STRING, > region STRING ) STORED AS RCFILE LOCATION '/tmp/foobar'; > ALTER TABLE foobar ADD PARTITION ( dt='1', region='A' ) ; > dfs -rm -R -skipTrash /tmp/foobar/dt=1; > ALTER TABLE foobar DROP PARTITION ( dt='1' ); > {code} > This causes a client-side error as follows: > {code} > 15/02/26 23:08:32 ERROR exec.DDLTask: > org.apache.hadoop.hive.ql.metadata.HiveException: Unknown error. Please check > logs. > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17472) Drop-partition for multi-level partition fails, if data does not exist.
[ https://issues.apache.org/jira/browse/HIVE-17472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mithun Radhakrishnan updated HIVE-17472: Status: Open (was: Patch Available) > Drop-partition for multi-level partition fails, if data does not exist. > --- > > Key: HIVE-17472 > URL: https://issues.apache.org/jira/browse/HIVE-17472 > Project: Hive > Issue Type: Bug > Components: Metastore >Affects Versions: 2.2.0, 3.0.0 >Reporter: Mithun Radhakrishnan >Assignee: Chris Drome > Attachments: HIVE-17472.1.patch > > > Raising this on behalf of [~cdrome] and [~selinazh]. > Here's how to reproduce the problem: > {code:sql} > CREATE TABLE foobar ( foo STRING, bar STRING ) PARTITIONED BY ( dt STRING, > region STRING ) STORED AS RCFILE LOCATION '/tmp/foobar'; > ALTER TABLE foobar ADD PARTITION ( dt='1', region='A' ) ; > dfs -rm -R -skipTrash /tmp/foobar/dt=1; > ALTER TABLE foobar DROP PARTITION ( dt='1' ); > {code} > This causes a client-side error as follows: > {code} > 15/02/26 23:08:32 ERROR exec.DDLTask: > org.apache.hadoop.hive.ql.metadata.HiveException: Unknown error. Please check > logs. > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17152) Improve security of random generator for HS2 cookies
[ https://issues.apache.org/jira/browse/HIVE-17152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16156259#comment-16156259 ] Thejas M Nair commented on HIVE-17152: -- +1 > Improve security of random generator for HS2 cookies > > > Key: HIVE-17152 > URL: https://issues.apache.org/jira/browse/HIVE-17152 > Project: Hive > Issue Type: Bug > Components: HiveServer2 >Reporter: Tao Li >Assignee: Tao Li > Attachments: HIVE-17152.1.patch > > > The random number generated is used as a secret to append to a sequence and > SHA to implement a CookieSigner. If this is attackable, then it's possible > for an attacker to sign a cookie as if we had. We should fix this and use > SecureRandom as a stronger random function . > HTTPAuthUtils has a similar issue. If that is attackable, an attacker might > be able to create a similar cookie. Paired with the above issue with the > CookieSigner, it could reasonably spoof a HS2 cookie. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
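The gist of the fix described above, deriving the signing secret from {{SecureRandom}} rather than a predictable generator, can be illustrated with a minimal, self-contained sketch. This is not HS2's actual {{CookieSigner}} code: the class and method names below are invented for illustration, and the hash construction (SHA-256 over value plus secret) is only an approximation of the scheme the description outlines.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.SecureRandom;
import java.util.Base64;

// Illustrative sketch (not HS2's CookieSigner): sign a cookie value with
// SHA-256 over (value + secret), where the secret comes from SecureRandom
// instead of a predictable java.util.Random sequence.
public class CookieSignerSketch {
    private final byte[] secret = new byte[32];

    public CookieSignerSketch() {
        new SecureRandom().nextBytes(secret); // cryptographically strong secret
    }

    public String sign(String value) throws Exception {
        MessageDigest sha = MessageDigest.getInstance("SHA-256");
        sha.update(value.getBytes(StandardCharsets.UTF_8));
        sha.update(secret);
        return value + "&s=" + Base64.getEncoder().encodeToString(sha.digest());
    }

    public boolean verify(String signed) throws Exception {
        int idx = signed.lastIndexOf("&s=");
        if (idx < 0) return false;
        byte[] expected = sign(signed.substring(0, idx)).getBytes(StandardCharsets.UTF_8);
        // MessageDigest.isEqual performs a constant-time comparison
        return MessageDigest.isEqual(expected, signed.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws Exception {
        CookieSignerSketch signer = new CookieSignerSketch();
        String signed = signer.sign("hs2-session=abc123");
        System.out.println(signer.verify(signed));             // true
        System.out.println(signer.verify("hs2-session=evil"
                + signed.substring(signed.lastIndexOf("&s=")))); // false
    }
}
```

The point of the JIRA is the first line of the constructor: if the secret were drawn from an attackable generator, an adversary who reconstructs it can compute the same digest and forge a valid cookie.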
[jira] [Commented] (HIVE-17450) rename TestTxnCommandsBase
[ https://issues.apache.org/jira/browse/HIVE-17450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16156249#comment-16156249 ] Hive QA commented on HIVE-17450: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12885655/HIVE-17450.02.patch {color:green}SUCCESS:{color} +1 due to 3 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 11014 tests executed *Failed tests:* {noformat} TestAccumuloCliDriver - did not produce a TEST-*.xml file (likely timed out) (batchId=230) TestDummy - did not produce a TEST-*.xml file (likely timed out) (batchId=230) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[insert_values_orig_table_use_metadata] (batchId=61) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query23] (batchId=234) org.apache.hadoop.hive.cli.TestSparkCliDriver.org.apache.hadoop.hive.cli.TestSparkCliDriver (batchId=103) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6699/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6699/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6699/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 5 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12885655 - PreCommit-HIVE-Build > rename TestTxnCommandsBase > --- > > Key: HIVE-17450 > URL: https://issues.apache.org/jira/browse/HIVE-17450 > Project: Hive > Issue Type: Bug >Reporter: Zoltan Haindrich >Assignee: Peter Vary > Attachments: HIVE-17450.02.patch, HIVE-17450.patch > > > TestTxnCommandsBase is an abstract class, added in HIVE-17205; it matches the > maven test pattern...because of that there is a failing test in every test > output -- This message was sent by Atlassian JIRA (v6.4.14#64029)
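For context: Maven Surefire's default include patterns (e.g. {{**/Test*.java}}) pick up any class whose name starts with {{Test}}, so an abstract base class named {{TestTxnCommandsBase}} gets scheduled as a test, which is why, per the description, a failure shows up in every run. The patch fixes this by renaming the class; the alternative, shown here as an illustrative pom fragment that is not taken from the patch, is an explicit exclusion:

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-surefire-plugin</artifactId>
  <configuration>
    <excludes>
      <!-- abstract base class, not a runnable test -->
      <exclude>**/TestTxnCommandsBase.java</exclude>
    </excludes>
  </configuration>
</plugin>
```

Renaming is usually the cleaner choice, since it avoids special-casing the build configuration for one class.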
[jira] [Updated] (HIVE-17448) ArrayIndexOutOfBoundsException on ORC tables after adding a struct field
[ https://issues.apache.org/jira/browse/HIVE-17448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nikolay Sokolov updated HIVE-17448: --- Description: When ORC files have been created with older schema, which had smaller set of struct fields, and schema have been changed to one with more struct fields, and there are sibling fields of struct type going after struct itself, ArrayIndexOutOfBoundsException is being thrown. Steps to reproduce: {code:none} create external table test_broken_struct(a struct, b int) stored as orc; insert into table test_broken_struct select named_struct("f1", 1, "f2", 2), 3; drop table test_broken_struct; create external table test_broken_struct(a struct, b int) stored as orc; select * from test_broken_struct; {code} Same scenario is not causing crash on hive 0.14. Debug log and stack trace: {code:none} 2017-09-07T00:21:40,266 INFO [main] orc.OrcInputFormat: Using schema evolution configuration variables schema.evol ution.columns [a, b] / schema.evolution.columns.types [struct, int] (isAcidRead false) 2017-09-07T00:21:40,267 DEBUG [main] orc.OrcInputFormat: No ORC pushdown predicate 2017-09-07T00:21:40,267 INFO [main] orc.ReaderImpl: Reading ORC rows from hdfs://cluster-7199-m/user/hive/warehous e/test_broken_struct/00_0 with {include: [true, true, true, true, true], offset: 3, length: 159, schema: struct ,b:int>} Failed with exception java.io.IOException:java.lang.ArrayIndexOutOfBoundsException: 5 2017-09-07T00:21:40,273 ERROR [main] CliDriver: Failed with exception java.io.IOException:java.lang.ArrayIndexOutOf BoundsException: 5 java.io.IOException: java.lang.ArrayIndexOutOfBoundsException: 5 at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:521) at org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:428) at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:146) at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:2098) at 
org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:252) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:183) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:399) at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:776) at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:714) at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:641) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.hadoop.util.RunJar.run(RunJar.java:221) at org.apache.hadoop.util.RunJar.main(RunJar.java:136) Caused by: java.lang.ArrayIndexOutOfBoundsException: 5 at org.apache.orc.impl.SchemaEvolution.buildConversionFileTypesArray(SchemaEvolution.java:195) at org.apache.orc.impl.SchemaEvolution.buildConversionFileTypesArray(SchemaEvolution.java:253) at org.apache.orc.impl.SchemaEvolution.&lt;init&gt;(SchemaEvolution.java:59) at org.apache.orc.impl.RecordReaderImpl.&lt;init&gt;(RecordReaderImpl.java:149) at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.&lt;init&gt;(RecordReaderImpl.java:63) at org.apache.hadoop.hive.ql.io.orc.ReaderImpl.rowsOptions(ReaderImpl.java:87) at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.createReaderFromFile(OrcInputFormat.java:314) at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$OrcRecordReader.&lt;init&gt;(OrcInputFormat.java:225) at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getRecordReader(OrcInputFormat.java:1691) at org.apache.hadoop.hive.ql.exec.FetchOperator$FetchInputFormatSplit.getRecordReader(FetchOperator.java:695) at org.apache.hadoop.hive.ql.exec.FetchOperator.getRecordReader(FetchOperator.java:333) at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:459) ... 
15 more {code} was: When ORC files have been created with an older schema that had a smaller set of struct fields, the schema has since been changed to one with more struct fields, and there are sibling fields going after the struct itself, an ArrayIndexOutOfBoundsException is thrown. Steps to reproduce: {code:none} create external table test_broken_struct(a struct, b int); insert into table test_broken_struct select named_struct("f1", 1, "f2", 2), 3; drop table test_broken_struct; create external table test_broken_struct(a struct, b int); select * from test_broken_struct; {code} The same scenario does not cause a crash on Hive 0.14. > ArrayIndexOutOfBoundsException on ORC tables after adding a struct field > --
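The index arithmetic behind the exception above can be illustrated with a toy model of ORC's pre-order type numbering. This is a sketch only: the `Node` class, the field names, and the extra field `f3` are illustrative assumptions, and the real logic lives in `org.apache.orc.impl.SchemaEvolution.buildConversionFileTypesArray`. Every type in an ORC schema, including the root struct, gets an id in depth-first order, so adding a field inside struct `a` shifts the id of its sibling `b` past the end of the file's type array:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class OrcTypeIdShift {
    // Illustrative schema node: a leaf column or a struct with children.
    static final class Node {
        final String name;
        final List<Node> children = new ArrayList<>();
        Node(String name, Node... kids) {
            this.name = name;
            for (Node k : kids) children.add(k);
        }
    }

    // Depth-first pre-order numbering, mimicking ORC type ids.
    static Map<String, Integer> preorderIds(Node root) {
        Map<String, Integer> ids = new LinkedHashMap<>();
        walk(root, ids);
        return ids;
    }

    private static void walk(Node n, Map<String, Integer> ids) {
        ids.put(n.name, ids.size());
        for (Node c : n.children) walk(c, ids);
    }

    public static void main(String[] args) {
        // File schema root(a(f1, f2), b): 5 type ids, 0..4, so b = 4.
        Node fileSchema = new Node("root",
                new Node("a", new Node("f1"), new Node("f2")), new Node("b"));
        // Reader schema after the struct gained a (hypothetical) field f3:
        // b shifts to id 5, one past the file's 5-entry type array -> AIOOBE: 5.
        Node readerSchema = new Node("root",
                new Node("a", new Node("f1"), new Node("f2"), new Node("f3")),
                new Node("b"));
        System.out.println(preorderIds(fileSchema).get("b"));   // 4
        System.out.println(preorderIds(readerSchema).get("b")); // 5
    }
}
```

The `5` in the last line matches the `ArrayIndexOutOfBoundsException: 5` in the stack trace above, which is consistent with the new schema having exactly one extra struct field.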
[jira] [Commented] (HIVE-17460) `insert overwrite` should support table schema evolution (e.g. add columns)
[ https://issues.apache.org/jira/browse/HIVE-17460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16156233#comment-16156233 ] Matt McCline commented on HIVE-17460: - I don't think this is right -- you will end up with upset customers because query results will be different. Unfortunately, the current semantics of adding a column are that the default behavior is RESTRICT, not CASCADE. RESTRICT means the partition schemas do not get updated with the new columns. Thus, the new columns default to NULL when queried. In order to get the behavior you are talking about, you would need to specify the CASCADE option. So I'm a -1 on this change. [~wzheng] > `insert overwrite` should support table schema evolution (e.g. add columns) > --- > > Key: HIVE-17460 > URL: https://issues.apache.org/jira/browse/HIVE-17460 > Project: Hive > Issue Type: Bug >Affects Versions: 2.1.0, 2.2.0 >Reporter: Chaozhong Yang >Assignee: Chaozhong Yang > Fix For: 3.0.0 > > Attachments: HIVE-17460.2.patch, HIVE-17460.patch > > > In Hive, adding columns to an existing table is a common use case. However, if > we insert overwrite older partitions after adding columns, the added columns will > not be populated. > ``` > create table src_table( > i int > ) > PARTITIONED BY (`date` string); > insert overwrite table src_table partition(`date`='20170905') values (3); > select * from src_table where `date` = '20170905'; > alter table src_table add columns (bi bigint); > insert overwrite table src_table partition(`date`='20170905') values (3, 5); > select * from src_table where `date` = '20170905'; > ``` > The result will be as follows: > ``` > 3, NULL, '20170905' > ``` > Obviously, this doesn't meet our expectations. The expected result should be: > ``` > 3, 5, '20170905' > ``` -- This message was sent by Atlassian JIRA (v6.4.14#64029)
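The RESTRICT-vs-CASCADE behavior described in the comment above can be sketched with a heavily simplified toy model. This is an assumption-laden illustration, not Hive's real implementation (that lives in the metastore): the class, method names, and the string-based "row" are all hypothetical, and it only shows why a column added with the default RESTRICT reads back as NULL for existing partitions:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class AddColumnSemantics {
    // Toy model: the table schema, plus a per-partition copy of the schema.
    final List<String> tableCols = new ArrayList<>();
    final Map<String, List<String>> partitionCols = new HashMap<>();

    AddColumnSemantics(String partition, String... cols) {
        tableCols.addAll(Arrays.asList(cols));
        partitionCols.put(partition, new ArrayList<>(tableCols));
    }

    // RESTRICT (cascade == false, the default): only the table schema grows;
    // CASCADE also updates every existing partition schema.
    void addColumn(String col, boolean cascade) {
        tableCols.add(col);
        if (cascade) {
            for (List<String> cols : partitionCols.values()) cols.add(col);
        }
    }

    // Reading a partition: columns missing from the partition schema are NULL,
    // even if a value was physically written for them.
    List<String> read(String partition, Map<String, String> storedRow) {
        List<String> row = new ArrayList<>();
        List<String> visible = partitionCols.get(partition);
        for (String col : tableCols) {
            row.add(visible.contains(col) ? storedRow.get(col) : null);
        }
        return row;
    }

    public static void main(String[] args) {
        AddColumnSemantics t = new AddColumnSemantics("date=20170905", "i");
        t.addColumn("bi", false); // RESTRICT: partition schema unchanged
        // (3, 5) was written, but "bi" reads back as NULL for this partition.
        System.out.println(t.read("date=20170905", Map.of("i", "3", "bi", "5")));
    }
}
```

With `addColumn("bi", true)` (the CASCADE path) the same read would return both values, which mirrors the suggestion to use `ALTER TABLE ... ADD COLUMNS ... CASCADE` instead of changing the `insert overwrite` semantics.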
[jira] [Commented] (HIVE-17421) Clear incorrect stats after replication
[ https://issues.apache.org/jira/browse/HIVE-17421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16156231#comment-16156231 ] Daniel Dai commented on HIVE-17421: --- [~anishek], can you review? > Clear incorrect stats after replication > --- > > Key: HIVE-17421 > URL: https://issues.apache.org/jira/browse/HIVE-17421 > Project: Hive > Issue Type: Bug > Components: repl >Reporter: Daniel Dai >Assignee: Daniel Dai > Attachments: HIVE-17421.1.patch, HIVE-17421.2.patch > > > After replication, some stats summaries are incorrect. If > hive.compute.query.using.stats is set to true, we will get wrong results on the > destination side. > This will not happen with bootstrap replication, because the stats > summaries are in table properties and will be replicated to the destination. > However, in incremental replication, this won't work. When the table is created, > the stats summaries are empty (e.g., numRows=0). Later, when we insert data, the stats > summaries are updated with > update_table_column_statistics/update_partition_column_statistics; however, > neither event is captured in incremental replication. Thus, on the > destination side, we will get count\(*\)=0. The simple solution is to remove the > COLUMN_STATS_ACCURATE property after incremental replication. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
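The "simple solution" described above can be sketched as follows. This is a minimal illustration under stated assumptions: table properties are modeled as a plain `Map<String, String>` and the helper name is hypothetical; the real fix operates on the metastore `Table` object in the patch attached to HIVE-17421:

```java
import java.util.HashMap;
import java.util.Map;

public class StatsReset {
    // Hypothetical helper: strip the accuracy marker so the destination side
    // stops trusting stale summary stats (e.g. numRows=0) and falls back to
    // scanning the data for count(*)-style queries.
    static Map<String, String> clearStatsAccuracy(Map<String, String> tableParams) {
        Map<String, String> cleaned = new HashMap<>(tableParams);
        cleaned.remove("COLUMN_STATS_ACCURATE");
        return cleaned;
    }

    public static void main(String[] args) {
        Map<String, String> params = new HashMap<>();
        params.put("COLUMN_STATS_ACCURATE", "{\"BASIC_STATS\":\"true\"}");
        params.put("numRows", "0");
        // After the cleanup, hive.compute.query.using.stats can no longer
        // short-circuit count(*) to the stale numRows value.
        System.out.println(clearStatsAccuracy(params).containsKey("COLUMN_STATS_ACCURATE"));
    }
}
```

Note the stale `numRows` itself is left in place; removing only `COLUMN_STATS_ACCURATE` is enough because Hive only uses the summary when that flag marks it as accurate.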
[jira] [Commented] (HIVE-17468) Shade and package appropriate jackson version for druid storage handler
[ https://issues.apache.org/jira/browse/HIVE-17468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16156228#comment-16156228 ] slim bouguerra commented on HIVE-17468: --- It was pulling random stuff from transitive dependencies. bq. slim bouguerra, before we pulled the calcite-druid package, were we then packing the hive jackson dependencies into the uber jar? How was that ever working? > Shade and package appropriate jackson version for druid storage handler > --- > > Key: HIVE-17468 > URL: https://issues.apache.org/jira/browse/HIVE-17468 > Project: Hive > Issue Type: Bug >Reporter: slim bouguerra >Assignee: Jesus Camacho Rodriguez > Fix For: 3.0.0 > > Attachments: HIVE-17468.patch, hive-druid-deps.txt > > > Currently we are excluding all the jackson core dependencies coming from > druid. This is wrong in my opinion, since it will lead to the packaging of > unwanted jackson libraries from other projects. > As you can see in the file hive-druid-deps.txt, jackson core is currently coming > from calcite, and its version is 2.6.3, which is very different from the 2.4.6 used > by druid. This patch excludes the unwanted jars and makes sure to bring in > the druid jackson dependency from druid itself. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17472) Drop-partition for multi-level partition fails, if data does not exist.
[ https://issues.apache.org/jira/browse/HIVE-17472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mithun Radhakrishnan updated HIVE-17472: Affects Version/s: 3.0.0 2.2.0 Status: Patch Available (was: Open) Submitting for tests. > Drop-partition for multi-level partition fails, if data does not exist. > --- > > Key: HIVE-17472 > URL: https://issues.apache.org/jira/browse/HIVE-17472 > Project: Hive > Issue Type: Bug > Components: Metastore >Affects Versions: 2.2.0, 3.0.0 >Reporter: Mithun Radhakrishnan >Assignee: Chris Drome > Attachments: HIVE-17472.1.patch > > > Raising this on behalf of [~cdrome] and [~selinazh]. > Here's how to reproduce the problem: > {code:sql} > CREATE TABLE foobar ( foo STRING, bar STRING ) PARTITIONED BY ( dt STRING, > region STRING ) STORED AS RCFILE LOCATION '/tmp/foobar'; > ALTER TABLE foobar ADD PARTITION ( dt='1', region='A' ) ; > dfs -rm -R -skipTrash /tmp/foobar/dt=1; > ALTER TABLE foobar DROP PARTITION ( dt='1' ); > {code} > This causes a client-side error as follows: > {code} > 15/02/26 23:08:32 ERROR exec.DDLTask: > org.apache.hadoop.hive.ql.metadata.HiveException: Unknown error. Please check > logs. > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17472) Drop-partition for multi-level partition fails, if data does not exist.
[ https://issues.apache.org/jira/browse/HIVE-17472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mithun Radhakrishnan updated HIVE-17472: Attachment: HIVE-17472.1.patch > Drop-partition for multi-level partition fails, if data does not exist. > --- > > Key: HIVE-17472 > URL: https://issues.apache.org/jira/browse/HIVE-17472 > Project: Hive > Issue Type: Bug > Components: Metastore >Affects Versions: 2.2.0, 3.0.0 >Reporter: Mithun Radhakrishnan >Assignee: Chris Drome > Attachments: HIVE-17472.1.patch > > > Raising this on behalf of [~cdrome] and [~selinazh]. > Here's how to reproduce the problem: > {code:sql} > CREATE TABLE foobar ( foo STRING, bar STRING ) PARTITIONED BY ( dt STRING, > region STRING ) STORED AS RCFILE LOCATION '/tmp/foobar'; > ALTER TABLE foobar ADD PARTITION ( dt='1', region='A' ) ; > dfs -rm -R -skipTrash /tmp/foobar/dt=1; > ALTER TABLE foobar DROP PARTITION ( dt='1' ); > {code} > This causes a client-side error as follows: > {code} > 15/02/26 23:08:32 ERROR exec.DDLTask: > org.apache.hadoop.hive.ql.metadata.HiveException: Unknown error. Please check > logs. > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Assigned] (HIVE-17472) Drop-partition for multi-level partition fails, if data does not exist.
[ https://issues.apache.org/jira/browse/HIVE-17472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mithun Radhakrishnan reassigned HIVE-17472: --- > Drop-partition for multi-level partition fails, if data does not exist. > --- > > Key: HIVE-17472 > URL: https://issues.apache.org/jira/browse/HIVE-17472 > Project: Hive > Issue Type: Bug > Components: Metastore >Reporter: Mithun Radhakrishnan >Assignee: Chris Drome > > Raising this on behalf of [~cdrome] and [~selinazh]. > Here's how to reproduce the problem: > {code:sql} > CREATE TABLE foobar ( foo STRING, bar STRING ) PARTITIONED BY ( dt STRING, > region STRING ) STORED AS RCFILE LOCATION '/tmp/foobar'; > ALTER TABLE foobar ADD PARTITION ( dt='1', region='A' ) ; > dfs -rm -R -skipTrash /tmp/foobar/dt=1; > ALTER TABLE foobar DROP PARTITION ( dt='1' ); > {code} > This causes a client-side error as follows: > {code} > 15/02/26 23:08:32 ERROR exec.DDLTask: > org.apache.hadoop.hive.ql.metadata.HiveException: Unknown error. Please check > logs. > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17460) `insert overwrite` should support table schema evolution (e.g. add columns)
[ https://issues.apache.org/jira/browse/HIVE-17460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16156184#comment-16156184 ] Hive QA commented on HIVE-17460: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12885647/HIVE-17460.2.patch {color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 20 failed/errored test(s), 11029 tests executed *Failed tests:* {noformat} TestAccumuloCliDriver - did not produce a TEST-*.xml file (likely timed out) (batchId=230) TestDummy - did not produce a TEST-*.xml file (likely timed out) (batchId=230) TestTxnCommandsBase - did not produce a TEST-*.xml file (likely timed out) (batchId=280) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[alter_table_cascade] (batchId=84) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[autoColumnStats_5] (batchId=40) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[insert_values_orig_table_use_metadata] (batchId=61) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[partition_wise_fileformat18] (batchId=69) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[schema_evol_text_nonvec_part] (batchId=149) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[schema_evol_text_nonvec_part_all_complex] (batchId=159) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[schema_evol_text_nonvec_part_all_primitive] (batchId=156) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[schema_evol_text_vec_part] (batchId=157) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[schema_evol_text_vec_part_all_complex] (batchId=153) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[schema_evol_text_vec_part_all_primitive] (batchId=158) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[schema_evol_text_vecrow_part] (batchId=162) 
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[schema_evol_text_vecrow_part_all_complex] (batchId=162) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[schema_evol_text_vecrow_part_all_primitive] (batchId=158) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] (batchId=234) org.apache.hadoop.hive.llap.security.TestLlapSignerImpl.testSigning (batchId=291) org.apache.hive.hcatalog.pig.TestTextFileHCatStorer.testWriteDecimalXY (batchId=183) org.apache.hive.hcatalog.pig.TestTextFileHCatStorer.testWriteTimestamp (batchId=183) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6698/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6698/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6698/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 20 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12885647 - PreCommit-HIVE-Build > `insert overwrite` should support table schema evolution (e.g. add columns) > --- > > Key: HIVE-17460 > URL: https://issues.apache.org/jira/browse/HIVE-17460 > Project: Hive > Issue Type: Bug >Affects Versions: 2.1.0, 2.2.0 >Reporter: Chaozhong Yang >Assignee: Chaozhong Yang > Fix For: 3.0.0 > > Attachments: HIVE-17460.2.patch, HIVE-17460.patch > > > In Hive, adding columns into original table is a common use case. However, if > we insert overwrite older partitions after adding columns, added columns will > not be accessed. 
> ``` > create table src_table( > i int > ) > PARTITIONED BY (`date` string); > insert overwrite table src_table partition(`date`='20170905') values (3); > select * from src_table where `date` = '20170905'; > alter table src_table add columns (bi bigint); > insert overwrite table src_table partition(`date`='20170905') values (3, 5); > select * from src_table where `date` = '20170905'; > ``` > The result will be as follows: > ``` > 3, NULL, '20170905' > ``` > Obviously, it doesn't meet our expectation. The expected result should be: > ``` > 3, 5, '20170905' > ``` -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17471) Vectorization: Enable hive.vectorized.row.identifier.enabled to true by default
[ https://issues.apache.org/jira/browse/HIVE-17471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16156174#comment-16156174 ] Sergey Shelukhin commented on HIVE-17471: - [~teddy.choi] should this setting be turned on? > Vectorization: Enable hive.vectorized.row.identifier.enabled to true by > default > --- > > Key: HIVE-17471 > URL: https://issues.apache.org/jira/browse/HIVE-17471 > Project: Hive > Issue Type: Bug > Components: Hive >Reporter: Matt McCline >Assignee: Teddy Choi > > We set it disabled in https://issues.apache.org/jira/browse/HIVE-17116 > "Vectorization: Add infrastructure for vectorization of ROW__ID struct" > But forgot to turn it on to true by default in Teddy's ACID ROW__ID work... -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17471) Vectorization: Enable hive.vectorized.row.identifier.enabled to true by default
[ https://issues.apache.org/jira/browse/HIVE-17471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16156173#comment-16156173 ] Sergey Shelukhin commented on HIVE-17471: - hmm > Vectorization: Enable hive.vectorized.row.identifier.enabled to true by > default > --- > > Key: HIVE-17471 > URL: https://issues.apache.org/jira/browse/HIVE-17471 > Project: Hive > Issue Type: Bug > Components: Hive >Reporter: Matt McCline >Assignee: Teddy Choi > > We set it disabled in https://issues.apache.org/jira/browse/HIVE-17116 > "Vectorization: Add infrastructure for vectorization of ROW__ID struct" > But forgot to turn it on to true by default in Teddy's ACID ROW__ID work... -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Comment Edited] (HIVE-17468) Shade and package appropriate jackson version for druid storage handler
[ https://issues.apache.org/jira/browse/HIVE-17468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16156170#comment-16156170 ] Jesus Camacho Rodriguez edited comment on HIVE-17468 at 9/6/17 11:04 PM: - {quote} if we exclude all the druid stuff, then what is left is not what we want: it is going to be either the hive jackson ones or ones from another transitive dependency. {quote} [~bslim], before we pulled the calcite-druid package, were we then packing the hive jackson dependencies into the uber jar? How was that ever working? was (Author: jcamachorodriguez): {quote} if we exclude all the druid stuff, then what is left is not what we want: it is going to be either the hive jackson ones or ones from another transitive dependency. {quote} [~bslim], before we pulled the calcite-druid package, were we then packing the hive jackson dependencies into the uber jar? How was that working? > Shade and package appropriate jackson version for druid storage handler > --- > > Key: HIVE-17468 > URL: https://issues.apache.org/jira/browse/HIVE-17468 > Project: Hive > Issue Type: Bug >Reporter: slim bouguerra >Assignee: Jesus Camacho Rodriguez > Fix For: 3.0.0 > > Attachments: HIVE-17468.patch, hive-druid-deps.txt > > > Currently we are excluding all the jackson core dependencies coming from > druid. This is wrong in my opinion, since it will lead to the packaging of > unwanted jackson libraries from other projects. > As you can see in the file hive-druid-deps.txt, jackson core is currently coming > from calcite, and its version is 2.6.3, which is very different from the 2.4.6 used > by druid. This patch excludes the unwanted jars and makes sure to bring in > the druid jackson dependency from druid itself. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Assigned] (HIVE-17471) Vectorization: Enable hive.vectorized.row.identifier.enabled to true by default
[ https://issues.apache.org/jira/browse/HIVE-17471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt McCline reassigned HIVE-17471: --- > Vectorization: Enable hive.vectorized.row.identifier.enabled to true by > default > --- > > Key: HIVE-17471 > URL: https://issues.apache.org/jira/browse/HIVE-17471 > Project: Hive > Issue Type: Bug > Components: Hive >Reporter: Matt McCline >Assignee: Teddy Choi > > We set it disabled in https://issues.apache.org/jira/browse/HIVE-17116 > "Vectorization: Add infrastructure for vectorization of ROW__ID struct" > But forgot to turn it on to true by default in Teddy's ACID ROW__ID work... -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17468) Shade and package appropriate jackson version for druid storage handler
[ https://issues.apache.org/jira/browse/HIVE-17468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16156170#comment-16156170 ] Jesus Camacho Rodriguez commented on HIVE-17468: {quote} if we exclude all the druid stuff, then what is left is not what we want: it is going to be either the hive jackson ones or ones from another transitive dependency. {quote} [~bslim], before we pulled the calcite-druid package, were we then packing the hive jackson dependencies into the uber jar? How was that working? > Shade and package appropriate jackson version for druid storage handler > --- > > Key: HIVE-17468 > URL: https://issues.apache.org/jira/browse/HIVE-17468 > Project: Hive > Issue Type: Bug >Reporter: slim bouguerra >Assignee: Jesus Camacho Rodriguez > Fix For: 3.0.0 > > Attachments: HIVE-17468.patch, hive-druid-deps.txt > > > Currently we are excluding all the jackson core dependencies coming from > druid. This is wrong in my opinion, since it will lead to the packaging of > unwanted jackson libraries from other projects. > As you can see in the file hive-druid-deps.txt, jackson core is currently coming > from calcite, and its version is 2.6.3, which is very different from the 2.4.6 used > by druid. This patch excludes the unwanted jars and makes sure to bring in > the druid jackson dependency from druid itself. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17382) Change startsWith relation introduced in HIVE-17316
[ https://issues.apache.org/jira/browse/HIVE-17382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16156167#comment-16156167 ] Lefty Leverenz commented on HIVE-17382: --- No-doc note: This renames two configuration parameters that were created by HIVE-16146. They are for internal use only, so no documentation is needed. * hive.in.test.short.logs -> hive.testing.short.logs * hive.in.test.remove.logs -> hive.testing.remove.logs > Change startsWith relation introduced in HIVE-17316 > --- > > Key: HIVE-17382 > URL: https://issues.apache.org/jira/browse/HIVE-17382 > Project: Hive > Issue Type: Sub-task >Affects Versions: 3.0.0 >Reporter: Barna Zsombor Klara >Assignee: Barna Zsombor Klara > Fix For: 3.0.0 > > Attachments: HIVE-17382.01.patch, HIVE-17382.02.patch, > HIVE-17382.03.patch, HIVE-17382.04.patch > > > In HiveConf the new name should be checked if it starts with a > restricted/hidden variable prefix and not vice-versa. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-16146) If possible find a better way to filter the TestBeeLineDriver output
[ https://issues.apache.org/jira/browse/HIVE-16146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16156168#comment-16156168 ] Lefty Leverenz commented on HIVE-16146: --- HIVE-17382 renames the configs (no doc needed, internal use only): * hive.in.test.short.logs -> hive.testing.short.logs * hive.in.test.remove.logs -> hive.testing.remove.logs > If possible find a better way to filter the TestBeeLineDriver output > > > Key: HIVE-16146 > URL: https://issues.apache.org/jira/browse/HIVE-16146 > Project: Hive > Issue Type: Improvement > Components: Testing Infrastructure >Affects Versions: 2.2.0 >Reporter: Peter Vary >Assignee: Peter Vary > Attachments: HIVE-16146.02.patch, HIVE-16146.03.patch, > HIVE-16146.04.patch, HIVE-16146.05.patch, HIVE-16146.06.patch, > HIVE-16146.patch > > > Currently we apply a blacklist to filter the output of the BeeLine Qtest runs. > It might be a good idea to go thorough of the possibilities and find a better > way, if possible. > I think our main goal could be for the TestBeeLineDriver test output to match > the TestCliDriver output of the came query file. Or if it is not possible, > then at least a similar one > CC: [~vihangk1] -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17366) Constraint replication in bootstrap
[ https://issues.apache.org/jira/browse/HIVE-17366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated HIVE-17366: -- Attachment: HIVE-17366.2.patch Addressing Sankar's review comments. > Constraint replication in bootstrap > --- > > Key: HIVE-17366 > URL: https://issues.apache.org/jira/browse/HIVE-17366 > Project: Hive > Issue Type: New Feature > Components: repl >Reporter: Daniel Dai >Assignee: Daniel Dai > Attachments: HIVE-17366.1.patch, HIVE-17366.2.patch > > > Incremental constraint replication is tracked in HIVE-15705. This is to track > the bootstrap replication. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-13989) Extended ACLs are not handled according to specification
[ https://issues.apache.org/jira/browse/HIVE-13989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16156150#comment-16156150 ] Chris Drome commented on HIVE-13989: Ran tests local before and after the patch on branch-2 and none of the failures appear to be attributable to the patch: || Test || branch-2 HEAD (b3a6e52) || branch-2 HEAD + HIVE-13989 || | org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[comments] | PASSED | PASSED | | org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[explaindenpendencydiffengs] | FAILED | FAILED | | org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] | FAILED | FAILED | | org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_ppd_basic] | FAILED | FAILED | | org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr] | PASSED | PASSED | | org.apache.hive.hcatalog.api.TestHCatClient.testTransportFailure | FAILED | FAILED | | org.apache.hive.jdbc.TestJdbcDriver2.testYarnATSGuid | PASSED | PASSED | > Extended ACLs are not handled according to specification > > > Key: HIVE-13989 > URL: https://issues.apache.org/jira/browse/HIVE-13989 > Project: Hive > Issue Type: Bug > Components: HCatalog >Affects Versions: 1.2.1, 2.0.0 >Reporter: Chris Drome >Assignee: Chris Drome > Attachments: HIVE-13989.1-branch-1.patch, HIVE-13989.1.patch, > HIVE-13989.4-branch-2.2.patch, HIVE-13989.4-branch-2.patch, > HIVE-13989-branch-1.patch, HIVE-13989-branch-2.2.patch, > HIVE-13989-branch-2.2.patch, HIVE-13989-branch-2.2.patch > > > Hive takes two approaches to working with extended ACLs depending on whether > data is being produced via a Hive query or HCatalog APIs. A Hive query will > run an FsShell command to recursively set the extended ACLs for a directory > sub-tree. HCatalog APIs will attempt to build up the directory sub-tree > programmatically and runs some code to set the ACLs to match the parent > directory. 
> Some incorrect assumptions were made when implementing the extended ACLs > support. Refer to https://issues.apache.org/jira/browse/HDFS-4685 for the > design documents of extended ACLs in HDFS. These documents model the > implementation after the POSIX implementation on Linux, which can be found at > http://www.vanemery.com/Linux/ACL/POSIX_ACL_on_Linux.html. > The code for setting extended ACLs via HCatalog APIs is found in > HdfsUtils.java: > {code} > if (aclEnabled) { > aclStatus = sourceStatus.getAclStatus(); > if (aclStatus != null) { > LOG.trace(aclStatus.toString()); > aclEntries = aclStatus.getEntries(); > removeBaseAclEntries(aclEntries); > //the ACL api's also expect the tradition user/group/other permission > in the form of ACL > aclEntries.add(newAclEntry(AclEntryScope.ACCESS, AclEntryType.USER, > sourcePerm.getUserAction())); > aclEntries.add(newAclEntry(AclEntryScope.ACCESS, AclEntryType.GROUP, > sourcePerm.getGroupAction())); > aclEntries.add(newAclEntry(AclEntryScope.ACCESS, AclEntryType.OTHER, > sourcePerm.getOtherAction())); > } > } > {code} > We found that DEFAULT extended ACL rules were not being inherited properly by > the directory sub-tree, so the above code is incomplete because it > effectively drops the DEFAULT rules. The second problem is with the call to > {{sourcePerm.getGroupAction()}}, which is incorrect in the case of extended > ACLs. When extended ACLs are used the GROUP permission is replaced with the > extended ACL mask. So the above code will apply the wrong permissions to the > GROUP. Instead the correct GROUP permissions now need to be pulled from the > AclEntry as returned by {{getAclStatus().getEntries()}}. See the > implementation of the new method {{getDefaultAclEntries}} for details. > Similar issues exist with the HCatalog API. None of the API accounts for > setting extended ACLs on the directory sub-tree. 
The changes to the HCatalog > API allow the extended ACLs to be passed into the required methods similar to > how basic permissions are passed in. When building the directory sub-tree the > extended ACLs of the table directory are inherited by all sub-directories, > including the DEFAULT rules. > Replicating the problem: > Create a table to write data into (I will use acl_test as the destination and > words_text as the source) and set the ACLs as follows: > {noformat} > $ hdfs dfs -setfacl -m > default:user::rwx,default:group::r-x,default:mask::rwx,default:user:hdfs:rwx,group::r-x,user:hdfs:rwx > /user/cdrome/hive/acl_test > $ hdfs dfs -ls -d /user/cdrome/hive/acl_test > drwxrwx---+ - cdrome hdfs 0 2016-07-13 20:36 > /user/cdrome/hive/acl_test > $ hdfs dfs -getfacl -R /user/cdrome/hive/acl_test > # file
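The mask behavior described in this issue can be sketched with a toy model. This is an illustrative assumption-based sketch, not Hadoop's real `AclEntry`/`AclStatus` API: it only shows why, once extended ACLs are in play, the group permission must be read from the unnamed GROUP access entry rather than from `FsPermission`'s group bits (which then hold the mask), and why DEFAULT-scoped entries must be preserved for inheritance:

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class AclGroupPerm {
    enum Scope { ACCESS, DEFAULT }
    enum Type { USER, GROUP, OTHER, MASK }

    static final class Entry {
        final Scope scope; final Type type; final String perm;
        Entry(Scope scope, Type type, String perm) {
            this.scope = scope; this.type = type; this.perm = perm;
        }
    }

    // With extended ACLs the "group" bits of the classic permission triple
    // actually hold the mask, so the real group permission must come from
    // the unnamed GROUP access entry.
    static String groupPerm(List<Entry> entries) {
        return entries.stream()
                .filter(e -> e.scope == Scope.ACCESS && e.type == Type.GROUP)
                .map(e -> e.perm)
                .findFirst().orElse(null);
    }

    // DEFAULT-scoped entries must be carried over so children inherit them;
    // dropping them is the first bug described above.
    static List<Entry> defaults(List<Entry> entries) {
        return entries.stream()
                .filter(e -> e.scope == Scope.DEFAULT)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Entry> entries = Arrays.asList(
                new Entry(Scope.ACCESS, Type.GROUP, "r-x"),
                new Entry(Scope.ACCESS, Type.MASK, "rwx"),
                new Entry(Scope.DEFAULT, Type.USER, "rwx"));
        // Naive FsPermission-based code would report "rwx" (the mask);
        // the correct group permission is "r-x".
        System.out.println(groupPerm(entries));
        System.out.println(defaults(entries).size());
    }
}
```

In the buggy `HdfsUtils` snippet quoted above, `sourcePerm.getGroupAction()` plays the role of the mask here, which is why the GROUP entry it builds carries the wrong permissions.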
[jira] [Comment Edited] (HIVE-13989) Extended ACLs are not handled according to specification
[ https://issues.apache.org/jira/browse/HIVE-13989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16156150#comment-16156150 ] Chris Drome edited comment on HIVE-13989 at 9/6/17 10:46 PM: - Ran tests locally before and after the patch on branch-2 and none of the failures appear to be attributable to the patch: || Test || branch-2 HEAD (b3a6e52) || branch-2 HEAD + HIVE-13989 || | org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[comments] | PASSED | PASSED | | org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[explaindenpendencydiffengs] | FAILED | FAILED | | org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] | FAILED | FAILED | | org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_ppd_basic] | FAILED | FAILED | | org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr] | PASSED | PASSED | | org.apache.hive.hcatalog.api.TestHCatClient.testTransportFailure | FAILED | FAILED | | org.apache.hive.jdbc.TestJdbcDriver2.testYarnATSGuid | PASSED | PASSED | was (Author: cdrome): Ran tests local before and after the patch on branch-2 and none of the failures appear to be attributable to the patch: || Test || branch-2 HEAD (b3a6e52) || branch-2 HEAD + HIVE-13989 || | org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[comments] | PASSED | PASSED | | org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[explaindenpendencydiffengs] | FAILED | FAILED | | org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] | FAILED | FAILED | | org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_ppd_basic] | FAILED | FAILED | | org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr] | PASSED | PASSED | | org.apache.hive.hcatalog.api.TestHCatClient.testTransportFailure | FAILED | FAILED | | org.apache.hive.jdbc.TestJdbcDriver2.testYarnATSGuid | PASSED | PASSED | > Extended ACLs are not handled according to specification > 
> > Key: HIVE-13989 > URL: https://issues.apache.org/jira/browse/HIVE-13989 > Project: Hive > Issue Type: Bug > Components: HCatalog >Affects Versions: 1.2.1, 2.0.0 >Reporter: Chris Drome >Assignee: Chris Drome > Attachments: HIVE-13989.1-branch-1.patch, HIVE-13989.1.patch, > HIVE-13989.4-branch-2.2.patch, HIVE-13989.4-branch-2.patch, > HIVE-13989-branch-1.patch, HIVE-13989-branch-2.2.patch, > HIVE-13989-branch-2.2.patch, HIVE-13989-branch-2.2.patch > > > Hive takes two approaches to working with extended ACLs depending on whether > data is being produced via a Hive query or HCatalog APIs. A Hive query will > run an FsShell command to recursively set the extended ACLs for a directory > sub-tree. HCatalog APIs will attempt to build up the directory sub-tree > programmatically and runs some code to set the ACLs to match the parent > directory. > Some incorrect assumptions were made when implementing the extended ACLs > support. Refer to https://issues.apache.org/jira/browse/HDFS-4685 for the > design documents of extended ACLs in HDFS. These documents model the > implementation after the POSIX implementation on Linux, which can be found at > http://www.vanemery.com/Linux/ACL/POSIX_ACL_on_Linux.html. 
> The code for setting extended ACLs via HCatalog APIs is found in > HdfsUtils.java: > {code} > if (aclEnabled) { > aclStatus = sourceStatus.getAclStatus(); > if (aclStatus != null) { > LOG.trace(aclStatus.toString()); > aclEntries = aclStatus.getEntries(); > removeBaseAclEntries(aclEntries); > // the ACL APIs also expect the traditional user/group/other permissions > in the form of ACL entries > aclEntries.add(newAclEntry(AclEntryScope.ACCESS, AclEntryType.USER, > sourcePerm.getUserAction())); > aclEntries.add(newAclEntry(AclEntryScope.ACCESS, AclEntryType.GROUP, > sourcePerm.getGroupAction())); > aclEntries.add(newAclEntry(AclEntryScope.ACCESS, AclEntryType.OTHER, > sourcePerm.getOtherAction())); > } > } > {code} > We found that DEFAULT extended ACL rules were not being inherited properly by > the directory sub-tree, so the above code is incomplete because it > effectively drops the DEFAULT rules. The second problem is with the call to > {{sourcePerm.getGroupAction()}}, which is incorrect in the case of extended > ACLs. When extended ACLs are used, the GROUP permission is replaced with the > extended ACL mask, so the above code will apply the wrong permissions to the > GROUP. Instead, the correct GROUP permissions now need to be pulled from the > AclEntry as returned by {{getAclStatus().getEntries()}}. See the > implementation of the new method {{getDefaultAclEntries}} for details. > Similar issues exist with th
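The mask semantics described above are easy to get wrong, so here is a small, self-contained Java sketch of the rule — the entry encoding and helper names are illustrative, NOT Hive's HdfsUtils API: when extended ACLs are present, the classic group bits carry the ACL *mask*, so the real GROUP permission must come from the unnamed group ACL entry, and DEFAULT entries must be carried over so children inherit them.

```java
import java.util.ArrayList;
import java.util.List;

// Self-contained model of the rule above; entry strings and helper names are
// illustrative, NOT Hive's HdfsUtils API. Entry format: "scope:type:name:perm",
// e.g. "access:group::r-x" (empty name = the unnamed/owning-group entry).
class AclMaskDemo {
    // With extended ACLs, the classic group permission bits hold the ACL mask,
    // so the true GROUP permission comes from the unnamed "access:group" entry.
    static String effectiveGroupPerm(List<String> aclEntries, String classicGroupBits) {
        for (String e : aclEntries) {
            String[] p = e.split(":", -1);
            if (p[0].equals("access") && p[1].equals("group") && p[2].isEmpty()) {
                return p[3];
            }
        }
        return classicGroupBits; // no extended ACL: classic bits are authoritative
    }

    // DEFAULT entries must be copied to children, or inheritance is lost --
    // dropping them is exactly the first bug described in the report.
    static List<String> defaultEntries(List<String> aclEntries) {
        List<String> out = new ArrayList<>();
        for (String e : aclEntries) {
            if (e.startsWith("default:")) {
                out.add(e);
            }
        }
        return out;
    }
}
```

With entries {{access:group::r-x}} and {{access:mask::rw-}}, the classic group bits would read {{rw-}} (the mask), while the effective GROUP permission is {{r-x}} — which is why calling {{sourcePerm.getGroupAction()}} applies the wrong permission.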
[jira] [Commented] (HIVE-17468) Shade and package appropriate jackson version for druid storage handler
[ https://issues.apache.org/jira/browse/HIVE-17468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16156148#comment-16156148 ] slim bouguerra commented on HIVE-17468: --- [~jcamachorodriguez] I'm not sure I'm following your thought. If we exclude all the Druid stuff, then what is left is not what we want: it will be either Hive's jackson jars or ones from another transitive dependency. My take on this is that the libraries that are needed and brought in by Druid will be shaded anyway, so no Druid jackson will be on the classpath. Please let me know if that makes sense. > Shade and package appropriate jackson version for druid storage handler > --- > > Key: HIVE-17468 > URL: https://issues.apache.org/jira/browse/HIVE-17468 > Project: Hive > Issue Type: Bug >Reporter: slim bouguerra >Assignee: Jesus Camacho Rodriguez > Fix For: 3.0.0 > > Attachments: HIVE-17468.patch, hive-druid-deps.txt > > > Currently we are excluding all the jackson core dependencies coming from > druid. This is wrong in my opinion, since it will lead to the packaging of > unwanted jackson libraries from other projects. > As you can see in the file hive-druid-deps.txt, jackson core is currently coming > from calcite, and the version is 2.6.3, which is very different from the 2.4.6 used > by druid. This patch excludes the unwanted jars and makes sure to bring in > the jackson dependency from druid itself. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Assigned] (HIVE-17470) eliminate potential vector copies when merging ACID deltas in LLAP IO path
[ https://issues.apache.org/jira/browse/HIVE-17470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin reassigned HIVE-17470: --- Assignee: (was: Sergey Shelukhin) > eliminate potential vector copies when merging ACID deltas in LLAP IO path > -- > > Key: HIVE-17470 > URL: https://issues.apache.org/jira/browse/HIVE-17470 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin > > See the comments on HIVE-12631. Probably LlapRecordReader should be able to > receive VRBs directly; that or ACID reader should be able to operate on > either CVB or VRB. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Assigned] (HIVE-17470) eliminate potential vector copies when merging ACID deltas in LLAP IO path
[ https://issues.apache.org/jira/browse/HIVE-17470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin reassigned HIVE-17470: --- Assignee: Sergey Shelukhin > eliminate potential vector copies when merging ACID deltas in LLAP IO path > -- > > Key: HIVE-17470 > URL: https://issues.apache.org/jira/browse/HIVE-17470 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > > See the comments on HIVE-12631. Probably LlapRecordReader should be able to > receive VRBs directly; that or ACID reader should be able to operate on > either CVB or VRB. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-12631) LLAP: support ORC ACID tables
[ https://issues.apache.org/jira/browse/HIVE-12631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-12631: Attachment: HIVE-12631.27.patch Updated the patch. Not sure why the config was set in UpdateDeleteSemanticAnalyzer, so I commented it out for now. I looked a bit at the CVB-VRB-CVB-VRB conversion; given that handling a selected vector after the ACID reader requires copying data, it doesn't seem ideal. Can be handled in a followup. Either a selected vector can be added to CVB and the ACID merger made to operate on both (the code is common between the two), or LLAPRecordReader can be enabled to accept VRBs directly. > LLAP: support ORC ACID tables > - > > Key: HIVE-12631 > URL: https://issues.apache.org/jira/browse/HIVE-12631 > Project: Hive > Issue Type: Bug > Components: llap, Transactions >Reporter: Sergey Shelukhin >Assignee: Teddy Choi > Attachments: HIVE-12631.10.patch, HIVE-12631.10.patch, > HIVE-12631.11.patch, HIVE-12631.11.patch, HIVE-12631.12.patch, > HIVE-12631.13.patch, HIVE-12631.15.patch, HIVE-12631.16.patch, > HIVE-12631.17.patch, HIVE-12631.18.patch, HIVE-12631.19.patch, > HIVE-12631.1.patch, HIVE-12631.20.patch, HIVE-12631.21.patch, > HIVE-12631.22.patch, HIVE-12631.23.patch, HIVE-12631.24.patch, > HIVE-12631.25.patch, HIVE-12631.26.patch, HIVE-12631.27.patch, > HIVE-12631.2.patch, HIVE-12631.3.patch, HIVE-12631.4.patch, > HIVE-12631.5.patch, HIVE-12631.6.patch, HIVE-12631.7.patch, > HIVE-12631.8.patch, HIVE-12631.8.patch, HIVE-12631.9.patch > > > LLAP uses a completely separate read path in ORC to allow for caching and > parallelization of reads and processing. This path does not support ACID. As > far as I remember, ACID logic is embedded inside the ORC format; we need to > refactor it to be on top of some interface, if practical, or just port it to > the LLAP read path. > Another consideration is how the logic will work with the cache. 
The cache is > currently low-level (CB-level in ORC), so we could just use it to read bases > and deltas (deltas should be cached with higher priority) and merge as usual. > We could also cache merged representation in future. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Resolved] (HIVE-17469) The HiveMetaStoreClient should randomize the connection to HMS HA
[ https://issues.apache.org/jira/browse/HIVE-17469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergio Peña resolved HIVE-17469. Resolution: Invalid > The HiveMetaStoreClient should randomize the connection to HMS HA > - > > Key: HIVE-17469 > URL: https://issues.apache.org/jira/browse/HIVE-17469 > Project: Hive > Issue Type: Bug > Components: Hive >Affects Versions: 2.1.1 >Reporter: Sergio Peña > > In an environment with multiple HMS servers, the HiveMetaStoreClient class > selects the first URI to connect to on every open() call. We should > randomize that connection to help balance load across the HMS servers. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17469) The HiveMetaStoreClient should randomize the connection to HMS HA
[ https://issues.apache.org/jira/browse/HIVE-17469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16156115#comment-16156115 ] Sergio Peña commented on HIVE-17469: Ah, thanks, I was looking at old code. > The HiveMetaStoreClient should randomize the connection to HMS HA > - > > Key: HIVE-17469 > URL: https://issues.apache.org/jira/browse/HIVE-17469 > Project: Hive > Issue Type: Bug > Components: Hive >Affects Versions: 2.1.1 >Reporter: Sergio Peña > > In an environment with multiple HMS servers, the HiveMetaStoreClient class > selects the first URI to connect to on every open() call. We should > randomize that connection to help balance load across the HMS servers. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17469) The HiveMetaStoreClient should randomize the connection to HMS HA
[ https://issues.apache.org/jira/browse/HIVE-17469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16156112#comment-16156112 ] Vihang Karajgaonkar commented on HIVE-17469: [~spena] isn't it already doing it here? https://github.com/apache/hive/blob/master/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClient.java#L207 > The HiveMetaStoreClient should randomize the connection to HMS HA > - > > Key: HIVE-17469 > URL: https://issues.apache.org/jira/browse/HIVE-17469 > Project: Hive > Issue Type: Bug > Components: Hive >Affects Versions: 2.1.1 >Reporter: Sergio Peña > > In an environment with multiple HMS servers, the HiveMetaStoreClient class > selects the first URI to connect to on every open() call. We should > randomize that connection to help balance load across the HMS servers. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
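For reference, the randomization discussed here boils down to shuffling the configured metastore URI list once per client, so each client starts its connection attempts at a random server. A minimal, self-contained sketch of that idea follows; the class and method names are illustrative, not HiveMetaStoreClient's actual fields or API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Minimal sketch of the load-balancing idea: parse the comma-separated URI
// list and shuffle it once, so connection attempts start at a random server.
// Names are illustrative, not HiveMetaStoreClient's actual fields.
class UriShuffle {
    static List<String> shuffledUris(String commaSeparatedUris, long seed) {
        List<String> uris = new ArrayList<>();
        for (String u : commaSeparatedUris.split(",")) {
            uris.add(u.trim());
        }
        // Seeded only so the sketch is reproducible; production code would
        // simply call Collections.shuffle(uris) with the default Random.
        Collections.shuffle(uris, new Random(seed));
        return uris;
    }
}
```

The shuffle keeps failover intact: the client still walks the whole list on connection failure, it just no longer always starts at the first entry.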
[jira] [Commented] (HIVE-17468) Shade and package appropriate jackson version for druid storage handler
[ https://issues.apache.org/jira/browse/HIVE-17468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16156105#comment-16156105 ] Jesus Camacho Rodriguez commented on HIVE-17468: [~bslim], for this fix, wouldn't it be enough to exclude only those coming from {{calcite-druid}}? The three jackson dependencies that we exclude are included in the uber jar via shading/renaming; as far as I remember, they might otherwise create conflicts with the versions used by Hive. > Shade and package appropriate jackson version for druid storage handler > --- > > Key: HIVE-17468 > URL: https://issues.apache.org/jira/browse/HIVE-17468 > Project: Hive > Issue Type: Bug >Reporter: slim bouguerra >Assignee: Jesus Camacho Rodriguez > Fix For: 3.0.0 > > Attachments: HIVE-17468.patch, hive-druid-deps.txt > > > Currently we are excluding all the jackson core dependencies coming from > druid. This is wrong in my opinion, since it will lead to the packaging of > unwanted jackson libraries from other projects. > As you can see in the file hive-druid-deps.txt, jackson core is currently coming > from calcite, and the version is 2.6.3, which is very different from the 2.4.6 used > by druid. This patch excludes the unwanted jars and makes sure to bring in > the jackson dependency from druid itself. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
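As a hedged illustration of the shading/renaming being discussed, a maven-shade-plugin relocation looks like the following; this is a sketch, not the actual Hive pom, and the shaded package prefix is an assumption:

```xml
<!-- Illustrative maven-shade-plugin relocation: repackage druid's jackson
     classes under a private prefix so they cannot conflict with the jackson
     version Hive (or calcite) puts on the classpath. The shadedPattern
     prefix here is an assumption, not taken from the Hive pom. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <configuration>
    <relocations>
      <relocation>
        <pattern>com.fasterxml.jackson</pattern>
        <shadedPattern>org.apache.hive.druid.com.fasterxml.jackson</shadedPattern>
      </relocation>
    </relocations>
  </configuration>
</plugin>
```

With a relocation in place, the uber jar carries its own renamed jackson 2.4.x classes, so excluding or including jackson elsewhere no longer changes what druid code actually loads.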
[jira] [Updated] (HIVE-17456) Set current database for external LLAP interface
[ https://issues.apache.org/jira/browse/HIVE-17456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Dere updated HIVE-17456: -- Status: Patch Available (was: Open) > Set current database for external LLAP interface > > > Key: HIVE-17456 > URL: https://issues.apache.org/jira/browse/HIVE-17456 > Project: Hive > Issue Type: Sub-task > Components: llap >Reporter: Jason Dere >Assignee: Jason Dere > Attachments: HIVE-17456.1.patch, HIVE-17456.2.patch > > > Currently the query passed in to external LLAP client has the default DB as > the current database. > Allow user to specify a different current database. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17455) External LLAP client: connection to HS2 should be kept open until explicitly closed
[ https://issues.apache.org/jira/browse/HIVE-17455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Dere updated HIVE-17455: -- Resolution: Fixed Fix Version/s: 3.0.0 Status: Resolved (was: Patch Available) Committed to master > External LLAP client: connection to HS2 should be kept open until explicitly > closed > --- > > Key: HIVE-17455 > URL: https://issues.apache.org/jira/browse/HIVE-17455 > Project: Hive > Issue Type: Sub-task > Components: llap >Reporter: Jason Dere >Assignee: Jason Dere > Fix For: 3.0.0 > > Attachments: HIVE-17455.1.patch, HIVE-17455.2.patch, > HIVE-17455.3.patch > > > In the case that a complex query (aggregation/join) is passed to external > LLAP client, the query result is first saved as a Hive temp table before > being read by LLAP to client. Currently the HS2 connection used to fetch the > LLAP splits is closed right after the splits are fetched, which means the > temp table is gone by the time LLAP tries to read it. > Try to keep the connection open so that the table is still around when LLAP > tries to read it. Add close methods which can be used to close the connection > when the client is done with the query. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17455) External LLAP client: connection to HS2 should be kept open until explicitly closed
[ https://issues.apache.org/jira/browse/HIVE-17455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16156090#comment-16156090 ] Hive QA commented on HIVE-17455: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12885642/HIVE-17455.2.patch {color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 11028 tests executed *Failed tests:* {noformat} TestAccumuloCliDriver - did not produce a TEST-*.xml file (likely timed out) (batchId=230) TestDummy - did not produce a TEST-*.xml file (likely timed out) (batchId=230) TestTxnCommandsBase - did not produce a TEST-*.xml file (likely timed out) (batchId=280) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[insert_values_orig_table_use_metadata] (batchId=61) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] (batchId=234) org.apache.hadoop.hive.ql.lockmgr.TestDbTxnManager2.testWriteSetTracking1 (batchId=282) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6697/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6697/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6697/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 6 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12885642 - PreCommit-HIVE-Build > External LLAP client: connection to HS2 should be kept open until explicitly > closed > --- > > Key: HIVE-17455 > URL: https://issues.apache.org/jira/browse/HIVE-17455 > Project: Hive > Issue Type: Sub-task > Components: llap >Reporter: Jason Dere >Assignee: Jason Dere > Attachments: HIVE-17455.1.patch, HIVE-17455.2.patch, > HIVE-17455.3.patch > > > In the case that a complex query (aggregation/join) is passed to external > LLAP client, the query result is first saved as a Hive temp table before > being read by LLAP to client. Currently the HS2 connection used to fetch the > LLAP splits is closed right after the splits are fetched, which means the > temp table is gone by the time LLAP tries to read it. > Try to keep the connection open so that the table is still around when LLAP > tries to read it. Add close methods which can be used to close the connection > when the client is done with the query. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17468) Shade and package appropriate jackson version for druid storage handler
[ https://issues.apache.org/jira/browse/HIVE-17468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] slim bouguerra updated HIVE-17468: -- Attachment: HIVE-17468.patch > Shade and package appropriate jackson version for druid storage handler > --- > > Key: HIVE-17468 > URL: https://issues.apache.org/jira/browse/HIVE-17468 > Project: Hive > Issue Type: Bug >Reporter: slim bouguerra >Assignee: Jesus Camacho Rodriguez > Fix For: 3.0.0 > > Attachments: HIVE-17468.patch, hive-druid-deps.txt > > > Currently we are excluding all the jackson core dependencies coming from > druid. This is wrong in my opinion, since it will lead to the packaging of > unwanted jackson libraries from other projects. > As you can see in the file hive-druid-deps.txt, jackson core is currently coming > from calcite, and the version is 2.6.3, which is very different from the 2.4.6 used > by druid. This patch excludes the unwanted jars and makes sure to bring in > the jackson dependency from druid itself. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17468) Shade and package appropriate jackson version for druid storage handler
[ https://issues.apache.org/jira/browse/HIVE-17468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] slim bouguerra updated HIVE-17468: -- Status: Patch Available (was: Open) > Shade and package appropriate jackson version for druid storage handler > --- > > Key: HIVE-17468 > URL: https://issues.apache.org/jira/browse/HIVE-17468 > Project: Hive > Issue Type: Bug >Reporter: slim bouguerra >Assignee: Jesus Camacho Rodriguez > Fix For: 3.0.0 > > Attachments: hive-druid-deps.txt > > > Currently we are excluding all the jackson core dependencies coming from > druid. This is wrong in my opinion, since it will lead to the packaging of > unwanted jackson libraries from other projects. > As you can see in the file hive-druid-deps.txt, jackson core is currently coming > from calcite, and the version is 2.6.3, which is very different from the 2.4.6 used > by druid. This patch excludes the unwanted jars and makes sure to bring in > the jackson dependency from druid itself. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Assigned] (HIVE-17468) Shade and package appropriate jackson version for druid storage handler
[ https://issues.apache.org/jira/browse/HIVE-17468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] slim bouguerra reassigned HIVE-17468: - Assignee: Jesus Camacho Rodriguez > Shade and package appropriate jackson version for druid storage handler > --- > > Key: HIVE-17468 > URL: https://issues.apache.org/jira/browse/HIVE-17468 > Project: Hive > Issue Type: Bug >Reporter: slim bouguerra >Assignee: Jesus Camacho Rodriguez > Fix For: 3.0.0 > > Attachments: hive-druid-deps.txt > > > Currently we are excluding all the jackson core dependencies coming from > druid. This is wrong in my opinion, since it will lead to the packaging of > unwanted jackson libraries from other projects. > As you can see in the file hive-druid-deps.txt, jackson core is currently coming > from calcite, and the version is 2.6.3, which is very different from the 2.4.6 used > by druid. This patch excludes the unwanted jars and makes sure to bring in > the jackson dependency from druid itself. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17468) Shade and package appropriate jackson version for druid storage handler
[ https://issues.apache.org/jira/browse/HIVE-17468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] slim bouguerra updated HIVE-17468: -- Attachment: hive-druid-deps.txt > Shade and package appropriate jackson version for druid storage handler > --- > > Key: HIVE-17468 > URL: https://issues.apache.org/jira/browse/HIVE-17468 > Project: Hive > Issue Type: Bug >Reporter: slim bouguerra >Assignee: Jesus Camacho Rodriguez > Fix For: 3.0.0 > > Attachments: hive-druid-deps.txt > > > Currently we are excluding all the jackson core dependencies coming from > druid. This is wrong in my opinion, since it will lead to the packaging of > unwanted jackson libraries from other projects. > As you can see in the file hive-druid-deps.txt, jackson core is currently coming > from calcite, and the version is 2.6.3, which is very different from the 2.4.6 used > by druid. This patch excludes the unwanted jars and makes sure to bring in > the jackson dependency from druid itself. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17459) View deletion operation failed to replicate on target cluster
[ https://issues.apache.org/jira/browse/HIVE-17459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16156059#comment-16156059 ] Thejas M Nair commented on HIVE-17459: -- [~taoli-hwx] Can you also please add a unit test ? > View deletion operation failed to replicate on target cluster > - > > Key: HIVE-17459 > URL: https://issues.apache.org/jira/browse/HIVE-17459 > Project: Hive > Issue Type: Bug >Reporter: Tao Li >Assignee: Tao Li > Attachments: HIVE-17459.1.patch > > > View dropping is not replicated during incremental repl. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17431) change configuration handling in TezSessionState
[ https://issues.apache.org/jira/browse/HIVE-17431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16156046#comment-16156046 ] Siddharth Seth commented on HIVE-17431: --- {code} refreshLocalResourcesFromConf(conf); {code} in openInternal seems to be a potential problem area. Either it is missing LRs for the new session, or this code should not exist anymore. For the most part, I suspect some of the other parameters in this class can be made final as well. Unrelated to the patch: - There are places where the queue apparently gets changed from TezSessionPool. I didn't know a single SessionState could be moved across queues; it seems unnecessary. - replaceSession - maybe simpler to move the implementation into TezSessionState itself. e.g. additionLocalResourcesNotFromConf is fetched and then passed back into the open method... > change configuration handling in TezSessionState > > > Key: HIVE-17431 > URL: https://issues.apache.org/jira/browse/HIVE-17431 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-17431.patch > > > The configuration is only set when opening the session; that seems > unnecessary - it could be set in the ctor and made final. E.g. when updating > the session and localizing new resources we may theoretically open the > session with a new config, but we don't update the config and only update the > files if the session is already open, which seems to imply that it's ok to > not update the config. > In most cases, the session is opened only once or reopened without intending > to change the config (e.g. if it times out). -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17455) External LLAP client: connection to HS2 should be kept open until explicitly closed
[ https://issues.apache.org/jira/browse/HIVE-17455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Dere updated HIVE-17455: -- Attachment: HIVE-17455.3.patch adding some comments > External LLAP client: connection to HS2 should be kept open until explicitly > closed > --- > > Key: HIVE-17455 > URL: https://issues.apache.org/jira/browse/HIVE-17455 > Project: Hive > Issue Type: Sub-task > Components: llap >Reporter: Jason Dere >Assignee: Jason Dere > Attachments: HIVE-17455.1.patch, HIVE-17455.2.patch, > HIVE-17455.3.patch > > > In the case that a complex query (aggregation/join) is passed to external > LLAP client, the query result is first saved as a Hive temp table before > being read by LLAP to client. Currently the HS2 connection used to fetch the > LLAP splits is closed right after the splits are fetched, which means the > temp table is gone by the time LLAP tries to read it. > Try to keep the connection open so that the table is still around when LLAP > tries to read it. Add close methods which can be used to close the connection > when the client is done with the query. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17429) Hive JDBC doesn't return rows when querying Impala
[ https://issues.apache.org/jira/browse/HIVE-17429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16156030#comment-16156030 ] Zach Amsden commented on HIVE-17429: org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] looks like a difference in test output caused by a WARNING line being printed in a different order. org.apache.hadoop.hive.cli.TestAccumuloCliDriver definitely looks like a timeout; this is the last output from Maven: {noformat} [INFO] [INFO] --- maven-surefire-plugin:2.18.1:test (default-test) @ hive-it-qfile-accumulo --- [INFO] Surefire report directory: /home/hiveptest/35.193.110.99-hiveptest-0/apache-github-source-source/itests/qtest-accumulo/target/surefire-reports --- T E S T S --- Running org.apache.hadoop.hive.cli.TestAccumuloCliDriver {noformat} > Hive JDBC doesn't return rows when querying Impala > -- > > Key: HIVE-17429 > URL: https://issues.apache.org/jira/browse/HIVE-17429 > Project: Hive > Issue Type: Bug > Components: JDBC >Affects Versions: 2.1.0 >Reporter: Zach Amsden >Assignee: Zach Amsden > Fix For: 2.1.0 > > Attachments: HIVE-17429.1.patch, HIVE-17429.2.patch > > > The Hive JDBC driver used to return a result set when querying Impala. Now, > instead, it gets data back but interprets the data as query logs instead of a > resultSet. This causes many issues (we see complaints about beeline as well > as test failures). > This appears to be a regression introduced with asynchronous operation > against Hive. > Ideally, we could make both behaviors work. I have a simple patch that > should fix the problem. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17464) Fix to be able to disable max shuffle size DHJ config
[ https://issues.apache.org/jira/browse/HIVE-17464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16156005#comment-16156005 ] Hive QA commented on HIVE-17464: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12885637/HIVE-17464.patch {color:red}ERROR:{color} -1 due to no test(s) being added or modified. {color:red}ERROR:{color} -1 due to 8 failed/errored test(s), 11027 tests executed *Failed tests:* {noformat} TestAccumuloCliDriver - did not produce a TEST-*.xml file (likely timed out) (batchId=230) TestDummy - did not produce a TEST-*.xml file (likely timed out) (batchId=230) TestTxnCommandsBase - did not produce a TEST-*.xml file (likely timed out) (batchId=280) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[insert_values_orig_table_use_metadata] (batchId=61) org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] (batchId=100) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] (batchId=234) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query23] (batchId=234) org.apache.hadoop.hive.ql.lockmgr.TestDbTxnManager2.testWriteSetTracking6 (batchId=282) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6696/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6696/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6696/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 8 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12885637 - PreCommit-HIVE-Build > Fix to be able to disable max shuffle size DHJ config > - > > Key: HIVE-17464 > URL: https://issues.apache.org/jira/browse/HIVE-17464 > Project: Hive > Issue Type: Bug > Components: Physical Optimizer >Affects Versions: 3.0.0 >Reporter: Jesus Camacho Rodriguez >Assignee: Jesus Camacho Rodriguez > Attachments: HIVE-17464.patch > > > Setting {{hive.auto.convert.join.shuffle.max.size}} to -1 does not work as > expected. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17467) HCatClient APIs for discovering partition key-values
[ https://issues.apache.org/jira/browse/HIVE-17467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mithun Radhakrishnan updated HIVE-17467: Attachment: HIVE-17467.1.patch > HCatClient APIs for discovering partition key-values > > > Key: HIVE-17467 > URL: https://issues.apache.org/jira/browse/HIVE-17467 > Project: Hive > Issue Type: New Feature > Components: HCatalog, Metastore >Reporter: Mithun Radhakrishnan >Assignee: Mithun Radhakrishnan > Attachments: HIVE-17467.1.patch > > > This is a followup to HIVE-17466, which adds the {{HiveMetaStore}} level call > to retrieve unique combinations of part-key values that satisfy a specified > predicate. > Attached herewith are the {{HCatClient}} APIs that will be used by Apache > Oozie, before launching workflows. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Assigned] (HIVE-17467) HCatClient APIs for discovering partition key-values
[ https://issues.apache.org/jira/browse/HIVE-17467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mithun Radhakrishnan reassigned HIVE-17467: --- > HCatClient APIs for discovering partition key-values > > > Key: HIVE-17467 > URL: https://issues.apache.org/jira/browse/HIVE-17467 > Project: Hive > Issue Type: New Feature > Components: HCatalog, Metastore >Reporter: Mithun Radhakrishnan >Assignee: Mithun Radhakrishnan > > This is a followup to HIVE-17466, which adds the {{HiveMetaStore}} level call > to retrieve unique combinations of part-key values that satisfy a specified > predicate. > Attached herewith are the {{HCatClient}} APIs that will be used by Apache > Oozie, before launching workflows. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17387) implement Tez AM registry in Hive
[ https://issues.apache.org/jira/browse/HIVE-17387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-17387: Attachment: HIVE-17387.01.patch > implement Tez AM registry in Hive > - > > Key: HIVE-17387 > URL: https://issues.apache.org/jira/browse/HIVE-17387 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-17387.01.patch, HIVE-17387.patch > > > Necessary for HS2 HA, to transfer AMs between HS2s, etc. > Helpful for workload management. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17466) Metastore API to list unique partition-key-value combinations
[ https://issues.apache.org/jira/browse/HIVE-17466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mithun Radhakrishnan updated HIVE-17466: Status: Patch Available (was: Open) > Metastore API to list unique partition-key-value combinations > - > > Key: HIVE-17466 > URL: https://issues.apache.org/jira/browse/HIVE-17466 > Project: Hive > Issue Type: New Feature > Components: Metastore >Affects Versions: 2.2.0, 3.0.0 >Reporter: Mithun Radhakrishnan >Assignee: Thiruvel Thirumoolan > Attachments: HIVE-17466.1.patch > > > Raising this on behalf of [~thiruvel], who wrote this initially as part of a > tangential "data-discovery" system. > Programs like Apache Oozie, Apache Falcon (or Yahoo GDM), etc. launch > workflows based on the availability of table/partitions. Partitions are > currently discovered by listing partitions using (what boils down to) > {{HiveMetaStoreClient.listPartitions()}}. This can be slow and cumbersome, > given that {{Partition}} objects are heavyweight and carry redundant > information. The alternative is to use partition-names, which will need > client-side parsing to extract part-key values. > When checking which hourly partitions for a particular day have been > published already, it would be preferable to have an API that pushed down > part-key extraction into the {{RawStore}} layer, and returned key-values as > the result. This would be similar to how {{SELECT DISTINCT part_key FROM > my_table;}} would run, but at the {{HiveMetaStoreClient}} level. > Here's what we've been using at Yahoo. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17466) Metastore API to list unique partition-key-value combinations
[ https://issues.apache.org/jira/browse/HIVE-17466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mithun Radhakrishnan updated HIVE-17466: Attachment: HIVE-17466.1.patch > Metastore API to list unique partition-key-value combinations > - > > Key: HIVE-17466 > URL: https://issues.apache.org/jira/browse/HIVE-17466 > Project: Hive > Issue Type: New Feature > Components: Metastore >Affects Versions: 2.2.0, 3.0.0 >Reporter: Mithun Radhakrishnan >Assignee: Thiruvel Thirumoolan > Attachments: HIVE-17466.1.patch > > > Raising this on behalf of [~thiruvel], who wrote this initially as part of a > tangential "data-discovery" system. > Programs like Apache Oozie, Apache Falcon (or Yahoo GDM), etc. launch > workflows based on the availability of table/partitions. Partitions are > currently discovered by listing partitions using (what boils down to) > {{HiveMetaStoreClient.listPartitions()}}. This can be slow and cumbersome, > given that {{Partition}} objects are heavyweight and carry redundant > information. The alternative is to use partition-names, which will need > client-side parsing to extract part-key values. > When checking which hourly partitions for a particular day have been > published already, it would be preferable to have an API that pushed down > part-key extraction into the {{RawStore}} layer, and returned key-values as > the result. This would be similar to how {{SELECT DISTINCT part_key FROM > my_table;}} would run, but at the {{HiveMetaStoreClient}} level. > Here's what we've been using at Yahoo. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Assigned] (HIVE-17466) Metastore API to list unique partition-key-value combinations
[ https://issues.apache.org/jira/browse/HIVE-17466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mithun Radhakrishnan reassigned HIVE-17466: --- > Metastore API to list unique partition-key-value combinations > - > > Key: HIVE-17466 > URL: https://issues.apache.org/jira/browse/HIVE-17466 > Project: Hive > Issue Type: New Feature > Components: Metastore >Affects Versions: 2.2.0, 3.0.0 >Reporter: Mithun Radhakrishnan >Assignee: Thiruvel Thirumoolan > > Raising this on behalf of [~thiruvel], who wrote this initially as part of a > tangential "data-discovery" system. > Programs like Apache Oozie, Apache Falcon (or Yahoo GDM), etc. launch > workflows based on the availability of table/partitions. Partitions are > currently discovered by listing partitions using (what boils down to) > {{HiveMetaStoreClient.listPartitions()}}. This can be slow and cumbersome, > given that {{Partition}} objects are heavyweight and carry redundant > information. The alternative is to use partition-names, which will need > client-side parsing to extract part-key values. > When checking which hourly partitions for a particular day have been > published already, it would be preferable to have an API that pushed down > part-key extraction into the {{RawStore}} layer, and returned key-values as > the result. This would be similar to how {{SELECT DISTINCT part_key FROM > my_table;}} would run, but at the {{HiveMetaStoreClient}} level. > Here's what we've been using at Yahoo. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17459) View deletion operation failed to replicate on target cluster
[ https://issues.apache.org/jira/browse/HIVE-17459?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tao Li updated HIVE-17459: -- Status: Patch Available (was: Open) > View deletion operation failed to replicate on target cluster > - > > Key: HIVE-17459 > URL: https://issues.apache.org/jira/browse/HIVE-17459 > Project: Hive > Issue Type: Bug >Reporter: Tao Li >Assignee: Tao Li > Attachments: HIVE-17459.1.patch > > > View dropping is not replicated during incremental repl. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17459) View deletion operation failed to replicate on target cluster
[ https://issues.apache.org/jira/browse/HIVE-17459?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tao Li updated HIVE-17459: -- Attachment: HIVE-17459.1.patch > View deletion operation failed to replicate on target cluster > - > > Key: HIVE-17459 > URL: https://issues.apache.org/jira/browse/HIVE-17459 > Project: Hive > Issue Type: Bug >Reporter: Tao Li >Assignee: Tao Li > Attachments: HIVE-17459.1.patch > > > View dropping is not replicated during incremental repl. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17455) External LLAP client: connection to HS2 should be kept open until explicitly closed
[ https://issues.apache.org/jira/browse/HIVE-17455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16155951#comment-16155951 ] Sergey Shelukhin commented on HIVE-17455: - +1. Looks like there's no better way that doesn't involve a lot of work and network calls to know when all the splits are done. > External LLAP client: connection to HS2 should be kept open until explicitly > closed > --- > > Key: HIVE-17455 > URL: https://issues.apache.org/jira/browse/HIVE-17455 > Project: Hive > Issue Type: Sub-task > Components: llap >Reporter: Jason Dere >Assignee: Jason Dere > Attachments: HIVE-17455.1.patch, HIVE-17455.2.patch > > > In the case that a complex query (aggregation/join) is passed to external > LLAP client, the query result is first saved as a Hive temp table before > being read by LLAP to client. Currently the HS2 connection used to fetch the > LLAP splits is closed right after the splits are fetched, which means the > temp table is gone by the time LLAP tries to read it. > Try to keep the connection open so that the table is still around when LLAP > tries to read it. Add close methods which can be used to close the connection > when the client is done with the query. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17460) `insert overwrite` should support table schema evolution (e.g. add columns)
[ https://issues.apache.org/jira/browse/HIVE-17460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16155930#comment-16155930 ] Chaozhong Yang commented on HIVE-17460: --- For autoColumnStats_5.q, the difference between my results and the original q.out is: < cint < dstring After running `alter table partitioned1 add columns(c int, d string)` and `desc formatted partitioned1 partition(part=1)`, the right result should contain `c` and `d`. Maybe I should re-generate those q.out files that contain wrong results? [~wzheng] > `insert overwrite` should support table schema evolution (e.g. add columns) > --- > > Key: HIVE-17460 > URL: https://issues.apache.org/jira/browse/HIVE-17460 > Project: Hive > Issue Type: Bug >Affects Versions: 2.1.0, 2.2.0 >Reporter: Chaozhong Yang >Assignee: Chaozhong Yang > Fix For: 3.0.0 > > Attachments: HIVE-17460.2.patch, HIVE-17460.patch > > > In Hive, adding columns into original table is a common use case. However, if > we insert overwrite older partitions after adding columns, added columns will > not be accessed. > ``` > create table src_table( > i int > ) > PARTITIONED BY (`date` string); > insert overwrite table src_table partition(`date`='20170905') values (3); > select * from src_table where `date` = '20170905'; > alter table src_table add columns (bi bigint); > insert overwrite table src_table partition(`date`='20170905') values (3, 5); > select * from src_table where `date` = '20170905'; > ``` > The result will be as follows: > ``` > 3, NULL, '20170905' > ``` > Obviously, it doesn't meet our expectation. The expected result should be: > ``` > 3, 5, '20170905' > ``` -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17393) AMReporter needs to heartbeat every external 'AM'
[ https://issues.apache.org/jira/browse/HIVE-17393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-17393: Resolution: Fixed Status: Resolved (was: Patch Available) Committed to master. Thanks for the patch! > AMReporter needs to heartbeat every external 'AM' > > > Key: HIVE-17393 > URL: https://issues.apache.org/jira/browse/HIVE-17393 > Project: Hive > Issue Type: Bug > Components: llap >Reporter: Zhiyuan Yang >Assignee: Zhiyuan Yang > Fix For: 3.0.0 > > Attachments: HIVE-17393.1.patch, HIVE-17393.2.patch, > HIVE-17393.3.patch > > > AMReporter only remembers the first AM that submitted the query and heartbeats to it. > In case of an external client, there might be multiple 'AM's, and each of them > needs node heartbeats. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17450) rename TestTxnCommandsBase
[ https://issues.apache.org/jira/browse/HIVE-17450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Vary updated HIVE-17450: -- Attachment: HIVE-17450.02.patch Errors should not be related, but running the tests again anyway > rename TestTxnCommandsBase > --- > > Key: HIVE-17450 > URL: https://issues.apache.org/jira/browse/HIVE-17450 > Project: Hive > Issue Type: Bug >Reporter: Zoltan Haindrich >Assignee: Peter Vary > Attachments: HIVE-17450.02.patch, HIVE-17450.patch > > > TestTxnCommandsBase is an abstract class, added in HIVE-17205; it matches the > maven test pattern...because of that there is a failing test in every test > output -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17450) rename TestTxnCommandsBase
[ https://issues.apache.org/jira/browse/HIVE-17450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16155903#comment-16155903 ] Hive QA commented on HIVE-17450: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12885582/HIVE-17450.patch {color:green}SUCCESS:{color} +1 due to 3 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 11027 tests executed *Failed tests:* {noformat} TestAccumuloCliDriver - did not produce a TEST-*.xml file (likely timed out) (batchId=230) TestDummy - did not produce a TEST-*.xml file (likely timed out) (batchId=230) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[insert_values_orig_table_use_metadata] (batchId=61) org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] (batchId=100) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] (batchId=234) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6695/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6695/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6695/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 5 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12885582 - PreCommit-HIVE-Build > rename TestTxnCommandsBase > --- > > Key: HIVE-17450 > URL: https://issues.apache.org/jira/browse/HIVE-17450 > Project: Hive > Issue Type: Bug >Reporter: Zoltan Haindrich >Assignee: Peter Vary > Attachments: HIVE-17450.patch > > > TestTxnCommandsBase is an abstract class, added in HIVE-17205; it matches the > maven test pattern...because of that there is a failing test in every test > output -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17465) Statistics: Drill-down filters don't reduce row-counts progressively
[ https://issues.apache.org/jira/browse/HIVE-17465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gopal V updated HIVE-17465: --- Component/s: Statistics Physical Optimizer > Statistics: Drill-down filters don't reduce row-counts progressively > > > Key: HIVE-17465 > URL: https://issues.apache.org/jira/browse/HIVE-17465 > Project: Hive > Issue Type: Bug > Components: Physical Optimizer, Statistics >Reporter: Gopal V > > {code} > explain select count(d_date_sk) from date_dim where d_year=2001 ; > explain select count(d_date_sk) from date_dim where d_year=2001 and d_moy = > 9; > explain select count(d_date_sk) from date_dim where d_year=2001 and d_moy = 9 > and d_dom = 21; > {code} > All 3 queries end up with the same row-count estimates after the filter. > {code} > Map Operator Tree: > TableScan > alias: date_dim > filterExpr: (d_year = 2001) (type: boolean) > Statistics: Num rows: 73049 Data size: 82034027 Basic > stats: COMPLETE Column stats: COMPLETE > Filter Operator > predicate: (d_year = 2001) (type: boolean) > Statistics: Num rows: 363 Data size: 4356 Basic stats: > COMPLETE Column stats: COMPLETE > > Map 1 > Map Operator Tree: > TableScan > alias: date_dim > filterExpr: ((d_year = 2001) and (d_moy = 9)) (type: > boolean) > Statistics: Num rows: 73049 Data size: 82034027 Basic > stats: COMPLETE Column stats: COMPLETE > Filter Operator > predicate: ((d_year = 2001) and (d_moy = 9)) (type: > boolean) > Statistics: Num rows: 363 Data size: 5808 Basic stats: > COMPLETE Column stats: COMPLETE > Map 1 > Map Operator Tree: > TableScan > alias: date_dim > filterExpr: ((d_year = 2001) and (d_moy = 9) and (d_dom = > 21)) (type: boolean) > Statistics: Num rows: 73049 Data size: 82034027 Basic > stats: COMPLETE Column stats: COMPLETE > Filter Operator > predicate: ((d_year = 2001) and (d_moy = 9) and (d_dom = > 21)) (type: boolean) > Statistics: Num rows: 363 Data size: 7260 Basic stats: > COMPLETE Column stats: COMPLETE > {code} -- This message was 
sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Assigned] (HIVE-17465) Statistics: Drill-down filters don't reduce row-counts progressively
[ https://issues.apache.org/jira/browse/HIVE-17465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gopal V reassigned HIVE-17465: -- Assignee: Vineet Garg > Statistics: Drill-down filters don't reduce row-counts progressively > > > Key: HIVE-17465 > URL: https://issues.apache.org/jira/browse/HIVE-17465 > Project: Hive > Issue Type: Bug > Components: Physical Optimizer, Statistics >Reporter: Gopal V >Assignee: Vineet Garg > > {code} > explain select count(d_date_sk) from date_dim where d_year=2001 ; > explain select count(d_date_sk) from date_dim where d_year=2001 and d_moy = > 9; > explain select count(d_date_sk) from date_dim where d_year=2001 and d_moy = 9 > and d_dom = 21; > {code} > All 3 queries end up with the same row-count estimates after the filter. > {code} > Map Operator Tree: > TableScan > alias: date_dim > filterExpr: (d_year = 2001) (type: boolean) > Statistics: Num rows: 73049 Data size: 82034027 Basic > stats: COMPLETE Column stats: COMPLETE > Filter Operator > predicate: (d_year = 2001) (type: boolean) > Statistics: Num rows: 363 Data size: 4356 Basic stats: > COMPLETE Column stats: COMPLETE > > Map 1 > Map Operator Tree: > TableScan > alias: date_dim > filterExpr: ((d_year = 2001) and (d_moy = 9)) (type: > boolean) > Statistics: Num rows: 73049 Data size: 82034027 Basic > stats: COMPLETE Column stats: COMPLETE > Filter Operator > predicate: ((d_year = 2001) and (d_moy = 9)) (type: > boolean) > Statistics: Num rows: 363 Data size: 5808 Basic stats: > COMPLETE Column stats: COMPLETE > Map 1 > Map Operator Tree: > TableScan > alias: date_dim > filterExpr: ((d_year = 2001) and (d_moy = 9) and (d_dom = > 21)) (type: boolean) > Statistics: Num rows: 73049 Data size: 82034027 Basic > stats: COMPLETE Column stats: COMPLETE > Filter Operator > predicate: ((d_year = 2001) and (d_moy = 9) and (d_dom = > 21)) (type: boolean) > Statistics: Num rows: 363 Data size: 7260 Basic stats: > COMPLETE Column stats: COMPLETE > {code} -- This message 
was sent by Atlassian JIRA (v6.4.14#64029)
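The progressive reduction the report asks for would follow from the standard independence assumption in cardinality estimation: each additional equality predicate multiplies the estimate by 1/NDV of its column. A minimal sketch (not Hive's actual estimator; the 73049-row total is from the example above, while the NDVs are illustrative assumptions):

```java
// Sketch of progressive row-count estimation under the independent-predicate
// assumption: each equality filter has selectivity 1/NDV(column), so stacking
// filters should keep shrinking the estimate instead of staying flat at 363.
public class FilterEstimate {
    public static long estimate(long totalRows, long... ndvs) {
        double rows = totalRows;
        for (long ndv : ndvs) {
            rows /= ndv;  // equality predicate: selectivity = 1/NDV
        }
        return Math.max(1, Math.round(rows));
    }
}
```

With an assumed NDV(d_year) of 201, {{estimate(73049, 201)}} reproduces the 363-row estimate for the first query; adding assumed NDVs of 12 for d_moy and 31 for d_dom then drops the estimate to roughly 30 and 1 rows, which is the progressive behavior the queries should show.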
[jira] [Comment Edited] (HIVE-17460) `insert overwrite` should support table schema evolution (e.g. add columns)
[ https://issues.apache.org/jira/browse/HIVE-17460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16155867#comment-16155867 ] Chaozhong Yang edited comment on HIVE-17460 at 9/6/17 6:41 PM: --- [~wei.zheng] Yes, I have moved that code into alterPartitionSpecInMemory and submitted the patch again. was (Author: debugger87): [~wei.zheng] Yes, I have moved that code into alterPartitionSpecInMemory. > `insert overwrite` should support table schema evolution (e.g. add columns) > --- > > Key: HIVE-17460 > URL: https://issues.apache.org/jira/browse/HIVE-17460 > Project: Hive > Issue Type: Bug >Affects Versions: 2.1.0, 2.2.0 >Reporter: Chaozhong Yang >Assignee: Chaozhong Yang > Fix For: 3.0.0 > > Attachments: HIVE-17460.2.patch, HIVE-17460.patch > > > In Hive, adding columns into original table is a common use case. However, if > we insert overwrite older partitions after adding columns, added columns will > not be accessed. > ``` > create table src_table( > i int > ) > PARTITIONED BY (`date` string); > insert overwrite table src_table partition(`date`='20170905') values (3); > select * from src_table where `date` = '20170905'; > alter table src_table add columns (bi bigint); > insert overwrite table src_table partition(`date`='20170905') values (3, 5); > select * from src_table where `date` = '20170905'; > ``` > The result will be as follows: > ``` > 3, NULL, '20170905' > ``` > Obviously, it doesn't meet our expectation. The expected result should be: > ``` > 3, 5, '20170905' > ``` -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17460) `insert overwrite` should support table schema evolution (e.g. add columns)
[ https://issues.apache.org/jira/browse/HIVE-17460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16155867#comment-16155867 ] Chaozhong Yang commented on HIVE-17460: --- [~wei.zheng] Yes, I have moved that code into alterPartitionSpecInMemory. > `insert overwrite` should support table schema evolution (e.g. add columns) > --- > > Key: HIVE-17460 > URL: https://issues.apache.org/jira/browse/HIVE-17460 > Project: Hive > Issue Type: Bug >Affects Versions: 2.1.0, 2.2.0 >Reporter: Chaozhong Yang >Assignee: Chaozhong Yang > Fix For: 3.0.0 > > Attachments: HIVE-17460.2.patch, HIVE-17460.patch > > > In Hive, adding columns into original table is a common use case. However, if > we insert overwrite older partitions after adding columns, added columns will > not be accessed. > ``` > create table src_table( > i int > ) > PARTITIONED BY (`date` string); > insert overwrite table src_table partition(`date`='20170905') values (3); > select * from src_table where `date` = '20170905'; > alter table src_table add columns (bi bigint); > insert overwrite table src_table partition(`date`='20170905') values (3, 5); > select * from src_table where `date` = '20170905'; > ``` > The result will be as follows: > ``` > 3, NULL, '20170905' > ``` > Obviously, it doesn't meet our expectation. The expected result should be: > ``` > 3, 5, '20170905' > ``` -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17456) Set current database for external LLAP interface
[ https://issues.apache.org/jira/browse/HIVE-17456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16155864#comment-16155864 ] Sergey Shelukhin commented on HIVE-17456: - +1 pending tests > Set current database for external LLAP interface > > > Key: HIVE-17456 > URL: https://issues.apache.org/jira/browse/HIVE-17456 > Project: Hive > Issue Type: Sub-task > Components: llap >Reporter: Jason Dere >Assignee: Jason Dere > Attachments: HIVE-17456.1.patch, HIVE-17456.2.patch > > > Currently the query passed in to external LLAP client has the default DB as > the current database. > Allow user to specify a different current database. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Comment Edited] (HIVE-17456) Set current database for external LLAP interface
[ https://issues.apache.org/jira/browse/HIVE-17456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16155864#comment-16155864 ] Sergey Shelukhin edited comment on HIVE-17456 at 9/6/17 6:40 PM: - +1 pending HiveQA was (Author: sershe): +1 pending tests > Set current database for external LLAP interface > > > Key: HIVE-17456 > URL: https://issues.apache.org/jira/browse/HIVE-17456 > Project: Hive > Issue Type: Sub-task > Components: llap >Reporter: Jason Dere >Assignee: Jason Dere > Attachments: HIVE-17456.1.patch, HIVE-17456.2.patch > > > Currently the query passed in to external LLAP client has the default DB as > the current database. > Allow user to specify a different current database. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17455) External LLAP client: connection to HS2 should be kept open until explicitly closed
[ https://issues.apache.org/jira/browse/HIVE-17455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16155861#comment-16155861 ] Sergey Shelukhin commented on HIVE-17455: - Hm... this seems very error prone, esp. if users don't even set the handle in the jobconf. Perhaps some method (init of some sort? or getSplits itself, since it's called explicitly) should return a closeable encapsulating the handle, so that it could be closed via normal means? > External LLAP client: connection to HS2 should be kept open until explicitly > closed > --- > > Key: HIVE-17455 > URL: https://issues.apache.org/jira/browse/HIVE-17455 > Project: Hive > Issue Type: Sub-task > Components: llap >Reporter: Jason Dere >Assignee: Jason Dere > Attachments: HIVE-17455.1.patch, HIVE-17455.2.patch > > > In the case that a complex query (aggregation/join) is passed to external > LLAP client, the query result is first saved as a Hive temp table before > being read by LLAP to client. Currently the HS2 connection used to fetch the > LLAP splits is closed right after the splits are fetched, which means the > temp table is gone by the time LLAP tries to read it. > Try to keep the connection open so that the table is still around when LLAP > tries to read it. Add close methods which can be used to close the connection > when the client is done with the query. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
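The closeable-handle pattern suggested in the comment could look roughly like this. All names here are hypothetical (the {{Connection}} interface stands in for the real HS2 connection type, and the splits are plain strings); this is a sketch of the pattern, not the actual LlapBaseInputFormat API.

```java
import java.util.List;

// Hypothetical sketch: getSplits() would return an object like this that owns
// the HS2 connection, so try-with-resources closes it "via normal means" when
// the client is done, and the temp table survives until that point.
public class SplitsHandle implements AutoCloseable {
    public interface Connection { void close(); }  // stand-in for the HS2 connection

    private final Connection hs2Connection;
    private final List<String> splits;  // placeholder for real split objects
    private boolean closed = false;

    public SplitsHandle(Connection hs2Connection, List<String> splits) {
        this.hs2Connection = hs2Connection;
        this.splits = splits;
    }

    public List<String> getSplits() { return splits; }

    public boolean isClosed() { return closed; }

    @Override
    public void close() {
        if (!closed) {
            hs2Connection.close();  // the temp table is dropped only here
            closed = true;
        }
    }
}
```

A caller would then write {{try (SplitsHandle h = ...) { process(h.getSplits()); }}}, avoiding the error-prone handle-in-jobconf bookkeeping the comment objects to.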
[jira] [Updated] (HIVE-17456) Set current database for external LLAP interface
[ https://issues.apache.org/jira/browse/HIVE-17456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Dere updated HIVE-17456: -- Attachment: HIVE-17456.2.patch Updated tests > Set current database for external LLAP interface > > > Key: HIVE-17456 > URL: https://issues.apache.org/jira/browse/HIVE-17456 > Project: Hive > Issue Type: Sub-task > Components: llap >Reporter: Jason Dere >Assignee: Jason Dere > Attachments: HIVE-17456.1.patch, HIVE-17456.2.patch > > > Currently the query passed in to external LLAP client has the default DB as > the current database. > Allow user to specify a different current database. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17460) `insert overwrite` should support table schema evolution (e.g. add columns)
[ https://issues.apache.org/jira/browse/HIVE-17460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chaozhong Yang updated HIVE-17460: -- Attachment: HIVE-17460.2.patch > `insert overwrite` should support table schema evolution (e.g. add columns) > --- > > Key: HIVE-17460 > URL: https://issues.apache.org/jira/browse/HIVE-17460 > Project: Hive > Issue Type: Bug >Affects Versions: 2.1.0, 2.2.0 >Reporter: Chaozhong Yang >Assignee: Chaozhong Yang > Fix For: 3.0.0 > > Attachments: HIVE-17460.2.patch, HIVE-17460.patch > > > In Hive, adding columns into original table is a common use case. However, if > we insert overwrite older partitions after adding columns, added columns will > not be accessed. > ``` > create table src_table( > i int > ) > PARTITIONED BY (`date` string); > insert overwrite table src_table partition(`date`='20170905') values (3); > select * from src_table where `date` = '20170905'; > alter table src_table add columns (bi bigint); > insert overwrite table src_table partition(`date`='20170905') values (3, 5); > select * from src_table where `date` = '20170905'; > ``` > The result will be as follows: > ``` > 3, NULL, '20170905' > ``` > Obviously, it doesn't meet our expectation. The expected result should be: > ``` > 3, 5, '20170905' > ``` -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17460) `insert overwrite` should support table schema evolution (e.g. add columns)
[ https://issues.apache.org/jira/browse/HIVE-17460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16155845#comment-16155845 ] Wei Zheng commented on HIVE-17460: -- Some existing q.out files are wrong, but I noticed some other failures, e.g. autoColumnStats_5.q. I suggest you try moving the fix into alterPartitionSpecInMemory, under the "if (inheritTableSpecs)" block and have another test run. > `insert overwrite` should support table schema evolution (e.g. add columns) > --- > > Key: HIVE-17460 > URL: https://issues.apache.org/jira/browse/HIVE-17460 > Project: Hive > Issue Type: Bug >Affects Versions: 2.1.0, 2.2.0 >Reporter: Chaozhong Yang >Assignee: Chaozhong Yang > Fix For: 3.0.0 > > Attachments: HIVE-17460.2.patch, HIVE-17460.patch > > > In Hive, adding columns into original table is a common use case. However, if > we insert overwrite older partitions after adding columns, added columns will > not be accessed. > ``` > create table src_table( > i int > ) > PARTITIONED BY (`date` string); > insert overwrite table src_table partition(`date`='20170905') values (3); > select * from src_table where `date` = '20170905'; > alter table src_table add columns (bi bigint); > insert overwrite table src_table partition(`date`='20170905') values (3, 5); > select * from src_table where `date` = '20170905'; > ``` > The result will be as follows: > ``` > 3, NULL, '20170905' > ``` > Obviously, it doesn't meet our expectation. The expected result should be: > ``` > 3, 5, '20170905' > ``` -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17455) External LLAP client: connection to HS2 should be kept open until explicitly closed
[ https://issues.apache.org/jira/browse/HIVE-17455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Dere updated HIVE-17455: -- Attachment: HIVE-17455.2.patch updated patch to make map final. > External LLAP client: connection to HS2 should be kept open until explicitly > closed > --- > > Key: HIVE-17455 > URL: https://issues.apache.org/jira/browse/HIVE-17455 > Project: Hive > Issue Type: Sub-task > Components: llap >Reporter: Jason Dere >Assignee: Jason Dere > Attachments: HIVE-17455.1.patch, HIVE-17455.2.patch > > > In the case that a complex query (aggregation/join) is passed to external > LLAP client, the query result is first saved as a Hive temp table before > being read by LLAP to client. Currently the HS2 connection used to fetch the > LLAP splits is closed right after the splits are fetched, which means the > temp table is gone by the time LLAP tries to read it. > Try to keep the connection open so that the table is still around when LLAP > tries to read it. Add close methods which can be used to close the connection > when the client is done with the query. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17464) Fix to be able to disable max shuffle size DHJ config
[ https://issues.apache.org/jira/browse/HIVE-17464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez updated HIVE-17464: --- Attachment: HIVE-17464.patch > Fix to be able to disable max shuffle size DHJ config > - > > Key: HIVE-17464 > URL: https://issues.apache.org/jira/browse/HIVE-17464 > Project: Hive > Issue Type: Bug > Components: Physical Optimizer >Affects Versions: 3.0.0 >Reporter: Jesus Camacho Rodriguez >Assignee: Jesus Camacho Rodriguez > Attachments: HIVE-17464.patch > > > Setting {{hive.auto.convert.join.shuffle.max.size}} to -1 does not work as > expected. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Work started] (HIVE-17464) Fix to be able to disable max shuffle size DHJ config
[ https://issues.apache.org/jira/browse/HIVE-17464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HIVE-17464 started by Jesus Camacho Rodriguez. -- > Fix to be able to disable max shuffle size DHJ config > - > > Key: HIVE-17464 > URL: https://issues.apache.org/jira/browse/HIVE-17464 > Project: Hive > Issue Type: Bug > Components: Physical Optimizer >Affects Versions: 3.0.0 >Reporter: Jesus Camacho Rodriguez >Assignee: Jesus Camacho Rodriguez > > Setting {{hive.auto.convert.join.shuffle.max.size}} to -1 does not work as > expected. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17464) Fix to be able to disable max shuffle size DHJ config
[ https://issues.apache.org/jira/browse/HIVE-17464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez updated HIVE-17464: --- Status: Patch Available (was: In Progress) > Fix to be able to disable max shuffle size DHJ config > - > > Key: HIVE-17464 > URL: https://issues.apache.org/jira/browse/HIVE-17464 > Project: Hive > Issue Type: Bug > Components: Physical Optimizer >Affects Versions: 3.0.0 >Reporter: Jesus Camacho Rodriguez >Assignee: Jesus Camacho Rodriguez > > Setting {{hive.auto.convert.join.shuffle.max.size}} to -1 does not work as > expected. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17463) ORC: include orc-shims in hive-exec.jar
[ https://issues.apache.org/jira/browse/HIVE-17463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16155807#comment-16155807 ] Owen O'Malley commented on HIVE-17463: -- This is part of upgrading Hive trunk to use the upcoming ORC 1.5.0 release. > ORC: include orc-shims in hive-exec.jar > --- > > Key: HIVE-17463 > URL: https://issues.apache.org/jira/browse/HIVE-17463 > Project: Hive > Issue Type: Bug > Components: ORC >Affects Versions: 3.0.0 >Reporter: Gopal V >Assignee: Gopal V >Priority: Minor > Attachments: HIVE-17463.1.patch > > > ORC-234 added a new shims module - this needs to be part of hive-exec shading > to use ORC-1.5.x branch in Hive. -- This message was sent by Atlassian JIRA (v6.4.14#64029)