[jira] [Comment Edited] (HIVE-17193) HoS: don't combine map works that are targets of different DPPs

2017-10-22 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16214737#comment-16214737
 ] 

Rui Li edited comment on HIVE-17193 at 10/23/17 6:57 AM:
-

Hi [~kellyzly],
bq. how to compare the result of dpp work in the period of physical plan?
We can compare the DPP works the same way as we compare other works, i.e. if 
two works have the same operator tree and each operator has an equivalent 
counterpart, then the two works can be combined.
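
The comparison described above can be sketched roughly as follows. This is only an illustration under simplifying assumptions: {{OpNode}} and {{OperatorTreeComparison}} are hypothetical stand-ins, not Hive's operator classes, and an operator is reduced to a type name plus a descriptor string.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

final class OperatorTreeComparison {

    /** Hypothetical, simplified operator: a type name, a descriptor, and child operators. */
    static final class OpNode {
        final String type;        // e.g. "TableScan", "Filter"
        final String descriptor;  // e.g. the table alias or the predicate text
        final List<OpNode> children = new ArrayList<>();

        OpNode(String type, String descriptor, OpNode... kids) {
            this.type = type;
            this.descriptor = descriptor;
            this.children.addAll(Arrays.asList(kids));
        }
    }

    /** Two trees are equivalent when they have the same shape and every operator matches its counterpart. */
    static boolean equivalent(OpNode a, OpNode b) {
        if (!a.type.equals(b.type) || !Objects.equals(a.descriptor, b.descriptor)) {
            return false;
        }
        if (a.children.size() != b.children.size()) {
            return false;
        }
        for (int i = 0; i < a.children.size(); i++) {
            if (!equivalent(a.children.get(i), b.children.get(i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Simplified shapes of two DPP works that differ only in the filter predicate.
        OpNode onKey = new OpNode("TableScan", "src",
                new OpNode("Filter", "key is not null",
                        new OpNode("SparkPartitionPruningSink", "ds")));
        OpNode onValue = new OpNode("TableScan", "src",
                new OpNode("Filter", "value is not null",
                        new OpNode("SparkPartitionPruningSink", "ds")));
        System.out.println(equivalent(onKey, onKey));    // true
        System.out.println(equivalent(onKey, onValue));  // false
    }
}
```

In this sketch the two works would not be combined, because the filter operators have no equivalent counterpart.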


was (Author: lirui):
Hi [~kellyzly],
bq. how to compare the result of dpp work in the period of physical plan?
We can compare the DPP works the same way as we compare other works, i.e. if 
two works have the same operator tree and all the each operator has an 
equivalent counterpart, then the two works can be combined.

> HoS: don't combine map works that are targets of different DPPs
> ---
>
> Key: HIVE-17193
> URL: https://issues.apache.org/jira/browse/HIVE-17193
> Project: Hive
>  Issue Type: Bug
>Reporter: Rui Li
>Assignee: Rui Li
>
> Suppose {{srcpart}} is partitioned by {{ds}}. The following query can trigger 
> the issue:
> {code}
> explain
> select * from
>   (select srcpart.ds,srcpart.key from srcpart join src on srcpart.ds=src.key) 
> a
> join
>   (select srcpart.ds,srcpart.key from srcpart join src on 
> srcpart.ds=src.value) b
> on a.key=b.key;
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17830) dbnotification fails to work with rdbms other than postgres

2017-10-22 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16214731#comment-16214731
 ] 

Daniel Dai commented on HIVE-17830:
---

Ok, so "SET @@session.sql_mode=ANSI_QUOTES" will be required, right? Last time I 
read the code, it seemed that prepareTxn is invoked every time we create a new 
ObjectStore. However, I must be missing something, as otherwise we would never 
hit the SQL syntax error.

> dbnotification fails to work with rdbms other than postgres
> ---
>
> Key: HIVE-17830
> URL: https://issues.apache.org/jira/browse/HIVE-17830
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: anishek
>Assignee: Daniel Dai
>Priority: Critical
> Fix For: 3.0.0
>
> Attachments: HIVE-17830.0.patch, HIVE-17830.1.patch
>
>
> as part of HIVE-17721 we had changed the direct sql to acquire the lock for 
> postgres as
> {code}
> select "NEXT_EVENT_ID" from "NOTIFICATION_SEQUENCE" for update;
> {code}
> however this breaks other databases and we have to use different sql 
> statements for different databases 
> for postgres use
> {code}
> select "NEXT_EVENT_ID" from "NOTIFICATION_SEQUENCE" for update;
> {code}
> for SQLServer 
> {code}
> select "NEXT_EVENT_ID" from "NOTIFICATION_SEQUENCE" with (updlock);
> {code}
> for other databases 
> {code}
> select NEXT_EVENT_ID from NOTIFICATION_SEQUENCE for update;
> {code}
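
The per-database statements listed in the description could be selected with a small dispatch like the one below. This is a sketch only: {{NotificationLockSql}} and its {{DbType}} enum are hypothetical names, not the metastore's own classes, and the trailing semicolons are dropped on the assumption that the strings are passed to a JDBC statement.

```java
final class NotificationLockSql {

    /** Hypothetical database kinds; the real metastore has its own product enum. */
    enum DbType { POSTGRES, SQLSERVER, OTHER }

    /** Returns the row-locking select for the notification sequence, as listed in the issue description. */
    static String lockStatement(DbType db) {
        switch (db) {
            case POSTGRES:
                return "select \"NEXT_EVENT_ID\" from \"NOTIFICATION_SEQUENCE\" for update";
            case SQLSERVER:
                return "select \"NEXT_EVENT_ID\" from \"NOTIFICATION_SEQUENCE\" with (updlock)";
            default:
                // Unquoted identifiers for every other database.
                return "select NEXT_EVENT_ID from NOTIFICATION_SEQUENCE for update";
        }
    }

    public static void main(String[] args) {
        for (DbType db : DbType.values()) {
            System.out.println(db + ": " + lockStatement(db));
        }
    }
}
```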





[jira] [Commented] (HIVE-17193) HoS: don't combine map works that are targets of different DPPs

2017-10-22 Thread liyunzhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16214728#comment-16214728
 ] 

liyunzhang commented on HIVE-17193:
---

[~lirui]:
{quote}
1. The simplest solution is, if the DPP works' IDs (tracked by the target map 
works) are different, then we consider the target map works are different and 
don't combine them.
2. Another solution is we walk the parent tasks first, and combine equivalent 
DPP works. Two DPP works can be considered equivalent as long as they output 
same records.
{quote}
For #1, it can be implemented on top of the current code. For #2, how do we 
compare the results of DPP works during the physical planning phase? Do you mean 
directly comparing the estimated data size (Statistics: Num rows: 58 Data size: 5812)?

{code}
 Map 9 
Map Operator Tree:
TableScan
  alias: src
  Statistics: Num rows: 58 Data size: 5812 Basic stats: 
COMPLETE Column stats: NONE
  Filter Operator
predicate: value is not null (type: boolean)
Statistics: Num rows: 58 Data size: 5812 Basic stats: 
COMPLETE Column stats: NONE
Select Operator
  expressions: value (type: string)
  outputColumnNames: _col0
  Statistics: Num rows: 58 Data size: 5812 Basic stats: 
COMPLETE Column stats: NONE
  Select Operator
expressions: _col0 (type: string)
outputColumnNames: _col0
Statistics: Num rows: 58 Data size: 5812 Basic stats: 
COMPLETE Column stats: NONE
Group By Operator
  keys: _col0 (type: string)
  mode: hash
  outputColumnNames: _col0
  Statistics: Num rows: 58 Data size: 5812 Basic stats: 
COMPLETE Column stats: NONE
  Spark Partition Pruning Sink Operator
Target column: ds (string)
partition key expr: ds
Statistics: Num rows: 58 Data size: 5812 Basic 
stats: COMPLETE Column stats: NONE
target work: Map 5
{code}


{code}
  Map 8 
Map Operator Tree:
TableScan
  alias: src
  Statistics: Num rows: 58 Data size: 5812 Basic stats: 
COMPLETE Column stats: NONE
  Filter Operator
predicate: key is not null (type: boolean)
Statistics: Num rows: 58 Data size: 5812 Basic stats: 
COMPLETE Column stats: NONE
Select Operator
  expressions: key (type: string)
  outputColumnNames: _col0
  Statistics: Num rows: 58 Data size: 5812 Basic stats: 
COMPLETE Column stats: NONE
  Select Operator
expressions: _col0 (type: string)
outputColumnNames: _col0
Statistics: Num rows: 58 Data size: 5812 Basic stats: 
COMPLETE Column stats: NONE
Group By Operator
  keys: _col0 (type: string)
  mode: hash
  outputColumnNames: _col0
  Statistics: Num rows: 58 Data size: 5812 Basic stats: 
COMPLETE Column stats: NONE
  Spark Partition Pruning Sink Operator
Target column: ds (string)
partition key expr: ds
Statistics: Num rows: 58 Data size: 5812 Basic 
stats: COMPLETE Column stats: NONE
target work: Map 1

{code}
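
Solution #1 from the quote could look roughly like the sketch below. It assumes each target map work keeps the set of DPP works that prune it; {{DppTargetCheck}} and {{MapWorkInfo}} are hypothetical names, and the check covers only the extra DPP condition, not the rest of the equivalence test.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

final class DppTargetCheck {

    /** Hypothetical stand-in for a map work that tracks the DPP works targeting it. */
    static final class MapWorkInfo {
        final String name;
        final Set<String> dppSourceIds;

        MapWorkInfo(String name, String... dppSourceIds) {
            this.name = name;
            this.dppSourceIds = new HashSet<>(Arrays.asList(dppSourceIds));
        }
    }

    /** Map works pruned by different DPP works are treated as different and are not combined. */
    static boolean sameDppSources(MapWorkInfo a, MapWorkInfo b) {
        return a.dppSourceIds.equals(b.dppSourceIds);
    }

    public static void main(String[] args) {
        // From the plans above: Map 8 targets Map 1, Map 9 targets Map 5.
        MapWorkInfo map1 = new MapWorkInfo("Map 1", "Map 8");
        MapWorkInfo map5 = new MapWorkInfo("Map 5", "Map 9");
        System.out.println(sameDppSources(map1, map5)); // false
    }
}
```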




[jira] [Commented] (HIVE-17830) dbnotification fails to work with rdbms other than postgres

2017-10-22 Thread anishek (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16214723#comment-16214723
 ] 

anishek commented on HIVE-17830:


Thanks [~daijy] for this patch. A quick question.

Looking at the code that sets ANSI_QUOTES, it is in 
*MetastoreDirectSql.java*:
{code}
public void prepareTxn() throws MetaException {
  if (dbType != DatabaseProduct.MYSQL) return;
  try {
    // must be inside tx together with queries
    assert pm.currentTransaction().isActive();
    executeNoResult("SET @@session.sql_mode=ANSI_QUOTES");
  } catch (SQLException sqlEx) {
    throw new MetaException("Error setting ansi quotes: " + sqlEx.getMessage());
  }
}
{code}


Here we are setting the sql_mode only for the *session* and not *globally*. I 
just ran the statement below on a MySQL server without modifying the sql_mode:

{code}
mysql> select "NEXT_EVENT_ID" from NOTIFICATION_SEQUENCE;
+---------------+
| NEXT_EVENT_ID |
+---------------+
| NEXT_EVENT_ID |
+---------------+
1 row in set (0.00 sec)
{code}

Since we use connection pooling, won't we get different results depending on 
which connection is used to execute the above statement? Maybe I am missing 
something here.
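
The concern can be illustrated with a toy model. Nothing here talks to MySQL: {{FakeConnection}} is a hypothetical class that only remembers whether ANSI_QUOTES was set on its session, and 42 is a made-up stored value. It relies on the MySQL behaviour that a double-quoted token is a string literal unless ANSI_QUOTES is enabled, which matches the output shown above.

```java
final class SessionModeDemo {

    /** Toy pooled connection; the only session state it tracks is the ANSI_QUOTES flag. */
    static final class FakeConnection {
        boolean ansiQuotes = false;

        /** Models the result of: select "NEXT_EVENT_ID" from NOTIFICATION_SEQUENCE. */
        String selectQuotedNextEventId(long storedValue) {
            // With ANSI_QUOTES the quoted token is a column name, otherwise a string literal.
            return ansiQuotes ? Long.toString(storedValue) : "NEXT_EVENT_ID";
        }
    }

    public static void main(String[] args) {
        FakeConnection prepared = new FakeConnection();
        prepared.ansiQuotes = true;  // as if SET @@session.sql_mode=ANSI_QUOTES had run on it
        FakeConnection untouched = new FakeConnection();

        System.out.println(prepared.selectQuotedNextEventId(42L));   // 42
        System.out.println(untouched.selectQuotedNextEventId(42L));  // NEXT_EVENT_ID
    }
}
```

Whether the pool can actually hand out a connection whose session was never prepared depends on where prepareTxn is called, which is the open question in this thread.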

cc [~thejas]




[jira] [Commented] (HIVE-16198) Vectorize GenericUDFIndex for ARRAY

2017-10-22 Thread Colin Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16214720#comment-16214720
 ] 

Colin Ma commented on HIVE-16198:
-

Hi [~teddy.choi], [~mmccline], because of the problem in HIVE-17133, I rebased 
the patch onto Hive 2.3.0 with some minor changes. To evaluate the 
performance improvement, the following table is used:
{code}
hive> describe temperature_orc_5g;
   t_date          string
   city            string
   temperatures    array
hive> show tblproperties temperature_orc_5g;
   COLUMN_STATS_ACCURATE   {"BASIC_STATS":"true"}
   numFiles   20
   numRows 1
   rawDataSize   241
   totalSize   1793960785
{code}
Tested with Hive on Spark, using the SQL {color:#59afe1}select city, 
avg(temperatures\[0\]), avg(temperatures\[5\]) from temperature_orc_5g where 
temperatures\[2\] > 20 group by city limit 10{color}. The results are as follows:
|| ||Disable vectorization||Enable vectorization||
|execution time|{color:#d04437}34s{color}|{color:#14892c}26s{color}|
Specifically, the detailed time cost for the same task, which processes 
15154763 rows, is shown in the following table:
|| ||Disable vectorization||Enable vectorization||
|Time with RecordReader|{color:#d04437}8.9s{color}|{color:#14892c}5.9s{color}|
|Time with filter operator|{color:#d04437}3.1s{color}|{color:#14892c}0.1s{color}|
|Time with groupBy and followup operators|10.8s|11.5s|
I think the improvement is obvious. Do you know why the patch hasn't been 
committed yet? Thanks.
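
For reference, the relative change behind each pair of timings above can be worked out as (before - after) / before. The snippet below is just that arithmetic applied to the numbers in the tables; {{SpeedupArithmetic}} is a hypothetical helper name.

```java
import java.util.Locale;

final class SpeedupArithmetic {

    /** Percentage of time saved when going from 'before' to 'after'; negative means slower. */
    static double percentReduction(double before, double after) {
        return (before - after) / before * 100.0;
    }

    public static void main(String[] args) {
        String[] labels = {"execution time", "RecordReader", "filter operator", "groupBy and followup"};
        double[][] seconds = {{34.0, 26.0}, {8.9, 5.9}, {3.1, 0.1}, {10.8, 11.5}};
        for (int i = 0; i < labels.length; i++) {
            double pct = percentReduction(seconds[i][0], seconds[i][1]);
            System.out.println(String.format(Locale.ROOT, "%s: %.1f%%", labels[i], pct));
        }
    }
}
```

By this measure the overall run is about 23.5% faster, with most of the gain in the reader and the filter operator, while the groupBy and follow-up operators are slightly slower.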

> Vectorize GenericUDFIndex for ARRAY
> ---
>
> Key: HIVE-16198
> URL: https://issues.apache.org/jira/browse/HIVE-16198
> Project: Hive
>  Issue Type: Sub-task
>  Components: UDF, Vectorization
>Reporter: Teddy Choi
>Assignee: Teddy Choi
> Attachments: HIVE-16198.1.patch, HIVE-16198.2.patch, 
> HIVE-16198.3.patch
>
>
> Vectorize GenericUDFIndex for array data type.





[jira] [Commented] (HIVE-17874) Parquet vectorization fails on tables with complex columns when there are no projected columns

2017-10-22 Thread Ferdinand Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16214719#comment-16214719
 ] 

Ferdinand Xu commented on HIVE-17874:
-

Hi [~vihangk1], can you help check the failed test cases? 

> Parquet vectorization fails on tables with complex columns when there are no 
> projected columns
> --
>
> Key: HIVE-17874
> URL: https://issues.apache.org/jira/browse/HIVE-17874
> Project: Hive
>  Issue Type: Sub-task
>Affects Versions: 2.2.0
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
> Attachments: HIVE-17874.01-branch-2.patch, HIVE-17874.01.patch, 
> HIVE-17874.02.patch
>
>
> When a parquet table contains an unsupported type like {{Map}}, {{LIST}} or 
> {{UNION}}, simple queries like {{select count(*) from table}} fail with 
> {{unsupported type exception}} even though the vectorized reader doesn't really 
> need to read the complex type into batches.
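
The idea in the description, that only columns a query actually reads need a supported type, might be sketched as below. This is not the patch itself: {{ProjectedColumnCheck}} and its method are hypothetical, and the list of unsupported types is taken from the description only.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

final class ProjectedColumnCheck {

    // Types the description names as unsupported by the vectorized reader.
    private static final List<String> UNSUPPORTED = Arrays.asList("MAP", "LIST", "UNION");

    /** Validates only the projected columns, so unread complex columns do not fail the query. */
    static void checkProjectedColumns(List<String> columnTypes, List<Integer> projected) {
        for (int index : projected) {
            String type = columnTypes.get(index).toUpperCase(Locale.ROOT);
            if (UNSUPPORTED.contains(type)) {
                throw new UnsupportedOperationException("unsupported type: " + type);
            }
        }
    }

    public static void main(String[] args) {
        List<String> schema = Arrays.asList("string", "map", "int");
        // select count(*): no projected columns, so the map column is never inspected.
        checkProjectedColumns(schema, Collections.<Integer>emptyList());
        // Reading only the primitive columns is fine as well.
        checkProjectedColumns(schema, Arrays.asList(0, 2));
        try {
            checkProjectedColumns(schema, Arrays.asList(1));
        } catch (UnsupportedOperationException e) {
            System.out.println(e.getMessage());
        }
    }
}
```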





[jira] [Commented] (HIVE-17874) Parquet vectorization fails on tables with complex columns when there are no projected columns

2017-10-22 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16214690#comment-16214690
 ] 

Hive QA commented on HIVE-17874:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12893482/HIVE-17874.02.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 9 failed/errored test(s), 11317 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vectorization_parquet_projection] (batchId=42)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[optimize_nullscan] (batchId=163)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[resourceplan] (batchId=158)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vectorization_parquet_projection] (batchId=121)
org.apache.hadoop.hive.cli.TestSparkPerfCliDriver.testCliDriver[query39] (batchId=243)
org.apache.hadoop.hive.cli.control.TestDanglingQOuts.checkDanglingQOut (batchId=204)
org.apache.hadoop.hive.ql.io.parquet.TestVectorizedColumnReader.testNullSplitForParquetReader (batchId=262)
org.apache.hadoop.hive.ql.parse.TestReplicationScenarios.testConstraints (batchId=221)
org.apache.hadoop.hive.ql.parse.authorization.plugin.sqlstd.TestOperation2Privilege.checkHiveOperationTypeMatch (batchId=269)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/7441/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/7441/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-7441/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 9 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12893482 - PreCommit-HIVE-Build



[jira] [Commented] (HIVE-17874) Parquet vectorization fails on tables with complex columns when there are no projected columns

2017-10-22 Thread Ferdinand Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16214670#comment-16214670
 ] 

Ferdinand Xu commented on HIVE-17874:
-

LGTM +1 pending on the Precommit.



[jira] [Comment Edited] (HIVE-17193) HoS: don't combine map works that are targets of different DPPs

2017-10-22 Thread liyunzhang_intel (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16214644#comment-16214644
 ] 

liyunzhang_intel edited comment on HIVE-17193 at 10/23/17 5:24 AM:
---

I can reproduce after disabling cbo
{code}

set hive.explain.user=false;
set hive.spark.dynamic.partition.pruning=true;
set hive.tez.dynamic.partition.pruning=true;
set hive.auto.convert.join=false;
set hive.cbo.enable=false;
explain
select * from
  (select srcpart.ds,srcpart.key from srcpart join src on srcpart.ds=src.key) a
join
  (select srcpart.ds,srcpart.key from srcpart join src on srcpart.ds=src.value) 
b
on a.key=b.key;
{code}

the explain
{code}
STAGE DEPENDENCIES:
  Stage-2 is a root stage
  Stage-1 depends on stages: Stage-2
  Stage-0 depends on stages: Stage-1

STAGE PLANS:
  Stage: Stage-2
Spark
  DagName: root_20171023004308_4b3c304e-3deb-4193-846d-12cf9e6a50ab:2
  Vertices:
Map 8 
Map Operator Tree:
TableScan
  alias: src
  Statistics: Num rows: 58 Data size: 5812 Basic stats: 
COMPLETE Column stats: NONE
  Filter Operator
predicate: key is not null (type: boolean)
Statistics: Num rows: 58 Data size: 5812 Basic stats: 
COMPLETE Column stats: NONE
Select Operator
  expressions: key (type: string)
  outputColumnNames: _col0
  Statistics: Num rows: 58 Data size: 5812 Basic stats: 
COMPLETE Column stats: NONE
  Group By Operator
keys: _col0 (type: string)
mode: hash
outputColumnNames: _col0
Statistics: Num rows: 58 Data size: 5812 Basic stats: 
COMPLETE Column stats: NONE
Spark Partition Pruning Sink Operator
  Target column: ds (string)
  partition key expr: ds
  Statistics: Num rows: 58 Data size: 5812 Basic stats: 
COMPLETE Column stats: NONE
  target work: Map 1

  Stage: Stage-1
Spark
  Edges:
Reducer 2 <- Map 1 (PARTITION-LEVEL SORT, 1), Map 4 (PARTITION-LEVEL 
SORT, 1)
Reducer 3 <- Reducer 2 (PARTITION-LEVEL SORT, 1), Reducer 6 
(PARTITION-LEVEL SORT, 1)
Reducer 6 <- Map 1 (PARTITION-LEVEL SORT, 1), Map 7 (PARTITION-LEVEL 
SORT, 1)
  DagName: root_20171023004308_4b3c304e-3deb-4193-846d-12cf9e6a50ab:1
  Vertices:
Map 1 
Map Operator Tree:
TableScan
  alias: srcpart
  Statistics: Num rows: 232 Data size: 23248 Basic stats: 
COMPLETE Column stats: NONE
  Filter Operator
predicate: key is not null (type: boolean)
Statistics: Num rows: 232 Data size: 23248 Basic stats: 
COMPLETE Column stats: NONE
Reduce Output Operator
  key expressions: ds (type: string)
  sort order: +
  Map-reduce partition columns: ds (type: string)
  Statistics: Num rows: 232 Data size: 23248 Basic stats: 
COMPLETE Column stats: NONE
  value expressions: key (type: string)
Map 4 
Map Operator Tree:
TableScan
  alias: src
  Statistics: Num rows: 58 Data size: 5812 Basic stats: 
COMPLETE Column stats: NONE
  Filter Operator
predicate: key is not null (type: boolean)
Statistics: Num rows: 58 Data size: 5812 Basic stats: 
COMPLETE Column stats: NONE
Reduce Output Operator
  key expressions: key (type: string)
  sort order: +
  Map-reduce partition columns: key (type: string)
  Statistics: Num rows: 58 Data size: 5812 Basic stats: 
COMPLETE Column stats: NONE
Map 7 
Map Operator Tree:
TableScan
  alias: src
  Statistics: Num rows: 58 Data size: 5812 Basic stats: 
COMPLETE Column stats: NONE
  Filter Operator
predicate: value is not null (type: boolean)
Statistics: Num rows: 58 Data size: 5812 Basic stats: 
COMPLETE Column stats: NONE
Reduce Output Operator
  key expressions: value (type: string)
  sort order: +
  Map-reduce partition columns: value (type: string)
  Statistics: Num rows: 58 Data size: 5812 Basic stats: 
COMPLETE Column stats: NONE
Reducer 2 
Reduce Operator Tree:
  Join Operator
condition map:
   

[jira] [Updated] (HIVE-16948) Invalid explain when running dynamic partition pruning query in Hive On Spark

2017-10-22 Thread liyunzhang_intel (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

liyunzhang_intel updated HIVE-16948:

Attachment: 17193_compare_RS_in_Map_5_1.PNG

> Invalid explain when running dynamic partition pruning query in Hive On Spark
> -
>
> Key: HIVE-16948
> URL: https://issues.apache.org/jira/browse/HIVE-16948
> Project: Hive
>  Issue Type: Bug
>Reporter: liyunzhang_intel
>Assignee: liyunzhang_intel
> Fix For: 3.0.0
>
> Attachments: 17193_compare_RS_in_Map_5_1.PNG, HIVE-16948.2.patch, 
> HIVE-16948.5.patch, HIVE-16948.6.patch, HIVE-16948.7.patch, HIVE-16948.patch, 
> HIVE-16948_1.patch
>
>
> in 
> [union_subquery.q|https://github.com/apache/hive/blob/master/ql/src/test/queries/clientpositive/spark_dynamic_partition_pruning.q#L107]
>  in spark_dynamic_partition_pruning.q
> {code}
> set hive.optimize.ppd=true;
> set hive.ppd.remove.duplicatefilters=true;
> set hive.spark.dynamic.partition.pruning=true;
> set hive.optimize.metadataonly=false;
> set hive.optimize.index.filter=true;
> set hive.strict.checks.cartesian.product=false;
> explain select ds from (select distinct(ds) as ds from srcpart union all 
> select distinct(ds) as ds from srcpart) s where s.ds in (select 
> max(srcpart.ds) from srcpart union all select min(srcpart.ds) from srcpart);
> {code}
> explain 
> {code}
> STAGE DEPENDENCIES:
>   Stage-2 is a root stage
>   Stage-1 depends on stages: Stage-2
>   Stage-0 depends on stages: Stage-1
> STAGE PLANS:
>   Stage: Stage-2
> Spark
>   Edges:
> Reducer 11 <- Map 10 (GROUP, 1)
> Reducer 13 <- Map 12 (GROUP, 1)
>   DagName: root_20170622231525_20a777e5-e659-4138-b605-65f8395e18e2:2
>   Vertices:
> Map 10 
> Map Operator Tree:
> TableScan
>   alias: srcpart
>   Statistics: Num rows: 1 Data size: 23248 Basic stats: 
> PARTIAL Column stats: NONE
>   Select Operator
> expressions: ds (type: string)
> outputColumnNames: ds
> Statistics: Num rows: 1 Data size: 23248 Basic stats: 
> PARTIAL Column stats: NONE
> Group By Operator
>   aggregations: max(ds)
>   mode: hash
>   outputColumnNames: _col0
>   Statistics: Num rows: 1 Data size: 184 Basic stats: 
> COMPLETE Column stats: NONE
>   Reduce Output Operator
> sort order: 
> Statistics: Num rows: 1 Data size: 184 Basic stats: 
> COMPLETE Column stats: NONE
> value expressions: _col0 (type: string)
> Map 12 
> Map Operator Tree:
> TableScan
>   alias: srcpart
>   Statistics: Num rows: 1 Data size: 23248 Basic stats: 
> PARTIAL Column stats: NONE
>   Select Operator
> expressions: ds (type: string)
> outputColumnNames: ds
> Statistics: Num rows: 1 Data size: 23248 Basic stats: 
> PARTIAL Column stats: NONE
> Group By Operator
>   aggregations: min(ds)
>   mode: hash
>   outputColumnNames: _col0
>   Statistics: Num rows: 1 Data size: 184 Basic stats: 
> COMPLETE Column stats: NONE
>   Reduce Output Operator
> sort order: 
> Statistics: Num rows: 1 Data size: 184 Basic stats: 
> COMPLETE Column stats: NONE
> value expressions: _col0 (type: string)
> Reducer 11 
> Reduce Operator Tree:
>   Group By Operator
> aggregations: max(VALUE._col0)
> mode: mergepartial
> outputColumnNames: _col0
> Statistics: Num rows: 1 Data size: 184 Basic stats: COMPLETE 
> Column stats: NONE
> Filter Operator
>   predicate: _col0 is not null (type: boolean)
>   Statistics: Num rows: 1 Data size: 184 Basic stats: 
> COMPLETE Column stats: NONE
>   Group By Operator
> keys: _col0 (type: string)
> mode: hash
> outputColumnNames: _col0
> Statistics: Num rows: 2 Data size: 368 Basic stats: 
> COMPLETE Column stats: NONE
> Select Operator
>   expressions: _col0 (type: string)
>   outputColumnNames: _col0
>   Statistics: Num rows: 2 Data size: 368 Basic stats: 
> COMPLETE Column stats: NONE
>   Group

[jira] [Updated] (HIVE-17874) Parquet vectorization fails on tables with complex columns when there are no projected columns

2017-10-22 Thread Vihang Karajgaonkar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vihang Karajgaonkar updated HIVE-17874:
---
Attachment: HIVE-17874.02.patch



[jira] [Commented] (HIVE-17874) Parquet vectorization fails on tables with complex columns when there are no projected columns

2017-10-22 Thread Vihang Karajgaonkar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16214620#comment-16214620
 ] 

Vihang Karajgaonkar commented on HIVE-17874:


Thanks for the review [~Ferd]. I made the changes you suggested. I moved 
{{colsToInclude = ColumnProjectionUtils.getReadColumnIDs(conf);}} into the 
{{initialize}} method because I got rid of the unnecessary field 
{{indexColumnsWanted}} and reused {{colsToInclude}} instead. I have moved 
{{rbCtx = Utilities.getVectorizedRowBatchCtx(conf);}} into the {{initialize}} 
method as well, as you suggested. I also updated the comment and removed the 
unnecessary diff. Feel free to let me know if you want me to publish the patch 
on RB as well.

> Parquet vectorization fails on tables with complex columns when there are no 
> projected columns
> --
>
> Key: HIVE-17874
> URL: https://issues.apache.org/jira/browse/HIVE-17874
> Project: Hive
>  Issue Type: Sub-task
>Affects Versions: 2.2.0
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
> Attachments: HIVE-17874.01-branch-2.patch, HIVE-17874.01.patch, 
> HIVE-17874.02.patch
>
>
> When a parquet table contains an unsupported type like {{Map}}, {{LIST}} or 
> {{UNION}} simple queries like {{select count(*) from table}} fails with 
> {{unsupported type exception}} even though vectorized reader doesn't really 
> need read the complex type into batches.





[jira] [Commented] (HIVE-17193) HoS: don't combine map works that are targets of different DPPs

2017-10-22 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16214617#comment-16214617
 ] 

Rui Li commented on HIVE-17193:
---

[~kellyzly], the problem is that the map works for {{srcpart}} (in your case Map1 
and Map5) are combined when they shouldn't be: they're targets of different 
DPPs and are therefore likely to output different results. I think you can 
disable CBO to see if the issue can be reproduced. Another way is to change the 
outer query into a union instead of a join.
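
The rule described in this comment can be sketched as follows. This is only an illustration under stated assumptions: {{MapWorkSketch}}, {{operatorTreeSignature}} and {{dppSources}} are hypothetical stand-ins, not Hive's actual {{MapWork}} class or its work-combining code. Two map works are treated as combinable only when their operator trees match and they are pruned by the same set of DPP works.

```java
import java.util.Collections;
import java.util.Set;

/** Hypothetical, simplified stand-in for a map work; not Hive's MapWork class. */
final class MapWorkSketch {
    final String name;
    final String operatorTreeSignature; // assumed canonical form of the operator tree
    final Set<String> dppSources;       // assumed names of the DPP works that prune this scan

    MapWorkSketch(String name, String operatorTreeSignature, Set<String> dppSources) {
        this.name = name;
        this.operatorTreeSignature = operatorTreeSignature;
        this.dppSources = dppSources;
    }

    /** Combinable only if the trees match AND both works are targets of the same DPPs. */
    static boolean canCombine(MapWorkSketch a, MapWorkSketch b) {
        return a.operatorTreeSignature.equals(b.operatorTreeSignature)
            && a.dppSources.equals(b.dppSources);
    }

    public static void main(String[] args) {
        MapWorkSketch map1 =
            new MapWorkSketch("Map 1", "TS[srcpart]-FIL-SEL", Collections.singleton("Map 8"));
        MapWorkSketch map5 =
            new MapWorkSketch("Map 5", "TS[srcpart]-FIL-SEL", Collections.singleton("Map 9"));
        // Same operator tree, but pruned by different DPP works: they must stay separate.
        System.out.println(canCombine(map1, map5)); // prints false
    }
}
```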

> HoS: don't combine map works that are targets of different DPPs
> ---
>
> Key: HIVE-17193
> URL: https://issues.apache.org/jira/browse/HIVE-17193
> Project: Hive
>  Issue Type: Bug
>Reporter: Rui Li
>Assignee: Rui Li
>
> Suppose {{srcpart}} is partitioned by {{ds}}. The following query can trigger 
> the issue:
> {code}
> explain
> select * from
>   (select srcpart.ds,srcpart.key from srcpart join src on srcpart.ds=src.key) 
> a
> join
>   (select srcpart.ds,srcpart.key from srcpart join src on 
> srcpart.ds=src.value) b
> on a.key=b.key;
> {code}





[jira] [Commented] (HIVE-17193) HoS: don't combine map works that are targets of different DPPs

2017-10-22 Thread liyunzhang_intel (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16214603#comment-16214603
 ] 

liyunzhang_intel commented on HIVE-17193:
-

[~lirui]: I remember this problem from when I developed HIVE-16948, but I cannot 
reproduce it on Hive (commit a51ae9c) now.
{code}
set hive.explain.user=false;
set hive.spark.dynamic.partition.pruning=true;
set hive.tez.dynamic.partition.pruning=true;
set hive.auto.convert.join=false;
explain
select * from
  (select srcpart.ds,srcpart.key from srcpart join src on srcpart.ds=src.key) a
join
  (select srcpart.ds,srcpart.key from srcpart join src on srcpart.ds=src.value) 
b
on a.key=b.key;
{code}
The explain output:
{code}
STAGE DEPENDENCIES:
  Stage-2 is a root stage
  Stage-1 depends on stages: Stage-2
  Stage-0 depends on stages: Stage-1

STAGE PLANS:
  Stage: Stage-2
Spark
  DagName: root_20171022233200_990c146c-b49f-49b9-9a5b-a0028e34f200:2
  Vertices:
Map 8 
Map Operator Tree:
TableScan
  alias: src
  Statistics: Num rows: 58 Data size: 5812 Basic stats: 
COMPLETE Column stats: NONE
  Filter Operator
predicate: key is not null (type: boolean)
Statistics: Num rows: 58 Data size: 5812 Basic stats: 
COMPLETE Column stats: NONE
Select Operator
  expressions: key (type: string)
  outputColumnNames: _col0
  Statistics: Num rows: 58 Data size: 5812 Basic stats: 
COMPLETE Column stats: NONE
  Select Operator
expressions: _col0 (type: string)
outputColumnNames: _col0
Statistics: Num rows: 58 Data size: 5812 Basic stats: 
COMPLETE Column stats: NONE
Group By Operator
  keys: _col0 (type: string)
  mode: hash
  outputColumnNames: _col0
  Statistics: Num rows: 58 Data size: 5812 Basic stats: 
COMPLETE Column stats: NONE
  Spark Partition Pruning Sink Operator
Target column: ds (string)
partition key expr: ds
Statistics: Num rows: 58 Data size: 5812 Basic 
stats: COMPLETE Column stats: NONE
target work: Map 1
Map 9 
Map Operator Tree:
TableScan
  alias: src
  Statistics: Num rows: 58 Data size: 5812 Basic stats: 
COMPLETE Column stats: NONE
  Filter Operator
predicate: value is not null (type: boolean)
Statistics: Num rows: 58 Data size: 5812 Basic stats: 
COMPLETE Column stats: NONE
Select Operator
  expressions: value (type: string)
  outputColumnNames: _col0
  Statistics: Num rows: 58 Data size: 5812 Basic stats: 
COMPLETE Column stats: NONE
  Select Operator
expressions: _col0 (type: string)
outputColumnNames: _col0
Statistics: Num rows: 58 Data size: 5812 Basic stats: 
COMPLETE Column stats: NONE
Group By Operator
  keys: _col0 (type: string)
  mode: hash
  outputColumnNames: _col0
  Statistics: Num rows: 58 Data size: 5812 Basic stats: 
COMPLETE Column stats: NONE
  Spark Partition Pruning Sink Operator
Target column: ds (string)
partition key expr: ds
Statistics: Num rows: 58 Data size: 5812 Basic 
stats: COMPLETE Column stats: NONE
target work: Map 5

  Stage: Stage-1
Spark
  Edges:
Reducer 2 <- Map 1 (PARTITION-LEVEL SORT, 1), Map 4 (PARTITION-LEVEL 
SORT, 1)
Reducer 3 <- Reducer 2 (PARTITION-LEVEL SORT, 1), Reducer 6 
(PARTITION-LEVEL SORT, 1)
Reducer 6 <- Map 5 (PARTITION-LEVEL SORT, 1), Map 7 (PARTITION-LEVEL 
SORT, 1)
  DagName: root_20171022233200_990c146c-b49f-49b9-9a5b-a0028e34f200:1
  Vertices:
Map 1 
Map Operator Tree:
TableScan
  alias: srcpart
  Statistics: Num rows: 232 Data size: 23248 Basic stats: 
COMPLETE Column stats: NONE
  Filter Operator
predicate: key is not null (type: boolean)
Statistics: Num rows: 232 Data size: 23248 Basic stats: 
COMPLETE Column stats: NONE
Select Operator
  expressions: key (type: string), ds (type: string)
 

[jira] [Commented] (HIVE-17259) Hive JDBC does not recognize UNIONTYPE columns

2017-10-22 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16214577#comment-16214577
 ] 

Ashutosh Chauhan commented on HIVE-17259:
-

[~pvillard] Patch didn't apply cleanly. You need to rebase the patch and upload 
again.

> Hive JDBC does not recognize UNIONTYPE columns
> --
>
> Key: HIVE-17259
> URL: https://issues.apache.org/jira/browse/HIVE-17259
> Project: Hive
>  Issue Type: Bug
>  Components: Beeline, JDBC
> Environment: Hive 1.2.1000.2.6.1.0-129
> Beeline version 1.2.1000.2.6.1.0-129 by Apache Hive
>Reporter: Pierre Villard
>Assignee: Pierre Villard
> Attachments: HIVE-17259.patch
>
>
> Hive JDBC does not recognize UNIONTYPE columns.
> I've an external table backed by an avro schema containing a union type field.
> {noformat}
> "name" : "value",
> "type" : [ "int", "string", "null" ]
> {noformat}
> When describing the table, I get:
> {noformat}
> describe test_table;
> +--------------+------------+----------+--+
> | col_name     | data_type  | comment  |
> +--------------+------------+----------+--+
> | description  | string     |          |
> | name         | string     |          |
> | value        | uniontype  |          |
> +--------------+------------+----------+--+
> {noformat}
> When doing a select query over the data using the Hive CLI, it works:
> {noformat}
> hive> select value from test_table;
> OK
> {0:10}
> {0:10}
> {0:9}
> {0:9}
> ...
> {noformat}
> But when using beeline, it fails:
> {noformat}
> 0: jdbc:hive2://> select * from test_table;
> Error: Unrecognized column type: UNIONTYPE (state=,code=0)
> {noformat}
> With the patch provided in this JIRA applied, the command succeeds and 
> returns the expected output.
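
As a rough illustration of the kind of type-name lookup that produces an "Unrecognized column type" error, here is a minimal sketch. It is not the Hive JDBC driver's code: {{ColumnTypeSketch}} and {{toSqlType}} are hypothetical names, mapping {{uniontype}} to {{java.sql.Types.OTHER}} is an assumption rather than something stated in this issue, and an unchecked exception is used here for brevity where a driver would report a {{SQLException}}.

```java
import java.sql.Types;
import java.util.Locale;

/** Hypothetical illustration of mapping Hive type names to java.sql.Types constants. */
final class ColumnTypeSketch {

    static int toSqlType(String hiveTypeName) {
        switch (hiveTypeName.toLowerCase(Locale.ROOT)) {
            case "string":
                return Types.VARCHAR;
            case "int":
                return Types.INTEGER;
            case "uniontype":
                // Assumption for this sketch: treat union values as an opaque type.
                return Types.OTHER;
            default:
                // Any type name without a case above ends up here.
                throw new IllegalArgumentException("Unrecognized column type: " + hiveTypeName);
        }
    }

    public static void main(String[] args) {
        System.out.println(toSqlType("UNIONTYPE") == Types.OTHER); // prints true
    }
}
```

Without the {{uniontype}} case, the same call would fall through to the default branch, which mirrors the error message shown by beeline above.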





[jira] [Updated] (HIVE-17259) Hive JDBC does not recognize UNIONTYPE columns

2017-10-22 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-17259:

Status: Open  (was: Patch Available)

> Hive JDBC does not recognize UNIONTYPE columns
> --
>
> Key: HIVE-17259
> URL: https://issues.apache.org/jira/browse/HIVE-17259
> Project: Hive
>  Issue Type: Bug
>  Components: Beeline, JDBC
> Environment: Hive 1.2.1000.2.6.1.0-129
> Beeline version 1.2.1000.2.6.1.0-129 by Apache Hive
>Reporter: Pierre Villard
>Assignee: Pierre Villard
> Attachments: HIVE-17259.patch
>
>
> Hive JDBC does not recognize UNIONTYPE columns.
> I've an external table backed by an avro schema containing a union type field.
> {noformat}
> "name" : "value",
> "type" : [ "int", "string", "null" ]
> {noformat}
> When describing the table, I get:
> {noformat}
> describe test_table;
> +--------------+------------+----------+--+
> | col_name     | data_type  | comment  |
> +--------------+------------+----------+--+
> | description  | string     |          |
> | name         | string     |          |
> | value        | uniontype  |          |
> +--------------+------------+----------+--+
> {noformat}
> When doing a select query over the data using the Hive CLI, it works:
> {noformat}
> hive> select value from test_table;
> OK
> {0:10}
> {0:10}
> {0:9}
> {0:9}
> ...
> {noformat}
> But when using beeline, it fails:
> {noformat}
> 0: jdbc:hive2://> select * from test_table;
> Error: Unrecognized column type: UNIONTYPE (state=,code=0)
> {noformat}
> With the patch provided in this JIRA applied, the command succeeds and 
> returns the expected output.





[jira] [Commented] (HIVE-17696) Vectorized reader does not seem to be pushing down projection columns in certain code paths

2017-10-22 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16214566#comment-16214566
 ] 

Hive QA commented on HIVE-17696:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12892984/HIVE-17696.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 9 failed/errored test(s), 11315 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[unionDistinct_1] 
(batchId=145)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[llap_acid_fast]
 (batchId=156)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[optimize_nullscan]
 (batchId=163)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[resourceplan]
 (batchId=158)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_partitioned_date_time]
 (batchId=163)
org.apache.hadoop.hive.cli.control.TestDanglingQOuts.checkDanglingQOut 
(batchId=204)
org.apache.hadoop.hive.ql.parse.TestReplicationScenarios.testConstraints 
(batchId=221)
org.apache.hadoop.hive.ql.parse.authorization.plugin.sqlstd.TestOperation2Privilege.checkHiveOperationTypeMatch
 (batchId=269)
org.apache.hive.jdbc.TestTriggersTezSessionPoolManager.testTriggerHighShuffleBytes
 (batchId=228)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/7440/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/7440/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-7440/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 9 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12892984 - PreCommit-HIVE-Build

> Vectorized reader does not seem to be pushing down projection columns in 
> certain code paths
> ---
>
> Key: HIVE-17696
> URL: https://issues.apache.org/jira/browse/HIVE-17696
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Vihang Karajgaonkar
>Assignee: Ferdinand Xu
> Attachments: HIVE-17696.patch
>
>
> This is the code snippet from {{VectorizedParquetRecordReader.java}}
> {noformat}
> MessageType tableSchema;
> if (indexAccess) {
>   List<Integer> indexSequence = new ArrayList<>();
>   // Generates a sequence list of indexes
>   for (int i = 0; i < columnNamesList.size(); i++) {
>     indexSequence.add(i);
>   }
>   tableSchema = DataWritableReadSupport.getSchemaByIndex(fileSchema, columnNamesList,
>       indexSequence);
> } else {
>   tableSchema = DataWritableReadSupport.getSchemaByName(fileSchema, columnNamesList,
>       columnTypesList);
> }
> indexColumnsWanted = ColumnProjectionUtils.getReadColumnIDs(configuration);
> if (!ColumnProjectionUtils.isReadAllColumns(configuration)
>     && !indexColumnsWanted.isEmpty()) {
>   requestedSchema =
>       DataWritableReadSupport.getSchemaByIndex(tableSchema, columnNamesList, indexColumnsWanted);
> } else {
>   requestedSchema = fileSchema;
> }
> this.reader = new ParquetFileReader(
>     configuration, footer.getFileMetaData(), file, blocks, requestedSchema.getColumns());
> A couple of things to notice here:
> Most of this code is duplicated from the {{DataWritableReadSupport.init()}} 
> method.
> The else branch passes in {{fileSchema}} instead of using {{tableSchema}} as we 
> do in the {{DataWritableReadSupport.init()}} method. Does this cause projection 
> columns to be missed when we read Parquet files? We should probably just 
> reuse the ReadContext returned from the {{DataWritableReadSupport.init()}} method 
> here.
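
The question raised in the description can be illustrated with a simplified sketch. Everything here is an assumption made for illustration: schemas are modelled as plain lists of column names, and {{RequestedSchemaSketch}} and {{requestedColumns}} are hypothetical names, not Hive or Parquet APIs. The sketch shows the variant the description suggests, in which the fallback branch returns the table schema rather than the file schema; whether that is the right fix is exactly what the issue is asking.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical illustration; schemas are modelled as lists of column names. */
final class RequestedSchemaSketch {

    /** Picks the columns to read, falling back to the table schema when nothing is projected. */
    static List<String> requestedColumns(List<String> tableSchema,
                                         List<Integer> wantedIndexes,
                                         boolean readAllColumns) {
        if (readAllColumns || wantedIndexes.isEmpty()) {
            // Suggested variant: fall back to the table schema, not the raw file schema.
            return tableSchema;
        }
        List<String> projected = new ArrayList<>();
        for (int idx : wantedIndexes) {
            if (idx < tableSchema.size()) {
                projected.add(tableSchema.get(idx));
            }
        }
        return projected;
    }

    public static void main(String[] args) {
        List<String> tableSchema = Arrays.asList("key", "value", "ds");
        System.out.println(requestedColumns(tableSchema, Collections.singletonList(2), false));
        // prints [ds]
        System.out.println(requestedColumns(tableSchema, Collections.<Integer>emptyList(), false));
        // prints [key, value, ds]
    }
}
```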





[jira] [Commented] (HIVE-17873) External LLAP client: allow same handleID to be used more than once

2017-10-22 Thread Gunther Hagleitner (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16214549#comment-16214549
 ] 

Gunther Hagleitner commented on HIVE-17873:
---

LGTM +1

> External LLAP client: allow same handleID to be used more than once
> ---
>
> Key: HIVE-17873
> URL: https://issues.apache.org/jira/browse/HIVE-17873
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Reporter: Jason Dere
>Assignee: Jason Dere
> Attachments: HIVE-17873.1.patch
>
>






[jira] [Commented] (HIVE-17874) Parquet vectorization fails on tables with complex columns when there are no projected columns

2017-10-22 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16214519#comment-16214519
 ] 

Hive QA commented on HIVE-17874:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12893472/HIVE-17874.01.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 10 failed/errored test(s), 11317 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_sortmerge_join_2] 
(batchId=47)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vectorization_parquet_projection]
 (batchId=42)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[llap_acid_fast]
 (batchId=156)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[optimize_nullscan]
 (batchId=163)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[resourceplan]
 (batchId=158)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] 
(batchId=101)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vectorization_parquet_projection]
 (batchId=121)
org.apache.hadoop.hive.cli.control.TestDanglingQOuts.checkDanglingQOut 
(batchId=204)
org.apache.hadoop.hive.ql.parse.TestReplicationScenarios.testConstraints 
(batchId=221)
org.apache.hadoop.hive.ql.parse.authorization.plugin.sqlstd.TestOperation2Privilege.checkHiveOperationTypeMatch
 (batchId=269)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/7439/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/7439/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-7439/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 10 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12893472 - PreCommit-HIVE-Build

> Parquet vectorization fails on tables with complex columns when there are no 
> projected columns
> --
>
> Key: HIVE-17874
> URL: https://issues.apache.org/jira/browse/HIVE-17874
> Project: Hive
>  Issue Type: Sub-task
>Affects Versions: 2.2.0
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
> Attachments: HIVE-17874.01-branch-2.patch, HIVE-17874.01.patch
>
>
> When a Parquet table contains an unsupported type like {{Map}}, {{LIST}} or 
> {{UNION}}, simple queries like {{select count(*) from table}} fail with an 
> {{unsupported type exception}} even though the vectorized reader doesn't really 
> need to read the complex type into batches.





[jira] [Commented] (HIVE-17874) Parquet vectorization fails on tables with complex columns when there are no projected columns

2017-10-22 Thread Ferdinand Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16214512#comment-16214512
 ] 

Ferdinand Xu commented on HIVE-17874:
-

Thank you for the patch. Just a few minor comments.

Is the last comment line unnecessary, or is it only half done?
{code:java}
+  //if there are colsToInclude initialize each columnReader
{code}

I see the following is being moved from the constructor to the {{initialize}} 
method. Is it just for code cleanup? If so, could we move 
{{rbCtx = Utilities.getVectorizedRowBatchCtx(conf);}} as well?
{code:java}
colsToInclude = ColumnProjectionUtils.getReadColumnIDs(conf);
{code}

Unnecessary change for the following line.
{code:java}
+  private VectorizedColumnReader  buildVectorizedParquetReader(
{code}


> Parquet vectorization fails on tables with complex columns when there are no 
> projected columns
> --
>
> Key: HIVE-17874
> URL: https://issues.apache.org/jira/browse/HIVE-17874
> Project: Hive
>  Issue Type: Sub-task
>Affects Versions: 2.2.0
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
> Attachments: HIVE-17874.01-branch-2.patch, HIVE-17874.01.patch
>
>
> When a Parquet table contains an unsupported type like {{Map}}, {{LIST}} or 
> {{UNION}}, simple queries like {{select count(*) from table}} fail with an 
> {{unsupported type exception}} even though the vectorized reader doesn't really 
> need to read the complex type into batches.





[jira] [Commented] (HIVE-17876) row.serde.deserialize broken for non-vectorized file inputformats

2017-10-22 Thread Vihang Karajgaonkar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16214500#comment-16214500
 ] 

Vihang Karajgaonkar commented on HIVE-17876:


CC: [~mmccline]

> row.serde.deserialize broken for non-vectorized file inputformats
> -
>
> Key: HIVE-17876
> URL: https://issues.apache.org/jira/browse/HIVE-17876
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.0.0, 2.4.0
>Reporter: Vihang Karajgaonkar
>
> Vectorization using {{hive.vectorized.use.row.serde.deserialize}} errors out 
> for both the ORC and Parquet input formats.
> Steps to reproduce:
> {noformat}
> set hive.fetch.task.conversion=none;
> set hive.vectorized.use.row.serde.deserialize=true;
> set 
> hive.vectorized.input.format.excludes=org.apache.hadoop.hive.ql.io.orc.OrcInputFormat;
> set hive.vectorized.execution.enabled=true;
> explain vectorization select * from alltypesorc where cint = 528534767 limit 
> 10;
> ++
> |  Explain   |
> ++
> | PLAN VECTORIZATION:|
> |   enabled: true|
> |   enabledConditionsMet: [hive.vectorized.execution.enabled IS true] |
> ||
> | STAGE DEPENDENCIES:|
> |   Stage-1 is a root stage  |
> |   Stage-0 depends on stages: Stage-1   |
> ||
> | STAGE PLANS:   |
> |   Stage: Stage-1   |
> | Map Reduce |
> |   Map Operator Tree:   |
> |   TableScan|
> | alias: alltypesorc |
> | Statistics: Num rows: 12288 Data size: 2641964 Basic stats: 
> COMPLETE Column stats: NONE |
> | Filter Operator|
> |   predicate: (cint = 528534767) (type: boolean) |
> |   Statistics: Num rows: 6144 Data size: 1320982 Basic stats: 
> COMPLETE Column stats: NONE |
> |   Select Operator  |
> | expressions: ctinyint (type: tinyint), csmallint (type: 
> smallint), 528534767 (type: int), cbigint (type: bigint), cfloat (type: 
> float), cdouble (type: double), cstring1 (type: string), cstring2 (type: 
> string), ctimestamp1 (type: timestamp), ctimestamp2 (type: timestamp), 
> cboolean1 (type: boolean), cboolean2 (type: boolean) |
> | outputColumnNames: _col0, _col1, _col2, _col3, _col4, 
> _col5, _col6, _col7, _col8, _col9, _col10, _col11 |
> | Statistics: Num rows: 6144 Data size: 1320982 Basic stats: 
> COMPLETE Column stats: NONE |
> | Limit  |
> |   Number of rows: 10   |
> |   Statistics: Num rows: 10 Data size: 2150 Basic stats: 
> COMPLETE Column stats: NONE |
> |   File Output Operator |
> | compressed: false  |
> | Statistics: Num rows: 10 Data size: 2150 Basic stats: 
> COMPLETE Column stats: NONE |
> | table: |
> | input format: 
> org.apache.hadoop.mapred.SequenceFileInputFormat |
> | output format: 
> org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat |
> | serde: 
> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe |
> |   Execution mode: vectorized   |
> |   Map Vectorization:   |
> |   enabled: true|
> |   enabledConditionsMet: hive.vectorized.use.row.serde.deserialize 
> IS true |
> |   groupByVectorOutput: true|
> |   inputFileFormats: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat 
> |
> |   allNative: false |
> |   usesVectorUDFAdaptor: false  |
> |   vectorized: true |
> ||
> |   Stage: Stage-0   |
> | Fetch Operator |
> |   limit: 10|
> |   Processor Tree:  |
> | ListSink   |
> ||
> ++
> 48 rows selected (0.742 seconds)
> 0: jdbc:

[jira] [Assigned] (HIVE-17875) Vectorization support for complex types breaks parquet vectorization

2017-10-22 Thread Vihang Karajgaonkar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17875?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vihang Karajgaonkar reassigned HIVE-17875:
--


> Vectorization support for complex types breaks parquet vectorization
> 
>
> Key: HIVE-17875
> URL: https://issues.apache.org/jira/browse/HIVE-17875
> Project: Hive
>  Issue Type: Bug
>  Components: Vectorization
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
>
> HIVE-16589 introduced support for complex types in vectorized execution. It 
> introduces two new configs, {{hive.vectorized.complex.types.enabled}} and 
> {{hive.vectorized.groupby.complex.types.enabled}}, which default to true and 
> control whether {{Vectorizer}} creates a vectorized execution plan for 
> queries using complex types. Since the Parquet file format does not support 
> vectorization for complex types yet, any query running on Parquet tables with 
> complex types currently fails with a RuntimeException complaining that the 
> complex type is not supported. We should improve the logic in Vectorizer to 
> check whether the FileInputFormat supports complex types and, if not, it 
> should not vectorize the query plan.
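
The proposed check could look roughly like the sketch below. This is an illustration under stated assumptions, not the actual {{Vectorizer}} logic: {{ComplexTypeGuardSketch}}, {{isComplex}} and {{shouldVectorize}} are hypothetical names, and the per-input-format capability is modelled as a plain boolean.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

/** Hypothetical illustration of gating vectorization on input-format capabilities. */
final class ComplexTypeGuardSketch {

    private static final List<String> COMPLEX_PREFIXES =
        Arrays.asList("map", "array", "struct", "uniontype");

    /** Returns true for Hive complex type names such as map<string,int>. */
    static boolean isComplex(String typeName) {
        String t = typeName.toLowerCase(Locale.ROOT);
        for (String prefix : COMPLEX_PREFIXES) {
            if (t.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    /** Vectorize unless the query uses a complex type the input format cannot handle. */
    static boolean shouldVectorize(boolean inputFormatSupportsComplexTypes,
                                   List<String> columnTypesUsed) {
        if (inputFormatSupportsComplexTypes) {
            return true;
        }
        for (String type : columnTypesUsed) {
            if (isComplex(type)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // An input format without complex-type support and a query using a map column.
        System.out.println(shouldVectorize(false, Arrays.asList("int", "map<string,int>")));
        // prints false
        // The same input format with only primitive columns can still be vectorized.
        System.out.println(shouldVectorize(false, Arrays.asList("int", "string")));
        // prints true
    }
}
```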





[jira] [Updated] (HIVE-17368) DBTokenStore fails to connect in Kerberos enabled remote HMS environment

2017-10-22 Thread Vihang Karajgaonkar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vihang Karajgaonkar updated HIVE-17368:
---
   Resolution: Fixed
Fix Version/s: 2.4.0
               3.0.0
       Status: Resolved  (was: Patch Available)

> DBTokenStore fails to connect in Kerberos enabled remote HMS environment
> 
>
> Key: HIVE-17368
> URL: https://issues.apache.org/jira/browse/HIVE-17368
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.1.0, 2.0.0, 2.1.0, 2.2.0
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
> Fix For: 3.0.0, 2.4.0
>
> Attachments: HIVE-17368.01-branch-2.patch, HIVE-17368.01.patch, 
> HIVE-17368.02-branch-2.patch, HIVE-17368.02.patch, 
> HIVE-17368.03-branch-2.patch, HIVE-17368.04-branch-2.patch, 
> HIVE-17368.05-branch-2.patch, HIVE-17368.06-branch-2.patch
>
>
> In setups where HMS is running as a remote process secured using Kerberos, 
> and when {{DBTokenStore}} is configured as the token store, the HS2 Thrift 
> API call {{GetDelegationToken}} fails with the exception trace seen below. HS2 
> is not able to invoke the HMS APIs needed to add/remove/renew tokens from the 
> DB, since it is possible that the user who issues the {{GetDelegationToken}} 
> call is not Kerberos enabled.
> E.g., Oozie submits a job on behalf of user "Joe". When Oozie opens a session 
> with HS2, it uses Oozie's principal and creates a proxy UGI with Hive. This 
> principal can establish a transport authenticated using Kerberos. It stores 
> the HMS delegation token string in the sessionConf and sessionToken. Now, 
> let's say Oozie issues a {{GetDelegationToken}} which has {{Joe}} as the owner 
> and {{oozie}} as the renewer in {{GetDelegationTokenReq}}. This API call 
> cannot instantiate an HMSClient and open a transport to HMS using the HMSToken 
> string available in the sessionConf, since DBTokenStore uses the server 
> HiveConf instead of the sessionConf. It tries to establish the transport using 
> Kerberos and fails, since user Joe is not Kerberos enabled.
> I see the following exception trace in the HS2 logs.
> {noformat}
> 2017-08-21T18:07:19,644 ERROR [HiveServer2-Handler-Pool: Thread-61] 
> transport.TSaslTransport: SASL negotiation failure
> javax.security.sasl.SaslException: GSS initiate failed
> at 
> com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:211)
>  ~[?:1.8.0_121]
> at 
> org.apache.thrift.transport.TSaslClientTransport.handleSaslStartMessage(TSaslClientTransport.java:94)
>  ~[libthrift-0.9.3.jar:0.9.3]
> at 
> org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:271) 
> [libthrift-0.9.3.jar:0.9.3]
> at 
> org.apache.thrift.transport.TSaslClientTransport.open(TSaslClientTransport.java:37)
>  [libthrift-0.9.3.jar:0.9.3]
> at 
> org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport$1.run(TUGIAssumingTransport.java:52)
>  [hive-shims-common-2.3.0-SNAPSHOT.jar:2.3.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport$1.run(TUGIAssumingTransport.java:49)
>  [hive-shims-common-2.3.0-SNAPSHOT.jar:2.3.0-SNAPSHOT]
> at java.security.AccessController.doPrivileged(Native Method) 
> ~[?:1.8.0_121]
> at javax.security.auth.Subject.doAs(Subject.java:422) [?:1.8.0_121]
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>  [hadoop-common-2.7.2.jar:?]
> at 
> org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport.open(TUGIAssumingTransport.java:49)
>  [hive-shims-common-2.3.0-SNAPSHOT.jar:2.3.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.open(HiveMetaStoreClient.java:488)
>  [hive-metastore-2.3.0-SNAPSHOT.jar:2.3.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:255)
>  [hive-metastore-2.3.0-SNAPSHOT.jar:2.3.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.<init>(SessionHiveMetaStoreClient.java:70)
>  [hive-exec-2.3.0-SNAPSHOT.jar:2.3.0-SNAPSHOT]
> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native 
> Method) ~[?:1.8.0_121]
> at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>  [?:1.8.0_121]
> at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>  [?:1.8.0_121]
> at java.lang.reflect.Constructor.newInstance(Constructor.java:423) 
> [?:1.8.0_121]
> at 
> org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1699)
>  [hive-metastore-2.3.0-SNAPSHOT.jar:2.3.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.<init>(Retrying

[jira] [Commented] (HIVE-17368) DBTokenStore fails to connect in Kerberos enabled remote HMS environment

2017-10-22 Thread Vihang Karajgaonkar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16214492#comment-16214492
 ] 

Vihang Karajgaonkar commented on HIVE-17368:


Patch merged to master

> DBTokenStore fails to connect in Kerberos enabled remote HMS environment
> 
>
> Key: HIVE-17368
> URL: https://issues.apache.org/jira/browse/HIVE-17368
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.1.0, 2.0.0, 2.1.0, 2.2.0
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
> Fix For: 3.0.0, 2.4.0
>
> Attachments: HIVE-17368.01-branch-2.patch, HIVE-17368.01.patch, 
> HIVE-17368.02-branch-2.patch, HIVE-17368.02.patch, 
> HIVE-17368.03-branch-2.patch, HIVE-17368.04-branch-2.patch, 
> HIVE-17368.05-branch-2.patch, HIVE-17368.06-branch-2.patch
>
>
> In setups where HMS is running as a remote process secured using Kerberos, 
> and when {{DBTokenStore}} is configured as the token store, the HS2 Thrift 
> API call {{GetDelegationToken}} fails with the exception trace seen below. HS2 
> is not able to invoke the HMS APIs needed to add/remove/renew tokens from the 
> DB, since it is possible that the user who issues the {{GetDelegationToken}} 
> call is not Kerberos enabled.
> E.g., Oozie submits a job on behalf of user "Joe". When Oozie opens a session 
> with HS2, it uses Oozie's principal and creates a proxy UGI with Hive. This 
> principal can establish a transport authenticated using Kerberos. It stores 
> the HMS delegation token string in the sessionConf and sessionToken. Now, 
> let's say Oozie issues a {{GetDelegationToken}} which has {{Joe}} as the owner 
> and {{oozie}} as the renewer in {{GetDelegationTokenReq}}. This API call 
> cannot instantiate an HMSClient and open a transport to HMS using the HMSToken 
> string available in the sessionConf, since DBTokenStore uses the server 
> HiveConf instead of the sessionConf. It tries to establish the transport using 
> Kerberos and fails, since user Joe is not Kerberos enabled.
> I see the following exception trace in the HS2 logs.
> {noformat}
> 2017-08-21T18:07:19,644 ERROR [HiveServer2-Handler-Pool: Thread-61] 
> transport.TSaslTransport: SASL negotiation failure
> javax.security.sasl.SaslException: GSS initiate failed
> at 
> com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:211)
>  ~[?:1.8.0_121]
> at 
> org.apache.thrift.transport.TSaslClientTransport.handleSaslStartMessage(TSaslClientTransport.java:94)
>  ~[libthrift-0.9.3.jar:0.9.3]
> at 
> org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:271) 
> [libthrift-0.9.3.jar:0.9.3]
> at 
> org.apache.thrift.transport.TSaslClientTransport.open(TSaslClientTransport.java:37)
>  [libthrift-0.9.3.jar:0.9.3]
> at 
> org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport$1.run(TUGIAssumingTransport.java:52)
>  [hive-shims-common-2.3.0-SNAPSHOT.jar:2.3.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport$1.run(TUGIAssumingTransport.java:49)
>  [hive-shims-common-2.3.0-SNAPSHOT.jar:2.3.0-SNAPSHOT]
> at java.security.AccessController.doPrivileged(Native Method) 
> ~[?:1.8.0_121]
> at javax.security.auth.Subject.doAs(Subject.java:422) [?:1.8.0_121]
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>  [hadoop-common-2.7.2.jar:?]
> at 
> org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport.open(TUGIAssumingTransport.java:49)
>  [hive-shims-common-2.3.0-SNAPSHOT.jar:2.3.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.open(HiveMetaStoreClient.java:488)
>  [hive-metastore-2.3.0-SNAPSHOT.jar:2.3.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:255)
>  [hive-metastore-2.3.0-SNAPSHOT.jar:2.3.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.<init>(SessionHiveMetaStoreClient.java:70)
>  [hive-exec-2.3.0-SNAPSHOT.jar:2.3.0-SNAPSHOT]
> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native 
> Method) ~[?:1.8.0_121]
> at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>  [?:1.8.0_121]
> at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>  [?:1.8.0_121]
> at java.lang.reflect.Constructor.newInstance(Constructor.java:423) 
> [?:1.8.0_121]
> at 
> org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1699)
>  [hive-metastore-2.3.0-SNAPSHOT.jar:2.3.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.<init>(RetryingMetaStoreClient.java:83)
>  [hive-metastore-2.

[jira] [Updated] (HIVE-17874) Parquet vectorization fails on tables with complex columns when there are no projected columns

2017-10-22 Thread Vihang Karajgaonkar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vihang Karajgaonkar updated HIVE-17874:
---
Status: Patch Available  (was: Open)

> Parquet vectorization fails on tables with complex columns when there are no 
> projected columns
> --
>
> Key: HIVE-17874
> URL: https://issues.apache.org/jira/browse/HIVE-17874
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
> Attachments: HIVE-17874.01-branch-2.patch, HIVE-17874.01.patch
>
>
> When a Parquet table contains an unsupported type like {{Map}}, {{LIST}} or 
> {{UNION}}, simple queries like {{select count(*) from table}} fail with an 
> {{unsupported type exception}} even though the vectorized reader doesn't really 
> need to read the complex type into batches.
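
The behavior described above can be sketched as a type check that looks only at the projected columns. This is a minimal illustration, not the attached patch and not Hive's actual reader: the class {{ProjectedTypeCheck}}, the method {{hasUnsupportedProjectedColumn}} and the string-based type names are all hypothetical.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical model (not Hive code): reject a query only when a column that
// is actually projected has a type the vectorized reader cannot handle.
final class ProjectedTypeCheck {
    // Assumption: the unsupported types are the ones named in the issue text.
    private static final Set<String> UNSUPPORTED_TYPES =
        new HashSet<>(Arrays.asList("MAP", "LIST", "UNION"));

    private ProjectedTypeCheck() {}

    /** Returns true if any projected column has an unsupported type. */
    static boolean hasUnsupportedProjectedColumn(List<String> columnTypes,
                                                 List<Integer> projectedIndexes) {
        for (int index : projectedIndexes) {
            String type = columnTypes.get(index).toUpperCase(Locale.ROOT);
            if (UNSUPPORTED_TYPES.contains(type)) {
                return true;
            }
        }
        // Nothing projected, or only supported types projected.
        return false;
    }

    public static void main(String[] args) {
        List<String> types = Arrays.asList("INT", "Map", "STRING");
        // count(*)-style query: no columns are projected, so nothing is rejected.
        System.out.println(hasUnsupportedProjectedColumn(types, Collections.<Integer>emptyList()));
        // Projecting the Map column is still rejected.
        System.out.println(hasUnsupportedProjectedColumn(types, Arrays.asList(1)));
    }
}
```

With this shape, a table that merely contains a complex column no longer fails a query that never reads it.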





[jira] [Updated] (HIVE-17874) Parquet vectorization fails on tables with complex columns when there are no projected columns

2017-10-22 Thread Vihang Karajgaonkar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vihang Karajgaonkar updated HIVE-17874:
---
Attachment: HIVE-17874.01.patch
HIVE-17874.01-branch-2.patch

> Parquet vectorization fails on tables with complex columns when there are no 
> projected columns
> --
>
> Key: HIVE-17874
> URL: https://issues.apache.org/jira/browse/HIVE-17874
> Project: Hive
>  Issue Type: Sub-task
>Affects Versions: 2.2.0
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
> Attachments: HIVE-17874.01-branch-2.patch, HIVE-17874.01.patch
>
>
> When a Parquet table contains an unsupported type like {{Map}}, {{LIST}} or 
> {{UNION}}, simple queries like {{select count(*) from table}} fail with an 
> {{unsupported type exception}} even though the vectorized reader doesn't really 
> need to read the complex type into batches.





[jira] [Updated] (HIVE-17874) Parquet vectorization fails on tables with complex columns when there are no projected columns

2017-10-22 Thread Vihang Karajgaonkar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vihang Karajgaonkar updated HIVE-17874:
---
Affects Version/s: 2.2.0

> Parquet vectorization fails on tables with complex columns when there are no 
> projected columns
> --
>
> Key: HIVE-17874
> URL: https://issues.apache.org/jira/browse/HIVE-17874
> Project: Hive
>  Issue Type: Sub-task
>Affects Versions: 2.2.0
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
> Attachments: HIVE-17874.01-branch-2.patch, HIVE-17874.01.patch
>
>
> When a Parquet table contains an unsupported type like {{Map}}, {{LIST}} or 
> {{UNION}}, simple queries like {{select count(*) from table}} fail with an 
> {{unsupported type exception}} even though the vectorized reader doesn't really 
> need to read the complex type into batches.





[jira] [Commented] (HIVE-17368) DBTokenStore fails to connect in Kerberos enabled remote HMS environment

2017-10-22 Thread Vihang Karajgaonkar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16214485#comment-16214485
 ] 

Vihang Karajgaonkar commented on HIVE-17368:


Test failures are unrelated.

> DBTokenStore fails to connect in Kerberos enabled remote HMS environment
> 
>
> Key: HIVE-17368
> URL: https://issues.apache.org/jira/browse/HIVE-17368
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.1.0, 2.0.0, 2.1.0, 2.2.0
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
> Attachments: HIVE-17368.01-branch-2.patch, HIVE-17368.01.patch, 
> HIVE-17368.02-branch-2.patch, HIVE-17368.02.patch, 
> HIVE-17368.03-branch-2.patch, HIVE-17368.04-branch-2.patch, 
> HIVE-17368.05-branch-2.patch, HIVE-17368.06-branch-2.patch
>
>
> In setups where HMS is running as a remote process secured using Kerberos, 
> and when {{DBTokenStore}} is configured as the token store, the HS2 Thrift 
> API call {{GetDelegationToken}} fails with the exception trace seen below. HS2 is 
> not able to invoke the HMS APIs needed to add/remove/renew tokens from the DB 
> since it is possible that the user who issues the {{GetDelegationToken}} call 
> is not Kerberos enabled.
> E.g. Oozie submits a job on behalf of user "Joe". When Oozie opens a session 
> with HS2 it uses Oozie's principal and creates a proxy UGI with Hive. This 
> principal can establish a transport authenticated using Kerberos. It stores 
> the HMS delegation token string in the sessionConf and sessionToken. Now, 
> let's say Oozie issues a {{GetDelegationToken}} which has {{Joe}} as the owner 
> and {{oozie}} as the renewer in {{GetDelegationTokenReq}}. This API call 
> cannot instantiate an HMSClient and open a transport to HMS using the HMSToken 
> string available in the sessionConf, since DBTokenStore uses the server HiveConf 
> instead of sessionConf. It tries to establish a transport using Kerberos and 
> fails since user Joe is not Kerberos enabled.
> I see the following exception trace in HS2 logs.
> {noformat}
> 2017-08-21T18:07:19,644 ERROR [HiveServer2-Handler-Pool: Thread-61] 
> transport.TSaslTransport: SASL negotiation failure
> javax.security.sasl.SaslException: GSS initiate failed
> at 
> com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:211)
>  ~[?:1.8.0_121]
> at 
> org.apache.thrift.transport.TSaslClientTransport.handleSaslStartMessage(TSaslClientTransport.java:94)
>  ~[libthrift-0.9.3.jar:0.9.3]
> at 
> org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:271) 
> [libthrift-0.9.3.jar:0.9.3]
> at 
> org.apache.thrift.transport.TSaslClientTransport.open(TSaslClientTransport.java:37)
>  [libthrift-0.9.3.jar:0.9.3]
> at 
> org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport$1.run(TUGIAssumingTransport.java:52)
>  [hive-shims-common-2.3.0-SNAPSHOT.jar:2.3.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport$1.run(TUGIAssumingTransport.java:49)
>  [hive-shims-common-2.3.0-SNAPSHOT.jar:2.3.0-SNAPSHOT]
> at java.security.AccessController.doPrivileged(Native Method) 
> ~[?:1.8.0_121]
> at javax.security.auth.Subject.doAs(Subject.java:422) [?:1.8.0_121]
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>  [hadoop-common-2.7.2.jar:?]
> at 
> org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport.open(TUGIAssumingTransport.java:49)
>  [hive-shims-common-2.3.0-SNAPSHOT.jar:2.3.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.open(HiveMetaStoreClient.java:488)
>  [hive-metastore-2.3.0-SNAPSHOT.jar:2.3.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:255)
>  [hive-metastore-2.3.0-SNAPSHOT.jar:2.3.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.<init>(SessionHiveMetaStoreClient.java:70)
>  [hive-exec-2.3.0-SNAPSHOT.jar:2.3.0-SNAPSHOT]
> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native 
> Method) ~[?:1.8.0_121]
> at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>  [?:1.8.0_121]
> at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>  [?:1.8.0_121]
> at java.lang.reflect.Constructor.newInstance(Constructor.java:423) 
> [?:1.8.0_121]
> at 
> org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1699)
>  [hive-metastore-2.3.0-SNAPSHOT.jar:2.3.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.<init>(RetryingMetaStoreClient.java:83)
>  [hive-metastore-2.3.0-SNAPSHOT.jar:2.3.0-SNAPSHOT]

[jira] [Commented] (HIVE-17873) External LLAP client: allow same handleID to be used more than once

2017-10-22 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16214483#comment-16214483
 ] 

Hive QA commented on HIVE-17873:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12893470/HIVE-17873.1.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 7 failed/errored test(s), 11315 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[optimize_nullscan]
 (batchId=163)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[resourceplan]
 (batchId=158)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] 
(batchId=101)
org.apache.hadoop.hive.cli.control.TestDanglingQOuts.checkDanglingQOut 
(batchId=204)
org.apache.hadoop.hive.ql.parse.TestReplicationScenarios.testConstraints 
(batchId=221)
org.apache.hadoop.hive.ql.parse.authorization.plugin.sqlstd.TestOperation2Privilege.checkHiveOperationTypeMatch
 (batchId=269)
org.apache.hive.jdbc.TestTriggersTezSessionPoolManager.testTriggerHighShuffleBytes
 (batchId=228)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/7438/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/7438/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-7438/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 7 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12893470 - PreCommit-HIVE-Build

> External LLAP client: allow same handleID to be used more than once
> ---
>
> Key: HIVE-17873
> URL: https://issues.apache.org/jira/browse/HIVE-17873
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Reporter: Jason Dere
>Assignee: Jason Dere
> Attachments: HIVE-17873.1.patch
>
>






[jira] [Assigned] (HIVE-17874) Parquet vectorization fails on tables with complex columns when there are no projected columns

2017-10-22 Thread Vihang Karajgaonkar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vihang Karajgaonkar reassigned HIVE-17874:
--


> Parquet vectorization fails on tables with complex columns when there are no 
> projected columns
> --
>
> Key: HIVE-17874
> URL: https://issues.apache.org/jira/browse/HIVE-17874
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
>
> When a Parquet table contains an unsupported type like {{Map}}, {{LIST}} or 
> {{UNION}}, simple queries like {{select count(*) from table}} fail with an 
> {{unsupported type exception}} even though the vectorized reader doesn't really 
> need to read the complex type into batches.





[jira] [Assigned] (HIVE-17737) ObjectStore.getNotificationEventsCount may cause NPE

2017-10-22 Thread Prasad Nagaraj Subramanya (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasad Nagaraj Subramanya reassigned HIVE-17737:


Assignee: Prasad Nagaraj Subramanya

> ObjectStore.getNotificationEventsCount may cause NPE
> 
>
> Key: HIVE-17737
> URL: https://issues.apache.org/jira/browse/HIVE-17737
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 2.3.0, 3.0.0
>Reporter: Alexander Kolbasov
>Assignee: Prasad Nagaraj Subramanya
>
> In ObjectStore.getNotificationEventsCount():
> {code}
>   public NotificationEventsCountResponse getNotificationEventsCount(
>       NotificationEventsCountRequest rqst) {
>     Long result = 0L;
>     try {
>       openTransaction();
>       long fromEventId = rqst.getFromEventId();
>       String inputDbName = rqst.getDbName();
>       String queryStr = "select count(eventId) from " + MNotificationLog.class.getName()
>           + " where eventId > fromEventId && dbName == inputDbName";
>       query = pm.newQuery(queryStr);
>       query.declareParameters(
>           "java.lang.Long fromEventId, java.lang.String inputDbName");
>       result = (Long) query.execute(fromEventId, inputDbName); // <- Here
>       commited = commitTransaction();
>       return new NotificationEventsCountResponse(result.longValue());
>     }
>   }
> {code}
> It is possible that query.execute will return null, in which case 
> result.longValue() will throw an NPE.
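
One possible way to guard against the null result described above is to normalize the value before unboxing it. The following is only a sketch under stated assumptions: {{NullSafeCount}} and {{toCount}} are hypothetical names, not part of ObjectStore, and treating a null result as a count of 0 is an assumption rather than the fix chosen for this issue.

```java
// Hypothetical helper (not Hive code): converts the Object returned by a
// query's execute(...) call into a primitive count without risking an NPE.
final class NullSafeCount {
    private NullSafeCount() {}

    /** Returns the numeric value of queryResult, or 0 when it is null. */
    static long toCount(Object queryResult) {
        if (queryResult == null) {
            // Assumption: "no result" is treated as zero matching events.
            return 0L;
        }
        if (queryResult instanceof Number) {
            return ((Number) queryResult).longValue();
        }
        throw new IllegalStateException(
            "Unexpected result type: " + queryResult.getClass().getName());
    }

    public static void main(String[] args) {
        System.out.println(toCount(null));              // null result
        System.out.println(toCount(Long.valueOf(42L))); // normal Long result
    }
}
```

In the method quoted above, the return statement could then pass the normalized value instead of calling longValue() directly on a possibly-null Long.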





[jira] [Updated] (HIVE-17873) External LLAP client: allow same handleID to be used more than once

2017-10-22 Thread Jason Dere (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Dere updated HIVE-17873:
--
Status: Patch Available  (was: Open)

> External LLAP client: allow same handleID to be used more than once
> ---
>
> Key: HIVE-17873
> URL: https://issues.apache.org/jira/browse/HIVE-17873
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Reporter: Jason Dere
>Assignee: Jason Dere
> Attachments: HIVE-17873.1.patch
>
>






[jira] [Updated] (HIVE-17873) External LLAP client: allow same handleID to be used more than once

2017-10-22 Thread Jason Dere (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Dere updated HIVE-17873:
--
Attachment: HIVE-17873.1.patch

> External LLAP client: allow same handleID to be used more than once
> ---
>
> Key: HIVE-17873
> URL: https://issues.apache.org/jira/browse/HIVE-17873
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Reporter: Jason Dere
>Assignee: Jason Dere
> Attachments: HIVE-17873.1.patch
>
>






[jira] [Assigned] (HIVE-17873) External LLAP client: allow same handleID to be used more than once

2017-10-22 Thread Jason Dere (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Dere reassigned HIVE-17873:
-


> External LLAP client: allow same handleID to be used more than once
> ---
>
> Key: HIVE-17873
> URL: https://issues.apache.org/jira/browse/HIVE-17873
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Reporter: Jason Dere
>Assignee: Jason Dere
>






[jira] [Commented] (HIVE-17870) Update NoDeleteRollingFileAppender to use Log4j2 api

2017-10-22 Thread Prasad Nagaraj Subramanya (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16214435#comment-16214435
 ] 

Prasad Nagaraj Subramanya commented on HIVE-17870:
--

[~aihuaxu] It looks like NoDeleteRollingFileAppender is never used. Should we 
just remove the class instead?

> Update NoDeleteRollingFileAppender to use Log4j2 api
> 
>
> Key: HIVE-17870
> URL: https://issues.apache.org/jira/browse/HIVE-17870
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 3.0.0
>Reporter: Aihua Xu
>
> NoDeleteRollingFileAppender is still using the Log4j 1 API. Since we have 
> already moved to Log4j2 in Hive, we should update it to use the Log4j2 API as well.


