[jira] [Updated] (HIVE-12369) Native Vector GroupBy

2017-07-24 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-12369:

Description: Implement Native Vector GroupBy using fast hash table 
technology developed for Native Vector MapJoin and vector key handling 
developed for recent HIVE-12290 Native Vector ReduceSink JIRA.  (was: Implement 
fast Vector GroupBy using fast hash table technology developed for Native 
Vector MapJoin and vector key handling developed for recent HIVE-12290 Native 
Vector ReduceSink JIRA.

(Patch also includes making Native Vector MapJoin use Hybrid Grace -- but that 
can be separated out))

> Native Vector GroupBy
> -
>
> Key: HIVE-12369
> URL: https://issues.apache.org/jira/browse/HIVE-12369
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-12369.01.patch, HIVE-12369.02.patch
>
>
> Implement Native Vector GroupBy using fast hash table technology developed 
> for Native Vector MapJoin and vector key handling developed for recent 
> HIVE-12290 Native Vector ReduceSink JIRA.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-12369) Native Vector GroupBy

2017-07-24 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-12369:

Description: Implement Native Vector GroupBy using fast hash table 
technology developed for Native Vector MapJoin, etc.  (was: Implement Native 
Vector GroupBy using fast hash table technology developed for Native Vector 
MapJoin and vector key handling developed for recent HIVE-12290 Native Vector 
ReduceSink JIRA.)

> Native Vector GroupBy
> -
>
> Key: HIVE-12369
> URL: https://issues.apache.org/jira/browse/HIVE-12369
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-12369.01.patch, HIVE-12369.02.patch
>
>
> Implement Native Vector GroupBy using fast hash table technology developed 
> for Native Vector MapJoin, etc.





[jira] [Updated] (HIVE-12369) Native Vector GroupBy

2017-07-24 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-12369:

Summary: Native Vector GroupBy  (was: Faster Vector GroupBy)

> Native Vector GroupBy
> -
>
> Key: HIVE-12369
> URL: https://issues.apache.org/jira/browse/HIVE-12369
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-12369.01.patch, HIVE-12369.02.patch
>
>
> Implement fast Vector GroupBy using fast hash table technology developed for 
> Native Vector MapJoin and vector key handling developed for recent HIVE-12290 
> Native Vector ReduceSink JIRA.
> (Patch also includes making Native Vector MapJoin use Hybrid Grace -- but 
> that can be separated out)





[jira] [Commented] (HIVE-16998) Add config to enable HoS DPP only for map-joins

2017-07-24 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16099513#comment-16099513
 ] 

Lefty Leverenz commented on HIVE-16998:
---

I suggest adding a second sentence to the parameter description:  "If 
hive.spark.dynamic.partition.pruning is set to false, this parameter value is 
ignored."

> Add config to enable HoS DPP only for map-joins
> ---
>
> Key: HIVE-16998
> URL: https://issues.apache.org/jira/browse/HIVE-16998
> Project: Hive
>  Issue Type: Sub-task
>  Components: Logical Optimizer, Spark
>Reporter: Sahil Takiar
>Assignee: Janaki Lahorani
> Attachments: HIVE16998.1.patch
>
>
> HoS DPP will split a given operator tree in two under the following 
> conditions: it has detected that the query can benefit from DPP, and the 
> filter is not a map-join (see SplitOpTreeForDPP).
> This can hurt performance if the non-partitioned side of the join 
> involves a complex operator tree - e.g. the query {{select count(*) from 
> srcpart where srcpart.ds in (select max(srcpart.ds) from srcpart union all 
> select min(srcpart.ds) from srcpart)}} will require running the subquery 
> twice, once in each Spark job.
> Queries with map-joins don't get split into two operator trees and thus don't 
> suffer from this drawback. Thus, it would be nice to have a config key that 
> just enables DPP on HoS for map-joins.





[jira] [Commented] (HIVE-15758) Allow correlated scalar subqueries with aggregates which has non-equi join predicates

2017-07-24 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16099510#comment-16099510
 ] 

Lefty Leverenz commented on HIVE-15758:
---

I added a TODOC3.0 label but I'm not sure where this should be documented -- 
subqueries, joins, or aggregates?

* [Subqueries | 
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+SubQueries]
* [Joins | 
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Joins]
* [UDAFs | 
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-Built-inAggregateFunctions(UDAF)]

> Allow correlated scalar subqueries with aggregates which has non-equi join 
> predicates
> -
>
> Key: HIVE-15758
> URL: https://issues.apache.org/jira/browse/HIVE-15758
> Project: Hive
>  Issue Type: Sub-task
>  Components: Logical Optimizer
>Reporter: Vineet Garg
>Assignee: Vineet Garg
>  Labels: TODOC3.0, sub-query
> Fix For: 3.0.0
>
> Attachments: HIVE-15758.1.patch, HIVE-15758.2.patch, 
> HIVE-15758.3.patch
>
>
> Queries such as 
> {code} select * from part where p_size <> (select count(p_size) from part pp 
> where part.p_type <> pp.p_type); {code} are currently not allowed since HIVE 
> doesn't know how to rewrite such queries to preserve correctness for 
> cases where there are zero rows.
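
A minimal sketch of the zero-row problem described above, using SQLite from Python as a stand-in engine. The table contents and the "naive" join rewrite are assumptions made for illustration; this is not Hive's rewrite, and the outer table is aliased as p only to keep the column references unambiguous. With a single p_type, the correlated subquery matches no rows and count() yields 0, so the original query keeps every row, while a rewrite through an inner join on the aggregated result drops them all.

```python
import sqlite3

# Hypothetical stand-in for the `part` table. Every row shares one p_type,
# so the correlated subquery matches zero rows for each outer row.
ROWS = [("bolt", "STEEL", 1), ("nut", "STEEL", 2)]

# Same shape as the query in the issue, with the outer table aliased as p.
CORRELATED = """
    SELECT p.p_name FROM part p
    WHERE p.p_size <> (SELECT count(pp.p_size) FROM part pp
                       WHERE p.p_type <> pp.p_type)
    ORDER BY p.p_name
"""

# Naive decorrelation (illustrative only): aggregate the non-equi join once,
# then inner-join the aggregate back to the outer table.
NAIVE_REWRITE = """
    SELECT p.p_name FROM part p
    JOIN (SELECT p1.p_name AS outer_name, count(p2.p_size) AS cnt
          FROM part p1 JOIN part p2 ON p1.p_type <> p2.p_type
          GROUP BY p1.p_name) agg
      ON p.p_name = agg.outer_name
    WHERE p.p_size <> agg.cnt
    ORDER BY p.p_name
"""


def run(sql):
    """Load ROWS into a fresh in-memory database and return the selected names."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE part (p_name TEXT, p_type TEXT, p_size INTEGER)")
    conn.executemany("INSERT INTO part VALUES (?, ?, ?)", ROWS)
    names = [name for (name,) in conn.execute(sql)]
    conn.close()
    return names


if __name__ == "__main__":
    print("correlated:", run(CORRELATED))
    print("naive join:", run(NAIVE_REWRITE))
```

The two queries disagree only because count() over an empty match is 0 rather than "no row", which is the case a correct rewrite has to preserve.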





[jira] [Work started] (HIVE-16791) Tez engine giving inaccurate results on SMB Map joins while map-join and shuffle join gets correct results

2017-07-24 Thread Deepak Jaiswal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-16791 started by Deepak Jaiswal.
-
> Tez engine giving inaccurate results on SMB Map joins while map-join and 
> shuffle join gets correct results
> --
>
> Key: HIVE-16791
> URL: https://issues.apache.org/jira/browse/HIVE-16791
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, HiveServer2
>Reporter: Saumil Mayani
>Assignee: Deepak Jaiswal
> Attachments: sample-data-query.txt, sample-data.tar.gz-aa, 
> sample-data.tar.gz-ab, sample-data.tar.gz-ac, sample-data.tar.gz-ad
>
>
> SMB Join gives incorrect results. 
> {code}
> SMB-Join
> set hive.execution.engine=tez;
> set hive.enforce.sortmergebucketmapjoin=false;
> set hive.optimize.bucketmapjoin=true;
> set hive.optimize.bucketmapjoin.sortedmerge=true;
> set hive.auto.convert.sortmerge.join=true;
> set hive.auto.convert.join=true;
> set hive.auto.convert.join.noconditionaltask.size=50;
> OK
> 2016  1   11999639
> 2016  2   18955110
> 2017  2   22217437
> Time taken: 92.647 seconds, Fetched: 3 row(s)
> {code}
> {code}
> MAP-JOIN
> set hive.execution.engine=tez;
> set hive.enforce.sortmergebucketmapjoin=false;
> set hive.optimize.bucketmapjoin=true;
> set hive.optimize.bucketmapjoin.sortedmerge=true;
> set hive.auto.convert.sortmerge.join=true;
> set hive.auto.convert.join=true;
> set hive.auto.convert.join.noconditionaltask.size=5000;
> OK
> 2016  1   26586093
> 2016  2   17724062
> 2017  2   8862031
> Time taken: 17.49 seconds, Fetched: 3 row(s)
> {code}
> {code}
> Shuffle Join
> set hive.execution.engine=tez;
> set hive.enforce.sortmergebucketmapjoin=false;
> set hive.optimize.bucketmapjoin=true;
> set hive.optimize.bucketmapjoin.sortedmerge=true;
> set hive.auto.convert.sortmerge.join=false;
> set hive.auto.convert.join=false;
> set hive.auto.convert.join.noconditionaltask.size=5000;
> OK
> 2016  1   26586093
> 2016  2   17724062
> 2017  2   8862031
> Time taken: 38.575 seconds, Fetched: 3 row(s)
> {code}
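
A small sketch for comparing the three result sets quoted above. The numbers are copied from the issue; treating the first two columns as a group key and the third as a count is an assumption, since the query itself is only in the attachment. The helper names are hypothetical.

```python
# Values copied from the three runs in the issue description.
# Assumption: columns one and two form the group key, column three is a count.
SMB_JOIN = {(2016, 1): 11999639, (2016, 2): 18955110, (2017, 2): 22217437}
MAP_JOIN = {(2016, 1): 26586093, (2016, 2): 17724062, (2017, 2): 8862031}
SHUFFLE_JOIN = {(2016, 1): 26586093, (2016, 2): 17724062, (2017, 2): 8862031}


def diff(expected, actual):
    """Return {key: actual - expected} for every key whose value differs."""
    return {k: actual[k] - expected[k] for k in expected if actual[k] != expected[k]}


if __name__ == "__main__":
    print("map vs shuffle:", diff(SHUFFLE_JOIN, MAP_JOIN))
    print("smb vs shuffle:", diff(SHUFFLE_JOIN, SMB_JOIN))
    print("totals:", sum(SMB_JOIN.values()), sum(SHUFFLE_JOIN.values()))
```

Map join and shuffle join agree on every key, and the SMB join differs on all three. The column totals are identical across the runs, which would be consistent with rows being attributed to the wrong group rather than lost, though nothing in this thread establishes the cause.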





[jira] [Updated] (HIVE-16791) Tez engine giving inaccurate results on SMB Map joins while map-join and shuffle join gets correct results

2017-07-24 Thread Deepak Jaiswal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deepak Jaiswal updated HIVE-16791:
--
Status: Open  (was: Patch Available)

> Tez engine giving inaccurate results on SMB Map joins while map-join and 
> shuffle join gets correct results
> --
>
> Key: HIVE-16791
> URL: https://issues.apache.org/jira/browse/HIVE-16791
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, HiveServer2
>Reporter: Saumil Mayani
>Assignee: Deepak Jaiswal
> Attachments: sample-data-query.txt, sample-data.tar.gz-aa, 
> sample-data.tar.gz-ab, sample-data.tar.gz-ac, sample-data.tar.gz-ad
>
>
> SMB Join gives incorrect results. 
> {code}
> SMB-Join
> set hive.execution.engine=tez;
> set hive.enforce.sortmergebucketmapjoin=false;
> set hive.optimize.bucketmapjoin=true;
> set hive.optimize.bucketmapjoin.sortedmerge=true;
> set hive.auto.convert.sortmerge.join=true;
> set hive.auto.convert.join=true;
> set hive.auto.convert.join.noconditionaltask.size=50;
> OK
> 2016  1   11999639
> 2016  2   18955110
> 2017  2   22217437
> Time taken: 92.647 seconds, Fetched: 3 row(s)
> {code}
> {code}
> MAP-JOIN
> set hive.execution.engine=tez;
> set hive.enforce.sortmergebucketmapjoin=false;
> set hive.optimize.bucketmapjoin=true;
> set hive.optimize.bucketmapjoin.sortedmerge=true;
> set hive.auto.convert.sortmerge.join=true;
> set hive.auto.convert.join=true;
> set hive.auto.convert.join.noconditionaltask.size=5000;
> OK
> 2016  1   26586093
> 2016  2   17724062
> 2017  2   8862031
> Time taken: 17.49 seconds, Fetched: 3 row(s)
> {code}
> {code}
> Shuffle Join
> set hive.execution.engine=tez;
> set hive.enforce.sortmergebucketmapjoin=false;
> set hive.optimize.bucketmapjoin=true;
> set hive.optimize.bucketmapjoin.sortedmerge=true;
> set hive.auto.convert.sortmerge.join=false;
> set hive.auto.convert.join=false;
> set hive.auto.convert.join.noconditionaltask.size=5000;
> OK
> 2016  1   26586093
> 2016  2   17724062
> 2017  2   8862031
> Time taken: 38.575 seconds, Fetched: 3 row(s)
> {code}





[jira] [Updated] (HIVE-15758) Allow correlated scalar subqueries with aggregates which has non-equi join predicates

2017-07-24 Thread Lefty Leverenz (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lefty Leverenz updated HIVE-15758:
--
Labels: TODOC3.0 sub-query  (was: sub-query)

> Allow correlated scalar subqueries with aggregates which has non-equi join 
> predicates
> -
>
> Key: HIVE-15758
> URL: https://issues.apache.org/jira/browse/HIVE-15758
> Project: Hive
>  Issue Type: Sub-task
>  Components: Logical Optimizer
>Reporter: Vineet Garg
>Assignee: Vineet Garg
>  Labels: TODOC3.0, sub-query
> Fix For: 3.0.0
>
> Attachments: HIVE-15758.1.patch, HIVE-15758.2.patch, 
> HIVE-15758.3.patch
>
>
> Queries such as 
> {code} select * from part where p_size <> (select count(p_size) from part pp 
> where part.p_type <> pp.p_type); {code} are currently not allowed since HIVE 
> doesn't know how to rewrite such queries to preserve correctness for 
> cases where there are zero rows.





[jira] [Commented] (HIVE-16791) Tez engine giving inaccurate results on SMB Map joins while map-join and shuffle join gets correct results

2017-07-24 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16099486#comment-16099486
 ] 

Hive QA commented on HIVE-16791:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12871673/sample-data-query.txt

{color:red}ERROR:{color} -1 due to build exiting with an error

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6124/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6124/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6124/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Tests exited with: NonZeroExitCodeException
Command 'bash /data/hiveptest/working/scratch/source-prep.sh' failed with exit 
status 1 and output '+ date '+%Y-%m-%d %T.%3N'
2017-07-25 04:40:11.356
+ [[ -n /usr/lib/jvm/java-8-openjdk-amd64 ]]
+ export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ export 
PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ 
PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
+ export 'MAVEN_OPTS=-Xmx1g '
+ MAVEN_OPTS='-Xmx1g '
+ cd /data/hiveptest/working/
+ tee /data/hiveptest/logs/PreCommit-HIVE-Build-6124/source-prep.txt
+ [[ false == \t\r\u\e ]]
+ mkdir -p maven ivy
+ [[ git = \s\v\n ]]
+ [[ git = \g\i\t ]]
+ [[ -z master ]]
+ [[ -d apache-github-source-source ]]
+ [[ ! -d apache-github-source-source/.git ]]
+ [[ ! -d apache-github-source-source ]]
+ date '+%Y-%m-%d %T.%3N'
2017-07-25 04:40:11.359
+ cd apache-github-source-source
+ git fetch origin
From https://github.com/apache/hive
   9a85331..88da238  master -> origin/master
   2c7c92a..c37fdf9  branch-2   -> origin/branch-2
+ git reset --hard HEAD
HEAD is now at 9a85331 HIVE-17114: HoS: Possible skew in shuffling when data is 
not really skewed (Rui reviewed by Chao)
+ git clean -f -d
Removing ql/src/test/queries/clientpositive/smb_join1.q
Removing ql/src/test/results/clientpositive/llap/smb_join1.q.out
+ git checkout master
Already on 'master'
Your branch is behind 'origin/master' by 4 commits, and can be fast-forwarded.
  (use "git pull" to update your local branch)
+ git reset --hard origin/master
HEAD is now at 88da238 HIVE-16222 : add a setting to disable row.serde for 
specific formats; enable for others (Sergey Shelukhin, reviewed by Matt McCline)
+ git merge --ff-only origin/master
Already up-to-date.
+ date '+%Y-%m-%d %T.%3N'
2017-07-25 04:40:18.375
+ patchCommandPath=/data/hiveptest/working/scratch/smart-apply-patch.sh
+ patchFilePath=/data/hiveptest/working/scratch/build.patch
+ [[ -f /data/hiveptest/working/scratch/build.patch ]]
+ chmod +x /data/hiveptest/working/scratch/smart-apply-patch.sh
+ /data/hiveptest/working/scratch/smart-apply-patch.sh 
/data/hiveptest/working/scratch/build.patch
patch:  Only garbage was found in the patch input.
patch:  Only garbage was found in the patch input.
patch:  Only garbage was found in the patch input.
fatal: unrecognized input
The patch does not appear to apply with p0, p1, or p2
+ exit 1
'
{noformat}
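
The log above shows the pre-commit job feeding the latest attachment (sample-data-query.txt) to the patch step, which then reports "Only garbage was found in the patch input". A rough sketch of a check that would flag such an attachment before it reaches patch follows. The heuristics and the function name are assumptions for illustration and are not taken from smart-apply-patch.sh.

```python
import re

# Heuristic markers of a unified diff: old/new file headers plus one hunk header.
OLD_FILE = re.compile(r"^--- \S", re.MULTILINE)
NEW_FILE = re.compile(r"^\+\+\+ \S", re.MULTILINE)
HUNK = re.compile(r"^@@ -\d+(?:,\d+)? \+\d+(?:,\d+)? @@", re.MULTILINE)


def looks_like_unified_diff(text):
    """Return True only if the text carries both file headers and a hunk header."""
    return all(p.search(text) for p in (OLD_FILE, NEW_FILE, HUNK))


if __name__ == "__main__":
    sample_patch = "--- a/f.txt\n+++ b/f.txt\n@@ -1 +1 @@\n-old\n+new\n"
    # Stand-in for a query file; these lines come from the issue description.
    sample_query = "set hive.execution.engine=tez;\nset hive.auto.convert.join=true;\n"
    print(looks_like_unified_diff(sample_patch))
    print(looks_like_unified_diff(sample_query))
```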

This message is automatically generated.

ATTACHMENT ID: 12871673 - PreCommit-HIVE-Build

> Tez engine giving inaccurate results on SMB Map joins while map-join and 
> shuffle join gets correct results
> --
>
> Key: HIVE-16791
> URL: https://issues.apache.org/jira/browse/HIVE-16791
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, HiveServer2
>Reporter: Saumil Mayani
>Assignee: Deepak Jaiswal
> Attachments: sample-data-query.txt, sample-data.tar.gz-aa, 
> sample-data.tar.gz-ab, sample-data.tar.gz-ac, sample-data.tar.gz-ad
>
>
> SMB Join gives incorrect results. 
> {code}
> SMB-Join
> set hive.execution.engine=tez;
> set hive.enforce.sortmergebucketmapjoin=false;
> set hive.optimize.bucketmapjoin=true;
> set hive.optimize.bucketmapjoin.sortedmerge=true;
> set hive.auto.convert.sortmerge.join=true;
> set hive.auto.convert.join=true;
> set hive.auto.convert.join.noconditionaltask.size=50;
> OK
> 2016  1   11999639
> 2016  2   18955110
> 2017  2   22217437
> Time taken: 92.647 seconds, Fetched: 3 row(s)
> {code}
> {code}
> MAP-JOIN
> set hive.execution.engine=tez;
> set hive.enforce.sortmergebucketmapjoin=false;
> set hive.optimize.bucketmapjoin=true;
> set hive.optimize.bucketmapjoin.sortedmerge=true;
> set hive.auto.convert.sortmerge.join=true;
> set hive.auto.convert.join=true;
> set 

[jira] [Commented] (HIVE-17087) Remove unnecessary HoS DPP trees during map-join conversion

2017-07-24 Thread Sahil Takiar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16099485#comment-16099485
 ] 

Sahil Takiar commented on HIVE-17087:
-

Hey [~kellyzly]. Yes, opRules is a LinkedHashMap, so they should be run in 
order. After looking at HIVE-10559 in more detail, I think there is a simpler 
fix that can be made to avoid the NPE. I've left some comments in RB about it.

> Remove unnecessary HoS DPP trees during map-join conversion
> ---
>
> Key: HIVE-17087
> URL: https://issues.apache.org/jira/browse/HIVE-17087
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
> Attachments: HIVE-17087.1.patch, HIVE-17087.2.patch, 
> HIVE-17087.3.patch
>
>
> Ran the following query in the {{TestSparkCliDriver}}:
> {code:sql}
> set hive.spark.dynamic.partition.pruning=true;
> set hive.auto.convert.join=true;
> create table partitioned_table1 (col int) partitioned by (part_col int);
> create table partitioned_table2 (col int) partitioned by (part_col int);
> create table regular_table (col int);
> insert into table regular_table values (1);
> alter table partitioned_table1 add partition (part_col = 1);
> insert into table partitioned_table1 partition (part_col = 1) values (1), 
> (2), (3), (4), (5), (6), (7), (8), (9), (10);
> alter table partitioned_table2 add partition (part_col = 1);
> insert into table partitioned_table2 partition (part_col = 1) values (1), 
> (2), (3), (4), (5), (6), (7), (8), (9), (10);
> explain select * from partitioned_table1, partitioned_table2 where 
> partitioned_table1.part_col = partitioned_table2.part_col;
> {code}
> and got the following explain plan:
> {code}
> STAGE DEPENDENCIES:
>   Stage-2 is a root stage
>   Stage-3 depends on stages: Stage-2
>   Stage-1 depends on stages: Stage-3
>   Stage-0 depends on stages: Stage-1
> STAGE PLANS:
>   Stage: Stage-2
> Spark
> #### A masked pattern was here ####
>   Vertices:
> Map 3 
> Map Operator Tree:
> TableScan
>   alias: partitioned_table1
>   Statistics: Num rows: 10 Data size: 11 Basic stats: 
> COMPLETE Column stats: NONE
>   Select Operator
> expressions: col (type: int), part_col (type: int)
> outputColumnNames: _col0, _col1
> Statistics: Num rows: 10 Data size: 11 Basic stats: 
> COMPLETE Column stats: NONE
> Select Operator
>   expressions: _col1 (type: int)
>   outputColumnNames: _col0
>   Statistics: Num rows: 10 Data size: 11 Basic stats: 
> COMPLETE Column stats: NONE
>   Group By Operator
> keys: _col0 (type: int)
> mode: hash
> outputColumnNames: _col0
> Statistics: Num rows: 10 Data size: 11 Basic stats: 
> COMPLETE Column stats: NONE
> Spark Partition Pruning Sink Operator
>   partition key expr: part_col
>   Statistics: Num rows: 10 Data size: 11 Basic stats: 
> COMPLETE Column stats: NONE
>   target column name: part_col
>   target work: Map 2
>   Stage: Stage-3
> Spark
> #### A masked pattern was here ####
>   Vertices:
> Map 2 
> Map Operator Tree:
> TableScan
>   alias: partitioned_table2
>   Statistics: Num rows: 10 Data size: 11 Basic stats: 
> COMPLETE Column stats: NONE
>   Select Operator
> expressions: col (type: int), part_col (type: int)
> outputColumnNames: _col0, _col1
> Statistics: Num rows: 10 Data size: 11 Basic stats: 
> COMPLETE Column stats: NONE
> Spark HashTable Sink Operator
>   keys:
> 0 _col1 (type: int)
> 1 _col1 (type: int)
> Local Work:
>   Map Reduce Local Work
>   Stage: Stage-1
> Spark
> #### A masked pattern was here ####
>   Vertices:
> Map 1 
> Map Operator Tree:
> TableScan
>   alias: partitioned_table1
>   Statistics: Num rows: 10 Data size: 11 Basic stats: 
> COMPLETE Column stats: NONE
>   Select Operator
> expressions: col (type: int), part_col (type: int)
> outputColumnNames: _col0, _col1
> Statistics: Num rows: 10 Data size: 11 Basic stats: 
> COMPLETE Column stats: NONE
> Map Join Operator
> 

[jira] [Commented] (HIVE-12878) Support Vectorization for TEXTFILE and other formats

2017-07-24 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16099480#comment-16099480
 ] 

Lefty Leverenz commented on HIVE-12878:
---

Doc note:  HIVE-16222 changes the default value of 
*hive.vectorized.use.row.serde.deserialize* to true in release 3.0.0.

> Support Vectorization for TEXTFILE and other formats
> 
>
> Key: HIVE-12878
> URL: https://issues.apache.org/jira/browse/HIVE-12878
> Project: Hive
>  Issue Type: New Feature
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Fix For: 2.1.0
>
> Attachments: HIVE-12878.01.patch, HIVE-12878.02.patch, 
> HIVE-12878.03.patch, HIVE-12878.04.patch, HIVE-12878.05.patch, 
> HIVE-12878.06.patch, HIVE-12878.07.patch, HIVE-12878.08.patch, 
> HIVE-12878.091.patch, HIVE-12878.092.patch, HIVE-12878.093.patch, 
> HIVE-12878.09.patch
>
>
> Support vectorizing when the input format is TEXTFILE and other formats for 
> better Map Vertex performance.





[jira] [Commented] (HIVE-16222) add a setting to disable row.serde for specific formats; enable for others

2017-07-24 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16099476#comment-16099476
 ] 

Lefty Leverenz commented on HIVE-16222:
---

Doc note:  This adds *hive.vectorized.row.serde.inputformat.excludes* to 
HiveConf.java and changes the default value of 
*hive.vectorized.use.row.serde.deserialize* to true, so the wiki needs to be 
updated.

* [Configuration Properties -- Vectorization | 
https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-Vectorization]
** [hive.vectorized.use.row.serde.deserialize | 
https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.vectorized.use.row.serde.deserialize]
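
For the wiki write-up, a sketch of how the two settings could interact. This is an illustration in Python, not the HiveConf or Vectorizer code: the function name is hypothetical, and the assumption that the excludes value is a comma-separated list of input format class names is mine, not something stated in this thread.

```python
def use_row_serde_deserialize(input_format, enabled, excludes):
    """Decide whether row-serde deserialization applies to one input format.

    enabled  -- value of hive.vectorized.use.row.serde.deserialize
    excludes -- value of hive.vectorized.row.serde.inputformat.excludes,
                assumed here to be a comma-separated list of class names
    """
    if not enabled:
        return False
    excluded = {name.strip() for name in excludes.split(",") if name.strip()}
    return input_format not in excluded


if __name__ == "__main__":
    excludes = "org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat"
    for fmt in ("org.apache.hadoop.mapred.TextInputFormat", excludes):
        print(fmt, use_row_serde_deserialize(fmt, True, excludes))
```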

Added a TODOC3.0 label.

(Welcome back, Sergey.)

> add a setting to disable row.serde for specific formats; enable for others
> --
>
> Key: HIVE-16222
> URL: https://issues.apache.org/jira/browse/HIVE-16222
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>  Labels: TODOC3.0
> Fix For: 3.0.0
>
> Attachments: HIVE-16222.01.patch, HIVE-16222.02.patch, 
> HIVE-16222.03.patch, HIVE-16222.04.patch, HIVE-16222.05.patch, 
> HIVE-16222.patch
>
>
> Per [~gopalv]
> {quote}
> row.serde = true ... breaks Parquet (they expect to get the same object back, 
> which means you can't buffer 1024 rows).
> {quote}
> We want to enable this and vector.serde for text vectorization. Need to turn 
> it off for specific formats.
>  
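
A general illustration of why object reuse and row buffering do not mix, sketched in Python. This is not Parquet or Hive code; the class and function names are hypothetical. A reader that overwrites and returns one mutable row object leaves a by-reference buffer holding 1024 references to the last row unless each row is copied.

```python
class ReusingReader:
    """Yields the SAME mutable row object every time, overwritten in place."""

    def __init__(self, values):
        self._values = values
        self._row = [None]

    def __iter__(self):
        for value in self._values:
            self._row[0] = value
            yield self._row


def buffer_rows(reader, copy):
    """Collect rows into a list, copying each row only when `copy` is True."""
    return [list(row) if copy else row for row in reader]


if __name__ == "__main__":
    by_reference = buffer_rows(ReusingReader(range(1024)), copy=False)
    by_copy = buffer_rows(ReusingReader(range(1024)), copy=True)
    print(len({row[0] for row in by_reference}))  # distinct values kept by reference
    print(len({row[0] for row in by_copy}))       # distinct values kept by copying
```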



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16222) add a setting to disable row.serde for specific formats; enable for others

2017-07-24 Thread Lefty Leverenz (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lefty Leverenz updated HIVE-16222:
--
Labels: TODOC3.0  (was: )

> add a setting to disable row.serde for specific formats; enable for others
> --
>
> Key: HIVE-16222
> URL: https://issues.apache.org/jira/browse/HIVE-16222
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>  Labels: TODOC3.0
> Fix For: 3.0.0
>
> Attachments: HIVE-16222.01.patch, HIVE-16222.02.patch, 
> HIVE-16222.03.patch, HIVE-16222.04.patch, HIVE-16222.05.patch, 
> HIVE-16222.patch
>
>
> Per [~gopalv]
> {quote}
> row.serde = true ... breaks Parquet (they expect to get the same object back, 
> which means you can't buffer 1024 rows).
> {quote}
> We want to enable this and vector.serde for text vectorization. Need to turn 
> it off for specific formats.
>  





[jira] [Updated] (HIVE-16948) Invalid explain when running dynamic partition pruning query in HOS

2017-07-24 Thread liyunzhang_intel (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

liyunzhang_intel updated HIVE-16948:

Description: 
in 
[union_subquery.q|https://github.com/apache/hive/blob/master/ql/src/test/queries/clientpositive/spark_dynamic_partition_pruning.q#L107]
 in spark_dynamic_partition_pruning.q
{code}
set hive.optimize.ppd=true;
set hive.ppd.remove.duplicatefilters=true;
set hive.spark.dynamic.partition.pruning=true;
set hive.optimize.metadataonly=false;
set hive.optimize.index.filter=true;
set hive.strict.checks.cartesian.product=false;
explain select ds from (select distinct(ds) as ds from srcpart union all select 
distinct(ds) as ds from srcpart) s where s.ds in (select max(srcpart.ds) from 
srcpart union all select min(srcpart.ds) from srcpart);
{code}
explain 
{code}
STAGE DEPENDENCIES:
  Stage-2 is a root stage
  Stage-1 depends on stages: Stage-2
  Stage-0 depends on stages: Stage-1

STAGE PLANS:
  Stage: Stage-2
Spark
  Edges:
Reducer 11 <- Map 10 (GROUP, 1)
Reducer 13 <- Map 12 (GROUP, 1)
  DagName: root_20170622231525_20a777e5-e659-4138-b605-65f8395e18e2:2
  Vertices:
Map 10 
Map Operator Tree:
TableScan
  alias: srcpart
  Statistics: Num rows: 1 Data size: 23248 Basic stats: PARTIAL 
Column stats: NONE
  Select Operator
expressions: ds (type: string)
outputColumnNames: ds
Statistics: Num rows: 1 Data size: 23248 Basic stats: 
PARTIAL Column stats: NONE
Group By Operator
  aggregations: max(ds)
  mode: hash
  outputColumnNames: _col0
  Statistics: Num rows: 1 Data size: 184 Basic stats: 
COMPLETE Column stats: NONE
  Reduce Output Operator
sort order: 
Statistics: Num rows: 1 Data size: 184 Basic stats: 
COMPLETE Column stats: NONE
value expressions: _col0 (type: string)
Map 12 
Map Operator Tree:
TableScan
  alias: srcpart
  Statistics: Num rows: 1 Data size: 23248 Basic stats: PARTIAL 
Column stats: NONE
  Select Operator
expressions: ds (type: string)
outputColumnNames: ds
Statistics: Num rows: 1 Data size: 23248 Basic stats: 
PARTIAL Column stats: NONE
Group By Operator
  aggregations: min(ds)
  mode: hash
  outputColumnNames: _col0
  Statistics: Num rows: 1 Data size: 184 Basic stats: 
COMPLETE Column stats: NONE
  Reduce Output Operator
sort order: 
Statistics: Num rows: 1 Data size: 184 Basic stats: 
COMPLETE Column stats: NONE
value expressions: _col0 (type: string)
Reducer 11 
Reduce Operator Tree:
  Group By Operator
aggregations: max(VALUE._col0)
mode: mergepartial
outputColumnNames: _col0
Statistics: Num rows: 1 Data size: 184 Basic stats: COMPLETE 
Column stats: NONE
Filter Operator
  predicate: _col0 is not null (type: boolean)
  Statistics: Num rows: 1 Data size: 184 Basic stats: COMPLETE 
Column stats: NONE
  Group By Operator
keys: _col0 (type: string)
mode: hash
outputColumnNames: _col0
Statistics: Num rows: 2 Data size: 368 Basic stats: 
COMPLETE Column stats: NONE
Select Operator
  expressions: _col0 (type: string)
  outputColumnNames: _col0
  Statistics: Num rows: 2 Data size: 368 Basic stats: 
COMPLETE Column stats: NONE
  Group By Operator
keys: _col0 (type: string)
mode: hash
outputColumnNames: _col0
Statistics: Num rows: 2 Data size: 368 Basic stats: 
COMPLETE Column stats: NONE
Spark Partition Pruning Sink Operator
  partition key expr: ds
  Statistics: Num rows: 2 Data size: 368 Basic stats: 
COMPLETE Column stats: NONE
  target column name: ds
  target work: Map 1
Select Operator
  expressions: _col0 (type: string)
  outputColumnNames: _col0
  Statistics: Num rows: 2 Data size: 368 Basic stats: 
COMPLETE Column stats: NONE
  Group By 

[jira] [Updated] (HIVE-16948) Invalid explain when running dynamic partition pruning query in HOS

2017-07-24 Thread liyunzhang_intel (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

liyunzhang_intel updated HIVE-16948:

Status: Patch Available  (was: Open)

> Invalid explain when running dynamic partition pruning query in HOS
> ---
>
> Key: HIVE-16948
> URL: https://issues.apache.org/jira/browse/HIVE-16948
> Project: Hive
>  Issue Type: Bug
>Reporter: liyunzhang_intel
>Assignee: liyunzhang_intel
> Attachments: HIVE-16948.patch
>
>
>  union_subquery.q 
> {code}
> set hive.optimize.ppd=true;
> set hive.ppd.remove.duplicatefilters=true;
> set hive.spark.dynamic.partition.pruning=true;
> set hive.optimize.metadataonly=false;
> set hive.optimize.index.filter=true;
> set hive.strict.checks.cartesian.product=false;
> explain select ds from (select distinct(ds) as ds from srcpart union all 
> select distinct(ds) as ds from srcpart) s where s.ds in (select 
> max(srcpart.ds) from srcpart union all select min(srcpart.ds) from srcpart);
> {code}
> explain 
> {code}
> STAGE DEPENDENCIES:
>   Stage-2 is a root stage
>   Stage-1 depends on stages: Stage-2
>   Stage-0 depends on stages: Stage-1
> STAGE PLANS:
>   Stage: Stage-2
> Spark
>   Edges:
> Reducer 11 <- Map 10 (GROUP, 1)
> Reducer 13 <- Map 12 (GROUP, 1)
>   DagName: root_20170622231525_20a777e5-e659-4138-b605-65f8395e18e2:2
>   Vertices:
> Map 10 
> Map Operator Tree:
> TableScan
>   alias: srcpart
>   Statistics: Num rows: 1 Data size: 23248 Basic stats: 
> PARTIAL Column stats: NONE
>   Select Operator
> expressions: ds (type: string)
> outputColumnNames: ds
> Statistics: Num rows: 1 Data size: 23248 Basic stats: 
> PARTIAL Column stats: NONE
> Group By Operator
>   aggregations: max(ds)
>   mode: hash
>   outputColumnNames: _col0
>   Statistics: Num rows: 1 Data size: 184 Basic stats: 
> COMPLETE Column stats: NONE
>   Reduce Output Operator
> sort order: 
> Statistics: Num rows: 1 Data size: 184 Basic stats: 
> COMPLETE Column stats: NONE
> value expressions: _col0 (type: string)
> Map 12 
> Map Operator Tree:
> TableScan
>   alias: srcpart
>   Statistics: Num rows: 1 Data size: 23248 Basic stats: 
> PARTIAL Column stats: NONE
>   Select Operator
> expressions: ds (type: string)
> outputColumnNames: ds
> Statistics: Num rows: 1 Data size: 23248 Basic stats: 
> PARTIAL Column stats: NONE
> Group By Operator
>   aggregations: min(ds)
>   mode: hash
>   outputColumnNames: _col0
>   Statistics: Num rows: 1 Data size: 184 Basic stats: 
> COMPLETE Column stats: NONE
>   Reduce Output Operator
> sort order: 
> Statistics: Num rows: 1 Data size: 184 Basic stats: 
> COMPLETE Column stats: NONE
> value expressions: _col0 (type: string)
> Reducer 11 
> Reduce Operator Tree:
>   Group By Operator
> aggregations: max(VALUE._col0)
> mode: mergepartial
> outputColumnNames: _col0
> Statistics: Num rows: 1 Data size: 184 Basic stats: COMPLETE 
> Column stats: NONE
> Filter Operator
>   predicate: _col0 is not null (type: boolean)
>   Statistics: Num rows: 1 Data size: 184 Basic stats: 
> COMPLETE Column stats: NONE
>   Group By Operator
> keys: _col0 (type: string)
> mode: hash
> outputColumnNames: _col0
> Statistics: Num rows: 2 Data size: 368 Basic stats: 
> COMPLETE Column stats: NONE
> Select Operator
>   expressions: _col0 (type: string)
>   outputColumnNames: _col0
>   Statistics: Num rows: 2 Data size: 368 Basic stats: 
> COMPLETE Column stats: NONE
>   Group By Operator
> keys: _col0 (type: string)
> mode: hash
> outputColumnNames: _col0
> Statistics: Num rows: 2 Data size: 368 Basic stats: 
> COMPLETE Column stats: NONE
> Spark Partition Pruning Sink Operator
>   

[jira] [Updated] (HIVE-16948) Invalid explain when running dynamic partition pruning query in HOS

2017-07-24 Thread liyunzhang_intel (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

liyunzhang_intel updated HIVE-16948:

Attachment: HIVE-16948.patch

> Invalid explain when running dynamic partition pruning query in HOS
> ---
>
> Key: HIVE-16948
> URL: https://issues.apache.org/jira/browse/HIVE-16948
> Project: Hive
>  Issue Type: Bug
>Reporter: liyunzhang_intel
>Assignee: liyunzhang_intel
> Attachments: HIVE-16948.patch
>
>
>  union_subquery.q 
> {code}
> set hive.optimize.ppd=true;
> set hive.ppd.remove.duplicatefilters=true;
> set hive.spark.dynamic.partition.pruning=true;
> set hive.optimize.metadataonly=false;
> set hive.optimize.index.filter=true;
> set hive.strict.checks.cartesian.product=false;
> explain select ds from (select distinct(ds) as ds from srcpart union all 
> select distinct(ds) as ds from srcpart) s where s.ds in (select 
> max(srcpart.ds) from srcpart union all select min(srcpart.ds) from srcpart);
> {code}
> explain 
> {code}
> STAGE DEPENDENCIES:
>   Stage-2 is a root stage
>   Stage-1 depends on stages: Stage-2
>   Stage-0 depends on stages: Stage-1
> STAGE PLANS:
>   Stage: Stage-2
> Spark
>   Edges:
> Reducer 11 <- Map 10 (GROUP, 1)
> Reducer 13 <- Map 12 (GROUP, 1)
>   DagName: root_20170622231525_20a777e5-e659-4138-b605-65f8395e18e2:2
>   Vertices:
> Map 10 
> Map Operator Tree:
> TableScan
>   alias: srcpart
>   Statistics: Num rows: 1 Data size: 23248 Basic stats: 
> PARTIAL Column stats: NONE
>   Select Operator
> expressions: ds (type: string)
> outputColumnNames: ds
> Statistics: Num rows: 1 Data size: 23248 Basic stats: 
> PARTIAL Column stats: NONE
> Group By Operator
>   aggregations: max(ds)
>   mode: hash
>   outputColumnNames: _col0
>   Statistics: Num rows: 1 Data size: 184 Basic stats: 
> COMPLETE Column stats: NONE
>   Reduce Output Operator
> sort order: 
> Statistics: Num rows: 1 Data size: 184 Basic stats: 
> COMPLETE Column stats: NONE
> value expressions: _col0 (type: string)
> Map 12 
> Map Operator Tree:
> TableScan
>   alias: srcpart
>   Statistics: Num rows: 1 Data size: 23248 Basic stats: 
> PARTIAL Column stats: NONE
>   Select Operator
> expressions: ds (type: string)
> outputColumnNames: ds
> Statistics: Num rows: 1 Data size: 23248 Basic stats: 
> PARTIAL Column stats: NONE
> Group By Operator
>   aggregations: min(ds)
>   mode: hash
>   outputColumnNames: _col0
>   Statistics: Num rows: 1 Data size: 184 Basic stats: 
> COMPLETE Column stats: NONE
>   Reduce Output Operator
> sort order: 
> Statistics: Num rows: 1 Data size: 184 Basic stats: 
> COMPLETE Column stats: NONE
> value expressions: _col0 (type: string)
> Reducer 11 
> Reduce Operator Tree:
>   Group By Operator
> aggregations: max(VALUE._col0)
> mode: mergepartial
> outputColumnNames: _col0
> Statistics: Num rows: 1 Data size: 184 Basic stats: COMPLETE 
> Column stats: NONE
> Filter Operator
>   predicate: _col0 is not null (type: boolean)
>   Statistics: Num rows: 1 Data size: 184 Basic stats: 
> COMPLETE Column stats: NONE
>   Group By Operator
> keys: _col0 (type: string)
> mode: hash
> outputColumnNames: _col0
> Statistics: Num rows: 2 Data size: 368 Basic stats: 
> COMPLETE Column stats: NONE
> Select Operator
>   expressions: _col0 (type: string)
>   outputColumnNames: _col0
>   Statistics: Num rows: 2 Data size: 368 Basic stats: 
> COMPLETE Column stats: NONE
>   Group By Operator
> keys: _col0 (type: string)
> mode: hash
> outputColumnNames: _col0
> Statistics: Num rows: 2 Data size: 368 Basic stats: 
> COMPLETE Column stats: NONE
> Spark Partition Pruning Sink Operator
>   partition 

[jira] [Commented] (HIVE-16948) Invalid explain when running dynamic partition pruning query in HOS

2017-07-24 Thread liyunzhang_intel (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16099424#comment-16099424
 ] 

liyunzhang_intel commented on HIVE-16948:
-

Map 4 is missing from the explain output because of 
[CombineEquivalentWorkResolver|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/CombineEquivalentWorkResolver.java].

Before the CombineEquivalentWorkResolver optimization runs, Map 4 exists in the 
explain output. The resolver finds and combines equivalent works, and it deletes 
Map 4 because Map 4 is equivalent to Map 1. So we need to remove the Spark 
dynamic partition pruning sink branches in Reducer 11 and Reducer 13 in 
Stage-2.

[~lirui], [~stakiar], [~csun] please help review, thanks!


> Invalid explain when running dynamic partition pruning query in HOS
> ---
>
> Key: HIVE-16948
> URL: https://issues.apache.org/jira/browse/HIVE-16948
> Project: Hive
>  Issue Type: Bug
>Reporter: liyunzhang_intel
>Assignee: liyunzhang_intel
>
>  union_subquery.q 
> {code}
> set hive.optimize.ppd=true;
> set hive.ppd.remove.duplicatefilters=true;
> set hive.spark.dynamic.partition.pruning=true;
> set hive.optimize.metadataonly=false;
> set hive.optimize.index.filter=true;
> set hive.strict.checks.cartesian.product=false;
> explain select ds from (select distinct(ds) as ds from srcpart union all 
> select distinct(ds) as ds from srcpart) s where s.ds in (select 
> max(srcpart.ds) from srcpart union all select min(srcpart.ds) from srcpart);
> {code}
> explain 
> {code}
> STAGE DEPENDENCIES:
>   Stage-2 is a root stage
>   Stage-1 depends on stages: Stage-2
>   Stage-0 depends on stages: Stage-1
> STAGE PLANS:
>   Stage: Stage-2
> Spark
>   Edges:
> Reducer 11 <- Map 10 (GROUP, 1)
> Reducer 13 <- Map 12 (GROUP, 1)
>   DagName: root_20170622231525_20a777e5-e659-4138-b605-65f8395e18e2:2
>   Vertices:
> Map 10 
> Map Operator Tree:
> TableScan
>   alias: srcpart
>   Statistics: Num rows: 1 Data size: 23248 Basic stats: 
> PARTIAL Column stats: NONE
>   Select Operator
> expressions: ds (type: string)
> outputColumnNames: ds
> Statistics: Num rows: 1 Data size: 23248 Basic stats: 
> PARTIAL Column stats: NONE
> Group By Operator
>   aggregations: max(ds)
>   mode: hash
>   outputColumnNames: _col0
>   Statistics: Num rows: 1 Data size: 184 Basic stats: 
> COMPLETE Column stats: NONE
>   Reduce Output Operator
> sort order: 
> Statistics: Num rows: 1 Data size: 184 Basic stats: 
> COMPLETE Column stats: NONE
> value expressions: _col0 (type: string)
> Map 12 
> Map Operator Tree:
> TableScan
>   alias: srcpart
>   Statistics: Num rows: 1 Data size: 23248 Basic stats: 
> PARTIAL Column stats: NONE
>   Select Operator
> expressions: ds (type: string)
> outputColumnNames: ds
> Statistics: Num rows: 1 Data size: 23248 Basic stats: 
> PARTIAL Column stats: NONE
> Group By Operator
>   aggregations: min(ds)
>   mode: hash
>   outputColumnNames: _col0
>   Statistics: Num rows: 1 Data size: 184 Basic stats: 
> COMPLETE Column stats: NONE
>   Reduce Output Operator
> sort order: 
> Statistics: Num rows: 1 Data size: 184 Basic stats: 
> COMPLETE Column stats: NONE
> value expressions: _col0 (type: string)
> Reducer 11 
> Reduce Operator Tree:
>   Group By Operator
> aggregations: max(VALUE._col0)
> mode: mergepartial
> outputColumnNames: _col0
> Statistics: Num rows: 1 Data size: 184 Basic stats: COMPLETE 
> Column stats: NONE
> Filter Operator
>   predicate: _col0 is not null (type: boolean)
>   Statistics: Num rows: 1 Data size: 184 Basic stats: 
> COMPLETE Column stats: NONE
>   Group By Operator
> keys: _col0 (type: string)
> mode: hash
> outputColumnNames: _col0
> Statistics: Num rows: 2 Data size: 368 Basic stats: 
> COMPLETE Column stats: NONE
> Select Operator
>   expressions: 

[jira] [Updated] (HIVE-17139) Conditional expressions optimization: skip the expression evaluation if the condition is not satisfied for vectorization engine.

2017-07-24 Thread Ke Jia (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ke Jia updated HIVE-17139:
--
Attachment: HIVE-17139.2.patch

> Conditional expressions optimization: skip the expression evaluation if the 
> condition is not satisfied for vectorization engine.
> 
>
> Key: HIVE-17139
> URL: https://issues.apache.org/jira/browse/HIVE-17139
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ke Jia
>Assignee: Ke Jia
> Attachments: HIVE-17139.1.patch, HIVE-17139.2.patch
>
>
> The execution of CASE WHEN and IF statements in Hive vectorization is not 
> optimal: the current implementation evaluates all of the conditional and 
> else expressions for every row. The optimized approach is to update the 
> selected array of the batch parameter after the conditional expression is 
> executed. The else expression is then evaluated only for the selected rows 
> instead of all rows.
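
The selected-array idea described above can be sketched in plain Java. This is only an illustration under stated assumptions, not the patch itself: `ConditionalSelectSketch`, `evaluateIf`, and the `long[]` column are hypothetical stand-ins and do not use Hive's vectorization classes. The condition is evaluated once per row, the row indexes are split into two selected arrays, and each branch expression then runs only over its own rows.

```java
import java.util.Arrays;
import java.util.function.LongPredicate;
import java.util.function.LongUnaryOperator;

/** Hypothetical sketch; not Hive's vectorized expression API. */
final class ConditionalSelectSketch {
    /** Counts THEN/ELSE evaluations, kept only to make the saving visible. */
    static int branchEvaluations = 0;

    static long[] evaluateIf(long[] column, LongPredicate condition,
                             LongUnaryOperator thenExpr, LongUnaryOperator elseExpr) {
        int n = column.length;
        long[] output = new long[n];
        int[] thenSelected = new int[n];
        int[] elseSelected = new int[n];
        int thenSize = 0;
        int elseSize = 0;

        // Step 1: evaluate the condition once and split the row indexes
        // into two "selected" arrays.
        for (int row = 0; row < n; row++) {
            if (condition.test(column[row])) {
                thenSelected[thenSize++] = row;
            } else {
                elseSelected[elseSize++] = row;
            }
        }

        // Step 2: evaluate the THEN expression only for rows that passed.
        for (int i = 0; i < thenSize; i++) {
            int row = thenSelected[i];
            output[row] = thenExpr.applyAsLong(column[row]);
            branchEvaluations++;
        }

        // Step 3: evaluate the ELSE expression only for the remaining rows.
        for (int i = 0; i < elseSize; i++) {
            int row = elseSelected[i];
            output[row] = elseExpr.applyAsLong(column[row]);
            branchEvaluations++;
        }
        return output;
    }

    public static void main(String[] args) {
        long[] out = evaluateIf(new long[] {1, -2, 3, -4},
                v -> v > 0, v -> v * 10, v -> -v);
        System.out.println(Arrays.toString(out)
                + ", branch evaluations: " + branchEvaluations);
    }
}
```

For a batch of four rows, evaluating both branches for every row would cost eight branch evaluations, while the sketch performs four.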



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16997) Extend object store to store bit vectors

2017-07-24 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16099386#comment-16099386
 ] 

Ashutosh Chauhan commented on HIVE-16997:
-

+1 pending tests. 
Let's tackle a) removing base64 encoding and b) displaying only 2 chars of the 
bit vector in desc output in a follow-up.

> Extend object store to store bit vectors
> 
>
> Key: HIVE-16997
> URL: https://issues.apache.org/jira/browse/HIVE-16997
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-16997.01.patch, HIVE-16997.02.patch, 
> HIVE-16997.03.patch, HIVE-16997.04.patch, HIVE-16997.05.patch, 
> HIVE-16997.06.patch
>
>
> This patch includes: (1) a new serde for FMSketch (2) change of schema for 
> derby and mysql (3) support for date type (4) refactoring the extrapolation 
> and merge code





[jira] [Commented] (HIVE-16954) LLAP IO: better debugging

2017-07-24 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16099385#comment-16099385
 ] 

Gopal V commented on HIVE-16954:


Meant to +1 - this hasn't been perf tested, so will return to this if I find 
these functions in my profile runs.

> LLAP IO: better debugging
> -
>
> Key: HIVE-16954
> URL: https://issues.apache.org/jira/browse/HIVE-16954
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-16954-branch-2.patch, HIVE-16954.patch
>
>






[jira] [Updated] (HIVE-17087) Remove unnecessary HoS DPP trees during map-join conversion

2017-07-24 Thread Sahil Takiar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar updated HIVE-17087:

Attachment: HIVE-17087.3.patch

> Remove unnecessary HoS DPP trees during map-join conversion
> ---
>
> Key: HIVE-17087
> URL: https://issues.apache.org/jira/browse/HIVE-17087
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
> Attachments: HIVE-17087.1.patch, HIVE-17087.2.patch, 
> HIVE-17087.3.patch
>
>
> Ran the following query in the {{TestSparkCliDriver}}:
> {code:sql}
> set hive.spark.dynamic.partition.pruning=true;
> set hive.auto.convert.join=true;
> create table partitioned_table1 (col int) partitioned by (part_col int);
> create table partitioned_table2 (col int) partitioned by (part_col int);
> create table regular_table (col int);
> insert into table regular_table values (1);
> alter table partitioned_table1 add partition (part_col = 1);
> insert into table partitioned_table1 partition (part_col = 1) values (1), 
> (2), (3), (4), (5), (6), (7), (8), (9), (10);
> alter table partitioned_table2 add partition (part_col = 1);
> insert into table partitioned_table2 partition (part_col = 1) values (1), 
> (2), (3), (4), (5), (6), (7), (8), (9), (10);
> explain select * from partitioned_table1, partitioned_table2 where 
> partitioned_table1.part_col = partitioned_table2.part_col;
> {code}
> and got the following explain plan:
> {code}
> STAGE DEPENDENCIES:
>   Stage-2 is a root stage
>   Stage-3 depends on stages: Stage-2
>   Stage-1 depends on stages: Stage-3
>   Stage-0 depends on stages: Stage-1
> STAGE PLANS:
>   Stage: Stage-2
> Spark
> #### A masked pattern was here ####
>   Vertices:
> Map 3 
> Map Operator Tree:
> TableScan
>   alias: partitioned_table1
>   Statistics: Num rows: 10 Data size: 11 Basic stats: 
> COMPLETE Column stats: NONE
>   Select Operator
> expressions: col (type: int), part_col (type: int)
> outputColumnNames: _col0, _col1
> Statistics: Num rows: 10 Data size: 11 Basic stats: 
> COMPLETE Column stats: NONE
> Select Operator
>   expressions: _col1 (type: int)
>   outputColumnNames: _col0
>   Statistics: Num rows: 10 Data size: 11 Basic stats: 
> COMPLETE Column stats: NONE
>   Group By Operator
> keys: _col0 (type: int)
> mode: hash
> outputColumnNames: _col0
> Statistics: Num rows: 10 Data size: 11 Basic stats: 
> COMPLETE Column stats: NONE
> Spark Partition Pruning Sink Operator
>   partition key expr: part_col
>   Statistics: Num rows: 10 Data size: 11 Basic stats: 
> COMPLETE Column stats: NONE
>   target column name: part_col
>   target work: Map 2
>   Stage: Stage-3
> Spark
> #### A masked pattern was here ####
>   Vertices:
> Map 2 
> Map Operator Tree:
> TableScan
>   alias: partitioned_table2
>   Statistics: Num rows: 10 Data size: 11 Basic stats: 
> COMPLETE Column stats: NONE
>   Select Operator
> expressions: col (type: int), part_col (type: int)
> outputColumnNames: _col0, _col1
> Statistics: Num rows: 10 Data size: 11 Basic stats: 
> COMPLETE Column stats: NONE
> Spark HashTable Sink Operator
>   keys:
> 0 _col1 (type: int)
> 1 _col1 (type: int)
> Local Work:
>   Map Reduce Local Work
>   Stage: Stage-1
> Spark
> #### A masked pattern was here ####
>   Vertices:
> Map 1 
> Map Operator Tree:
> TableScan
>   alias: partitioned_table1
>   Statistics: Num rows: 10 Data size: 11 Basic stats: 
> COMPLETE Column stats: NONE
>   Select Operator
> expressions: col (type: int), part_col (type: int)
> outputColumnNames: _col0, _col1
> Statistics: Num rows: 10 Data size: 11 Basic stats: 
> COMPLETE Column stats: NONE
> Map Join Operator
>   condition map:
>Inner Join 0 to 1
>   keys:
> 0 _col1 (type: int)
> 1 _col1 (type: int)
>   outputColumnNames: 

[jira] [Commented] (HIVE-16954) LLAP IO: better debugging

2017-07-24 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16099383#comment-16099383
 ] 

Sergey Shelukhin commented on HIVE-16954:
-

[~gopalv] ping?

> LLAP IO: better debugging
> -
>
> Key: HIVE-16954
> URL: https://issues.apache.org/jira/browse/HIVE-16954
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-16954-branch-2.patch, HIVE-16954.patch
>
>






[jira] [Commented] (HIVE-17006) LLAP: Parquet caching

2017-07-24 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16099361#comment-16099361
 ] 

Sergey Shelukhin commented on HIVE-17006:
-

Sorry, I was on vacation. Will do, after cleaning up the patch and testing more.

> LLAP: Parquet caching
> -
>
> Key: HIVE-17006
> URL: https://issues.apache.org/jira/browse/HIVE-17006
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-17006.WIP.patch
>
>
> There are multiple options to do Parquet caching in LLAP:
> 1) Full elevator (too intrusive for now).
> 2) Page-based cache like ORC (requires some changes to Parquet, or 
> copy-pasting its code).
> 3) Cache disk data on column chunk level as is.
> Given that Parquet reads at column chunk granularity, (2) is not as useful as 
> for ORC, but it is still a good idea. I messaged the dev list about it but 
> didn't get a response; we may follow up later.
> For now, do (3). 
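
Option (3) above can be pictured as a cache keyed by file and column chunk location that stores the raw bytes unchanged. The sketch below is an assumption-laden illustration, not the WIP patch: `ColumnChunkCacheSketch`, `readChunk`, the string key, and the LRU policy are all hypothetical, and the "disk read" is simulated with an empty byte array.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch; not LLAP's cache and not the Parquet reader API. */
final class ColumnChunkCacheSketch {
    private final LinkedHashMap<String, byte[]> cache;
    /** Number of simulated disk reads, i.e. cache misses. */
    int diskReads = 0;

    ColumnChunkCacheSketch(int maxEntries) {
        // An access-ordered LinkedHashMap gives a simple LRU eviction policy.
        this.cache = new LinkedHashMap<String, byte[]>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, byte[]> eldest) {
                return size() > maxEntries;
            }
        };
    }

    /** Returns the raw bytes of one column chunk, reading from disk only on a miss. */
    byte[] readChunk(String file, long offset, int length) {
        String key = file + "@" + offset + "+" + length;
        byte[] data = cache.get(key);
        if (data == null) {
            data = readFromDisk(file, offset, length);
            cache.put(key, data);
        }
        return data;
    }

    /** Stand-in for a real read; returns zeroed bytes of the requested length. */
    private byte[] readFromDisk(String file, long offset, int length) {
        diskReads++;
        return new byte[length];
    }

    public static void main(String[] args) {
        ColumnChunkCacheSketch cache = new ColumnChunkCacheSketch(2);
        cache.readChunk("part-0.parquet", 0L, 8);
        cache.readChunk("part-0.parquet", 0L, 8); // served from the cache
        System.out.println("disk reads: " + cache.diskReads);
    }
}
```

Caching the bytes as is keeps the sketch independent of Parquet's decoding, which is the property the description attributes to option (3).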





[jira] [Commented] (HIVE-16844) Fix Connection leak in ObjectStore when new Conf object is used

2017-07-24 Thread Sunitha Beeram (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16099357#comment-16099357
 ] 

Sunitha Beeram commented on HIVE-16844:
---

[~mithun] Do you have further input on this?

> Fix Connection leak in ObjectStore when new Conf object is used
> ---
>
> Key: HIVE-16844
> URL: https://issues.apache.org/jira/browse/HIVE-16844
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Reporter: Sunitha Beeram
>Assignee: Sunitha Beeram
> Fix For: 3.0.0
>
> Attachments: HIVE-16844.1.patch
>
>
> The code path in ObjectStore.java currently leaks BoneCP (or Hikari) 
> connection pools when a new configuration object is passed in. The code needs 
> to ensure that the persistence-factory is closed before it is nullified.
> The relevant code is 
> [here|https://github.com/apache/hive/blob/master/metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java#L290].
>  Note that pmf is set to null, but the underlying connection pool is not 
> closed.
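
A minimal sketch of the fix described above: close the factory before dropping the reference to it. It assumes a hypothetical `PooledFactory` in place of the real persistence manager factory and a plain string in place of the configuration; it is not the ObjectStore code and uses no DataNucleus, BoneCP, or Hikari API.

```java
import java.util.Objects;

/** Hypothetical sketch; not the ObjectStore implementation. */
final class PersistenceFactorySketch {
    /** Stand-in for a persistence factory that owns a connection pool. */
    static final class PooledFactory implements AutoCloseable {
        /** Number of pools currently open, used to observe leaks. */
        static int openPools = 0;
        final String conf;
        private boolean closed = false;

        PooledFactory(String conf) {
            this.conf = conf;
            openPools++;
        }

        @Override
        public void close() {
            if (!closed) {
                closed = true;
                openPools--;
            }
        }
    }

    private static PooledFactory pmf;

    /** Returns the shared factory, replacing it when the configuration changes. */
    static synchronized PooledFactory getFactory(String conf) {
        if (pmf != null && !Objects.equals(pmf.conf, conf)) {
            // Close the old factory first so its pool is released;
            // setting the reference to null alone would leak the pool.
            pmf.close();
            pmf = null;
        }
        if (pmf == null) {
            pmf = new PooledFactory(conf);
        }
        return pmf;
    }

    public static void main(String[] args) {
        getFactory("conf-a");
        getFactory("conf-b"); // closes the factory created for conf-a
        System.out.println("open pools: " + PooledFactory.openPools);
    }
}
```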





[jira] [Commented] (HIVE-12631) LLAP: support ORC ACID tables

2017-07-24 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16099349#comment-16099349
 ] 

Sergey Shelukhin commented on HIVE-12631:
-

Left some feedback on the new implementation. Looks like [~ekoifman] supports 
this approach.

> LLAP: support ORC ACID tables
> -
>
> Key: HIVE-12631
> URL: https://issues.apache.org/jira/browse/HIVE-12631
> Project: Hive
>  Issue Type: Bug
>  Components: llap, Transactions
>Reporter: Sergey Shelukhin
>Assignee: Teddy Choi
> Attachments: HIVE-12631.10.patch, HIVE-12631.10.patch, 
> HIVE-12631.11.patch, HIVE-12631.11.patch, HIVE-12631.12.patch, 
> HIVE-12631.13.patch, HIVE-12631.15.patch, HIVE-12631.16.patch, 
> HIVE-12631.17.patch, HIVE-12631.18.patch, HIVE-12631.19.patch, 
> HIVE-12631.1.patch, HIVE-12631.20.patch, HIVE-12631.21.patch, 
> HIVE-12631.22.patch, HIVE-12631.23.patch, HIVE-12631.2.patch, 
> HIVE-12631.3.patch, HIVE-12631.4.patch, HIVE-12631.5.patch, 
> HIVE-12631.6.patch, HIVE-12631.7.patch, HIVE-12631.8.patch, 
> HIVE-12631.8.patch, HIVE-12631.9.patch
>
>
> LLAP uses a completely separate read path in ORC to allow for caching and 
> parallelization of reads and processing. This path does not support ACID. As 
> far as I remember ACID logic is embedded inside ORC format; we need to 
> refactor it to be on top of some interface, if practical; or just port it to 
> LLAP read path.
> Another consideration is how the logic will work with cache. The cache is 
> currently low-level (CB-level in ORC), so we could just use it to read bases 
> and deltas (deltas should be cached with higher priority) and merge as usual. 
> We could also cache merged representation in future.





[jira] [Updated] (HIVE-16997) Extend object store to store bit vectors

2017-07-24 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-16997:
---
Attachment: HIVE-16997.06.patch

> Extend object store to store bit vectors
> 
>
> Key: HIVE-16997
> URL: https://issues.apache.org/jira/browse/HIVE-16997
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-16997.01.patch, HIVE-16997.02.patch, 
> HIVE-16997.03.patch, HIVE-16997.04.patch, HIVE-16997.05.patch, 
> HIVE-16997.06.patch
>
>
> This patch includes: (1) a new serde for FMSketch (2) change of schema for 
> derby and mysql (3) support for date type (4) refactoring the extrapolation 
> and merge code





[jira] [Updated] (HIVE-16997) Extend object store to store bit vectors

2017-07-24 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-16997:
---
Status: Open  (was: Patch Available)

> Extend object store to store bit vectors
> 
>
> Key: HIVE-16997
> URL: https://issues.apache.org/jira/browse/HIVE-16997
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-16997.01.patch, HIVE-16997.02.patch, 
> HIVE-16997.03.patch, HIVE-16997.04.patch, HIVE-16997.05.patch, 
> HIVE-16997.06.patch
>
>
> This patch includes: (1) a new serde for FMSketch (2) change of schema for 
> derby and mysql (3) support for date type (4) refactoring the extrapolation 
> and merge code





[jira] [Updated] (HIVE-16997) Extend object store to store bit vectors

2017-07-24 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-16997:
---
Status: Patch Available  (was: Open)

> Extend object store to store bit vectors
> 
>
> Key: HIVE-16997
> URL: https://issues.apache.org/jira/browse/HIVE-16997
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-16997.01.patch, HIVE-16997.02.patch, 
> HIVE-16997.03.patch, HIVE-16997.04.patch, HIVE-16997.05.patch, 
> HIVE-16997.06.patch
>
>
> This patch includes: (1) a new serde for FMSketch (2) change of schema for 
> derby and mysql (3) support for date type (4) refactoring the extrapolation 
> and merge code





[jira] [Commented] (HIVE-17131) Add InterfaceAudience and InterfaceStability annotations for SerDe APIs

2017-07-24 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16099332#comment-16099332
 ] 

Ashutosh Chauhan commented on HIVE-17131:
-

This patch is for branch-2. I think we don't want to make any changes to 
those interfaces there.
+1

> Add InterfaceAudience and InterfaceStability annotations for SerDe APIs
> ---
>
> Key: HIVE-17131
> URL: https://issues.apache.org/jira/browse/HIVE-17131
> Project: Hive
>  Issue Type: Sub-task
>  Components: Serializers/Deserializers
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
> Attachments: HIVE-17131.1.branch-2.patch, HIVE-17131.1.patch
>
>
> Adding InterfaceAudience and InterfaceStability annotations for the core 
> SerDe APIs.





[jira] [Moved] (HIVE-17162) get rid of "skipCorrupt" flag

2017-07-24 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin moved HADOOP-14684 to HIVE-17162:
--

Key: HIVE-17162  (was: HADOOP-14684)
Project: Hive  (was: Hadoop Common)

> get rid of "skipCorrupt" flag
> -
>
> Key: HIVE-17162
> URL: https://issues.apache.org/jira/browse/HIVE-17162
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>
> The error that caused the issue was a long time ago and it's probably ok to 
> get rid of this flag.
> Perhaps we should provide a small tool to overwrite these files without the 
> corrupt values.
> cc [~prasanth_j]





[jira] [Updated] (HIVE-17162) get rid of "skipCorrupt" flag

2017-07-24 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-17162:

Description: 
The error that caused the issue was a long time ago and it's probably ok to get 
rid of this flag.
Perhaps we should provide a small tool to overwrite these files without the 
corrupted values.
cc [~prasanth_j]

  was:
The error that caused the issue was a long time ago and it's probably ok to get 
rid of this flag.
Perhaps we should provide a small tool to overwrite these files without the 
corrupt values.
cc [~prasanth_j]


> get rid of "skipCorrupt" flag
> -
>
> Key: HIVE-17162
> URL: https://issues.apache.org/jira/browse/HIVE-17162
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>
> The error that caused the issue was a long time ago and it's probably ok to 
> get rid of this flag.
> Perhaps we should provide a small tool to overwrite these files without the 
> corrupted values.
> cc [~prasanth_j]





[jira] [Commented] (HIVE-17133) NoSuchMethodError in Hadoop FileStatus.compareTo

2017-07-24 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16099305#comment-16099305
 ] 

Sergey Shelukhin commented on HIVE-17133:
-

Filed HADOOP-14683. Let's see what the response is, and go from there.

> NoSuchMethodError in Hadoop FileStatus.compareTo
> 
>
> Key: HIVE-17133
> URL: https://issues.apache.org/jira/browse/HIVE-17133
> Project: Hive
>  Issue Type: Bug
>Reporter: Rui Li
>
> The stack trace is:
> {noformat}
> Caused by: java.lang.NoSuchMethodError: 
> org.apache.hadoop.fs.FileStatus.compareTo(Lorg/apache/hadoop/fs/FileStatus;)I
>   at 
> org.apache.hadoop.hive.ql.io.AcidUtils.lambda$getAcidState$0(AcidUtils.java:931)
>   at java.util.TimSort.countRunAndMakeAscending(TimSort.java:355)
>   at java.util.TimSort.sort(TimSort.java:234)
>   at java.util.Arrays.sort(Arrays.java:1512)
>   at java.util.ArrayList.sort(ArrayList.java:1454)
>   at java.util.Collections.sort(Collections.java:175)
>   at 
> org.apache.hadoop.hive.ql.io.AcidUtils.getAcidState(AcidUtils.java:929)
> {noformat}
> I'm on Hive master and using Hadoop 2.7.2. The method signature in Hadoop 
> 2.7.2 is:
> https://github.com/apache/hadoop/blob/release-2.7.2-RC2/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/FileStatus.java#L336
> In Hadoop 2.8.0 it becomes:
> https://github.com/apache/hadoop/blob/release-2.8.0-RC3/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/FileStatus.java#L332
> I think that breaks binary compatibility.
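
The binary-compatibility problem can be reproduced in miniature with reflection. The sketch assumes, from the two linked signatures and the descriptor in the stack trace, that the 2.7.2 class declares `compareTo(Object)` while the 2.8.0 class declares `compareTo(FileStatus)`. `OldStatus` and `NewStatus` are hypothetical stand-ins for the two versions, not Hadoop classes.

```java
/** Hypothetical sketch; OldStatus and NewStatus are not Hadoop classes. */
final class CompareToDescriptorSketch {
    /** Mirrors the assumed older shape: raw Comparable, compareTo(Object). */
    @SuppressWarnings("rawtypes")
    static final class OldStatus implements Comparable {
        final long length;

        OldStatus(long length) {
            this.length = length;
        }

        @Override
        public int compareTo(Object other) {
            return Long.compare(length, ((OldStatus) other).length);
        }
    }

    /** Mirrors the assumed newer shape: Comparable<T>, compareTo(T). */
    static final class NewStatus implements Comparable<NewStatus> {
        final long length;

        NewStatus(long length) {
            this.length = length;
        }

        @Override
        public int compareTo(NewStatus other) {
            return Long.compare(length, other.length);
        }
    }

    /** True if owner declares compareTo with exactly this parameter type. */
    static boolean hasCompareTo(Class<?> owner, Class<?> parameterType) {
        try {
            owner.getDeclaredMethod("compareTo", parameterType);
            return true;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // A caller compiled against the newer shape links to compareTo(NewStatus).
        // The older shape has no method with that parameter type, which is the
        // situation that surfaces as NoSuchMethodError at run time.
        System.out.println(hasCompareTo(OldStatus.class, OldStatus.class));
        System.out.println(hasCompareTo(NewStatus.class, NewStatus.class));
    }
}
```

Source code that calls `compareTo` compiles against either shape, which is why the break only shows up when the compile-time and run-time versions differ.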





[jira] [Commented] (HIVE-17161) BeeLine up arrow key should show the last sql instead of last line

2017-07-24 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16099303#comment-16099303
 ] 

Gopal V commented on HIVE-17161:


AFAIK, you need JLine 3 for this.

https://github.com/jline/jline3/blob/master/reader/src/main/java/org/jline/reader/Parser.java#L25

> BeeLine up arrow key should show the last sql instead of last line
> --
>
> Key: HIVE-17161
> URL: https://issues.apache.org/jira/browse/HIVE-17161
> Project: Hive
>  Issue Type: Improvement
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
>
> Currently, when you press Up arrow on beeline prompt, it shows the last line 
> of the previous sql. It is hard to execute the previous sql if it was 
> executed using multiple lines. It would be good to improve this experience by 
> fetching the last command executed instead of just the last line of the 
> previous command.
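
The requested behaviour can be sketched as a history that stores whole statements rather than lines. This is a simplified illustration, not BeeLine or JLine code: `StatementHistorySketch` and its methods are hypothetical, and a statement is assumed to end at a line whose last character is a semicolon, which ignores semicolons inside strings or comments.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch; not the BeeLine or JLine history implementation. */
final class StatementHistorySketch {
    private final List<String> history = new ArrayList<>();
    private final StringBuilder pending = new StringBuilder();

    /** Feeds one input line; a history entry is added when a statement ends. */
    void acceptLine(String line) {
        String trimmed = line.trim();
        if (trimmed.isEmpty()) {
            return;
        }
        if (pending.length() > 0) {
            pending.append(' ');
        }
        pending.append(trimmed);
        // Naive terminator check: the statement ends with a trailing semicolon.
        if (trimmed.endsWith(";")) {
            history.add(pending.toString());
            pending.setLength(0);
        }
    }

    /** What the Up arrow would recall: the last full statement, or null if none. */
    String previous() {
        return history.isEmpty() ? null : history.get(history.size() - 1);
    }

    public static void main(String[] args) {
        StatementHistorySketch h = new StatementHistorySketch();
        h.acceptLine("select ds");
        h.acceptLine("  from srcpart");
        h.acceptLine("  where ds is not null;");
        System.out.println(h.previous());
    }
}
```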





[jira] [Assigned] (HIVE-17161) BeeLine up arrow key should show the last sql instead of last line

2017-07-24 Thread Vihang Karajgaonkar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vihang Karajgaonkar reassigned HIVE-17161:
--


> BeeLine up arrow key should show the last sql instead of last line
> --
>
> Key: HIVE-17161
> URL: https://issues.apache.org/jira/browse/HIVE-17161
> Project: Hive
>  Issue Type: Improvement
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
>
> Currently, when you press Up arrow on beeline prompt, it shows the last line 
> of the previous sql. It is hard to execute the previous sql if it was 
> executed using multiple lines. It would be good to improve this experience by 
> fetching the last command executed instead of just the last line of the 
> previous command.





[jira] [Updated] (HIVE-16997) Extend object store to store bit vectors

2017-07-24 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-16997:
---
Status: Open  (was: Patch Available)

> Extend object store to store bit vectors
> 
>
> Key: HIVE-16997
> URL: https://issues.apache.org/jira/browse/HIVE-16997
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-16997.01.patch, HIVE-16997.02.patch, 
> HIVE-16997.03.patch, HIVE-16997.04.patch, HIVE-16997.05.patch
>
>
> This patch includes: (1) a new serde for FMSketch (2) change of schema for 
> derby and mysql (3) support for date type (4) refactoring the extrapolation 
> and merge code





[jira] [Updated] (HIVE-16997) Extend object store to store bit vectors

2017-07-24 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-16997:
---
Status: Patch Available  (was: Open)

> Extend object store to store bit vectors
> 
>
> Key: HIVE-16997
> URL: https://issues.apache.org/jira/browse/HIVE-16997
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-16997.01.patch, HIVE-16997.02.patch, 
> HIVE-16997.03.patch, HIVE-16997.04.patch, HIVE-16997.05.patch
>
>
> This patch includes: (1) a new serde for FMSketch (2) change of schema for 
> derby and mysql (3) support for date type (4) refactoring the extrapolation 
> and merge code





[jira] [Updated] (HIVE-16997) Extend object store to store bit vectors

2017-07-24 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-16997:
---
Attachment: HIVE-16997.05.patch

> Extend object store to store bit vectors
> 
>
> Key: HIVE-16997
> URL: https://issues.apache.org/jira/browse/HIVE-16997
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-16997.01.patch, HIVE-16997.02.patch, 
> HIVE-16997.03.patch, HIVE-16997.04.patch, HIVE-16997.05.patch
>
>
> This patch includes: (1) a new serde for FMSketch (2) change of schema for 
> derby and mysql (3) support for date type (4) refactoring the extrapolation 
> and merge code





[jira] [Resolved] (HIVE-16222) add a setting to disable row.serde for specific formats; enable for others

2017-07-24 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin resolved HIVE-16222.
-
Resolution: Fixed

> add a setting to disable row.serde for specific formats; enable for others
> --
>
> Key: HIVE-16222
> URL: https://issues.apache.org/jira/browse/HIVE-16222
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Fix For: 3.0.0
>
> Attachments: HIVE-16222.01.patch, HIVE-16222.02.patch, 
> HIVE-16222.03.patch, HIVE-16222.04.patch, HIVE-16222.05.patch, 
> HIVE-16222.patch
>
>
> Per [~gopalv]
> {quote}
> row.serde = true ... breaks Parquet (they expect to get the same object back, 
> which means you can't buffer 1024 rows).
> {quote}
> We want to enable this and vector.serde for text vectorization. Need to turn 
> it off for specific formats.
>  





[jira] [Updated] (HIVE-17149) Hdfs directory is not cleared if partition creation failed on HMS

2017-07-24 Thread Aihua Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu updated HIVE-17149:

   Resolution: Fixed
Fix Version/s: 3.0.0
   Status: Resolved  (was: Patch Available)

Pushed to master. Thanks [~zsombor.klara] for the work.

> Hdfs directory is not cleared if partition creation failed on HMS
> -
>
> Key: HIVE-17149
> URL: https://issues.apache.org/jira/browse/HIVE-17149
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 3.0.0
>Reporter: Barna Zsombor Klara
>Assignee: Barna Zsombor Klara
> Fix For: 3.0.0
>
> Attachments: HIVE-17149.01.patch
>
>
> Hive#loadPartition will load a directory into a Hive Table Partition. It will 
> alter the existing content of
> the partition with the new contents and create a new partition if one does 
> not exist.
> The file move is performed before the partition creation and if the creation 
> fails, the moved files are not cleared.





[jira] [Commented] (HIVE-16222) add a setting to disable row.serde for specific formats; enable for others

2017-07-24 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16099261#comment-16099261
 ] 

Sergey Shelukhin commented on HIVE-16222:
-

Looks like I forgot to push this

> add a setting to disable row.serde for specific formats; enable for others
> --
>
> Key: HIVE-16222
> URL: https://issues.apache.org/jira/browse/HIVE-16222
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Fix For: 3.0.0
>
> Attachments: HIVE-16222.01.patch, HIVE-16222.02.patch, 
> HIVE-16222.03.patch, HIVE-16222.04.patch, HIVE-16222.05.patch, 
> HIVE-16222.patch
>
>
> Per [~gopalv]
> {quote}
> row.serde = true ... breaks Parquet (they expect to get the same object back, 
> which means you can't buffer 1024 rows).
> {quote}
> We want to enable this and vector.serde for text vectorization. Need to turn 
> it off for specific formats.
>  





[jira] [Commented] (HIVE-17133) NoSuchMethodError in Hadoop FileStatus.compareTo

2017-07-24 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16099228#comment-16099228
 ] 

Sergey Shelukhin commented on HIVE-17133:
-

I think we do... we should still be able to run on it.
We'd probably need a shim. Or we can ask to re-add the method in 2.8.1 (or 
whatever) and not support 2.8.0.


> NoSuchMethodError in Hadoop FileStatus.compareTo
> 
>
> Key: HIVE-17133
> URL: https://issues.apache.org/jira/browse/HIVE-17133
> Project: Hive
>  Issue Type: Bug
>Reporter: Rui Li
>
> The stack trace is:
> {noformat}
> Caused by: java.lang.NoSuchMethodError: 
> org.apache.hadoop.fs.FileStatus.compareTo(Lorg/apache/hadoop/fs/FileStatus;)I
>   at 
> org.apache.hadoop.hive.ql.io.AcidUtils.lambda$getAcidState$0(AcidUtils.java:931)
>   at java.util.TimSort.countRunAndMakeAscending(TimSort.java:355)
>   at java.util.TimSort.sort(TimSort.java:234)
>   at java.util.Arrays.sort(Arrays.java:1512)
>   at java.util.ArrayList.sort(ArrayList.java:1454)
>   at java.util.Collections.sort(Collections.java:175)
>   at 
> org.apache.hadoop.hive.ql.io.AcidUtils.getAcidState(AcidUtils.java:929)
> {noformat}
> I'm on Hive master and using Hadoop 2.7.2. The method signature in Hadoop 
> 2.7.2 is:
> https://github.com/apache/hadoop/blob/release-2.7.2-RC2/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/FileStatus.java#L336
> In Hadoop 2.8.0 it becomes:
> https://github.com/apache/hadoop/blob/release-2.8.0-RC3/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/FileStatus.java#L332
> I think that breaks binary compatibility.





[jira] [Commented] (HIVE-16965) SMB join may produce incorrect results

2017-07-24 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16099220#comment-16099220
 ] 

Sergey Shelukhin commented on HIVE-16965:
-

+1 pending tests. llap_smb test might now start passing

> SMB join may produce incorrect results
> --
>
> Key: HIVE-16965
> URL: https://issues.apache.org/jira/browse/HIVE-16965
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Deepak Jaiswal
> Attachments: HIVE-16965.1.patch, HIVE-16965.2.patch, 
> HIVE-16965.3.patch
>
>
> Running the following on MiniTez
> {noformat}
> set hive.mapred.mode=nonstrict;
> SET hive.vectorized.execution.enabled=true;
> SET hive.exec.orc.default.buffer.size=32768;
> SET hive.exec.orc.default.row.index.stride=1000;
> SET hive.optimize.index.filter=true;
> set hive.fetch.task.conversion=none;
> set hive.exec.dynamic.partition.mode=nonstrict;
> DROP TABLE orc_a;
> DROP TABLE orc_b;
> CREATE TABLE orc_a (id bigint, cdouble double) partitioned by (y int, q 
> smallint)
>   CLUSTERED BY (id) SORTED BY (id) INTO 2 BUCKETS stored as orc;
> CREATE TABLE orc_b (id bigint, cfloat float)
>   CLUSTERED BY (id) SORTED BY (id) INTO 2 BUCKETS stored as orc;
> insert into table orc_a partition (y=2000, q)
> select cbigint, cdouble, csmallint % 10 from alltypesorc
>   where cbigint is not null and csmallint > 0 order by cbigint asc;
> insert into table orc_a partition (y=2001, q)
> select cbigint, cdouble, csmallint % 10 from alltypesorc
>   where cbigint is not null and csmallint > 0 order by cbigint asc;
> insert into table orc_b 
> select cbigint, cfloat from alltypesorc
>   where cbigint is not null and csmallint > 0 order by cbigint asc limit 200;
> set hive.cbo.enable=false;
> select y,q,count(*) from orc_a a join orc_b b on a.id=b.id group by y,q;
> set hive.enforce.sortmergebucketmapjoin=false;
> set hive.optimize.bucketmapjoin=true;
> set hive.optimize.bucketmapjoin.sortedmerge=true;
> set hive.auto.convert.sortmerge.join=true;
> set hive.auto.convert.join=true;
> set hive.auto.convert.join.noconditionaltask.size=10;
> explain
> select y,q,count(*) from orc_a a join orc_b b on a.id=b.id group by y,q;
> select y,q,count(*) from orc_a a join orc_b b on a.id=b.id group by y,q;
> DROP TABLE orc_a;
> DROP TABLE orc_b;
> {noformat}
> Produces different results for the two selects. The SMB one looks incorrect. 
> cc [~djaiswal] [~hagleitn]





[jira] [Updated] (HIVE-16998) Add config to enable HoS DPP only for map-joins

2017-07-24 Thread Janaki Lahorani (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Janaki Lahorani updated HIVE-16998:
---
Attachment: HIVE16998.1.patch

Introduced parameter hive.spark.dynamic.partition.pruning.map.join.only, with a 
default value of false.

If hive.spark.dynamic.partition.pruning is set to false, this parameter is 
ignored. If hive.spark.dynamic.partition.pruning is set to true: when 
hive.spark.dynamic.partition.pruning.map.join.only is true, DPP is enabled 
only for queries that run with map joins; when it is false, DPP is enabled 
for all queries.
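
The interaction between the two settings can be sketched as a small truth function. This is only an illustration of the rule described above, not code from the patch; the function name and the `uses_map_join` input are hypothetical.

```python
def dpp_applies(dpp_enabled: bool, map_join_only: bool, uses_map_join: bool) -> bool:
    """Return True if DPP would apply to a query under the rule described above.

    dpp_enabled   -> hive.spark.dynamic.partition.pruning
    map_join_only -> hive.spark.dynamic.partition.pruning.map.join.only
    uses_map_join -> whether the query runs with a map join (hypothetical input)
    """
    if not dpp_enabled:
        # The map-join-only setting is ignored when DPP itself is off.
        return False
    if map_join_only:
        # DPP is restricted to queries that run with map joins.
        return uses_map_join
    # DPP is on and unrestricted.
    return True
```

With the default of false for the new parameter, the behaviour depends only on hive.spark.dynamic.partition.pruning.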

> Add config to enable HoS DPP only for map-joins
> ---
>
> Key: HIVE-16998
> URL: https://issues.apache.org/jira/browse/HIVE-16998
> Project: Hive
>  Issue Type: Sub-task
>  Components: Logical Optimizer, Spark
>Reporter: Sahil Takiar
>Assignee: Janaki Lahorani
> Attachments: HIVE16998.1.patch
>
>
> HoS DPP will split a given operator tree in two under the following 
> conditions: it has detected that the query can benefit from DPP, and the 
> filter is not a map-join (see SplitOpTreeForDPP).
> This can hurt performance if the non-partitioned side of the join 
> involves a complex operator tree - e.g. the query {{select count(*) from 
> srcpart where srcpart.ds in (select max(srcpart.ds) from srcpart union all 
> select min(srcpart.ds) from srcpart)}} will require running the subquery 
> twice, once in each Spark job.
> Queries with map-joins don't get split into two operator trees and thus don't 
> suffer from this drawback. Thus, it would be nice to have a config key that 
> just enables DPP on HoS for map-joins.





[jira] [Commented] (HIVE-16965) SMB join may produce incorrect results

2017-07-24 Thread Deepak Jaiswal (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16099204#comment-16099204
 ] 

Deepak Jaiswal commented on HIVE-16965:
---

https://reviews.apache.org/r/61087

RB link

> SMB join may produce incorrect results
> --
>
> Key: HIVE-16965
> URL: https://issues.apache.org/jira/browse/HIVE-16965
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Deepak Jaiswal
> Attachments: HIVE-16965.1.patch, HIVE-16965.2.patch, 
> HIVE-16965.3.patch
>
>
> Running the following on MiniTez
> {noformat}
> set hive.mapred.mode=nonstrict;
> SET hive.vectorized.execution.enabled=true;
> SET hive.exec.orc.default.buffer.size=32768;
> SET hive.exec.orc.default.row.index.stride=1000;
> SET hive.optimize.index.filter=true;
> set hive.fetch.task.conversion=none;
> set hive.exec.dynamic.partition.mode=nonstrict;
> DROP TABLE orc_a;
> DROP TABLE orc_b;
> CREATE TABLE orc_a (id bigint, cdouble double) partitioned by (y int, q 
> smallint)
>   CLUSTERED BY (id) SORTED BY (id) INTO 2 BUCKETS stored as orc;
> CREATE TABLE orc_b (id bigint, cfloat float)
>   CLUSTERED BY (id) SORTED BY (id) INTO 2 BUCKETS stored as orc;
> insert into table orc_a partition (y=2000, q)
> select cbigint, cdouble, csmallint % 10 from alltypesorc
>   where cbigint is not null and csmallint > 0 order by cbigint asc;
> insert into table orc_a partition (y=2001, q)
> select cbigint, cdouble, csmallint % 10 from alltypesorc
>   where cbigint is not null and csmallint > 0 order by cbigint asc;
> insert into table orc_b 
> select cbigint, cfloat from alltypesorc
>   where cbigint is not null and csmallint > 0 order by cbigint asc limit 200;
> set hive.cbo.enable=false;
> select y,q,count(*) from orc_a a join orc_b b on a.id=b.id group by y,q;
> set hive.enforce.sortmergebucketmapjoin=false;
> set hive.optimize.bucketmapjoin=true;
> set hive.optimize.bucketmapjoin.sortedmerge=true;
> set hive.auto.convert.sortmerge.join=true;
> set hive.auto.convert.join=true;
> set hive.auto.convert.join.noconditionaltask.size=10;
> explain
> select y,q,count(*) from orc_a a join orc_b b on a.id=b.id group by y,q;
> select y,q,count(*) from orc_a a join orc_b b on a.id=b.id group by y,q;
> DROP TABLE orc_a;
> DROP TABLE orc_b;
> {noformat}
> Produces different results for the two selects. The SMB one looks incorrect. 
> cc [~djaiswal] [~hagleitn]





[jira] [Commented] (HIVE-17150) CREATE INDEX execute HMS out-of-transaction listener calls inside a transaction

2017-07-24 Thread JIRA

[ 
https://issues.apache.org/jira/browse/HIVE-17150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16099191#comment-16099191
 ] 

Sergio Peña commented on HIVE-17150:


It creates more than one notification per operation. One for CREATE_TABLE and 
another for CREATE_INDEX. These 2 notifications are not duplicated as they're 
in the same transaction. However, if another thread writes a new notification, 
then the EVENT_ID will be duplicated if it grabs the ID before committing the 
CREATE_INDEX operation.

Currently, only CREATE_INDEX calls a CREATE_TABLE inside the same transaction.
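
A toy model of the race described above, under the assumption that each transaction derives its event IDs from the last committed counter value. This is not how HMS is necessarily implemented; the function and variable names are hypothetical.

```python
def simulate(other_reads_before_commit: bool) -> list:
    """Return the event IDs handed out to transaction A, then transaction B."""
    committed_next_id = 1  # counter value visible to every transaction

    # Transaction A (CREATE_INDEX) emits two notifications in one transaction:
    # one for CREATE_TABLE and one for CREATE_INDEX. They get distinct IDs.
    a_start = committed_next_id
    a_ids = [a_start, a_start + 1]

    if other_reads_before_commit:
        # Transaction B reads the counter while A is still uncommitted.
        b_start = committed_next_id
        committed_next_id = a_start + 2  # A commits afterwards
    else:
        committed_next_id = a_start + 2  # A commits first
        b_start = committed_next_id

    return a_ids + [b_start]
```

In this model `simulate(True)` hands the same ID to both transactions, while `simulate(False)` does not.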

> CREATE INDEX execute HMS out-of-transaction listener calls inside a 
> transaction
> ---
>
> Key: HIVE-17150
> URL: https://issues.apache.org/jira/browse/HIVE-17150
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 2.3.0
>Reporter: Sergio Peña
>Assignee: Sergio Peña
> Fix For: 3.0.0, 2.4.0
>
> Attachments: HIVE-17150.1.patch, HIVE-17150.2.patch
>
>
> The problem with CREATE INDEX is that it calls a CREATE TABLE operation 
> inside the same CREATE INDEX transaction. During listener calls, there are 
> some listeners that should run in an out-of-transaction context, for 
> instance, Sentry blocks the HMS operation until the DB log notification is 
> processed, but if the transaction has not finished, then the 
> out-of-transaction listener will block forever (or until a read timeout 
> happens).
> A fix would be to add a parameter to the out-of-transaction listener that 
> alerts the listener if HMS is in an active transaction. If so, then it is up to 
> the listener plugin to return immediately and avoid blocking the HMS 
> operation.
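
A minimal sketch of the proposed fix, assuming the listener receives a flag that tells it whether HMS is still inside a transaction. The class, method, and parameter names are hypothetical and do not reflect the actual HMS listener API.

```python
class BlockingListener:
    """Hypothetical out-of-transaction listener that normally waits on an event."""

    def __init__(self):
        self.waited_for = []

    def on_event(self, event_id: int, in_active_transaction: bool) -> str:
        if in_active_transaction:
            # The notification is not committed yet, so waiting for it to be
            # processed would block the HMS operation. Return immediately.
            return "skipped"
        # Stand-in for blocking until the notification has been processed.
        self.waited_for.append(event_id)
        return "waited"
```

The decision to return early stays in the listener plugin, as the description suggests; HMS only supplies the flag.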





[jira] [Commented] (HIVE-17160) Adding kerberos Authorization to the Druid hive integration

2017-07-24 Thread slim bouguerra (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16099189#comment-16099189
 ] 

slim bouguerra commented on HIVE-17160:
---

[~sershe] and [~sseth] can you please look at the Tez part.

> Adding kerberos Authorization to the Druid hive integration
> ---
>
> Key: HIVE-17160
> URL: https://issues.apache.org/jira/browse/HIVE-17160
> Project: Hive
>  Issue Type: New Feature
>  Components: Druid integration
>Reporter: slim bouguerra
>Assignee: slim bouguerra
> Attachments: HIVE-17160.patch
>
>
> The goal of this feature is to allow Hive to query a secured Druid cluster 
> using Kerberos credentials.





[jira] [Updated] (HIVE-16759) Add table type information to HMS log notifications

2017-07-24 Thread Janaki Lahorani (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Janaki Lahorani updated HIVE-16759:
---
Attachment: HIVE16759.3.patch

> Add table type information to HMS log notifications
> ---
>
> Key: HIVE-16759
> URL: https://issues.apache.org/jira/browse/HIVE-16759
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Affects Versions: 2.1.1
>Reporter: Sergio Peña
>Assignee: Janaki Lahorani
> Attachments: HIVE16759.1.patch, HIVE16759.2.patch, HIVE16759.3.patch
>
>
> The DB notifications used by HiveMetaStore should include the table type for 
> all notifications that include table events, such as create, drop and alter 
> table.
> This would be useful for consumers to identify views vs tables.





[jira] [Updated] (HIVE-17160) Adding kerberos Authorization to the Druid hive integration

2017-07-24 Thread slim bouguerra (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

slim bouguerra updated HIVE-17160:
--
Attachment: HIVE-17160.patch

> Adding kerberos Authorization to the Druid hive integration
> ---
>
> Key: HIVE-17160
> URL: https://issues.apache.org/jira/browse/HIVE-17160
> Project: Hive
>  Issue Type: New Feature
>  Components: Druid integration
>Reporter: slim bouguerra
>Assignee: slim bouguerra
> Attachments: HIVE-17160.patch
>
>
> The goal of this feature is to allow Hive to query a secured Druid cluster 
> using Kerberos credentials.





[jira] [Commented] (HIVE-16965) SMB join may produce incorrect results

2017-07-24 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16099186#comment-16099186
 ] 

Sergey Shelukhin commented on HIVE-16965:
-

Can you link to RB? I saw it but I cannot find the email anymore :)

> SMB join may produce incorrect results
> --
>
> Key: HIVE-16965
> URL: https://issues.apache.org/jira/browse/HIVE-16965
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Deepak Jaiswal
> Attachments: HIVE-16965.1.patch, HIVE-16965.2.patch, 
> HIVE-16965.3.patch
>
>
> Running the following on MiniTez
> {noformat}
> set hive.mapred.mode=nonstrict;
> SET hive.vectorized.execution.enabled=true;
> SET hive.exec.orc.default.buffer.size=32768;
> SET hive.exec.orc.default.row.index.stride=1000;
> SET hive.optimize.index.filter=true;
> set hive.fetch.task.conversion=none;
> set hive.exec.dynamic.partition.mode=nonstrict;
> DROP TABLE orc_a;
> DROP TABLE orc_b;
> CREATE TABLE orc_a (id bigint, cdouble double) partitioned by (y int, q 
> smallint)
>   CLUSTERED BY (id) SORTED BY (id) INTO 2 BUCKETS stored as orc;
> CREATE TABLE orc_b (id bigint, cfloat float)
>   CLUSTERED BY (id) SORTED BY (id) INTO 2 BUCKETS stored as orc;
> insert into table orc_a partition (y=2000, q)
> select cbigint, cdouble, csmallint % 10 from alltypesorc
>   where cbigint is not null and csmallint > 0 order by cbigint asc;
> insert into table orc_a partition (y=2001, q)
> select cbigint, cdouble, csmallint % 10 from alltypesorc
>   where cbigint is not null and csmallint > 0 order by cbigint asc;
> insert into table orc_b 
> select cbigint, cfloat from alltypesorc
>   where cbigint is not null and csmallint > 0 order by cbigint asc limit 200;
> set hive.cbo.enable=false;
> select y,q,count(*) from orc_a a join orc_b b on a.id=b.id group by y,q;
> set hive.enforce.sortmergebucketmapjoin=false;
> set hive.optimize.bucketmapjoin=true;
> set hive.optimize.bucketmapjoin.sortedmerge=true;
> set hive.auto.convert.sortmerge.join=true;
> set hive.auto.convert.join=true;
> set hive.auto.convert.join.noconditionaltask.size=10;
> explain
> select y,q,count(*) from orc_a a join orc_b b on a.id=b.id group by y,q;
> select y,q,count(*) from orc_a a join orc_b b on a.id=b.id group by y,q;
> DROP TABLE orc_a;
> DROP TABLE orc_b;
> {noformat}
> Produces different results for the two selects. The SMB one looks incorrect. 
> cc [~djaiswal] [~hagleitn]





[jira] [Commented] (HIVE-17150) CREATE INDEX execute HMS out-of-transaction listener calls inside a transaction

2017-07-24 Thread Alexander Kolbasov (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16099185#comment-16099185
 ] 

Alexander Kolbasov commented on HIVE-17150:
---

[~spena] When you have such nested operations, how do you handle the 
notification ID? You can't store more than a single ID in an event, but such 
nested operations may try to create more than a single notification ID in the 
transactional part.

> CREATE INDEX execute HMS out-of-transaction listener calls inside a 
> transaction
> ---
>
> Key: HIVE-17150
> URL: https://issues.apache.org/jira/browse/HIVE-17150
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 2.3.0
>Reporter: Sergio Peña
>Assignee: Sergio Peña
> Fix For: 3.0.0, 2.4.0
>
> Attachments: HIVE-17150.1.patch, HIVE-17150.2.patch
>
>
> The problem with CREATE INDEX is that it calls a CREATE TABLE operation 
> inside the same CREATE INDEX transaction. During listener calls, there are 
> some listeners that should run in an out-of-transaction context, for 
> instance, Sentry blocks the HMS operation until the DB log notification is 
> processed, but if the transaction has not finished, then the 
> out-of-transaction listener will block forever (or until a read timeout 
> happens).
> A fix would be to add a parameter to the out-of-transaction listener that 
> alerts the listener if HMS is in an active transaction. If so, then it is up to 
> the listener plugin to return immediately and avoid blocking the HMS 
> operation.





[jira] [Updated] (HIVE-17160) Adding kerberos Authorization to the Druid hive integration

2017-07-24 Thread slim bouguerra (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

slim bouguerra updated HIVE-17160:
--
Status: Patch Available  (was: Open)

> Adding kerberos Authorization to the Druid hive integration
> ---
>
> Key: HIVE-17160
> URL: https://issues.apache.org/jira/browse/HIVE-17160
> Project: Hive
>  Issue Type: New Feature
>  Components: Druid integration
>Reporter: slim bouguerra
>Assignee: slim bouguerra
>
> The goal of this feature is to allow Hive to query a secured Druid cluster 
> using Kerberos credentials.





[jira] [Commented] (HIVE-17160) Adding kerberos Authorization to the Druid hive integration

2017-07-24 Thread slim bouguerra (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16099181#comment-16099181
 ] 

slim bouguerra commented on HIVE-17160:
---

Druid supports SPNEGO Kerberos authorization: 
http://druid.io/docs/latest/development/extensions-core/druid-kerberos.html
This PR adds decorated HTTP clients with a Kerberos token and cookie manager.
This will work only when LLAP is enabled, since containers do not have any 
Kerberos credentials.
Users need to set the following configuration on the Hive side:
{code}hive.llap.task.principal{code} and {code}hive.llap.task.keytab.file{code}

> Adding kerberos Authorization to the Druid hive integration
> ---
>
> Key: HIVE-17160
> URL: https://issues.apache.org/jira/browse/HIVE-17160
> Project: Hive
>  Issue Type: New Feature
>  Components: Druid integration
>Reporter: slim bouguerra
>Assignee: slim bouguerra
>
> The goal of this feature is to allow Hive to query a secured Druid cluster 
> using Kerberos credentials.





[jira] [Updated] (HIVE-17150) CREATE INDEX execute HMS out-of-transaction listener calls inside a transaction

2017-07-24 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/HIVE-17150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergio Peña updated HIVE-17150:
---
Fix Version/s: 2.4.0

> CREATE INDEX execute HMS out-of-transaction listener calls inside a 
> transaction
> ---
>
> Key: HIVE-17150
> URL: https://issues.apache.org/jira/browse/HIVE-17150
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 2.3.0
>Reporter: Sergio Peña
>Assignee: Sergio Peña
> Fix For: 3.0.0, 2.4.0
>
> Attachments: HIVE-17150.1.patch, HIVE-17150.2.patch
>
>
> The problem with CREATE INDEX is that it calls a CREATE TABLE operation 
> inside the same CREATE INDEX transaction. During listener calls, there are 
> some listeners that should run in an out-of-transaction context, for 
> instance, Sentry blocks the HMS operation until the DB log notification is 
> processed, but if the transaction has not finished, then the 
> out-of-transaction listener will block forever (or until a read timeout 
> happens).
> A fix would be to add a parameter to the out-of-transaction listener that 
> alerts the listener if HMS is in an active transaction. If so, then it is up to 
> the listener plugin to return immediately and avoid blocking the HMS 
> operation.





[jira] [Updated] (HIVE-17150) CREATE INDEX execute HMS out-of-transaction listener calls inside a transaction

2017-07-24 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/HIVE-17150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergio Peña updated HIVE-17150:
---
   Resolution: Fixed
Fix Version/s: 30
   Status: Resolved  (was: Patch Available)

> CREATE INDEX execute HMS out-of-transaction listener calls inside a 
> transaction
> ---
>
> Key: HIVE-17150
> URL: https://issues.apache.org/jira/browse/HIVE-17150
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 2.3.0
>Reporter: Sergio Peña
>Assignee: Sergio Peña
> Fix For: 30
>
> Attachments: HIVE-17150.1.patch, HIVE-17150.2.patch
>
>
> The problem with CREATE INDEX is that it calls a CREATE TABLE operation 
> inside the same CREATE INDEX transaction. During listener calls, there are 
> some listeners that should run in an out-of-transaction context, for 
> instance, Sentry blocks the HMS operation until the DB log notification is 
> processed, but if the transaction has not finished, then the 
> out-of-transaction listener will block forever (or until a read timeout 
> happens).
> A fix would be to add a parameter to the out-of-transaction listener that 
> alerts the listener if HMS is in an active transaction. If so, then it is up to 
> the listener plugin to return immediately and avoid blocking the HMS 
> operation.





[jira] [Commented] (HIVE-17150) CREATE INDEX execute HMS out-of-transaction listener calls inside a transaction

2017-07-24 Thread JIRA

[ 
https://issues.apache.org/jira/browse/HIVE-17150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16099171#comment-16099171
 ] 

Sergio Peña commented on HIVE-17150:


Thanks [~vihangk1]. I committed to master.

> CREATE INDEX execute HMS out-of-transaction listener calls inside a 
> transaction
> ---
>
> Key: HIVE-17150
> URL: https://issues.apache.org/jira/browse/HIVE-17150
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 2.3.0
>Reporter: Sergio Peña
>Assignee: Sergio Peña
> Fix For: 3.0.0
>
> Attachments: HIVE-17150.1.patch, HIVE-17150.2.patch
>
>
> The problem with CREATE INDEX is that it calls a CREATE TABLE operation 
> inside the same CREATE INDEX transaction. During listener calls, there are 
> some listeners that should run in an out-of-transaction context, for 
> instance, Sentry blocks the HMS operation until the DB log notification is 
> processed, but if the transaction has not finished, then the 
> out-of-transaction listener will block forever (or until a read timeout 
> happens).
> A fix would be to add a parameter to the out-of-transaction listener that 
> alerts the listener if HMS is in an active transaction. If so, then it is up to 
> the listener plugin to return immediately and avoid blocking the HMS 
> operation.





[jira] [Updated] (HIVE-17150) CREATE INDEX execute HMS out-of-transaction listener calls inside a transaction

2017-07-24 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/HIVE-17150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergio Peña updated HIVE-17150:
---
Fix Version/s: (was: 30)
   3.0.0

> CREATE INDEX execute HMS out-of-transaction listener calls inside a 
> transaction
> ---
>
> Key: HIVE-17150
> URL: https://issues.apache.org/jira/browse/HIVE-17150
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 2.3.0
>Reporter: Sergio Peña
>Assignee: Sergio Peña
> Fix For: 3.0.0
>
> Attachments: HIVE-17150.1.patch, HIVE-17150.2.patch
>
>
> The problem with CREATE INDEX is that it calls a CREATE TABLE operation 
> inside the same CREATE INDEX transaction. During listener calls, there are 
> some listeners that should run in an out-of-transaction context, for 
> instance, Sentry blocks the HMS operation until the DB log notification is 
> processed, but if the transaction has not finished, then the 
> out-of-transaction listener will block forever (or until a read timeout 
> happens).
> A fix would be to add a parameter to the out-of-transaction listener that 
> alerts the listener if HMS is in an active transaction. If so, then it is up to 
> the listener plugin to return immediately and avoid blocking the HMS 
> operation.





[jira] [Updated] (HIVE-16640) The ASF Headers have some errors in some class

2017-07-24 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-16640:
---
Fix Version/s: 2.3.0

> The ASF Headers have some errors in some class
> --
>
> Key: HIVE-16640
> URL: https://issues.apache.org/jira/browse/HIVE-16640
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: ZhangBing Lin
>Assignee: ZhangBing Lin
>Priority: Minor
> Fix For: 2.3.0, 3.0.0
>
> Attachments: HIVE-16640.1.patch
>
>
> I found that some classes have the license header placed in an incorrect 
> location, and some classes are missing the license header.





[jira] [Updated] (HIVE-16965) SMB join may produce incorrect results

2017-07-24 Thread Deepak Jaiswal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deepak Jaiswal updated HIVE-16965:
--
Attachment: HIVE-16965.3.patch

Implemented comments from [~sershe] and [~gopalv]

> SMB join may produce incorrect results
> --
>
> Key: HIVE-16965
> URL: https://issues.apache.org/jira/browse/HIVE-16965
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Deepak Jaiswal
> Attachments: HIVE-16965.1.patch, HIVE-16965.2.patch, 
> HIVE-16965.3.patch
>
>
> Running the following on MiniTez
> {noformat}
> set hive.mapred.mode=nonstrict;
> SET hive.vectorized.execution.enabled=true;
> SET hive.exec.orc.default.buffer.size=32768;
> SET hive.exec.orc.default.row.index.stride=1000;
> SET hive.optimize.index.filter=true;
> set hive.fetch.task.conversion=none;
> set hive.exec.dynamic.partition.mode=nonstrict;
> DROP TABLE orc_a;
> DROP TABLE orc_b;
> CREATE TABLE orc_a (id bigint, cdouble double) partitioned by (y int, q 
> smallint)
>   CLUSTERED BY (id) SORTED BY (id) INTO 2 BUCKETS stored as orc;
> CREATE TABLE orc_b (id bigint, cfloat float)
>   CLUSTERED BY (id) SORTED BY (id) INTO 2 BUCKETS stored as orc;
> insert into table orc_a partition (y=2000, q)
> select cbigint, cdouble, csmallint % 10 from alltypesorc
>   where cbigint is not null and csmallint > 0 order by cbigint asc;
> insert into table orc_a partition (y=2001, q)
> select cbigint, cdouble, csmallint % 10 from alltypesorc
>   where cbigint is not null and csmallint > 0 order by cbigint asc;
> insert into table orc_b 
> select cbigint, cfloat from alltypesorc
>   where cbigint is not null and csmallint > 0 order by cbigint asc limit 200;
> set hive.cbo.enable=false;
> select y,q,count(*) from orc_a a join orc_b b on a.id=b.id group by y,q;
> set hive.enforce.sortmergebucketmapjoin=false;
> set hive.optimize.bucketmapjoin=true;
> set hive.optimize.bucketmapjoin.sortedmerge=true;
> set hive.auto.convert.sortmerge.join=true;
> set hive.auto.convert.join=true;
> set hive.auto.convert.join.noconditionaltask.size=10;
> explain
> select y,q,count(*) from orc_a a join orc_b b on a.id=b.id group by y,q;
> select y,q,count(*) from orc_a a join orc_b b on a.id=b.id group by y,q;
> DROP TABLE orc_a;
> DROP TABLE orc_b;
> {noformat}
> Produces different results for the two selects. The SMB one looks incorrect. 
> cc [~djaiswal] [~hagleitn]



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17019) Add support to download debugging information as an archive.

2017-07-24 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16099157#comment-16099157
 ] 

Siddharth Seth commented on HIVE-17019:
---

Re-looked at the patch. Mostly looks good. Some comments and questions.
- How is the context set up for LogDownloadServlet, 
e.g. CONF_LOG_DOWNLODER_NUM_EXECUTORS? The config should likely be set up in 
HiveConf in some way.
- init for the servlet will happen once at startup? So if there are multiple 
requests to download, and the limit is hit, all webserver threads will block? 
Should we just return an error if there are too many parallel downloads, so that 
other parts of the UI continue to be functional?
- In terms of security, this becomes interesting. It essentially says that 
the feature will only work if authentication is enabled on secure clusters.
- Timeout for the downloads as a separate jira?
- Are any credentials required on the HttpClient created to download artifacts 
from various end points?
- For Constants like TIMELINE_PATH_PREFIX - any chance YARN has a helper 
method? Otherwise we should file a jira to ask yarn to expose such utilities.
- Both dagId and queryId cannot be specified at the same time?
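
The suggestion above about returning an error instead of blocking could look roughly like the sketch below. It is only an illustration: DownloadLimiter, tryStart, and finish are hypothetical names that do not come from the patch, and the sketch assumes the servlet would turn a rejected request into an HTTP error response.

```java
import java.util.concurrent.Semaphore;

// Hypothetical helper, not taken from the HIVE-17019 patch.
final class DownloadLimiter {
    private final Semaphore permits;

    DownloadLimiter(int maxParallelDownloads) {
        this.permits = new Semaphore(maxParallelDownloads);
    }

    // Returns immediately: true if a download slot was taken, false if the limit is hit.
    boolean tryStart() {
        return permits.tryAcquire();
    }

    // Must be called once for every successful tryStart(), e.g. in a finally block.
    void finish() {
        permits.release();
    }
}
```

With something like this, a request handler would call tryStart() first and send an error response when it returns false, so webserver threads are never parked waiting for a slot.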


> Add support to download debugging information as an archive.
> 
>
> Key: HIVE-17019
> URL: https://issues.apache.org/jira/browse/HIVE-17019
> Project: Hive
>  Issue Type: Bug
>Reporter: Harish Jaiprakash
>Assignee: Harish Jaiprakash
> Attachments: HIVE-17019.01.patch, HIVE-17019.02.patch, 
> HIVE-17019.03.patch
>
>
> Given a queryId or dagId, get all information related to it: like, tez am, 
> task logs, hive ats data, tez ats data, slider am status, etc. Package it 
> into an archive.





[jira] [Commented] (HIVE-17150) CREATE INDEX execute HMS out-of-transaction listener calls inside a transaction

2017-07-24 Thread Vihang Karajgaonkar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16099145#comment-16099145
 ] 

Vihang Karajgaonkar commented on HIVE-17150:


Hi [~spena] the change looks good to me. The failures are unrelated. +1

> CREATE INDEX execute HMS out-of-transaction listener calls inside a 
> transaction
> ---
>
> Key: HIVE-17150
> URL: https://issues.apache.org/jira/browse/HIVE-17150
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 2.3.0
>Reporter: Sergio Peña
>Assignee: Sergio Peña
> Attachments: HIVE-17150.1.patch, HIVE-17150.2.patch
>
>
> The problem with CREATE INDEX is that it calls a CREATE TABLE operation 
> inside the same CREATE INDEX transaction. During listener calls, there are 
> some listeners that should run in an out-of-transaction context, for 
> instance, Sentry blocks the HMS operation until the DB log notification is 
> processed, but if the transaction has not finished, then the 
> out-of-transaction listener will block forever (or until a read timeout 
> happens).
> A fix would be to add a parameter to the out-of-transaction listener that 
> alerts the listener if HMS is in an active transaction. If so, then it is up to 
> the listener plugin to return immediately and avoid blocking the HMS 
> operation.
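
A minimal sketch of the proposed fix follows. The interface and class names are hypothetical, and the real HMS listener API differs; only the idea of passing an "in active transaction" flag comes from the description above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical listener contract: the caller says whether a transaction is still open.
interface OutOfTxnListener {
    void onEvent(String event, boolean inActiveTransaction);
}

// Stand-in for a plugin that normally blocks until a notification is processed.
final class BlockingListenerSketch implements OutOfTxnListener {
    // Events the listener would have waited on (recorded instead of actually blocking).
    final List<String> waitedFor = new ArrayList<>();

    @Override
    public void onEvent(String event, boolean inActiveTransaction) {
        if (inActiveTransaction) {
            // The notification cannot be visible before commit, so waiting would never end.
            return;
        }
        waitedFor.add(event);
    }
}
```

In this sketch the nested CREATE TABLE call inside CREATE INDEX would pass true and return at once, while the outer call made after commit would pass false and wait as before.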





[jira] [Commented] (HIVE-17159) Make metastore a separately releasable module

2017-07-24 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16099143#comment-16099143
 ] 

Alan Gates commented on HIVE-17159:
---

Thanks!

As a first step I will put together a 1 pager with a proposed plan so people 
can give feedback.  Hopefully I'll have that done today or tomorrow.

Then I agree subtasks of this JIRA are the way to go.

> Make metastore a separately releasable module
> -
>
> Key: HIVE-17159
> URL: https://issues.apache.org/jira/browse/HIVE-17159
> Project: Hive
>  Issue Type: New Feature
>  Components: Metastore
>Reporter: Alan Gates
>Assignee: Alan Gates
>
> As proposed in this 
> [thread|https://lists.apache.org/thread.html/5e75f45d60f0b819510814a126cfd3809dd24b1c7035a1c8c41b0c5c@%3Cdev.hive.apache.org%3E]
>  on the dev list, we should move the metastore into a separately releasable 
> module.  This is a POC of and potential first step towards separating out the 
> metastore as a separate Apache TLP.





[jira] [Commented] (HIVE-17159) Make metastore a separately releasable module

2017-07-24 Thread Vihang Karajgaonkar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16099138#comment-16099138
 ] 

Vihang Karajgaonkar commented on HIVE-17159:


Hi [~alangates] Thanks for creating this. I can help working on this as well. 
Are you planning to create sub-tasks to break up the work?

> Make metastore a separately releasable module
> -
>
> Key: HIVE-17159
> URL: https://issues.apache.org/jira/browse/HIVE-17159
> Project: Hive
>  Issue Type: New Feature
>  Components: Metastore
>Reporter: Alan Gates
>Assignee: Alan Gates
>
> As proposed in this 
> [thread|https://lists.apache.org/thread.html/5e75f45d60f0b819510814a126cfd3809dd24b1c7035a1c8c41b0c5c@%3Cdev.hive.apache.org%3E]
>  on the dev list, we should move the metastore into a separately releasable 
> module.  This is a POC of and potential first step towards separating out the 
> metastore as a separate Apache TLP.





[jira] [Commented] (HIVE-16965) SMB join may produce incorrect results

2017-07-24 Thread Deepak Jaiswal (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16099115#comment-16099115
 ] 

Deepak Jaiswal commented on HIVE-16965:
---

[~sershe] Thanks for the comments.
Somehow lost the assert while making the code pretty. Applying all your 
comments in a patch coming in shortly.

> SMB join may produce incorrect results
> --
>
> Key: HIVE-16965
> URL: https://issues.apache.org/jira/browse/HIVE-16965
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Deepak Jaiswal
> Attachments: HIVE-16965.1.patch, HIVE-16965.2.patch
>
>
> Running the following on MiniTez
> {noformat}
> set hive.mapred.mode=nonstrict;
> SET hive.vectorized.execution.enabled=true;
> SET hive.exec.orc.default.buffer.size=32768;
> SET hive.exec.orc.default.row.index.stride=1000;
> SET hive.optimize.index.filter=true;
> set hive.fetch.task.conversion=none;
> set hive.exec.dynamic.partition.mode=nonstrict;
> DROP TABLE orc_a;
> DROP TABLE orc_b;
> CREATE TABLE orc_a (id bigint, cdouble double) partitioned by (y int, q 
> smallint)
>   CLUSTERED BY (id) SORTED BY (id) INTO 2 BUCKETS stored as orc;
> CREATE TABLE orc_b (id bigint, cfloat float)
>   CLUSTERED BY (id) SORTED BY (id) INTO 2 BUCKETS stored as orc;
> insert into table orc_a partition (y=2000, q)
> select cbigint, cdouble, csmallint % 10 from alltypesorc
>   where cbigint is not null and csmallint > 0 order by cbigint asc;
> insert into table orc_a partition (y=2001, q)
> select cbigint, cdouble, csmallint % 10 from alltypesorc
>   where cbigint is not null and csmallint > 0 order by cbigint asc;
> insert into table orc_b 
> select cbigint, cfloat from alltypesorc
>   where cbigint is not null and csmallint > 0 order by cbigint asc limit 200;
> set hive.cbo.enable=false;
> select y,q,count(*) from orc_a a join orc_b b on a.id=b.id group by y,q;
> set hive.enforce.sortmergebucketmapjoin=false;
> set hive.optimize.bucketmapjoin=true;
> set hive.optimize.bucketmapjoin.sortedmerge=true;
> set hive.auto.convert.sortmerge.join=true;
> set hive.auto.convert.join=true;
> set hive.auto.convert.join.noconditionaltask.size=10;
> explain
> select y,q,count(*) from orc_a a join orc_b b on a.id=b.id group by y,q;
> select y,q,count(*) from orc_a a join orc_b b on a.id=b.id group by y,q;
> DROP TABLE orc_a;
> DROP TABLE orc_b;
> {noformat}
> Produces different results for the two selects. The SMB one looks incorrect. 
> cc [~djaiswal] [~hagleitn]





[jira] [Assigned] (HIVE-17160) Adding kerberos Authorization to the Druid hive integration

2017-07-24 Thread slim bouguerra (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

slim bouguerra reassigned HIVE-17160:
-

Assignee: slim bouguerra

> Adding kerberos Authorization to the Druid hive integration
> ---
>
> Key: HIVE-17160
> URL: https://issues.apache.org/jira/browse/HIVE-17160
> Project: Hive
>  Issue Type: New Feature
>  Components: Druid integration
>Reporter: slim bouguerra
>Assignee: slim bouguerra
>
> The goal of this feature is to allow Hive to query a secured Druid cluster 
> using Kerberos credentials.





[jira] [Commented] (HIVE-17132) Add InterfaceAudience and InterfaceStability annotations for UDF APIs

2017-07-24 Thread Zoltan Haindrich (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16099099#comment-16099099
 ] 

Zoltan Haindrich commented on HIVE-17132:
-

I always felt that {{GenericUDF}} and {{UDF}} are somewhat the same, but 
they don't share a common parent.

It seems to me that every {{UDF}} is repackaged into a generic one via 
{{GenericUDFBridge}}; I assume this is for evolutionary reasons. I 
think it would be better to leave it behind and try to remove the old {{UDF}} 
later.

> Add InterfaceAudience and InterfaceStability annotations for UDF APIs
> -
>
> Key: HIVE-17132
> URL: https://issues.apache.org/jira/browse/HIVE-17132
> Project: Hive
>  Issue Type: Sub-task
>  Components: UDF
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
> Attachments: HIVE-17132.1.patch
>
>
> Add InterfaceAudience and InterfaceStability annotations for UDF APIs. UDFs 
> are a useful plugin point for Hive users, and there are a number of external 
> UDF libraries, such as hivemall.





[jira] [Commented] (HIVE-16965) SMB join may produce incorrect results

2017-07-24 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16099093#comment-16099093
 ] 

Hive QA commented on HIVE-16965:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12878673/HIVE-16965.2.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 10 failed/errored test(s), 11093 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] (batchId=144)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[smb_join1] (batchId=157)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[subquery_scalar] (batchId=153)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_dynamic_partition_pruning] (batchId=168)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning] (batchId=168)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] (batchId=235)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query23] (batchId=235)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema (batchId=179)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema (batchId=179)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation (batchId=179)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6123/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6123/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6123/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 10 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12878673 - PreCommit-HIVE-Build

> SMB join may produce incorrect results
> --
>
> Key: HIVE-16965
> URL: https://issues.apache.org/jira/browse/HIVE-16965
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Deepak Jaiswal
> Attachments: HIVE-16965.1.patch, HIVE-16965.2.patch
>
>
> Running the following on MiniTez
> {noformat}
> set hive.mapred.mode=nonstrict;
> SET hive.vectorized.execution.enabled=true;
> SET hive.exec.orc.default.buffer.size=32768;
> SET hive.exec.orc.default.row.index.stride=1000;
> SET hive.optimize.index.filter=true;
> set hive.fetch.task.conversion=none;
> set hive.exec.dynamic.partition.mode=nonstrict;
> DROP TABLE orc_a;
> DROP TABLE orc_b;
> CREATE TABLE orc_a (id bigint, cdouble double) partitioned by (y int, q 
> smallint)
>   CLUSTERED BY (id) SORTED BY (id) INTO 2 BUCKETS stored as orc;
> CREATE TABLE orc_b (id bigint, cfloat float)
>   CLUSTERED BY (id) SORTED BY (id) INTO 2 BUCKETS stored as orc;
> insert into table orc_a partition (y=2000, q)
> select cbigint, cdouble, csmallint % 10 from alltypesorc
>   where cbigint is not null and csmallint > 0 order by cbigint asc;
> insert into table orc_a partition (y=2001, q)
> select cbigint, cdouble, csmallint % 10 from alltypesorc
>   where cbigint is not null and csmallint > 0 order by cbigint asc;
> insert into table orc_b 
> select cbigint, cfloat from alltypesorc
>   where cbigint is not null and csmallint > 0 order by cbigint asc limit 200;
> set hive.cbo.enable=false;
> select y,q,count(*) from orc_a a join orc_b b on a.id=b.id group by y,q;
> set hive.enforce.sortmergebucketmapjoin=false;
> set hive.optimize.bucketmapjoin=true;
> set hive.optimize.bucketmapjoin.sortedmerge=true;
> set hive.auto.convert.sortmerge.join=true;
> set hive.auto.convert.join=true;
> set hive.auto.convert.join.noconditionaltask.size=10;
> explain
> select y,q,count(*) from orc_a a join orc_b b on a.id=b.id group by y,q;
> select y,q,count(*) from orc_a a join orc_b b on a.id=b.id group by y,q;
> DROP TABLE orc_a;
> DROP TABLE orc_b;
> {noformat}
> Produces different results for the two selects. The SMB one looks incorrect. 
> cc [~djaiswal] [~hagleitn]





[jira] [Commented] (HIVE-17152) Improve security of random generator for HS2 cookies

2017-07-24 Thread Tao Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16099081#comment-16099081
 ] 

Tao Li commented on HIVE-17152:
---

[~susanths], [~vgumashta] Can you please take a look at this change?

> Improve security of random generator for HS2 cookies
> 
>
> Key: HIVE-17152
> URL: https://issues.apache.org/jira/browse/HIVE-17152
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: Tao Li
>Assignee: Tao Li
> Attachments: HIVE-17152.1.patch
>
>
> The random number generated is used as a secret to append to a sequence and 
> SHA to implement a CookieSigner. If this is attackable, then it's possible 
> for an attacker to sign a cookie as if we had. We should fix this and use 
> SecureRandom as a stronger random function.
> HTTPAuthUtils has a similar issue. If that is attackable, an attacker might 
> be able to create a similar cookie. Paired with the above issue with the 
> CookieSigner, it could reasonably spoof a HS2 cookie.
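
To illustrate the direction described above, here is a hedged sketch of a signer whose secret comes from SecureRandom. It is not the HS2 CookieSigner code: the class name, the "&s=" separator, the 32-byte secret size, and the choice of SHA-256 are all assumptions made for the example.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.security.SecureRandom;
import java.util.Base64;

// Hypothetical signer; names and constants are illustrative only.
final class CookieSignerSketch {
    private static final String SEPARATOR = "&s=";
    private final byte[] secret;

    CookieSignerSketch(byte[] secret) {
        this.secret = secret.clone();
    }

    // Draws the secret from a cryptographically strong generator.
    static byte[] newSecret() {
        byte[] bytes = new byte[32];
        new SecureRandom().nextBytes(bytes);
        return bytes;
    }

    String sign(String value) {
        return value + SEPARATOR + signature(value);
    }

    // Returns the original value, or throws if the signature does not match.
    String verifyAndExtract(String signed) {
        int idx = signed.lastIndexOf(SEPARATOR);
        if (idx < 0) {
            throw new IllegalArgumentException("cookie is not signed");
        }
        String value = signed.substring(0, idx);
        byte[] given = signed.substring(idx + SEPARATOR.length()).getBytes(StandardCharsets.UTF_8);
        byte[] expected = signature(value).getBytes(StandardCharsets.UTF_8);
        // Constant-time comparison of the two signatures.
        if (!MessageDigest.isEqual(given, expected)) {
            throw new IllegalArgumentException("cookie signature mismatch");
        }
        return value;
    }

    // Hash of the value with the secret appended, Base64-encoded.
    private String signature(String value) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            md.update(value.getBytes(StandardCharsets.UTF_8));
            md.update(secret);
            return Base64.getEncoder().encodeToString(md.digest());
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The sketch keeps the "append the secret and hash" shape from the description; a keyed MAC such as HMAC would be a different technique and is not shown here.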





[jira] [Commented] (HIVE-17039) Implement optimization rewritings that rely on database SQL constraints

2017-07-24 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16099080#comment-16099080
 ] 

Sergey Shelukhin commented on HIVE-17039:
-

Hmm... I thought the Hive SQL constraints are not enforced.

> Implement optimization rewritings that rely on database SQL constraints
> ---
>
> Key: HIVE-17039
> URL: https://issues.apache.org/jira/browse/HIVE-17039
> Project: Hive
>  Issue Type: New Feature
>  Components: Logical Optimizer
>Affects Versions: 3.0.0
>Reporter: Jesus Camacho Rodriguez
>
> Hive already has support to declare multiple SQL constraints (PRIMARY KEY, 
> FOREIGN KEY, UNIQUE, and NOT NULL). Although these constraints cannot be 
> currently enforced on the data, they can be made available to the optimizer 
> by using the 'RELY' keyword.
> This ticket is an umbrella for all the rewriting optimizations based on SQL 
> constraints that we will be including in Hive.





[jira] [Assigned] (HIVE-17159) Make metastore a separately releasable module

2017-07-24 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates reassigned HIVE-17159:
-


> Make metastore a separately releasable module
> -
>
> Key: HIVE-17159
> URL: https://issues.apache.org/jira/browse/HIVE-17159
> Project: Hive
>  Issue Type: New Feature
>  Components: Metastore
>Reporter: Alan Gates
>Assignee: Alan Gates
>
> As proposed in this 
> [thread|https://lists.apache.org/thread.html/5e75f45d60f0b819510814a126cfd3809dd24b1c7035a1c8c41b0c5c@%3Cdev.hive.apache.org%3E]
>  on the dev list, we should move the metastore into a separately releasable 
> module.  This is a POC of and potential first step towards separating out the 
> metastore as a separate Apache TLP.





[jira] [Commented] (HIVE-17060) unix_timestamp(void) is deprecated message is printed twice

2017-07-24 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16099073#comment-16099073
 ] 

Sergey Shelukhin commented on HIVE-17060:
-

Well, we can remove the overload and the warning in 3.0; not sure about 2.4. 
There isn't really a good way to de-dup it if the UDF is being created twice.

> unix_timestamp(void) is deprecated message is printed twice
> ---
>
> Key: HIVE-17060
> URL: https://issues.apache.org/jira/browse/HIVE-17060
> Project: Hive
>  Issue Type: Bug
>  Components: CBO, UDF
>Affects Versions: 1.3.0, 2.0.0
>Reporter: Peter Vary
>Priority: Trivial
>
> HIVE-10728 added a warning message when unix_timestamp is used without 
> parameters.
> When CBO is used, this message is printed twice.
> Minimal steps to reproduce:
> {code}
> set hive.cbo.enable = true;
> create table timestamp_test(s string);
> select unix_timestamp() from timestamp_test;
> {code}
> This duplication is even enforced by the golden files in the commit :)
> https://github.com/apache/hive/commit/24d3307be79d35d3a34c49014dfdd597112f9106#diff-bf6c9f3549aaeb2b40b8b1eab9254c4aR73





[jira] [Commented] (HIVE-17117) Metalisteners are not notified when threadlocal metaconf is cleanup

2017-07-24 Thread Chao Sun (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16099064#comment-16099064
 ] 

Chao Sun commented on HIVE-17117:
-

Test failures are not related. Committed this to master. Thanks [~pgolash] for 
the patch, and [~mohitsabharwal] and [~zshao] for the review.

> Metalisteners are not notified when threadlocal metaconf is cleanup 
> 
>
> Key: HIVE-17117
> URL: https://issues.apache.org/jira/browse/HIVE-17117
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
> Environment: Tested on master branch (Applicable for downlevel 
> versions as well)
>Reporter: PRASHANT GOLASH
>Assignee: PRASHANT GOLASH
>Priority: Minor
> Fix For: 3.0.0
>
> Attachments: HIVE-17117.1.patch, HIVE-17117.patch
>
>
> Meta listeners are not notified of meta-conf cleanup. This could potentially 
> leave stale values on listener objects. For example:
> Request 1
> a. HS2 -> HMS : HMSHandler#setMetaConf
>  MetaListeners are notified of the ConfigChangeEvent.
> b. HS2 -> HMS : HMSHandler#shutdown / HiveMetaStore#deleteContext (if 
> shutdown is not invoked)
>  MetaConf is cleaned up in HiveMetaStore#cleanupRawStore, but meta 
> listeners are not notified.
> Request 2
> c. HS2 -> HMS : AlterPartition
>  MetaListeners are notified of AlterPartitionEvent. If any listener 
> depends on the meta conf value, it will still hold the stale value 
> from Request 1 and could misbehave.
> The correct behavior should be to notify meta listeners on cleanup as well.





[jira] [Updated] (HIVE-17117) Metalisteners are not notified when threadlocal metaconf is cleanup

2017-07-24 Thread Chao Sun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated HIVE-17117:

   Resolution: Fixed
Fix Version/s: 3.0.0
   Status: Resolved  (was: Patch Available)

> Metalisteners are not notified when threadlocal metaconf is cleanup 
> 
>
> Key: HIVE-17117
> URL: https://issues.apache.org/jira/browse/HIVE-17117
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
> Environment: Tested on master branch (Applicable for downlevel 
> versions as well)
>Reporter: PRASHANT GOLASH
>Assignee: PRASHANT GOLASH
>Priority: Minor
> Fix For: 3.0.0
>
> Attachments: HIVE-17117.1.patch, HIVE-17117.patch
>
>
> Meta listeners are not notified of meta-conf cleanup. This could potentially 
> leave stale values on listener objects. For example:
> Request 1
> a. HS2 -> HMS : HMSHandler#setMetaConf
>  MetaListeners are notified of the ConfigChangeEvent.
> b. HS2 -> HMS : HMSHandler#shutdown / HiveMetaStore#deleteContext (if 
> shutdown is not invoked)
>  MetaConf is cleaned up in HiveMetaStore#cleanupRawStore, but meta 
> listeners are not notified.
> Request 2
> c. HS2 -> HMS : AlterPartition
>  MetaListeners are notified of AlterPartitionEvent. If any listener 
> depends on the meta conf value, it will still hold the stale value 
> from Request 1 and could misbehave.
> The correct behavior should be to notify meta listeners on cleanup as well.





[jira] [Commented] (HIVE-16640) The ASF Headers have some errors in some class

2017-07-24 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16099062#comment-16099062
 ] 

Lefty Leverenz commented on HIVE-16640:
---

Nudging [~pxiong] to add 2.3.0 to the fix versions.  (See last comment.)

> The ASF Headers have some errors in some class
> --
>
> Key: HIVE-16640
> URL: https://issues.apache.org/jira/browse/HIVE-16640
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: ZhangBing Lin
>Assignee: ZhangBing Lin
>Priority: Minor
> Fix For: 3.0.0
>
> Attachments: HIVE-16640.1.patch
>
>
> I found that some classes have the license header placed in an incorrect 
> location, and some classes are missing the license.





[jira] [Commented] (HIVE-17117) Metalisteners are not notified when threadlocal metaconf is cleanup

2017-07-24 Thread Zheng Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16099059#comment-16099059
 ] 

Zheng Shao commented on HIVE-17117:
---

+1

> Metalisteners are not notified when threadlocal metaconf is cleanup 
> 
>
> Key: HIVE-17117
> URL: https://issues.apache.org/jira/browse/HIVE-17117
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
> Environment: Tested on master branch (Applicable for downlevel 
> versions as well)
>Reporter: PRASHANT GOLASH
>Assignee: PRASHANT GOLASH
>Priority: Minor
> Attachments: HIVE-17117.1.patch, HIVE-17117.patch
>
>
> Meta listeners are not notified of meta-conf cleanup. This could potentially 
> leave stale values on listener objects. For example:
> Request 1
> a. HS2 -> HMS : HMSHandler#setMetaConf
>  MetaListeners are notified of the ConfigChangeEvent.
> b. HS2 -> HMS : HMSHandler#shutdown / HiveMetaStore#deleteContext (if 
> shutdown is not invoked)
>  MetaConf is cleaned up in HiveMetaStore#cleanupRawStore, but meta 
> listeners are not notified.
> Request 2
> c. HS2 -> HMS : AlterPartition
>  MetaListeners are notified of AlterPartitionEvent. If any listener 
> depends on the meta conf value, it will still hold the stale value 
> from Request 1 and could misbehave.
> The correct behavior should be to notify meta listeners on cleanup as well.





[jira] [Commented] (HIVE-17126) Hive Metastore is incompatible with MariaDB 10.x

2017-07-24 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16099060#comment-16099060
 ] 

Sergey Shelukhin commented on HIVE-17126:
-

Sounds like a DataNucleus issue to me (assuming it supports MariaDB), or at 
least the lack of functionality (assuming it doesn't). You might want to find 
out which one :)

Also, there's an ORM-free, SQL-based implementation of the entire metastore in 
the works in HIVE-14870. That is Oracle-specific, but some (or at least I ;) ) 
believe it should be database agnostic. It might also be of use, if there isn't 
an easy solution with DataNucleus.
 cc [~cdrome]

> Hive Metastore is incompatible with MariaDB 10.x
> 
>
> Key: HIVE-17126
> URL: https://issues.apache.org/jira/browse/HIVE-17126
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 1.2.0, 1.1.0, 2.0.0
>Reporter: Eric Yang
>
> MariaDB 10.x is commonly used for cheap RDBMS high availability.  Hive's usage 
> of DataNucleus currently prevents Hive Metastore from using MariaDB 10.x as a 
> highly available metastore. DataNucleus generates SQL statements that are not 
> parsable by MariaDB 10.x when dropping a Hive table or database schema.  
> Even without a MariaDB HA setup, the SQL statement problem exists for metastore 
> interaction with MariaDB 10.x.





[jira] [Commented] (HIVE-17077) Hive should raise StringIndexOutOfBoundsException when LPAD/RPAD len character's value is negative number

2017-07-24 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16099054#comment-16099054
 ] 

Ashutosh Chauhan commented on HIVE-17077:
-

The general principle in such cases is to throw an exception at query compile 
time if we can detect an illegal argument in the UDF, but to return null at 
runtime, since there it likely depends on the data and we don't fail a query for 
malformed rows.
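
As a rough illustration of the "return null at runtime" half of that principle, the sketch below pads a string and yields null for a negative length. It is not Hive's LPAD implementation; PadSketch is a hypothetical class, and returning null for an empty pad string is a choice made only for this example.

```java
// Hypothetical helper, not Hive's GenericUDF code.
final class PadSketch {
    // Left-pads s to exactly len characters, truncating if s is longer.
    // Returns null for null inputs, a negative len, or an unusable pad string.
    static String lpad(String s, int len, String pad) {
        if (s == null || pad == null || len < 0) {
            return null;
        }
        if (len <= s.length()) {
            return s.substring(0, len);
        }
        if (pad.isEmpty()) {
            // Cannot reach the requested length; null is this sketch's choice.
            return null;
        }
        int fill = len - s.length();
        StringBuilder sb = new StringBuilder(len);
        for (int i = 0; i < fill; i++) {
            sb.append(pad.charAt(i % pad.length()));
        }
        return sb.append(s).toString();
    }
}
```

A compile-time check for a constant negative argument would be a separate step, done before any row is processed, and is not shown here.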

> Hive should raise StringIndexOutOfBoundsException when LPAD/RPAD len 
> character's value is negative number
> -
>
> Key: HIVE-17077
> URL: https://issues.apache.org/jira/browse/HIVE-17077
> Project: Hive
>  Issue Type: Bug
>Reporter: Lingang Deng
>Assignee: Lingang Deng
>Priority: Minor
>
> lpad (and rpad) throws an exception when the second argument is a negative 
> number, as follows:
> {code:java}
> hive> select lpad("hello", -1 ,"h");
> FAILED: StringIndexOutOfBoundsException String index out of range: -1
> hive> select rpad("hello", -1 ,"h");
> FAILED: StringIndexOutOfBoundsException String index out of range: -1
> {code}
> Maybe we should return a friendlier result, as MySQL does.
> {code:java}
> mysql> select lpad("hello", -1 ,"h");
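> +------------------------+
> | lpad("hello", -1 ,"h") |
> +------------------------+
> | NULL                   |
> +------------------------+
> 1 row in set (0.00 sec)
> mysql> select rpad("hello", -1 ,"h");
> +------------------------+
> | rpad("hello", -1 ,"h") |
> +------------------------+
> | NULL                   |
> +------------------------+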
> 1 row in set (0.00 sec)
> {code}





[jira] [Updated] (HIVE-16077) UPDATE/DELETE fails with numBuckets > numReducers

2017-07-24 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-16077:
--
Description: 
don't think we have such tests for Acid path
check if they exist for non-acid path

way to record expected files on disk in ptest/qfile
https://github.com/apache/hive/blob/master/ql/src/test/queries/clientpositive/orc_merge3.q#L25
dfs -ls ${hiveconf:hive.metastore.warehouse.dir}/orcfile_merge3b/;

  was:
don't think we have such tests for Acid path
check if they exist for non-acid path


> UPDATE/DELETE fails with numBuckets > numReducers
> -
>
> Key: HIVE-16077
> URL: https://issues.apache.org/jira/browse/HIVE-16077
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 1.1.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Critical
> Attachments: HIVE-16077.01.patch, HIVE-16077.02.patch, 
> HIVE-16077.03.patch
>
>
> don't think we have such tests for Acid path
> check if they exist for non-acid path
> way to record expected files on disk in ptest/qfile
> https://github.com/apache/hive/blob/master/ql/src/test/queries/clientpositive/orc_merge3.q#L25
> dfs -ls ${hiveconf:hive.metastore.warehouse.dir}/orcfile_merge3b/;





[jira] [Updated] (HIVE-17088) HS2 WebUI throws a NullPointerException when opened

2017-07-24 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/HIVE-17088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergio Peña updated HIVE-17088:
---
Attachment: HIVE-17088.addendum1.patch

[~aihuaxu] Here's another patch to fix this issue.

> HS2 WebUI throws a NullPointerException when opened
> ---
>
> Key: HIVE-17088
> URL: https://issues.apache.org/jira/browse/HIVE-17088
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 3.0.0
>Reporter: Sergio Peña
>Assignee: Sergio Peña
> Fix For: 3.0.0
>
> Attachments: HIVE-17088.1.patch, HIVE-17088.addendum1.patch
>
>
> After bumping the Jetty version to 9.3 and excluding several other 
> dependencies in HIVE-16049, the HS2 web UI stopped working and now throws an 
> NPE.
> {noformat}
> HTTP ERROR 500
> Problem accessing /hiveserver2.jsp. Reason:
> Server Error
> Caused by:
> java.lang.NullPointerException
>   at org.apache.hive.generated.hiveserver2.hiveserver2_jsp._jspService(hiveserver2_jsp.java:181)
>   at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:70)
>   at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
>   at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:840)
>   at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:584)
>   at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
>   at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
>   at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)
>   at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)
>   at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512)
>   at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
>   at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)
>   at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
>   at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)
>   at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
>   at org.eclipse.jetty.server.Server.handle(Server.java:534)
>   at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:320)
>   at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:251)
>   at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:283)
>   at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:108)
>   at org.eclipse.jetty.io.ssl.SslConnection.onFillable(SslConnection.java:240)
>   at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:283)
>   at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:108)
>   at org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)
>   at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303)
>   at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148)
>   at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136)
>   at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671)
>   at 
> org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589)
>   at java.lang.Thread.run(Thread.java:748)
> Powered by Jetty:// 9.3.19.v20170502
> {noformat}





[jira] [Updated] (HIVE-17088) HS2 WebUI throws a NullPointerException when opened

2017-07-24 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/HIVE-17088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergio Peña updated HIVE-17088:
---
Status: Patch Available  (was: Reopened)

> HS2 WebUI throws a NullPointerException when opened
> ---
>
> Key: HIVE-17088
> URL: https://issues.apache.org/jira/browse/HIVE-17088
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 3.0.0
>Reporter: Sergio Peña
>Assignee: Sergio Peña
> Fix For: 3.0.0
>
> Attachments: HIVE-17088.1.patch, HIVE-17088.addendum1.patch
>
>
> After bumping the Jetty version to 9.3 and excluding several other 
> dependencies in HIVE-16049, the HS2 WebUI stopped working and now throws an 
> NPE.
> {noformat}
> HTTP ERROR 500
> Problem accessing /hiveserver2.jsp. Reason:
> Server Error
> Caused by:
> java.lang.NullPointerException
>   at 
> org.apache.hive.generated.hiveserver2.hiveserver2_jsp._jspService(hiveserver2_jsp.java:181)
>   at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:70)
>   at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
>   at 
> org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:840)
>   at 
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:584)
>   at 
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
>   at 
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
>   at 
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)
>   at 
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)
>   at 
> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512)
>   at 
> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
>   at 
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)
>   at 
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
>   at 
> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)
>   at 
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
>   at org.eclipse.jetty.server.Server.handle(Server.java:534)
>   at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:320)
>   at 
> org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:251)
>   at 
> org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:283)
>   at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:108)
>   at 
> org.eclipse.jetty.io.ssl.SslConnection.onFillable(SslConnection.java:240)
>   at 
> org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:283)
>   at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:108)
>   at 
> org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)
>   at 
> org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303)
>   at 
> org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148)
>   at 
> org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136)
>   at 
> org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671)
>   at 
> org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589)
>   at java.lang.Thread.run(Thread.java:748)
> Powered by Jetty:// 9.3.19.v20170502
> {noformat}





[jira] [Commented] (HIVE-17077) Hive should raise StringIndexOutOfBoundsException when LPAD/RPAD len character's value is negative number

2017-07-24 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16099006#comment-16099006
 ] 

Sergey Shelukhin commented on HIVE-17077:
-

cc [~ashutoshc] what's the default behavior in Hive for these things? I think 
throwing a more descriptive exception (invalid argument) would be better, 
although we do return null for some invalid type conversions, iirc.

> Hive should raise StringIndexOutOfBoundsException when LPAD/RPAD len 
> character's value is negative number
> -
>
> Key: HIVE-17077
> URL: https://issues.apache.org/jira/browse/HIVE-17077
> Project: Hive
>  Issue Type: Bug
>Reporter: Lingang Deng
>Assignee: Lingang Deng
>Priority: Minor
>
> lpad and rpad throw an exception when the second argument is a negative 
> number, as follows:
> {code:java}
> hive> select lpad("hello", -1 ,"h");
> FAILED: StringIndexOutOfBoundsException String index out of range: -1
> hive> select rpad("hello", -1 ,"h");
> FAILED: StringIndexOutOfBoundsException String index out of range: -1
> {code}
> Maybe we should return a friendlier result, as MySQL does:
> {code:java}
> mysql> select lpad("hello", -1 ,"h");
> +------------------------+
> | lpad("hello", -1 ,"h") |
> +------------------------+
> | NULL                   |
> +------------------------+
> 1 row in set (0.00 sec)
> mysql> select rpad("hello", -1 ,"h");
> +------------------------+
> | rpad("hello", -1 ,"h") |
> +------------------------+
> | NULL                   |
> +------------------------+
> 1 row in set (0.00 sec)
> {code}
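
For illustration, here is a minimal Java sketch of the MySQL-style behavior 
proposed above: return NULL instead of failing when len is negative. PadSketch 
and its methods are hypothetical names, not Hive's actual UDF code, and the 
handling of an empty pad string is an assumption of this sketch.

```java
/** Hypothetical stand-in for lpad/rpad that returns null on a negative length. */
final class PadSketch {
    private PadSketch() {}

    /** Left-pads s with pad up to len characters, truncating s if it is longer. */
    static String lpad(String s, int len, String pad) {
        if (s == null || pad == null || len < 0) {
            return null; // null result instead of StringIndexOutOfBoundsException
        }
        if (len <= s.length()) {
            return s.substring(0, len);
        }
        if (pad.isEmpty()) {
            return null; // assumption: nothing to pad with, so no valid result
        }
        return padding(len - s.length(), pad) + s;
    }

    /** Right-pads s with pad up to len characters, truncating s if it is longer. */
    static String rpad(String s, int len, String pad) {
        if (s == null || pad == null || len < 0) {
            return null;
        }
        if (len <= s.length()) {
            return s.substring(0, len);
        }
        if (pad.isEmpty()) {
            return null; // same assumption as in lpad
        }
        return s + padding(len - s.length(), pad);
    }

    /** Repeats pad cyclically until exactly count characters are produced. */
    private static String padding(int count, String pad) {
        StringBuilder sb = new StringBuilder(count);
        for (int i = 0; i < count; i++) {
            sb.append(pad.charAt(i % pad.length()));
        }
        return sb.toString();
    }
}
```

With this sketch, PadSketch.lpad("hello", -1, "h") yields null rather than an 
exception. Throwing a dedicated invalid-argument exception instead would only 
change the first branch.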





[jira] [Commented] (HIVE-17155) findConfFile() in HiveConf.java has some issues with the conf path

2017-07-24 Thread Yongzhi Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16098988#comment-16098988
 ] 

Yongzhi Chen commented on HIVE-17155:
-

The change looks good. +1

> findConfFile() in HiveConf.java has some issues with the conf path
> --
>
> Key: HIVE-17155
> URL: https://issues.apache.org/jira/browse/HIVE-17155
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 3.0.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
>Priority: Minor
> Attachments: HIVE-17155.1.patch
>
>
> The findConfFile() function in HiveConf.java has some issues. 
> File.pathSeparator, which is ":", is used as the separator rather than "/" 
> (File.separator). new File(jarUri).getParentFile() returns the 
> "$hive_home/lib" folder, but we actually want "$hive_home".
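
The two points above can be sketched in Java as follows. This is not the 
patch itself: ConfPathSketch and confFileFor are hypothetical names, and the 
<home>/lib/<jar> and <home>/conf layout is an assumption taken from the 
description.

```java
import java.io.File;

/** Hypothetical helper that locates <home>/conf/<name> from a jar under <home>/lib. */
final class ConfPathSketch {
    private ConfPathSketch() {}

    static File confFileFor(File jar, String name) {
        File libDir = jar.getParentFile();   // e.g. $hive_home/lib
        if (libDir == null) {
            return null;
        }
        File home = libDir.getParentFile();  // e.g. $hive_home, one more level up
        if (home == null) {
            return null;
        }
        // The File(parent, child) constructor joins with File.separator ("/" on
        // Unix); File.pathSeparator (":" on Unix) separates entries in a path list.
        return new File(new File(home, "conf"), name);
    }
}
```

Building the path with the File(parent, child) constructor avoids 
concatenating separators by hand, so the two constants cannot be confused.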





[jira] [Updated] (HIVE-16965) SMB join may produce incorrect results

2017-07-24 Thread Deepak Jaiswal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deepak Jaiswal updated HIVE-16965:
--
Attachment: HIVE-16965.2.patch

> SMB join may produce incorrect results
> --
>
> Key: HIVE-16965
> URL: https://issues.apache.org/jira/browse/HIVE-16965
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Deepak Jaiswal
> Attachments: HIVE-16965.1.patch, HIVE-16965.2.patch
>
>
> Running the following on MiniTez
> {noformat}
> set hive.mapred.mode=nonstrict;
> SET hive.vectorized.execution.enabled=true;
> SET hive.exec.orc.default.buffer.size=32768;
> SET hive.exec.orc.default.row.index.stride=1000;
> SET hive.optimize.index.filter=true;
> set hive.fetch.task.conversion=none;
> set hive.exec.dynamic.partition.mode=nonstrict;
> DROP TABLE orc_a;
> DROP TABLE orc_b;
> CREATE TABLE orc_a (id bigint, cdouble double) partitioned by (y int, q 
> smallint)
>   CLUSTERED BY (id) SORTED BY (id) INTO 2 BUCKETS stored as orc;
> CREATE TABLE orc_b (id bigint, cfloat float)
>   CLUSTERED BY (id) SORTED BY (id) INTO 2 BUCKETS stored as orc;
> insert into table orc_a partition (y=2000, q)
> select cbigint, cdouble, csmallint % 10 from alltypesorc
>   where cbigint is not null and csmallint > 0 order by cbigint asc;
> insert into table orc_a partition (y=2001, q)
> select cbigint, cdouble, csmallint % 10 from alltypesorc
>   where cbigint is not null and csmallint > 0 order by cbigint asc;
> insert into table orc_b 
> select cbigint, cfloat from alltypesorc
>   where cbigint is not null and csmallint > 0 order by cbigint asc limit 200;
> set hive.cbo.enable=false;
> select y,q,count(*) from orc_a a join orc_b b on a.id=b.id group by y,q;
> set hive.enforce.sortmergebucketmapjoin=false;
> set hive.optimize.bucketmapjoin=true;
> set hive.optimize.bucketmapjoin.sortedmerge=true;
> set hive.auto.convert.sortmerge.join=true;
> set hive.auto.convert.join=true;
> set hive.auto.convert.join.noconditionaltask.size=10;
> explain
> select y,q,count(*) from orc_a a join orc_b b on a.id=b.id group by y,q;
> select y,q,count(*) from orc_a a join orc_b b on a.id=b.id group by y,q;
> DROP TABLE orc_a;
> DROP TABLE orc_b;
> {noformat}
> Produces different results for the two selects. The SMB one looks incorrect. 
> cc [~djaiswal] [~hagleitn]





[jira] [Updated] (HIVE-16965) SMB join may produce incorrect results

2017-07-24 Thread Deepak Jaiswal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deepak Jaiswal updated HIVE-16965:
--
Attachment: (was: HIVE-16965.2.patch)

> SMB join may produce incorrect results
> --
>
> Key: HIVE-16965
> URL: https://issues.apache.org/jira/browse/HIVE-16965
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Deepak Jaiswal
> Attachments: HIVE-16965.1.patch
>
>
> Running the following on MiniTez
> {noformat}
> set hive.mapred.mode=nonstrict;
> SET hive.vectorized.execution.enabled=true;
> SET hive.exec.orc.default.buffer.size=32768;
> SET hive.exec.orc.default.row.index.stride=1000;
> SET hive.optimize.index.filter=true;
> set hive.fetch.task.conversion=none;
> set hive.exec.dynamic.partition.mode=nonstrict;
> DROP TABLE orc_a;
> DROP TABLE orc_b;
> CREATE TABLE orc_a (id bigint, cdouble double) partitioned by (y int, q 
> smallint)
>   CLUSTERED BY (id) SORTED BY (id) INTO 2 BUCKETS stored as orc;
> CREATE TABLE orc_b (id bigint, cfloat float)
>   CLUSTERED BY (id) SORTED BY (id) INTO 2 BUCKETS stored as orc;
> insert into table orc_a partition (y=2000, q)
> select cbigint, cdouble, csmallint % 10 from alltypesorc
>   where cbigint is not null and csmallint > 0 order by cbigint asc;
> insert into table orc_a partition (y=2001, q)
> select cbigint, cdouble, csmallint % 10 from alltypesorc
>   where cbigint is not null and csmallint > 0 order by cbigint asc;
> insert into table orc_b 
> select cbigint, cfloat from alltypesorc
>   where cbigint is not null and csmallint > 0 order by cbigint asc limit 200;
> set hive.cbo.enable=false;
> select y,q,count(*) from orc_a a join orc_b b on a.id=b.id group by y,q;
> set hive.enforce.sortmergebucketmapjoin=false;
> set hive.optimize.bucketmapjoin=true;
> set hive.optimize.bucketmapjoin.sortedmerge=true;
> set hive.auto.convert.sortmerge.join=true;
> set hive.auto.convert.join=true;
> set hive.auto.convert.join.noconditionaltask.size=10;
> explain
> select y,q,count(*) from orc_a a join orc_b b on a.id=b.id group by y,q;
> select y,q,count(*) from orc_a a join orc_b b on a.id=b.id group by y,q;
> DROP TABLE orc_a;
> DROP TABLE orc_b;
> {noformat}
> Produces different results for the two selects. The SMB one looks incorrect. 
> cc [~djaiswal] [~hagleitn]





[jira] [Updated] (HIVE-16614) Support "set local time zone" statement

2017-07-24 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-16614:
---
Status: Patch Available  (was: Open)

> Support "set local time zone" statement
> ---
>
> Key: HIVE-16614
> URL: https://issues.apache.org/jira/browse/HIVE-16614
> Project: Hive
>  Issue Type: Improvement
>Reporter: Carter Shanklin
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-16614.patch
>
>
> HIVE-14412 introduces a timezone-aware timestamp.
> SQL has a concept of default time zone displacements, which are transparently 
> applied when converting between timezone-unaware types and timezone-aware 
> types and, in Hive's case, are also used to shift a timezone aware type to a 
> different time zone, depending on configuration.
> SQL also provides that the default time zone displacement be settable at a 
> session level, so that clients can access a database simultaneously from 
> different time zones and see time values in their own time zone.
> Currently the time zone displacement is fixed and is set based on the system 
> time zone where the Hive client runs (HiveServer2 or Hive CLI). It will be 
> more convenient for users if they have the ability to set their time zone of 
> choice.
> SQL defines "set time zone" with 2 ways of specifying the time zone, first 
> using an interval and second using the special keyword LOCAL.
> Examples:
>   • set time zone '-8:00';
>   • set time zone LOCAL;
> LOCAL means to set the current default time zone displacement to the 
> session's original default time zone displacement.
> Reference: SQL:2011 section 19.4
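
As a rough illustration of the semantics described above, the following Java 
sketch keeps a per-session displacement that can be set from an interval or 
reset with LOCAL. SessionTimeZoneSketch and its methods are hypothetical names 
and do not reflect the attached patch. The interval is parsed by hand as 
[+|-]h[:mm], which is an assumption about the accepted syntax.

```java
import java.time.ZoneOffset;

/** Hypothetical per-session holder for the default time zone displacement. */
final class SessionTimeZoneSketch {
    private final ZoneOffset original; // displacement the session started with
    private ZoneOffset current;

    SessionTimeZoneSketch(ZoneOffset original) {
        this.original = original;
        this.current = original;
    }

    /** Handles the argument of "set time zone": an interval such as -8:00, or LOCAL. */
    void setTimeZone(String spec) {
        String s = spec.trim();
        // LOCAL restores the session's original default displacement.
        current = s.equalsIgnoreCase("LOCAL") ? original : parseDisplacement(s);
    }

    ZoneOffset current() {
        return current;
    }

    /** Parses [+|-]h[:mm] into an offset. */
    static ZoneOffset parseDisplacement(String text) {
        int sign = 1;
        String t = text;
        if (t.startsWith("-")) {
            sign = -1;
            t = t.substring(1);
        } else if (t.startsWith("+")) {
            t = t.substring(1);
        }
        String[] parts = t.split(":");
        int hours = Integer.parseInt(parts[0]);
        int minutes = parts.length > 1 ? Integer.parseInt(parts[1]) : 0;
        return ZoneOffset.ofTotalSeconds(sign * (hours * 3600 + minutes * 60));
    }
}
```

A session created with the server's offset and then given "-8:00" would render 
instants eight hours behind UTC until "LOCAL" is issued.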





[jira] [Assigned] (HIVE-16614) Support "set local time zone" statement

2017-07-24 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez reassigned HIVE-16614:
--

Assignee: Jesus Camacho Rodriguez  (was: Bing Li)

> Support "set local time zone" statement
> ---
>
> Key: HIVE-16614
> URL: https://issues.apache.org/jira/browse/HIVE-16614
> Project: Hive
>  Issue Type: Improvement
>Reporter: Carter Shanklin
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-16614.patch
>
>
> HIVE-14412 introduces a timezone-aware timestamp.
> SQL has a concept of default time zone displacements, which are transparently 
> applied when converting between timezone-unaware types and timezone-aware 
> types and, in Hive's case, are also used to shift a timezone aware type to a 
> different time zone, depending on configuration.
> SQL also provides that the default time zone displacement be settable at a 
> session level, so that clients can access a database simultaneously from 
> different time zones and see time values in their own time zone.
> Currently the time zone displacement is fixed and is set based on the system 
> time zone where the Hive client runs (HiveServer2 or Hive CLI). It will be 
> more convenient for users if they have the ability to set their time zone of 
> choice.
> SQL defines "set time zone" with 2 ways of specifying the time zone, first 
> using an interval and second using the special keyword LOCAL.
> Examples:
>   • set time zone '-8:00';
>   • set time zone LOCAL;
> LOCAL means to set the current default time zone displacement to the 
> session's original default time zone displacement.
> Reference: SQL:2011 section 19.4





[jira] [Updated] (HIVE-16614) Support "set local time zone" statement

2017-07-24 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-16614:
---
Attachment: HIVE-16614.patch

> Support "set local time zone" statement
> ---
>
> Key: HIVE-16614
> URL: https://issues.apache.org/jira/browse/HIVE-16614
> Project: Hive
>  Issue Type: Improvement
>Reporter: Carter Shanklin
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-16614.patch
>
>
> HIVE-14412 introduces a timezone-aware timestamp.
> SQL has a concept of default time zone displacements, which are transparently 
> applied when converting between timezone-unaware types and timezone-aware 
> types and, in Hive's case, are also used to shift a timezone aware type to a 
> different time zone, depending on configuration.
> SQL also provides that the default time zone displacement be settable at a 
> session level, so that clients can access a database simultaneously from 
> different time zones and see time values in their own time zone.
> Currently the time zone displacement is fixed and is set based on the system 
> time zone where the Hive client runs (HiveServer2 or Hive CLI). It will be 
> more convenient for users if they have the ability to set their time zone of 
> choice.
> SQL defines "set time zone" with 2 ways of specifying the time zone, first 
> using an interval and second using the special keyword LOCAL.
> Examples:
>   • set time zone '-8:00';
>   • set time zone LOCAL;
> LOCAL means to set the current default time zone displacement to the 
> session's original default time zone displacement.
> Reference: SQL:2011 section 19.4





[jira] [Updated] (HIVE-16965) SMB join may produce incorrect results

2017-07-24 Thread Deepak Jaiswal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deepak Jaiswal updated HIVE-16965:
--
Attachment: HIVE-16965.2.patch

Forgot to attach test.

> SMB join may produce incorrect results
> --
>
> Key: HIVE-16965
> URL: https://issues.apache.org/jira/browse/HIVE-16965
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Deepak Jaiswal
> Attachments: HIVE-16965.1.patch, HIVE-16965.2.patch
>
>
> Running the following on MiniTez
> {noformat}
> set hive.mapred.mode=nonstrict;
> SET hive.vectorized.execution.enabled=true;
> SET hive.exec.orc.default.buffer.size=32768;
> SET hive.exec.orc.default.row.index.stride=1000;
> SET hive.optimize.index.filter=true;
> set hive.fetch.task.conversion=none;
> set hive.exec.dynamic.partition.mode=nonstrict;
> DROP TABLE orc_a;
> DROP TABLE orc_b;
> CREATE TABLE orc_a (id bigint, cdouble double) partitioned by (y int, q 
> smallint)
>   CLUSTERED BY (id) SORTED BY (id) INTO 2 BUCKETS stored as orc;
> CREATE TABLE orc_b (id bigint, cfloat float)
>   CLUSTERED BY (id) SORTED BY (id) INTO 2 BUCKETS stored as orc;
> insert into table orc_a partition (y=2000, q)
> select cbigint, cdouble, csmallint % 10 from alltypesorc
>   where cbigint is not null and csmallint > 0 order by cbigint asc;
> insert into table orc_a partition (y=2001, q)
> select cbigint, cdouble, csmallint % 10 from alltypesorc
>   where cbigint is not null and csmallint > 0 order by cbigint asc;
> insert into table orc_b 
> select cbigint, cfloat from alltypesorc
>   where cbigint is not null and csmallint > 0 order by cbigint asc limit 200;
> set hive.cbo.enable=false;
> select y,q,count(*) from orc_a a join orc_b b on a.id=b.id group by y,q;
> set hive.enforce.sortmergebucketmapjoin=false;
> set hive.optimize.bucketmapjoin=true;
> set hive.optimize.bucketmapjoin.sortedmerge=true;
> set hive.auto.convert.sortmerge.join=true;
> set hive.auto.convert.join=true;
> set hive.auto.convert.join.noconditionaltask.size=10;
> explain
> select y,q,count(*) from orc_a a join orc_b b on a.id=b.id group by y,q;
> select y,q,count(*) from orc_a a join orc_b b on a.id=b.id group by y,q;
> DROP TABLE orc_a;
> DROP TABLE orc_b;
> {noformat}
> Produces different results for the two selects. The SMB one looks incorrect. 
> cc [~djaiswal] [~hagleitn]





[jira] [Commented] (HIVE-17131) Add InterfaceAudience and InterfaceStability annotations for SerDe APIs

2017-07-24 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16098960#comment-16098960
 ] 

Sergey Shelukhin commented on HIVE-17131:
-

Is it stable? esp. the ObjectInspector. cc [~ashutoshc]

> Add InterfaceAudience and InterfaceStability annotations for SerDe APIs
> ---
>
> Key: HIVE-17131
> URL: https://issues.apache.org/jira/browse/HIVE-17131
> Project: Hive
>  Issue Type: Sub-task
>  Components: Serializers/Deserializers
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
> Attachments: HIVE-17131.1.branch-2.patch, HIVE-17131.1.patch
>
>
> Adding InterfaceAudience and InterfaceStability annotations for the core 
> SerDe APIs.
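
To make the idea concrete, here is a small self-contained Java sketch of how 
audience and stability markers can be attached to an API and inspected. The 
annotations below are local stand-ins defined only for this example, not the 
InterfaceAudience and InterfaceStability classes from the patch. The two 
example interfaces and their classification are also hypothetical.

```java
import java.lang.annotation.Documented;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;

/** Stand-in annotations and example APIs; none of these are Hive classes. */
final class ApiAnnotationsSketch {
    private ApiAnnotationsSketch() {}

    @Documented
    @Retention(RetentionPolicy.RUNTIME)
    @interface Public {}

    @Documented
    @Retention(RetentionPolicy.RUNTIME)
    @interface Stable {}

    @Documented
    @Retention(RetentionPolicy.RUNTIME)
    @interface Evolving {}

    /** An API that external users may rely on across releases. */
    @Public
    @Stable
    interface ExampleDeserializer {
        Object deserialize(byte[] bytes);
    }

    /** An API that is visible to users but may still change. */
    @Public
    @Evolving
    interface ExampleInspector {
        String typeName();
    }

    /** True when the type is marked both public and stable. */
    static boolean isStablePublicApi(Class<?> type) {
        return type.isAnnotationPresent(Public.class)
            && type.isAnnotationPresent(Stable.class);
    }
}
```

Marking a type as evolving rather than stable is one way to answer the 
question of whether a given interface is ready for a stability guarantee.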





[jira] [Updated] (HIVE-17115) MetaStoreUtils.getDeserializer doesn't catch the java.lang.ClassNotFoundException

2017-07-24 Thread Erik.fang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erik.fang updated HIVE-17115:
-
Attachment: HIVE-17115.1.patch

Sorry for the late reply.

[~daijy] I tried to write a test case, but failed to reproduce the 
NoClassDefFoundError. It is thrown when the code compiles successfully but the 
JVM fails to find the class at runtime, so I'm afraid it is hard to trigger in 
TestHiveMetaStore. I changed the patch to catch Throwable instead.

[~vihangk1] Yes, the error propagated up the stack and the thread died, 
and the client waited for the response until it timed out.

I uploaded the patch, which catches Throwable and is rebased against branch-1.2.
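
As a small illustration of why the catch clause matters: NoClassDefFoundError 
is an Error, not an Exception, so a catch (Exception e) block lets it escape. 
The sketch below is not the patch. DeserializerLoadSketch is a hypothetical 
class, and initializeSerDe() simply throws the error directly to stand in for 
a SerDe whose dependency is missing at runtime.

```java
/** Hypothetical illustration of catch (Exception) versus catch (Throwable). */
final class DeserializerLoadSketch {
    private DeserializerLoadSketch() {}

    /** Stand-in for SerDe initialization that fails with a linkage error. */
    static Object initializeSerDe() {
        throw new NoClassDefFoundError("org/apache/hadoop/hbase/util/Bytes");
    }

    /** The error is not an Exception, so it propagates out of this method. */
    static String loadCatchingException() {
        try {
            initializeSerDe();
            return "ok";
        } catch (Exception e) {
            return "handled: " + e.getClass().getSimpleName();
        }
    }

    /** Throwable covers Error as well, so the failure can be reported to the caller. */
    static String loadCatchingThrowable() {
        try {
            initializeSerDe();
            return "ok";
        } catch (Throwable t) {
            return "handled: " + t.getClass().getSimpleName();
        }
    }
}
```

In the first variant the error would reach the worker thread's top level, 
which matches the hang described above. The second variant gives the server a 
chance to turn it into an error response.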

> MetaStoreUtils.getDeserializer doesn't catch the 
> java.lang.ClassNotFoundException
> -
>
> Key: HIVE-17115
> URL: https://issues.apache.org/jira/browse/HIVE-17115
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 1.2.1
>Reporter: Erik.fang
>Assignee: Erik.fang
> Attachments: HIVE-17115.1.patch, HIVE-17115.patch
>
>
> Suppose we create a table with a custom SerDe and then call 
> HiveMetaStoreClient.getSchema(String db, String tableName) to extract the 
> metadata from the HiveMetaStore service. 
> The Thrift client hangs, and an exception such as the following appears in 
> the HiveMetaStore service's log:
> {code:java}
> Exception in thread "pool-5-thread-129" java.lang.NoClassDefFoundError: 
> org/apache/hadoop/hbase/util/Bytes
> at 
> org.apache.hadoop.hive.hbase.HBaseSerDe.parseColumnsMapping(HBaseSerDe.java:184)
> at 
> org.apache.hadoop.hive.hbase.HBaseSerDeParameters.<init>(HBaseSerDeParameters.java:73)
> at 
> org.apache.hadoop.hive.hbase.HBaseSerDe.initialize(HBaseSerDe.java:117)
> at 
> org.apache.hadoop.hive.serde2.AbstractSerDe.initialize(AbstractSerDe.java:53)
> at 
> org.apache.hadoop.hive.serde2.SerDeUtils.initializeSerDe(SerDeUtils.java:521)
> at 
> org.apache.hadoop.hive.metastore.MetaStoreUtils.getDeserializer(MetaStoreUtils.java:401)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_fields_with_environment_context(HiveMetaStore.java:3556)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_schema_with_environment_context(HiveMetaStore.java:3636)
> at sun.reflect.GeneratedMethodAccessor104.invoke(Unknown Source)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:107)
> at com.sun.proxy.$Proxy4.get_schema_with_environment_context(Unknown 
> Source)
> at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$get_schema_with_environment_context.getResult(ThriftHiveMetastore.java:9146)
> at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$get_schema_with_environment_context.getResult(ThriftHiveMetastore.java:9130)
> at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
> at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
> at 
> org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor$1.run(HadoopThriftAuthBridge.java:551)
> at 
> org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor$1.run(HadoopThriftAuthBridge.java:546)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1709)
> at 
> org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge.java:546)
> at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:285)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.ClassNotFoundException: 
> org.apache.hadoop.hbase.util.Bytes
> at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
> at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
> {code}





[jira] [Commented] (HIVE-16965) SMB join may produce incorrect results

2017-07-24 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16098921#comment-16098921
 ] 

Sergey Shelukhin commented on HIVE-16965:
-

The map keyed by KV reader looks a little suspicious. What are the 
hashCode/equals of that type? Are they valid, and also acceptable in terms of 
perf? Should it be an identity hash map?
(HiveInputFormat.HiveInputSplit) splits.get(0) assumes one element; add an 
assert?
Why is the path updated in the IO context if we already set a specific one per 
input? Perhaps a more detailed comment could be added.
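
To illustrate the identity hash map question, here is a minimal Java sketch. 
KvReader is a hypothetical class with content-based equals/hashCode, which is 
an assumption made for the example. The real reader type in the patch may 
simply inherit Object's identity semantics.

```java
import java.util.Map;
import java.util.Objects;

/** Hypothetical comparison of equals-based and identity-based map keys. */
final class ReaderContextMapSketch {
    private ReaderContextMapSketch() {}

    /** A reader whose equality is based on content rather than identity. */
    static final class KvReader {
        final String source;

        KvReader(String source) {
            this.source = source;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof KvReader && Objects.equals(source, ((KvReader) o).source);
        }

        @Override
        public int hashCode() {
            return Objects.hashCode(source);
        }
    }

    /** Registers contexts for two distinct readers that compare equal by content. */
    static int distinctEntries(Map<KvReader, String> contexts) {
        contexts.put(new KvReader("bucket_0"), "context for input A");
        contexts.put(new KvReader("bucket_0"), "context for input B");
        return contexts.size();
    }
}
```

Passing a java.util.HashMap collapses the two readers into one entry, while a 
java.util.IdentityHashMap keeps both, because it compares keys with == and 
never calls equals or hashCode on them.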

> SMB join may produce incorrect results
> --
>
> Key: HIVE-16965
> URL: https://issues.apache.org/jira/browse/HIVE-16965
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Deepak Jaiswal
> Attachments: HIVE-16965.1.patch
>
>
> Running the following on MiniTez
> {noformat}
> set hive.mapred.mode=nonstrict;
> SET hive.vectorized.execution.enabled=true;
> SET hive.exec.orc.default.buffer.size=32768;
> SET hive.exec.orc.default.row.index.stride=1000;
> SET hive.optimize.index.filter=true;
> set hive.fetch.task.conversion=none;
> set hive.exec.dynamic.partition.mode=nonstrict;
> DROP TABLE orc_a;
> DROP TABLE orc_b;
> CREATE TABLE orc_a (id bigint, cdouble double) partitioned by (y int, q 
> smallint)
>   CLUSTERED BY (id) SORTED BY (id) INTO 2 BUCKETS stored as orc;
> CREATE TABLE orc_b (id bigint, cfloat float)
>   CLUSTERED BY (id) SORTED BY (id) INTO 2 BUCKETS stored as orc;
> insert into table orc_a partition (y=2000, q)
> select cbigint, cdouble, csmallint % 10 from alltypesorc
>   where cbigint is not null and csmallint > 0 order by cbigint asc;
> insert into table orc_a partition (y=2001, q)
> select cbigint, cdouble, csmallint % 10 from alltypesorc
>   where cbigint is not null and csmallint > 0 order by cbigint asc;
> insert into table orc_b 
> select cbigint, cfloat from alltypesorc
>   where cbigint is not null and csmallint > 0 order by cbigint asc limit 200;
> set hive.cbo.enable=false;
> select y,q,count(*) from orc_a a join orc_b b on a.id=b.id group by y,q;
> set hive.enforce.sortmergebucketmapjoin=false;
> set hive.optimize.bucketmapjoin=true;
> set hive.optimize.bucketmapjoin.sortedmerge=true;
> set hive.auto.convert.sortmerge.join=true;
> set hive.auto.convert.join=true;
> set hive.auto.convert.join.noconditionaltask.size=10;
> explain
> select y,q,count(*) from orc_a a join orc_b b on a.id=b.id group by y,q;
> select y,q,count(*) from orc_a a join orc_b b on a.id=b.id group by y,q;
> DROP TABLE orc_a;
> DROP TABLE orc_b;
> {noformat}
> Produces different results for the two selects. The SMB one looks incorrect. 
> cc [~djaiswal] [~hagleitn]





[jira] [Comment Edited] (HIVE-16965) SMB join may produce incorrect results

2017-07-24 Thread Deepak Jaiswal (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16098913#comment-16098913
 ] 

Deepak Jaiswal edited comment on HIVE-16965 at 7/24/17 6:06 PM:


Initial patch.
Fixes the algorithm to provide correct IOContext for a given input.
In SMB, the inputs keep switching compared to traditional joins where inputs 
are read sequentially.

[~gopalv][~jdere][~hagleitn][~sershe] can you please review?


was (Author: djaiswal):
Initial patch.
Fixes the algorithm to provide correct IOContext for a given input.
In SMB, the inputs keep switching compared to traditional joins where inputs 
are read sequentially.

> SMB join may produce incorrect results
> --
>
> Key: HIVE-16965
> URL: https://issues.apache.org/jira/browse/HIVE-16965
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Deepak Jaiswal
> Attachments: HIVE-16965.1.patch
>
>
> Running the following on MiniTez
> {noformat}
> set hive.mapred.mode=nonstrict;
> SET hive.vectorized.execution.enabled=true;
> SET hive.exec.orc.default.buffer.size=32768;
> SET hive.exec.orc.default.row.index.stride=1000;
> SET hive.optimize.index.filter=true;
> set hive.fetch.task.conversion=none;
> set hive.exec.dynamic.partition.mode=nonstrict;
> DROP TABLE orc_a;
> DROP TABLE orc_b;
> CREATE TABLE orc_a (id bigint, cdouble double) partitioned by (y int, q 
> smallint)
>   CLUSTERED BY (id) SORTED BY (id) INTO 2 BUCKETS stored as orc;
> CREATE TABLE orc_b (id bigint, cfloat float)
>   CLUSTERED BY (id) SORTED BY (id) INTO 2 BUCKETS stored as orc;
> insert into table orc_a partition (y=2000, q)
> select cbigint, cdouble, csmallint % 10 from alltypesorc
>   where cbigint is not null and csmallint > 0 order by cbigint asc;
> insert into table orc_a partition (y=2001, q)
> select cbigint, cdouble, csmallint % 10 from alltypesorc
>   where cbigint is not null and csmallint > 0 order by cbigint asc;
> insert into table orc_b 
> select cbigint, cfloat from alltypesorc
>   where cbigint is not null and csmallint > 0 order by cbigint asc limit 200;
> set hive.cbo.enable=false;
> select y,q,count(*) from orc_a a join orc_b b on a.id=b.id group by y,q;
> set hive.enforce.sortmergebucketmapjoin=false;
> set hive.optimize.bucketmapjoin=true;
> set hive.optimize.bucketmapjoin.sortedmerge=true;
> set hive.auto.convert.sortmerge.join=true;
> set hive.auto.convert.join=true;
> set hive.auto.convert.join.noconditionaltask.size=10;
> explain
> select y,q,count(*) from orc_a a join orc_b b on a.id=b.id group by y,q;
> select y,q,count(*) from orc_a a join orc_b b on a.id=b.id group by y,q;
> DROP TABLE orc_a;
> DROP TABLE orc_b;
> {noformat}
> Produces different results for the two selects. The SMB one looks incorrect. 
> cc [~djaiswal] [~hagleitn]



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16965) SMB join may produce incorrect results

2017-07-24 Thread Deepak Jaiswal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deepak Jaiswal updated HIVE-16965:
--
Attachment: HIVE-16965.1.patch

Initial patch.
Fixes the algorithm so that the correct IOContext is provided for a given input.
In an SMB join the reader keeps switching between inputs, whereas in traditional 
joins each input is read sequentially.



[jira] [Work started] (HIVE-16791) Tez engine giving inaccurate results on SMB Map joins while map-join and shuffle join gets correct results

2017-07-24 Thread Deepak Jaiswal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-16791 started by Deepak Jaiswal.
-
> Tez engine giving inaccurate results on SMB Map joins while map-join and 
> shuffle join gets correct results
> --
>
> Key: HIVE-16791
> URL: https://issues.apache.org/jira/browse/HIVE-16791
> Project: Hive
> Issue Type: Bug
> Components: Hive, HiveServer2
> Reporter: Saumil Mayani
> Assignee: Deepak Jaiswal
> Attachments: sample-data-query.txt, sample-data.tar.gz-aa, 
> sample-data.tar.gz-ab, sample-data.tar.gz-ac, sample-data.tar.gz-ad
>
>
> SMB Join gives incorrect results. 
> {code}
> SMB-Join
> set hive.execution.engine=tez;
> set hive.enforce.sortmergebucketmapjoin=false;
> set hive.optimize.bucketmapjoin=true;
> set hive.optimize.bucketmapjoin.sortedmerge=true;
> set hive.auto.convert.sortmerge.join=true;
> set hive.auto.convert.join=true;
> set hive.auto.convert.join.noconditionaltask.size=50;
> OK
> 2016  1   11999639
> 2016  2   18955110
> 2017  2   22217437
> Time taken: 92.647 seconds, Fetched: 3 row(s)
> {code}
> {code}
> MAP-JOIN
> set hive.execution.engine=tez;
> set hive.enforce.sortmergebucketmapjoin=false;
> set hive.optimize.bucketmapjoin=true;
> set hive.optimize.bucketmapjoin.sortedmerge=true;
> set hive.auto.convert.sortmerge.join=true;
> set hive.auto.convert.join=true;
> set hive.auto.convert.join.noconditionaltask.size=5000;
> OK
> 2016  1   26586093
> 2016  2   17724062
> 2017  2   8862031
> Time taken: 17.49 seconds, Fetched: 3 row(s)
> {code}
> {code}
> Shuffle Join
> set hive.execution.engine=tez;
> set hive.enforce.sortmergebucketmapjoin=false;
> set hive.optimize.bucketmapjoin=true;
> set hive.optimize.bucketmapjoin.sortedmerge=true;
> set hive.auto.convert.sortmerge.join=false;
> set hive.auto.convert.join=false;
> set hive.auto.convert.join.noconditionaltask.size=5000;
> OK
> 2016  1   26586093
> 2016  2   17724062
> 2017  2   8862031
> Time taken: 38.575 seconds, Fetched: 3 row(s)
> {code}





[jira] [Updated] (HIVE-16791) Tez engine giving inaccurate results on SMB Map joins while map-join and shuffle join gets correct results

2017-07-24 Thread Deepak Jaiswal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deepak Jaiswal updated HIVE-16791:
--
Status: Patch Available  (was: In Progress)


[jira] [Updated] (HIVE-16965) SMB join may produce incorrect results

2017-07-24 Thread Deepak Jaiswal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deepak Jaiswal updated HIVE-16965:
--
Status: Patch Available  (was: In Progress)


[jira] [Work started] (HIVE-16965) SMB join may produce incorrect results

2017-07-24 Thread Deepak Jaiswal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-16965 started by Deepak Jaiswal.
-


[jira] [Commented] (HIVE-16997) Extend object store to store bit vectors

2017-07-24 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16098551#comment-16098551
 ] 

Ashutosh Chauhan commented on HIVE-16997:
-

The latest patch looks good. I see you have used a blob type for MySQL. Is there 
a reason for still using varchar for the others? Also, can you update RB with 
the latest patch?

> Extend object store to store bit vectors
> 
>
> Key: HIVE-16997
> URL: https://issues.apache.org/jira/browse/HIVE-16997
> Project: Hive
> Issue Type: Sub-task
> Reporter: Pengcheng Xiong
> Assignee: Pengcheng Xiong
> Attachments: HIVE-16997.01.patch, HIVE-16997.02.patch, 
> HIVE-16997.03.patch, HIVE-16997.04.patch
>
>
> This patch includes: (1) a new serde for FMSketch, (2) schema changes for 
> Derby and MySQL, (3) support for the date type, and (4) a refactoring of the 
> extrapolation and merge code.





[jira] [Resolved] (HIVE-17158) BeeLine Query Log and Query Result print order is not defined

2017-07-24 Thread Peter Vary (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Vary resolved HIVE-17158.
---
Resolution: Cannot Reproduce

> BeeLine Query Log and Query Result print order is not defined
> -
>
> Key: HIVE-17158
> URL: https://issues.apache.org/jira/browse/HIVE-17158
> Project: Hive
> Issue Type: Bug
> Components: Beeline, Testing Infrastructure
> Affects Versions: 3.0.0
> Reporter: Peter Vary
> Assignee: Peter Vary
>
> The output of the BeeLine tests is sometimes flaky, especially if the query 
> is a fast one.
> Sometimes the output is this:
> {code}
> PREHOOK: query: select explode(array('a', 'b'))
> PREHOOK: type: QUERY
> PREHOOK: Input: _dummy_database@_dummy_table
> #### A masked pattern was here ####
> POSTHOOK: query: select explode(array('a', 'b'))
> POSTHOOK: type: QUERY
> POSTHOOK: Input: _dummy_database@_dummy_table
> #### A masked pattern was here ####
> a
> b
> {code}
> Sometimes this:
> {code}
> a
> b
> PREHOOK: query: select explode(array('a', 'b'))
> PREHOOK: type: QUERY
> PREHOOK: Input: _dummy_database@_dummy_table
> #### A masked pattern was here ####
> POSTHOOK: query: select explode(array('a', 'b'))
> POSTHOOK: type: QUERY
> POSTHOOK: Input: _dummy_database@_dummy_table
> #### A masked pattern was here ####
> {code}
> Note that the actual query result appears either before or after the output 
> printed by the hooks.





[jira] [Commented] (HIVE-17158) BeeLine Query Log and Query Result print order is not defined

2017-07-24 Thread Peter Vary (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16098408#comment-16098408
 ] 

Peter Vary commented on HIVE-17158:
---

OK, I was looking at old code. This was already solved by HIVE-15473.



[jira] [Commented] (HIVE-17158) BeeLine Query Log and Query Result print order is not defined

2017-07-24 Thread Peter Vary (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16098296#comment-16098296
 ] 

Peter Vary commented on HIVE-17158:
---

The root cause of the issue is that if the LogRunnable is interrupted before it 
has started, no InterruptedException is thrown, and 
{{showRemainingLogsIfAny}} is not called. 

{code}
  private Runnable createLogRunnable(final Statement statement) {
    if (statement instanceof HiveStatement) {
      final HiveStatement hiveStatement = (HiveStatement) statement;

      Runnable runnable = new Runnable() {
        @Override
        public void run() {
          while (hiveStatement.hasMoreLogs() && !Thread.currentThread().isInterrupted()) {
            try {
              // fetch the log periodically and output to beeline console
              for (String log : hiveStatement.getQueryLog()) {
                if (!beeLine.isTestMode()) {
                  beeLine.info(log);
                } else {
                  // In test mode print the logs to the output
                  beeLine.output(log);
                }
              }
              Thread.sleep(DEFAULT_QUERY_PROGRESS_INTERVAL);
            } catch (SQLException e) {
              beeLine.error(new SQLWarning(e));
              return;
            } catch (InterruptedException e) {
              beeLine.debug("Getting log thread is interrupted, since query is done!");
              // <-- We expect to print the logs here, but if no exception, no logs
              showRemainingLogsIfAny(statement);
              return;
            }
          }
        }
      };
      return runnable;
    } else {
      [..]
    }
  }
{code}
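
The effect described above can be reproduced in isolation. The following is a minimal sketch, not BeeLine code: the class name, the method name and the self-interrupt used to make the timing deterministic are all assumptions for illustration. It runs a polling loop shaped like the one in createLogRunnable and reports whether the InterruptedException handler was reached.

```java
import java.util.concurrent.atomic.AtomicBoolean;

final class InterruptBeforeLoopDemo {

  /**
   * Runs a polling loop on a worker thread and returns true if the
   * InterruptedException handler was reached.
   *
   * interruptBeforeLoop == true simulates the interrupt arriving before
   * the loop condition is evaluated for the first time.
   */
  static boolean handlerReached(final boolean interruptBeforeLoop) throws InterruptedException {
    final AtomicBoolean reached = new AtomicBoolean(false);

    Thread worker = new Thread(new Runnable() {
      @Override
      public void run() {
        if (interruptBeforeLoop) {
          // Set the interrupt flag before the loop is ever entered.
          Thread.currentThread().interrupt();
        }
        while (!Thread.currentThread().isInterrupted()) {
          try {
            Thread.sleep(10_000);
          } catch (InterruptedException e) {
            // Stand-in for the place where the remaining logs would be printed.
            reached.set(true);
            return;
          }
        }
        // Flag was already set: the loop body and the handler were skipped.
      }
    });

    worker.start();
    if (!interruptBeforeLoop) {
      // Wait until the worker is inside Thread.sleep, then interrupt it.
      while (worker.getState() != Thread.State.TIMED_WAITING) {
        Thread.sleep(1);
      }
      worker.interrupt();
    }
    worker.join();
    return reached.get();
  }

  public static void main(String[] args) throws InterruptedException {
    System.out.println("interrupt before loop  -> handler reached: " + handlerReached(true));
    System.out.println("interrupt during sleep -> handler reached: " + handlerReached(false));
  }
}
```

When the interrupt flag is already set at the first evaluation of the loop condition, the loop body is skipped and the handler never runs. When the interrupt arrives during Thread.sleep, the handler runs as expected.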

The logs are printed either when the ResultSet is queried, or in the finally block.
{code}
      do {
        ResultSet rs = stmnt.getResultSet();
        try {
          int count = beeLine.print(rs);
          long end = System.currentTimeMillis();

          beeLine.info(
              beeLine.loc("rows-selected", count) + " " + beeLine.locElapsedTime(end - start));
        } finally {
          if (logThread != null) {
            logThread.join(DEFAULT_QUERY_PROGRESS_THREAD_TIMEOUT);
            showRemainingLogsIfAny(stmnt);
            logThread = null;
          }
          rs.close();
        }
      } while (BeeLine.getMoreResults(stmnt));
{code}

{code}
    } finally {
      if (logThread != null) {
        if (!logThread.isInterrupted()) {
          logThread.interrupt();
        }
        logThread.join(DEFAULT_QUERY_PROGRESS_THREAD_TIMEOUT);
        showRemainingLogsIfAny(stmnt);
      }
      if (stmnt != null) {
        stmnt.close();
      }
    }
{code}


