[jira] [Updated] (HIVE-21304) Show Bucketing version for ReduceSinkOp in explain extended plan

2020-02-22 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich updated HIVE-21304:

Attachment: HIVE-21304.16.patch

> Show Bucketing version for ReduceSinkOp in explain extended plan
> 
>
> Key: HIVE-21304
> URL: https://issues.apache.org/jira/browse/HIVE-21304
> Project: Hive
>  Issue Type: Bug
>Reporter: Deepak Jaiswal
>Assignee: Zoltan Haindrich
>Priority: Major
> Attachments: HIVE-21304.01.patch, HIVE-21304.02.patch, 
> HIVE-21304.03.patch, HIVE-21304.04.patch, HIVE-21304.05.patch, 
> HIVE-21304.06.patch, HIVE-21304.07.patch, HIVE-21304.08.patch, 
> HIVE-21304.09.patch, HIVE-21304.10.patch, HIVE-21304.11.patch, 
> HIVE-21304.12.patch, HIVE-21304.13.patch, HIVE-21304.14.patch, 
> HIVE-21304.15.patch, HIVE-21304.16.patch
>
>
> Show Bucketing version for ReduceSinkOp in explain extended plan.
> This helps identify what hashing algorithm is being used by ReduceSinkOp.
>  
> cc [~vgarg]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (HIVE-7025) Support retention on hive tables

2020-02-22 Thread Shawn Guo (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-7025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17042793#comment-17042793
 ] 

Shawn Guo edited comment on HIVE-7025 at 2/23/20 4:45 AM:
--

[~navis]  any updates on this PR?


was (Author: guoxu1231):
[~navis]  any update on this PR?

> Support retention on hive tables
> 
>
> Key: HIVE-7025
> URL: https://issues.apache.org/jira/browse/HIVE-7025
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Navis Ryu
>Assignee: Navis Ryu
>Priority: Minor
> Attachments: HIVE-7025.1.patch.txt, HIVE-7025.2.patch.txt, 
> HIVE-7025.3.patch.txt, HIVE-7025.4.patch.txt
>
>
> Add self destruction properties for temporary tables.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-7025) Support retention on hive tables

2020-02-22 Thread Shawn Guo (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-7025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17042793#comment-17042793
 ] 

Shawn Guo commented on HIVE-7025:
-

[~navis]  any update on this PR?

> Support retention on hive tables
> 
>
> Key: HIVE-7025
> URL: https://issues.apache.org/jira/browse/HIVE-7025
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Navis Ryu
>Assignee: Navis Ryu
>Priority: Minor
> Attachments: HIVE-7025.1.patch.txt, HIVE-7025.2.patch.txt, 
> HIVE-7025.3.patch.txt, HIVE-7025.4.patch.txt
>
>
> Add self destruction properties for temporary tables.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-22923) Extract cumulative cost metadata from HiveRelMdDistinctRowCount metadata provider

2020-02-22 Thread Jesus Camacho Rodriguez (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17042751#comment-17042751
 ] 

Jesus Camacho Rodriguez commented on HIVE-22923:


[~mgergely], could you take a look? Thanks

> Extract cumulative cost metadata from HiveRelMdDistinctRowCount metadata 
> provider 
> --
>
> Key: HIVE-22923
> URL: https://issues.apache.org/jira/browse/HIVE-22923
> Project: Hive
>  Issue Type: Improvement
>  Components: CBO
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
>Priority: Major
> Attachments: HIVE-22923.patch
>
>
> It should not be contained there.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-22922) LLAP: ShuffleHandler may not find shuffle data if pod restarts in k8s

2020-02-22 Thread Prasanth Jayachandran (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-22922:
-
Attachment: HIVE-22922.2.patch

> LLAP: ShuffleHandler may not find shuffle data if pod restarts in k8s
> -
>
> Key: HIVE-22922
> URL: https://issues.apache.org/jira/browse/HIVE-22922
> Project: Hive
>  Issue Type: Bug
>Reporter: Nita Dembla
>Assignee: Prasanth Jayachandran
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-22922.1.patch, HIVE-22922.2.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Executor logs show "Invalid map id: TTP/1.1 500 Internal Server Error". This 
> happens when the executor pod restarts with the same hostname and port but is 
> missing its shuffle data.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-22922) LLAP: ShuffleHandler may not find shuffle data if pod restarts in k8s

2020-02-22 Thread Prasanth Jayachandran (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17042723#comment-17042723
 ] 

Prasanth Jayachandran commented on HIVE-22922:
--

another try

> LLAP: ShuffleHandler may not find shuffle data if pod restarts in k8s
> -
>
> Key: HIVE-22922
> URL: https://issues.apache.org/jira/browse/HIVE-22922
> Project: Hive
>  Issue Type: Bug
>Reporter: Nita Dembla
>Assignee: Prasanth Jayachandran
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-22922.1.patch, HIVE-22922.2.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Executor logs show "Invalid map id: TTP/1.1 500 Internal Server Error". This 
> happens when the executor pod restarts with the same hostname and port but is 
> missing its shuffle data.
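The failure mode can be sketched with a toy model. This is an illustrative stand-in, not the actual Tez/LLAP ShuffleHandler; the class name and map id are made up:

```python
# Toy shuffle service keyed by map id; its state lives only in the process.
class ToyShuffleHandler:
    def __init__(self):
        self.outputs = {}  # map_id -> bytes; lost when the pod restarts

    def register(self, map_id, data):
        self.outputs[map_id] = data

    def fetch(self, map_id):
        # Mirrors the observed behaviour: unknown map id -> HTTP 500
        if map_id not in self.outputs:
            return 500, "Invalid map id"
        return 200, self.outputs[map_id]

# A consumer resolves the producer by hostname:port and caches that address.
handler = ToyShuffleHandler()
handler.register("attempt_1_m_000000_0", b"sorted rows")
status, _ = handler.fetch("attempt_1_m_000000_0")
assert status == 200

# The pod restarts with the SAME hostname and port: consumers still reach a
# live handler, but one with empty state, so the same fetch now fails.
handler = ToyShuffleHandler()
status, msg = handler.fetch("attempt_1_m_000000_0")
```

Because the address is unchanged, the consumer has no signal that the producer lost its data, which is why the problem surfaces only as a 500 on fetch.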



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (HIVE-22098) Data loss occurs when multiple tables are join with different bucket_version

2020-02-22 Thread JithendhiraKumar (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17042715#comment-17042715
 ] 

JithendhiraKumar edited comment on HIVE-22098 at 2/22/20 9:46 PM:
--

[~luguangming] has already mentioned the steps to reproduce *Scenario 1.*

Here are the steps to reproduce *Scenario 2*: 

Input (test_data.csv)
{code:java}
0,Kurt,vulnedca...@yahoo.co.uk
1,Rolland,naej...@gmx.com
2,Cortez,blategarfi...@yahoo.com
3,Tyron,tamepro...@gmail.com
4,Matthew,wellezek...@yahoo.co.uk
5,Jeffrey,fabingeb...@comcast.net
6,Gerard,oughtou...@att.net
7,Hal,coursedma...@hotmail.com
8,Virgil,squintpr...@gmail.com
9,Hector,lewddil...@email.com
{code}
Steps:
{code:java}
CREATE TABLE `join_test_1`(`id` string, `first` string, `email` string) ROW 
FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' WITH 
SERDEPROPERTIES ('field.delim'=',', 'serialization.format'=',') STORED AS 
INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat' OUTPUTFORMAT 
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' TBLPROPERTIES 
('bucketing_version'='1');

LOAD DATA LOCAL INPATH '/uploads/test_data.csv' OVERWRITE INTO TABLE 
join_test_1;

CREATE TABLE `join_test_2`(`id` string, `first` string, `email` string) ROW 
FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' WITH 
SERDEPROPERTIES ('field.delim'=',', 'serialization.format'=',') STORED AS 
INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat' OUTPUTFORMAT 
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' TBLPROPERTIES 
('bucketing_version'='2');

LOAD DATA LOCAL INPATH '/uploads/test_data.csv' OVERWRITE INTO TABLE 
join_test_2;

Query:
set mapred.reduce.tasks=2;
set hive.auto.convert.join=false;
SELECT * from (SELECT id from join_test_1) as tbl1 LEFT JOIN (SELECT id from 
join_test_2) as tbl2 on tbl1.id = tbl2.id;

OutPut: (Wrong Results/Data Loss)
+--+--+
| tbl1.id  | tbl2.id  |
+--+--+
| 0| NULL |
| 2| NULL |
| 4| NULL |
| 6| NULL |
| 8| 8|
| 1| NULL |
| 3| NULL |
| 5| 5|
| 7| NULL |
| 9| NULL |
+--+--+

Expected Result:
+--+--+
| tbl1.id  | tbl2.id  |
+--+--+
| 1| 1|
| 3| 3|
| 7| 7|
| 8| 8|
| 9| 9|
| 0| 0|
| 2| 2|
| 4| 4|
| 5| 5|
| 6| 6|
+--+--+
{code}
.


was (Author: jithendhir92):
[~luguangming] has already mentioned the steps to reproduce *Scenario 1.*

Here are Steps To Reproduce *Scenario 2*: 

Input (test_data.csv)
{code:java}
0,Kurt,vulnedca...@yahoo.co.uk
1,Rolland,naej...@gmx.com
2,Cortez,blategarfi...@yahoo.com
3,Tyron,tamepro...@gmail.com
4,Matthew,wellezek...@yahoo.co.uk
5,Jeffrey,fabingeb...@comcast.net
6,Gerard,oughtou...@att.net
7,Hal,coursedma...@hotmail.com
8,Virgil,squintpr...@gmail.com
9,Hector,lewddil...@email.com
{code}
{code:java}
CREATE TABLE `join_test_1`(`id` string, `first` string, `email` string) ROW 
FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' WITH 
SERDEPROPERTIES ('field.delim'=',', 'serialization.format'=',') STORED AS 
INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat' OUTPUTFORMAT 
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' TBLPROPERTIES 
('bucketing_version'='1');

LOAD DATA LOCAL INPATH '/uploads/test_data.csv' OVERWRITE INTO TABLE 
join_test_1;

CREATE TABLE `join_test_2`(`id` string, `first` string, `email` string) ROW 
FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' WITH 
SERDEPROPERTIES ('field.delim'=',', 'serialization.format'=',') STORED AS 
INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat' OUTPUTFORMAT 
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' TBLPROPERTIES 
('bucketing_version'='2');

LOAD DATA LOCAL INPATH '/uploads/test_data.csv' OVERWRITE INTO TABLE 
join_test_2;

Query:
set mapred.reduce.tasks=2;
set hive.auto.convert.join=false;
SELECT * from (SELECT id from join_test_1) as tbl1 LEFT JOIN (SELECT id from 
join_test_2) as tbl2 on tbl1.id = tbl2.id;

OutPut: (Wrong Results/Data Loss)
+--+--+
| tbl1.id  | tbl2.id  |
+--+--+
| 0| NULL |
| 2| NULL |
| 4| NULL |
| 6| NULL |
| 8| 8|
| 1| NULL |
| 3| NULL |
| 5| 5|
| 7| NULL |
| 9| NULL |
+--+--+

Expected Result:
+--+--+
| tbl1.id  | tbl2.id  |
+--+--+
| 1| 1|
| 3| 3|
| 7| 7|
| 8| 8|
| 9| 9|
| 0| 0|
| 2| 2|
| 4| 4|
| 5  

[jira] [Comment Edited] (HIVE-22098) Data loss occurs when multiple tables are join with different bucket_version

2020-02-22 Thread JithendhiraKumar (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17042715#comment-17042715
 ] 

JithendhiraKumar edited comment on HIVE-22098 at 2/22/20 9:44 PM:
--

[~luguangming] has already mentioned the steps to reproduce *Scenario 1.*

Here are the steps to reproduce *Scenario 2*: 

Input (test_data.csv)

 
{code:java}
0,Kurt,vulnedca...@yahoo.co.uk
1,Rolland,naej...@gmx.com
2,Cortez,blategarfi...@yahoo.com
3,Tyron,tamepro...@gmail.com
4,Matthew,wellezek...@yahoo.co.uk
5,Jeffrey,fabingeb...@comcast.net
6,Gerard,oughtou...@att.net
7,Hal,coursedma...@hotmail.com
8,Virgil,squintpr...@gmail.com
9,Hector,lewddil...@email.com
{code}
 

 
{code:java}
CREATE TABLE `join_test_1`(`id` string, `first` string, `email` string) ROW 
FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' WITH 
SERDEPROPERTIES ('field.delim'=',', 'serialization.format'=',') STORED AS 
INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat' OUTPUTFORMAT 
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' TBLPROPERTIES 
('bucketing_version'='1');

LOAD DATA LOCAL INPATH '/uploads/test_data.csv' OVERWRITE INTO TABLE 
join_test_1;

CREATE TABLE `join_test_2`(`id` string, `first` string, `email` string) ROW 
FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' WITH 
SERDEPROPERTIES ('field.delim'=',', 'serialization.format'=',') STORED AS 
INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat' OUTPUTFORMAT 
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' TBLPROPERTIES 
('bucketing_version'='2');

LOAD DATA LOCAL INPATH '/uploads/test_data.csv' OVERWRITE INTO TABLE 
join_test_2;

Query:
set mapred.reduce.tasks=2;
set hive.auto.convert.join=false;
SELECT * from (SELECT id from join_test_1) as tbl1 LEFT JOIN (SELECT id from 
join_test_2) as tbl2 on tbl1.id = tbl2.id;

OutPut: (Wrong Results/Data Loss)
+--+--+
| tbl1.id  | tbl2.id  |
+--+--+
| 0| NULL |
| 2| NULL |
| 4| NULL |
| 6| NULL |
| 8| 8|
| 1| NULL |
| 3| NULL |
| 5| 5|
| 7| NULL |
| 9| NULL |
+--+--+

Expected Result:
+--+--+
| tbl1.id  | tbl2.id  |
+--+--+
| 1| 1|
| 3| 3|
| 7| 7|
| 8| 8|
| 9| 9|
| 0| 0|
| 2| 2|
| 4| 4|
| 5| 5|
| 6| 6|
+--+--+
{code}
.


was (Author: jithendhir92):
[~luguangming] has already mentioned the steps to reproduce *Scenario 1.*

Here are Steps To Reproduce *Scenario 2*: (test_data.csv can be found in the 
attachments)

 
{code:java}
CREATE TABLE `join_test_1`(`id` string, `first` string, `email` string) ROW 
FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' WITH 
SERDEPROPERTIES ('field.delim'=',', 'serialization.format'=',') STORED AS 
INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat' OUTPUTFORMAT 
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' TBLPROPERTIES 
('bucketing_version'='1');

LOAD DATA LOCAL INPATH '/uploads/test_data.csv' OVERWRITE INTO TABLE 
join_test_1;

CREATE TABLE `join_test_2`(`id` string, `first` string, `email` string) ROW 
FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' WITH 
SERDEPROPERTIES ('field.delim'=',', 'serialization.format'=',') STORED AS 
INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat' OUTPUTFORMAT 
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' TBLPROPERTIES 
('bucketing_version'='2');

LOAD DATA LOCAL INPATH '/uploads/test_data.csv' OVERWRITE INTO TABLE 
join_test_2;

Query:
set mapred.reduce.tasks=2;
set hive.auto.convert.join=false;
SELECT * from (SELECT id from join_test_1) as tbl1 LEFT JOIN (SELECT id from 
join_test_2) as tbl2 on tbl1.id = tbl2.id;

OutPut: (Wrong Results/Data Loss)
+--+--+
| tbl1.id  | tbl2.id  |
+--+--+
| 0| NULL |
| 2| NULL |
| 4| NULL |
| 6| NULL |
| 8| 8|
| 1| NULL |
| 3| NULL |
| 5| 5|
| 7| NULL |
| 9| NULL |
+--+--+

Expected Result:
+--+--+
| tbl1.id  | tbl2.id  |
+--+--+
| 1| 1|
| 3| 3|
| 7| 7|
| 8| 8|
| 9| 9|
| 0| 0|
| 2| 2|
| 4| 4|
| 5| 5|
| 6| 6|
+--+--+
{code}
.

> Data loss occurs when multiple tables are join with different bucket_version
> 
>
> Key: HIVE-22098
> URL: https:/

[jira] [Comment Edited] (HIVE-22098) Data loss occurs when multiple tables are join with different bucket_version

2020-02-22 Thread JithendhiraKumar (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17042715#comment-17042715
 ] 

JithendhiraKumar edited comment on HIVE-22098 at 2/22/20 9:44 PM:
--

[~luguangming] has already mentioned the steps to reproduce *Scenario 1.*

Here are the steps to reproduce *Scenario 2*: 

Input (test_data.csv)
{code:java}
0,Kurt,vulnedca...@yahoo.co.uk
1,Rolland,naej...@gmx.com
2,Cortez,blategarfi...@yahoo.com
3,Tyron,tamepro...@gmail.com
4,Matthew,wellezek...@yahoo.co.uk
5,Jeffrey,fabingeb...@comcast.net
6,Gerard,oughtou...@att.net
7,Hal,coursedma...@hotmail.com
8,Virgil,squintpr...@gmail.com
9,Hector,lewddil...@email.com
{code}
{code:java}
CREATE TABLE `join_test_1`(`id` string, `first` string, `email` string) ROW 
FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' WITH 
SERDEPROPERTIES ('field.delim'=',', 'serialization.format'=',') STORED AS 
INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat' OUTPUTFORMAT 
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' TBLPROPERTIES 
('bucketing_version'='1');

LOAD DATA LOCAL INPATH '/uploads/test_data.csv' OVERWRITE INTO TABLE 
join_test_1;

CREATE TABLE `join_test_2`(`id` string, `first` string, `email` string) ROW 
FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' WITH 
SERDEPROPERTIES ('field.delim'=',', 'serialization.format'=',') STORED AS 
INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat' OUTPUTFORMAT 
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' TBLPROPERTIES 
('bucketing_version'='2');

LOAD DATA LOCAL INPATH '/uploads/test_data.csv' OVERWRITE INTO TABLE 
join_test_2;

Query:
set mapred.reduce.tasks=2;
set hive.auto.convert.join=false;
SELECT * from (SELECT id from join_test_1) as tbl1 LEFT JOIN (SELECT id from 
join_test_2) as tbl2 on tbl1.id = tbl2.id;

OutPut: (Wrong Results/Data Loss)
+--+--+
| tbl1.id  | tbl2.id  |
+--+--+
| 0| NULL |
| 2| NULL |
| 4| NULL |
| 6| NULL |
| 8| 8|
| 1| NULL |
| 3| NULL |
| 5| 5|
| 7| NULL |
| 9| NULL |
+--+--+

Expected Result:
+--+--+
| tbl1.id  | tbl2.id  |
+--+--+
| 1| 1|
| 3| 3|
| 7| 7|
| 8| 8|
| 9| 9|
| 0| 0|
| 2| 2|
| 4| 4|
| 5| 5|
| 6| 6|
+--+--+
{code}
.


was (Author: jithendhir92):
[~luguangming] has already mentioned the steps to reproduce *Scenario 1.*

Here are Steps To Reproduce *Scenario 2*: 

Input (test_data.csv)

 
{code:java}
0,Kurt,vulnedca...@yahoo.co.uk
1,Rolland,naej...@gmx.com
2,Cortez,blategarfi...@yahoo.com
3,Tyron,tamepro...@gmail.com
4,Matthew,wellezek...@yahoo.co.uk
5,Jeffrey,fabingeb...@comcast.net
6,Gerard,oughtou...@att.net
7,Hal,coursedma...@hotmail.com
8,Virgil,squintpr...@gmail.com
9,Hector,lewddil...@email.com
{code}
 

 
{code:java}
CREATE TABLE `join_test_1`(`id` string, `first` string, `email` string) ROW 
FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' WITH 
SERDEPROPERTIES ('field.delim'=',', 'serialization.format'=',') STORED AS 
INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat' OUTPUTFORMAT 
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' TBLPROPERTIES 
('bucketing_version'='1');

LOAD DATA LOCAL INPATH '/uploads/test_data.csv' OVERWRITE INTO TABLE 
join_test_1;

CREATE TABLE `join_test_2`(`id` string, `first` string, `email` string) ROW 
FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' WITH 
SERDEPROPERTIES ('field.delim'=',', 'serialization.format'=',') STORED AS 
INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat' OUTPUTFORMAT 
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' TBLPROPERTIES 
('bucketing_version'='2');

LOAD DATA LOCAL INPATH '/uploads/test_data.csv' OVERWRITE INTO TABLE 
join_test_2;

Query:
set mapred.reduce.tasks=2;
set hive.auto.convert.join=false;
SELECT * from (SELECT id from join_test_1) as tbl1 LEFT JOIN (SELECT id from 
join_test_2) as tbl2 on tbl1.id = tbl2.id;

OutPut: (Wrong Results/Data Loss)
+--+--+
| tbl1.id  | tbl2.id  |
+--+--+
| 0| NULL |
| 2| NULL |
| 4| NULL |
| 6| NULL |
| 8| 8|
| 1| NULL |
| 3| NULL |
| 5| 5|
| 7| NULL |
| 9| NULL |
+--+--+

Expected Result:
+--+--+
| tbl1.id  | tbl2.id  |
+--+--+
| 1| 1|
| 3| 3|
| 7| 7|
| 8| 8|
| 9| 9|
| 0| 0|
| 2| 2|
| 4| 4|
| 5 

[jira] [Issue Comment Deleted] (HIVE-22098) Data loss occurs when multiple tables are join with different bucket_version

2020-02-22 Thread JithendhiraKumar (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

JithendhiraKumar updated HIVE-22098:

Comment: was deleted

(was: Attaching test_data.csv.)

> Data loss occurs when multiple tables are join with different bucket_version
> 
>
> Key: HIVE-22098
> URL: https://issues.apache.org/jira/browse/HIVE-22098
> Project: Hive
>  Issue Type: Bug
>  Components: Operators
>Affects Versions: 3.1.0, 3.1.2
>Reporter: LuGuangMing
>Assignee: LuGuangMing
>Priority: Blocker
>  Labels: data-loss, wrongresults
> Attachments: HIVE-22098.1.patch, image-2019-08-12-18-45-15-771.png, 
> join_test.sql, table_a_data.orc, table_b_data.orc, table_c_data.orc
>
>
> When tables with different bucketing versions are joined and the number of 
> reducers is greater than 2, the result is incorrect (*data loss*).
>  *Scenario 1*: a three-table join. The intermediate result of joining 
> table_a and table_b is recorded as tmp_a_b. When tmp_a_b is joined with the 
> third table (bucketing_version=2, the default for tables created after 
> Hive 3.0.0), tmp_a_b is initialized with bucketVersion=-1, and that is the 
> version the ReduceSinkOperator sees. In its init method, the hash algorithm 
> for the join columns is chosen according to bucketVersion: if bucketVersion 
> is 2 and the operation is not ACID, the new hash algorithm is used; 
> otherwise the old one is used. Because the two sides use different hash 
> algorithms, the same key can be sent to different partitions, so at the 
> Reducer stage rows with the same key are never paired, resulting in data 
> loss.
> *Scenario 2*: create two test tables: create table 
> table_bucketversion_1(col_1 string, col_2 string) TBLPROPERTIES 
> ('bucketing_version'='1'); and create table table_bucketversion_2(col_1 
> string, col_2 string) TBLPROPERTIES ('bucketing_version'='2'). When 
> table_bucketversion_1 is joined with table_bucketversion_2, part of the 
> result data is lost because the bucketing versions differ.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-22098) Data loss occurs when multiple tables are join with different bucket_version

2020-02-22 Thread JithendhiraKumar (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17042719#comment-17042719
 ] 

JithendhiraKumar commented on HIVE-22098:
-

Attaching test_data.csv.

> Data loss occurs when multiple tables are join with different bucket_version
> 
>
> Key: HIVE-22098
> URL: https://issues.apache.org/jira/browse/HIVE-22098
> Project: Hive
>  Issue Type: Bug
>  Components: Operators
>Affects Versions: 3.1.0, 3.1.2
>Reporter: LuGuangMing
>Assignee: LuGuangMing
>Priority: Blocker
>  Labels: data-loss, wrongresults
> Attachments: HIVE-22098.1.patch, image-2019-08-12-18-45-15-771.png, 
> join_test.sql, table_a_data.orc, table_b_data.orc, table_c_data.orc
>
>
> When tables with different bucketing versions are joined and the number of 
> reducers is greater than 2, the result is incorrect (*data loss*).
>  *Scenario 1*: a three-table join. The intermediate result of joining 
> table_a and table_b is recorded as tmp_a_b. When tmp_a_b is joined with the 
> third table (bucketing_version=2, the default for tables created after 
> Hive 3.0.0), tmp_a_b is initialized with bucketVersion=-1, and that is the 
> version the ReduceSinkOperator sees. In its init method, the hash algorithm 
> for the join columns is chosen according to bucketVersion: if bucketVersion 
> is 2 and the operation is not ACID, the new hash algorithm is used; 
> otherwise the old one is used. Because the two sides use different hash 
> algorithms, the same key can be sent to different partitions, so at the 
> Reducer stage rows with the same key are never paired, resulting in data 
> loss.
> *Scenario 2*: create two test tables: create table 
> table_bucketversion_1(col_1 string, col_2 string) TBLPROPERTIES 
> ('bucketing_version'='1'); and create table table_bucketversion_2(col_1 
> string, col_2 string) TBLPROPERTIES ('bucketing_version'='2'). When 
> table_bucketversion_1 is joined with table_bucketversion_2, part of the 
> result data is lost because the bucketing versions differ.
>  
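The mechanism described above, two join branches partitioning keys with different hash functions, can be reproduced outside Hive. In this sketch the two hash functions are arbitrary stand-ins, not Hive's real bucketing v1/v2 implementations:

```python
import hashlib

NUM_REDUCERS = 2

def hash_v1(key: str) -> int:
    # Rolling hash in the style of Java's String.hashCode (stand-in only)
    h = 0
    for ch in key:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    return h

def hash_v2(key: str) -> int:
    # A different function entirely, standing in for the newer algorithm
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

keys = [str(i) for i in range(10)]

# Each side of the join routes its rows with its own hash function.
left_reducer  = {k: hash_v1(k) % NUM_REDUCERS for k in keys}
right_reducer = {k: hash_v2(k) % NUM_REDUCERS for k in keys}

# A key joins successfully only if both sides sent it to the same reducer;
# otherwise the left row meets no match and the join emits NULL for it.
matched = [k for k in keys if left_reducer[k] == right_reducer[k]]
lost    = [k for k in keys if left_reducer[k] != right_reducer[k]]
print("matched:", matched)
print("lost (NULL in the join):", lost)
```

With a consistent hash on both sides `lost` would be empty; mixing two hash functions reproduces the spurious NULL rows shown in the reproduction output.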



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-22098) Data loss occurs when multiple tables are join with different bucket_version

2020-02-22 Thread JithendhiraKumar (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17042715#comment-17042715
 ] 

JithendhiraKumar commented on HIVE-22098:
-

[~luguangming] has already mentioned the steps to reproduce *Scenario 1.*

Here are the steps to reproduce *Scenario 2* (test_data.csv can be found in 
the attachments):

 
{code:java}
CREATE TABLE `join_test_1`(`id` string, `first` string, `email` string) ROW 
FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' WITH 
SERDEPROPERTIES ('field.delim'=',', 'serialization.format'=',') STORED AS 
INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat' OUTPUTFORMAT 
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' TBLPROPERTIES 
('bucketing_version'='1');

LOAD DATA LOCAL INPATH '/uploads/test_data.csv' OVERWRITE INTO TABLE 
join_test_1;

CREATE TABLE `join_test_2`(`id` string, `first` string, `email` string) ROW 
FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' WITH 
SERDEPROPERTIES ('field.delim'=',', 'serialization.format'=',') STORED AS 
INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat' OUTPUTFORMAT 
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' TBLPROPERTIES 
('bucketing_version'='2');

LOAD DATA LOCAL INPATH '/uploads/test_data.csv' OVERWRITE INTO TABLE 
join_test_2;

Query:
set mapred.reduce.tasks=2;
set hive.auto.convert.join=false;
SELECT * from (SELECT id from join_test_1) as tbl1 LEFT JOIN (SELECT id from 
join_test_2) as tbl2 on tbl1.id = tbl2.id;

OutPut: (Wrong Results/Data Loss)
+--+--+
| tbl1.id  | tbl2.id  |
+--+--+
| 0| NULL |
| 2| NULL |
| 4| NULL |
| 6| NULL |
| 8| 8|
| 1| NULL |
| 3| NULL |
| 5| 5|
| 7| NULL |
| 9| NULL |
+--+--+

Expected Result:
+--+--+
| tbl1.id  | tbl2.id  |
+--+--+
| 1| 1|
| 3| 3|
| 7| 7|
| 8| 8|
| 9| 9|
| 0| 0|
| 2| 2|
| 4| 4|
| 5| 5|
| 6| 6|
+--+--+
{code}
.

> Data loss occurs when multiple tables are join with different bucket_version
> 
>
> Key: HIVE-22098
> URL: https://issues.apache.org/jira/browse/HIVE-22098
> Project: Hive
>  Issue Type: Bug
>  Components: Operators
>Affects Versions: 3.1.0, 3.1.2
>Reporter: LuGuangMing
>Assignee: LuGuangMing
>Priority: Blocker
>  Labels: data-loss, wrongresults
> Attachments: HIVE-22098.1.patch, image-2019-08-12-18-45-15-771.png, 
> join_test.sql, table_a_data.orc, table_b_data.orc, table_c_data.orc
>
>
> When tables with different bucketing versions are joined and the number of 
> reducers is greater than 2, the result is incorrect (*data loss*).
>  *Scenario 1*: a three-table join. The intermediate result of joining 
> table_a and table_b is recorded as tmp_a_b. When tmp_a_b is joined with the 
> third table (bucketing_version=2, the default for tables created after 
> Hive 3.0.0), tmp_a_b is initialized with bucketVersion=-1, and that is the 
> version the ReduceSinkOperator sees. In its init method, the hash algorithm 
> for the join columns is chosen according to bucketVersion: if bucketVersion 
> is 2 and the operation is not ACID, the new hash algorithm is used; 
> otherwise the old one is used. Because the two sides use different hash 
> algorithms, the same key can be sent to different partitions, so at the 
> Reducer stage rows with the same key are never paired, resulting in data 
> loss.
> *Scenario 2*: create two test tables: create table 
> table_bucketversion_1(col_1 string, col_2 string) TBLPROPERTIES 
> ('bucketing_version'='1'); and create table table_bucketversion_2(col_1 
> string, col_2 string) TBLPROPERTIES ('bucketing_version'='2'). When 
> table_bucketversion_1 is joined with table_bucketversion_2, part of the 
> result data is lost because the bucketing versions differ.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-22098) Data loss occurs when multiple tables are join with different bucket_version

2020-02-22 Thread JithendhiraKumar (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

JithendhiraKumar updated HIVE-22098:

Description: 
When tables with different bucketing versions are joined and the number of 
reducers is greater than 2, the result is incorrect (*data loss*).
 *Scenario 1*: a three-table join. The intermediate result of joining table_a 
and table_b is recorded as tmp_a_b. When tmp_a_b is joined with the third 
table (bucketing_version=2, the default for tables created after Hive 3.0.0), 
tmp_a_b is initialized with bucketVersion=-1, and that is the version the 
ReduceSinkOperator sees. In its init method, the hash algorithm for the join 
columns is chosen according to bucketVersion: if bucketVersion is 2 and the 
operation is not ACID, the new hash algorithm is used; otherwise the old one 
is used. Because the two sides use different hash algorithms, the same key 
can be sent to different partitions, so at the Reducer stage rows with the 
same key are never paired, resulting in data loss.

*Scenario 2*: create two test tables: create table table_bucketversion_1(col_1 
string, col_2 string) TBLPROPERTIES ('bucketing_version'='1'); and create 
table table_bucketversion_2(col_1 string, col_2 string) TBLPROPERTIES 
('bucketing_version'='2').
 When table_bucketversion_1 is joined with table_bucketversion_2, part of the 
result data is lost because the bucketing versions differ.

 

  was:
When different bucketVersion of tables do join and no of reducers number 
greater than 2, the result is incorrect (*data loss*).
 *Scenario 1*: Three tables join. The temporary result data of table_a in the 
first table and table_b in the second table joins result is recorded as 
tmp_a_b, When it joins with the third table, the bucket_version=2 of the table 
created by default after hive-3.0.0, temporary data tmp_a_b initialized the 
bucketVerison=-1, and then ReduceSinkOperator Verketison=-1 is joined. In the 
init method, the hash algorithm of selecting join column is selected according 
to bucketVersion. If bucketVersion = 2 and is not an acid operation, it will 
acquired the new algorithm of hash. Otherwise, the old algorithm of hash is 
acquired. Because of the inconsistency of the algorithm of hash, the partition 
of data allocation caused are different. At stage of Reducer, Data with the 
same key can not be paired resulting in data loss.

*Scenario 2*: create two test tables, create table table_bucketversion_1(col_1 
string, col_2 string) TBLPROPERTIES ('bucketing_version'='1'); 
table_bucketversion_2(col_1 string, col_2 string) TBLPROPERTIES 
('bucketing_version'='2');
 when use table_bucketversion_1 to join table_bucketversion_2, partial result 
data will be loss due to bucketVerison is different.

 


> Data loss occurs when multiple tables are join with different bucket_version
> 
>
> Key: HIVE-22098
> URL: https://issues.apache.org/jira/browse/HIVE-22098
> Project: Hive
>  Issue Type: Bug
>  Components: Operators
>Affects Versions: 3.1.0, 3.1.2
>Reporter: LuGuangMing
>Assignee: LuGuangMing
>Priority: Blocker
>  Labels: data-loss, wrongresults
> Attachments: HIVE-22098.1.patch, image-2019-08-12-18-45-15-771.png, 
> join_test.sql, table_a_data.orc, table_b_data.orc, table_c_data.orc
>
>
> When tables with different bucketing versions are joined and the number of 
> reducers is greater than 2, the result is incorrect (*data loss*).
>  *Scenario 1*: a three-table join. The intermediate result of joining 
> table_a and table_b is recorded as tmp_a_b. When tmp_a_b is joined with the 
> third table (bucketing_version=2, the default for tables created after 
> Hive 3.0.0), tmp_a_b is initialized with bucketVersion=-1, and that is the 
> version the ReduceSinkOperator sees. In its init method, the hash algorithm 
> for the join columns is chosen according to bucketVersion: if bucketVersion 
> is 2 and the operation is not ACID, the new hash algorithm is used; 
> otherwise the old one is used. Because the two sides use different hash 
> algorithms, the same key can be sent to different partitions, so at the 
> Reducer stage rows with the same key are never paired, resulting in data 
> loss.
> *Scenario 2*: create two test tables: create table 
> table_bucketversion_1(col_1 string, col_2 string) TBLPROPERTIES 
> ('bucketing_version'='1'); and create table table_bucketversion_2(col_1 
> string, col_2 string) TBLPROPERTIES ('bucketing_version'='2'). When 
> table_bucketversion_1 is joined with table_bucketversion_2, part of the 
> result data is lost because the bucketing versions differ.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Commented] (HIVE-22376) Cancelled query still prints exception if it was stuck in waiting for lock

2020-02-22 Thread Hive QA (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17042678#comment-17042678
 ] 

Hive QA commented on HIVE-22376:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12994214/HIVE-22376.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:green}SUCCESS:{color} +1 due to 18056 tests passed

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-Build/20787/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/20787/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-20787/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12994214 - PreCommit-HIVE-Build

> Cancelled query still prints exception if it was stuck in waiting for lock
> --
>
> Key: HIVE-22376
> URL: https://issues.apache.org/jira/browse/HIVE-22376
> Project: Hive
>  Issue Type: Improvement
>  Components: Locking
>Affects Versions: 3.1.2
>Reporter: Peter Vary
>Assignee: Aron Hamvas
>Priority: Major
> Attachments: HIVE-22376.patch
>
>
> The query waits for locks and is then cancelled.
> It prints the following to the logs, which is unnecessary and misleading:
> {code}
> apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:326)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1876)
>   at 
> org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:344)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: NoSuchLockException(message:No such lock lockid:272)
>   at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$check_lock_result$check_lock_resultStandardScheme.read(ThriftHiveMetastore.java)
>   at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$check_lock_result$check_lock_resultStandardScheme.read(ThriftHiveMetastore.java)
>   at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$check_lock_result.read(ThriftHiveMetastore.java)
>   at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:86)
>   at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_check_lock(ThriftHiveMetastore.java:5730)
>   at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.check_lock(ThriftHiveMetastore.java:5717)
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.checkLock(HiveMetaStoreClient.java:3128)
>   at sun.reflect.GeneratedMethodAccessor351.invoke(Unknown Source)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:212)
>   at com.sun.proxy.$Proxy59.checkLock(Unknown Source)
>   at sun.reflect.GeneratedMethodAccessor351.invoke(Unknown Source)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient$SynchronizedHandler.invoke(HiveMetaStoreClient.java:)
>   at com.sun.proxy.$Proxy59.checkLock(Unknown Source)
>   at 
> org.apache.hadoop.hive.ql.lockmgr.DbLockManager.lock(DbLockManager.java:115)
>   ... 25 more
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (HIVE-22098) Data loss occurs when multiple tables are join with different bucket_version

2020-02-22 Thread JithendhiraKumar (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17040749#comment-17040749
 ] 

JithendhiraKumar edited comment on HIVE-22098 at 2/22/20 6:43 PM:
--

Hi, we are stuck with a similar issue (scenario 2) after upgrading from 2.3 to 
3.1.2. The tables created in Hive 2.3 have no bucketing version in their table 
properties, but new tables created in Hive 3.1.2 are created with 
bucketing_version 2. When we do a left join between these old and new tables, 
in a few cases the results are wrong.

[~djaiswal] [~jdere]  Can you guys please review this patch?


was (Author: jithendhir92):
Hi, we are stuck with similar issue, after upgrading from 2.3 to 3.1.2. The 
tables which were created in Hive 2.3 has no bucketing version in its table 
properties, but when new tables are created in Hive 3.1.2 they are created with 
bucketing_version 2. When we do a left join between these old and new tables, 
in a few cases the results are wrong.

[~djaiswal] [~jdere]  Can you guys please review this patch?

> Data loss occurs when multiple tables are join with different bucket_version
> 
>
> Key: HIVE-22098
> URL: https://issues.apache.org/jira/browse/HIVE-22098
> Project: Hive
>  Issue Type: Bug
>  Components: Operators
>Affects Versions: 3.1.0, 3.1.2
>Reporter: LuGuangMing
>Assignee: LuGuangMing
>Priority: Blocker
>  Labels: data-loss, wrongresults
> Attachments: HIVE-22098.1.patch, image-2019-08-12-18-45-15-771.png, 
> join_test.sql, table_a_data.orc, table_b_data.orc, table_c_data.orc
>
>
> When tables with different bucketing versions are joined and the number of 
> reducers is greater than 2, the result is incorrect (*data loss*).
>  *Scenario 1*: a three-table join. The intermediate result of joining 
> table_a and table_b is recorded as tmp_a_b. When tmp_a_b is joined with the 
> third table (tables created after hive-3.0.0 default to bucketing_version=2), 
> tmp_a_b is initialized with bucketVersion=-1, and the join runs through a 
> ReduceSinkOperator with bucketVersion=-1. In the init method, the hash 
> algorithm for the join columns is chosen according to bucketVersion: if 
> bucketVersion == 2 and the operation is not an ACID operation, the new hash 
> algorithm is used; otherwise the old one is used. Because the two sides use 
> different hash algorithms, rows are assigned to different partitions, so at 
> the Reducer stage rows with the same key cannot be paired, resulting in 
> data loss.
> *Scenario 2*: create two test tables:
> create table table_bucketversion_1(col_1 string, col_2 string) TBLPROPERTIES 
> ('bucketing_version'='1');
> create table table_bucketversion_2(col_1 string, col_2 string) TBLPROPERTIES 
> ('bucketing_version'='2');
>  When table_bucketversion_1 is joined with table_bucketversion_2, partial 
> result data is lost because the bucketing versions differ.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-22098) Data loss occurs when multiple tables are join with different bucket_version

2020-02-22 Thread JithendhiraKumar (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

JithendhiraKumar updated HIVE-22098:

Affects Version/s: 3.1.2

> Data loss occurs when multiple tables are join with different bucket_version
> 
>
> Key: HIVE-22098
> URL: https://issues.apache.org/jira/browse/HIVE-22098
> Project: Hive
>  Issue Type: Bug
>  Components: Operators
>Affects Versions: 3.1.0, 3.1.2
>Reporter: LuGuangMing
>Assignee: LuGuangMing
>Priority: Blocker
>  Labels: data-loss, wrongresults
> Attachments: HIVE-22098.1.patch, image-2019-08-12-18-45-15-771.png, 
> join_test.sql, table_a_data.orc, table_b_data.orc, table_c_data.orc
>
>
> When tables with different bucketing versions are joined and the number of 
> reducers is greater than 2, the result is incorrect (*data loss*).
>  *Scenario 1*: a three-table join. The intermediate result of joining 
> table_a and table_b is recorded as tmp_a_b. When tmp_a_b is joined with the 
> third table (tables created after hive-3.0.0 default to bucketing_version=2), 
> tmp_a_b is initialized with bucketVersion=-1, and the join runs through a 
> ReduceSinkOperator with bucketVersion=-1. In the init method, the hash 
> algorithm for the join columns is chosen according to bucketVersion: if 
> bucketVersion == 2 and the operation is not an ACID operation, the new hash 
> algorithm is used; otherwise the old one is used. Because the two sides use 
> different hash algorithms, rows are assigned to different partitions, so at 
> the Reducer stage rows with the same key cannot be paired, resulting in 
> data loss.
> *Scenario 2*: create two test tables:
> create table table_bucketversion_1(col_1 string, col_2 string) TBLPROPERTIES 
> ('bucketing_version'='1');
> create table table_bucketversion_2(col_1 string, col_2 string) TBLPROPERTIES 
> ('bucketing_version'='2');
>  When table_bucketversion_1 is joined with table_bucketversion_2, partial 
> result data is lost because the bucketing versions differ.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-22098) Data loss occurs when multiple tables are join with different bucket_version

2020-02-22 Thread JithendhiraKumar (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

JithendhiraKumar updated HIVE-22098:

Labels: data-loss wrongresults  (was: )

> Data loss occurs when multiple tables are join with different bucket_version
> 
>
> Key: HIVE-22098
> URL: https://issues.apache.org/jira/browse/HIVE-22098
> Project: Hive
>  Issue Type: Bug
>  Components: Operators
>Affects Versions: 3.1.0
>Reporter: LuGuangMing
>Assignee: LuGuangMing
>Priority: Blocker
>  Labels: data-loss, wrongresults
> Attachments: HIVE-22098.1.patch, image-2019-08-12-18-45-15-771.png, 
> join_test.sql, table_a_data.orc, table_b_data.orc, table_c_data.orc
>
>
> When tables with different bucketing versions are joined and the number of 
> reducers is greater than 2, the result is incorrect (*data loss*).
>  *Scenario 1*: a three-table join. The intermediate result of joining 
> table_a and table_b is recorded as tmp_a_b. When tmp_a_b is joined with the 
> third table (tables created after hive-3.0.0 default to bucketing_version=2), 
> tmp_a_b is initialized with bucketVersion=-1, and the join runs through a 
> ReduceSinkOperator with bucketVersion=-1. In the init method, the hash 
> algorithm for the join columns is chosen according to bucketVersion: if 
> bucketVersion == 2 and the operation is not an ACID operation, the new hash 
> algorithm is used; otherwise the old one is used. Because the two sides use 
> different hash algorithms, rows are assigned to different partitions, so at 
> the Reducer stage rows with the same key cannot be paired, resulting in 
> data loss.
> *Scenario 2*: create two test tables:
> create table table_bucketversion_1(col_1 string, col_2 string) TBLPROPERTIES 
> ('bucketing_version'='1');
> create table table_bucketversion_2(col_1 string, col_2 string) TBLPROPERTIES 
> ('bucketing_version'='2');
>  When table_bucketversion_1 is joined with table_bucketversion_2, partial 
> result data is lost because the bucketing versions differ.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-22098) Data loss occurs when multiple tables are join with different bucket_version

2020-02-22 Thread JithendhiraKumar (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

JithendhiraKumar updated HIVE-22098:

Description: 
When tables with different bucketing versions are joined and the number of 
reducers is greater than 2, the result is incorrect (*data loss*).
 *Scenario 1*: a three-table join. The intermediate result of joining table_a 
and table_b is recorded as tmp_a_b. When tmp_a_b is joined with the third 
table (tables created after hive-3.0.0 default to bucketing_version=2), 
tmp_a_b is initialized with bucketVersion=-1, and the join runs through a 
ReduceSinkOperator with bucketVersion=-1. In the init method, the hash 
algorithm for the join columns is chosen according to bucketVersion: if 
bucketVersion == 2 and the operation is not an ACID operation, the new hash 
algorithm is used; otherwise the old one is used. Because the two sides use 
different hash algorithms, rows are assigned to different partitions, so at 
the Reducer stage rows with the same key cannot be paired, resulting in data 
loss.

*Scenario 2*: create two test tables:
create table table_bucketversion_1(col_1 string, col_2 string) TBLPROPERTIES 
('bucketing_version'='1');
create table table_bucketversion_2(col_1 string, col_2 string) TBLPROPERTIES 
('bucketing_version'='2');
 When table_bucketversion_1 is joined with table_bucketversion_2, partial 
result data is lost because the bucketing versions differ.

 

  was:
When different bucketVersion of tables do join and  reducers number greater 
than 2, result is easy to lose data.
*Scenario 1*: Three tables join. The temporary result data of table_a in the 
first table and table_b in the second table joins result is recorded as 
tmp_a_b, When it joins with the third table, the bucket_version=2 of the table 
created by default after hive-3.0.0, temporary data tmp_a_b initialized the 
bucketVerison=-1, and then ReduceSinkOperator Verketison=-1 is joined. In the 
init method, the hash algorithm of selecting join column is selected according 
to bucketVersion. If bucketVersion = 2 and is not an acid operation, it will 
acquired the new algorithm of hash. Otherwise, the old algorithm of hash is 
acquired. Because of the inconsistency of the algorithm of hash, the partition 
of data allocation caused are different. At stage of Reducer, Data with the 
same key can not be paired resulting in data loss.

*Scenario 2*: create two test tables, create table table_bucketversion_1(col_1 
string, col_2 string) TBLPROPERTIES ('bucketing_version'='1'); 
table_bucketversion_2(col_1 string, col_2 string) TBLPROPERTIES 
('bucketing_version'='2');
when use table_bucketversion_1 to join table_bucketversion_2, partial result 
data will be loss due to bucketVerison is different.

 


> Data loss occurs when multiple tables are join with different bucket_version
> 
>
> Key: HIVE-22098
> URL: https://issues.apache.org/jira/browse/HIVE-22098
> Project: Hive
>  Issue Type: Bug
>  Components: Operators
>Affects Versions: 3.1.0
>Reporter: LuGuangMing
>Assignee: LuGuangMing
>Priority: Blocker
> Attachments: HIVE-22098.1.patch, image-2019-08-12-18-45-15-771.png, 
> join_test.sql, table_a_data.orc, table_b_data.orc, table_c_data.orc
>
>
> When tables with different bucketing versions are joined and the number of 
> reducers is greater than 2, the result is incorrect (*data loss*).
>  *Scenario 1*: a three-table join. The intermediate result of joining 
> table_a and table_b is recorded as tmp_a_b. When tmp_a_b is joined with the 
> third table (tables created after hive-3.0.0 default to bucketing_version=2), 
> tmp_a_b is initialized with bucketVersion=-1, and the join runs through a 
> ReduceSinkOperator with bucketVersion=-1. In the init method, the hash 
> algorithm for the join columns is chosen according to bucketVersion: if 
> bucketVersion == 2 and the operation is not an ACID operation, the new hash 
> algorithm is used; otherwise the old one is used. Because the two sides use 
> different hash algorithms, rows are assigned to different partitions, so at 
> the Reducer stage rows with the same key cannot be paired, resulting in 
> data loss.
> *Scenario 2*: create two test tables:
> create table table_bucketversion_1(col_1 string, col_2 string) TBLPROPERTIES 
> ('bucketing_version'='1');
> create table table_bucketversion_2(col_1 string, col_2 string) TBLPROPERTIES 
> ('bucketing_version'='2');
>  When table_bucketversion_1 is joined with table_bucketversion_2, partial 
> result data is lost because the bucketing versions differ.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-22376) Cancelled query still prints exception if it was stuck in waiting for lock

2020-02-22 Thread Hive QA (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17042656#comment-17042656
 ] 

Hive QA commented on HIVE-22376:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
1s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  9m 
33s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
5s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
42s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  3m 
55s{color} | {color:blue} ql in master has 1530 extant Findbugs warnings. 
{color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
54s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
24s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
2s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m  
2s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
42s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 
45s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
54s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red}  0m 
15s{color} | {color:red} The patch generated 1 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 24m 45s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests |  asflicense  javac  javadoc  findbugs  checkstyle  compile  |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 
3.16.43-2+deb8u5 (2017-09-19) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/data/hiveptest/working/yetus_PreCommit-HIVE-Build-20787/dev-support/hive-personality.sh
 |
| git revision | master / 6c3ee53 |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.1 |
| asflicense | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-20787/yetus/patch-asflicense-problems.txt
 |
| modules | C: ql U: ql |
| Console output | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-20787/yetus.txt |
| Powered by | Apache Yetus http://yetus.apache.org |


This message was automatically generated.



> Cancelled query still prints exception if it was stuck in waiting for lock
> --
>
> Key: HIVE-22376
> URL: https://issues.apache.org/jira/browse/HIVE-22376
> Project: Hive
>  Issue Type: Improvement
>  Components: Locking
>Affects Versions: 3.1.2
>Reporter: Peter Vary
>Assignee: Aron Hamvas
>Priority: Major
> Attachments: HIVE-22376.patch
>
>
> The query waits for locks and is then cancelled.
> It prints the following to the logs, which is unnecessary and misleading:
> {code}
> apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:326)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1876)
>   at 
> org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:344)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread

[jira] [Commented] (HIVE-22893) Enhance data size estimation for fields computed by UDFs

2020-02-22 Thread Hive QA (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17042647#comment-17042647
 ] 

Hive QA commented on HIVE-22893:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12994213/HIVE-22893.11.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 13 failed/errored test(s), 18056 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[unionall_unbalancedppd] 
(batchId=3)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_case_when_1] 
(batchId=98)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[unionDistinct_1] 
(batchId=161)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_case_when_1]
 (batchId=191)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[dynamic_rdd_cache]
 (batchId=200)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[groupby3_map_skew] 
(batchId=144)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[groupby7_map] 
(batchId=119)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[groupby7_map_skew] 
(batchId=137)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[groupby8_map_skew] 
(batchId=140)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[groupby9] 
(batchId=120)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[groupby_position] 
(batchId=135)
org.apache.hadoop.hive.cli.TestTezPerfConstraintsCliDriver.testCliDriver[cbo_query23]
 (batchId=305)
org.apache.hadoop.hive.cli.TestTezPerfConstraintsCliDriver.testCliDriver[query23]
 (batchId=305)
{noformat}

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-Build/20786/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/20786/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-20786/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 13 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12994213 - PreCommit-HIVE-Build

> Enhance data size estimation for fields computed by UDFs
> 
>
> Key: HIVE-22893
> URL: https://issues.apache.org/jira/browse/HIVE-22893
> Project: Hive
>  Issue Type: Improvement
>  Components: Statistics
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-22893.01.patch, HIVE-22893.02.patch, 
> HIVE-22893.03.patch, HIVE-22893.04.patch, HIVE-22893.05.patch, 
> HIVE-22893.06.patch, HIVE-22893.07.patch, HIVE-22893.08.patch, 
> HIVE-22893.09.patch, HIVE-22893.10.patch, HIVE-22893.11.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Right now, if we have column statistics on a column, we use them to estimate 
> things about the column; however, if a UDF is executed on the column, the 
> resulting column is treated as unknown and defaults are assumed.
> An improvement could be to give wide estimations for frequently used UDFs.
> For example, consider {{substr(c,1,1)}}: no matter what the input is, the 
> output is at most a 1-character string
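
The substr case can be sketched as a simple output-size estimator. This is an illustrative sketch of the idea only; the function name, signature, and argument handling are hypothetical and do not mirror Hive's actual UDFSubstr stat-estimator API:

```python
def estimate_substr_avg_len(col_avg_len, start, length=None):
    """Upper-bound the average output length of substr(col, start, length).

    Illustrates the HIVE-22893 idea: substr(c, 1, 1) can never produce a
    string longer than 1, regardless of the input column statistics, so the
    optimizer need not fall back to defaults for the derived column.
    """
    if length is not None:
        # explicit length caps the output, but it can never exceed the input
        return min(col_avg_len, max(length, 0))
    if start > 0:
        # no length given: only the first start-1 characters are dropped
        return max(col_avg_len - (start - 1), 0)
    return col_avg_len

# substr(c, 1, 1) on a column averaging 100 chars -> at most 1 char
print(estimate_substr_avg_len(100, 1, 1))  # -> 1
```

A wide estimate like this is still an upper bound, not an exact statistic, but it is far tighter than the "unknown column" default.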



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-22893) Enhance data size estimation for fields computed by UDFs

2020-02-22 Thread Hive QA (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17042619#comment-17042619
 ] 

Hive QA commented on HIVE-22893:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m 
38s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  7m 
56s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
37s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
 7s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m 
35s{color} | {color:blue} common in master has 63 extant Findbugs warnings. 
{color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  3m 
49s{color} | {color:blue} ql in master has 1530 extant Findbugs warnings. 
{color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m 
27s{color} | {color:blue} contrib in master has 11 extant Findbugs warnings. 
{color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
22s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
28s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
59s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
36s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m 
36s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  0m 
42s{color} | {color:red} ql: The patch generated 6 new + 127 unchanged - 0 
fixed = 133 total (was 127) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  4m  
0s{color} | {color:red} ql generated 1 new + 1530 unchanged - 0 fixed = 1531 
total (was 1530) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
21s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red}  0m 
14s{color} | {color:red} The patch generated 2 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 31m 44s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| FindBugs | module:ql |
|  |  Dead store to start in 
org.apache.hadoop.hive.ql.udf.UDFSubstr$SubStrStatEstimator.estimate(List)  At 
UDFSubstr.java:org.apache.hadoop.hive.ql.udf.UDFSubstr$SubStrStatEstimator.estimate(List)
  At UDFSubstr.java:[line 157] |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests |  asflicense  javac  javadoc  findbugs  checkstyle  compile  |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 
3.16.43-2+deb8u5 (2017-09-19) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/data/hiveptest/working/yetus_PreCommit-HIVE-Build-20786/dev-support/hive-personality.sh
 |
| git revision | master / 6c3ee53 |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.1 |
| checkstyle | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-20786/yetus/diff-checkstyle-ql.txt
 |
| findbugs | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-20786/yetus/new-findbugs-ql.html
 |
| asflicense | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-20786/yetus/patch-asflicense-problems.txt
 |
| modules | C: common ql contrib U: . |
| Console output | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-20786/yetus.txt |
| Powered by | Apache Yetus http://yetus.apache.org |


This message was automatically generated.



> Enhance data size estimation for fields computed by UDFs
> 
>
> Key: HIVE-22893
> URL: https://issues.apache.org/jira/browse/HIVE-22893
> Project: Hive
>  Issue Type: Improvement
>  Components: Statistics
>Reporter: Zoltan Haindrich
>Assignee: Zoltan H

[jira] [Commented] (HIVE-21304) Show Bucketing version for ReduceSinkOp in explain extended plan

2020-02-22 Thread Hive QA (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-21304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17042610#comment-17042610
 ] 

Hive QA commented on HIVE-21304:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12994211/HIVE-21304.15.patch

{color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 18056 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[bucket4] 
(batchId=188)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[disable_merge_for_bucketing]
 (batchId=188)
{noformat}

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-Build/20785/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/20785/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-20785/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12994211 - PreCommit-HIVE-Build

> Show Bucketing version for ReduceSinkOp in explain extended plan
> 
>
> Key: HIVE-21304
> URL: https://issues.apache.org/jira/browse/HIVE-21304
> Project: Hive
>  Issue Type: Bug
>Reporter: Deepak Jaiswal
>Assignee: Zoltan Haindrich
>Priority: Major
> Attachments: HIVE-21304.01.patch, HIVE-21304.02.patch, 
> HIVE-21304.03.patch, HIVE-21304.04.patch, HIVE-21304.05.patch, 
> HIVE-21304.06.patch, HIVE-21304.07.patch, HIVE-21304.08.patch, 
> HIVE-21304.09.patch, HIVE-21304.10.patch, HIVE-21304.11.patch, 
> HIVE-21304.12.patch, HIVE-21304.13.patch, HIVE-21304.14.patch, 
> HIVE-21304.15.patch
>
>
> Show Bucketing version for ReduceSinkOp in explain extended plan.
> This helps identify what hashing algorithm is being used by ReduceSinkOp.
>  
> cc [~vgarg]
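As a sketch of why surfacing the bucketing version matters: Hive has two bucketing versions with different hash functions, so the same key can land in different buckets depending on the version. The two hash functions below are purely illustrative stand-ins (the real ones are Hive's legacy hash and Murmur3), not Hive's actual implementation.

```java
// Illustrative sketch only: two stand-in hash functions play the role of
// Hive's two bucketing versions. The point is that the same key can map to
// different buckets depending on which version a ReduceSinkOp uses, which
// is why showing the version in the explain plan helps debugging.
public class BucketVersionDemo {

    // stand-in for bucketing version 1 (NOT Hive's actual function)
    static int bucketV1(String key, int numBuckets) {
        return (key.hashCode() & Integer.MAX_VALUE) % numBuckets;
    }

    // stand-in for bucketing version 2 (NOT Hive's actual Murmur3)
    static int bucketV2(String key, int numBuckets) {
        int h = 0;
        for (char c : key.toCharArray()) {
            h = h * 131 + c;
        }
        return (h & Integer.MAX_VALUE) % numBuckets;
    }

    public static void main(String[] args) {
        String key = "example-key";
        System.out.println("v1 bucket: " + bucketV1(key, 8));
        System.out.println("v2 bucket: " + bucketV2(key, 8));
    }
}
```

If a table written with one version is read by an operator hashing with the other, rows silently land in the wrong buckets; making the version visible in `EXPLAIN EXTENDED` exposes such mismatches.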



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-22891) Skip PartitonDesc Extraction In CombineHiveRecord For Non-LLAP Execution Mode

2020-02-22 Thread Syed Shameerur Rahman (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17042609#comment-17042609
 ] 

Syed Shameerur Rahman commented on HIVE-22891:
--

[~szita] Gentle reminder...

> Skip PartitonDesc Extraction In CombineHiveRecord For Non-LLAP Execution Mode
> -
>
> Key: HIVE-22891
> URL: https://issues.apache.org/jira/browse/HIVE-22891
> Project: Hive
>  Issue Type: Task
>Reporter: Syed Shameerur Rahman
>Assignee: Syed Shameerur Rahman
>Priority: Major
> Fix For: 4.0.0
>
> Attachments: HIVE-22891.01.patch, HIVE-22891.02.patch, 
> HIVE-22891.03.patch
>
>
> {code:java}
> try {
>   // TODO: refactor this out
>   if (pathToPartInfo == null) {
> MapWork mrwork;
> if (HiveConf.getVar(conf, 
> HiveConf.ConfVars.HIVE_EXECUTION_ENGINE).equals("tez")) {
>   mrwork = (MapWork) Utilities.getMergeWork(jobConf);
>   if (mrwork == null) {
> mrwork = Utilities.getMapWork(jobConf);
>   }
> } else {
>   mrwork = Utilities.getMapWork(jobConf);
> }
> pathToPartInfo = mrwork.getPathToPartitionInfo();
>   }
>   PartitionDesc part = extractSinglePartSpec(hsplit);
>   inputFormat = HiveInputFormat.wrapForLlap(inputFormat, jobConf, part);
> } catch (HiveException e) {
>   throw new IOException(e);
> }
> {code}
> The above piece of code in CombineHiveRecordReader.java was introduced in 
> HIVE-15147. This overwrites inputFormat based on the PartitionDesc which is 
> not required in non-LLAP mode of execution as the method 
> HiveInputFormat.wrapForLlap() simply returns the previously defined 
> inputFormat in case of non-LLAP mode. The method call extractSinglePartSpec() 
> has some serious performance implications. If there is a large number of small 
> files, each call to extractSinglePartSpec() takes approximately 2-3 
> seconds. Hence the same query that runs in Hive 1.x / Hive 2 is much faster 
> than the same query run on the latest Hive.
> {code:java}
> 2020-02-11 07:15:04,701 INFO [main] 
> org.apache.hadoop.hive.ql.io.orc.ReaderImpl: Reading ORC rows from 
> 2020-02-11 07:15:06,468 WARN [main] 
> org.apache.hadoop.hive.ql.io.CombineHiveRecordReader: Multiple partitions 
> found; not going to pass a part spec to LLAP IO: {{logdate=2020-02-03, 
> hour=01, event=win}} and {{logdate=2020-02-03, hour=02, event=act}}
> 2020-02-11 07:15:06,468 INFO [main] 
> org.apache.hadoop.hive.ql.io.CombineHiveRecordReader: succeeded in getting 
> org.apache.hadoop.mapred.FileSplit{code}
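A minimal sketch of the optimization described above: pay for the expensive partition lookup only when the result can actually be used (LLAP mode), since the description states that wrapForLlap() simply returns the input format unchanged otherwise. All names and behaviors here are illustrative stand-ins, not Hive's real API or the actual patch.

```java
// Sketch: skip the costly partition-spec extraction when not in LLAP mode,
// because the subsequent wrapping step is a no-op in that case.
public class SkipPartLookup {
    static int lookups = 0; // counts how often the expensive path runs

    static String expensivePartitionLookup() {
        lookups++; // stands in for the seconds-long extractSinglePartSpec() call
        return "partSpec";
    }

    static String wrapForLlap(String inputFormat, boolean llapMode, String part) {
        // mirrors the observation that wrapping is a no-op outside LLAP
        return llapMode ? inputFormat + "+llap(" + part + ")" : inputFormat;
    }

    static String wrap(String inputFormat, boolean llapMode) {
        // proposed shape of the fix: only do the lookup when it will be used
        String part = llapMode ? expensivePartitionLookup() : null;
        return wrapForLlap(inputFormat, llapMode, part);
    }

    public static void main(String[] args) {
        System.out.println(wrap("orc", false));
        System.out.println(wrap("orc", true));
    }
}
```

With many small splits, avoiding the lookup on the non-LLAP path removes a per-split cost without changing the returned input format.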



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-22900) Predicate Push Down Of Like Filter While Fetching Partition Data From MetaStore

2020-02-22 Thread Syed Shameerur Rahman (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17042608#comment-17042608
 ] 

Syed Shameerur Rahman commented on HIVE-22900:
--

[~jcamachorodriguez] Can you please review?

> Predicate Push Down Of Like Filter While Fetching Partition Data From 
> MetaStore
> ---
>
> Key: HIVE-22900
> URL: https://issues.apache.org/jira/browse/HIVE-22900
> Project: Hive
>  Issue Type: New Feature
>Reporter: Syed Shameerur Rahman
>Assignee: Syed Shameerur Rahman
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: HIVE-22900.01.patch, HIVE-22900.02.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently, PPD is disabled for the LIKE filter while fetching partition data 
> from the metastore. The following patch covers all the test cases mentioned 
> in HIVE-5134
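The general idea behind pushing a LIKE filter down to partition pruning can be sketched as translating the SQL LIKE pattern (`%` matches any run of characters, `_` matches one character) into a regex and matching partition names against it. This is only an illustration of the technique; Hive's real translation lives in the metastore and may differ in details.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Sketch: convert a SQL LIKE pattern into a regex and use it to filter
// partition names, so non-matching partitions are never fetched.
public class LikePushdownSketch {

    static Pattern likeToRegex(String like) {
        StringBuilder sb = new StringBuilder();
        for (char c : like.toCharArray()) {
            if (c == '%') {
                sb.append(".*");      // '%' matches any run of characters
            } else if (c == '_') {
                sb.append('.');       // '_' matches exactly one character
            } else {
                sb.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(sb.toString());
    }

    static List<String> prune(List<String> partNames, String like) {
        Pattern p = likeToRegex(like);
        return partNames.stream()
            .filter(name -> p.matcher(name).matches())
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> parts = List.of("logdate=2020-02-03", "logdate=2020-03-01");
        System.out.println(prune(parts, "logdate=2020-02%"));
    }
}
```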



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-21304) Show Bucketing version for ReduceSinkOp in explain extended plan

2020-02-22 Thread Hive QA (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-21304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17042594#comment-17042594
 ] 

Hive QA commented on HIVE-21304:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
54s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  8m 
55s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
33s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
 5s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  3m 
56s{color} | {color:blue} ql in master has 1530 extant Findbugs warnings. 
{color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
15s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
28s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
57s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
34s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m 
34s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  0m 
54s{color} | {color:red} ql: The patch generated 2 new + 891 unchanged - 6 
fixed = 893 total (was 897) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 
58s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
11s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red}  0m 
14s{color} | {color:red} The patch generated 1 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 29m  4s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests |  asflicense  javac  javadoc  findbugs  checkstyle  compile  |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 
3.16.43-2+deb8u5 (2017-09-19) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/data/hiveptest/working/yetus_PreCommit-HIVE-Build-20785/dev-support/hive-personality.sh
 |
| git revision | master / 6c3ee53 |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.1 |
| checkstyle | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-20785/yetus/diff-checkstyle-ql.txt
 |
| asflicense | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-20785/yetus/patch-asflicense-problems.txt
 |
| modules | C: ql itests/hive-blobstore U: . |
| Console output | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-20785/yetus.txt |
| Powered by | Apache Yetus http://yetus.apache.org |


This message was automatically generated.



> Show Bucketing version for ReduceSinkOp in explain extended plan
> 
>
> Key: HIVE-21304
> URL: https://issues.apache.org/jira/browse/HIVE-21304
> Project: Hive
>  Issue Type: Bug
>Reporter: Deepak Jaiswal
>Assignee: Zoltan Haindrich
>Priority: Major
> Attachments: HIVE-21304.01.patch, HIVE-21304.02.patch, 
> HIVE-21304.03.patch, HIVE-21304.04.patch, HIVE-21304.05.patch, 
> HIVE-21304.06.patch, HIVE-21304.07.patch, HIVE-21304.08.patch, 
> HIVE-21304.09.patch, HIVE-21304.10.patch, HIVE-21304.11.patch, 
> HIVE-21304.12.patch, HIVE-21304.13.patch, HIVE-21304.14.patch, 
> HIVE-21304.15.patch
>
>
> Show Bucketing version for ReduceSinkOp in explain extended plan.
> This helps identify what hashing algorithm is being used by ReduceSinkOp.
>  
> cc [~vgarg]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-22376) Cancelled query still prints exception if it was stuck in waiting for lock

2020-02-22 Thread Aron Hamvas (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aron Hamvas updated HIVE-22376:
---
Attachment: HIVE-22376.patch
Status: Patch Available  (was: Open)

> Cancelled query still prints exception if it was stuck in waiting for lock
> --
>
> Key: HIVE-22376
> URL: https://issues.apache.org/jira/browse/HIVE-22376
> Project: Hive
>  Issue Type: Improvement
>  Components: Locking
>Affects Versions: 3.1.2
>Reporter: Peter Vary
>Assignee: Aron Hamvas
>Priority: Major
> Attachments: HIVE-22376.patch
>
>
> The query waits for locks, then is cancelled.
> It prints this to the logs, which is unnecessary and misleading:
> {code}
> apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:326)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1876)
>   at 
> org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:344)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: NoSuchLockException(message:No such lock lockid:272)
>   at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$check_lock_result$check_lock_resultStandardScheme.read(ThriftHiveMetastore.java)
>   at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$check_lock_result$check_lock_resultStandardScheme.read(ThriftHiveMetastore.java)
>   at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$check_lock_result.read(ThriftHiveMetastore.java)
>   at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:86)
>   at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_check_lock(ThriftHiveMetastore.java:5730)
>   at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.check_lock(ThriftHiveMetastore.java:5717)
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.checkLock(HiveMetaStoreClient.java:3128)
>   at sun.reflect.GeneratedMethodAccessor351.invoke(Unknown Source)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:212)
>   at com.sun.proxy.$Proxy59.checkLock(Unknown Source)
>   at sun.reflect.GeneratedMethodAccessor351.invoke(Unknown Source)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient$SynchronizedHandler.invoke(HiveMetaStoreClient.java:)
>   at com.sun.proxy.$Proxy59.checkLock(Unknown Source)
>   at 
> org.apache.hadoop.hive.ql.lockmgr.DbLockManager.lock(DbLockManager.java:115)
>   ... 25 more
> {code}
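A sketch of the fix direction: when the query has already been cancelled, a lock-acquisition failure is the expected outcome of interrupting the wait, so it should not be reported as an error with a full stack trace. The state flag and method names below are illustrative assumptions, not the actual patch.

```java
// Sketch: consult the query's cancellation state before deciding how loudly
// to report a lock failure such as NoSuchLockException.
public class CancelAwareLockFailure {

    enum QueryState { RUNNING, CANCELLED }

    static String report(QueryState state, Exception cause) {
        if (state == QueryState.CANCELLED) {
            // expected when a lock wait is interrupted by cancellation:
            // log quietly, without the misleading stack trace
            return "DEBUG: lock wait ended because the query was cancelled";
        }
        // genuine failure: keep the loud error
        return "ERROR: " + cause.getMessage();
    }

    public static void main(String[] args) {
        Exception e = new Exception("No such lock lockid:272");
        System.out.println(report(QueryState.CANCELLED, e));
        System.out.println(report(QueryState.RUNNING, e));
    }
}
```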



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-22376) Cancelled query still prints exception if it was stuck in waiting for lock

2020-02-22 Thread Aron Hamvas (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aron Hamvas updated HIVE-22376:
---
Attachment: (was: HIVE-22376.patch)

> Cancelled query still prints exception if it was stuck in waiting for lock
> --
>
> Key: HIVE-22376
> URL: https://issues.apache.org/jira/browse/HIVE-22376
> Project: Hive
>  Issue Type: Improvement
>  Components: Locking
>Affects Versions: 3.1.2
>Reporter: Peter Vary
>Assignee: Aron Hamvas
>Priority: Major
> Attachments: HIVE-22376.patch
>
>
> The query waits for locks, then is cancelled.
> It prints this to the logs, which is unnecessary and misleading:
> {code}
> apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:326)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1876)
>   at 
> org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:344)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: NoSuchLockException(message:No such lock lockid:272)
>   at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$check_lock_result$check_lock_resultStandardScheme.read(ThriftHiveMetastore.java)
>   at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$check_lock_result$check_lock_resultStandardScheme.read(ThriftHiveMetastore.java)
>   at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$check_lock_result.read(ThriftHiveMetastore.java)
>   at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:86)
>   at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_check_lock(ThriftHiveMetastore.java:5730)
>   at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.check_lock(ThriftHiveMetastore.java:5717)
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.checkLock(HiveMetaStoreClient.java:3128)
>   at sun.reflect.GeneratedMethodAccessor351.invoke(Unknown Source)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:212)
>   at com.sun.proxy.$Proxy59.checkLock(Unknown Source)
>   at sun.reflect.GeneratedMethodAccessor351.invoke(Unknown Source)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient$SynchronizedHandler.invoke(HiveMetaStoreClient.java:)
>   at com.sun.proxy.$Proxy59.checkLock(Unknown Source)
>   at 
> org.apache.hadoop.hive.ql.lockmgr.DbLockManager.lock(DbLockManager.java:115)
>   ... 25 more
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-22893) Enhance data size estimation for fields computed by UDFs

2020-02-22 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich updated HIVE-22893:

Attachment: HIVE-22893.11.patch

> Enhance data size estimation for fields computed by UDFs
> 
>
> Key: HIVE-22893
> URL: https://issues.apache.org/jira/browse/HIVE-22893
> Project: Hive
>  Issue Type: Improvement
>  Components: Statistics
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-22893.01.patch, HIVE-22893.02.patch, 
> HIVE-22893.03.patch, HIVE-22893.04.patch, HIVE-22893.05.patch, 
> HIVE-22893.06.patch, HIVE-22893.07.patch, HIVE-22893.08.patch, 
> HIVE-22893.09.patch, HIVE-22893.10.patch, HIVE-22893.11.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Right now, if we have column stats on a column, we use them to estimate things 
> about the column; however, if a UDF is executed on the column, the resulting 
> column is treated as unknown and defaults are assumed.
> An improvement could be to give wide estimations for frequently used 
> UDFs.
> For example, consider {{substr(c,1,1)}}: no matter what the input is, the 
> output is at most a 1-character string
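The estimation idea above can be sketched in one line: substr(c, start, len) can never produce a string longer than len, so a known average length for c can be tightened to min(avgLen, len) instead of falling back to defaults. The method name and sample numbers are illustrative, not Hive's actual statistics API.

```java
// Sketch: bound the estimated average output length of substr() by the
// requested substring length, tightening the data size estimate.
public class SubstrSizeEstimate {

    static double estimatedAvgLength(double inputAvgLen, int substrLen) {
        // output can be shorter than substrLen (short inputs), never longer
        return Math.min(inputAvgLen, substrLen);
    }

    public static void main(String[] args) {
        // substr(c, 1, 1): output is at most 1 character, whatever c holds
        System.out.println(estimatedAvgLength(37.5, 1));
    }
}
```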



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-22893) Enhance data size estimation for fields computed by UDFs

2020-02-22 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich updated HIVE-22893:

Attachment: HIVE-22893.10.patch

> Enhance data size estimation for fields computed by UDFs
> 
>
> Key: HIVE-22893
> URL: https://issues.apache.org/jira/browse/HIVE-22893
> Project: Hive
>  Issue Type: Improvement
>  Components: Statistics
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-22893.01.patch, HIVE-22893.02.patch, 
> HIVE-22893.03.patch, HIVE-22893.04.patch, HIVE-22893.05.patch, 
> HIVE-22893.06.patch, HIVE-22893.07.patch, HIVE-22893.08.patch, 
> HIVE-22893.09.patch, HIVE-22893.10.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Right now, if we have column stats on a column, we use them to estimate things 
> about the column; however, if a UDF is executed on the column, the resulting 
> column is treated as unknown and defaults are assumed.
> An improvement could be to give wide estimations for frequently used 
> UDFs.
> For example, consider {{substr(c,1,1)}}: no matter what the input is, the 
> output is at most a 1-character string



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-21304) Show Bucketing version for ReduceSinkOp in explain extended plan

2020-02-22 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich updated HIVE-21304:

Attachment: HIVE-21304.15.patch

> Show Bucketing version for ReduceSinkOp in explain extended plan
> 
>
> Key: HIVE-21304
> URL: https://issues.apache.org/jira/browse/HIVE-21304
> Project: Hive
>  Issue Type: Bug
>Reporter: Deepak Jaiswal
>Assignee: Zoltan Haindrich
>Priority: Major
> Attachments: HIVE-21304.01.patch, HIVE-21304.02.patch, 
> HIVE-21304.03.patch, HIVE-21304.04.patch, HIVE-21304.05.patch, 
> HIVE-21304.06.patch, HIVE-21304.07.patch, HIVE-21304.08.patch, 
> HIVE-21304.09.patch, HIVE-21304.10.patch, HIVE-21304.11.patch, 
> HIVE-21304.12.patch, HIVE-21304.13.patch, HIVE-21304.14.patch, 
> HIVE-21304.15.patch
>
>
> Show Bucketing version for ReduceSinkOp in explain extended plan.
> This helps identify what hashing algorithm is being used by ReduceSinkOp.
>  
> cc [~vgarg]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-22376) Cancelled query still prints exception if it was stuck in waiting for lock

2020-02-22 Thread Aron Hamvas (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aron Hamvas updated HIVE-22376:
---
Attachment: (was: HIVE-22376.patch)

> Cancelled query still prints exception if it was stuck in waiting for lock
> --
>
> Key: HIVE-22376
> URL: https://issues.apache.org/jira/browse/HIVE-22376
> Project: Hive
>  Issue Type: Improvement
>  Components: Locking
>Affects Versions: 3.1.2
>Reporter: Peter Vary
>Assignee: Aron Hamvas
>Priority: Major
> Attachments: HIVE-22376.patch
>
>
> The query waits for locks, then is cancelled.
> It prints this to the logs, which is unnecessary and misleading:
> {code}
> apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:326)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1876)
>   at 
> org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:344)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: NoSuchLockException(message:No such lock lockid:272)
>   at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$check_lock_result$check_lock_resultStandardScheme.read(ThriftHiveMetastore.java)
>   at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$check_lock_result$check_lock_resultStandardScheme.read(ThriftHiveMetastore.java)
>   at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$check_lock_result.read(ThriftHiveMetastore.java)
>   at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:86)
>   at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_check_lock(ThriftHiveMetastore.java:5730)
>   at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.check_lock(ThriftHiveMetastore.java:5717)
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.checkLock(HiveMetaStoreClient.java:3128)
>   at sun.reflect.GeneratedMethodAccessor351.invoke(Unknown Source)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:212)
>   at com.sun.proxy.$Proxy59.checkLock(Unknown Source)
>   at sun.reflect.GeneratedMethodAccessor351.invoke(Unknown Source)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient$SynchronizedHandler.invoke(HiveMetaStoreClient.java:)
>   at com.sun.proxy.$Proxy59.checkLock(Unknown Source)
>   at 
> org.apache.hadoop.hive.ql.lockmgr.DbLockManager.lock(DbLockManager.java:115)
>   ... 25 more
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-22376) Cancelled query still prints exception if it was stuck in waiting for lock

2020-02-22 Thread Aron Hamvas (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aron Hamvas updated HIVE-22376:
---
Attachment: HIVE-22376.patch

> Cancelled query still prints exception if it was stuck in waiting for lock
> --
>
> Key: HIVE-22376
> URL: https://issues.apache.org/jira/browse/HIVE-22376
> Project: Hive
>  Issue Type: Improvement
>  Components: Locking
>Affects Versions: 3.1.2
>Reporter: Peter Vary
>Assignee: Aron Hamvas
>Priority: Major
> Attachments: HIVE-22376.patch
>
>
> The query waits for locks, then is cancelled.
> It prints this to the logs, which is unnecessary and misleading:
> {code}
> apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:326)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1876)
>   at 
> org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:344)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: NoSuchLockException(message:No such lock lockid:272)
>   at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$check_lock_result$check_lock_resultStandardScheme.read(ThriftHiveMetastore.java)
>   at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$check_lock_result$check_lock_resultStandardScheme.read(ThriftHiveMetastore.java)
>   at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$check_lock_result.read(ThriftHiveMetastore.java)
>   at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:86)
>   at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_check_lock(ThriftHiveMetastore.java:5730)
>   at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.check_lock(ThriftHiveMetastore.java:5717)
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.checkLock(HiveMetaStoreClient.java:3128)
>   at sun.reflect.GeneratedMethodAccessor351.invoke(Unknown Source)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:212)
>   at com.sun.proxy.$Proxy59.checkLock(Unknown Source)
>   at sun.reflect.GeneratedMethodAccessor351.invoke(Unknown Source)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient$SynchronizedHandler.invoke(HiveMetaStoreClient.java:)
>   at com.sun.proxy.$Proxy59.checkLock(Unknown Source)
>   at 
> org.apache.hadoop.hive.ql.lockmgr.DbLockManager.lock(DbLockManager.java:115)
>   ... 25 more
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-22376) Cancelled query still prints exception if it was stuck in waiting for lock

2020-02-22 Thread Aron Hamvas (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aron Hamvas updated HIVE-22376:
---
Status: Open  (was: Patch Available)

> Cancelled query still prints exception if it was stuck in waiting for lock
> --
>
> Key: HIVE-22376
> URL: https://issues.apache.org/jira/browse/HIVE-22376
> Project: Hive
>  Issue Type: Improvement
>  Components: Locking
>Affects Versions: 3.1.2
>Reporter: Peter Vary
>Assignee: Aron Hamvas
>Priority: Major
> Attachments: HIVE-22376.patch
>
>
> The query waits for locks, then is cancelled.
> It prints this to the logs, which is unnecessary and misleading:
> {code}
> apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:326)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1876)
>   at 
> org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:344)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: NoSuchLockException(message:No such lock lockid:272)
>   at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$check_lock_result$check_lock_resultStandardScheme.read(ThriftHiveMetastore.java)
>   at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$check_lock_result$check_lock_resultStandardScheme.read(ThriftHiveMetastore.java)
>   at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$check_lock_result.read(ThriftHiveMetastore.java)
>   at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:86)
>   at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_check_lock(ThriftHiveMetastore.java:5730)
>   at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.check_lock(ThriftHiveMetastore.java:5717)
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.checkLock(HiveMetaStoreClient.java:3128)
>   at sun.reflect.GeneratedMethodAccessor351.invoke(Unknown Source)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:212)
>   at com.sun.proxy.$Proxy59.checkLock(Unknown Source)
>   at sun.reflect.GeneratedMethodAccessor351.invoke(Unknown Source)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient$SynchronizedHandler.invoke(HiveMetaStoreClient.java:)
>   at com.sun.proxy.$Proxy59.checkLock(Unknown Source)
>   at 
> org.apache.hadoop.hive.ql.lockmgr.DbLockManager.lock(DbLockManager.java:115)
>   ... 25 more
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-22376) Cancelled query still prints exception if it was stuck in waiting for lock

2020-02-22 Thread Aron Hamvas (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17042564#comment-17042564
 ] 

Aron Hamvas commented on HIVE-22376:


Hi [~pvary], the new patch is available with the modification requested; 
pending tests.

> Cancelled query still prints exception if it was stuck in waiting for lock
> --
>
> Key: HIVE-22376
> URL: https://issues.apache.org/jira/browse/HIVE-22376
> Project: Hive
>  Issue Type: Improvement
>  Components: Locking
>Affects Versions: 3.1.2
>Reporter: Peter Vary
>Assignee: Aron Hamvas
>Priority: Major
> Attachments: HIVE-22376.patch
>
>
> The query waits for locks, then is cancelled.
> It prints this to the logs, which is unnecessary and misleading:
> {code}
> apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:326)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1876)
>   at 
> org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:344)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: NoSuchLockException(message:No such lock lockid:272)
>   at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$check_lock_result$check_lock_resultStandardScheme.read(ThriftHiveMetastore.java)
>   at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$check_lock_result$check_lock_resultStandardScheme.read(ThriftHiveMetastore.java)
>   at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$check_lock_result.read(ThriftHiveMetastore.java)
>   at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:86)
>   at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_check_lock(ThriftHiveMetastore.java:5730)
>   at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.check_lock(ThriftHiveMetastore.java:5717)
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.checkLock(HiveMetaStoreClient.java:3128)
>   at sun.reflect.GeneratedMethodAccessor351.invoke(Unknown Source)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:212)
>   at com.sun.proxy.$Proxy59.checkLock(Unknown Source)
>   at sun.reflect.GeneratedMethodAccessor351.invoke(Unknown Source)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient$SynchronizedHandler.invoke(HiveMetaStoreClient.java:)
>   at com.sun.proxy.$Proxy59.checkLock(Unknown Source)
>   at 
> org.apache.hadoop.hive.ql.lockmgr.DbLockManager.lock(DbLockManager.java:115)
>   ... 25 more
> {code}
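The pattern behind the fix can be sketched as follows: when the lock wait was deliberately cancelled, the resulting NoSuchLockException is expected, so it should be reported as a short message instead of a full stack trace. This is only an illustrative sketch; the names below (LockWaiter, handleCheckLockFailure, the cancelled flag) are invented for the example and do not correspond to Hive's actual DbLockManager API.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch: suppress the stack trace when the lock wait was
// cancelled on purpose. Not Hive's real API.
public class LockWaiter {
    private final AtomicBoolean cancelled = new AtomicBoolean(false);

    public void cancel() {
        cancelled.set(true);
    }

    // Decide how loudly to report a checkLock failure: a short message when
    // the failure is a consequence of a deliberate cancel, full details otherwise.
    public String handleCheckLockFailure(Exception e) {
        if (cancelled.get()) {
            // The lock was already released by the cancel path, so a
            // NoSuchLockException here is expected; no stack trace needed.
            return "Lock check aborted because the query was cancelled: " + e.getMessage();
        }
        return "Unexpected lock failure: " + e;
    }

    public static void main(String[] args) {
        LockWaiter waiter = new LockWaiter();
        waiter.cancel();
        System.out.println(waiter.handleCheckLockFailure(
                new RuntimeException("No such lock lockid:272")));
    }
}
```

A real fix would consult the driver's cancellation state before logging in DbLockManager.lock(); the sketch only shows the branching idea.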



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-22376) Cancelled query still prints exception if it was stuck in waiting for lock

2020-02-22 Thread Aron Hamvas (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aron Hamvas updated HIVE-22376:
---
Attachment: HIVE-22376.patch
Status: Patch Available  (was: Open)

> Cancelled query still prints exception if it was stuck in waiting for lock
> --
>
> Key: HIVE-22376
> URL: https://issues.apache.org/jira/browse/HIVE-22376
> Project: Hive
>  Issue Type: Improvement
>  Components: Locking
>Affects Versions: 3.1.2
>Reporter: Peter Vary
>Assignee: Aron Hamvas
>Priority: Major
> Attachments: HIVE-22376.patch
>
>
> The query waits for locks, then is cancelled.
> It prints this to the logs, which is unnecessary and misleading:
> {code}
> apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:326)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1876)
>   at 
> org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:344)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: NoSuchLockException(message:No such lock lockid:272)
>   at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$check_lock_result$check_lock_resultStandardScheme.read(ThriftHiveMetastore.java)
>   at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$check_lock_result$check_lock_resultStandardScheme.read(ThriftHiveMetastore.java)
>   at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$check_lock_result.read(ThriftHiveMetastore.java)
>   at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:86)
>   at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_check_lock(ThriftHiveMetastore.java:5730)
>   at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.check_lock(ThriftHiveMetastore.java:5717)
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.checkLock(HiveMetaStoreClient.java:3128)
>   at sun.reflect.GeneratedMethodAccessor351.invoke(Unknown Source)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:212)
>   at com.sun.proxy.$Proxy59.checkLock(Unknown Source)
>   at sun.reflect.GeneratedMethodAccessor351.invoke(Unknown Source)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient$SynchronizedHandler.invoke(HiveMetaStoreClient.java:)
>   at com.sun.proxy.$Proxy59.checkLock(Unknown Source)
>   at 
> org.apache.hadoop.hive.ql.lockmgr.DbLockManager.lock(DbLockManager.java:115)
>   ... 25 more
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-22376) Cancelled query still prints exception if it was stuck in waiting for lock

2020-02-22 Thread Aron Hamvas (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aron Hamvas updated HIVE-22376:
---
Attachment: (was: HIVE-22376.patch)

> Cancelled query still prints exception if it was stuck in waiting for lock
> --
>
> Key: HIVE-22376
> URL: https://issues.apache.org/jira/browse/HIVE-22376
> Project: Hive
>  Issue Type: Improvement
>  Components: Locking
>Affects Versions: 3.1.2
>Reporter: Peter Vary
>Assignee: Aron Hamvas
>Priority: Major
>
> The query waits for locks, then is cancelled.
> It prints this to the logs, which is unnecessary and misleading:
> {code}
> apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:326)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1876)
>   at 
> org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:344)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: NoSuchLockException(message:No such lock lockid:272)
>   at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$check_lock_result$check_lock_resultStandardScheme.read(ThriftHiveMetastore.java)
>   at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$check_lock_result$check_lock_resultStandardScheme.read(ThriftHiveMetastore.java)
>   at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$check_lock_result.read(ThriftHiveMetastore.java)
>   at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:86)
>   at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_check_lock(ThriftHiveMetastore.java:5730)
>   at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.check_lock(ThriftHiveMetastore.java:5717)
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.checkLock(HiveMetaStoreClient.java:3128)
>   at sun.reflect.GeneratedMethodAccessor351.invoke(Unknown Source)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:212)
>   at com.sun.proxy.$Proxy59.checkLock(Unknown Source)
>   at sun.reflect.GeneratedMethodAccessor351.invoke(Unknown Source)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient$SynchronizedHandler.invoke(HiveMetaStoreClient.java:)
>   at com.sun.proxy.$Proxy59.checkLock(Unknown Source)
>   at 
> org.apache.hadoop.hive.ql.lockmgr.DbLockManager.lock(DbLockManager.java:115)
>   ... 25 more
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-22376) Cancelled query still prints exception if it was stuck in waiting for lock

2020-02-22 Thread Aron Hamvas (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aron Hamvas updated HIVE-22376:
---
Status: Open  (was: Patch Available)

> Cancelled query still prints exception if it was stuck in waiting for lock
> --
>
> Key: HIVE-22376
> URL: https://issues.apache.org/jira/browse/HIVE-22376
> Project: Hive
>  Issue Type: Improvement
>  Components: Locking
>Affects Versions: 3.1.2
>Reporter: Peter Vary
>Assignee: Aron Hamvas
>Priority: Major
>
> The query waits for locks, then is cancelled.
> It prints this to the logs, which is unnecessary and misleading:
> {code}
> apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:326)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1876)
>   at 
> org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:344)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: NoSuchLockException(message:No such lock lockid:272)
>   at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$check_lock_result$check_lock_resultStandardScheme.read(ThriftHiveMetastore.java)
>   at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$check_lock_result$check_lock_resultStandardScheme.read(ThriftHiveMetastore.java)
>   at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$check_lock_result.read(ThriftHiveMetastore.java)
>   at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:86)
>   at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_check_lock(ThriftHiveMetastore.java:5730)
>   at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.check_lock(ThriftHiveMetastore.java:5717)
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.checkLock(HiveMetaStoreClient.java:3128)
>   at sun.reflect.GeneratedMethodAccessor351.invoke(Unknown Source)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:212)
>   at com.sun.proxy.$Proxy59.checkLock(Unknown Source)
>   at sun.reflect.GeneratedMethodAccessor351.invoke(Unknown Source)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient$SynchronizedHandler.invoke(HiveMetaStoreClient.java:)
>   at com.sun.proxy.$Proxy59.checkLock(Unknown Source)
>   at 
> org.apache.hadoop.hive.ql.lockmgr.DbLockManager.lock(DbLockManager.java:115)
>   ... 25 more
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-22891) Skip PartitonDesc Extraction In CombineHiveRecord For Non-LLAP Execution Mode

2020-02-22 Thread Hive QA (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17042486#comment-17042486
 ] 

Hive QA commented on HIVE-22891:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12994159/HIVE-22891.03.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:green}SUCCESS:{color} +1 due to 18056 tests passed

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-Build/20783/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/20783/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-20783/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12994159 - PreCommit-HIVE-Build

> Skip PartitonDesc Extraction In CombineHiveRecord For Non-LLAP Execution Mode
> -
>
> Key: HIVE-22891
> URL: https://issues.apache.org/jira/browse/HIVE-22891
> Project: Hive
>  Issue Type: Task
>Reporter: Syed Shameerur Rahman
>Assignee: Syed Shameerur Rahman
>Priority: Major
> Fix For: 4.0.0
>
> Attachments: HIVE-22891.01.patch, HIVE-22891.02.patch, 
> HIVE-22891.03.patch
>
>
> {code:java}
> try {
>   // TODO: refactor this out
>   if (pathToPartInfo == null) {
>     MapWork mrwork;
>     if (HiveConf.getVar(conf, HiveConf.ConfVars.HIVE_EXECUTION_ENGINE).equals("tez")) {
>       mrwork = (MapWork) Utilities.getMergeWork(jobConf);
>       if (mrwork == null) {
>         mrwork = Utilities.getMapWork(jobConf);
>       }
>     } else {
>       mrwork = Utilities.getMapWork(jobConf);
>     }
>     pathToPartInfo = mrwork.getPathToPartitionInfo();
>   }
>   PartitionDesc part = extractSinglePartSpec(hsplit);
>   inputFormat = HiveInputFormat.wrapForLlap(inputFormat, jobConf, part);
> } catch (HiveException e) {
>   throw new IOException(e);
> }
> {code}
> The above piece of code in CombineHiveRecordReader.java was introduced in 
> HIVE-15147. It overwrites inputFormat based on the PartitionDesc, which is 
> not required in non-LLAP execution mode because 
> HiveInputFormat.wrapForLlap() simply returns the previously defined 
> inputFormat in that case. The call to extractSinglePartSpec() has serious 
> performance implications: with a large number of small files, each call 
> takes approximately 2 to 3 seconds. As a result, the same query that runs 
> in Hive 1.x / Hive 2 is much faster than on the latest Hive.
> {code:java}
> 2020-02-11 07:15:04,701 INFO [main] 
> org.apache.hadoop.hive.ql.io.orc.ReaderImpl: Reading ORC rows from 
> 2020-02-11 07:15:06,468 WARN [main] 
> org.apache.hadoop.hive.ql.io.CombineHiveRecordReader: Multiple partitions 
> found; not going to pass a part spec to LLAP IO: {{logdate=2020-02-03, 
> hour=01, event=win}} and {{logdate=2020-02-03, hour=02, event=act}}
> 2020-02-11 07:15:06,468 INFO [main] 
> org.apache.hadoop.hive.ql.io.CombineHiveRecordReader: succeeded in getting 
> org.apache.hadoop.mapred.FileSplit
> {code}
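The proposed change can be sketched as a guard that skips the expensive per-split partition lookup whenever LLAP IO cannot use its result. The names below (LlapGuardSketch, maybeWrapForLlap, PartSpecExtractor) are illustrative stand-ins for the Hive internals, not the real classes.

```java
// Hedged sketch of the guard HIVE-22891 proposes: only pay for the costly
// extractSinglePartSpec() work when LLAP IO is actually enabled, because in
// non-LLAP mode wrapForLlap() would return the input format unchanged anyway.
public class LlapGuardSketch {

    // Stand-in for the costly extractSinglePartSpec() call (~2-3s per small file).
    interface PartSpecExtractor {
        String extract();
    }

    static String maybeWrapForLlap(String inputFormat, boolean llapIoEnabled,
                                   PartSpecExtractor extractor) {
        if (!llapIoEnabled) {
            // Non-LLAP mode: skip the expensive partition lookup entirely.
            return inputFormat;
        }
        String part = extractor.extract();
        return "LlapWrapped(" + inputFormat + ", " + part + ")";
    }

    public static void main(String[] args) {
        // In non-LLAP mode the extractor must never run, so a throwing
        // extractor demonstrates that the lookup is skipped.
        String out = maybeWrapForLlap("CombineHiveInputFormat", false,
                () -> { throw new IllegalStateException("must not run in non-LLAP mode"); });
        System.out.println(out);
    }
}
```

The design point is simply to hoist the cheap mode check in front of the expensive per-split work, which is what makes the small-files case fast again.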



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-22891) Skip PartitonDesc Extraction In CombineHiveRecord For Non-LLAP Execution Mode

2020-02-22 Thread Hive QA (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17042469#comment-17042469
 ] 

Hive QA commented on HIVE-22891:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  9m 
52s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
2s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
42s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  3m 
56s{color} | {color:blue} ql in master has 1530 extant Findbugs warnings. 
{color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
56s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
25s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
4s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m  
4s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
39s{color} | {color:green} ql: The patch generated 0 new + 22 unchanged - 1 
fixed = 22 total (was 23) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m  
5s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
55s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red}  0m 
15s{color} | {color:red} The patch generated 1 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 25m 22s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests |  asflicense  javac  javadoc  findbugs  checkstyle  compile  |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 
3.16.43-2+deb8u5 (2017-09-19) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/data/hiveptest/working/yetus_PreCommit-HIVE-Build-20783/dev-support/hive-personality.sh
 |
| git revision | master / 6c3ee53 |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.1 |
| asflicense | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-20783/yetus/patch-asflicense-problems.txt
 |
| modules | C: ql U: ql |
| Console output | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-20783/yetus.txt |
| Powered by | Apache Yetus http://yetus.apache.org |


This message was automatically generated.



> Skip PartitonDesc Extraction In CombineHiveRecord For Non-LLAP Execution Mode
> -
>
> Key: HIVE-22891
> URL: https://issues.apache.org/jira/browse/HIVE-22891
> Project: Hive
>  Issue Type: Task
>Reporter: Syed Shameerur Rahman
>Assignee: Syed Shameerur Rahman
>Priority: Major
> Fix For: 4.0.0
>
> Attachments: HIVE-22891.01.patch, HIVE-22891.02.patch, 
> HIVE-22891.03.patch
>
>
> {code:java}
> try {
>   // TODO: refactor this out
>   if (pathToPartInfo == null) {
>     MapWork mrwork;
>     if (HiveConf.getVar(conf, HiveConf.ConfVars.HIVE_EXECUTION_ENGINE).equals("tez")) {
>       mrwork = (MapWork) Utilities.getMergeWork(jobConf);
>       if (mrwork == null) {
>         mrwork = Utilities.getMapWork(jobConf);
>       }
>     } else {
>       mrwork = Utilities.getMapWork(jobConf);
>     }
>     pathToPartInfo = mrwork.getPathToPartitionInfo();
>   }
>   PartitionDesc part = extractSinglePartSpec(hsplit);
>   inputFormat = HiveInputFormat.wrapForLlap(inputFormat, jobConf, part);
> } catch (HiveException e) {
>   throw new IOException(e);
> }
> {code}
> The above piece of code in CombineHiveRecordReader.java was introduced in 
> HIVE-15147. It overwrites inputFormat based on the PartitionDesc, which is 
> not required in non-LLAP execution mode because 
> HiveInputFormat.wrapForLlap() simply returns the previously defined 
> inputFormat in that case.

[jira] [Commented] (HIVE-22914) Make Hive Connection ZK Interactions Easier to Troubleshoot

2020-02-22 Thread Hive QA (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17042451#comment-17042451
 ] 

Hive QA commented on HIVE-22914:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12994156/HIVE-22914.1.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:green}SUCCESS:{color} +1 due to 18056 tests passed

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-Build/20782/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/20782/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-20782/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12994156 - PreCommit-HIVE-Build

> Make Hive Connection ZK Interactions Easier to Troubleshoot
> ---
>
> Key: HIVE-22914
> URL: https://issues.apache.org/jira/browse/HIVE-22914
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 4.0.0, 3.1.2
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Minor
> Attachments: HIVE-22914.1.patch, HIVE-22914.1.patch
>
>
> Add better logging and make errors more consistent and meaningful.
> Recently was trying to troubleshoot an issue where the ZK namespace of the 
> client and the HS2 were different and it was way too difficult to diagnose.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)