[jira] [Assigned] (HIVE-24223) Insert into Hive tables doesn't work for decimal numbers with no preceding digit before the decimal
[ https://issues.apache.org/jira/browse/HIVE-24223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ashish Sharma reassigned HIVE-24223:
------------------------------------
    Assignee: Ashish Sharma

> Insert into Hive tables doesn't work for decimal numbers with no preceding
> digit before the decimal
> --------------------------------------------------------------------------
>
>                 Key: HIVE-24223
>                 URL: https://issues.apache.org/jira/browse/HIVE-24223
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: All Versions
>            Reporter: Kriti Jha
>            Assignee: Ashish Sharma
>            Priority: Minor
>
> Any insert into a Hive table of a decimal literal with no digit before the
> DOT ('.') fails with a ParseException, as shown below:
> --
> hive> create table test_dec(id decimal(10,8));
> hive> insert into test_dec values (-.5);
> NoViableAltException(16@[412:1: atomExpression : ( constant | (
> intervalExpression )=> intervalExpression | castExpression |
> extractExpression | floorExpression | caseExpression | whenExpression | (
> subQueryExpression )=> ( subQueryExpression ) -> ^( TOK_SUBQUERY_EXPR
> TOK_SUBQUERY_OP subQueryExpression ) | ( functionName LPAREN )=> function |
> tableOrColumn | expressionsInParenthesis[true] );])
>   at org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser$DFA36.specialStateTransition(HiveParser_IdentifiersParser.java:31810)
>   at org.antlr.runtime.DFA.predict(DFA.java:80)
>   at org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.atomExpression(HiveParser_IdentifiersParser.java:6746)
>   at org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.precedenceFieldExpression(HiveParser_IdentifiersParser.java:6988)
>   at org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.precedenceUnaryPrefixExpression(HiveParser_IdentifiersParser.java:7324)
>   at org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.precedenceUnarySuffixExpression(HiveParser_IdentifiersParser.java:7380)
>   ...
>   at org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.expression(HiveParser_IdentifiersParser.java:6686)
>   at org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.expressionsNotInParenthesis(HiveParser_IdentifiersParser.java:2287)
>   at org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.expressionsInParenthesis(HiveParser_IdentifiersParser.java:2233)
>   at org.apache.hadoop.hive.ql.parse.HiveParser.expressionsInParenthesis(HiveParser.java:42106)
>   at org.apache.hadoop.hive.ql.parse.HiveParser_FromClauseParser.valueRowConstructor(HiveParser_FromClauseParser.java:6499)
>   at org.apache.hadoop.hive.ql.parse.HiveParser_FromClauseParser.valuesTableConstructor(HiveParser_FromClauseParser.java:6583)
>   at org.apache.hadoop.hive.ql.parse.HiveParser_FromClauseParser.valuesClause(HiveParser_FromClauseParser.java:6704)
>   at org.apache.hadoop.hive.ql.parse.HiveParser.valuesClause(HiveParser.java:41954)
>   at org.apache.hadoop.hive.ql.parse.HiveParser.regularBody(HiveParser.java:36536)
>   at org.apache.hadoop.hive.ql.parse.HiveParser.queryStatementExpressionBody(HiveParser.java:35822)
>   ...
>   at org.apache.hadoop.util.RunJar.main(RunJar.java:153)
> FAILED: ParseException line 1:30 cannot recognize input near '.' '5' ')' in
> expression specification
> --
> It appears to come from the Lexer, where the token types are defined and the
> definition of 'Number' comes into play:
> --
> Number : (Digit)+ ( DOT (Digit)* (Exponent)? | Exponent)? ;
> --
> https://github.com/apache/hive/blob/2006e52713508a92fb4d1d28262fd7175eade8b7/parser/src/java/org/apache/hadoop/hive/ql/parse/HiveLexerParent.g#L469
> However, the below works:
> insert into test_dec values ('-.5');
> insert into test_dec values (-0.5);

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
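[Editorial note] The quoted 'Number' lexer rule requires at least one digit before the optional DOT, which is why '.5' never tokenizes as a Number (the leading '-' is a separate MINUS token either way). A minimal sketch of the rule's behavior, with the regex being my own translation of the grammar rule rather than Hive code:

```python
import re

# The ANTLR rule from HiveLexerParent.g, translated to a regex for
# illustration (this regex is my translation, not Hive code):
#   Number : (Digit)+ ( DOT (Digit)* (Exponent)? | Exponent )? ;
NUMBER = re.compile(r"^\d+(\.\d*([eE][+-]?\d+)?|[eE][+-]?\d+)?$")

# The (Digit)+ prefix is mandatory, so a literal with no integer part
# never matches; '-' is tokenized separately and plays no role here.
for literal in ["0.5", "5.", "5e3", ".5"]:
    print(literal, bool(NUMBER.match(literal)))
```

A fix in the direction the report implies would add an alternative that allows an empty integer part when a fractional part is present, e.g. a `DOT (Digit)+ (Exponent)?` branch in the rule.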
[jira] [Work logged] (HIVE-8950) Add support in ParquetHiveSerde to create table schema from a parquet file
[ https://issues.apache.org/jira/browse/HIVE-8950?focusedWorklogId=494221&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-494221 ]

ASF GitHub Bot logged work on HIVE-8950:
----------------------------------------
            Author: ASF GitHub Bot
        Created on: 03/Oct/20 00:51
        Start Date: 03/Oct/20 00:51
Worklog Time Spent: 10m

Work Description: github-actions[bot] commented on pull request #1353:
URL: https://github.com/apache/hive/pull/1353#issuecomment-703016434

This pull request has been automatically marked as stale because it has not
had recent activity. It will be closed if no further activity occurs. Feel
free to reach out on the d...@hive.apache.org list if the patch is in need of
reviews.

This is an automated message from the Apache Git Service. To respond to the
message, please log on to GitHub and use the URL above to go to the specific
comment. For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Issue Time Tracking
-------------------
    Worklog Id: (was: 494221)
    Time Spent: 20m (was: 10m)

> Add support in ParquetHiveSerde to create table schema from a parquet file
> --------------------------------------------------------------------------
>
>                 Key: HIVE-8950
>                 URL: https://issues.apache.org/jira/browse/HIVE-8950
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Ashish Singh
>            Assignee: Ashish Singh
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: HIVE-8950.1.patch, HIVE-8950.10.patch, HIVE-8950.11.patch,
>                      HIVE-8950.2.patch, HIVE-8950.3.patch, HIVE-8950.4.patch,
>                      HIVE-8950.5.patch, HIVE-8950.6.patch, HIVE-8950.7.patch,
>                      HIVE-8950.8.patch, HIVE-8950.9.patch, HIVE-8950.patch
>
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> PARQUET-76 and PARQUET-47 ask for creating parquet-backed tables without
> having to specify the column names and types. As parquet files store their
> schema in the footer, it is possible to generate the Hive schema from a
> parquet file's metadata. This will improve usability of parquet-backed
> tables.
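[Editorial note] The idea in the issue description is to read column names and types from the parquet footer and turn them into a CREATE TABLE column list. A hedged sketch of that mapping in Python; the type-name table and function names below are illustrative assumptions, not ParquetHiveSerde's actual conversion logic:

```python
# Illustrative parquet-primitive-to-Hive type map (an assumption for this
# sketch; the real SerDe handles more types and nested groups).
PARQUET_TO_HIVE = {
    "boolean": "boolean",
    "int32": "int",
    "int64": "bigint",
    "float": "float",
    "double": "double",
    "binary(utf8)": "string",
}

def hive_columns(parquet_fields):
    """parquet_fields: list of (name, parquet_type) as read from a footer."""
    return ", ".join(f"`{name}` {PARQUET_TO_HIVE[ptype]}"
                     for name, ptype in parquet_fields)

# Build a DDL statement from footer metadata instead of hand-written columns.
ddl = ("CREATE TABLE t ("
       + hive_columns([("id", "int64"), ("name", "binary(utf8)")])
       + ") STORED AS PARQUET")
print(ddl)  # CREATE TABLE t (`id` bigint, `name` string) STORED AS PARQUET
```

In the real feature the `(name, parquet_type)` pairs would come from the file's footer metadata, which is exactly what makes the table definition derivable without user-specified columns.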
[jira] [Work logged] (HIVE-23835) Repl Dump should dump function binaries to staging directory
[ https://issues.apache.org/jira/browse/HIVE-23835?focusedWorklogId=494218&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-494218 ]

ASF GitHub Bot logged work on HIVE-23835:
-----------------------------------------
            Author: ASF GitHub Bot
        Created on: 03/Oct/20 00:51
        Start Date: 03/Oct/20 00:51
Worklog Time Spent: 10m

Work Description: github-actions[bot] closed pull request #1249:
URL: https://github.com/apache/hive/pull/1249

Issue Time Tracking
-------------------
    Worklog Id: (was: 494218)
    Time Spent: 1h 20m (was: 1h 10m)

> Repl Dump should dump function binaries to staging directory
> ------------------------------------------------------------
>
>                 Key: HIVE-23835
>                 URL: https://issues.apache.org/jira/browse/HIVE-23835
>             Project: Hive
>          Issue Type: Task
>            Reporter: Pravin Sinha
>            Assignee: Pravin Sinha
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: HIVE-23835.01.patch, HIVE-23835.02.patch,
>                      HIVE-23835.03.patch, HIVE-23835.04.patch
>
>          Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> When a Hive function's binaries are on the source HDFS, repl dump should
> copy them to the staging location, removing the requirement that the target
> cluster have cross-cluster visibility into the source filesystem.
[jira] [Work logged] (HIVE-23977) Consolidate partition fetch to one place
[ https://issues.apache.org/jira/browse/HIVE-23977?focusedWorklogId=494219&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-494219 ]

ASF GitHub Bot logged work on HIVE-23977:
-----------------------------------------
            Author: ASF GitHub Bot
        Created on: 03/Oct/20 00:51
        Start Date: 03/Oct/20 00:51
Worklog Time Spent: 10m

Work Description: github-actions[bot] commented on pull request #1354:
URL: https://github.com/apache/hive/pull/1354#issuecomment-703016426

This pull request has been automatically marked as stale because it has not
had recent activity. It will be closed if no further activity occurs. Feel
free to reach out on the d...@hive.apache.org list if the patch is in need of
reviews.

Issue Time Tracking
-------------------
    Worklog Id: (was: 494219)
    Time Spent: 20m (was: 10m)

> Consolidate partition fetch to one place
> ----------------------------------------
>
>                 Key: HIVE-23977
>                 URL: https://issues.apache.org/jira/browse/HIVE-23977
>             Project: Hive
>          Issue Type: Improvement
>          Components: HiveServer2
>            Reporter: Steve Carlin
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 20m
>  Remaining Estimate: 0h
[jira] [Updated] (HIVE-24205) Optimise CuckooSetBytes
[ https://issues.apache.org/jira/browse/HIVE-24205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mustafa Iman updated HIVE-24205:
--------------------------------
    Status: Patch Available  (was: Open)

> Optimise CuckooSetBytes
> -----------------------
>
>                 Key: HIVE-24205
>                 URL: https://issues.apache.org/jira/browse/HIVE-24205
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Rajesh Balamohan
>            Assignee: Mustafa Iman
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: Screenshot 2020-09-28 at 4.29.24 PM.png, bench.png,
>                      vectorized.patch
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> {{FilterStringColumnInList, StringColumnInList}} etc. use CuckooSetBytes for
> lookup.
> !Screenshot 2020-09-28 at 4.29.24 PM.png|width=714,height=508!
> One option to optimize would be to add boundary conditions on "length" with
> the min/max length stored alongside the hashes (ref:
> [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/CuckooSetBytes.java#L85]).
> This would significantly reduce the number of hash computations that need
> to happen, e.g.
> [TPCH-Q12|https://github.com/hortonworks/hive-testbench/blob/hdp3/sample-queries-tpch/tpch_query12.sql#L20]
[jira] [Commented] (HIVE-24205) Optimise CuckooSetBytes
[ https://issues.apache.org/jira/browse/HIVE-24205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17206560#comment-17206560 ]

Mustafa Iman commented on HIVE-24205:
-------------------------------------
[~hashutosh] [~rajesh.balamohan] can you take a look?
[jira] [Updated] (HIVE-24205) Optimise CuckooSetBytes
[ https://issues.apache.org/jira/browse/HIVE-24205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mustafa Iman updated HIVE-24205:
--------------------------------
    Attachment: bench.png
[jira] [Updated] (HIVE-24205) Optimise CuckooSetBytes
[ https://issues.apache.org/jira/browse/HIVE-24205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mustafa Iman updated HIVE-24205:
--------------------------------
    Attachment: vectorized.patch
[jira] [Commented] (HIVE-24205) Optimise CuckooSetBytes
[ https://issues.apache.org/jira/browse/HIVE-24205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17206559#comment-17206559 ]

Mustafa Iman commented on HIVE-24205:
-------------------------------------
I added a simple max/min length check in CuckooSetBytes#lookup. The attached
file shows some benchmark results.

*TPCH_Q12* is a select with an IN clause and a join afterwards. Selectivity
of the filter is 30%.

The *Synthetic* query is a simple select with an IN clause. IN is over two of
the longest comment fields (both 72 characters wide), so the filter is very
selective, passing only about 2% of rows:

select o_orderkey, o_comment from orders where o_comment in ('jole quickly furiously bold escapades: regular accounts play regular req', 's foxes. regular warhorses detect fluffily. carefull y regular tithes amo', 'grate ironic, pending sauternes. deposits do are slyly. carefully ironic')

The *Synthetic Wide* query is the same as Synthetic except the IN clause is
over one shortest-length and one longest-length comment. The filter is still
selective (about 4% of rows pass), but our optimization cannot eliminate any
tuples:

select o_orderkey, o_comment from orders where o_comment in ('jole quickly furiously bold escapades: regular accounts play regular req', 'ts nag furiously. even');

The patch outperforms the original code by 50% on the Synthetic query. For
TPCH Q12 there is no meaningful difference between the two runs. My
conclusion is that the optimization has very low overhead and gives a
significant perf improvement in certain cases.

I also implemented a vectorized version of the early return from CuckooSet;
it is attached as vectorized.patch. However, in all cases the simpler patch
outperforms the vectorized one.
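[Editorial note] The max/min length check discussed above can be sketched as follows. This is a hedged Python toy, not the actual Java CuckooSetBytes code; the class and field names are illustrative:

```python
class LengthBoundedSet:
    """Toy stand-in for CuckooSetBytes' early exit: record the min and max
    member length at build time, so a lookup whose key length falls outside
    that range returns False without computing any hash."""

    def __init__(self, values):
        self._set = set(values)            # real code: cuckoo hash tables
        self._min = min(map(len, values))  # shortest member length
        self._max = max(map(len, values))  # longest member length
        self.hash_calls = 0                # instrumentation: hashes performed

    def lookup(self, key):
        if not (self._min <= len(key) <= self._max):
            return False                   # boundary check: no hash computed
        self.hash_calls += 1               # only now do we pay for hashing
        return key in self._set

s = LengthBoundedSet(["apple", "banana"])
print(s.lookup("kiwi"), s.hash_calls)      # rejected by length, 0 hashes
print(s.lookup("banana"), s.hash_calls)    # in range, 1 hash performed
```

This also shows why the benchmark's "Synthetic Wide" case gains nothing: once the IN list contains both a very short and a very long member, nearly every probe length falls inside [min, max] and the check never fires.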
[jira] [Updated] (HIVE-24205) Optimise CuckooSetBytes
[ https://issues.apache.org/jira/browse/HIVE-24205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HIVE-24205:
----------------------------------
    Labels: pull-request-available  (was: )
[jira] [Work logged] (HIVE-24205) Optimise CuckooSetBytes
[ https://issues.apache.org/jira/browse/HIVE-24205?focusedWorklogId=494204&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-494204 ]

ASF GitHub Bot logged work on HIVE-24205:
-----------------------------------------
            Author: ASF GitHub Bot
        Created on: 02/Oct/20 23:37
        Start Date: 02/Oct/20 23:37
Worklog Time Spent: 10m

Work Description: mustafaiman opened a new pull request #1549:
URL: https://github.com/apache/hive/pull/1549
Change-Id: I86a28b27859824daf381d5581241fd683d5c85f0

Issue Time Tracking
-------------------
        Worklog Id: (was: 494204)
Remaining Estimate: 0h
        Time Spent: 10m
[jira] [Updated] (HIVE-24205) Optimise CuckooSetBytes
[ https://issues.apache.org/jira/browse/HIVE-24205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mustafa Iman updated HIVE-24205:
--------------------------------
    Attachment: (was: Screen Shot 2020-10-02 at 4.15.32 PM.png)
[jira] [Updated] (HIVE-24205) Optimise CuckooSetBytes
[ https://issues.apache.org/jira/browse/HIVE-24205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mustafa Iman updated HIVE-24205:
--------------------------------
    Attachment: Screen Shot 2020-10-02 at 4.15.32 PM.png
[jira] [Assigned] (HIVE-24205) Optimise CuckooSetBytes
[ https://issues.apache.org/jira/browse/HIVE-24205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mustafa Iman reassigned HIVE-24205:
-----------------------------------
    Assignee: Mustafa Iman
[jira] [Commented] (HIVE-24202) Clean up local HS2 HMS cache code (II)
[ https://issues.apache.org/jira/browse/HIVE-24202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17206408#comment-17206408 ]

Jesus Camacho Rodriguez commented on HIVE-24202:
------------------------------------------------
[~vgarg], could you take a look? Thanks

> Clean up local HS2 HMS cache code (II)
> --------------------------------------
>
>                 Key: HIVE-24202
>                 URL: https://issues.apache.org/jira/browse/HIVE-24202
>             Project: Hive
>          Issue Type: Improvement
>          Components: HiveServer2
>            Reporter: Jesus Camacho Rodriguez
>            Assignee: Jesus Camacho Rodriguez
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Follow-up for HIVE-24183 (split into different JIRAs).
[jira] [Work logged] (HIVE-24222) Upgrade ORC to 1.5.12
[ https://issues.apache.org/jira/browse/HIVE-24222?focusedWorklogId=494052&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-494052 ]

ASF GitHub Bot logged work on HIVE-24222:
-----------------------------------------
            Author: ASF GitHub Bot
        Created on: 02/Oct/20 16:24
        Start Date: 02/Oct/20 16:24
Worklog Time Spent: 10m

Work Description: dongjoon-hyun edited a comment on pull request #1545:
URL: https://github.com/apache/hive/pull/1545#issuecomment-702828726

Thank you so much for merging, @sunchao !

Issue Time Tracking
-------------------
    Worklog Id: (was: 494052)
    Time Spent: 2h (was: 1h 50m)

> Upgrade ORC to 1.5.12
> ---------------------
>
>                 Key: HIVE-24222
>                 URL: https://issues.apache.org/jira/browse/HIVE-24222
>             Project: Hive
>          Issue Type: Improvement
>          Components: ORC
>    Affects Versions: 4.0.0
>            Reporter: Dongjoon Hyun
>            Assignee: Dongjoon Hyun
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 2h
>  Remaining Estimate: 0h
[jira] [Work logged] (HIVE-24222) Upgrade ORC to 1.5.12
[ https://issues.apache.org/jira/browse/HIVE-24222?focusedWorklogId=494051&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-494051 ]

ASF GitHub Bot logged work on HIVE-24222:
-----------------------------------------
            Author: ASF GitHub Bot
        Created on: 02/Oct/20 16:23
        Start Date: 02/Oct/20 16:23
Worklog Time Spent: 10m

Work Description: dongjoon-hyun commented on pull request #1545:
URL: https://github.com/apache/hive/pull/1545#issuecomment-702828726

Thank you so much, @sunchao !

Issue Time Tracking
-------------------
    Worklog Id: (was: 494051)
    Time Spent: 1h 50m (was: 1h 40m)
[jira] [Work logged] (HIVE-24222) Upgrade ORC to 1.5.12
[ https://issues.apache.org/jira/browse/HIVE-24222?focusedWorklogId=494044&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-494044 ]

ASF GitHub Bot logged work on HIVE-24222:
-----------------------------------------
            Author: ASF GitHub Bot
        Created on: 02/Oct/20 16:18
        Start Date: 02/Oct/20 16:18
Worklog Time Spent: 10m

Work Description: sunchao merged pull request #1545:
URL: https://github.com/apache/hive/pull/1545

Issue Time Tracking
-------------------
    Worklog Id: (was: 494044)
    Time Spent: 1h 40m (was: 1.5h)
[jira] [Work logged] (HIVE-24210) PartitionManagementTask fails if one of tables dropped after fetching TableMeta
[ https://issues.apache.org/jira/browse/HIVE-24210?focusedWorklogId=494041&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-494041 ]

ASF GitHub Bot logged work on HIVE-24210:
-----------------------------------------
            Author: ASF GitHub Bot
        Created on: 02/Oct/20 16:12
        Start Date: 02/Oct/20 16:12
Worklog Time Spent: 10m

Work Description: vineetgarg02 merged pull request #1536:
URL: https://github.com/apache/hive/pull/1536

Issue Time Tracking
-------------------
    Worklog Id: (was: 494041)
    Time Spent: 20m (was: 10m)

> PartitionManagementTask fails if one of tables dropped after fetching
> TableMeta
> ---------------------------------------------------------------------
>
>                 Key: HIVE-24210
>                 URL: https://issues.apache.org/jira/browse/HIVE-24210
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Naresh P R
>            Assignee: Naresh P R
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> PMT fetches tableMeta based on the configured dbPattern & tablePattern:
> https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/PartitionManagementTask.java#L125
> If one of those tables is dropped before Auto Partition Discovery or MSCK is
> scheduled, the entire PMT stops because of the exception below, even though
> MSCK could still run for the other, valid tables.
> {code:java}
> 2020-09-21T10:45:15,875 ERROR [pool-4-thread-150]: metastore.PartitionManagementTask (PartitionManagementTask.java:run(163)) - Exception while running partition discovery task for table: null
> org.apache.hadoop.hive.metastore.api.NoSuchObjectException: hive.default.test_table table not found
>   at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_table_core(HiveMetaStore.java:3391)
>   at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getTableInternal(HiveMetaStore.java:3315)
>   at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_table_req(HiveMetaStore.java:3291)
>   at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:147)
>   at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:108)
>   at com.sun.proxy.$Proxy30.get_table_req(Unknown Source) ~[?:?]
>   at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getTable(HiveMetaStoreClient.java:1804)
>   at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getTable(HiveMetaStoreClient.java:1791)
>   at org.apache.hadoop.hive.metastore.PartitionManagementTask.run(PartitionManagementTask.java:130){code}
> The exception is thrown from here:
> https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/PartitionManagementTask.java#L130
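[Editorial note] The bug pattern described above, where one missing table aborts a whole batch, is usually fixed by catching the not-found exception per table and continuing. A hedged Python sketch of that shape; the function and exception names are illustrative stand-ins, not the actual PartitionManagementTask code:

```python
class NoSuchObjectException(Exception):
    """Stand-in for org.apache.hadoop.hive.metastore.api.NoSuchObjectException."""

def run_partition_discovery(table_names, get_table, process):
    """Process each table from the earlier TableMeta listing. A table that
    was dropped in the meantime is skipped instead of aborting the task."""
    skipped = []
    for name in table_names:
        try:
            process(get_table(name))      # may raise if the table was dropped
        except NoSuchObjectException:
            skipped.append(name)          # log and continue with the rest
    return skipped

# Simulate "test_table" being dropped between listing and processing.
tables = {"orders": "orders-meta"}
def get_table(name):
    if name not in tables:
        raise NoSuchObjectException(name)
    return tables[name]

processed = []
skipped = run_partition_discovery(["orders", "test_table"], get_table, processed.append)
print(processed, skipped)  # ['orders-meta'] ['test_table']
```

The key design point is that the try/except sits inside the loop, so the failure of one table is isolated from the remainder of the batch.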
[jira] [Resolved] (HIVE-24210) PartitionManagementTask fails if one of tables dropped after fetching TableMeta
[ https://issues.apache.org/jira/browse/HIVE-24210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vineet Garg resolved HIVE-24210. Fix Version/s: 4.0.0 Resolution: Fixed Pushed to master. Thanks [~nareshpr]
[jira] [Commented] (HIVE-24210) PartitionManagementTask fails if one of tables dropped after fetching TableMeta
[ https://issues.apache.org/jira/browse/HIVE-24210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17206302#comment-17206302 ] Vineet Garg commented on HIVE-24210: +1. LGTM
[jira] [Work logged] (HIVE-22826) ALTER TABLE RENAME COLUMN doesn't update list of bucketed column names
[ https://issues.apache.org/jira/browse/HIVE-22826?focusedWorklogId=494022&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-494022 ] ASF GitHub Bot logged work on HIVE-22826: - Author: ASF GitHub Bot Created on: 02/Oct/20 15:49 Start Date: 02/Oct/20 15:49 Worklog Time Spent: 10m Work Description: ashish-kumar-sharma commented on a change in pull request #1528: URL: https://github.com/apache/hive/pull/1528#discussion_r498905028 ## File path: ql/src/java/org/apache/hadoop/hive/ql/ddl/table/column/change/AlterTableChangeColumnOperation.java ## @@ -72,6 +74,10 @@ protected void doAlteration(Table table, Partition partition) throws HiveExcepti if (desc.getNewColumnComment() != null) { oldColumn.setComment(desc.getNewColumnComment()); } +if (CollectionUtils.isNotEmpty(sd.getBucketCols()) && sd.getBucketCols().contains(oldColumnName)) { + sd.getBucketCols().remove(oldColumnName); + sd.getBucketCols().add(desc.getNewColumnName()); Review comment: newColumnName is converted toLowerCase in query planning while populating "desc" but to be fail safe i have added toLowerCase() here also. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 494022) Time Spent: 2h 40m (was: 2.5h) > ALTER TABLE RENAME COLUMN doesn't update list of bucketed column names > --- > > Key: HIVE-22826 > URL: https://issues.apache.org/jira/browse/HIVE-22826 > Project: Hive > Issue Type: Bug > Components: Query Planning >Affects Versions: 4.0.0 >Reporter: Karen Coppage >Assignee: Ashish Sharma >Priority: Major > Labels: pull-request-available > Attachments: unitTest.patch > > Time Spent: 2h 40m > Remaining Estimate: 0h > > Compaction for tables where a bucketed column has been renamed fails since > the list of bucketed columns in the StorageDescriptor doesn't get updated > when the column is renamed, therefore we can't recreate the table correctly > during compaction. > Attached a unit test that fails. > NO PRECOMMIT TESTS -- This message was sent by Atlassian Jira (v8.3.4#803005)
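The change under review updates the bucketed-column list when a column is renamed, lower-casing names because Hive column names are case-insensitive. A minimal sketch of that update logic, detached from Hive's StorageDescriptor (all names here are illustrative):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the HIVE-22826 fix: on column rename, the bucketed-column list must
// be updated too, with names compared in lower case. Illustrative only.
public class BucketColRenameSketch {
    static List<String> renameBucketCol(List<String> bucketCols, String oldName, String newName) {
        List<String> result = new ArrayList<>(bucketCols);
        // Bucket columns are assumed stored lower-case; rename only if present.
        if (result.remove(oldName.toLowerCase())) {
            result.add(newName.toLowerCase());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(renameBucketCol(List.of("key"), "key", "Serial_Num")); // [serial_num]
    }
}
```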
[jira] [Work logged] (HIVE-22826) ALTER TABLE RENAME COLUMN doesn't update list of bucketed column names
[ https://issues.apache.org/jira/browse/HIVE-22826?focusedWorklogId=494019&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-494019 ] ASF GitHub Bot logged work on HIVE-22826: - Author: ASF GitHub Bot Created on: 02/Oct/20 15:48 Start Date: 02/Oct/20 15:48 Worklog Time Spent: 10m Work Description: ashish-kumar-sharma commented on a change in pull request #1528: URL: https://github.com/apache/hive/pull/1528#discussion_r498904232 ## File path: ql/src/java/org/apache/hadoop/hive/ql/ddl/table/column/change/AlterTableChangeColumnOperation.java ## @@ -72,6 +74,10 @@ protected void doAlteration(Table table, Partition partition) throws HiveExcepti if (desc.getNewColumnComment() != null) { oldColumn.setComment(desc.getNewColumnComment()); } +if (CollectionUtils.isNotEmpty(sd.getBucketCols()) && sd.getBucketCols().contains(oldColumnName)) { Review comment: As per HIVE column contract it should be case in-sensitive. But it is not handled properly in query planning of "alter table {tablename} change". So I have added toLowerCase() in query planning also. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
Issue Time Tracking --- Worklog Id: (was: 494019) Time Spent: 2.5h (was: 2h 20m)
[jira] [Work logged] (HIVE-22826) ALTER TABLE RENAME COLUMN doesn't update list of bucketed column names
[ https://issues.apache.org/jira/browse/HIVE-22826?focusedWorklogId=494016&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-494016 ] ASF GitHub Bot logged work on HIVE-22826: - Author: ASF GitHub Bot Created on: 02/Oct/20 15:45 Start Date: 02/Oct/20 15:45 Worklog Time Spent: 10m Work Description: ashish-kumar-sharma commented on a change in pull request #1528: URL: https://github.com/apache/hive/pull/1528#discussion_r498902660 ## File path: ql/src/test/queries/clientpositive/alter_bucketedtable_change_column.q ## @@ -0,0 +1,10 @@ +--! qt:dataset:src +create table alter_bucket_change_col_t1(key string, value string) partitioned by (ds string) clustered by (key) into 10 buckets; + +describe formatted alter_bucket_change_col_t1; + +-- Test changing name of bucket column + +alter table alter_bucket_change_col_t1 change key keys string; Review comment: Added "Serial_Num", which covers lower case, upper case, and a special character. Issue Time Tracking --- Worklog Id: (was: 494016) Time Spent: 2h 20m (was: 2h 10m)
[jira] [Work logged] (HIVE-22826) ALTER TABLE RENAME COLUMN doesn't update list of bucketed column names
[ https://issues.apache.org/jira/browse/HIVE-22826?focusedWorklogId=494015&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-494015 ] ASF GitHub Bot logged work on HIVE-22826: - Author: ASF GitHub Bot Created on: 02/Oct/20 15:44 Start Date: 02/Oct/20 15:44 Worklog Time Spent: 10m Work Description: ashish-kumar-sharma commented on a change in pull request #1528: URL: https://github.com/apache/hive/pull/1528#discussion_r498902147 ## File path: ql/src/test/queries/clientpositive/alter_numbuckets_partitioned_table_h23.q ## @@ -52,6 +52,12 @@ alter table tst1_n1 clustered by (value) into 12 buckets; describe formatted tst1_n1; +-- Test changing name of bucket column + +alter table tst1_n1 change key keys string; + +describe formatted tst1_n1; Review comment: Done Issue Time Tracking --- Worklog Id: (was: 494015) Time Spent: 2h 10m (was: 2h)
[jira] [Work logged] (HIVE-22826) ALTER TABLE RENAME COLUMN doesn't update list of bucketed column names
[ https://issues.apache.org/jira/browse/HIVE-22826?focusedWorklogId=494014&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-494014 ] ASF GitHub Bot logged work on HIVE-22826: - Author: ASF GitHub Bot Created on: 02/Oct/20 15:44 Start Date: 02/Oct/20 15:44 Worklog Time Spent: 10m Work Description: ashish-kumar-sharma commented on a change in pull request #1528: URL: https://github.com/apache/hive/pull/1528#discussion_r498901990 ## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveAlterHandler.java ## @@ -130,6 +130,11 @@ public void alterTable(RawStore msdb, Warehouse wh, String catName, String dbnam throw new InvalidOperationException("Invalid column " + validate); } +// Validate bucketedColumns in new table +if (!MetaStoreServerUtils.validateBucketColumns(newt.getSd())) { + throw new InvalidOperationException("Bucket column doesn't match with any table columns"); Review comment: 1. Converted return type to List. 2. Added Log.error() along with column name. 3. Added column to exception also. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
Issue Time Tracking --- Worklog Id: (was: 494014) Time Spent: 2h (was: 1h 50m)
[jira] [Work logged] (HIVE-24222) Upgrade ORC to 1.5.12
[ https://issues.apache.org/jira/browse/HIVE-24222?focusedWorklogId=494013&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-494013 ] ASF GitHub Bot logged work on HIVE-24222: - Author: ASF GitHub Bot Created on: 02/Oct/20 15:43 Start Date: 02/Oct/20 15:43 Worklog Time Spent: 10m Work Description: dongjoon-hyun commented on pull request #1545: URL: https://github.com/apache/hive/pull/1545#issuecomment-702807686 Thank you, @pgaref and @sunchao . It's passed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 494013) Time Spent: 1.5h (was: 1h 20m) > Upgrade ORC to 1.5.12 > - > > Key: HIVE-24222 > URL: https://issues.apache.org/jira/browse/HIVE-24222 > Project: Hive > Issue Type: Improvement > Components: ORC >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > Time Spent: 1.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-22826) ALTER TABLE RENAME COLUMN doesn't update list of bucketed column names
[ https://issues.apache.org/jira/browse/HIVE-22826?focusedWorklogId=494012&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-494012 ] ASF GitHub Bot logged work on HIVE-22826: - Author: ASF GitHub Bot Created on: 02/Oct/20 15:42 Start Date: 02/Oct/20 15:42 Worklog Time Spent: 10m Work Description: ashish-kumar-sharma commented on a change in pull request #1528: URL: https://github.com/apache/hive/pull/1528#discussion_r498900786 ## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/utils/MetaStoreServerUtils.java ## @@ -1554,4 +1555,32 @@ public static Partition createMetaPartitionObject(Table tbl, Map } return tpart; } + + /** + * Validate bucket columns should belong to table columns. + * @param sd StorageDescriptor of given table + * @return true if bucket columns are empty or belong to table columns else false + */ + public static boolean validateBucketColumns(StorageDescriptor sd) { +List<String> columnNames = getColumnNames(sd.getCols()); Review comment: Done
Issue Time Tracking --- Worklog Id: (was: 494012) Time Spent: 1h 50m (was: 1h 40m)
[jira] [Work logged] (HIVE-22826) ALTER TABLE RENAME COLUMN doesn't update list of bucketed column names
[ https://issues.apache.org/jira/browse/HIVE-22826?focusedWorklogId=494011&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-494011 ] ASF GitHub Bot logged work on HIVE-22826: - Author: ASF GitHub Bot Created on: 02/Oct/20 15:42 Start Date: 02/Oct/20 15:42 Worklog Time Spent: 10m Work Description: ashish-kumar-sharma commented on a change in pull request #1528: URL: https://github.com/apache/hive/pull/1528#discussion_r498900655 ## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/utils/MetaStoreServerUtils.java ## @@ -1554,4 +1555,32 @@ public static Partition createMetaPartitionObject(Table tbl, Map } return tpart; } + + /** + * Validate bucket columns should belong to table columns. + * @param sd StorageDescriptor of given table + * @return true if bucket columns are empty or belong to table columns else false + */ + public static boolean validateBucketColumns(StorageDescriptor sd) { +List<String> columnNames = getColumnNames(sd.getCols()); +if(CollectionUtils.isNotEmpty(sd.getBucketCols()) && CollectionUtils.isNotEmpty(columnNames)){ Review comment: Done
Issue Time Tracking --- Worklog Id: (was: 494011) Time Spent: 1h 40m (was: 1.5h)
[jira] [Work logged] (HIVE-22826) ALTER TABLE RENAME COLUMN doesn't update list of bucketed column names
[ https://issues.apache.org/jira/browse/HIVE-22826?focusedWorklogId=494010&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-494010 ] ASF GitHub Bot logged work on HIVE-22826: - Author: ASF GitHub Bot Created on: 02/Oct/20 15:41 Start Date: 02/Oct/20 15:41 Worklog Time Spent: 10m Work Description: ashish-kumar-sharma commented on a change in pull request #1528: URL: https://github.com/apache/hive/pull/1528#discussion_r498900535 ## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/utils/MetaStoreServerUtils.java ## @@ -1554,4 +1555,32 @@ public static Partition createMetaPartitionObject(Table tbl, Map } return tpart; } + + /** + * Validate bucket columns should belong to table columns. + * @param sd StorageDescriptor of given table + * @return true if bucket columns are empty or belong to table columns else false + */ + public static boolean validateBucketColumns(StorageDescriptor sd) { +List<String> columnNames = getColumnNames(sd.getCols()); +if(CollectionUtils.isNotEmpty(sd.getBucketCols()) && CollectionUtils.isNotEmpty(columnNames)){ + return columnNames.containsAll(sd.getBucketCols().stream().map(String::toLowerCase).collect(Collectors.toList())); +} else if (CollectionUtils.isNotEmpty(sd.getBucketCols()) && CollectionUtils.isEmpty(columnNames)) { + return false; +} else { + return true; +} + } + + /** + * Generate column name list from the fieldSchema list + * @param cols fieldSchema list + * @return column name list + */ + public static List<String> getColumnNames(List<FieldSchema> cols) { +if (CollectionUtils.isNotEmpty(cols)) { + return cols.stream().map(FieldSchema::getName).collect(Collectors.toList()); Review comment: Expected column name in lower case. But in order to be fail-safe, added toLowerCase() here also.
Issue Time Tracking --- Worklog Id: (was: 494010) Time Spent: 1.5h (was: 1h 20m)
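The validateBucketColumns logic discussed in the review above can be restated as a self-contained sketch: bucket columns are valid when the list is empty or every entry, lower-cased, is a table column. This version works on plain string lists rather than Hive's StorageDescriptor, so it is illustrative only.

```java
import java.util.List;
import java.util.stream.Collectors;

// Simplified restatement of the bucket-column validation: valid when there are
// no bucket columns, or every bucket column (lower-cased) is a table column.
public class BucketColValidationSketch {
    static boolean validateBucketColumns(List<String> bucketCols, List<String> columnNames) {
        if (bucketCols == null || bucketCols.isEmpty()) {
            return true;  // nothing to validate
        }
        if (columnNames == null || columnNames.isEmpty()) {
            return false; // bucket columns exist but the table has no columns
        }
        return columnNames.containsAll(
            bucketCols.stream().map(String::toLowerCase).collect(Collectors.toList()));
    }

    public static void main(String[] args) {
        List<String> cols = List.of("key", "value");
        System.out.println(validateBucketColumns(List.of("KEY"), cols));  // true
        System.out.println(validateBucketColumns(List.of("gone"), cols)); // false
    }
}
```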
[jira] [Work logged] (HIVE-21052) Make sure transactions get cleaned if they are aborted before addPartitions is called
[ https://issues.apache.org/jira/browse/HIVE-21052?focusedWorklogId=494005&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-494005 ] ASF GitHub Bot logged work on HIVE-21052: - Author: ASF GitHub Bot Created on: 02/Oct/20 15:38 Start Date: 02/Oct/20 15:38 Worklog Time Spent: 10m Work Description: deniskuzZ commented on pull request #1548: URL: https://github.com/apache/hive/pull/1548#issuecomment-702804902 @vpnvishv , could you please check. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 494005) Time Spent: 50m (was: 40m) > Make sure transactions get cleaned if they are aborted before addPartitions > is called > - > > Key: HIVE-21052 > URL: https://issues.apache.org/jira/browse/HIVE-21052 > Project: Hive > Issue Type: Bug > Components: Transactions >Affects Versions: 3.0.0, 3.1.1 >Reporter: Jaume M >Assignee: Jaume M >Priority: Critical > Labels: pull-request-available > Attachments: Aborted Txn w_Direct Write.pdf, HIVE-21052.1.patch, > HIVE-21052.10.patch, HIVE-21052.11.patch, HIVE-21052.12.patch, > HIVE-21052.2.patch, HIVE-21052.3.patch, HIVE-21052.4.patch, > HIVE-21052.5.patch, HIVE-21052.6.patch, HIVE-21052.7.patch, > HIVE-21052.8.patch, HIVE-21052.9.patch > > Time Spent: 50m > Remaining Estimate: 0h > > If the transaction is aborted between openTxn and addPartitions and data has > been written on the table the transaction manager will think it's an empty > transaction and no cleaning will be done. > This is currently an issue in the streaming API and in micromanaged tables. 
> As proposed by [~ekoifman] this can be solved by: > * Writing an entry with a special marker to TXN_COMPONENTS at openTxn and > when addPartitions is called remove this entry from TXN_COMPONENTS and add > the corresponding partition entry to TXN_COMPONENTS. > * If the cleaner finds and entry with a special marker in TXN_COMPONENTS that > specifies that a transaction was opened and it was aborted it must generate > jobs for the worker for every possible partition available. > cc [~ewohlstadter] -- This message was sent by Atlassian Jira (v8.3.4#803005)
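The marker scheme proposed above can be illustrated with a small in-memory sketch, where a Map stands in for the TXN_COMPONENTS table. This is not the metastore's actual SQL or schema; it only shows the lifecycle of the special marker entry.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the HIVE-21052 proposal: a placeholder entry is written for a
// transaction at open time and replaced with a real partition entry when
// addPartitions runs, so an abort in between still leaves something to clean.
public class TxnMarkerSketch {
    static final String MARKER = "__OPEN_TXN_MARKER__";
    // txnId -> component entry (marker or partition name); simplified to one entry per txn
    private final Map<Long, String> txnComponents = new HashMap<>();

    void openTxn(long txnId) {
        txnComponents.put(txnId, MARKER); // special marker written at openTxn
    }

    void addPartitions(long txnId, String partition) {
        txnComponents.put(txnId, partition); // marker replaced by the real entry
    }

    // Cleaner side: an aborted txn that still carries the marker needs jobs
    // generated for every possible partition, not just the recorded ones.
    boolean needsFullClean(long txnId) {
        return MARKER.equals(txnComponents.get(txnId));
    }

    public static void main(String[] args) {
        TxnMarkerSketch t = new TxnMarkerSketch();
        t.openTxn(7);
        System.out.println(t.needsFullClean(7)); // true: abort here would still be cleaned
        t.addPartitions(7, "ds=2020-10-02");
        System.out.println(t.needsFullClean(7)); // false: normal per-partition cleanup
    }
}
```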
[jira] [Work logged] (HIVE-24217) HMS storage backend for HPL/SQL stored procedures
[ https://issues.apache.org/jira/browse/HIVE-24217?focusedWorklogId=493985&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-493985 ] ASF GitHub Bot logged work on HIVE-24217: - Author: ASF GitHub Bot Created on: 02/Oct/20 15:04 Start Date: 02/Oct/20 15:04 Worklog Time Spent: 10m Work Description: zeroflag commented on a change in pull request #1542: URL: https://github.com/apache/hive/pull/1542#discussion_r498878756 ## File path: standalone-metastore/metastore-server/src/main/resources/package.jdo ## @@ -1549,6 +1549,83 @@ [new XML element definitions not preserved in this archive] Review comment: I'll double check it, I remember having some problem with another textual datatype, I can't remember which one was that, that's why MEDIUMTEXT was chosen. Issue Time Tracking --- Worklog Id: (was: 493985) Time Spent: 1h 20m (was: 1h 10m) > HMS storage backend for HPL/SQL stored procedures > - > > Key: HIVE-24217 > URL: https://issues.apache.org/jira/browse/HIVE-24217 > Project: Hive > Issue Type: Bug > Components: Hive, hpl/sql, Metastore >Reporter: Attila Magyar >Assignee: Attila Magyar >Priority: Major > Labels: pull-request-available > Attachments: HPL_SQL storedproc HMS storage.pdf > > Time Spent: 1h 20m > Remaining Estimate: 0h > > HPL/SQL procedures are currently stored in text files. The goal of this Jira > is to implement a Metastore backend for storing and loading these procedures. > This is an incremental step towards having fully capable stored procedures in > Hive. > > See the attached design for more information.
[jira] [Work logged] (HIVE-24217) HMS storage backend for HPL/SQL stored procedures
[ https://issues.apache.org/jira/browse/HIVE-24217?focusedWorklogId=493984&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-493984 ] ASF GitHub Bot logged work on HIVE-24217: - Author: ASF GitHub Bot Created on: 02/Oct/20 15:03 Start Date: 02/Oct/20 15:03 Worklog Time Spent: 10m Work Description: zeroflag commented on a change in pull request #1542: URL: https://github.com/apache/hive/pull/1542#discussion_r498877894 ## File path: standalone-metastore/metastore-server/src/main/resources/package.jdo ## @@ -1549,6 +1549,83 @@ [new XML element definitions not preserved in this archive] Review comment: The signature is already parsed by the time the procedure is being created. We would need to drop that information, get back the textual representation of the signature to store it in HMS, and reparse it on the client side when someone calls the procedure. That's maybe not a big deal, but still unnecessary to parse it twice. Storing it in a structured way also ensures some degree of validity: you can't store a syntactically incorrect signature if we store it in a structured way. I'm not sure they never participate in a query. If one wants to discover the stored procedures currently stored in a DB and find out what data they operate on, they would need to do some clumsy string manipulation on the signature. Considering that other DB engines also store this information separately, I would like to keep it as it is for now and see how it works in practice. Later on, when we have multi-language support, we can revisit this issue.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 493984) Time Spent: 1h 10m (was: 1h) > HMS storage backend for HPL/SQL stored procedures > - > > Key: HIVE-24217 > URL: https://issues.apache.org/jira/browse/HIVE-24217 > Project: Hive > Issue Type: Bug > Components: Hive, hpl/sql, Metastore >Reporter: Attila Magyar >Assignee: Attila Magyar >Priority: Major > Labels: pull-request-available > Attachments: HPL_SQL storedproc HMS storage.pdf > > Time Spent: 1h 10m > Remaining Estimate: 0h > > HPL/SQL procedures are currently stored in text files. The goal of this Jira > is to implement a Metastore backend for storing and loading these procedures. > This is an incremental step towards having fully capable stored procedures in > Hive. > > See the attached design for more information. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24217) HMS storage backend for HPL/SQL stored procedures
[ https://issues.apache.org/jira/browse/HIVE-24217?focusedWorklogId=493978&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-493978 ] ASF GitHub Bot logged work on HIVE-24217: - Author: ASF GitHub Bot Created on: 02/Oct/20 14:51 Start Date: 02/Oct/20 14:51 Worklog Time Spent: 10m Work Description: zeroflag commented on a change in pull request #1542: URL: https://github.com/apache/hive/pull/1542#discussion_r498868959 ## File path: standalone-metastore/metastore-common/src/main/thrift/hive_metastore.thrift ## @@ -2830,6 +2848,11 @@ PartitionsResponse get_partitions_req(1:PartitionsRequest req) void add_replication_metrics(1: ReplicationMetricList replicationMetricList) throws(1:MetaException o1) ReplicationMetricList get_replication_metrics(1: GetReplicationMetricsRequest rqst) throws(1:MetaException o1) GetOpenTxnsResponse get_open_txns_req(1: GetOpenTxnsRequest getOpenTxnsRequest) + + void create_stored_procedure(1: string catName, 2: StoredProcedure proc) throws(1:NoSuchObjectException o1, 2:MetaException o2) + StoredProcedure get_stored_procedure(1: string catName, 2: string db, 3: string name) throws (1:MetaException o1, 2:NoSuchObjectException o2) + void drop_stored_procedure(1: string catName, 2: string dbName, 3: string funcName) throws (1:MetaException o1, 2:NoSuchObjectException o2) + list get_all_stored_procedures(1: string catName) throws (1:MetaException o1) Review comment: You mean putting (1: string catName, 2: string dbName, 3: string funcName) into a request object? I can do that. But if we have only one parameter, like in the last case that would be an overkill in my opinion. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 493978) Time Spent: 1h (was: 50m) > HMS storage backend for HPL/SQL stored procedures > - > > Key: HIVE-24217 > URL: https://issues.apache.org/jira/browse/HIVE-24217 > Project: Hive > Issue Type: Bug > Components: Hive, hpl/sql, Metastore >Reporter: Attila Magyar >Assignee: Attila Magyar >Priority: Major > Labels: pull-request-available > Attachments: HPL_SQL storedproc HMS storage.pdf > > Time Spent: 1h > Remaining Estimate: 0h > > HPL/SQL procedures are currently stored in text files. The goal of this Jira > is to implement a Metastore backend for storing and loading these procedures. > This is an incremental step towards having fully capable stored procedures in > Hive. > > See the attached design for more information. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24217) HMS storage backend for HPL/SQL stored procedures
[ https://issues.apache.org/jira/browse/HIVE-24217?focusedWorklogId=493976&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-493976 ] ASF GitHub Bot logged work on HIVE-24217: - Author: ASF GitHub Bot Created on: 02/Oct/20 14:48 Start Date: 02/Oct/20 14:48 Worklog Time Spent: 10m Work Description: zeroflag commented on a change in pull request #1542: URL: https://github.com/apache/hive/pull/1542#discussion_r498868959 ## File path: standalone-metastore/metastore-common/src/main/thrift/hive_metastore.thrift ## @@ -2830,6 +2848,11 @@ PartitionsResponse get_partitions_req(1:PartitionsRequest req) void add_replication_metrics(1: ReplicationMetricList replicationMetricList) throws(1:MetaException o1) ReplicationMetricList get_replication_metrics(1: GetReplicationMetricsRequest rqst) throws(1:MetaException o1) GetOpenTxnsResponse get_open_txns_req(1: GetOpenTxnsRequest getOpenTxnsRequest) + + void create_stored_procedure(1: string catName, 2: StoredProcedure proc) throws(1:NoSuchObjectException o1, 2:MetaException o2) + StoredProcedure get_stored_procedure(1: string catName, 2: string db, 3: string name) throws (1:MetaException o1, 2:NoSuchObjectException o2) + void drop_stored_procedure(1: string catName, 2: string dbName, 3: string funcName) throws (1:MetaException o1, 2:NoSuchObjectException o2) + list get_all_stored_procedures(1: string catName) throws (1:MetaException o1) Review comment: You mean putting (1: string catName, 2: string dbName, 3: string funcName) into a request object? I can do that. But if we have only one parameter, like in the last case, that would be overkill in my opinion. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 493976) Time Spent: 50m (was: 40m) > HMS storage backend for HPL/SQL stored procedures > - > > Key: HIVE-24217 > URL: https://issues.apache.org/jira/browse/HIVE-24217 > Project: Hive > Issue Type: Bug > Components: Hive, hpl/sql, Metastore >Reporter: Attila Magyar >Assignee: Attila Magyar >Priority: Major > Labels: pull-request-available > Attachments: HPL_SQL storedproc HMS storage.pdf > > Time Spent: 50m > Remaining Estimate: 0h > > HPL/SQL procedures are currently stored in text files. The goal of this Jira > is to implement a Metastore backend for storing and loading these procedures. > This is an incremental step towards having fully capable stored procedures in > Hive. > > See the attached design for more information. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24217) HMS storage backend for HPL/SQL stored procedures
[ https://issues.apache.org/jira/browse/HIVE-24217?focusedWorklogId=493974&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-493974 ] ASF GitHub Bot logged work on HIVE-24217: - Author: ASF GitHub Bot Created on: 02/Oct/20 14:46 Start Date: 02/Oct/20 14:46 Worklog Time Spent: 10m Work Description: zeroflag commented on a change in pull request #1542: URL: https://github.com/apache/hive/pull/1542#discussion_r498867496 ## File path: standalone-metastore/metastore-server/src/main/sql/derby/hive-schema-4.0.0.derby.sql ## @@ -786,6 +786,35 @@ CREATE TABLE "APP"."REPLICATION_METRICS" ( CREATE INDEX "POLICY_IDX" ON "APP"."REPLICATION_METRICS" ("RM_POLICY"); CREATE INDEX "DUMP_IDX" ON "APP"."REPLICATION_METRICS" ("RM_DUMP_EXECUTION_ID"); +-- Create stored procedure tables +CREATE TABLE "APP"."STORED_PROCS" ( + "SP_ID" BIGINT NOT NULL, + "CREATE_TIME" INTEGER NOT NULL, + "LAST_ACCESS_TIME" INTEGER NOT NULL, Review comment: the intention was to have something that represents the last modification date (maybe the name was chosen poorly), but OK, I'll remove it; it is not used This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 493974) Time Spent: 40m (was: 0.5h) > HMS storage backend for HPL/SQL stored procedures > - > > Key: HIVE-24217 > URL: https://issues.apache.org/jira/browse/HIVE-24217 > Project: Hive > Issue Type: Bug > Components: Hive, hpl/sql, Metastore >Reporter: Attila Magyar >Assignee: Attila Magyar >Priority: Major > Labels: pull-request-available > Attachments: HPL_SQL storedproc HMS storage.pdf > > Time Spent: 40m > Remaining Estimate: 0h > > HPL/SQL procedures are currently stored in text files. The goal of this Jira > is to implement a Metastore backend for storing and loading these procedures. 
> This is an incremental step towards having fully capable stored procedures in > Hive. > > See the attached design for more information. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24217) HMS storage backend for HPL/SQL stored procedures
[ https://issues.apache.org/jira/browse/HIVE-24217?focusedWorklogId=493970&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-493970 ] ASF GitHub Bot logged work on HIVE-24217: - Author: ASF GitHub Bot Created on: 02/Oct/20 14:41 Start Date: 02/Oct/20 14:41 Worklog Time Spent: 10m Work Description: kgyrtkirk commented on a change in pull request #1542: URL: https://github.com/apache/hive/pull/1542#discussion_r498830563 ## File path: standalone-metastore/metastore-common/src/main/thrift/hive_metastore.thrift ## @@ -2830,6 +2848,11 @@ PartitionsResponse get_partitions_req(1:PartitionsRequest req) void add_replication_metrics(1: ReplicationMetricList replicationMetricList) throws(1:MetaException o1) ReplicationMetricList get_replication_metrics(1: GetReplicationMetricsRequest rqst) throws(1:MetaException o1) GetOpenTxnsResponse get_open_txns_req(1: GetOpenTxnsRequest getOpenTxnsRequest) + + void create_stored_procedure(1: string catName, 2: StoredProcedure proc) throws(1:NoSuchObjectException o1, 2:MetaException o2) + StoredProcedure get_stored_procedure(1: string catName, 2: string db, 3: string name) throws (1:MetaException o1, 2:NoSuchObjectException o2) + void drop_stored_procedure(1: string catName, 2: string dbName, 3: string funcName) throws (1:MetaException o1, 2:NoSuchObjectException o2) + list get_all_stored_procedures(1: string catName) throws (1:MetaException o1) Review comment: could you please follow the convention of other methods and define a struct for the requests arguments ## File path: standalone-metastore/metastore-server/src/main/sql/derby/hive-schema-4.0.0.derby.sql ## @@ -786,6 +786,35 @@ CREATE TABLE "APP"."REPLICATION_METRICS" ( CREATE INDEX "POLICY_IDX" ON "APP"."REPLICATION_METRICS" ("RM_POLICY"); CREATE INDEX "DUMP_IDX" ON "APP"."REPLICATION_METRICS" ("RM_DUMP_EXECUTION_ID"); +-- Create stored procedure tables +CREATE TABLE "APP"."STORED_PROCS" ( + "SP_ID" BIGINT NOT NULL, + "CREATE_TIME" INTEGER NOT NULL, + 
"LAST_ACCESS_TIME" INTEGER NOT NULL, Review comment: I think we should only add fields which are actually useful and in use - because right now the access time would not be updated at all, I don't think we should add it. ## File path: standalone-metastore/metastore-server/src/main/resources/package.jdo ## @@ -1549,6 +1549,83 @@ [XML column definitions stripped by the mail archive] Review comment: I think instead of storing the return_type/argument types and such in the metastore - as they would never participate in a query or anything "useful"; they will just travel as payload in the messages. Given the fact that they are effectively implicit data which can be figured out from the function definition - I think we may leave it to the execution engine; it should be able to figure it out (since it should be able to use it). Optionally, to give ourselves (and users) some clarity, we could add a "signature" string to the table - which could provide a human-readable signature ## File path: standalone-metastore/metastore-server/src/main/resources/package.jdo ## @@ -1549,6 +1549,83 @@ [XML column definitions stripped by the mail archive] Review comment: this is the first occurrence of MEDIUMTEXT in package.jdo - I don't know how well that will work; we had quite a few problems with "long" tableproperty values - and PARAM_VALUE was updated to use CLOB in oracle/etc. The most important thing would be to make sure that we can store the procedure in all supported metastore databases - if possible this should also be tested in some way (at least by hand) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 493970) Time Spent: 0.5h (was: 20m) > HMS storage backend for HPL/SQL stored procedur
[jira] [Assigned] (HIVE-24226) Avoid Copy of Bytes in Protobuf BinaryWriter
[ https://issues.apache.org/jira/browse/HIVE-24226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Mollitor reassigned HIVE-24226: - > Avoid Copy of Bytes in Protobuf BinaryWriter > > > Key: HIVE-24226 > URL: https://issues.apache.org/jira/browse/HIVE-24226 > Project: Hive > Issue Type: Improvement >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Minor > > {code:java|title=ProtoWriteSupport.java} > class BinaryWriter extends FieldWriter { > @Override > final void writeRawValue(Object value) { > ByteString byteString = (ByteString) value; > Binary binary = Binary.fromConstantByteArray(byteString.toByteArray()); > recordConsumer.addBinary(binary); > } > } > {code} > {{toByteArray()}} creates a copy of the buffer. There is already support > with Parquet and Protobuf to pass a ByteBuffer instead, which avoids the copy. -- This message was sent by Atlassian Jira (v8.3.4#803005)
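The cost difference being described (materializing a fresh byte array versus handing over a view of the existing buffer) can be illustrated generically. The sketch below uses Python's bytes/memoryview as a stand-in; it is not the Parquet/Protobuf API, and the function names are made up for illustration:

```python
def to_binary_with_copy(payload: bytes) -> bytes:
    # Analogous to ByteString.toByteArray(): allocates and fills a new array.
    return bytes(bytearray(payload))

def to_binary_zero_copy(payload: bytes) -> memoryview:
    # Analogous to handing the writer a ByteBuffer view: no new backing array.
    return memoryview(payload)

data = b"row-value"
copied = to_binary_with_copy(data)
view = to_binary_zero_copy(data)
assert copied == data and copied is not data  # equal content, separate buffer
assert view.obj is data                       # shares the original buffer
```

The zero-copy variant matters most for wide binary columns, where the per-row copy dominates the writer's allocation cost.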
[jira] [Work logged] (HIVE-21052) Make sure transactions get cleaned if they are aborted before addPartitions is called
[ https://issues.apache.org/jira/browse/HIVE-21052?focusedWorklogId=493956&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-493956 ] ASF GitHub Bot logged work on HIVE-21052: - Author: ASF GitHub Bot Created on: 02/Oct/20 13:45 Start Date: 02/Oct/20 13:45 Worklog Time Spent: 10m Work Description: deniskuzZ commented on pull request #1415: URL: https://github.com/apache/hive/pull/1415#issuecomment-702742918 created same for master: https://github.com/apache/hive/pull/1548 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 493956) Time Spent: 40m (was: 0.5h) > Make sure transactions get cleaned if they are aborted before addPartitions > is called > - > > Key: HIVE-21052 > URL: https://issues.apache.org/jira/browse/HIVE-21052 > Project: Hive > Issue Type: Bug > Components: Transactions >Affects Versions: 3.0.0, 3.1.1 >Reporter: Jaume M >Assignee: Jaume M >Priority: Critical > Labels: pull-request-available > Attachments: Aborted Txn w_Direct Write.pdf, HIVE-21052.1.patch, > HIVE-21052.10.patch, HIVE-21052.11.patch, HIVE-21052.12.patch, > HIVE-21052.2.patch, HIVE-21052.3.patch, HIVE-21052.4.patch, > HIVE-21052.5.patch, HIVE-21052.6.patch, HIVE-21052.7.patch, > HIVE-21052.8.patch, HIVE-21052.9.patch > > Time Spent: 40m > Remaining Estimate: 0h > > If the transaction is aborted between openTxn and addPartitions and data has > been written on the table the transaction manager will think it's an empty > transaction and no cleaning will be done. > This is currently an issue in the streaming API and in micromanaged tables. 
> As proposed by [~ekoifman] this can be solved by: > * Writing an entry with a special marker to TXN_COMPONENTS at openTxn and > when addPartitions is called remove this entry from TXN_COMPONENTS and add > the corresponding partition entry to TXN_COMPONENTS. > * If the cleaner finds an entry with a special marker in TXN_COMPONENTS that > specifies that a transaction was opened and it was aborted, it must generate > jobs for the worker for every possible partition available. > cc [~ewohlstadter] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-21052) Make sure transactions get cleaned if they are aborted before addPartitions is called
[ https://issues.apache.org/jira/browse/HIVE-21052?focusedWorklogId=493954&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-493954 ] ASF GitHub Bot logged work on HIVE-21052: - Author: ASF GitHub Bot Created on: 02/Oct/20 13:45 Start Date: 02/Oct/20 13:45 Worklog Time Spent: 10m Work Description: deniskuzZ opened a new pull request #1548: URL: https://github.com/apache/hive/pull/1548 ### What changes were proposed in this pull request? Below changes are only with respect to branch-3.1. Design: taken from https://issues.apache.org/jira/secure/attachment/12954375/Aborted%20Txn%20w_Direct%20Write.pdf **Overview:** 1. add a dummy row to TXN_COMPONENTS with operation type 'p' in enqueueLockWithRetry, which will be removed in addDynamicPartition 2. If the txn is aborted at any point, this dummy entry will block the initiator from removing this txnId from TXNS 3. Initiator will add a row in COMPACTION_QUEUE (with type 'p') for the above aborted txn with the state as READY_FOR_CLEANING; at any time there will be a single entry of this type for a table in COMPACTION_QUEUE. 4. Cleaner will directly pick up the above request, and process it via the new cleanAborted code path (scan all partitions and remove aborted dirs); once successful, the cleaner will remove the dummy row from TXN_COMPONENTS **Cleaner Design:** - We are keeping the cleaner single-threaded, and this new type of cleanup will be handled similar to any regular cleanup **Aborted dirs cleanup:** - In p-type cleanup, cleaner will iterate over all the partitions and remove all delta/base dirs with the given aborted writeId list - added cleanup of aborted base/delta in the worker also **TXN_COMPONENTS cleanup:** - If successful, p-type entry will be removed from TXN_COMPONENTS during addDynamicPartitions - If aborted, cleaner will clean in markCleaned after successful processing of p-type cleanup **TXNS cleanup:** - No change, will be cleaned up by the initiator ### Why are the changes needed? To fix the above-mentioned issue. 
### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? unit-tests added This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 493954) Time Spent: 0.5h (was: 20m) > Make sure transactions get cleaned if they are aborted before addPartitions > is called > - > > Key: HIVE-21052 > URL: https://issues.apache.org/jira/browse/HIVE-21052 > Project: Hive > Issue Type: Bug > Components: Transactions >Affects Versions: 3.0.0, 3.1.1 >Reporter: Jaume M >Assignee: Jaume M >Priority: Critical > Labels: pull-request-available > Attachments: Aborted Txn w_Direct Write.pdf, HIVE-21052.1.patch, > HIVE-21052.10.patch, HIVE-21052.11.patch, HIVE-21052.12.patch, > HIVE-21052.2.patch, HIVE-21052.3.patch, HIVE-21052.4.patch, > HIVE-21052.5.patch, HIVE-21052.6.patch, HIVE-21052.7.patch, > HIVE-21052.8.patch, HIVE-21052.9.patch > > Time Spent: 0.5h > Remaining Estimate: 0h > > If the transaction is aborted between openTxn and addPartitions and data has > been written on the table the transaction manager will think it's an empty > transaction and no cleaning will be done. > This is currently an issue in the streaming API and in micromanaged tables. > As proposed by [~ekoifman] this can be solved by: > * Writing an entry with a special marker to TXN_COMPONENTS at openTxn and > when addPartitions is called remove this entry from TXN_COMPONENTS and add > the corresponding partition entry to TXN_COMPONENTS. > * If the cleaner finds an entry with a special marker in TXN_COMPONENTS that > specifies that a transaction was opened and it was aborted, it must generate > jobs for the worker for every possible partition available. > cc [~ewohlstadter] -- This message was sent by Atlassian Jira (v8.3.4#803005)
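The marker-row protocol from the proposal (a dummy 'p' entry written at openTxn, swapped for real partition entries at addDynamicPartitions, and otherwise left behind for the cleaner to find) can be sketched as a minimal model. The class and method names below are illustrative only, not the actual Hive metastore code:

```python
# Minimal sketch of the TXN_COMPONENTS marker protocol for aborted
# transactions; 'p' marks "partitions not yet registered".
class TxnComponents:
    def __init__(self):
        self.rows = {}  # txn_id -> 'p' marker or list of partition entries

    def open_txn(self, txn_id):
        # enqueueLockWithRetry: add the dummy 'p' marker row.
        self.rows[txn_id] = "p"

    def add_partitions(self, txn_id, partitions):
        # addDynamicPartitions: replace the marker with real partition entries.
        self.rows[txn_id] = list(partitions)

    def aborted_needing_cleanup(self):
        # The initiator finds txns still carrying the 'p' marker (i.e. aborted
        # before addPartitions) and queues READY_FOR_CLEANING work for them.
        return [t for t, op in self.rows.items() if op == "p"]

tc = TxnComponents()
tc.open_txn(1); tc.open_txn(2)
tc.add_partitions(1, ["p=2020-10-01"])
# Txn 2 aborted before addPartitions: the marker keeps it visible to the cleaner.
assert tc.aborted_needing_cleanup() == [2]
```

The invariant is that a transaction whose marker survives an abort stays visible to the cleaner instead of looking like an empty transaction.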
[jira] [Work logged] (HIVE-24157) Strict mode to fail on CAST timestamp <-> numeric
[ https://issues.apache.org/jira/browse/HIVE-24157?focusedWorklogId=493935&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-493935 ] ASF GitHub Bot logged work on HIVE-24157: - Author: ASF GitHub Bot Created on: 02/Oct/20 13:02 Start Date: 02/Oct/20 13:02 Worklog Time Spent: 10m Work Description: kgyrtkirk merged pull request #1497: URL: https://github.com/apache/hive/pull/1497 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 493935) Time Spent: 1h 10m (was: 1h) > Strict mode to fail on CAST timestamp <-> numeric > - > > Key: HIVE-24157 > URL: https://issues.apache.org/jira/browse/HIVE-24157 > Project: Hive > Issue Type: Improvement > Components: SQL >Reporter: Jesus Camacho Rodriguez >Assignee: Zoltan Haindrich >Priority: Major > Labels: pull-request-available > Time Spent: 1h 10m > Remaining Estimate: 0h > > There is some interest in enforcing that CAST numeric <\-> timestamp is > disallowed to avoid confusion among users, e.g., SQL standard does not allow > numeric <\-> timestamp casting, timestamp type is timezone agnostic, etc. > We should introduce a strict config for timestamp (similar to others before): > If the config is true, we shall fail while compiling the query with a > meaningful message. > To provide similar behavior, Hive has multiple functions that provide clearer > semantics for numeric to timestamp conversion (and vice versa): > https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-DateFunctions -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HIVE-24157) Strict mode to fail on CAST timestamp <-> numeric
[ https://issues.apache.org/jira/browse/HIVE-24157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltan Haindrich resolved HIVE-24157. - Fix Version/s: 4.0.0 Resolution: Fixed merged into master. Thank you Jesus for reviewing the changes! > Strict mode to fail on CAST timestamp <-> numeric > - > > Key: HIVE-24157 > URL: https://issues.apache.org/jira/browse/HIVE-24157 > Project: Hive > Issue Type: Improvement > Components: SQL >Reporter: Jesus Camacho Rodriguez >Assignee: Zoltan Haindrich >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 1h 10m > Remaining Estimate: 0h > > There is some interest in enforcing that CAST numeric <\-> timestamp is > disallowed to avoid confusion among users, e.g., SQL standard does not allow > numeric <\-> timestamp casting, timestamp type is timezone agnostic, etc. > We should introduce a strict config for timestamp (similar to others before): > If the config is true, we shall fail while compiling the query with a > meaningful message. > To provide similar behavior, Hive has multiple functions that provide clearer > semantics for numeric to timestamp conversion (and vice versa): > https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-DateFunctions -- This message was sent by Atlassian Jira (v8.3.4#803005)
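The strict-mode behavior described here (fail at compile time on numeric <-> timestamp casts and point users at the explicit date functions) can be sketched as follows. This is a simplified illustration, not the Hive type checker; the function name and the strict flag are made up for the sketch:

```python
NUMERIC_TYPES = {"tinyint", "smallint", "int", "bigint", "float", "double", "decimal"}

def check_cast_allowed(src: str, dst: str, strict: bool) -> None:
    # Reject a cast that crosses numeric <-> timestamp while strict mode is
    # on, with a meaningful message naming the explicit UDF alternatives.
    crosses = "timestamp" in (src, dst) and bool({src, dst} & NUMERIC_TYPES)
    if strict and crosses:
        raise ValueError(
            f"CAST {src} to {dst} is disabled in strict mode; use "
            "from_unixtime()/unix_timestamp() for explicit conversion")

check_cast_allowed("bigint", "timestamp", strict=False)  # permissive: allowed
try:
    check_cast_allowed("timestamp", "double", strict=True)
    failed = False
except ValueError:
    failed = True
assert failed  # strict mode rejects the cast at compile time
```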
[jira] [Commented] (HIVE-23976) Enable vectorization for multi-col semi join reducers
[ https://issues.apache.org/jira/browse/HIVE-23976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17206145#comment-17206145 ] Stamatis Zampetakis commented on HIVE-23976: Hi [~abstractdog], While working on HIVE-24221, I got some further questions/ideas regarding this issue. It seems that we make use of n-ary vectorized expressions for the evaluation of AND and OR operators; it's true it is not done with the descriptor but through {{VectorizationContext}}. I am not sure what this means in terms of efficiency, but it looks like we are saving at least some memory since I get the impression that we can reuse the output vector and not have a different output vector per pair of binary operations. We could employ something similar for an n-ary hash function. Assuming that we cannot/should not treat the hash as an n-ary operator, then I think it makes more sense to make it unary (single input, single output) instead of binary, being only a kind of wrapper around Murmur for the different datatypes. By doing this, the implementation will be simpler and we can cover more use cases, as the combine step is delegated to another abstraction. +Currently+ {noformat} hash(a,b) = 31*murmur(a) + murmur(b) {noformat} +After+ {noformat} hash(a) = murmur(a) {noformat} What do you think? > Enable vectorization for multi-col semi join reducers > - > > Key: HIVE-23976 > URL: https://issues.apache.org/jira/browse/HIVE-23976 > Project: Hive > Issue Type: Improvement >Reporter: Stamatis Zampetakis >Assignee: László Bodor >Priority: Major > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > HIVE-21196 introduces multi-column semi-join reducers in the query engine. > However, the implementation relies on GenericUDFMurmurHash which is not > vectorized, thus the respective operators cannot be executed in vectorized > mode. -- This message was sent by Atlassian Jira (v8.3.4#803005)
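The proposed refactoring (a unary Murmur wrapper per column, with the combine step delegated to a separate abstraction) can be sketched like this, using Python's built-in hash as a stand-in for Murmur; names are illustrative:

```python
from functools import reduce

def murmur(value) -> int:
    # Stand-in for the unary per-column Murmur hash (single input, single output).
    return hash(value)

def combine(hashes) -> int:
    # Separate n-ary combine step: h = 31*h + next, matching the
    # "31*murmur(a) + murmur(b)" scheme quoted in the comment.
    return reduce(lambda acc, h: 31 * acc + h, hashes)

# The binary form hash(a, b) = 31*murmur(a) + murmur(b) is just the 2-element case:
a, b = "col_a", 42
assert combine([murmur(a), murmur(b)]) == 31 * murmur(a) + murmur(b)
assert combine([murmur(a)]) == murmur(a)  # the unary case degenerates cleanly
```

Keeping the combine step out of the hash function is what lets the unary wrapper stay a thin, per-datatype shim around Murmur.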
[jira] [Work started] (HIVE-24225) FIX S3A recordReader policy selection
[ https://issues.apache.org/jira/browse/HIVE-24225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HIVE-24225 started by Panagiotis Garefalakis. - > FIX S3A recordReader policy selection > - > > Key: HIVE-24225 > URL: https://issues.apache.org/jira/browse/HIVE-24225 > Project: Hive > Issue Type: Bug >Reporter: Panagiotis Garefalakis >Assignee: Panagiotis Garefalakis >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > Dynamic S3A recordReader policy selection can cause issues on lazy > initialized FS objects -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-24225) FIX S3A recordReader policy selection
[ https://issues.apache.org/jira/browse/HIVE-24225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-24225: -- Labels: pull-request-available (was: ) > FIX S3A recordReader policy selection > - > > Key: HIVE-24225 > URL: https://issues.apache.org/jira/browse/HIVE-24225 > Project: Hive > Issue Type: Bug >Reporter: Panagiotis Garefalakis >Assignee: Panagiotis Garefalakis >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > Dynamic S3A recordReader policy selection can cause issues on lazy > initialized FS objects -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24225) FIX S3A recordReader policy selection
[ https://issues.apache.org/jira/browse/HIVE-24225?focusedWorklogId=493916&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-493916 ] ASF GitHub Bot logged work on HIVE-24225: - Author: ASF GitHub Bot Created on: 02/Oct/20 11:44 Start Date: 02/Oct/20 11:44 Worklog Time Spent: 10m Work Description: pgaref opened a new pull request #1547: URL: https://github.com/apache/hive/pull/1547 This reverts commit c87e60d4 Change-Id: Ie8b783e0b1e0e32d9a54f6663e9aae5dd0a0f94f ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 493916) Remaining Estimate: 0h Time Spent: 10m > FIX S3A recordReader policy selection > - > > Key: HIVE-24225 > URL: https://issues.apache.org/jira/browse/HIVE-24225 > Project: Hive > Issue Type: Bug >Reporter: Panagiotis Garefalakis >Assignee: Panagiotis Garefalakis >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > Dynamic S3A recordReader policy selection can cause issues on lazy > initialized FS objects -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HIVE-24225) FIX S3A recordReader policy selection
[ https://issues.apache.org/jira/browse/HIVE-24225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Panagiotis Garefalakis reassigned HIVE-24225: - > FIX S3A recordReader policy selection > - > > Key: HIVE-24225 > URL: https://issues.apache.org/jira/browse/HIVE-24225 > Project: Hive > Issue Type: Bug >Reporter: Panagiotis Garefalakis >Assignee: Panagiotis Garefalakis >Priority: Major > > Dynamic S3A recordReader policy selection can cause issues on lazy > initialized FS objects -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24224) Fix skipping header/footer for Hive on Tez on compressed files
[ https://issues.apache.org/jira/browse/HIVE-24224?focusedWorklogId=493913&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-493913 ] ASF GitHub Bot logged work on HIVE-24224: - Author: ASF GitHub Bot Created on: 02/Oct/20 11:34 Start Date: 02/Oct/20 11:34 Worklog Time Spent: 10m Work Description: pgaref commented on pull request #1546: URL: https://github.com/apache/hive/pull/1546#issuecomment-702682798 @abstractdog @mustafaiman can you please take a look? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 493913) Time Spent: 20m (was: 10m) > Fix skipping header/footer for Hive on Tez on compressed files > -- > > Key: HIVE-24224 > URL: https://issues.apache.org/jira/browse/HIVE-24224 > Project: Hive > Issue Type: Bug >Reporter: Panagiotis Garefalakis >Assignee: Panagiotis Garefalakis >Priority: Major > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > A compressed file with Hive on Tez returns headers and footers - for both > select * and select count ( * ): > {noformat} > printf "offset,id,other\n9,\"20200315 X00 1356\",123\n17,\"20200315 X00 > 1357\",123\nrst,rst,rst" > data.csv > hdfs dfs -put -f data.csv /apps/hive/warehouse/bz2test/bz2tbl1/ > bzip2 -f data.csv > hdfs dfs -put -f data.csv.bz2 /apps/hive/warehouse/bz2test/bz2tbl2/ > beeline -e "CREATE EXTERNAL TABLE default.bz2tst2 ( > sequence int, > id string, > other string) > ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde' > LOCATION '/apps/hive/warehouse/bz2test/bz2tbl2' > TBLPROPERTIES ( > 'skip.header.line.count'='1', > 'skip.footer.line.count'='1');" > beeline -e " > SET hive.fetch.task.conversion = none; > SELECT * FROM default.bz2tst2;" > +---+---+---+ > | bz2tst2.sequence | bz2tst2.id | bz2tst2.other | 
> +---+---+---+ > | offset | id | other | > | 9 | 20200315 X00 1356 | 123 | > | 17 | 20200315 X00 1357 | 123 | > | rst | rst | rst | > +---+---+---+ > {noformat} > PS: HIVE-22769 addressed the issue for Hive on LLAP. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24224) Fix skipping header/footer for Hive on Tez on compressed files
[ https://issues.apache.org/jira/browse/HIVE-24224?focusedWorklogId=493912&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-493912 ] ASF GitHub Bot logged work on HIVE-24224: - Author: ASF GitHub Bot Created on: 02/Oct/20 11:31 Start Date: 02/Oct/20 11:31 Worklog Time Spent: 10m Work Description: pgaref opened a new pull request #1546: URL: https://github.com/apache/hive/pull/1546 Change-Id: I918a2eff0197e7d92db1f1858f3402b874d3b10a ### What changes were proposed in this pull request? Fix header/footer skipping for compressed files -- bug discovered for Hive on Tez ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 493912) Remaining Estimate: 0h Time Spent: 10m > Fix skipping header/footer for Hive on Tez on compressed files > -- > > Key: HIVE-24224 > URL: https://issues.apache.org/jira/browse/HIVE-24224 > Project: Hive > Issue Type: Bug >Reporter: Panagiotis Garefalakis >Assignee: Panagiotis Garefalakis >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > Compressed file with Hive on Tez returns header and footers - for both > select * and select count ( * ): > {noformat} > printf "offset,id,other\n9,\"20200315 X00 1356\",123\n17,\"20200315 X00 > 1357\",123\nrst,rst,rst" > data.csv > hdfs dfs -put -f data.csv /apps/hive/warehouse/bz2test/bz2tbl1/ > bzip2 -f data.csv > hdfs dfs -put -f data.csv.bz2 /apps/hive/warehouse/bz2test/bz2tbl2/ > beeline -e "CREATE EXTERNAL TABLE default.bz2tst2 ( > sequence int, > id string, > other string) > ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde' > LOCATION '/apps/hive/warehouse/bz2test/bz2tbl2' > TBLPROPERTIES ( > 
'skip.header.line.count'='1', > 'skip.footer.line.count'='1');" > beeline -e " > SET hive.fetch.task.conversion = none; > SELECT * FROM default.bz2tst2;"
> +-------------------+--------------------+-----------------+
> | bz2tst2.sequence  | bz2tst2.id         | bz2tst2.other   |
> +-------------------+--------------------+-----------------+
> | offset            | id                 | other           |
> | 9                 | 20200315 X00 1356  | 123             |
> | 17                | 20200315 X00 1357  | 123             |
> | rst               | rst                | rst             |
> +-------------------+--------------------+-----------------+
> {noformat}
> PS: HIVE-22769 addressed the issue for Hive on LLAP. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-24224) Fix skipping header/footer for Hive on Tez on compressed files
[ https://issues.apache.org/jira/browse/HIVE-24224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-24224: -- Labels: pull-request-available (was: ) > Fix skipping header/footer for Hive on Tez on compressed files > -- > > Key: HIVE-24224 > URL: https://issues.apache.org/jira/browse/HIVE-24224 > Project: Hive > Issue Type: Bug >Reporter: Panagiotis Garefalakis >Assignee: Panagiotis Garefalakis >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > Compressed file with Hive on Tez returns header and footers - for both > select * and select count ( * ): > {noformat} > printf "offset,id,other\n9,\"20200315 X00 1356\",123\n17,\"20200315 X00 > 1357\",123\nrst,rst,rst" > data.csv > hdfs dfs -put -f data.csv /apps/hive/warehouse/bz2test/bz2tbl1/ > bzip2 -f data.csv > hdfs dfs -put -f data.csv.bz2 /apps/hive/warehouse/bz2test/bz2tbl2/ > beeline -e "CREATE EXTERNAL TABLE default.bz2tst2 ( > sequence int, > id string, > other string) > ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde' > LOCATION '/apps/hive/warehouse/bz2test/bz2tbl2' > TBLPROPERTIES ( > 'skip.header.line.count'='1', > 'skip.footer.line.count'='1');" > beeline -e " > SET hive.fetch.task.conversion = none; > SELECT * FROM default.bz2tst2;" > +---+++ > | bz2tst2.sequence | bz2tst2.id | bz2tst2.other | > +---+++ > | offset| id | other | > | 9 | 20200315 X00 1356 | 123| > | 17| 20200315 X00 1357 | 123| > | rst | rst| rst| > +---+++ > {noformat} > PS: HIVE-22769 addressed the issue for Hive on LLAP. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HIVE-24224) Fix skipping header/footer for Hive on Tez on compressed files
[ https://issues.apache.org/jira/browse/HIVE-24224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Panagiotis Garefalakis reassigned HIVE-24224: - > Fix skipping header/footer for Hive on Tez on compressed files > -- > > Key: HIVE-24224 > URL: https://issues.apache.org/jira/browse/HIVE-24224 > Project: Hive > Issue Type: Bug >Reporter: Panagiotis Garefalakis >Assignee: Panagiotis Garefalakis >Priority: Major > > Compressed file with Hive on Tez returns header and footers - for both > select * and select count ( * ): > {noformat} > printf "offset,id,other\n9,\"20200315 X00 1356\",123\n17,\"20200315 X00 > 1357\",123\nrst,rst,rst" > data.csv > hdfs dfs -put -f data.csv /apps/hive/warehouse/bz2test/bz2tbl1/ > bzip2 -f data.csv > hdfs dfs -put -f data.csv.bz2 /apps/hive/warehouse/bz2test/bz2tbl2/ > beeline -e "CREATE EXTERNAL TABLE default.bz2tst2 ( > sequence int, > id string, > other string) > ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde' > LOCATION '/apps/hive/warehouse/bz2test/bz2tbl2' > TBLPROPERTIES ( > 'skip.header.line.count'='1', > 'skip.footer.line.count'='1');" > beeline -e " > SET hive.fetch.task.conversion = none; > SELECT * FROM default.bz2tst2;" > +---+++ > | bz2tst2.sequence | bz2tst2.id | bz2tst2.other | > +---+++ > | offset| id | other | > | 9 | 20200315 X00 1356 | 123| > | 17| 20200315 X00 1357 | 123| > | rst | rst| rst| > +---+++ > {noformat} > PS: HIVE-22769 addressed the issue for Hive on LLAP. -- This message was sent by Atlassian Jira (v8.3.4#803005)
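The reproduction above exercises the `skip.header.line.count` / `skip.footer.line.count` table properties. As a minimal sketch of the intended semantics (illustrative only — class and method names are made up, this is not Hive's record-reader code), skipping amounts to trimming N leading and M trailing rows of each input file:

```java
import java.util.Arrays;
import java.util.List;

// Minimal sketch of skip.header.line.count / skip.footer.line.count semantics:
// drop the first `header` and last `footer` rows of a single file's rows.
// Illustrative only -- this is NOT Hive's record-reader code.
public class HeaderFooterSkip {
    public static List<String> skip(List<String> rows, int header, int footer) {
        int from = Math.min(header, rows.size());
        int to = Math.max(from, rows.size() - footer);
        return rows.subList(from, to);
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList(
            "offset,id,other",                 // header row
            "9,\"20200315 X00 1356\",123",
            "17,\"20200315 X00 1357\",123",
            "rst,rst,rst");                    // footer row
        System.out.println(skip(rows, 1, 1)); // only the two data rows remain
    }
}
```

For the CSV in the report (one header row, one footer row), only the two data rows should survive. The bug is that on Tez this trimming was not applied to compressed — and therefore unsplittable — input files, so all four rows came back.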
[jira] [Work logged] (HIVE-22826) ALTER TABLE RENAME COLUMN doesn't update list of bucketed column names
[ https://issues.apache.org/jira/browse/HIVE-22826?focusedWorklogId=493896&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-493896 ] ASF GitHub Bot logged work on HIVE-22826: - Author: ASF GitHub Bot Created on: 02/Oct/20 11:09 Start Date: 02/Oct/20 11:09 Worklog Time Spent: 10m Work Description: sankarh commented on a change in pull request #1528: URL: https://github.com/apache/hive/pull/1528#discussion_r498651478 ## File path: ql/src/java/org/apache/hadoop/hive/ql/ddl/table/column/change/AlterTableChangeColumnOperation.java ## @@ -72,6 +74,10 @@ protected void doAlteration(Table table, Partition partition) throws HiveExcepti if (desc.getNewColumnComment() != null) { oldColumn.setComment(desc.getNewColumnComment()); } +if (CollectionUtils.isNotEmpty(sd.getBucketCols()) && sd.getBucketCols().contains(oldColumnName)) { + sd.getBucketCols().remove(oldColumnName); + sd.getBucketCols().add(desc.getNewColumnName()); Review comment: Should we store it in lower-case? ## File path: ql/src/java/org/apache/hadoop/hive/ql/ddl/table/column/change/AlterTableChangeColumnOperation.java ## @@ -72,6 +74,10 @@ protected void doAlteration(Table table, Partition partition) throws HiveExcepti if (desc.getNewColumnComment() != null) { oldColumn.setComment(desc.getNewColumnComment()); } +if (CollectionUtils.isNotEmpty(sd.getBucketCols()) && sd.getBucketCols().contains(oldColumnName)) { Review comment: Is oldColumnName case in-sensitive? 
## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveAlterHandler.java ## @@ -130,6 +130,11 @@ public void alterTable(RawStore msdb, Warehouse wh, String catName, String dbnam throw new InvalidOperationException("Invalid column " + validate); } +// Validate bucketedColumns in new table +if (!MetaStoreServerUtils.validateBucketColumns(newt.getSd())) { + throw new InvalidOperationException("Bucket column doesn't match with any table columns"); Review comment: Useful to add an error log with the column name which is missing from table columns list. ## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/utils/MetaStoreServerUtils.java ## @@ -1554,4 +1555,32 @@ public static Partition createMetaPartitionObject(Table tbl, Map } return tpart; } + + /** + * Validate bucket columns should belong to table columns. + * @param sd StorageDescriptor of given table + * @return true if bucket columns are empty or belong to table columns else false + */ + public static boolean validateBucketColumns(StorageDescriptor sd) { +List columnNames = getColumnNames(sd.getCols()); +if(CollectionUtils.isNotEmpty(sd.getBucketCols()) && CollectionUtils.isNotEmpty(columnNames)){ + return columnNames.containsAll(sd.getBucketCols().stream().map(String::toLowerCase).collect(Collectors.toList())); +} else if (CollectionUtils.isNotEmpty(sd.getBucketCols()) && CollectionUtils.isEmpty(columnNames)) { + return false; +} else { + return true; +} + } + + /** + * Generate column name list from the fieldSchema list + * @param cols fieldSchema list + * @return column name list + */ + public static List getColumnNames(List cols) { +if (CollectionUtils.isNotEmpty(cols)) { + return cols.stream().map(FieldSchema::getName).collect(Collectors.toList()); Review comment: Will cols always have names in lower case? 
## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/utils/MetaStoreServerUtils.java ## @@ -1554,4 +1555,32 @@ public static Partition createMetaPartitionObject(Table tbl, Map } return tpart; } + + /** + * Validate bucket columns should belong to table columns. + * @param sd StorageDescriptor of given table + * @return true if bucket columns are empty or belong to table columns else false + */ + public static boolean validateBucketColumns(StorageDescriptor sd) { +List columnNames = getColumnNames(sd.getCols()); +if(CollectionUtils.isNotEmpty(sd.getBucketCols()) && CollectionUtils.isNotEmpty(columnNames)){ Review comment: nit: Add space before "(" ## File path: ql/src/test/queries/clientpositive/alter_numbuckets_partitioned_table_h23.q ## @@ -52,6 +52,12 @@ alter table tst1_n1 clustered by (value) into 12 buckets; describe formatted tst1_n1; +-- Test changing name of bucket column + +alter table tst1_n1 change key keys string; + +describe formatted tst1_n1; Review comment: Also check the output of "show create table" command. ## File path: ql/src/test/queries/c
[jira] [Work logged] (HIVE-22826) ALTER TABLE RENAME COLUMN doesn't update list of bucketed column names
[ https://issues.apache.org/jira/browse/HIVE-22826?focusedWorklogId=493897&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-493897 ] ASF GitHub Bot logged work on HIVE-22826: - Author: ASF GitHub Bot Created on: 02/Oct/20 11:09 Start Date: 02/Oct/20 11:09 Worklog Time Spent: 10m Work Description: sankarh commented on a change in pull request #1528: URL: https://github.com/apache/hive/pull/1528#discussion_r498652268 ## File path: ql/src/test/queries/clientpositive/alter_bucketedtable_change_column.q ## @@ -0,0 +1,10 @@ +--! qt:dataset:src +create table alter_bucket_change_col_t1(key string, value string) partitioned by (ds string) clustered by (key) into 10 buckets; + +describe formatted alter_bucket_change_col_t1; + +-- Test changing name of bucket column + +alter table alter_bucket_change_col_t1 change key keys string; Review comment: Add test for column names with mix of upper and lower case letters. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 493897) Time Spent: 1h 20m (was: 1h 10m) > ALTER TABLE RENAME COLUMN doesn't update list of bucketed column names > --- > > Key: HIVE-22826 > URL: https://issues.apache.org/jira/browse/HIVE-22826 > Project: Hive > Issue Type: Bug > Components: Query Planning >Affects Versions: 4.0.0 >Reporter: Karen Coppage >Assignee: Ashish Sharma >Priority: Major > Labels: pull-request-available > Attachments: unitTest.patch > > Time Spent: 1h 20m > Remaining Estimate: 0h > > Compaction for tables where a bucketed column has been renamed fails since > the list of bucketed columns in the StorageDescriptor doesn't get updated > when the column is renamed, therefore we can't recreate the table correctly > during compaction. > Attached a unit test that fails. 
> NO PRECOMMIT TESTS -- This message was sent by Atlassian Jira (v8.3.4#803005)
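The review thread above probes two things: whether the bucket-column list stays in sync on rename, and whether the comparison is case-insensitive. A self-contained sketch of the validation being discussed, lower-casing both sides as the reviewer suggests (plain string lists stand in for `StorageDescriptor.getBucketCols()` and `FieldSchema` names; this is a sketch, not the patch itself):

```java
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

// Sketch of the bucket-column validation discussed in the review.
// Both sides are lower-cased before comparison, which addresses the
// case-sensitivity question raised by the reviewer. Plain lists stand in
// for StorageDescriptor/FieldSchema; names here are illustrative.
public class BucketColumnCheck {
    public static boolean validateBucketColumns(List<String> bucketCols,
                                                List<String> tableCols) {
        if (bucketCols == null || bucketCols.isEmpty()) {
            return true; // nothing bucketed, nothing to validate
        }
        List<String> lowered = tableCols.stream()
            .map(c -> c.toLowerCase(Locale.ROOT))
            .collect(Collectors.toList());
        return lowered.containsAll(bucketCols.stream()
            .map(c -> c.toLowerCase(Locale.ROOT))
            .collect(Collectors.toList()));
    }
}
```

With this shape, renaming a bucketed column without updating `bucketCols` (the bug in this issue) makes the check fail, regardless of the case in which either list stores the names.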
[jira] [Updated] (HIVE-24223) Insert into Hive tables doesn't work for decimal numbers with no preceding digit before the decimal
[ https://issues.apache.org/jira/browse/HIVE-24223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kriti Jha updated HIVE-24223: - Description: Any insert operation to a table in Hive with decimal integers without a digit before the DOT ('.') fails with an exception as shown below: -- hive> create table test_dec(id decimal(10,8)); hive> insert into test_dec values (-.5); NoViableAltException(16@[412:1: atomExpression : ( constant | ( intervalExpression )=> intervalExpression | castExpression | extractExpression | floorExpression | caseExpression | whenExpression | ( subQueryExpression )=> ( subQueryExpression ) -> ^( TOK_SUBQUERY_EXPR TOK_SUBQUERY_OP subQueryExpression ) | ( functionName LPAREN )=> function | tableOrColumn | expressionsInParenthesis[true] );])NoViableAltException(16@[412:1: atomExpression : ( constant | ( intervalExpression )=> intervalExpression | castExpression | extractExpression | floorExpression | caseExpression | whenExpression | ( subQueryExpression )=> ( subQueryExpression ) -> ^( TOK_SUBQUERY_EXPR TOK_SUBQUERY_OP subQueryExpression ) | ( functionName LPAREN )=> function | tableOrColumn | expressionsInParenthesis[true] );]) at org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser$DFA36.specialStateTransition(HiveParser_IdentifiersParser.java:31810) at org.antlr.runtime.DFA.predict(DFA.java:80) at org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.atomExpression(HiveParser_IdentifiersParser.java:6746) at org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.precedenceFieldExpression(HiveParser_IdentifiersParser.java:6988) at org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.precedenceUnaryPrefixExpression(HiveParser_IdentifiersParser.java:7324) at org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.precedenceUnarySuffixExpression(HiveParser_IdentifiersParser.java:7380) at ... 
org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.expression(HiveParser_IdentifiersParser.java:6686) at org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.expressionsNotInParenthesis(HiveParser_IdentifiersParser.java:2287) at org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.expressionsInParenthesis(HiveParser_IdentifiersParser.java:2233) at org.apache.hadoop.hive.ql.parse.HiveParser.expressionsInParenthesis(HiveParser.java:42106) at org.apache.hadoop.hive.ql.parse.HiveParser_FromClauseParser.valueRowConstructor(HiveParser_FromClauseParser.java:6499) at org.apache.hadoop.hive.ql.parse.HiveParser_FromClauseParser.valuesTableConstructor(HiveParser_FromClauseParser.java:6583) at org.apache.hadoop.hive.ql.parse.HiveParser_FromClauseParser.valuesClause(HiveParser_FromClauseParser.java:6704) at org.apache.hadoop.hive.ql.parse.HiveParser.valuesClause(HiveParser.java:41954) at org.apache.hadoop.hive.ql.parse.HiveParser.regularBody(HiveParser.java:36536) at org.apache.hadoop.hive.ql.parse.HiveParser.queryStatementExpressionBody(HiveParser.java:35822) at ... org.apache.hadoop.util.RunJar.main(RunJar.java:153)FAILED: ParseException line 1:30 cannot recognize input near '.' '5' ')' in expression specification -- It seems to be coming from the Lexer where the types are defined and the definition of 'Number' should be coming into play: -- Number : (Digit)+ ( DOT (Digit)* (Exponent)? | Exponent)? 
; -- >[https://github.com/apache/hive/blob/2006e52713508a92fb4d1d28262fd7175eade8b7/parser/src/java/org/apache/hadoop/hive/ql/parse/HiveLexerParent.g#L469] However, the below works: > insert into test_dec values ('-.5'); > insert into test_dec values (-0.5);
[jira] [Updated] (HIVE-24223) Insert into Hive tables doesn't work for decimal numbers with no preceding digit before the decimal
[ https://issues.apache.org/jira/browse/HIVE-24223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kriti Jha updated HIVE-24223: - Description: Any insert operation to a table in Hive with decimal integers without a digit before the DOT ('.') fails with an exception as shown below: -- hive> create table test_dec(id decimal(10,8)); hive> insert into test_dec values (-.5); NoViableAltException(16@[412:1: atomExpression : ( constant | ( intervalExpression )=> intervalExpression | castExpression | extractExpression | floorExpression | caseExpression | whenExpression | ( subQueryExpression )=> ( subQueryExpression ) -> ^( TOK_SUBQUERY_EXPR TOK_SUBQUERY_OP subQueryExpression ) | ( functionName LPAREN )=> function | tableOrColumn | expressionsInParenthesis[true] );])NoViableAltException(16@[412:1: atomExpression : ( constant | ( intervalExpression )=> intervalExpression | castExpression | extractExpression | floorExpression | caseExpression | whenExpression | ( subQueryExpression )=> ( subQueryExpression ) -> ^( TOK_SUBQUERY_EXPR TOK_SUBQUERY_OP subQueryExpression ) | ( functionName LPAREN )=> function | tableOrColumn | expressionsInParenthesis[true] );]) at org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser$DFA36.specialStateTransition(HiveParser_IdentifiersParser.java:31810) at org.antlr.runtime.DFA.predict(DFA.java:80) at org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.atomExpression(HiveParser_IdentifiersParser.java:6746) at org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.precedenceFieldExpression(HiveParser_IdentifiersParser.java:6988) at org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.precedenceUnaryPrefixExpression(HiveParser_IdentifiersParser.java:7324) at org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.precedenceUnarySuffixExpression(HiveParser_IdentifiersParser.java:7380) at ... 
org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.expression(HiveParser_IdentifiersParser.java:6686) at org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.expressionsNotInParenthesis(HiveParser_IdentifiersParser.java:2287) at org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.expressionsInParenthesis(HiveParser_IdentifiersParser.java:2233) at org.apache.hadoop.hive.ql.parse.HiveParser.expressionsInParenthesis(HiveParser.java:42106) at org.apache.hadoop.hive.ql.parse.HiveParser_FromClauseParser.valueRowConstructor(HiveParser_FromClauseParser.java:6499) at org.apache.hadoop.hive.ql.parse.HiveParser_FromClauseParser.valuesTableConstructor(HiveParser_FromClauseParser.java:6583) at org.apache.hadoop.hive.ql.parse.HiveParser_FromClauseParser.valuesClause(HiveParser_FromClauseParser.java:6704) at org.apache.hadoop.hive.ql.parse.HiveParser.valuesClause(HiveParser.java:41954) at org.apache.hadoop.hive.ql.parse.HiveParser.regularBody(HiveParser.java:36536) at org.apache.hadoop.hive.ql.parse.HiveParser.queryStatementExpressionBody(HiveParser.java:35822) at ... org.apache.hadoop.util.RunJar.main(RunJar.java:153)FAILED: ParseException line 1:30 cannot recognize input near '.' '5' ')' in expression specification -- It seems to be coming from the Lexer where the types are defined and the definition of 'Number' should be coming into play: -- Number : (Digit)+ ( DOT (Digit)* (Exponent)? | Exponent)? 
; -- >[https://github.com/apache/hive/blob/2006e52713508a92fb4d1d28262fd7175eade8b7/parser/src/java/org/apache/hadoop/hive/ql/parse/HiveLexerParent.g#L469] However, the below works: -> insert into test_dec values ('-.5'); -> insert into test_dec values (-0.5);
[jira] [Updated] (HIVE-24223) Insert into Hive tables doesn't work for decimal numbers with no preceding digit before the decimal
[ https://issues.apache.org/jira/browse/HIVE-24223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kriti Jha updated HIVE-24223: - Description: Any insert operation to a table in Hive with decimal integers without a digit before the DOT ('.') fails with an exception as shown below: -- hive> create table test_dec(id decimal(10,8)); hive> insert into test_dec values (-.5); NoViableAltException(16@[412:1: atomExpression : ( constant | ( intervalExpression )=> intervalExpression | castExpression | extractExpression | floorExpression | caseExpression | whenExpression | ( subQueryExpression )=> ( subQueryExpression ) -> ^( TOK_SUBQUERY_EXPR TOK_SUBQUERY_OP subQueryExpression ) | ( functionName LPAREN )=> function | tableOrColumn | expressionsInParenthesis[true] );])NoViableAltException(16@[412:1: atomExpression : ( constant | ( intervalExpression )=> intervalExpression | castExpression | extractExpression | floorExpression | caseExpression | whenExpression | ( subQueryExpression )=> ( subQueryExpression ) -> ^( TOK_SUBQUERY_EXPR TOK_SUBQUERY_OP subQueryExpression ) | ( functionName LPAREN )=> function | tableOrColumn | expressionsInParenthesis[true] );]) at org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser$DFA36.specialStateTransition(HiveParser_IdentifiersParser.java:31810) at org.antlr.runtime.DFA.predict(DFA.java:80) at org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.atomExpression(HiveParser_IdentifiersParser.java:6746) at org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.precedenceFieldExpression(HiveParser_IdentifiersParser.java:6988) at org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.precedenceUnaryPrefixExpression(HiveParser_IdentifiersParser.java:7324) at org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.precedenceUnarySuffixExpression(HiveParser_IdentifiersParser.java:7380) at ... 
org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.expression(HiveParser_IdentifiersParser.java:6686) at org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.expressionsNotInParenthesis(HiveParser_IdentifiersParser.java:2287) at org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.expressionsInParenthesis(HiveParser_IdentifiersParser.java:2233) at org.apache.hadoop.hive.ql.parse.HiveParser.expressionsInParenthesis(HiveParser.java:42106) at org.apache.hadoop.hive.ql.parse.HiveParser_FromClauseParser.valueRowConstructor(HiveParser_FromClauseParser.java:6499) at org.apache.hadoop.hive.ql.parse.HiveParser_FromClauseParser.valuesTableConstructor(HiveParser_FromClauseParser.java:6583) at org.apache.hadoop.hive.ql.parse.HiveParser_FromClauseParser.valuesClause(HiveParser_FromClauseParser.java:6704) at org.apache.hadoop.hive.ql.parse.HiveParser.valuesClause(HiveParser.java:41954) at org.apache.hadoop.hive.ql.parse.HiveParser.regularBody(HiveParser.java:36536) at org.apache.hadoop.hive.ql.parse.HiveParser.queryStatementExpressionBody(HiveParser.java:35822) at ... org.apache.hadoop.util.RunJar.main(RunJar.java:153)FAILED: ParseException line 1:30 cannot recognize input near '.' '5' ')' in expression specification -- It seems to be coming from the Lexer where the types are defined and the definition of 'Number' should be coming into play: -- Number : (Digit)+ ( DOT (Digit)* (Exponent)? | Exponent)? 
; -- >[https://github.com/apache/hive/blob/2006e52713508a92fb4d1d28262fd7175eade8b7/parser/src/java/org/apache/hadoop/hive/ql/parse/HiveLexerParent.g#L469] However, the below works: > insert into test_dec values ('-.5'); > insert into test_dec values (-0.5);
[jira] [Updated] (HIVE-24223) Insert into Hive tables doesn't work for decimal numbers with no preceding digit before the decimal
[ https://issues.apache.org/jira/browse/HIVE-24223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kriti Jha updated HIVE-24223: - Description: Any insert operation to a table in Hive with decimal integers without a digit before the DOT ('.') fails with an exception as shown below: -- hive> create table test_dec(id decimal(10,8)); hive> insert into test_dec values (-.5); NoViableAltException(16@[412:1: atomExpression : ( constant | ( intervalExpression )=> intervalExpression | castExpression | extractExpression | floorExpression | caseExpression | whenExpression | ( subQueryExpression )=> ( subQueryExpression ) -> ^( TOK_SUBQUERY_EXPR TOK_SUBQUERY_OP subQueryExpression ) | ( functionName LPAREN )=> function | tableOrColumn | expressionsInParenthesis[true] );])NoViableAltException(16@[412:1: atomExpression : ( constant | ( intervalExpression )=> intervalExpression | castExpression | extractExpression | floorExpression | caseExpression | whenExpression | ( subQueryExpression )=> ( subQueryExpression ) -> ^( TOK_SUBQUERY_EXPR TOK_SUBQUERY_OP subQueryExpression ) | ( functionName LPAREN )=> function | tableOrColumn | expressionsInParenthesis[true] );]) at org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser$DFA36.specialStateTransition(HiveParser_IdentifiersParser.java:31810) at org.antlr.runtime.DFA.predict(DFA.java:80) at org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.atomExpression(HiveParser_IdentifiersParser.java:6746) at org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.precedenceFieldExpression(HiveParser_IdentifiersParser.java:6988) at org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.precedenceUnaryPrefixExpression(HiveParser_IdentifiersParser.java:7324) at org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.precedenceUnarySuffixExpression(HiveParser_IdentifiersParser.java:7380) at ... 
org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.expression(HiveParser_IdentifiersParser.java:6686) at org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.expressionsNotInParenthesis(HiveParser_IdentifiersParser.java:2287) at org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.expressionsInParenthesis(HiveParser_IdentifiersParser.java:2233) at org.apache.hadoop.hive.ql.parse.HiveParser.expressionsInParenthesis(HiveParser.java:42106) at org.apache.hadoop.hive.ql.parse.HiveParser_FromClauseParser.valueRowConstructor(HiveParser_FromClauseParser.java:6499) at org.apache.hadoop.hive.ql.parse.HiveParser_FromClauseParser.valuesTableConstructor(HiveParser_FromClauseParser.java:6583) at org.apache.hadoop.hive.ql.parse.HiveParser_FromClauseParser.valuesClause(HiveParser_FromClauseParser.java:6704) at org.apache.hadoop.hive.ql.parse.HiveParser.valuesClause(HiveParser.java:41954) at org.apache.hadoop.hive.ql.parse.HiveParser.regularBody(HiveParser.java:36536) at org.apache.hadoop.hive.ql.parse.HiveParser.queryStatementExpressionBody(HiveParser.java:35822) at ... org.apache.hadoop.util.RunJar.main(RunJar.java:153)FAILED: ParseException line 1:30 cannot recognize input near '.' '5' ')' in expression specification -- It seems to be coming from the Lexer where the types are defined and the definition of 'Number' should be coming into play: -- Number : (Digit)+ ( DOT (Digit)* (Exponent)? | Exponent)? 
; -- >[https://github.com/apache/hive/blob/2006e52713508a92fb4d1d28262fd7175eade8b7/parser/src/java/org/apache/hadoop/hive/ql/parse/HiveLexerParent.g#L469] However, the below works: --> insert into test_dec values ('-.5'); --> insert into test_dec values (-0.5);
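The lexer rule quoted in the description requires at least one digit before the `DOT`, which is exactly why `-.5` fails to tokenize while `-0.5` parses (the leading `-` is a separate unary-minus operator in the parser, not part of the `Number` token). A regex rendering of that rule, assuming the usual `Exponent : ('e'|'E') ('+'|'-')? Digit+` definition (illustrative — this is a rendering of the rule, not the ANTLR grammar itself):

```java
import java.util.regex.Pattern;

// Regex rendering of the quoted lexer rule:
//   Number : (Digit)+ ( DOT (Digit)* (Exponent)? | Exponent )?
// The mandatory (Digit)+ before the optional DOT is what rejects ".5".
// Assumes Exponent : ('e'|'E') ('+'|'-')? Digit+ -- verify against the grammar.
public class NumberRule {
    private static final Pattern NUMBER =
        Pattern.compile("\\d+(\\.\\d*([eE][+-]?\\d+)?|[eE][+-]?\\d+)?");

    public static boolean isNumber(String token) {
        return NUMBER.matcher(token).matches();
    }
}
```

One possible direction for a fix would be to add an alternative to the rule (and the regex) that accepts `DOT (Digit)+` with no leading digits, so `.5` lexes as a number as well.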
[jira] [Commented] (HIVE-24221) Use vectorizable expression to combine multiple columns in semijoin bloom filters
[ https://issues.apache.org/jira/browse/HIVE-24221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17206086#comment-17206086 ] Stamatis Zampetakis commented on HIVE-24221: There are various ways to create a hash from composite keys/columns. Without any special effort to derive a perfect hash function we can do the following: Input columns: A, B, C, D +Option A:+ {noformat} hash(hash(hash(A, B), C), D) {noformat} +Option B:+ {noformat} 31*(31*(31 * hash(A) + hash(B)) + hash(C)) + hash(D) {noformat} The second option is more or less what happens currently when we write hash(A, B, C, D) in the non-vectorized implementation of GenericUDFMurmurHash. The first option although it looks simpler is computationally more expensive. > Use vectorizable expression to combine multiple columns in semijoin bloom > filters > - > > Key: HIVE-24221 > URL: https://issues.apache.org/jira/browse/HIVE-24221 > Project: Hive > Issue Type: Improvement > Components: Query Planning > Environment: >Reporter: Stamatis Zampetakis >Assignee: Stamatis Zampetakis >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > Currently, multi-column semijoin reducers use an n-ary call to > GenericUDFMurmurHash to combine multiple values into one, which is used as an > entry to the bloom filter. However, there are no vectorized operators that > treat n-ary inputs. The same goes for the vectorized implementation of > GenericUDFMurmurHash introduced in HIVE-23976. > The goal of this issue is to choose an alternative way to combine multiple > values into one to pass in the bloom filter comprising only vectorized > operators. -- This message was sent by Atlassian Jira (v8.3.4#803005)
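The two composition strategies in the comment above can be sketched generically. The pairwise `mix` below is a Murmur-style stand-in for a two-argument `hash(A, B)`, not Hive's `GenericUDFMurmurHash`; only the shape of the two options is the point:

```java
// Sketch of the two combination options from the comment. `mix` is a
// Murmur-style stand-in for a pairwise hash(A, B); it is NOT Hive's
// GenericUDFMurmurHash, and the constants are illustrative.
public class HashCombine {
    static int mix(int a, int b) {
        int h = a * 0xcc9e2d51;
        h = Integer.rotateLeft(h, 15) * 0x1b873593;
        return (h ^ b) * 5 + 0xe6546b64;
    }

    // Option A: nested binary hashing, hash(hash(hash(A, B), C), D).
    // Every step is a full mixing round, so it costs three mixes.
    public static int optionA(int a, int b, int c, int d) {
        return mix(mix(mix(a, b), c), d);
    }

    // Option B: 31-multiplier polynomial combine over per-column hashes,
    // 31*(31*(31*A + B) + C) + D -- plain multiply-adds, cheap to vectorize.
    public static int optionB(int a, int b, int c, int d) {
        return 31 * (31 * (31 * a + b) + c) + d;
    }
}
```

Option B's multiply-add chain maps directly onto existing vectorized arithmetic expressions, which is the comment's point about why it suits the vectorized path better than repeated full hash invocations.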
[jira] [Commented] (HIVE-19253) HMS ignores tableType property for external tables
[ https://issues.apache.org/jira/browse/HIVE-19253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17206064#comment-17206064 ] Szehon Ho commented on HIVE-19253: -- I think there are no new test failures. [~vihangk1] what do you think? > HMS ignores tableType property for external tables > -- > > Key: HIVE-19253 > URL: https://issues.apache.org/jira/browse/HIVE-19253 > Project: Hive > Issue Type: Bug > Components: Metastore >Affects Versions: 3.0.0, 3.1.0, 4.0.0 >Reporter: Alex Kolbasov >Assignee: Vihang Karajgaonkar >Priority: Major > Labels: newbie, pull-request-available > Attachments: HIVE-19253.01.patch, HIVE-19253.02.patch, > HIVE-19253.03.patch, HIVE-19253.03.patch, HIVE-19253.04.patch, > HIVE-19253.05.patch, HIVE-19253.06.patch, HIVE-19253.07.patch, > HIVE-19253.08.patch, HIVE-19253.09.patch, HIVE-19253.10.patch, > HIVE-19253.11.patch, HIVE-19253.12.patch > > Time Spent: 10m > Remaining Estimate: 0h > > When someone creates a table using the Thrift API they may think that setting > tableType to {{EXTERNAL_TABLE}} creates an external table. And boom - their > table is gone later because HMS will silently change it to a managed table. > Here is the offending code:
> {code:java}
> private MTable convertToMTable(Table tbl) throws InvalidObjectException, MetaException {
>   ...
>   // If the table has property EXTERNAL set, update table type accordingly
>   String tableType = tbl.getTableType();
>   boolean isExternal = Boolean.parseBoolean(tbl.getParameters().get("EXTERNAL"));
>   if (TableType.MANAGED_TABLE.toString().equals(tableType)) {
>     if (isExternal) {
>       tableType = TableType.EXTERNAL_TABLE.toString();
>     }
>   }
>   if (TableType.EXTERNAL_TABLE.toString().equals(tableType)) {
>     if (!isExternal) { // Here!
>       tableType = TableType.MANAGED_TABLE.toString();
>     }
>   }
> {code}
> So if the EXTERNAL parameter is not set, the table type is changed to managed > even if it was external in the first place - which is wrong.
> More over, in other places code looks at the table property to decide table > type and some places look at parameter. HMS should really make its mind which > one to use. -- This message was sent by Atlassian Jira (v8.3.4#803005)
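The problematic branch above demotes a table to managed whenever the EXTERNAL parameter is absent. A minimal standalone sketch of one possible fix (the class name, the plain string constants in place of Hive's {{TableType}} enum, and the fix itself are illustrative assumptions, not the committed patch) is to demote only when EXTERNAL is explicitly present and false:

```java
import java.util.Map;

public class TableTypeResolver {
    // Sketch of the table-type resolution from convertToMTable, with the
    // proposed change: only flip EXTERNAL_TABLE back to MANAGED_TABLE when
    // the "EXTERNAL" parameter is explicitly set to false, not when it is
    // merely missing from the table parameters.
    public static String resolve(String tableType, Map<String, String> params) {
        String external = params.get("EXTERNAL");
        boolean isExternal = Boolean.parseBoolean(external); // null parses to false
        if ("MANAGED_TABLE".equals(tableType) && isExternal) {
            return "EXTERNAL_TABLE";
        }
        // Demote only on an explicit EXTERNAL=false; a table created via
        // Thrift with only tableType=EXTERNAL_TABLE set stays external.
        if ("EXTERNAL_TABLE".equals(tableType) && external != null && !isExternal) {
            return "MANAGED_TABLE";
        }
        return tableType;
    }
}
```

With this check, a Thrift client that sets only the tableType field keeps its external table, while clients that set contradictory values still get the reconciliation behavior.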
[jira] [Updated] (HIVE-24213) Incorrect exception in the Merge MapJoinTask into its child MapRedTask optimizer
[ https://issues.apache.org/jira/browse/HIVE-24213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Marta Kuczora updated HIVE-24213:
---------------------------------
    Fix Version/s: 4.0.0
       Resolution: Fixed
           Status: Resolved  (was: Patch Available)

Pushed to master. Thanks a lot [~zmatyus] for the patch and [~kgyrtkirk] for the review!

> Incorrect exception in the Merge MapJoinTask into its child MapRedTask optimizer
> --------------------------------------------------------------------------------
>
>                 Key: HIVE-24213
>                 URL: https://issues.apache.org/jira/browse/HIVE-24213
>             Project: Hive
>          Issue Type: Bug
>          Components: Physical Optimizer
>    Affects Versions: 4.0.0
>            Reporter: Zoltan Matyus
>            Assignee: Zoltan Matyus
>            Priority: Minor
>              Labels: pull-request-available
>             Fix For: 4.0.0
>
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> The {{CommonJoinTaskDispatcher#mergeMapJoinTaskIntoItsChildMapRedTask}}
> method throws a {{SemanticException}} if the number of {{FileSinkOperator}}s
> it finds is not exactly 1. The exception is valid if zero operators are
> found, but there are valid use cases with multiple FileSinkOperators.
> Example: the MapJoin and its child are used in a common table expression,
> which is used for multiple inserts.
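As a hedged illustration of the multi-insert scenario described above (the table and column names are made up for the example, not taken from the ticket), a common table expression built on a join can feed several INSERT branches, giving the plan one FileSinkOperator per target table:

```sql
-- Hypothetical HiveQL: a CTE containing a join (eligible for MapJoin)
-- reused by a multi-insert. Each INSERT branch produces its own
-- FileSinkOperator, so requiring exactly one sink rejects a valid plan.
WITH joined AS (
  SELECT o.id, o.amount, c.region
  FROM orders o
  JOIN customers c ON o.customer_id = c.id
)
FROM joined
INSERT INTO TABLE sales_by_region SELECT region, amount
INSERT INTO TABLE sales_by_order  SELECT id, amount;
```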
[jira] [Work logged] (HIVE-24213) Incorrect exception in the Merge MapJoinTask into its child MapRedTask optimizer
[ https://issues.apache.org/jira/browse/HIVE-24213?focusedWorklogId=493867&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-493867 ]

ASF GitHub Bot logged work on HIVE-24213:
-----------------------------------------
                Author: ASF GitHub Bot
            Created on: 02/Oct/20 09:01
            Start Date: 02/Oct/20 09:01
    Worklog Time Spent: 10m

Work Description: kuczoram merged pull request #1539:
URL: https://github.com/apache/hive/pull/1539

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
-------------------
    Worklog Id: (was: 493867)
    Time Spent: 20m  (was: 10m)
[jira] [Updated] (HIVE-24222) Upgrade ORC to 1.5.12
[ https://issues.apache.org/jira/browse/HIVE-24222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Panagiotis Garefalakis updated HIVE-24222:
------------------------------------------
    Issue Type: Improvement  (was: Bug)

> Upgrade ORC to 1.5.12
> ---------------------
>
>                 Key: HIVE-24222
>                 URL: https://issues.apache.org/jira/browse/HIVE-24222
>             Project: Hive
>          Issue Type: Improvement
>          Components: ORC
>    Affects Versions: 4.0.0
>            Reporter: Dongjoon Hyun
>            Assignee: Dongjoon Hyun
>            Priority: Major
>              Labels: pull-request-available
>
>          Time Spent: 1h 20m
>  Remaining Estimate: 0h
[jira] [Assigned] (HIVE-24222) Upgrade ORC to 1.5.12
[ https://issues.apache.org/jira/browse/HIVE-24222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Panagiotis Garefalakis reassigned HIVE-24222:
---------------------------------------------
    Assignee: Dongjoon Hyun
[jira] [Resolved] (HIVE-21375) Closing TransactionBatch closes FileSystem for other batches in Hive streaming v1
[ https://issues.apache.org/jira/browse/HIVE-21375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ádám Szita resolved HIVE-21375.
-------------------------------
    Fix Version/s: 3.2.0
       Resolution: Fixed

Committed to branch-3. Thanks Peter for reviewing.

> Closing TransactionBatch closes FileSystem for other batches in Hive streaming v1
> ---------------------------------------------------------------------------------
>
>                 Key: HIVE-21375
>                 URL: https://issues.apache.org/jira/browse/HIVE-21375
>             Project: Hive
>          Issue Type: Bug
>          Components: HCatalog, Streaming
>    Affects Versions: 3.2.0
>            Reporter: Shawn Weeks
>            Assignee: Ádám Szita
>            Priority: Minor
>              Labels: pull-request-available
>             Fix For: 3.2.0
>
>          Time Spent: 1h
>  Remaining Estimate: 0h
>
> The patch in HIVE-13151 added {{FileSystem.closeAllForUGI(ugi)}} to the
> close method of HiveEndPoint for the legacy Streaming API. This has the side
> effect of closing the FileSystem for all open TransactionBatches, as used by
> NiFi and Storm when writing to multiple partitions. Setting
> {{fs.hdfs.impl.disable.cache=true}} works around the issue, but at a
> performance cost.
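The root cause described above is Hadoop's per-UGI FileSystem cache: with caching enabled (the default), {{FileSystem.get}} hands every caller the same instance for a given user, so closing all instances for that UGI pulls the handle out from under every other open TransactionBatch. The toy model below (assumed names; this is not Hadoop code) illustrates that sharing:

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of the per-UGI FileSystem cache behavior behind this bug:
// two transaction batches asking for "their" FileSystem get one shared
// instance, so a closeAllForUGI issued by one batch closes the other's too.
public class FsCacheModel {
    public static class Fs {
        public boolean closed = false;
    }

    private static final Map<String, Fs> CACHE = new HashMap<>();

    // Stand-in for FileSystem.get(conf) with caching enabled: one Fs per user.
    public static Fs get(String ugi) {
        return CACHE.computeIfAbsent(ugi, u -> new Fs());
    }

    // Stand-in for FileSystem.closeAllForUGI(ugi): closes the shared instance.
    public static void closeAllForUgi(String ugi) {
        Fs fs = CACHE.remove(ugi);
        if (fs != null) {
            fs.closed = true;
        }
    }
}
```

Setting {{fs.hdfs.impl.disable.cache=true}} sidesteps the problem by giving every caller a fresh instance, at the cost of repeated FileSystem construction, which is why the proper fix avoids the blanket close instead.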
[jira] [Updated] (HIVE-21375) Closing TransactionBatch closes FileSystem for other batches in Hive streaming v1
[ https://issues.apache.org/jira/browse/HIVE-21375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ádám Szita updated HIVE-21375:
------------------------------
    Affects Version/s:     (was: 3.2.0)
                       3.1.3
[jira] [Work logged] (HIVE-21375) Closing TransactionBatch closes FileSystem for other batches in Hive streaming v1
[ https://issues.apache.org/jira/browse/HIVE-21375?focusedWorklogId=493861&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-493861 ]

ASF GitHub Bot logged work on HIVE-21375:
-----------------------------------------
                Author: ASF GitHub Bot
            Created on: 02/Oct/20 08:19
            Start Date: 02/Oct/20 08:19
    Worklog Time Spent: 10m

Work Description: szlta merged pull request #1541:
URL: https://github.com/apache/hive/pull/1541

Issue Time Tracking
-------------------
    Worklog Id: (was: 493861)
    Time Spent: 1h  (was: 50m)
[jira] [Work logged] (HIVE-21375) Closing TransactionBatch closes FileSystem for other batches in Hive streaming v1
[ https://issues.apache.org/jira/browse/HIVE-21375?focusedWorklogId=493860&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-493860 ]

ASF GitHub Bot logged work on HIVE-21375:
-----------------------------------------
                Author: ASF GitHub Bot
            Created on: 02/Oct/20 08:12
            Start Date: 02/Oct/20 08:12
    Worklog Time Spent: 10m

Work Description: szlta commented on pull request #1541:
URL: https://github.com/apache/hive/pull/1541#issuecomment-702591651

Cherry-picked the changes here that allow a test run on this PR. There were many test failures, as branch-3 is currently in bad shape, but none of them seemed related to HCatalog streaming, so I'm moving forward with this change.

Issue Time Tracking
-------------------
    Worklog Id: (was: 493860)
    Time Spent: 50m  (was: 40m)