[jira] [Updated] (HIVE-25446) VectorMapJoinFastHashTable.validateCapacity AssertionError: Capacity must be a power of two
[ https://issues.apache.org/jira/browse/HIVE-25446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matt McCline updated HIVE-25446:
--------------------------------
    Description:
Encountered this in a very large query:

{noformat}
Caused by: java.lang.AssertionError: Capacity must be a power of two
	at org.apache.hadoop.hive.ql.exec.vector.mapjoin.fast.VectorMapJoinFastHashTable.validateCapacity(VectorMapJoinFastHashTable.java:60)
	at org.apache.hadoop.hive.ql.exec.vector.mapjoin.fast.VectorMapJoinFastHashTable.<init>(VectorMapJoinFastHashTable.java:77)
	at org.apache.hadoop.hive.ql.exec.vector.mapjoin.fast.VectorMapJoinFastBytesHashTable.<init>(VectorMapJoinFastBytesHashTable.java:132)
	at org.apache.hadoop.hive.ql.exec.vector.mapjoin.fast.VectorMapJoinFastBytesHashMap.<init>(VectorMapJoinFastBytesHashMap.java:166)
	at org.apache.hadoop.hive.ql.exec.vector.mapjoin.fast.VectorMapJoinFastStringHashMap.<init>(VectorMapJoinFastStringHashMap.java:43)
	at org.apache.hadoop.hive.ql.exec.vector.mapjoin.fast.VectorMapJoinFastTableContainer.createHashTable(VectorMapJoinFastTableContainer.java:137)
	at org.apache.hadoop.hive.ql.exec.vector.mapjoin.fast.VectorMapJoinFastTableContainer.<init>(VectorMapJoinFastTableContainer.java:86)
	at org.apache.hadoop.hive.ql.exec.vector.mapjoin.fast.VectorMapJoinFastHashTableLoader.load(VectorMapJoinFastHashTableLoader.java:122)
	at org.apache.hadoop.hive.ql.exec.MapJoinOperator.loadHashTableInternal(MapJoinOperator.java:344)
	at org.apache.hadoop.hive.ql.exec.MapJoinOperator.loadHashTable(MapJoinOperator.java:413)
	at org.apache.hadoop.hive.ql.exec.MapJoinOperator.lambda$initializeOp$0(MapJoinOperator.java:215)
	at org.apache.hadoop.hive.ql.exec.tez.ObjectCache.retrieve(ObjectCache.java:96)
	at org.apache.hadoop.hive.ql.exec.tez.ObjectCache$1.call(ObjectCache.java:113)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
{noformat}

> VectorMapJoinFastHashTable.validateCapacity AssertionError: Capacity must be
> a power of two
> ---
>
> Key: HIVE-25446
> URL: https://issues.apache.org/jira/browse/HIVE-25446
> Project: Hive
> Issue Type: Bug
> Reporter: Matt McCline
> Assignee: Matt McCline
> Priority: Major
> Fix For: 4.0.0
>

-- This message was sent by Atlassian Jira (v8.3.4#803005)
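For context, fast map-join hash tables require a power-of-two capacity so that bucket selection can use a bit mask instead of a modulo. A minimal sketch of the usual check and round-up (illustrative of the general technique only, not Hive's actual validateCapacity code):

```java
public class CapacityCheck {
    // A positive power of two has exactly one bit set, so clearing the
    // lowest set bit must leave zero.
    static boolean isPowerOfTwo(int capacity) {
        return capacity > 0 && (capacity & (capacity - 1)) == 0;
    }

    // Round a requested size up to the next power of two, the usual way a
    // hash table sanitizes a computed initial capacity before allocating.
    static int nextPowerOfTwo(int requested) {
        if (requested <= 1) {
            return 1;
        }
        return Integer.highestOneBit(requested - 1) << 1;
    }

    public static void main(String[] args) {
        System.out.println(isPowerOfTwo(1024));   // true
        System.out.println(isPowerOfTwo(1000));   // false
        System.out.println(nextPowerOfTwo(1000)); // 1024
    }
}
```

A table that derives its capacity from estimated row counts would normally pass the result through a round-up like this before construction; the assertion firing suggests some code path produced an unsanitized capacity.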
[jira] [Updated] (HIVE-25446) VectorMapJoinFastHashTable.validateCapacity AssertionError: Capacity must be a power of two
[ https://issues.apache.org/jira/browse/HIVE-25446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matt McCline updated HIVE-25446:
--------------------------------
    Environment: (was: the same stack trace now recorded in the Description)

> VectorMapJoinFastHashTable.validateCapacity AssertionError: Capacity must be
> a power of two
> ---
>
> Key: HIVE-25446
> URL: https://issues.apache.org/jira/browse/HIVE-25446
> Project: Hive
> Issue Type: Bug
> Reporter: Matt McCline
> Assignee: Matt McCline
> Priority: Major
> Fix For: 4.0.0
>
[jira] [Assigned] (HIVE-25446) VectorMapJoinFastHashTable.validateCapacity AssertionError: Capacity must be a power of two
[ https://issues.apache.org/jira/browse/HIVE-25446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matt McCline reassigned HIVE-25446:
-----------------------------------

> VectorMapJoinFastHashTable.validateCapacity AssertionError: Capacity must be
> a power of two
> ---
>
> Key: HIVE-25446
> URL: https://issues.apache.org/jira/browse/HIVE-25446
> Project: Hive
> Issue Type: Bug
> Environment: (the same stack trace quoted in the Description above)
> Reporter: Matt McCline
> Assignee: Matt McCline
> Priority: Major
> Fix For: 4.0.0
>
[jira] [Assigned] (HIVE-25445) Enable JdbcStorageHandler to get password from AWS Secrets Service.
[ https://issues.apache.org/jira/browse/HIVE-25445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Harish JP reassigned HIVE-25445:
--------------------------------

> Enable JdbcStorageHandler to get password from AWS Secrets Service.
> ---
>
> Key: HIVE-25445
> URL: https://issues.apache.org/jira/browse/HIVE-25445
> Project: Hive
> Issue Type: New Feature
> Components: HiveServer2
> Reporter: Harish JP
> Assignee: Harish JP
> Priority: Major
>
> Currently, the password for JdbcStorageHandler can be set only via the
> password field or a keystore. This Jira is to add a framework to fetch the
> password from any source, and to implement AWS Secrets Manager as a source.
>
> The approach taken is to use a new table property, dbcp.password.uri, which
> will be used if password and keyfile are not available.
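The "fetch password from any source" framework can be pictured as a URI-dispatched lookup: the URI scheme selects the secret source, the rest names the secret. The sketch below is purely illustrative; the "aws-sm" scheme, the resolver shape, and the in-memory map standing in for a real AWS Secrets Manager GetSecretValue call are all assumptions, not the actual patch:

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

public class PasswordSource {
    // Hypothetical resolver for a dbcp.password.uri value: the URI scheme
    // picks the secret source, the scheme-specific part names the secret.
    static String resolve(URI uri, Map<String, String> secretStore) {
        if (!"aws-sm".equals(uri.getScheme())) {
            throw new IllegalArgumentException("unsupported scheme: " + uri.getScheme());
        }
        // A real implementation would call AWS Secrets Manager here; this
        // sketch reads from an in-memory map instead.
        String secret = secretStore.get(uri.getSchemeSpecificPart());
        if (secret == null) {
            throw new IllegalStateException("no secret for " + uri);
        }
        return secret;
    }

    public static void main(String[] args) {
        Map<String, String> store = new HashMap<>();
        store.put("prod/hive/jdbc", "s3cr3t");
        System.out.println(resolve(URI.create("aws-sm:prod/hive/jdbc"), store)); // s3cr3t
    }
}
```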
[jira] [Updated] (HIVE-25444) Make tables based on storage handlers authorization (HIVE-24705) configurable.
[ https://issues.apache.org/jira/browse/HIVE-25444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HIVE-25444:
----------------------------------
    Labels: pull-request-available  (was: )

> Make tables based on storage handlers authorization (HIVE-24705) configurable.
> ---
>
> Key: HIVE-25444
> URL: https://issues.apache.org/jira/browse/HIVE-25444
> Project: Hive
> Issue Type: Improvement
> Components: HiveServer2
> Reporter: Sai Hemanth Gantasala
> Assignee: Sai Hemanth Gantasala
> Priority: Major
> Labels: pull-request-available
> Time Spent: 10m
> Remaining Estimate: 0h
>
> Using the config "hive.security.authorization.tables.on.storagehandlers", with
> default true, we enable authorization on storage handlers by default.
> Authorization is disabled if this config is set to false.
> Background: Previously, whenever a user tried to create a table based on a
> storage handler, the end user seen by the external storage (e.g. HBase,
> Kafka, or Druid) was 'hive', so we could not really enforce conditions in
> Ranger on the end user. https://issues.apache.org/jira/browse/HIVE-24705
> solved this security issue by enforcing a check in Apache Ranger for the Hive
> service. That patch had changes in both Hive and Ranger (the Ranger client
> depends on the Hive changes). The reason to make this feature configurable is
> that users can update the Hive code but not the Ranger code. In that case,
> users see a permission-denied error when executing a statement like {{CREATE
> TABLE hive_table_0(key int, value string) STORED BY
> 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'}}, but the user/admin
> cannot add a Ranger policy for Hive because the Ranger code is not updated.
> By making this feature configurable, we unblock users from creating tables
> based on storage handlers as they were previously doing. Users can turn this
> config off if they have not updated the Ranger code.
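Concretely, a user on a cluster whose Ranger side lacks the HIVE-24705 changes would turn the property off before creating the table. The CREATE TABLE statement is taken from the description; whether the property is settable per session (as sketched here) or only in hive-site.xml is not stated there:

```sql
-- Disable storage-handler authorization when Ranger does not yet carry
-- the HIVE-24705 changes.
SET hive.security.authorization.tables.on.storagehandlers=false;

CREATE TABLE hive_table_0 (key int, value string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler';
```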
[jira] [Work logged] (HIVE-25444) Make tables based on storage handlers authorization (HIVE-24705) configurable.
[ https://issues.apache.org/jira/browse/HIVE-25444?focusedWorklogId=637118&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-637118 ]

ASF GitHub Bot logged work on HIVE-25444:
-----------------------------------------
                Author: ASF GitHub Bot
            Created on: 11/Aug/21 20:43
            Start Date: 11/Aug/21 20:43
    Worklog Time Spent: 10m

Work Description: saihemanth-cloudera opened a new pull request #2583:
URL: https://github.com/apache/hive/pull/2583

   … configurable.

   ### What changes were proposed in this pull request?
   Making the tables-based-on-storage-handlers authorization configurable.

   ### Why are the changes needed?
   Authorization may fail if the Ranger code doesn't have the HIVE-24705 patch.

   ### Does this PR introduce _any_ user-facing change?
   No

   ### How was this patch tested?
   Local machine, remote cluster.

-- This is an automated message from the Apache Git Service. To respond to the
message, please log on to GitHub and use the URL above to go to the specific
comment. To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries
about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
-------------------
    Worklog Id: (was: 637118)
    Remaining Estimate: 0h
    Time Spent: 10m

> Make tables based on storage handlers authorization (HIVE-24705) configurable.
> ---
>
> Key: HIVE-25444
> URL: https://issues.apache.org/jira/browse/HIVE-25444
> Project: Hive
> Issue Type: Improvement
> Components: HiveServer2
> Reporter: Sai Hemanth Gantasala
> Assignee: Sai Hemanth Gantasala
> Priority: Major
> Time Spent: 10m
> Remaining Estimate: 0h
>
[jira] [Updated] (HIVE-25444) Make tables based on storage handlers authorization (HIVE-24705) configurable.
[ https://issues.apache.org/jira/browse/HIVE-25444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sai Hemanth Gantasala updated HIVE-25444:
-----------------------------------------
    Description:
Using the config "hive.security.authorization.tables.on.storagehandlers", with
default true, we enable authorization on storage handlers by default.
Authorization is disabled if this config is set to false.

Background: Previously, whenever a user tried to create a table based on a
storage handler, the end user seen by the external storage (e.g. HBase, Kafka,
or Druid) was 'hive', so we could not really enforce conditions in Ranger on
the end user. https://issues.apache.org/jira/browse/HIVE-24705 solved this
security issue by enforcing a check in Apache Ranger for the Hive service. That
patch had changes in both Hive and Ranger (the Ranger client depends on the
Hive changes). The reason to make this feature configurable is that users can
update the Hive code but not the Ranger code. In that case, users see a
permission-denied error when executing a statement like {{CREATE TABLE
hive_table_0(key int, value string) STORED BY
'org.apache.hadoop.hive.hbase.HBaseStorageHandler'}}, but the user/admin cannot
add a Ranger policy for Hive because the Ranger code is not updated. By making
this feature configurable, we unblock users from creating tables based on
storage handlers as they were previously doing. Users can turn this config off
if they have not updated the Ranger code.

  was:
Using a config "hive.security.authorization.tables.on.storagehandlers" with
default true, we'll enable the authorization on storage handlers by default.
Authorization is disabled if this config is set to true.

Background: Previously, whenever a user is trying to create a table based on a
storage handler, the end user we are seeing in the external storage (Ex: hbase,
kafka, and druid) is ‘hive’ so we cannot really enforce the condition in ranger
on the end-user. https://issues.apache.org/jira/browse/HIVE-24705 solved this
security issue, by enforcing a check in Apache ranger for hive service. This
patch had changes in both hive and ranger. (ranger client depends on hive
changes.) Now the reason why I’m disabling this feature by default is that
users can update hive code but not ranger code. In that case, users see a
permission denied error when executing a statement like: {{CREATE TABLE
hive_table_0(key int, value string) STORED BY
'org.apache.hadoop.hive.hbase.HBaseStorageHandler'}} but user/admin cannot add
a ranger policy in hive because ranger code is not updated. This way we’ll
unblock users from creating tables based on storage handlers as they were
previously doing. Users can turn on this config if they have updated ranger
code.

> Make tables based on storage handlers authorization (HIVE-24705) configurable.
> ---
>
> Key: HIVE-25444
> URL: https://issues.apache.org/jira/browse/HIVE-25444
> Project: Hive
> Issue Type: Improvement
> Components: HiveServer2
> Reporter: Sai Hemanth Gantasala
> Assignee: Sai Hemanth Gantasala
> Priority: Major
>
[jira] [Work logged] (HIVE-21614) Derby does not support CLOB comparisons
[ https://issues.apache.org/jira/browse/HIVE-21614?focusedWorklogId=637098&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-637098 ]

ASF GitHub Bot logged work on HIVE-21614:
-----------------------------------------
                Author: ASF GitHub Bot
            Created on: 11/Aug/21 20:16
            Start Date: 11/Aug/21 20:16
    Worklog Time Spent: 10m

Work Description: hankfanchiu commented on pull request #2484:
URL: https://github.com/apache/hive/pull/2484#issuecomment-897122168

   @pvary, is this good to go?

Issue Time Tracking
-------------------
    Worklog Id: (was: 637098)
    Time Spent: 1h  (was: 50m)

> Derby does not support CLOB comparisons
> ---
>
> Key: HIVE-21614
> URL: https://issues.apache.org/jira/browse/HIVE-21614
> Project: Hive
> Issue Type: Bug
> Components: Metastore
> Affects Versions: 2.3.4, 3.0.0
> Reporter: Vlad Rozov
> Assignee: Hank Fanchiu
> Priority: Major
> Labels: pull-request-available
> Time Spent: 1h
> Remaining Estimate: 0h
>
> HiveMetaStoreClient.listTableNamesByFilter() with a non-empty filter causes
> an exception with Derby DB:
> {noformat}
> Caused by: ERROR 42818: Comparisons between 'CLOB (UCS_BASIC)' and 'CLOB
> (UCS_BASIC)' are not supported. Types must be comparable. String types must
> also have matching collation. If collation does not match, a possible
> solution is to cast operands to force them to the default collation (e.g.
> SELECT tablename FROM sys.systables WHERE CAST(tablename AS VARCHAR(128)) =
> 'T1')
> 	at org.apache.derby.iapi.error.StandardException.newException(Unknown Source)
> 	at org.apache.derby.iapi.error.StandardException.newException(Unknown Source)
> 	at org.apache.derby.impl.sql.compile.BinaryComparisonOperatorNode.bindComparisonOperator(Unknown Source)
> 	at org.apache.derby.impl.sql.compile.BinaryComparisonOperatorNode.bindExpression(Unknown Source)
> 	at org.apache.derby.impl.sql.compile.BinaryOperatorNode.bindExpression(Unknown Source)
> 	at org.apache.derby.impl.sql.compile.BinaryLogicalOperatorNode.bindExpression(Unknown Source)
> 	at org.apache.derby.impl.sql.compile.AndNode.bindExpression(Unknown Source)
> 	at org.apache.derby.impl.sql.compile.SelectNode.bindExpressions(Unknown Source)
> 	at org.apache.derby.impl.sql.compile.DMLStatementNode.bindExpressions(Unknown Source)
> 	at org.apache.derby.impl.sql.compile.DMLStatementNode.bind(Unknown Source)
> 	at org.apache.derby.impl.sql.compile.CursorNode.bindStatement(Unknown Source)
> 	at org.apache.derby.impl.sql.GenericStatement.prepMinion(Unknown Source)
> 	at org.apache.derby.impl.sql.GenericStatement.prepare(Unknown Source)
> 	at org.apache.derby.impl.sql.conn.GenericLanguageConnectionContext.prepareInternalStatement(Unknown Source)
> 	... 42 more
> {noformat}
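The Derby error text itself names the standard workaround: cast one CLOB operand to a VARCHAR so the comparison uses a comparable type with the default collation. Using the exact example the error message gives:

```sql
-- Direct CLOB = CLOB comparison fails in Derby; casting one operand
-- restores comparability and the default collation.
SELECT tablename
FROM sys.systables
WHERE CAST(tablename AS VARCHAR(128)) = 'T1';
```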
[jira] [Work logged] (HIVE-25275) OOM during query planning due to HiveJoinPushTransitivePredicatesRule matching infinitely
[ https://issues.apache.org/jira/browse/HIVE-25275?focusedWorklogId=637071&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-637071 ]

ASF GitHub Bot logged work on HIVE-25275:
-----------------------------------------
                Author: ASF GitHub Bot
            Created on: 11/Aug/21 18:54
            Start Date: 11/Aug/21 18:54
    Worklog Time Spent: 10m

Work Description: asolimando opened a new pull request #2582:
URL: https://github.com/apache/hive/pull/2582

   ### Organization of the PR's commits:
   1. qtest with reproducer added
   2. fix for the HiveJoinAddNotNullRule rule
   3. fix for the HiveFilterProjectTransposeRule
   FYI: a few minor warnings were addressed in the commits for the two rules.

   ### What changes were proposed in this pull request?
   Redundant _IS NOT NULL_ predicates created by either _HiveJoinAddNotNullRule_
   or _HiveFilterProjectTransposeRule_ are infinitely pulled up by
   _HiveJoinPushTransitivePredicatesRule_ and then pushed down again by the two
   aforementioned rules. The PR addresses the problem by fixing the two rules
   as follows:
   - HiveFilterProjectTransposeRule: do not push down _IS NOT NULL(EXPR($i))_
     if _IS NOT NULL($i)_ already exists down the subtree
   - HiveJoinAddNotNullRule: prevent the addition of _IS NOT NULL(EXPR($i))_
     in the presence of _IS NOT NULL($i)_
   _IS NOT NULL(EXPR($i))_ predicates are generally redundant in the presence
   of _IS NOT NULL($i)_, and might even hinder performance when a filter tests
   complex expressions.

   ### Why are the changes needed?
   Query planning runs infinitely when some extra IS NOT NULL predicates are
   created.

   ### Does this PR introduce _any_ user-facing change?
   No

   ### How was this patch tested?
   The first commit adds a qtest with the reproducer attached to the ticket;
   all qtests were run locally with no issues identified.

Issue Time Tracking
-------------------
    Worklog Id: (was: 637071)
    Remaining Estimate: 0h
    Time Spent: 10m

> OOM during query planning due to HiveJoinPushTransitivePredicatesRule
> matching infinitely
> ---
>
> Key: HIVE-25275
> URL: https://issues.apache.org/jira/browse/HIVE-25275
> Project: Hive
> Issue Type: Bug
> Reporter: László Pintér
> Assignee: Stamatis Zampetakis
> Priority: Major
> Time Spent: 10m
> Remaining Estimate: 0h
>
> While running the following query, an OOM is raised during the planning phase:
> {code:sql}
> CREATE TABLE A (`value_date` date) STORED AS ORC;
> CREATE TABLE B (`business_date` date) STORED AS ORC;
> SELECT A.VALUE_DATE
> FROM A, B
> WHERE A.VALUE_DATE = BUSINESS_DATE
> AND A.VALUE_DATE = TRUNC(BUSINESS_DATE, 'MONTH');
> {code}
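The core of both rule fixes is a redundancy test: do not emit IS NOT NULL over an expression of input $i when IS NOT NULL($i) already holds below, since the derived predicate keeps re-triggering the pull-up/push-down cycle. A toy model of that check (predicates rendered as strings for illustration; the real rules walk Calcite RexNode trees):

```java
import java.util.Set;

public class NotNullRedundancy {
    // Toy model: 'existing' holds the predicates already known to hold
    // below the operator being rewritten, keyed by rendered text.
    static boolean shouldAddNotNullOverExpr(int inputRef, Set<String> existing) {
        // The fix treats IS NOT NULL(EXPR($i)) as redundant once
        // IS NOT NULL($i) exists; skipping it breaks the rule-matching loop.
        return !existing.contains("IS NOT NULL($" + inputRef + ")");
    }

    public static void main(String[] args) {
        Set<String> below = Set.of("IS NOT NULL($0)");
        System.out.println(shouldAddNotNullOverExpr(0, below)); // false
        System.out.println(shouldAddNotNullOverExpr(1, below)); // true
    }
}
```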
[jira] [Updated] (HIVE-25275) OOM during query planning due to HiveJoinPushTransitivePredicatesRule matching infinitely
[ https://issues.apache.org/jira/browse/HIVE-25275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HIVE-25275:
----------------------------------
    Labels: pull-request-available  (was: )

> OOM during query planning due to HiveJoinPushTransitivePredicatesRule
> matching infinitely
> ---
>
> Key: HIVE-25275
> URL: https://issues.apache.org/jira/browse/HIVE-25275
> Project: Hive
> Issue Type: Bug
> Reporter: László Pintér
> Assignee: Stamatis Zampetakis
> Priority: Major
> Labels: pull-request-available
> Time Spent: 10m
> Remaining Estimate: 0h
>
> While running the following query, an OOM is raised during the planning phase:
> {code:sql}
> CREATE TABLE A (`value_date` date) STORED AS ORC;
> CREATE TABLE B (`business_date` date) STORED AS ORC;
> SELECT A.VALUE_DATE
> FROM A, B
> WHERE A.VALUE_DATE = BUSINESS_DATE
> AND A.VALUE_DATE = TRUNC(BUSINESS_DATE, 'MONTH');
> {code}
[jira] [Updated] (HIVE-25444) Make tables based on storage handlers authorization (HIVE-24705) configurable.
[ https://issues.apache.org/jira/browse/HIVE-25444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sai Hemanth Gantasala updated HIVE-25444:
-----------------------------------------
    Description:
Using a config "hive.security.authorization.tables.on.storagehandlers" with
default true, we'll enable the authorization on storage handlers by default.
Authorization is disabled if this config is set to true.

Background: Previously, whenever a user tried to create a table based on a
storage handler, the end user seen by the external storage (e.g. HBase, Kafka,
or Druid) was 'hive', so we could not really enforce conditions in Ranger on
the end user. https://issues.apache.org/jira/browse/HIVE-24705 solved this
security issue by enforcing a check in Apache Ranger for the Hive service. That
patch had changes in both Hive and Ranger (the Ranger client depends on the
Hive changes.) Now, the reason why I'm disabling this feature by default is
that users can update the Hive code but not the Ranger code. In that case,
users see a permission-denied error when executing a statement like {{CREATE
TABLE hive_table_0(key int, value string) STORED BY
'org.apache.hadoop.hive.hbase.HBaseStorageHandler'}}, but the user/admin cannot
add a Ranger policy in Hive because the Ranger code is not updated. This way
we'll unblock users from creating tables based on storage handlers as they were
previously doing. Users can turn this config on if they have updated the Ranger
code.

  was: the same text with default false, i.e. authorization on storage
handlers disabled by default and enabled when the config is set to true.

> Make tables based on storage handlers authorization (HIVE-24705) configurable.
> ---
>
> Key: HIVE-25444
> URL: https://issues.apache.org/jira/browse/HIVE-25444
> Project: Hive
> Issue Type: Improvement
> Components: HiveServer2
> Reporter: Sai Hemanth Gantasala
> Assignee: Sai Hemanth Gantasala
> Priority: Major
>
[jira] [Updated] (HIVE-25444) Make tables based on storage handlers authorization (HIVE-24705) configurable.
[ https://issues.apache.org/jira/browse/HIVE-25444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sai Hemanth Gantasala updated HIVE-25444:
-----------------------------------------
    Summary: Make tables based on storage handlers authorization (HIVE-24705)
configurable.  (was: Use a config to disable authorization on tables based on
storage handlers by default.)

> Make tables based on storage handlers authorization (HIVE-24705) configurable.
> ---
>
> Key: HIVE-25444
> URL: https://issues.apache.org/jira/browse/HIVE-25444
> Project: Hive
> Issue Type: Improvement
> Components: HiveServer2
> Reporter: Sai Hemanth Gantasala
> Assignee: Sai Hemanth Gantasala
> Priority: Major
>
[jira] [Updated] (HIVE-25444) Use a config to disable authorization on tables based on storage handlers by default.
[ https://issues.apache.org/jira/browse/HIVE-25444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sai Hemanth Gantasala updated HIVE-25444: - Description: Using a config "hive.security.authorization.tables.on.storagehandlers" with default false, we'll disable the authorization on storage handlers by default. Authorization is enabled if this config is set to true. Background: Previously, whenever a user tried to create a table based on a storage handler, the end user we see in the external storage (e.g. hbase, kafka, and druid) is ‘hive’, so we cannot really enforce the condition in ranger on the end-user. https://issues.apache.org/jira/browse/HIVE-24705 solved this security issue by enforcing a check in Apache ranger for the hive service. This patch had changes in both hive and ranger. (The ranger client depends on the hive changes.) Now the reason why I’m disabling this feature by default is that users can update hive code but not ranger code. In that case, users see a permission denied error when executing a statement like: {{CREATE TABLE hive_table_0(key int, value string) STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'}} but the user/admin cannot add a ranger policy in hive because the ranger code is not updated. This way we’ll unblock users from creating tables based on storage handlers as they were previously doing. Users can turn on this config if they have updated ranger code. was: Using a config "hive.security.authorization.tables.on.storagehandlers" with a default false, we'll disable the authorization on storage handlers by default. Authorization is enabled if this config is set to true. Back > Use a config to disable authorization on tables based on storage handlers by > default. 
> - > > Key: HIVE-25444 > URL: https://issues.apache.org/jira/browse/HIVE-25444 > Project: Hive > Issue Type: Improvement > Components: HiveServer2 >Reporter: Sai Hemanth Gantasala >Assignee: Sai Hemanth Gantasala >Priority: Major > > Using a config "hive.security.authorization.tables.on.storagehandlers" with > default false, we'll disable the authorization on storage handlers by > default. Authorization is enabled if this config is set to true. > Background: Previously, whenever a user tried to create a table based on > a storage handler, the end user we see in the external storage (e.g. > hbase, kafka, and druid) is ‘hive’, so we cannot really enforce the condition > in ranger on the end-user. > https://issues.apache.org/jira/browse/HIVE-24705 solved this security issue > by enforcing a check in Apache ranger for the hive service. This patch had > changes in both hive and ranger. (The ranger client depends on the hive changes.) Now > the reason why I’m disabling this feature by default is that users can > update hive code but not ranger code. In that case, users see a permission > denied error when executing a statement like: {{CREATE TABLE hive_table_0(key > int, value string) STORED BY > 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'}} but the user/admin cannot > add a ranger policy in hive because the ranger code is not updated. This way > we’ll unblock users from creating tables based on storage handlers as they > were previously doing. Users can turn on this config if they have updated > ranger code. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-25444) Use a config to disable authorization on tables based on storage handlers by default.
[ https://issues.apache.org/jira/browse/HIVE-25444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sai Hemanth Gantasala updated HIVE-25444: - Summary: Use a config to disable authorization on tables based on storage handlers by default. (was: Use a config to disable authorization on storage handlers by default.) > Use a config to disable authorization on tables based on storage handlers by > default. > - > > Key: HIVE-25444 > URL: https://issues.apache.org/jira/browse/HIVE-25444 > Project: Hive > Issue Type: Improvement > Components: HiveServer2 >Reporter: Sai Hemanth Gantasala >Assignee: Sai Hemanth Gantasala >Priority: Major > > Using a config "hive.security.authorization.tables.on.storagehandlers" with a > default false, we'll disable the authorization on storage handlers by > default. Authorization is enabled if this config is set to true. > Back -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-25444) Use a config to disable authorization on storage handlers by default.
[ https://issues.apache.org/jira/browse/HIVE-25444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sai Hemanth Gantasala updated HIVE-25444: - Description: Using a config "hive.security.authorization.tables.on.storagehandlers" with a default false, we'll disable the authorization on storage handlers by default. Authorization is enabled if this config is set to true. Back was:Using a config "hive.security.authorization.tables.on.storagehandlers" with a default false, we'll enable the authorization on storage handlers by default. Authorization is enabled if this config is set to true. > Use a config to disable authorization on storage handlers by default. > - > > Key: HIVE-25444 > URL: https://issues.apache.org/jira/browse/HIVE-25444 > Project: Hive > Issue Type: Improvement > Components: HiveServer2 >Reporter: Sai Hemanth Gantasala >Assignee: Sai Hemanth Gantasala >Priority: Major > > Using a config "hive.security.authorization.tables.on.storagehandlers" with a > default false, we'll disable the authorization on storage handlers by > default. Authorization is enabled if this config is set to true. > Back -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24705) Create/Alter/Drop tables based on storage handlers in HS2 should be authorized by Ranger/Sentry
[ https://issues.apache.org/jira/browse/HIVE-24705?focusedWorklogId=637003&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-637003 ] ASF GitHub Bot logged work on HIVE-24705: - Author: ASF GitHub Bot Created on: 11/Aug/21 17:37 Start Date: 11/Aug/21 17:37 Worklog Time Spent: 10m Work Description: saihemanth-cloudera closed pull request #1960: URL: https://github.com/apache/hive/pull/1960 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 637003) Time Spent: 1h 50m (was: 1h 40m) > Create/Alter/Drop tables based on storage handlers in HS2 should be > authorized by Ranger/Sentry > --- > > Key: HIVE-24705 > URL: https://issues.apache.org/jira/browse/HIVE-24705 > Project: Hive > Issue Type: Improvement >Reporter: Sai Hemanth Gantasala >Assignee: Sai Hemanth Gantasala >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 1h 50m > Remaining Estimate: 0h > > With doAs=false in Hive3.x, whenever a user is trying to create a table based > on storage handlers on external storage, for ex: an HBase table, the end user we > are seeing is hive, so we cannot really enforce the condition in Apache > Ranger/Sentry on the end-user. So, we need to enforce this condition in > hive in the event of create/alter/drop tables based on storage handlers. > Built-in hive storage handlers like HbaseStorageHandler, KafkaStorageHandler, > etc. should implement a method getURIForAuthentication() which returns a URI > that is formed from table properties. This URI can be sent for authorization > to Ranger/Sentry. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HIVE-25444) Use a config to disable authorization on storage handlers by default.
[ https://issues.apache.org/jira/browse/HIVE-25444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sai Hemanth Gantasala reassigned HIVE-25444: > Use a config to disable authorization on storage handlers by default. > - > > Key: HIVE-25444 > URL: https://issues.apache.org/jira/browse/HIVE-25444 > Project: Hive > Issue Type: Improvement > Components: HiveServer2 >Reporter: Sai Hemanth Gantasala >Assignee: Sai Hemanth Gantasala >Priority: Major > > Using a config "hive.security.authorization.tables.on.storagehandlers" with a > default false, we'll enable the authorization on storage handlers by default. > Authorization is enabled if this config is set to true. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25441) Incorrect deltas split for sub-compactions when using `hive.compactor.max.num.delta`
[ https://issues.apache.org/jira/browse/HIVE-25441?focusedWorklogId=636954&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-636954 ] ASF GitHub Bot logged work on HIVE-25441: - Author: ASF GitHub Bot Created on: 11/Aug/21 15:52 Start Date: 11/Aug/21 15:52 Worklog Time Spent: 10m Work Description: klcopp commented on a change in pull request #2579: URL: https://github.com/apache/hive/pull/2579#discussion_r686962046 ## File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorMR.java ## @@ -234,12 +234,24 @@ void run(HiveConf conf, String jobName, Table t, Partition p, StorageDescriptor "especially if this message repeats. Check that compaction is running properly. Check for any " + "runaway/mis-configured process writing to ACID tables, especially using Streaming Ingest API."); int numMinorCompactions = parsedDeltas.size() / maxDeltasToHandle; + parsedDeltas.sort(AcidUtils.ParsedDeltaLight::compareTo); + + int start = 0; + int end = maxDeltasToHandle; + for (int jobSubId = 0; jobSubId < numMinorCompactions; jobSubId++) { +while (parsedDeltas.get(end).getMinWriteId() == parsedDeltas.get(end - 1).getMinWriteId() && + parsedDeltas.get(end).getMaxWriteId() == parsedDeltas.get(end - 1).getMaxWriteId()) { Review comment: Oh, thank you for explaining! It might be nice to include a comment about this – but that's just a suggestion. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 636954) Time Spent: 1h (was: 50m) > Incorrect deltas split for sub-compactions when using > `hive.compactor.max.num.delta` > > > Key: HIVE-25441 > URL: https://issues.apache.org/jira/browse/HIVE-25441 > Project: Hive > Issue Type: Task >Reporter: Denys Kuzmenko >Priority: Major > Labels: pull-request-available > Time Spent: 1h > Remaining Estimate: 0h > > {code} > #Repro steps: > #1./ set hive.compactor.max.num.delta to 5 on HMS > #2./ Set up the table > set hive.merge.cardinality.check=false; > create table test (k int); > ALTER TABLE test SET TBLPROPERTIES ('NO_AUTO_COMPACTION'='true'); > insert into test values (1); > alter table test compact 'major' and wait; > dfs -ls '/warehouse/tablespace/managed/hive/test'; > # drwxrwx---+ - hive hive 0 2021-08-09 12:26 > /warehouse/tablespace/managed/hive/test/base_008_v416 > select * from test; > # k=1 > #run 3 times so there's enough delta dirs, ie. 
6 (should just increase k by 1) > #basically just removes the row and adds a new row with k+1 value > MERGE INTO test AS T USING (select * from test union all select k+1 from > test) AS S > ON T.k=s.k > WHEN MATCHED THEN DELETE > WHEN not MATCHED THEN INSERT values (s.k); > select * from test; > #k=4 > dfs -ls '/warehouse/tablespace/managed/hive/test'; > #drwxrwx---+ - hive hive 0 2021-08-09 12:26 > /warehouse/tablespace/managed/hive/test/base_008_v416 > #drwxrwx---+ - hive hive 0 2021-08-09 12:28 > /warehouse/tablespace/managed/hive/test/delete_delta_009_009_0001 > #drwxrwx---+ - hive hive 0 2021-08-09 12:29 > /warehouse/tablespace/managed/hive/test/delete_delta_010_010_0001 > #drwxrwx---+ - hive hive 0 2021-08-09 12:29 > /warehouse/tablespace/managed/hive/test/delete_delta_011_011_0001 > #drwxrwx---+ - hive hive 0 2021-08-09 12:28 > /warehouse/tablespace/managed/hive/test/delta_009_009_0003 > #drwxrwx---+ - hive hive 0 2021-08-09 12:29 > /warehouse/tablespace/managed/hive/test/delta_010_010_0003 > #drwxrwx---+ - hive hive 0 2021-08-09 12:29 > /warehouse/tablespace/managed/hive/test/delta_011_011_0003 > alter table test compact 'major' and wait; > select * from test; > #result is empty > dfs -ls '/warehouse/tablespace/managed/hive/test'; > #2drwxrwx---+ - hive hive 0 2021-08-09 12:31 > /warehouse/tablespace/managed/hive/test/base_011_v428 > {code} > Some logs from the above example: > {code} > 2021-08-09 12:30:37,532 WARN > org.apache.hadoop.hive.ql.txn.compactor.CompactorMR: > [nightly-7x-us-2-2.nightly-7x-us-2.root.hwx.site-49_executor]: 6 delta files > found for default.test located at > hdfs://nightly-7x-us-2-2.nightly-7x-us-2.root.hwx.site:8020/warehouse/tablespace/managed/hive/test! > This is likely a sign
[jira] [Work logged] (HIVE-25441) Incorrect deltas split for sub-compactions when using `hive.compactor.max.num.delta`
[ https://issues.apache.org/jira/browse/HIVE-25441?focusedWorklogId=636952&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-636952 ] ASF GitHub Bot logged work on HIVE-25441: - Author: ASF GitHub Bot Created on: 11/Aug/21 15:50 Start Date: 11/Aug/21 15:50 Worklog Time Spent: 10m Work Description: deniskuzZ commented on a change in pull request #2579: URL: https://github.com/apache/hive/pull/2579#discussion_r686960430 ## File path: itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCompactor.java ## @@ -781,6 +782,67 @@ public void autoCompactOnStreamingIngestWithDynamicPartition() throws Exception } } + @Test + public void testNoDataLossWhenMaxNumDeltaIsUsed() throws Exception { +String dbName = "default"; +String tblName = "cws"; +executeStatementOnDriver("drop table if exists " + tblName, driver); + +executeStatementOnDriver("CREATE TABLE " + tblName + "(a INT, b STRING) " + + " STORED AS ORC TBLPROPERTIES ('transactional'='true')", driver); +executeStatementOnDriver("insert into " + tblName + " values (1, 'a')", driver); +executeStatementOnDriver("insert into " + tblName + " values (3, 'b')", driver); + +runMajorCompaction(dbName, tblName); +runCleaner(conf); + +for (int i = 0; i < 3; i++) { + executeStatementOnDriver("MERGE INTO " + tblName + " AS T USING (" + +"select * from " + tblName + " union all select a+1, b from " + tblName + ") AS S " + +"ON T.a=s.a " + +"WHEN MATCHED THEN DELETE " + +"WHEN not MATCHED THEN INSERT values (s.a, s.b)", driver); +} + +driver.run("select a from " + tblName); +List<String> res = new ArrayList<>(); +driver.getFetchTask().fetch(res); +Assert.assertEquals(res, Arrays.asList("4", "6")); + +conf.setIntVar(HiveConf.ConfVars.COMPACTOR_MAX_NUM_DELTA, 5); +runMajorCompaction(dbName, tblName); + +List<String> matchesNotFound = new ArrayList<>(5); +matchesNotFound.add(AcidUtils.deleteDeltaSubdir(3, 4) + VISIBILITY_PATTERN); +matchesNotFound.add(AcidUtils.deltaSubdir(3, 4) + VISIBILITY_PATTERN); 
+matchesNotFound.add(AcidUtils.deleteDeltaSubdir(5, 5, 0)); +matchesNotFound.add(AcidUtils.deltaSubdir(5, 5, 1)); +matchesNotFound.add(AcidUtils.baseDir(5) + VISIBILITY_PATTERN); + +IMetaStoreClient msClient = new HiveMetaStoreClient(conf); +Table table = msClient.getTable(dbName, tblName); +msClient.close(); + +FileSystem fs = FileSystem.get(conf); +FileStatus[] stat = fs.listStatus(new Path(table.getSd().getLocation())); + +for (FileStatus f : stat) { + for (int j = 0; j < matchesNotFound.size(); j++) { +if (f.getPath().getName().matches(matchesNotFound.get(j))) { + matchesNotFound.remove(j); + break; +} + } +} +Assert.assertEquals("Matches Not Found: " + matchesNotFound.toArray(), 0, matchesNotFound.size()); Review comment: fixed -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 636952) Time Spent: 50m (was: 40m) > Incorrect deltas split for sub-compactions when using > `hive.compactor.max.num.delta` > > > Key: HIVE-25441 > URL: https://issues.apache.org/jira/browse/HIVE-25441 > Project: Hive > Issue Type: Task >Reporter: Denys Kuzmenko >Priority: Major > Labels: pull-request-available > Time Spent: 50m > Remaining Estimate: 0h > > {code} > #Repro steps: > #1./ set hive.compactor.max.num.delta to 5 on HMS > #2./ Set up the table > set hive.merge.cardinality.check=false; > create table test (k int); > ALTER TABLE test SET TBLPROPERTIES ('NO_AUTO_COMPACTION'='true'); > insert into test values (1); > alter table test compact 'major' and wait; > dfs -ls '/warehouse/tablespace/managed/hive/test'; > # drwxrwx---+ - hive hive 0 2021-08-09 12:26 > /warehouse/tablespace/managed/hive/test/base_008_v416 > select * from test; > # k=1 > #run 3 times so there's enough delta 
dirs, ie. 6 (should just increase k by 1) > #basically just removes the row and adds a new row with k+1 value > MERGE INTO test AS T USING (select * from test union all select k+1 from > test) AS S > ON T.k=s.k > WHEN MATCHED THEN DELETE > WHEN not MATCHED THEN INSERT values (s.k); > select * from test; > #k=4 > dfs -ls '/warehouse/tablespace/managed/hive/test'; > #drwxrwx---+ - hive hive 0
[jira] [Work logged] (HIVE-25441) Incorrect deltas split for sub-compactions when using `hive.compactor.max.num.delta`
[ https://issues.apache.org/jira/browse/HIVE-25441?focusedWorklogId=636951&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-636951 ] ASF GitHub Bot logged work on HIVE-25441: - Author: ASF GitHub Bot Created on: 11/Aug/21 15:44 Start Date: 11/Aug/21 15:44 Worklog Time Spent: 10m Work Description: deniskuzZ commented on a change in pull request #2579: URL: https://github.com/apache/hive/pull/2579#discussion_r686955507 ## File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorMR.java ## @@ -234,12 +234,24 @@ void run(HiveConf conf, String jobName, Table t, Partition p, StorageDescriptor "especially if this message repeats. Check that compaction is running properly. Check for any " + "runaway/mis-configured process writing to ACID tables, especially using Streaming Ingest API."); int numMinorCompactions = parsedDeltas.size() / maxDeltasToHandle; + parsedDeltas.sort(AcidUtils.ParsedDeltaLight::compareTo); + + int start = 0; + int end = maxDeltasToHandle; + for (int jobSubId = 0; jobSubId < numMinorCompactions; jobSubId++) { +while (parsedDeltas.get(end).getMinWriteId() == parsedDeltas.get(end - 1).getMinWriteId() && + parsedDeltas.get(end).getMaxWriteId() == parsedDeltas.get(end - 1).getMaxWriteId()) { Review comment: it removes deltas for statements that went over the batch size. ( [... delta_5_5_1 ] [delta_5_5_2, ...]) - we should process multiple statements for the same writeIds range in the same batch, in this scenario delta_5_5_1 should be included in the next batch. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 636951) Time Spent: 40m (was: 0.5h) > Incorrect deltas split for sub-compactions when using > `hive.compactor.max.num.delta` > > > Key: HIVE-25441 > URL: https://issues.apache.org/jira/browse/HIVE-25441 > Project: Hive > Issue Type: Task >Reporter: Denys Kuzmenko >Priority: Major > Labels: pull-request-available > Time Spent: 40m > Remaining Estimate: 0h > > {code} > #Repro steps: > #1./ set hive.compactor.max.num.delta to 5 on HMS > #2./ Set up the table > set hive.merge.cardinality.check=false; > create table test (k int); > ALTER TABLE test SET TBLPROPERTIES ('NO_AUTO_COMPACTION'='true'); > insert into test values (1); > alter table test compact 'major' and wait; > dfs -ls '/warehouse/tablespace/managed/hive/test'; > # drwxrwx---+ - hive hive 0 2021-08-09 12:26 > /warehouse/tablespace/managed/hive/test/base_008_v416 > select * from test; > # k=1 > #run 3 times so there's enough delta dirs, ie. 
6 (should just increase k by 1) > #basically just removes the row and adds a new row with k+1 value > MERGE INTO test AS T USING (select * from test union all select k+1 from > test) AS S > ON T.k=s.k > WHEN MATCHED THEN DELETE > WHEN not MATCHED THEN INSERT values (s.k); > select * from test; > #k=4 > dfs -ls '/warehouse/tablespace/managed/hive/test'; > #drwxrwx---+ - hive hive 0 2021-08-09 12:26 > /warehouse/tablespace/managed/hive/test/base_008_v416 > #drwxrwx---+ - hive hive 0 2021-08-09 12:28 > /warehouse/tablespace/managed/hive/test/delete_delta_009_009_0001 > #drwxrwx---+ - hive hive 0 2021-08-09 12:29 > /warehouse/tablespace/managed/hive/test/delete_delta_010_010_0001 > #drwxrwx---+ - hive hive 0 2021-08-09 12:29 > /warehouse/tablespace/managed/hive/test/delete_delta_011_011_0001 > #drwxrwx---+ - hive hive 0 2021-08-09 12:28 > /warehouse/tablespace/managed/hive/test/delta_009_009_0003 > #drwxrwx---+ - hive hive 0 2021-08-09 12:29 > /warehouse/tablespace/managed/hive/test/delta_010_010_0003 > #drwxrwx---+ - hive hive 0 2021-08-09 12:29 > /warehouse/tablespace/managed/hive/test/delta_011_011_0003 > alter table test compact 'major' and wait; > select * from test; > #result is empty > dfs -ls '/warehouse/tablespace/managed/hive/test'; > #2drwxrwx---+ - hive hive 0 2021-08-09 12:31 > /warehouse/tablespace/managed/hive/test/base_011_v428 > {code} > Some logs from the above example: > {code} > 2021-08-09 12:30:37,532 WARN > org.apache.hadoop.hive.ql.txn.compactor.CompactorMR: > [nightly-7x-us-2-2.nightly-7x-us-2.root.hwx.site-49_executor]: 6 delta files > found
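[Editor's illustration] The reviewer's explanation in the worklog above (deltas sharing one writeId range must stay in a single sub-compaction batch, so a delta like delta_5_5_1 is deferred to the next batch rather than split off) can be sketched as follows. This is a simplified model, not Hive's actual CompactorMR code: each delta is a plain {minWriteId, maxWriteId} pair and all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class DeltaBatchSketch {
    // Splits a sorted list of {minWriteId, maxWriteId} deltas into batches of
    // at most maxDeltas entries, pulling the boundary back whenever it would
    // split a run of deltas that share the same writeId range, so the whole
    // run lands in the next batch instead.
    static List<List<long[]>> split(List<long[]> deltas, int maxDeltas) {
        List<List<long[]>> batches = new ArrayList<>();
        int start = 0;
        while (start < deltas.size()) {
            int end = Math.min(start + maxDeltas, deltas.size());
            // back off while the boundary element has the same range as the
            // one before it, deferring the equal-range run to the next batch
            while (end > start + 1 && end < deltas.size()
                    && deltas.get(end)[0] == deltas.get(end - 1)[0]
                    && deltas.get(end)[1] == deltas.get(end - 1)[1]) {
                end--;
            }
            batches.add(new ArrayList<>(deltas.subList(start, end)));
            start = end;
        }
        return batches;
    }
}
```

With maxDeltas = 5 and ranges (1,1)…(4,4),(5,5),(5,5),(6,6), the first batch stops at four entries so both (5,5) deltas go into the second batch together.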
[jira] [Work logged] (HIVE-25441) Incorrect deltas split for sub-compactions when using `hive.compactor.max.num.delta`
[ https://issues.apache.org/jira/browse/HIVE-25441?focusedWorklogId=636928=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-636928 ] ASF GitHub Bot logged work on HIVE-25441: - Author: ASF GitHub Bot Created on: 11/Aug/21 15:10 Start Date: 11/Aug/21 15:10 Worklog Time Spent: 10m Work Description: klcopp commented on a change in pull request #2579: URL: https://github.com/apache/hive/pull/2579#discussion_r686914879 ## File path: itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCompactor.java ## @@ -781,6 +782,67 @@ public void autoCompactOnStreamingIngestWithDynamicPartition() throws Exception } } + @Test + public void testNoDataLossWhenMaxNumDeltaIsUsed() throws Exception { +String dbName = "default"; +String tblName = "cws"; +executeStatementOnDriver("drop table if exists " + tblName, driver); + +executeStatementOnDriver("CREATE TABLE " + tblName + "(a INT, b STRING) " + + " STORED AS ORC TBLPROPERTIES ('transactional'='true')", driver); +executeStatementOnDriver("insert into " + tblName + " values (1, 'a')", driver); +executeStatementOnDriver("insert into " + tblName + " values (3, 'b')", driver); + +runMajorCompaction(dbName, tblName); +runCleaner(conf); + +for (int i = 0; i < 3; i++) { + executeStatementOnDriver("MERGE INTO " + tblName + " AS T USING (" + +"select * from " + tblName + " union all select a+1, b from " + tblName + ") AS S " + +"ON T.a=s.a " + +"WHEN MATCHED THEN DELETE " + +"WHEN not MATCHED THEN INSERT values (s.a, s.b)", driver); +} + +driver.run("select a from " + tblName); +List res = new ArrayList<>(); +driver.getFetchTask().fetch(res); +Assert.assertEquals(res, Arrays.asList("4", "6")); + +conf.setIntVar(HiveConf.ConfVars.COMPACTOR_MAX_NUM_DELTA, 5); +runMajorCompaction(dbName, tblName); + +List matchesNotFound = new ArrayList<>(5); +matchesNotFound.add(AcidUtils.deleteDeltaSubdir(3, 4) + VISIBILITY_PATTERN); +matchesNotFound.add(AcidUtils.deltaSubdir(3, 4) + VISIBILITY_PATTERN); 
+matchesNotFound.add(AcidUtils.deleteDeltaSubdir(5, 5, 0)); +matchesNotFound.add(AcidUtils.deltaSubdir(5, 5, 1)); +matchesNotFound.add(AcidUtils.baseDir(5) + VISIBILITY_PATTERN); + +IMetaStoreClient msClient = new HiveMetaStoreClient(conf); +Table table = msClient.getTable(dbName, tblName); +msClient.close(); + +FileSystem fs = FileSystem.get(conf); +FileStatus[] stat = fs.listStatus(new Path(table.getSd().getLocation())); + +for (FileStatus f : stat) { + for (int j = 0; j < matchesNotFound.size(); j++) { +if (f.getPath().getName().matches(matchesNotFound.get(j))) { + matchesNotFound.remove(j); + break; +} + } +} +Assert.assertEquals("Matches Not Found: " + matchesNotFound.toArray(), 0, matchesNotFound.size()); Review comment: matchesNotFound.toArray() should be Arrays.toString(matchesNotFound.toArray()) to display contents ## File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorMR.java ## @@ -234,12 +234,24 @@ void run(HiveConf conf, String jobName, Table t, Partition p, StorageDescriptor "especially if this message repeats. Check that compaction is running properly. Check for any " + "runaway/mis-configured process writing to ACID tables, especially using Streaming Ingest API."); int numMinorCompactions = parsedDeltas.size() / maxDeltasToHandle; + parsedDeltas.sort(AcidUtils.ParsedDeltaLight::compareTo); + + int start = 0; + int end = maxDeltasToHandle; + for (int jobSubId = 0; jobSubId < numMinorCompactions; jobSubId++) { +while (parsedDeltas.get(end).getMinWriteId() == parsedDeltas.get(end - 1).getMinWriteId() && + parsedDeltas.get(end).getMaxWriteId() == parsedDeltas.get(end - 1).getMaxWriteId()) { Review comment: Sorry I don't get this part. What does it do? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 636928) Time Spent: 0.5h (was: 20m) > Incorrect deltas split for sub-compactions when using > `hive.compactor.max.num.delta` > > > Key: HIVE-25441 > URL: https://issues.apache.org/jira/browse/HIVE-25441 > Project: Hive > Issue Type: Task >Reporter: Denys Kuzmenko >Priority: Major >
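[Editor's illustration] The review suggestion above about wrapping `matchesNotFound.toArray()` in `Arrays.toString` is easy to demonstrate: concatenating an `Object[]` into a string only prints its array type and identity hash code, hiding the contents, while `Arrays.toString` renders the elements. A minimal standalone demo:

```java
import java.util.Arrays;
import java.util.List;

public class ToStringDemo {
    public static void main(String[] args) {
        List<String> missing = List.of("delta_5_5_1", "base_5");
        // Arrays inherit Object.toString(), so this prints something like
        // "[Ljava.lang.Object;@<hash>" rather than the elements:
        System.out.println("Matches Not Found: " + missing.toArray());
        // Arrays.toString renders the elements: "[delta_5_5_1, base_5]"
        System.out.println("Matches Not Found: " + Arrays.toString(missing.toArray()));
    }
}
```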
[jira] [Work logged] (HIVE-25429) Delta metrics collection may cause number of tez counters to exceed tez.counters.max limit
[ https://issues.apache.org/jira/browse/HIVE-25429?focusedWorklogId=636917&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-636917 ] ASF GitHub Bot logged work on HIVE-25429: - Author: ASF GitHub Bot Created on: 11/Aug/21 14:19 Start Date: 11/Aug/21 14:19 Worklog Time Spent: 10m Work Description: klcopp commented on a change in pull request #2563: URL: https://github.com/apache/hive/pull/2563#discussion_r686876814 ## File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/metrics/DeltaFilesMetricReporter.java ## @@ -136,67 +133,77 @@ public static synchronized void init(HiveConf conf) throws Exception { private void configure(HiveConf conf) throws Exception { acidMetricsExtEnabled = MetastoreConf.getBoolVar(conf, MetastoreConf.ConfVars.METASTORE_ACIDMETRICS_EXT_ON); +if (acidMetricsExtEnabled) { -deltasThreshold = HiveConf.getIntVar(conf, HiveConf.ConfVars.HIVE_TXN_ACID_METRICS_DELTA_NUM_THRESHOLD); -obsoleteDeltasThreshold = HiveConf.getIntVar(conf, HiveConf.ConfVars.HIVE_TXN_ACID_METRICS_OBSOLETE_DELTA_NUM_THRESHOLD); - -initCachesForMetrics(conf); -initObjectsForMetrics(); + initCachesForMetrics(conf); + initObjectsForMetrics(); -long reportingInterval = HiveConf.getTimeVar(conf, - HiveConf.ConfVars.HIVE_TXN_ACID_METRICS_REPORTING_INTERVAL, TimeUnit.SECONDS); + long reportingInterval = + HiveConf.getTimeVar(conf, HiveConf.ConfVars.HIVE_TXN_ACID_METRICS_REPORTING_INTERVAL, TimeUnit.SECONDS); -ThreadFactory threadFactory = - new ThreadFactoryBuilder() -.setDaemon(true) -.setNameFormat("DeltaFilesMetricReporter %d") -.build(); -executorService = Executors.newSingleThreadScheduledExecutor(threadFactory); -executorService.scheduleAtFixedRate( - new ReportingTask(), 0, reportingInterval, TimeUnit.SECONDS); + ThreadFactory threadFactory = + new ThreadFactoryBuilder().setDaemon(true).setNameFormat("DeltaFilesMetricReporter %d").build(); + executorService = Executors.newSingleThreadScheduledExecutor(threadFactory); + 
executorService.scheduleAtFixedRate(new ReportingTask(), 0, reportingInterval, TimeUnit.SECONDS); -LOG.info("Started DeltaFilesMetricReporter thread"); + LOG.info("Started DeltaFilesMetricReporter thread"); +} } public void submit(TezCounters counters) { if (acidMetricsExtEnabled) { - updateMetrics(NUM_OBSOLETE_DELTAS, -obsoleteDeltaCache, obsoleteDeltaTopN, obsoleteDeltasThreshold, -counters); - updateMetrics(NUM_DELTAS, -deltaCache, deltaTopN, deltasThreshold, -counters); - updateMetrics(NUM_SMALL_DELTAS, -smallDeltaCache, smallDeltaTopN, deltasThreshold, -counters); + updateMetrics(NUM_OBSOLETE_DELTAS, obsoleteDeltaCache, obsoleteDeltaTopN, counters); + updateMetrics(NUM_DELTAS, deltaCache, deltaTopN, counters); + updateMetrics(NUM_SMALL_DELTAS, smallDeltaCache, smallDeltaTopN, counters); } } + /** + * Copy counters to caches. + */ private void updateMetrics(DeltaFilesMetricType metric, Cache<String, Integer> cache, Queue<Pair<String, Integer>> topN, -int threshold, TezCounters counters) { -counters.getGroup(metric.value).forEach(counter -> { - Integer prev = cache.getIfPresent(counter.getName()); - if (prev != null && prev != counter.getValue()) { -cache.invalidate(counter.getName()); + TezCounters counters) { +try { + CounterGroup group = counters.getGroup(metric.value); + // if the group is empty, clear the cache + if (group.size() == 0) { +cache.invalidateAll(); + } else { +// if there is no counter corresponding to a cache entry, remove from cache +ConcurrentMap<String, Integer> cacheMap = cache.asMap(); +cacheMap.keySet().stream().filter(key -> counters.findCounter(group.getName(), key).getValue() == 0) +.forEach(cache::invalidate); } - if (counter.getValue() > threshold) { -if (topN.size() == maxCacheSize) { - Pair<String, Integer> lowest = topN.peek(); - if (lowest != null && counter.getValue() > lowest.getValue()) { -cache.invalidate(lowest.getKey()); - } -} -if (topN.size() < maxCacheSize) { - topN.add(Pair.of(counter.getName(), (int) counter.getValue())); - cache.put(counter.getName(), (int) counter.getValue()); + // 
update existing cache entries or add new entries + for (TezCounter counter : group) { +Integer prev = cache.getIfPresent(counter.getName()); +if (prev != null && prev != counter.getValue()) { + cache.invalidate(counter.getName()); } +topN.add(Pair.of(counter.getName(), (int) counter.getValue())); Review comment: mergeDeltaFilesStats filters for whether the partition
[jira] [Work started] (HIVE-25443) Arrow SerDe Cannot serialize/deserialize complex data types When there are more than 1024 values
[ https://issues.apache.org/jira/browse/HIVE-25443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HIVE-25443 started by Syed Shameerur Rahman. > Arrow SerDe Cannot serialize/deserialize complex data types When there are > more than 1024 values > > > Key: HIVE-25443 > URL: https://issues.apache.org/jira/browse/HIVE-25443 > Project: Hive > Issue Type: Bug > Components: Serializers/Deserializers >Affects Versions: 3.1.0, 3.0.0, 3.1.1, 3.1.2 >Reporter: Syed Shameerur Rahman >Assignee: Syed Shameerur Rahman >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 10m > Remaining Estimate: 0h > > Complex data types like MAP, STRUCT cannot be serialized/deserialized using > Arrow SerDe when there are more than 1024 values. This happens due to > ColumnVector always being initialized with a size of 1024. > Issue #1 : > https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/arrow/ArrowColumnarBatchSerDe.java#L213 > Issue #2 : > https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/arrow/ArrowColumnarBatchSerDe.java#L215 > Sample unit test to reproduce the case in TestArrowColumnarBatchSerDe : > {code:java} > @Test >public void testListBooleanWithMoreThan1024Values() throws SerDeException { > String[][] schema = { > {"boolean_list", "array<boolean>"}, > }; > > Object[][] rows = new Object[1025][1]; > for (int i = 0; i < 1025; i++) { >rows[i][0] = new BooleanWritable(true); > } > > initAndSerializeAndDeserialize(schema, toList(rows)); >} > > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
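[Editor's illustration] The failure mode described above (a vector sized to a hard-coded 1024-entry default overflowing once a 1025th value arrives) can be modeled with a small standalone sketch. The class and method names below are illustrative and deliberately not Hive's ColumnVector API; they only mirror the shape of the bug and of the proposed fix, which sizes the vector to the actual record count.

```java
public class FixedSizeVectorSketch {
    static final int DEFAULT_SIZE = 1024; // mirrors the hard-coded default

    // Copies values into a backing array. With sizeToInput=false the array is
    // always DEFAULT_SIZE long, so writing the 1025th value throws
    // ArrayIndexOutOfBoundsException; sizing to the input avoids the overflow.
    static boolean[] fill(boolean[] values, boolean sizeToInput) {
        boolean[] vector = new boolean[sizeToInput ? values.length : DEFAULT_SIZE];
        for (int i = 0; i < values.length; i++) {
            vector[i] = values[i];
        }
        return vector;
    }
}
```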
[jira] [Commented] (HIVE-25443) Arrow SerDe Cannot serialize/deserialize complex data types When there are more than 1024 values
[ https://issues.apache.org/jira/browse/HIVE-25443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17397249#comment-17397249 ] Syed Shameerur Rahman commented on HIVE-25443: -- [~kgyrtkirk] [~pgaref] Could you please review the pull request ? Thanks > Arrow SerDe Cannot serialize/deserialize complex data types When there are > more than 1024 values > > > Key: HIVE-25443 > URL: https://issues.apache.org/jira/browse/HIVE-25443 > Project: Hive > Issue Type: Bug > Components: Serializers/Deserializers >Affects Versions: 3.1.0, 3.0.0, 3.1.1, 3.1.2 >Reporter: Syed Shameerur Rahman >Assignee: Syed Shameerur Rahman >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 10m > Remaining Estimate: 0h > > Complex data types like MAP, STRUCT cannot be serialized/deserialzed using > Arrow SerDe when there are more than 1024 values. This happens due to > ColumnVector always being initialized with a size of 1024. > Issue #1 : > https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/arrow/ArrowColumnarBatchSerDe.java#L213 > Issue #2 : > https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/arrow/ArrowColumnarBatchSerDe.java#L215 > Sample unit test to reproduce the case in TestArrowColumnarBatchSerDe : > {code:java} > @Test >public void testListBooleanWithMoreThan1024Values() throws SerDeException { > String[][] schema = { > {"boolean_list", "array"}, > }; > > Object[][] rows = new Object[1025][1]; > for (int i = 0; i < 1025; i++) { >rows[i][0] = new BooleanWritable(true); > } > > initAndSerializeAndDeserialize(schema, toList(rows)); >} > > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25443) Arrow SerDe Cannot serialize/deserialize complex data types When there are more than 1024 values
[ https://issues.apache.org/jira/browse/HIVE-25443?focusedWorklogId=636846&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-636846 ] ASF GitHub Bot logged work on HIVE-25443: - Author: ASF GitHub Bot Created on: 11/Aug/21 10:12 Start Date: 11/Aug/21 10:12 Worklog Time Spent: 10m Work Description: shameersss1 opened a new pull request #2581: URL: https://github.com/apache/hive/pull/2581 …pes When there are more than 1024 values ### What changes were proposed in this pull request? Instead of initializing the ColumnVector with the default size of 1024, initialize it with the number of records actually required. ### Why are the changes needed? Changes are needed to allow Arrow SerDe to serialize/deserialize complex data types when there are more than 1024 values ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Unit tests were added to confirm the behaviour -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 636846) Remaining Estimate: 0h Time Spent: 10m > Arrow SerDe Cannot serialize/deserialize complex data types When there are > more than 1024 values > > > Key: HIVE-25443 > URL: https://issues.apache.org/jira/browse/HIVE-25443 > Project: Hive > Issue Type: Bug > Components: Serializers/Deserializers >Affects Versions: 3.1.0, 3.0.0, 3.1.1, 3.1.2 >Reporter: Syed Shameerur Rahman >Assignee: Syed Shameerur Rahman >Priority: Major > Fix For: 4.0.0 > > Time Spent: 10m > Remaining Estimate: 0h > > Complex data types like MAP, STRUCT cannot be serialized/deserialized using > Arrow SerDe when there are more than 1024 values. This happens due to > ColumnVector always being initialized with a size of 1024. 
> Issue #1 : > https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/arrow/ArrowColumnarBatchSerDe.java#L213 > Issue #2 : > https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/arrow/ArrowColumnarBatchSerDe.java#L215 > Sample unit test to reproduce the case in TestArrowColumnarBatchSerDe : > {code:java} > @Test >public void testListBooleanWithMoreThan1024Values() throws SerDeException { > String[][] schema = { > {"boolean_list", "array"}, > }; > > Object[][] rows = new Object[1025][1]; > for (int i = 0; i < 1025; i++) { >rows[i][0] = new BooleanWritable(true); > } > > initAndSerializeAndDeserialize(schema, toList(rows)); >} > > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
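The proposed fix above comes down to sizing the ColumnVector from the incoming batch rather than a fixed 1024-slot default. A minimal standalone sketch of that sizing rule (illustrative only, not Hive source; the class and method names are invented for this example):

```java
// Illustrative sketch, not Hive code: size the vector from the batch.
final class VectorSizing {
    static final int DEFAULT_SIZE = 1024; // mirrors the fixed default behind the bug

    // Capacity to allocate for a batch of rowCount values: never less than
    // the default, but grown to fit batches past 1024 rows.
    static int requiredCapacity(int rowCount) {
        return Math.max(DEFAULT_SIZE, rowCount);
    }

    public static void main(String[] args) {
        System.out.println(requiredCapacity(1025)); // large batches no longer overflow
        System.out.println(requiredCapacity(10));   // small batches keep the default
    }
}
```

Under this rule the 1025-value unit test above gets 1025 slots instead of overflowing a 1024-slot vector.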
[jira] [Updated] (HIVE-25443) Arrow SerDe Cannot serialize/deserialize complex data types When there are more than 1024 values
[ https://issues.apache.org/jira/browse/HIVE-25443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-25443: -- Labels: pull-request-available (was: ) > Arrow SerDe Cannot serialize/deserialize complex data types When there are > more than 1024 values > > > Key: HIVE-25443 > URL: https://issues.apache.org/jira/browse/HIVE-25443 > Project: Hive > Issue Type: Bug > Components: Serializers/Deserializers >Affects Versions: 3.1.0, 3.0.0, 3.1.1, 3.1.2 >Reporter: Syed Shameerur Rahman >Assignee: Syed Shameerur Rahman >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 10m > Remaining Estimate: 0h > > Complex data types like MAP, STRUCT cannot be serialized/deserialzed using > Arrow SerDe when there are more than 1024 values. This happens due to > ColumnVector always being initialized with a size of 1024. > Issue #1 : > https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/arrow/ArrowColumnarBatchSerDe.java#L213 > Issue #2 : > https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/arrow/ArrowColumnarBatchSerDe.java#L215 > Sample unit test to reproduce the case in TestArrowColumnarBatchSerDe : > {code:java} > @Test >public void testListBooleanWithMoreThan1024Values() throws SerDeException { > String[][] schema = { > {"boolean_list", "array"}, > }; > > Object[][] rows = new Object[1025][1]; > for (int i = 0; i < 1025; i++) { >rows[i][0] = new BooleanWritable(true); > } > > initAndSerializeAndDeserialize(schema, toList(rows)); >} > > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25429) Delta metrics collection may cause number of tez counters to exceed tez.counters.max limit
[ https://issues.apache.org/jira/browse/HIVE-25429?focusedWorklogId=636845=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-636845 ] ASF GitHub Bot logged work on HIVE-25429: - Author: ASF GitHub Bot Created on: 11/Aug/21 10:12 Start Date: 11/Aug/21 10:12 Worklog Time Spent: 10m Work Description: klcopp commented on a change in pull request #2563: URL: https://github.com/apache/hive/pull/2563#discussion_r686691938 ## File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/metrics/DeltaFilesMetricReporter.java ## @@ -136,67 +133,77 @@ public static synchronized void init(HiveConf conf) throws Exception { private void configure(HiveConf conf) throws Exception { acidMetricsExtEnabled = MetastoreConf.getBoolVar(conf, MetastoreConf.ConfVars.METASTORE_ACIDMETRICS_EXT_ON); +if (acidMetricsExtEnabled) { -deltasThreshold = HiveConf.getIntVar(conf, HiveConf.ConfVars.HIVE_TXN_ACID_METRICS_DELTA_NUM_THRESHOLD); -obsoleteDeltasThreshold = HiveConf.getIntVar(conf, HiveConf.ConfVars.HIVE_TXN_ACID_METRICS_OBSOLETE_DELTA_NUM_THRESHOLD); - -initCachesForMetrics(conf); -initObjectsForMetrics(); + initCachesForMetrics(conf); + initObjectsForMetrics(); -long reportingInterval = HiveConf.getTimeVar(conf, - HiveConf.ConfVars.HIVE_TXN_ACID_METRICS_REPORTING_INTERVAL, TimeUnit.SECONDS); + long reportingInterval = + HiveConf.getTimeVar(conf, HiveConf.ConfVars.HIVE_TXN_ACID_METRICS_REPORTING_INTERVAL, TimeUnit.SECONDS); -ThreadFactory threadFactory = - new ThreadFactoryBuilder() -.setDaemon(true) -.setNameFormat("DeltaFilesMetricReporter %d") -.build(); -executorService = Executors.newSingleThreadScheduledExecutor(threadFactory); -executorService.scheduleAtFixedRate( - new ReportingTask(), 0, reportingInterval, TimeUnit.SECONDS); + ThreadFactory threadFactory = + new ThreadFactoryBuilder().setDaemon(true).setNameFormat("DeltaFilesMetricReporter %d").build(); + executorService = Executors.newSingleThreadScheduledExecutor(threadFactory); + 
executorService.scheduleAtFixedRate(new ReportingTask(), 0, reportingInterval, TimeUnit.SECONDS); -LOG.info("Started DeltaFilesMetricReporter thread"); + LOG.info("Started DeltaFilesMetricReporter thread"); +} } public void submit(TezCounters counters) { if (acidMetricsExtEnabled) { - updateMetrics(NUM_OBSOLETE_DELTAS, -obsoleteDeltaCache, obsoleteDeltaTopN, obsoleteDeltasThreshold, -counters); - updateMetrics(NUM_DELTAS, -deltaCache, deltaTopN, deltasThreshold, -counters); - updateMetrics(NUM_SMALL_DELTAS, -smallDeltaCache, smallDeltaTopN, deltasThreshold, -counters); + updateMetrics(NUM_OBSOLETE_DELTAS, obsoleteDeltaCache, obsoleteDeltaTopN, counters); + updateMetrics(NUM_DELTAS, deltaCache, deltaTopN, counters); + updateMetrics(NUM_SMALL_DELTAS, smallDeltaCache, smallDeltaTopN, counters); } } + /** + * Copy counters to caches. + */ private void updateMetrics(DeltaFilesMetricType metric, Cache cache, Queue> topN, -int threshold, TezCounters counters) { -counters.getGroup(metric.value).forEach(counter -> { - Integer prev = cache.getIfPresent(counter.getName()); - if (prev != null && prev != counter.getValue()) { -cache.invalidate(counter.getName()); + TezCounters counters) { +try { + CounterGroup group = counters.getGroup(metric.value); + // if the group is empty, clear the cache + if (group.size() == 0) { +cache.invalidateAll(); + } else { +// if there is no counter corresponding to a cache entry, remove from cache +ConcurrentMap cacheMap = cache.asMap(); Review comment: As discussed offline, we will also collect input ReadEntities and update metrics based on those -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 636845) Time Spent: 40m (was: 0.5h) > Delta metrics collection may cause number of tez counters to exceed > tez.counters.max limit > -- > > Key: HIVE-25429 > URL: https://issues.apache.org/jira/browse/HIVE-25429 > Project: Hive > Issue Type: Sub-task > Components: Hive >Affects Versions: 4.0.0 >Reporter: Karen Coppage >Assignee: Karen Coppage >
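The review thread above reconciles the Guava-style cache of per-partition delta counts with the Tez counters of the latest query: an empty counter group clears the cache, entries with no matching counter are dropped, and changed values are refreshed. A minimal sketch of that reconciliation, with a plain Map standing in for the Guava cache (illustrative only, not the Hive implementation):

```java
import java.util.Map;

// Illustrative sketch, not Hive code: reconcile cached counts with the
// counter values reported by the latest query.
final class CounterSync {
    static void reconcile(Map<String, Integer> cache, Map<String, Integer> counters) {
        if (counters.isEmpty()) {
            cache.clear();                           // no counters at all: drop everything
            return;
        }
        cache.keySet().retainAll(counters.keySet()); // drop entries with no matching counter
        cache.putAll(counters);                      // add new entries, overwrite changed ones
    }
}
```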
[jira] [Work logged] (HIVE-25346) cleanTxnToWriteIdTable breaks SNAPSHOT isolation
[ https://issues.apache.org/jira/browse/HIVE-25346?focusedWorklogId=636842=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-636842 ] ASF GitHub Bot logged work on HIVE-25346: - Author: ASF GitHub Bot Created on: 11/Aug/21 10:08 Start Date: 11/Aug/21 10:08 Worklog Time Spent: 10m Work Description: deniskuzZ commented on pull request #2547: URL: https://github.com/apache/hive/pull/2547#issuecomment-896692730 > > > > Would be great if you could run the HMS benchmark and see if that has affected the commit step performance. See if the index on opType improves the situation. > > > > > > > > > There is no benchmark currently for the commitTxn() call. > > > > > > Could we create one or at least perform similar to https://issues.apache.org/jira/browse/HIVE-23104?focusedCommentId=17083005=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17083005 test? > > Based on the linked comment it is not clear to me what exactly was tested and how or what tools were used for it or what environment the tests were run in. I found no mention of the above details in the jira comments. You are not limited in a set of tools or approaches, free to choose any. This patch changes the way how INSERTs are handled at the commit step, so it would make sense to test the throughput of INSERT/UPDATE operations in multithreaded env. AFAIK in the above JIRA JMH tool was used and testing was done on a local env. We cannot merge this PR until we know the performance effect of the proposed change. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 636842) Time Spent: 4.5h (was: 4h 20m) > cleanTxnToWriteIdTable breaks SNAPSHOT isolation > > > Key: HIVE-25346 > URL: https://issues.apache.org/jira/browse/HIVE-25346 > Project: Hive > Issue Type: Bug >Reporter: Zoltan Chovan >Assignee: Zoltan Chovan >Priority: Major > Labels: pull-request-available > Time Spent: 4.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25438) Update partition column stats fails with invalid syntax error for MySql
[ https://issues.apache.org/jira/browse/HIVE-25438?focusedWorklogId=636837&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-636837 ] ASF GitHub Bot logged work on HIVE-25438: - Author: ASF GitHub Bot Created on: 11/Aug/21 09:53 Start Date: 11/Aug/21 09:53 Worklog Time Spent: 10m Work Description: maheshk114 commented on a change in pull request #2573: URL: https://github.com/apache/hive/pull/2573#discussion_r686678482 ## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/DirectSqlUpdateStat.java ## @@ -647,6 +653,12 @@ public long getNextCSIdForMPartitionColumnStatistics(long numStats) throws MetaE jdoConn = pm.getDataStoreConnection(); dbConn = (Connection) (jdoConn.getNativeConnection()); + if (sqlGenerator.getDbProduct().isMYSQL()) { Review comment: That will be costlier than setting the value in the DB, as we don't know how many SQL statements are getting executed inside a connection. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 636837) Time Spent: 50m (was: 40m) > Update partition column stats fails with invalid syntax error for MySql > --- > > Key: HIVE-25438 > URL: https://issues.apache.org/jira/browse/HIVE-25438 > Project: Hive > Issue Type: Sub-task >Reporter: mahesh kumar behera >Assignee: mahesh kumar behera >Priority: Major > Labels: pull-request-available > Time Spent: 50m > Remaining Estimate: 0h > > The quotes are not supported by mysql if ANSI_QUOTES is not set. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25438) Update partition column stats fails with invalid syntax error for MySql
[ https://issues.apache.org/jira/browse/HIVE-25438?focusedWorklogId=636835=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-636835 ] ASF GitHub Bot logged work on HIVE-25438: - Author: ASF GitHub Bot Created on: 11/Aug/21 09:52 Start Date: 11/Aug/21 09:52 Worklog Time Spent: 10m Work Description: maheshk114 commented on a change in pull request #2573: URL: https://github.com/apache/hive/pull/2573#discussion_r686677911 ## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/DirectSqlUpdateStat.java ## @@ -647,6 +653,12 @@ public long getNextCSIdForMPartitionColumnStatistics(long numStats) throws MetaE jdoConn = pm.getDataStoreConnection(); dbConn = (Connection) (jdoConn.getNativeConnection()); + if (sqlGenerator.getDbProduct().isMYSQL()) { +try (Statement stmt = dbConn.createStatement()) { + stmt.execute("SET @@session.sql_mode=ANSI_QUOTES"); +} + } + Review comment: done -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 636835) Time Spent: 40m (was: 0.5h) > Update partition column stats fails with invalid syntax error for MySql > --- > > Key: HIVE-25438 > URL: https://issues.apache.org/jira/browse/HIVE-25438 > Project: Hive > Issue Type: Sub-task >Reporter: mahesh kumar behera >Assignee: mahesh kumar behera >Priority: Major > Labels: pull-request-available > Time Spent: 40m > Remaining Estimate: 0h > > The quotes are not supported by mysql if ANSI_QUOTES is not set. -- This message was sent by Atlassian Jira (v8.3.4#803005)
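For context on the ANSI_QUOTES discussion above: the direct-SQL statements quote identifiers with double quotes, which stock MySQL parses as string literals unless the session sets sql_mode=ANSI_QUOTES, hence the per-session `SET @@session.sql_mode=ANSI_QUOTES` in the patch. A standalone sketch of the two quoting dialects (illustrative only; the helper is invented for this example):

```java
// Illustrative sketch: identifier quoting per dialect. With ANSI_QUOTES (or
// any ANSI-compliant database) "name" is an identifier; default MySQL treats
// "name" as a string literal and requires `name` instead.
final class IdentifierQuoting {
    static String quote(String identifier, boolean ansiQuotes) {
        return ansiQuotes ? "\"" + identifier + "\"" : "`" + identifier + "`";
    }

    public static void main(String[] args) {
        System.out.println(quote("PART_ID", true));  // "PART_ID"
        System.out.println(quote("PART_ID", false)); // `PART_ID`
    }
}
```

Setting the mode once per session, as the patch does, avoids rewriting every generated statement for the MySQL dialect.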
[jira] [Resolved] (HIVE-25375) Partition column rename support for Iceberg tables
[ https://issues.apache.org/jira/browse/HIVE-25375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ádám Szita resolved HIVE-25375. --- Fix Version/s: 4.0.0 Resolution: Fixed Committed to master, thanks for reviewing it [~kuczoram]! > Partition column rename support for Iceberg tables > -- > > Key: HIVE-25375 > URL: https://issues.apache.org/jira/browse/HIVE-25375 > Project: Hive > Issue Type: Bug >Reporter: Ádám Szita >Assignee: Ádám Szita >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 20m > Remaining Estimate: 0h > > Currently ALTER TABLE CHANGE COLUMN statement only updates the Iceberg-backed > table's schema, but not its partition spec. Updating the spec is required to > allow subsequent partition spec changes (like set partition spec calls) to > succeed. > Note: to do this in HiveIcebergMetaHook class we can't just create an > updateSchema and an updatePartitionSpec object, do the modifications and > commit both of them, as the last commit will fail due to invalid base. We > might want to do this in two separate steps instead. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25375) Partition column rename support for Iceberg tables
[ https://issues.apache.org/jira/browse/HIVE-25375?focusedWorklogId=636832=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-636832 ] ASF GitHub Bot logged work on HIVE-25375: - Author: ASF GitHub Bot Created on: 11/Aug/21 09:42 Start Date: 11/Aug/21 09:42 Worklog Time Spent: 10m Work Description: szlta merged pull request #2577: URL: https://github.com/apache/hive/pull/2577 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 636832) Time Spent: 20m (was: 10m) > Partition column rename support for Iceberg tables > -- > > Key: HIVE-25375 > URL: https://issues.apache.org/jira/browse/HIVE-25375 > Project: Hive > Issue Type: Bug >Reporter: Ádám Szita >Assignee: Ádám Szita >Priority: Major > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > Currently ALTER TABLE CHANGE COLUMN statement only updates the Iceberg-backed > table's schema, but not it's partition spec. Updating the spec is required to > allow subsequent partition spec changes (like set partition spec calls) to > succeed. > Note: to do this in HiveIcebergMetaHook class we can't just create an > updateSchema and an updatePartitionSpec object, do the modifications and > commit both of them, as the last commit will fail due to invalid base. We > might want to do this in two separate steps instead. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HIVE-25443) Arrow SerDe Cannot serialize/deserialize complex data types When there are more than 1024 values
[ https://issues.apache.org/jira/browse/HIVE-25443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Syed Shameerur Rahman reassigned HIVE-25443: > Arrow SerDe Cannot serialize/deserialize complex data types When there are > more than 1024 values > > > Key: HIVE-25443 > URL: https://issues.apache.org/jira/browse/HIVE-25443 > Project: Hive > Issue Type: Bug > Components: Serializers/Deserializers >Affects Versions: 3.1.2, 3.1.1, 3.0.0, 3.1.0 >Reporter: Syed Shameerur Rahman >Assignee: Syed Shameerur Rahman >Priority: Major > Fix For: 4.0.0 > > > Complex data types like MAP, STRUCT cannot be serialized/deserialzed using > Arrow SerDe when there are more than 1024 values. This happens due to > ColumnVector always being initialized with a size of 1024. > Issue #1 : > https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/arrow/ArrowColumnarBatchSerDe.java#L213 > Issue #2 : > https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/arrow/ArrowColumnarBatchSerDe.java#L215 > Sample unit test to reproduce the case in TestArrowColumnarBatchSerDe : > {code:java} > @Test >public void testListBooleanWithMoreThan1024Values() throws SerDeException { > String[][] schema = { > {"boolean_list", "array"}, > }; > > Object[][] rows = new Object[1025][1]; > for (int i = 0; i < 1025; i++) { >rows[i][0] = new BooleanWritable(true); > } > > initAndSerializeAndDeserialize(schema, toList(rows)); >} > > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-23688) Vectorization: IndexArrayOutOfBoundsException For map type column which includes null value
[ https://issues.apache.org/jira/browse/HIVE-23688?focusedWorklogId=636829=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-636829 ] ASF GitHub Bot logged work on HIVE-23688: - Author: ASF GitHub Bot Created on: 11/Aug/21 09:21 Start Date: 11/Aug/21 09:21 Worklog Time Spent: 10m Work Description: maheshk114 commented on a change in pull request #2479: URL: https://github.com/apache/hive/pull/2479#discussion_r686623069 ## File path: ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/VectorizedListColumnReader.java ## @@ -129,21 +131,16 @@ private boolean fetchNextValue(PrimitiveObjectInspector.PrimitiveCategory catego private void addElement(ListColumnVector lcv, List elements, PrimitiveObjectInspector.PrimitiveCategory category, int index) throws IOException { lcv.offsets[index] = elements.size(); -// Return directly if last value is null -if (definitionLevel < maxDefLevel) { - lcv.isNull[index] = true; - lcv.lengths[index] = 0; - // fetch the data from parquet data page for next call - fetchNextValue(category); - return; -} - do { // add all data for an element in ListColumnVector, get out the loop if there is no data or the data is for new element + if (definitionLevel < maxDefLevel) { +lcv.lengths[index] = 0; +lcv.isNull[index] = true; +lcv.noNulls = false; + } elements.add(lastValue); } while (fetchNextValue(category) && (repetitionLevel != 0)); -lcv.isNull[index] = false; lcv.lengths[index] = elements.size() - lcv.offsets[index]; Review comment: lcv.lengths[index] is over written ..in the loop for some condition its set to 0 ## File path: ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/VectorizedListColumnReader.java ## @@ -129,21 +131,16 @@ private boolean fetchNextValue(PrimitiveObjectInspector.PrimitiveCategory catego private void addElement(ListColumnVector lcv, List elements, PrimitiveObjectInspector.PrimitiveCategory category, int index) throws IOException { lcv.offsets[index] = elements.size(); -// Return directly if 
last value is null -if (definitionLevel < maxDefLevel) { - lcv.isNull[index] = true; - lcv.lengths[index] = 0; - // fetch the data from parquet data page for next call - fetchNextValue(category); - return; -} - do { // add all data for an element in ListColumnVector, get out the loop if there is no data or the data is for new element + if (definitionLevel < maxDefLevel) { +lcv.lengths[index] = 0; +lcv.isNull[index] = true; Review comment: why this has to be done in a loop ? ## File path: ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/VectorizedListColumnReader.java ## @@ -193,59 +190,59 @@ private List decodeDictionaryIds(PrimitiveObjectInspector.PrimitiveCategory cate case SHORT: resultList = new ArrayList(total); for (int i = 0; i < total; ++i) { -resultList.add(dictionary.readInteger(intList.get(i))); +resultList.add(intList.get(i) == null ? null : dictionary.readInteger(intList.get(i))); } break; case DATE: case INTERVAL_YEAR_MONTH: case LONG: resultList = new ArrayList(total); for (int i = 0; i < total; ++i) { -resultList.add(dictionary.readLong(intList.get(i))); +resultList.add(intList.get(i) == null ? null : dictionary.readLong(intList.get(i))); } break; case BOOLEAN: resultList = new ArrayList(total); for (int i = 0; i < total; ++i) { -resultList.add(dictionary.readBoolean(intList.get(i)) ? 1 : 0); +resultList.add(intList.get(i) == null ? 
null : dictionary.readBoolean(intList.get(i))); Review comment: instead of 0 or 1 ..value returned by readBoolean is used ## File path: ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/VectorizedListColumnReader.java ## @@ -129,21 +131,16 @@ private boolean fetchNextValue(PrimitiveObjectInspector.PrimitiveCategory catego private void addElement(ListColumnVector lcv, List elements, PrimitiveObjectInspector.PrimitiveCategory category, int index) throws IOException { lcv.offsets[index] = elements.size(); -// Return directly if last value is null -if (definitionLevel < maxDefLevel) { - lcv.isNull[index] = true; - lcv.lengths[index] = 0; - // fetch the data from parquet data page for next call - fetchNextValue(category); - return; -} - do { // add all data for an element in ListColumnVector, get out the loop if there is no data or the data is for new element + if (definitionLevel < maxDefLevel) { Review comment: in fetchNextvalue ..if (definitionLevel != maxDefLevel) {
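The review hunks above change the dictionary decode to tolerate null ids (null list elements) instead of dereferencing them unconditionally. The pattern, as a standalone sketch (illustrative only, not the Hive reader itself):

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch: null-aware dictionary decode. A null id denotes a
// null element and must bypass the dictionary lookup entirely.
final class NullAwareDecode {
    static List<Long> decode(List<Integer> ids, long[] dictionary) {
        List<Long> result = new ArrayList<>(ids.size());
        for (Integer id : ids) {
            // Pass nulls through; only non-null ids index the dictionary.
            result.add(id == null ? null : dictionary[id]);
        }
        return result;
    }
}
```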
[jira] [Updated] (HIVE-25334) Refactor UDF CAST( as TIMESTAMP)
[ https://issues.apache.org/jira/browse/HIVE-25334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sankar Hariappan updated HIVE-25334: Component/s: (was: HiveServer2) UDF > Refactor UDF CAST( as TIMESTAMP) > - > > Key: HIVE-25334 > URL: https://issues.apache.org/jira/browse/HIVE-25334 > Project: Hive > Issue Type: Sub-task > Components: UDF >Reporter: Ashish Sharma >Assignee: Ashish Sharma >Priority: Major > Labels: UDF, pull-request-available > Fix For: 4.0.0 > > Time Spent: 2h > Remaining Estimate: 0h > > Description > Refactor GenericUDFTimestamp.class > DOD > Refactor -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-25334) Refactor UDF CAST( as TIMESTAMP)
[ https://issues.apache.org/jira/browse/HIVE-25334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sankar Hariappan updated HIVE-25334: Labels: UDF pull-request-available (was: pull-request-available) > Refactor UDF CAST( as TIMESTAMP) > - > > Key: HIVE-25334 > URL: https://issues.apache.org/jira/browse/HIVE-25334 > Project: Hive > Issue Type: Sub-task >Reporter: Ashish Sharma >Assignee: Ashish Sharma >Priority: Major > Labels: UDF, pull-request-available > Fix For: 4.0.0 > > Time Spent: 2h > Remaining Estimate: 0h > > Description > Refactor GenericUDFTimestamp.class > DOD > Refactor -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HIVE-25334) Refactor UDF CAST( as TIMESTAMP)
[ https://issues.apache.org/jira/browse/HIVE-25334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sankar Hariappan resolved HIVE-25334. - Fix Version/s: 4.0.0 Resolution: Fixed PR merged to master. Thanks [~ashish-kumar-sharma] for the contribution and [~adeshrao] for the review! > Refactor UDF CAST( as TIMESTAMP) > - > > Key: HIVE-25334 > URL: https://issues.apache.org/jira/browse/HIVE-25334 > Project: Hive > Issue Type: Sub-task >Reporter: Ashish Sharma >Assignee: Ashish Sharma >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 2h > Remaining Estimate: 0h > > Description > Refactor GenericUDFTimestamp.class > DOD > Refactor -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-25334) Refactor UDF CAST( as TIMESTAMP)
[ https://issues.apache.org/jira/browse/HIVE-25334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sankar Hariappan updated HIVE-25334: Component/s: HiveServer2 > Refactor UDF CAST( as TIMESTAMP) > - > > Key: HIVE-25334 > URL: https://issues.apache.org/jira/browse/HIVE-25334 > Project: Hive > Issue Type: Sub-task > Components: HiveServer2 >Reporter: Ashish Sharma >Assignee: Ashish Sharma >Priority: Major > Labels: UDF, pull-request-available > Fix For: 4.0.0 > > Time Spent: 2h > Remaining Estimate: 0h > > Description > Refactor GenericUDFTimestamp.class > DOD > Refactor -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25334) Refactor UDF CAST( as TIMESTAMP)
[ https://issues.apache.org/jira/browse/HIVE-25334?focusedWorklogId=636798=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-636798 ] ASF GitHub Bot logged work on HIVE-25334: - Author: ASF GitHub Bot Created on: 11/Aug/21 08:02 Start Date: 11/Aug/21 08:02 Worklog Time Spent: 10m Work Description: sankarh merged pull request #2482: URL: https://github.com/apache/hive/pull/2482 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 636798) Time Spent: 2h (was: 1h 50m) > Refactor UDF CAST( as TIMESTAMP) > - > > Key: HIVE-25334 > URL: https://issues.apache.org/jira/browse/HIVE-25334 > Project: Hive > Issue Type: Sub-task >Reporter: Ashish Sharma >Assignee: Ashish Sharma >Priority: Major > Labels: pull-request-available > Time Spent: 2h > Remaining Estimate: 0h > > Description > Refactor GenericUDFTimestamp.class > DOD > Refactor -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25334) Refactor UDF CAST( as TIMESTAMP)
[ https://issues.apache.org/jira/browse/HIVE-25334?focusedWorklogId=636795=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-636795 ] ASF GitHub Bot logged work on HIVE-25334: - Author: ASF GitHub Bot Created on: 11/Aug/21 07:59 Start Date: 11/Aug/21 07:59 Worklog Time Spent: 10m Work Description: sankarh commented on a change in pull request #2482: URL: https://github.com/apache/hive/pull/2482#discussion_r686592413 ## File path: ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFTimestamp.java ## @@ -63,61 +71,44 @@ * otherwise, it's interpreted as timestamp in seconds. */ private boolean intToTimestampInSeconds = false; + private boolean strict = true; @Override public ObjectInspector initialize(ObjectInspector[] arguments) throws UDFArgumentException { -if (arguments.length < 1) { - throw new UDFArgumentLengthException( - "The function TIMESTAMP requires at least one argument, got " - + arguments.length); -} - -SessionState ss = SessionState.get(); -if (ss != null) { - intToTimestampInSeconds = ss.getConf().getBoolVar(ConfVars.HIVE_INT_TIMESTAMP_CONVERSION_IN_SECONDS); -} +checkArgsSize(arguments, 1, 1); +checkArgPrimitive(arguments, 0); +checkArgGroups(arguments, 0, tsInputTypes, STRING_GROUP, DATE_GROUP, NUMERIC_GROUP, VOID_GROUP, BOOLEAN_GROUP); -try { - argumentOI = (PrimitiveObjectInspector) arguments[0]; -} catch (ClassCastException e) { - throw new UDFArgumentException( - "The function TIMESTAMP takes only primitive types"); -} +strict = SessionState.get() != null ? SessionState.get().getConf() +.getBoolVar(ConfVars.HIVE_STRICT_TIMESTAMP_CONVERSION) : new HiveConf() +.getBoolVar(ConfVars.HIVE_STRICT_TIMESTAMP_CONVERSION); +intToTimestampInSeconds = SessionState.get() != null ? 
SessionState.get().getConf() +.getBoolVar(ConfVars.HIVE_INT_TIMESTAMP_CONVERSION_IN_SECONDS) : new HiveConf() +.getBoolVar(ConfVars.HIVE_INT_TIMESTAMP_CONVERSION_IN_SECONDS); -if (ss != null && ss.getConf().getBoolVar(ConfVars.HIVE_STRICT_TIMESTAMP_CONVERSION)) { - PrimitiveCategory category = argumentOI.getPrimitiveCategory(); - PrimitiveGrouping group = PrimitiveObjectInspectorUtils.getPrimitiveGrouping(category); - if (group == PrimitiveGrouping.NUMERIC_GROUP) { +if (strict) { + if (PrimitiveObjectInspectorUtils.getPrimitiveGrouping(tsInputTypes[0]) == PrimitiveGrouping.NUMERIC_GROUP) { Review comment: oh yes... my bad... It is disallowed only with strict=true. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 636795) Time Spent: 1h 50m (was: 1h 40m) > Refactor UDF CAST( as TIMESTAMP) > - > > Key: HIVE-25334 > URL: https://issues.apache.org/jira/browse/HIVE-25334 > Project: Hive > Issue Type: Sub-task >Reporter: Ashish Sharma >Assignee: Ashish Sharma >Priority: Major > Labels: pull-request-available > Time Spent: 1h 50m > Remaining Estimate: 0h > > Description > Refactor GenericUDFTimestamp.class > DOD > Refactor -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25334) Refactor UDF CAST( as TIMESTAMP)
[ https://issues.apache.org/jira/browse/HIVE-25334?focusedWorklogId=636792=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-636792 ] ASF GitHub Bot logged work on HIVE-25334: - Author: ASF GitHub Bot Created on: 11/Aug/21 07:56 Start Date: 11/Aug/21 07:56 Worklog Time Spent: 10m Work Description: ashish-kumar-sharma commented on a change in pull request #2482: URL: https://github.com/apache/hive/pull/2482#discussion_r686586962 ## File path: ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFTimestamp.java ## @@ -63,61 +71,44 @@ * otherwise, it's interpreted as timestamp in seconds. */ private boolean intToTimestampInSeconds = false; + private boolean strict = true; @Override public ObjectInspector initialize(ObjectInspector[] arguments) throws UDFArgumentException { -if (arguments.length < 1) { - throw new UDFArgumentLengthException( - "The function TIMESTAMP requires at least one argument, got " - + arguments.length); -} - -SessionState ss = SessionState.get(); -if (ss != null) { - intToTimestampInSeconds = ss.getConf().getBoolVar(ConfVars.HIVE_INT_TIMESTAMP_CONVERSION_IN_SECONDS); -} +checkArgsSize(arguments, 1, 1); +checkArgPrimitive(arguments, 0); +checkArgGroups(arguments, 0, tsInputTypes, STRING_GROUP, DATE_GROUP, NUMERIC_GROUP, VOID_GROUP, BOOLEAN_GROUP); -try { - argumentOI = (PrimitiveObjectInspector) arguments[0]; -} catch (ClassCastException e) { - throw new UDFArgumentException( - "The function TIMESTAMP takes only primitive types"); -} +strict = SessionState.get() != null ? SessionState.get().getConf() +.getBoolVar(ConfVars.HIVE_STRICT_TIMESTAMP_CONVERSION) : new HiveConf() +.getBoolVar(ConfVars.HIVE_STRICT_TIMESTAMP_CONVERSION); +intToTimestampInSeconds = SessionState.get() != null ? 
SessionState.get().getConf() +.getBoolVar(ConfVars.HIVE_INT_TIMESTAMP_CONVERSION_IN_SECONDS) : new HiveConf() +.getBoolVar(ConfVars.HIVE_INT_TIMESTAMP_CONVERSION_IN_SECONDS); -if (ss != null && ss.getConf().getBoolVar(ConfVars.HIVE_STRICT_TIMESTAMP_CONVERSION)) { - PrimitiveCategory category = argumentOI.getPrimitiveCategory(); - PrimitiveGrouping group = PrimitiveObjectInspectorUtils.getPrimitiveGrouping(category); - if (group == PrimitiveGrouping.NUMERIC_GROUP) { +if (strict) { + if (PrimitiveObjectInspectorUtils.getPrimitiveGrouping(tsInputTypes[0]) == PrimitiveGrouping.NUMERIC_GROUP) { Review comment: We do support timestamp to numeric conversion. checkout https://github.com/apache/hive/blob/7b3ecf617a6d46f48a3b6f77e0339fd4ad95a420/serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/primitive/PrimitiveObjectInspectorUtils.java#L1177. due to which we need to allow NUMERIC in checkArgGroups(). Also this behaviour is controlled by https://github.com/apache/hive/blob/master/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java#L1827. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 636792) Time Spent: 1h 40m (was: 1.5h) > Refactor UDF CAST( as TIMESTAMP) > - > > Key: HIVE-25334 > URL: https://issues.apache.org/jira/browse/HIVE-25334 > Project: Hive > Issue Type: Sub-task >Reporter: Ashish Sharma >Assignee: Ashish Sharma >Priority: Major > Labels: pull-request-available > Time Spent: 1h 40m > Remaining Estimate: 0h > > Description > Refactor GenericUDFTimestamp.class > DOD > Refactor -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25334) Refactor UDF CAST( as TIMESTAMP)
[ https://issues.apache.org/jira/browse/HIVE-25334?focusedWorklogId=636791=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-636791 ] ASF GitHub Bot logged work on HIVE-25334: - Author: ASF GitHub Bot Created on: 11/Aug/21 07:52 Start Date: 11/Aug/21 07:52 Worklog Time Spent: 10m Work Description: ashish-kumar-sharma commented on a change in pull request #2482: URL: https://github.com/apache/hive/pull/2482#discussion_r686586962 ## File path: ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFTimestamp.java ## @@ -63,61 +71,44 @@ * otherwise, it's interpreted as timestamp in seconds. */ private boolean intToTimestampInSeconds = false; + private boolean strict = true; @Override public ObjectInspector initialize(ObjectInspector[] arguments) throws UDFArgumentException { -if (arguments.length < 1) { - throw new UDFArgumentLengthException( - "The function TIMESTAMP requires at least one argument, got " - + arguments.length); -} - -SessionState ss = SessionState.get(); -if (ss != null) { - intToTimestampInSeconds = ss.getConf().getBoolVar(ConfVars.HIVE_INT_TIMESTAMP_CONVERSION_IN_SECONDS); -} +checkArgsSize(arguments, 1, 1); +checkArgPrimitive(arguments, 0); +checkArgGroups(arguments, 0, tsInputTypes, STRING_GROUP, DATE_GROUP, NUMERIC_GROUP, VOID_GROUP, BOOLEAN_GROUP); -try { - argumentOI = (PrimitiveObjectInspector) arguments[0]; -} catch (ClassCastException e) { - throw new UDFArgumentException( - "The function TIMESTAMP takes only primitive types"); -} +strict = SessionState.get() != null ? SessionState.get().getConf() +.getBoolVar(ConfVars.HIVE_STRICT_TIMESTAMP_CONVERSION) : new HiveConf() +.getBoolVar(ConfVars.HIVE_STRICT_TIMESTAMP_CONVERSION); +intToTimestampInSeconds = SessionState.get() != null ? 
SessionState.get().getConf() +.getBoolVar(ConfVars.HIVE_INT_TIMESTAMP_CONVERSION_IN_SECONDS) : new HiveConf() +.getBoolVar(ConfVars.HIVE_INT_TIMESTAMP_CONVERSION_IN_SECONDS); -if (ss != null && ss.getConf().getBoolVar(ConfVars.HIVE_STRICT_TIMESTAMP_CONVERSION)) { - PrimitiveCategory category = argumentOI.getPrimitiveCategory(); - PrimitiveGrouping group = PrimitiveObjectInspectorUtils.getPrimitiveGrouping(category); - if (group == PrimitiveGrouping.NUMERIC_GROUP) { +if (strict) { + if (PrimitiveObjectInspectorUtils.getPrimitiveGrouping(tsInputTypes[0]) == PrimitiveGrouping.NUMERIC_GROUP) { Review comment: We do support timestamp to numeric conversion. checkout https://github.com/apache/hive/blob/7b3ecf617a6d46f48a3b6f77e0339fd4ad95a420/serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/primitive/PrimitiveObjectInspectorUtils.java#L1177. due to which we need to allow NUMERIC in checkArgGroups(). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 636791) Time Spent: 1.5h (was: 1h 20m) > Refactor UDF CAST( as TIMESTAMP) > - > > Key: HIVE-25334 > URL: https://issues.apache.org/jira/browse/HIVE-25334 > Project: Hive > Issue Type: Sub-task >Reporter: Ashish Sharma >Assignee: Ashish Sharma >Priority: Major > Labels: pull-request-available > Time Spent: 1.5h > Remaining Estimate: 0h > > Description > Refactor GenericUDFTimestamp.class > DOD > Refactor -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25334) Refactor UDF CAST( as TIMESTAMP)
[ https://issues.apache.org/jira/browse/HIVE-25334?focusedWorklogId=636790=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-636790 ] ASF GitHub Bot logged work on HIVE-25334: - Author: ASF GitHub Bot Created on: 11/Aug/21 07:50 Start Date: 11/Aug/21 07:50 Worklog Time Spent: 10m Work Description: ashish-kumar-sharma commented on a change in pull request #2482: URL: https://github.com/apache/hive/pull/2482#discussion_r686585429 ## File path: ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDF.java ## @@ -434,10 +434,19 @@ protected void obtainTimestampConverter(ObjectInspector[] arguments, int i, case TIMESTAMP: case DATE: case TIMESTAMPLOCALTZ: +case INT: Review comment: There data are already implemented in https://github.com/apache/hive/blob/7b3ecf617a6d46f48a3b6f77e0339fd4ad95a420/serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/primitive/PrimitiveObjectInspectorUtils.java#L1177. I am just adding validation check to avoid runtime failures. So no test is required in this case. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 636790) Time Spent: 1h 20m (was: 1h 10m) > Refactor UDF CAST( as TIMESTAMP) > - > > Key: HIVE-25334 > URL: https://issues.apache.org/jira/browse/HIVE-25334 > Project: Hive > Issue Type: Sub-task >Reporter: Ashish Sharma >Assignee: Ashish Sharma >Priority: Major > Labels: pull-request-available > Time Spent: 1h 20m > Remaining Estimate: 0h > > Description > Refactor GenericUDFTimestamp.class > DOD > Refactor -- This message was sent by Atlassian Jira (v8.3.4#803005)
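The review thread above centers on the patched initialize() reading HIVE_STRICT_TIMESTAMP_CONVERSION and HIVE_INT_TIMESTAMP_CONVERSION_IN_SECONDS from the active SessionState when one exists, and from a freshly built HiveConf otherwise. That fallback pattern can be sketched in plain Java; the Conf class and session field below are illustrative stand-ins, not Hive's actual SessionState or HiveConf.

```java
import java.util.HashMap;
import java.util.Map;

public class ConfFallbackSketch {
    // Stand-in for HiveConf: boolean flags with a single built-in default.
    static class Conf {
        private final Map<String, Boolean> flags = new HashMap<>();
        private final boolean defaultValue;
        Conf(boolean defaultValue) { this.defaultValue = defaultValue; }
        Conf set(String key, boolean value) { flags.put(key, value); return this; }
        boolean getBoolVar(String key) { return flags.getOrDefault(key, defaultValue); }
    }

    // Stand-in for SessionState.get(): null when no session is active.
    static Conf session = null;

    // Mirrors the patch's ternary: use the session's conf if one exists,
    // otherwise fall back to a freshly constructed default conf.
    static boolean getBoolVar(String key, boolean builtInDefault) {
        Conf conf = (session != null) ? session : new Conf(builtInDefault);
        return conf.getBoolVar(key);
    }

    public static void main(String[] args) {
        // No active session: the built-in default applies.
        System.out.println(getBoolVar("hive.strict.timestamp.conversion", true));  // true
        // An active session can override the flag.
        session = new Conf(true).set("hive.strict.timestamp.conversion", false);
        System.out.println(getBoolVar("hive.strict.timestamp.conversion", true));  // false
    }
}
```

The design point under discussion is orthogonal to the lookup itself: with strict=true, NUMERIC_GROUP inputs are rejected at initialize() time rather than failing at runtime.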
[jira] [Work logged] (HIVE-25441) Incorrect deltas split for sub-compactions when using `hive.compactor.max.num.delta`
[ https://issues.apache.org/jira/browse/HIVE-25441?focusedWorklogId=636757=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-636757 ] ASF GitHub Bot logged work on HIVE-25441: - Author: ASF GitHub Bot Created on: 11/Aug/21 06:45 Start Date: 11/Aug/21 06:45 Worklog Time Spent: 10m Work Description: deniskuzZ commented on pull request #2579: URL: https://github.com/apache/hive/pull/2579#issuecomment-896546211 recheck -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 636757) Time Spent: 20m (was: 10m) > Incorrect deltas split for sub-compactions when using > `hive.compactor.max.num.delta` > > > Key: HIVE-25441 > URL: https://issues.apache.org/jira/browse/HIVE-25441 > Project: Hive > Issue Type: Task >Reporter: Denys Kuzmenko >Priority: Major > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > {code} > #Repro steps: > #1./ set hive.compactor.max.num.delta to 5 on HMS > #2./ Set up the table > set hive.merge.cardinality.check=false; > create table test (k int); > ALTER TABLE test SET TBLPROPERTIES ('NO_AUTO_COMPACTION'='true'); > insert into test values (1); > alter table test compact 'major' and wait; > dfs -ls '/warehouse/tablespace/managed/hive/test'; > # drwxrwx---+ - hive hive 0 2021-08-09 12:26 > /warehouse/tablespace/managed/hive/test/base_008_v416 > select * from test; > # k=1 > #run 3 times so there's enough delta dirs, ie. 
6 (should just increase k by 1) > #basically just removes the row and adds a new row with k+1 value > MERGE INTO test AS T USING (select * from test union all select k+1 from > test) AS S > ON T.k=s.k > WHEN MATCHED THEN DELETE > WHEN not MATCHED THEN INSERT values (s.k); > select * from test; > #k=4 > dfs -ls '/warehouse/tablespace/managed/hive/test'; > #drwxrwx---+ - hive hive 0 2021-08-09 12:26 > /warehouse/tablespace/managed/hive/test/base_008_v416 > #drwxrwx---+ - hive hive 0 2021-08-09 12:28 > /warehouse/tablespace/managed/hive/test/delete_delta_009_009_0001 > #drwxrwx---+ - hive hive 0 2021-08-09 12:29 > /warehouse/tablespace/managed/hive/test/delete_delta_010_010_0001 > #drwxrwx---+ - hive hive 0 2021-08-09 12:29 > /warehouse/tablespace/managed/hive/test/delete_delta_011_011_0001 > #drwxrwx---+ - hive hive 0 2021-08-09 12:28 > /warehouse/tablespace/managed/hive/test/delta_009_009_0003 > #drwxrwx---+ - hive hive 0 2021-08-09 12:29 > /warehouse/tablespace/managed/hive/test/delta_010_010_0003 > #drwxrwx---+ - hive hive 0 2021-08-09 12:29 > /warehouse/tablespace/managed/hive/test/delta_011_011_0003 > alter table test compact 'major' and wait; > select * from test; > #result is empty > dfs -ls '/warehouse/tablespace/managed/hive/test'; > #2drwxrwx---+ - hive hive 0 2021-08-09 12:31 > /warehouse/tablespace/managed/hive/test/base_011_v428 > {code} > Some logs from the above example: > {code} > 2021-08-09 12:30:37,532 WARN > org.apache.hadoop.hive.ql.txn.compactor.CompactorMR: > [nightly-7x-us-2-2.nightly-7x-us-2.root.hwx.site-49_executor]: 6 delta files > found for default.test located at > hdfs://nightly-7x-us-2-2.nightly-7x-us-2.root.hwx.site:8020/warehouse/tablespace/managed/hive/test! > This is likely a sign of misconfiguration, especially if this message > repeats. Check that compaction is running properly. Check for any > runaway/mis-configured process writing to ACID tables, especially using > Streaming Ingest API. 
> 2021-08-09 12:30:37,533 INFO > org.apache.hadoop.hive.ql.txn.compactor.CompactorMR: > [nightly-7x-us-2-2.nightly-7x-us-2.root.hwx.site-49_executor]: Submitting > MINOR compaction job > 'nightly-7x-us-2-2.nightly-7x-us-2.root.hwx.site-49-compactor-default.test_0' > to default queue. (current delta dirs count=5, obsolete delta dirs count=-1. > TxnIdRange[9,11] > 2021-08-09 12:30:38,003 INFO > org.apache.hadoop.hive.ql.txn.compactor.CompactorMR: > [nightly-7x-us-2-2.nightly-7x-us-2.root.hwx.site-49_executor]: Submitted > compaction job > 'nightly-7x-us-2-2.nightly-7x-us-2.root.hwx.site-49-compactor-default.test_0' > with jobID=job_1628497133224_0051 compaction ID=23 > #From app logs of the minor compaction, note that delta_011_011_0001 > is missing from the list > 2021-08-09
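The repro above ends with 6 delta directories while hive.compactor.max.num.delta is 5, and the compactor logs show delta_011_011_0001 missing from the minor-compaction batch. The invariant a correct split must preserve, namely that every delta directory lands in exactly one consecutive batch of at most max.num.delta entries, can be sketched as follows (illustrative only, not Hive's CompactorMR code):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class DeltaSplitSketch {
    // Split deltas into consecutive batches of at most maxNumDelta entries each;
    // no directory is dropped and none appears twice.
    static List<List<String>> splitIntoBatches(List<String> deltas, int maxNumDelta) {
        List<List<String>> batches = new ArrayList<>();
        for (int i = 0; i < deltas.size(); i += maxNumDelta) {
            batches.add(new ArrayList<>(
                    deltas.subList(i, Math.min(i + maxNumDelta, deltas.size()))));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<String> deltas = Arrays.asList(
                "delta_009_009", "delta_010_010", "delta_011_011",
                "delete_delta_009_009", "delete_delta_010_010", "delete_delta_011_011");
        // With max.num.delta = 5 and 6 deltas, a correct split yields two
        // batches of 5 and 1; the bug report suggests the last entry was lost.
        for (List<String> batch : splitIntoBatches(deltas, 5)) {
            System.out.println(batch);
        }
    }
}
```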
[jira] [Resolved] (HIVE-25409) group by the same column result sum result error
[ https://issues.apache.org/jira/browse/HIVE-25409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] liuguanghua resolved HIVE-25409. Resolution: Fixed > group by the same column result sum result error > > > Key: HIVE-25409 > URL: https://issues.apache.org/jira/browse/HIVE-25409 > Project: Hive > Issue Type: Bug > Components: Hive >Affects Versions: 1.2.1, 1.2.2 > Environment: hadoop 2.8.5 > hive 1.2.1 or hive 1.2.2 >Reporter: liuguanghua >Priority: Major > Fix For: 2.1.1 > > > create table test > ( > a string, > b string > ); > insert into table test values ('a','1'),('b','2'),('c','3'); > select a,a,sum(b) from test group by a,a; > > this will get the wrong answer. We expect the sum(b), but it will compute the > sum(a). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-25409) group by the same column result sum result error
[ https://issues.apache.org/jira/browse/HIVE-25409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17397090#comment-17397090 ] liuguanghua commented on HIVE-25409: I have already verified the problem with HIVE-14715, and it worked. Thank you very much. > group by the same column result sum result error > > > Key: HIVE-25409 > URL: https://issues.apache.org/jira/browse/HIVE-25409 > Project: Hive > Issue Type: Bug > Components: Hive >Affects Versions: 1.2.1, 1.2.2 > Environment: hadoop 2.8.5 > hive 1.2.1 or hive 1.2.2 >Reporter: liuguanghua >Priority: Major > Fix For: 2.1.1 > > > create table test > ( > a string, > b string > ); > insert into table test values ('a','1'),('b','2'),('c','3'); > select a,a,sum(b) from test group by a,a; > > this will get the wrong answer. We expect the sum(b), but it will compute the > sum(a). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-25409) group by the same column result sum result error
[ https://issues.apache.org/jira/browse/HIVE-25409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] liuguanghua updated HIVE-25409: --- Fix Version/s: 2.1.1 > group by the same column result sum result error > > > Key: HIVE-25409 > URL: https://issues.apache.org/jira/browse/HIVE-25409 > Project: Hive > Issue Type: Bug > Components: Hive >Affects Versions: 1.2.1, 1.2.2 > Environment: hadoop 2.8.5 > hive 1.2.1 or hive 1.2.2 >Reporter: liuguanghua >Priority: Major > Fix For: 2.1.1 > > > create table test > ( > a string, > b string > ); > insert into table test values ('a','1'),('b','2'),('c','3'); > select a,a,sum(b) from test group by a,a; > > this will get the wrong answer. We expect the sum(b), but it will compute the > sum(a). -- This message was sent by Atlassian Jira (v8.3.4#803005)
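The expected semantics behind the HIVE-25409 repro, namely that repeating a group-by key (`group by a, a`) must not change which column the aggregate is computed over, can be sketched with a toy in-memory aggregation (illustrative only, not Hive's GroupByOperator):

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class GroupBySketch {
    // A row of the repro table: test(a string, b int-valued string).
    static class Row {
        final String a;
        final int b;
        Row(String a, int b) { this.a = a; this.b = b; }
    }

    // sum(b) grouped by a. A repeated group-by key adds no new grouping
    // information, so `group by a` and `group by a, a` must agree.
    static Map<String, Integer> sumBGroupByA(List<Row> rows) {
        Map<String, Integer> out = new LinkedHashMap<>();
        for (Row r : rows) {
            out.merge(r.a, r.b, Integer::sum);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Row> test = Arrays.asList(new Row("a", 1), new Row("b", 2), new Row("c", 3));
        System.out.println(sumBGroupByA(test)); // {a=1, b=2, c=3}
    }
}
```

The reported bug (fixed via HIVE-14715) was that the duplicate key shifted expression resolution, so sum(a) was computed instead of sum(b).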
[jira] [Work logged] (HIVE-25403) Fix from_unixtime() to consider leap seconds
[ https://issues.apache.org/jira/browse/HIVE-25403?focusedWorklogId=636749=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-636749 ] ASF GitHub Bot logged work on HIVE-25403: - Author: ASF GitHub Bot Created on: 11/Aug/21 06:10 Start Date: 11/Aug/21 06:10 Worklog Time Spent: 10m Work Description: warriersruthi commented on pull request #2550: URL: https://github.com/apache/hive/pull/2550#issuecomment-896530150 Thank you @sankarh, @ashish-kumar-sharma & @adesh-rao for your detailed reviews and assistance! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 636749) Time Spent: 3h 50m (was: 3h 40m) > Fix from_unixtime() to consider leap seconds > - > > Key: HIVE-25403 > URL: https://issues.apache.org/jira/browse/HIVE-25403 > Project: Hive > Issue Type: Sub-task > Components: Hive >Affects Versions: 3.1.0, 3.1.1, 3.1.2 >Reporter: Sruthi Mooriyathvariam >Assignee: Sruthi Mooriyathvariam >Priority: Major > Labels: UDF, pull-request-available > Fix For: 4.0.0 > > Attachments: image-2021-07-29-14-42-49-806.png > > Time Spent: 3h 50m > Remaining Estimate: 0h > > The unix_timestamp() considers "leap seconds" while from_unixtime() does not, > which results in a wrong result, as below: > !image-2021-07-29-14-42-49-806.png! -- This message was sent by Atlassian Jira (v8.3.4#803005)
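The fix direction for HIVE-25403 is to make from_unixtime() and unix_timestamp() agree on the same epoch mapping. java.time maps epoch seconds on a leap-second-free (POSIX-style) timeline, so formatting and parsing through it round-trip cleanly; the sketch below illustrates that consistency and is not Hive's actual UDF code.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

public class FromUnixtimeSketch {
    static final DateTimeFormatter FMT = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    // from_unixtime-style formatting on the leap-second-free POSIX timeline.
    static String fromUnixtime(long epochSeconds) {
        return LocalDateTime.ofInstant(Instant.ofEpochSecond(epochSeconds), ZoneOffset.UTC)
                .format(FMT);
    }

    // unix_timestamp-style parsing with the same mapping, so the two round-trip.
    static long unixTimestamp(String ts) {
        return LocalDateTime.parse(ts, FMT).toInstant(ZoneOffset.UTC).getEpochSecond();
    }

    public static void main(String[] args) {
        long epoch = 1628668800L; // 2021-08-11 08:00:00 UTC
        String s = fromUnixtime(epoch);
        // Because both directions use the same timeline, the round-trip is exact.
        System.out.println(s + " -> " + unixTimestamp(s));
    }
}
```

The bug arose precisely because the two UDFs used inconsistent timelines, so unix_timestamp(from_unixtime(t)) could differ from t.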
[jira] [Resolved] (HIVE-25298) LAG function get java.lang.ClassCastException: org.apache.hadoop.hive.common.type.HiveDecimal cannot be cast to org.apache.hadoop.hive.serde2.io.HiveDecimalWritable
[ https://issues.apache.org/jira/browse/HIVE-25298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sankar Hariappan resolved HIVE-25298. - Resolution: Not A Bug Already fixed by https://issues.apache.org/jira/browse/HIVE-21104. > LAG function get java.lang.ClassCastException: > org.apache.hadoop.hive.common.type.HiveDecimal cannot be cast to > org.apache.hadoop.hive.serde2.io.HiveDecimalWritable > > > Key: HIVE-25298 > URL: https://issues.apache.org/jira/browse/HIVE-25298 > Project: Hive > Issue Type: Bug > Components: Hive >Affects Versions: 3.1.2 > Environment: Hive 3.1 >Reporter: wenjun ma >Priority: Major > > When we try to apply the LAG function with aggregation function (MAX), we got > java.lang.ClassCastException: org.apache.hadoop.hive.common.type.HiveDecimal > cannot be cast to org.apache.hadoop.hive.serde2.io.HiveDecimalWritable. > Reproduce steps: > # create table: > create table tbl1 (ACCT_NM string, ACCT_BAL decimal(15,2)) partitioned by > (DL_DATA_DT string) > # insert sample data: insert into tbl1 values ('acct1', 1000.00, > '2020-01-01'); > insert into tbl1 values ('acct1', 800.00, '2020-01-02'); > ## Run the following SQL: > {code:java} > select > test.ACCT_NM, > test.DL_DATA_DT, > test.MAX_ACCT_BAL, > LAG(test.MAX_ACCT_BAL,1,0) OVER (PARTITION BY test.ACCT_NM ORDER BY > test.DL_DATA_DT) AS PREV_USED_AMT > from ( > select > tbl1.ACCT_NM as ACCT_NM, > tbl1.DL_DATA_DT as DL_DATA_DT, > max(tbl1.ACCT_BAL) as MAX_ACCT_BAL > from tbl1 > group by tbl1.ACCT_NM, tbl1.DL_DATA_DT > ) test; > {code} > Full Stack: > ERROR : FAILED: Execution Error, return code 2 from > org.apache.hadoop.hive.ql.exec.tez.TezTask. 
Vertex failed, vertexName=Reducer > 2, vertexId=vertex_1624984332939_0003_5_01, diagnostics=[Task failed, > taskId=task_1624984332939_0003_5_01_21, diagnostics=[TaskAttempt 0 > failed, info=[Error: Error while running task ( failure ) : > attempt_1624984332939_0003_5_01_21_0:java.lang.RuntimeException: > java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: > Hive Runtime Error while processing row > at > org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:296) > at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250) > at > org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374) > at > org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73) > at > org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730) > at > org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61) > at > org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37) > at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36) > at > org.apache.hadoop.hive.llap.daemon.impl.StatsRecordingThreadPool$WrappedCallable.call(StatsRecordingThreadPool.java:118) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > Caused by: java.lang.RuntimeException: > org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while > processing row > at > 
org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecord(ReduceRecordSource.java:304) > at > org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.run(ReduceRecordProcessor.java:318) > at > org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:267) > ... 15 more > Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime > Error while processing row > at > org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource$GroupIterator.next(ReduceRecordSource.java:378) > at > org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecord(ReduceRecordSource.java:294) > ... 17 more > Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: > java.lang.ClassCastException: org.apache.hadoop.hive.common.type.HiveDecimal > cannot be cast to org.apache.hadoop.hive.serde2.io.HiveDecimalWritable > at > org.apache.hadoop.hive.ql.exec.GroupByOperator.process(GroupByOperator.java:795) > at >
[jira] [Updated] (HIVE-25298) LAG function get java.lang.ClassCastException: org.apache.hadoop.hive.common.type.HiveDecimal cannot be cast to org.apache.hadoop.hive.serde2.io.HiveDecimalWritable
[ https://issues.apache.org/jira/browse/HIVE-25298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sankar Hariappan updated HIVE-25298: Fix Version/s: 4.0.0 > LAG function get java.lang.ClassCastException: > org.apache.hadoop.hive.common.type.HiveDecimal cannot be cast to > org.apache.hadoop.hive.serde2.io.HiveDecimalWritable > > > Key: HIVE-25298 > URL: https://issues.apache.org/jira/browse/HIVE-25298 > Project: Hive > Issue Type: Bug > Components: Hive >Affects Versions: 3.1.2 > Environment: Hive 3.1 >Reporter: wenjun ma >Priority: Major > Fix For: 4.0.0 > > > When we try to apply the LAG function with aggregation function (MAX), we got > java.lang.ClassCastException: org.apache.hadoop.hive.common.type.HiveDecimal > cannot be cast to org.apache.hadoop.hive.serde2.io.HiveDecimalWritable. > Reproduce steps: > # create table: > create table tbl1 (ACCT_NM string, ACCT_BAL decimal(15,2)) partitioned by > (DL_DATA_DT string) > # insert sample data: insert into tbl1 values ('acct1', 1000.00, > '2020-01-01'); > insert into tbl1 values ('acct1', 800.00, '2020-01-02'); > ## Run the following SQL: > {code:java} > select > test.ACCT_NM, > test.DL_DATA_DT, > test.MAX_ACCT_BAL, > LAG(test.MAX_ACCT_BAL,1,0) OVER (PARTITION BY test.ACCT_NM ORDER BY > test.DL_DATA_DT) AS PREV_USED_AMT > from ( > select > tbl1.ACCT_NM as ACCT_NM, > tbl1.DL_DATA_DT as DL_DATA_DT, > max(tbl1.ACCT_BAL) as MAX_ACCT_BAL > from tbl1 > group by tbl1.ACCT_NM, tbl1.DL_DATA_DT > ) test; > {code} > Full Stack: > ERROR : FAILED: Execution Error, return code 2 from > org.apache.hadoop.hive.ql.exec.tez.TezTask. 
Vertex failed, vertexName=Reducer > 2, vertexId=vertex_1624984332939_0003_5_01, diagnostics=[Task failed, > taskId=task_1624984332939_0003_5_01_21, diagnostics=[TaskAttempt 0 > failed, info=[Error: Error while running task ( failure ) : > attempt_1624984332939_0003_5_01_21_0:java.lang.RuntimeException: > java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: > Hive Runtime Error while processing row > at > org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:296) > at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250) > at > org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374) > at > org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73) > at > org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730) > at > org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61) > at > org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37) > at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36) > at > org.apache.hadoop.hive.llap.daemon.impl.StatsRecordingThreadPool$WrappedCallable.call(StatsRecordingThreadPool.java:118) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > Caused by: java.lang.RuntimeException: > org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while > processing row > at > 
org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecord(ReduceRecordSource.java:304) > at > org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.run(ReduceRecordProcessor.java:318) > at > org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:267) > ... 15 more > Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime > Error while processing row > at > org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource$GroupIterator.next(ReduceRecordSource.java:378) > at > org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecord(ReduceRecordSource.java:294) > ... 17 more > Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: > java.lang.ClassCastException: org.apache.hadoop.hive.common.type.HiveDecimal > cannot be cast to org.apache.hadoop.hive.serde2.io.HiveDecimalWritable > at > org.apache.hadoop.hive.ql.exec.GroupByOperator.process(GroupByOperator.java:795) > at > org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource$GroupIterator.next(ReduceRecordSource.java:363) > ... 18 more
[jira] [Updated] (HIVE-25403) Fix from_unixtime() to consider leap seconds
[ https://issues.apache.org/jira/browse/HIVE-25403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sankar Hariappan updated HIVE-25403: Affects Version/s: 3.1.0 3.1.1 > Fix from_unixtime() to consider leap seconds > - > > Key: HIVE-25403 > URL: https://issues.apache.org/jira/browse/HIVE-25403 > Project: Hive > Issue Type: Sub-task > Components: Hive >Affects Versions: 3.1.0, 3.1.1, 3.1.2 >Reporter: Sruthi Mooriyathvariam >Assignee: Sruthi Mooriyathvariam >Priority: Major > Labels: UDF, pull-request-available > Fix For: 4.0.0 > > Attachments: image-2021-07-29-14-42-49-806.png > > Time Spent: 3h 40m > Remaining Estimate: 0h > > The unix_timestamp() considers "leap seconds" while from_unixtime() does not, > which results in a wrong result, as below: > !image-2021-07-29-14-42-49-806.png! -- This message was sent by Atlassian Jira (v8.3.4#803005)