[jira] [Updated] (HIVE-25446) VectorMapJoinFastHashTable.validateCapacity AssertionError: Capacity must be a power of two
[ https://issues.apache.org/jira/browse/HIVE-25446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matt McCline updated HIVE-25446:
--------------------------------
    Description:
Encountered this in a very large query:

{noformat}
Caused by: java.lang.AssertionError: Capacity must be a power of two
	at org.apache.hadoop.hive.ql.exec.vector.mapjoin.fast.VectorMapJoinFastHashTable.validateCapacity(VectorMapJoinFastHashTable.java:60)
	at org.apache.hadoop.hive.ql.exec.vector.mapjoin.fast.VectorMapJoinFastHashTable.<init>(VectorMapJoinFastHashTable.java:77)
	at org.apache.hadoop.hive.ql.exec.vector.mapjoin.fast.VectorMapJoinFastBytesHashTable.<init>(VectorMapJoinFastBytesHashTable.java:132)
	at org.apache.hadoop.hive.ql.exec.vector.mapjoin.fast.VectorMapJoinFastBytesHashMap.<init>(VectorMapJoinFastBytesHashMap.java:166)
	at org.apache.hadoop.hive.ql.exec.vector.mapjoin.fast.VectorMapJoinFastStringHashMap.<init>(VectorMapJoinFastStringHashMap.java:43)
	at org.apache.hadoop.hive.ql.exec.vector.mapjoin.fast.VectorMapJoinFastTableContainer.createHashTable(VectorMapJoinFastTableContainer.java:137)
	at org.apache.hadoop.hive.ql.exec.vector.mapjoin.fast.VectorMapJoinFastTableContainer.<init>(VectorMapJoinFastTableContainer.java:86)
	at org.apache.hadoop.hive.ql.exec.vector.mapjoin.fast.VectorMapJoinFastHashTableLoader.load(VectorMapJoinFastHashTableLoader.java:122)
	at org.apache.hadoop.hive.ql.exec.MapJoinOperator.loadHashTableInternal(MapJoinOperator.java:344)
	at org.apache.hadoop.hive.ql.exec.MapJoinOperator.loadHashTable(MapJoinOperator.java:413)
	at org.apache.hadoop.hive.ql.exec.MapJoinOperator.lambda$initializeOp$0(MapJoinOperator.java:215)
	at org.apache.hadoop.hive.ql.exec.tez.ObjectCache.retrieve(ObjectCache.java:96)
	at org.apache.hadoop.hive.ql.exec.tez.ObjectCache$1.call(ObjectCache.java:113)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
{noformat}

> VectorMapJoinFastHashTable.validateCapacity AssertionError: Capacity must be
> a power of two
> ---
>
> Key: HIVE-25446
> URL: https://issues.apache.org/jira/browse/HIVE-25446
> Project: Hive
> Issue Type: Bug
> Reporter: Matt McCline
> Assignee: Matt McCline
> Priority: Major
> Fix For: 4.0.0
>

-- This message was sent by Atlassian Jira (v8.3.4#803005)
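For context, fast map-join hash tables require a power-of-two capacity so that bucket selection can use a bit mask instead of a modulo. A minimal sketch of the usual check and round-up (illustrative of the general technique only, not Hive's actual validateCapacity code):

```java
public class CapacityCheck {
    // A positive power of two has exactly one bit set, so clearing the
    // lowest set bit must leave zero.
    static boolean isPowerOfTwo(int capacity) {
        return capacity > 0 && (capacity & (capacity - 1)) == 0;
    }

    // Round a requested size up to the next power of two, the usual way a
    // hash table sanitizes a computed initial capacity before allocating.
    static int nextPowerOfTwo(int requested) {
        if (requested <= 1) {
            return 1;
        }
        return Integer.highestOneBit(requested - 1) << 1;
    }

    public static void main(String[] args) {
        System.out.println(isPowerOfTwo(1024));   // true
        System.out.println(isPowerOfTwo(1000));   // false
        System.out.println(nextPowerOfTwo(1000)); // 1024
    }
}
```

A table that derives its capacity from estimated row counts would normally pass the result through a round-up like this before construction; the assertion firing suggests some code path produced an unsanitized capacity.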
[jira] [Updated] (HIVE-25446) VectorMapJoinFastHashTable.validateCapacity AssertionError: Capacity must be a power of two
[ https://issues.apache.org/jira/browse/HIVE-25446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matt McCline updated HIVE-25446:
--------------------------------
    Environment: (was: the same stack trace now recorded in the Description)

> VectorMapJoinFastHashTable.validateCapacity AssertionError: Capacity must be
> a power of two
> ---
>
> Key: HIVE-25446
> URL: https://issues.apache.org/jira/browse/HIVE-25446
> Project: Hive
> Issue Type: Bug
> Reporter: Matt McCline
> Assignee: Matt McCline
> Priority: Major
> Fix For: 4.0.0
>
[jira] [Assigned] (HIVE-25446) VectorMapJoinFastHashTable.validateCapacity AssertionError: Capacity must be a power of two
[ https://issues.apache.org/jira/browse/HIVE-25446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matt McCline reassigned HIVE-25446:
-----------------------------------

> VectorMapJoinFastHashTable.validateCapacity AssertionError: Capacity must be
> a power of two
> ---
>
> Key: HIVE-25446
> URL: https://issues.apache.org/jira/browse/HIVE-25446
> Project: Hive
> Issue Type: Bug
> Environment: (the same stack trace quoted in the Description above)
> Reporter: Matt McCline
> Assignee: Matt McCline
> Priority: Major
> Fix For: 4.0.0
>
[jira] [Assigned] (HIVE-25445) Enable JdbcStorageHandler to get password from AWS Secrets Service.
[ https://issues.apache.org/jira/browse/HIVE-25445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Harish JP reassigned HIVE-25445:
--------------------------------

> Enable JdbcStorageHandler to get password from AWS Secrets Service.
> ---
>
> Key: HIVE-25445
> URL: https://issues.apache.org/jira/browse/HIVE-25445
> Project: Hive
> Issue Type: New Feature
> Components: HiveServer2
> Reporter: Harish JP
> Assignee: Harish JP
> Priority: Major
>
> Currently, the password for JdbcStorageHandler can be set only via the
> password field or a keystore. This Jira is to add a framework to fetch the
> password from any source, and to implement AWS Secrets Manager as a source.
>
> The approach taken is to use a new table property, dbcp.password.uri, which
> will be used if password and keyfile are not available.
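The "fetch password from any source" framework can be pictured as a URI-dispatched lookup: the URI scheme selects the secret source, the rest names the secret. The sketch below is purely illustrative; the "aws-sm" scheme, the resolver shape, and the in-memory map standing in for a real AWS Secrets Manager GetSecretValue call are all assumptions, not the actual patch:

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

public class PasswordSource {
    // Hypothetical resolver for a dbcp.password.uri value: the URI scheme
    // picks the secret source, the scheme-specific part names the secret.
    static String resolve(URI uri, Map<String, String> secretStore) {
        if (!"aws-sm".equals(uri.getScheme())) {
            throw new IllegalArgumentException("unsupported scheme: " + uri.getScheme());
        }
        // A real implementation would call AWS Secrets Manager here; this
        // sketch reads from an in-memory map instead.
        String secret = secretStore.get(uri.getSchemeSpecificPart());
        if (secret == null) {
            throw new IllegalStateException("no secret for " + uri);
        }
        return secret;
    }

    public static void main(String[] args) {
        Map<String, String> store = new HashMap<>();
        store.put("prod/hive/jdbc", "s3cr3t");
        System.out.println(resolve(URI.create("aws-sm:prod/hive/jdbc"), store)); // s3cr3t
    }
}
```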
[jira] [Updated] (HIVE-25444) Make tables based on storage handlers authorization (HIVE-24705) configurable.
[ https://issues.apache.org/jira/browse/HIVE-25444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HIVE-25444:
----------------------------------
    Labels: pull-request-available  (was: )

> Make tables based on storage handlers authorization (HIVE-24705) configurable.
> ---
>
> Key: HIVE-25444
> URL: https://issues.apache.org/jira/browse/HIVE-25444
> Project: Hive
> Issue Type: Improvement
> Components: HiveServer2
> Reporter: Sai Hemanth Gantasala
> Assignee: Sai Hemanth Gantasala
> Priority: Major
> Labels: pull-request-available
> Time Spent: 10m
> Remaining Estimate: 0h
>
> Using the config "hive.security.authorization.tables.on.storagehandlers", with
> default true, we enable authorization on storage handlers by default.
> Authorization is disabled if this config is set to false.
> Background: Previously, whenever a user tried to create a table based on a
> storage handler, the end user seen by the external storage (e.g. HBase,
> Kafka, or Druid) was 'hive', so we could not really enforce conditions in
> Ranger on the end user. https://issues.apache.org/jira/browse/HIVE-24705
> solved this security issue by enforcing a check in Apache Ranger for the Hive
> service. That patch had changes in both Hive and Ranger (the Ranger client
> depends on the Hive changes). The reason to make this feature configurable is
> that users can update the Hive code but not the Ranger code. In that case,
> users see a permission-denied error when executing a statement like {{CREATE
> TABLE hive_table_0(key int, value string) STORED BY
> 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'}}, but the user/admin
> cannot add a Ranger policy for Hive because the Ranger code is not updated.
> By making this feature configurable, we unblock users from creating tables
> based on storage handlers as they were previously doing. Users can turn this
> config off if they have not updated the Ranger code.
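Concretely, a user on a cluster whose Ranger side lacks the HIVE-24705 changes would turn the property off before creating the table. The CREATE TABLE statement is taken from the description; whether the property is settable per session (as sketched here) or only in hive-site.xml is not stated there:

```sql
-- Disable storage-handler authorization when Ranger does not yet carry
-- the HIVE-24705 changes.
SET hive.security.authorization.tables.on.storagehandlers=false;

CREATE TABLE hive_table_0 (key int, value string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler';
```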
[jira] [Work logged] (HIVE-25444) Make tables based on storage handlers authorization (HIVE-24705) configurable.
[ https://issues.apache.org/jira/browse/HIVE-25444?focusedWorklogId=637118&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-637118 ]

ASF GitHub Bot logged work on HIVE-25444:
-----------------------------------------
                Author: ASF GitHub Bot
            Created on: 11/Aug/21 20:43
            Start Date: 11/Aug/21 20:43
    Worklog Time Spent: 10m

Work Description: saihemanth-cloudera opened a new pull request #2583:
URL: https://github.com/apache/hive/pull/2583

   … configurable.

   ### What changes were proposed in this pull request?
   Making the tables-based-on-storage-handlers authorization configurable.

   ### Why are the changes needed?
   Authorization may fail if the Ranger code doesn't have the HIVE-24705 patch.

   ### Does this PR introduce _any_ user-facing change?
   No

   ### How was this patch tested?
   Local machine, remote cluster.

-- This is an automated message from the Apache Git Service. To respond to the
message, please log on to GitHub and use the URL above to go to the specific
comment. To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries
about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
-------------------
    Worklog Id: (was: 637118)
    Remaining Estimate: 0h
    Time Spent: 10m

> Make tables based on storage handlers authorization (HIVE-24705) configurable.
> ---
>
> Key: HIVE-25444
> URL: https://issues.apache.org/jira/browse/HIVE-25444
> Project: Hive
> Issue Type: Improvement
> Components: HiveServer2
> Reporter: Sai Hemanth Gantasala
> Assignee: Sai Hemanth Gantasala
> Priority: Major
> Time Spent: 10m
> Remaining Estimate: 0h
>
[jira] [Updated] (HIVE-25444) Make tables based on storage handlers authorization (HIVE-24705) configurable.
[ https://issues.apache.org/jira/browse/HIVE-25444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sai Hemanth Gantasala updated HIVE-25444:
-----------------------------------------
    Description:
Using the config "hive.security.authorization.tables.on.storagehandlers", with
default true, we enable authorization on storage handlers by default.
Authorization is disabled if this config is set to false.

Background: Previously, whenever a user tried to create a table based on a
storage handler, the end user seen by the external storage (e.g. HBase, Kafka,
or Druid) was 'hive', so we could not really enforce conditions in Ranger on
the end user. https://issues.apache.org/jira/browse/HIVE-24705 solved this
security issue by enforcing a check in Apache Ranger for the Hive service. That
patch had changes in both Hive and Ranger (the Ranger client depends on the
Hive changes). The reason to make this feature configurable is that users can
update the Hive code but not the Ranger code. In that case, users see a
permission-denied error when executing a statement like {{CREATE TABLE
hive_table_0(key int, value string) STORED BY
'org.apache.hadoop.hive.hbase.HBaseStorageHandler'}}, but the user/admin cannot
add a Ranger policy for Hive because the Ranger code is not updated. By making
this feature configurable, we unblock users from creating tables based on
storage handlers as they were previously doing. Users can turn this config off
if they have not updated the Ranger code.

  was:
Using a config "hive.security.authorization.tables.on.storagehandlers" with
default true, we'll enable the authorization on storage handlers by default.
Authorization is disabled if this config is set to true.

Background: Previously, whenever a user is trying to create a table based on a
storage handler, the end user we are seeing in the external storage (Ex: hbase,
kafka, and druid) is ‘hive’ so we cannot really enforce the condition in ranger
on the end-user. https://issues.apache.org/jira/browse/HIVE-24705 solved this
security issue, by enforcing a check in Apache ranger for hive service. This
patch had changes in both hive and ranger. (ranger client depends on hive
changes.) Now the reason why I’m disabling this feature by default is that
users can update hive code but not ranger code. In that case, users see a
permission denied error when executing a statement like: {{CREATE TABLE
hive_table_0(key int, value string) STORED BY
'org.apache.hadoop.hive.hbase.HBaseStorageHandler'}} but user/admin cannot add
a ranger policy in hive because ranger code is not updated. This way we’ll
unblock users from creating tables based on storage handlers as they were
previously doing. Users can turn on this config if they have updated ranger
code.

> Make tables based on storage handlers authorization (HIVE-24705) configurable.
> ---
>
> Key: HIVE-25444
> URL: https://issues.apache.org/jira/browse/HIVE-25444
> Project: Hive
> Issue Type: Improvement
> Components: HiveServer2
> Reporter: Sai Hemanth Gantasala
> Assignee: Sai Hemanth Gantasala
> Priority: Major
>
[jira] [Work logged] (HIVE-21614) Derby does not support CLOB comparisons
[ https://issues.apache.org/jira/browse/HIVE-21614?focusedWorklogId=637098&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-637098 ]

ASF GitHub Bot logged work on HIVE-21614:
-----------------------------------------
                Author: ASF GitHub Bot
            Created on: 11/Aug/21 20:16
            Start Date: 11/Aug/21 20:16
    Worklog Time Spent: 10m

Work Description: hankfanchiu commented on pull request #2484:
URL: https://github.com/apache/hive/pull/2484#issuecomment-897122168

   @pvary, is this good to go?

Issue Time Tracking
-------------------
    Worklog Id: (was: 637098)
    Time Spent: 1h  (was: 50m)

> Derby does not support CLOB comparisons
> ---
>
> Key: HIVE-21614
> URL: https://issues.apache.org/jira/browse/HIVE-21614
> Project: Hive
> Issue Type: Bug
> Components: Metastore
> Affects Versions: 2.3.4, 3.0.0
> Reporter: Vlad Rozov
> Assignee: Hank Fanchiu
> Priority: Major
> Labels: pull-request-available
> Time Spent: 1h
> Remaining Estimate: 0h
>
> HiveMetaStoreClient.listTableNamesByFilter() with a non-empty filter causes
> an exception with Derby DB:
> {noformat}
> Caused by: ERROR 42818: Comparisons between 'CLOB (UCS_BASIC)' and 'CLOB
> (UCS_BASIC)' are not supported. Types must be comparable. String types must
> also have matching collation. If collation does not match, a possible
> solution is to cast operands to force them to the default collation (e.g.
> SELECT tablename FROM sys.systables WHERE CAST(tablename AS VARCHAR(128)) =
> 'T1')
> 	at org.apache.derby.iapi.error.StandardException.newException(Unknown Source)
> 	at org.apache.derby.iapi.error.StandardException.newException(Unknown Source)
> 	at org.apache.derby.impl.sql.compile.BinaryComparisonOperatorNode.bindComparisonOperator(Unknown Source)
> 	at org.apache.derby.impl.sql.compile.BinaryComparisonOperatorNode.bindExpression(Unknown Source)
> 	at org.apache.derby.impl.sql.compile.BinaryOperatorNode.bindExpression(Unknown Source)
> 	at org.apache.derby.impl.sql.compile.BinaryLogicalOperatorNode.bindExpression(Unknown Source)
> 	at org.apache.derby.impl.sql.compile.AndNode.bindExpression(Unknown Source)
> 	at org.apache.derby.impl.sql.compile.SelectNode.bindExpressions(Unknown Source)
> 	at org.apache.derby.impl.sql.compile.DMLStatementNode.bindExpressions(Unknown Source)
> 	at org.apache.derby.impl.sql.compile.DMLStatementNode.bind(Unknown Source)
> 	at org.apache.derby.impl.sql.compile.CursorNode.bindStatement(Unknown Source)
> 	at org.apache.derby.impl.sql.GenericStatement.prepMinion(Unknown Source)
> 	at org.apache.derby.impl.sql.GenericStatement.prepare(Unknown Source)
> 	at org.apache.derby.impl.sql.conn.GenericLanguageConnectionContext.prepareInternalStatement(Unknown Source)
> 	... 42 more
> {noformat}
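The Derby error text itself names the standard workaround: cast one CLOB operand to a VARCHAR so the comparison uses a comparable type with the default collation. Using the exact example the error message gives:

```sql
-- Direct CLOB = CLOB comparison fails in Derby; casting one operand
-- restores comparability and the default collation.
SELECT tablename
FROM sys.systables
WHERE CAST(tablename AS VARCHAR(128)) = 'T1';
```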
[jira] [Work logged] (HIVE-25275) OOM during query planning due to HiveJoinPushTransitivePredicatesRule matching infinitely
[ https://issues.apache.org/jira/browse/HIVE-25275?focusedWorklogId=637071&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-637071 ]

ASF GitHub Bot logged work on HIVE-25275:
-----------------------------------------
                Author: ASF GitHub Bot
            Created on: 11/Aug/21 18:54
            Start Date: 11/Aug/21 18:54
    Worklog Time Spent: 10m

Work Description: asolimando opened a new pull request #2582:
URL: https://github.com/apache/hive/pull/2582

   ### Organization of the PR's commits:
   1. qtest with reproducer added
   2. fix for the HiveJoinAddNotNullRule rule
   3. fix for the HiveFilterProjectTransposeRule
   FYI: a few minor warnings were addressed in the commits for the two rules.

   ### What changes were proposed in this pull request?
   Redundant _IS NOT NULL_ predicates created by either _HiveJoinAddNotNullRule_
   or _HiveFilterProjectTransposeRule_ are infinitely pulled up by
   _HiveJoinPushTransitivePredicatesRule_ and then pushed down again by the two
   aforementioned rules. The PR addresses the problem by fixing the two rules
   as follows:
   - HiveFilterProjectTransposeRule: do not push down _IS NOT NULL(EXPR($i))_
     if _IS NOT NULL($i)_ already exists down the subtree
   - HiveJoinAddNotNullRule: prevent the addition of _IS NOT NULL(EXPR($i))_
     in the presence of _IS NOT NULL($i)_
   _IS NOT NULL(EXPR($i))_ predicates are generally redundant in the presence
   of _IS NOT NULL($i)_, and might even hinder performance when a filter tests
   complex expressions.

   ### Why are the changes needed?
   Query planning runs infinitely when some extra IS NOT NULL predicates are
   created.

   ### Does this PR introduce _any_ user-facing change?
   No

   ### How was this patch tested?
   The first commit adds a qtest with the reproducer attached to the ticket;
   all qtests were run locally with no issues identified.

Issue Time Tracking
-------------------
    Worklog Id: (was: 637071)
    Remaining Estimate: 0h
    Time Spent: 10m

> OOM during query planning due to HiveJoinPushTransitivePredicatesRule
> matching infinitely
> ---
>
> Key: HIVE-25275
> URL: https://issues.apache.org/jira/browse/HIVE-25275
> Project: Hive
> Issue Type: Bug
> Reporter: László Pintér
> Assignee: Stamatis Zampetakis
> Priority: Major
> Time Spent: 10m
> Remaining Estimate: 0h
>
> While running the following query, an OOM is raised during the planning phase:
> {code:sql}
> CREATE TABLE A (`value_date` date) STORED AS ORC;
> CREATE TABLE B (`business_date` date) STORED AS ORC;
> SELECT A.VALUE_DATE
> FROM A, B
> WHERE A.VALUE_DATE = BUSINESS_DATE
> AND A.VALUE_DATE = TRUNC(BUSINESS_DATE, 'MONTH');
> {code}
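The core of both rule fixes is a redundancy test: do not emit IS NOT NULL over an expression of input $i when IS NOT NULL($i) already holds below, since the derived predicate keeps re-triggering the pull-up/push-down cycle. A toy model of that check (predicates rendered as strings for illustration; the real rules walk Calcite RexNode trees):

```java
import java.util.Set;

public class NotNullRedundancy {
    // Toy model: 'existing' holds the predicates already known to hold
    // below the operator being rewritten, keyed by rendered text.
    static boolean shouldAddNotNullOverExpr(int inputRef, Set<String> existing) {
        // The fix treats IS NOT NULL(EXPR($i)) as redundant once
        // IS NOT NULL($i) exists; skipping it breaks the rule-matching loop.
        return !existing.contains("IS NOT NULL($" + inputRef + ")");
    }

    public static void main(String[] args) {
        Set<String> below = Set.of("IS NOT NULL($0)");
        System.out.println(shouldAddNotNullOverExpr(0, below)); // false
        System.out.println(shouldAddNotNullOverExpr(1, below)); // true
    }
}
```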
[jira] [Updated] (HIVE-25275) OOM during query planning due to HiveJoinPushTransitivePredicatesRule matching infinitely
[ https://issues.apache.org/jira/browse/HIVE-25275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HIVE-25275:
----------------------------------
    Labels: pull-request-available  (was: )

> OOM during query planning due to HiveJoinPushTransitivePredicatesRule
> matching infinitely
> ---
>
> Key: HIVE-25275
> URL: https://issues.apache.org/jira/browse/HIVE-25275
> Project: Hive
> Issue Type: Bug
> Reporter: László Pintér
> Assignee: Stamatis Zampetakis
> Priority: Major
> Labels: pull-request-available
> Time Spent: 10m
> Remaining Estimate: 0h
>
> While running the following query, an OOM is raised during the planning phase:
> {code:sql}
> CREATE TABLE A (`value_date` date) STORED AS ORC;
> CREATE TABLE B (`business_date` date) STORED AS ORC;
> SELECT A.VALUE_DATE
> FROM A, B
> WHERE A.VALUE_DATE = BUSINESS_DATE
> AND A.VALUE_DATE = TRUNC(BUSINESS_DATE, 'MONTH');
> {code}
[jira] [Updated] (HIVE-25444) Make tables based on storage handlers authorization (HIVE-24705) configurable.
[ https://issues.apache.org/jira/browse/HIVE-25444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sai Hemanth Gantasala updated HIVE-25444:
-----------------------------------------
    Description:
Using a config "hive.security.authorization.tables.on.storagehandlers" with
default true, we'll enable the authorization on storage handlers by default.
Authorization is disabled if this config is set to true.

Background: Previously, whenever a user tried to create a table based on a
storage handler, the end user seen by the external storage (e.g. HBase, Kafka,
or Druid) was 'hive', so we could not really enforce conditions in Ranger on
the end user. https://issues.apache.org/jira/browse/HIVE-24705 solved this
security issue by enforcing a check in Apache Ranger for the Hive service. That
patch had changes in both Hive and Ranger (the Ranger client depends on the
Hive changes.) Now, the reason why I'm disabling this feature by default is
that users can update the Hive code but not the Ranger code. In that case,
users see a permission-denied error when executing a statement like {{CREATE
TABLE hive_table_0(key int, value string) STORED BY
'org.apache.hadoop.hive.hbase.HBaseStorageHandler'}}, but the user/admin cannot
add a Ranger policy in Hive because the Ranger code is not updated. This way
we'll unblock users from creating tables based on storage handlers as they were
previously doing. Users can turn this config on if they have updated the Ranger
code.

  was: the same text with default false, i.e. authorization on storage
handlers disabled by default and enabled when the config is set to true.

> Make tables based on storage handlers authorization (HIVE-24705) configurable.
> ---
>
> Key: HIVE-25444
> URL: https://issues.apache.org/jira/browse/HIVE-25444
> Project: Hive
> Issue Type: Improvement
> Components: HiveServer2
> Reporter: Sai Hemanth Gantasala
> Assignee: Sai Hemanth Gantasala
> Priority: Major
>
[jira] [Updated] (HIVE-25444) Make tables based on storage handlers authorization (HIVE-24705) configurable.
[ https://issues.apache.org/jira/browse/HIVE-25444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sai Hemanth Gantasala updated HIVE-25444:
-----------------------------------------
    Summary: Make tables based on storage handlers authorization (HIVE-24705)
configurable.  (was: Use a config to disable authorization on tables based on
storage handlers by default.)

> Make tables based on storage handlers authorization (HIVE-24705) configurable.
> ---
>
> Key: HIVE-25444
> URL: https://issues.apache.org/jira/browse/HIVE-25444
> Project: Hive
> Issue Type: Improvement
> Components: HiveServer2
> Reporter: Sai Hemanth Gantasala
> Assignee: Sai Hemanth Gantasala
> Priority: Major
>
[jira] [Updated] (HIVE-25444) Use a config to disable authorization on tables based on storage handlers by default.
[ https://issues.apache.org/jira/browse/HIVE-25444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sai Hemanth Gantasala updated HIVE-25444: - Description: Using a config "hive.security.authorization.tables.on.storagehandlers" with default false, we'll disable the authorization on storage handlers by default. Authorization is enabled if this config is set to true. Background: Previously, whenever a user tried to create a table based on a storage handler, the end user we see in the external storage (e.g. hbase, kafka, and druid) is ‘hive’, so we cannot really enforce the condition in ranger on the end-user. https://issues.apache.org/jira/browse/HIVE-24705 solved this security issue by enforcing a check in Apache ranger for the hive service. This patch had changes in both hive and ranger. (The ranger client depends on the hive changes.) Now the reason why I’m disabling this feature by default is that users can update hive code but not ranger code. In that case, users see a permission denied error when executing a statement like: {{CREATE TABLE hive_table_0(key int, value string) STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'}} but the user/admin cannot add a ranger policy in hive because the ranger code is not updated. This way we’ll unblock users from creating tables based on storage handlers as they were previously doing. Users can turn on this config if they have updated ranger code. was: Using a config "hive.security.authorization.tables.on.storagehandlers" with a default false, we'll disable the authorization on storage handlers by default. Authorization is enabled if this config is set to true. Back > Use a config to disable authorization on tables based on storage handlers by > default. 
> - > > Key: HIVE-25444 > URL: https://issues.apache.org/jira/browse/HIVE-25444 > Project: Hive > Issue Type: Improvement > Components: HiveServer2 >Reporter: Sai Hemanth Gantasala >Assignee: Sai Hemanth Gantasala >Priority: Major > > Using a config "hive.security.authorization.tables.on.storagehandlers" with > default false, we'll disable the authorization on storage handlers by > default. Authorization is enabled if this config is set to true. > Background: Previously, whenever a user tried to create a table based on > a storage handler, the end user we see in the external storage (e.g. > hbase, kafka, and druid) is ‘hive’, so we cannot really enforce the condition > in ranger on the end-user. > https://issues.apache.org/jira/browse/HIVE-24705 solved this security issue > by enforcing a check in Apache ranger for the hive service. This patch had > changes in both hive and ranger. (The ranger client depends on the hive changes.) Now > the reason why I’m disabling this feature by default is that users can > update hive code but not ranger code. In that case, users see a permission > denied error when executing a statement like: {{CREATE TABLE hive_table_0(key > int, value string) STORED BY > 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'}} but the user/admin cannot > add a ranger policy in hive because the ranger code is not updated. This way > we’ll unblock users from creating tables based on storage handlers as they > were previously doing. Users can turn on this config if they have updated > ranger code. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-25444) Use a config to disable authorization on tables based on storage handlers by default.
[ https://issues.apache.org/jira/browse/HIVE-25444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sai Hemanth Gantasala updated HIVE-25444: - Summary: Use a config to disable authorization on tables based on storage handlers by default. (was: Use a config to disable authorization on storage handlers by default.) > Use a config to disable authorization on tables based on storage handlers by > default. > - > > Key: HIVE-25444 > URL: https://issues.apache.org/jira/browse/HIVE-25444 > Project: Hive > Issue Type: Improvement > Components: HiveServer2 >Reporter: Sai Hemanth Gantasala >Assignee: Sai Hemanth Gantasala >Priority: Major > > Using a config "hive.security.authorization.tables.on.storagehandlers" with a > default false, we'll disable the authorization on storage handlers by > default. Authorization is enabled if this config is set to true. > Back -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-25444) Use a config to disable authorization on storage handlers by default.
[ https://issues.apache.org/jira/browse/HIVE-25444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sai Hemanth Gantasala updated HIVE-25444: - Description: Using a config "hive.security.authorization.tables.on.storagehandlers" with a default false, we'll disable the authorization on storage handlers by default. Authorization is enabled if this config is set to true. Back was:Using a config "hive.security.authorization.tables.on.storagehandlers" with a default false, we'll enable the authorization on storage handlers by default. Authorization is enabled if this config is set to true. > Use a config to disable authorization on storage handlers by default. > - > > Key: HIVE-25444 > URL: https://issues.apache.org/jira/browse/HIVE-25444 > Project: Hive > Issue Type: Improvement > Components: HiveServer2 >Reporter: Sai Hemanth Gantasala >Assignee: Sai Hemanth Gantasala >Priority: Major > > Using a config "hive.security.authorization.tables.on.storagehandlers" with a > default false, we'll disable the authorization on storage handlers by > default. Authorization is enabled if this config is set to true. > Back -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24705) Create/Alter/Drop tables based on storage handlers in HS2 should be authorized by Ranger/Sentry
[ https://issues.apache.org/jira/browse/HIVE-24705?focusedWorklogId=637003&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-637003 ] ASF GitHub Bot logged work on HIVE-24705: - Author: ASF GitHub Bot Created on: 11/Aug/21 17:37 Start Date: 11/Aug/21 17:37 Worklog Time Spent: 10m Work Description: saihemanth-cloudera closed pull request #1960: URL: https://github.com/apache/hive/pull/1960 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 637003) Time Spent: 1h 50m (was: 1h 40m) > Create/Alter/Drop tables based on storage handlers in HS2 should be > authorized by Ranger/Sentry > --- > > Key: HIVE-24705 > URL: https://issues.apache.org/jira/browse/HIVE-24705 > Project: Hive > Issue Type: Improvement >Reporter: Sai Hemanth Gantasala >Assignee: Sai Hemanth Gantasala >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 1h 50m > Remaining Estimate: 0h > > With doAs=false in Hive3.x, whenever a user is trying to create a table based > on storage handlers on external storage, for ex: an HBase table, the end user we > are seeing is hive, so we cannot really enforce the condition in Apache > Ranger/Sentry on the end-user. So, we need to enforce this condition in > hive in the event of create/alter/drop tables based on storage handlers. > Built-in hive storage handlers like HbaseStorageHandler, KafkaStorageHandler, > etc. should implement a method getURIForAuthentication() which returns a URI > that is formed from table properties. This URI can be sent for authorization > to Ranger/Sentry. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HIVE-25444) Use a config to disable authorization on storage handlers by default.
[ https://issues.apache.org/jira/browse/HIVE-25444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sai Hemanth Gantasala reassigned HIVE-25444: > Use a config to disable authorization on storage handlers by default. > - > > Key: HIVE-25444 > URL: https://issues.apache.org/jira/browse/HIVE-25444 > Project: Hive > Issue Type: Improvement > Components: HiveServer2 >Reporter: Sai Hemanth Gantasala >Assignee: Sai Hemanth Gantasala >Priority: Major > > Using a config "hive.security.authorization.tables.on.storagehandlers" with a > default false, we'll enable the authorization on storage handlers by default. > Authorization is enabled if this config is set to true. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25441) Incorrect deltas split for sub-compactions when using `hive.compactor.max.num.delta`
[ https://issues.apache.org/jira/browse/HIVE-25441?focusedWorklogId=636954&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-636954 ] ASF GitHub Bot logged work on HIVE-25441: - Author: ASF GitHub Bot Created on: 11/Aug/21 15:52 Start Date: 11/Aug/21 15:52 Worklog Time Spent: 10m Work Description: klcopp commented on a change in pull request #2579: URL: https://github.com/apache/hive/pull/2579#discussion_r686962046 ## File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorMR.java ## @@ -234,12 +234,24 @@ void run(HiveConf conf, String jobName, Table t, Partition p, StorageDescriptor "especially if this message repeats. Check that compaction is running properly. Check for any " + "runaway/mis-configured process writing to ACID tables, especially using Streaming Ingest API."); int numMinorCompactions = parsedDeltas.size() / maxDeltasToHandle; + parsedDeltas.sort(AcidUtils.ParsedDeltaLight::compareTo); + + int start = 0; + int end = maxDeltasToHandle; + for (int jobSubId = 0; jobSubId < numMinorCompactions; jobSubId++) { +while (parsedDeltas.get(end).getMinWriteId() == parsedDeltas.get(end - 1).getMinWriteId() && + parsedDeltas.get(end).getMaxWriteId() == parsedDeltas.get(end - 1).getMaxWriteId()) { Review comment: Oh, thank you for explaining! It might be nice to include a comment about this – but that's just a suggestion. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 636954) Time Spent: 1h (was: 50m) > Incorrect deltas split for sub-compactions when using > `hive.compactor.max.num.delta` > > > Key: HIVE-25441 > URL: https://issues.apache.org/jira/browse/HIVE-25441 > Project: Hive > Issue Type: Task >Reporter: Denys Kuzmenko >Priority: Major > Labels: pull-request-available > Time Spent: 1h > Remaining Estimate: 0h > > {code} > #Repro steps: > #1./ set hive.compactor.max.num.delta to 5 on HMS > #2./ Set up the table > set hive.merge.cardinality.check=false; > create table test (k int); > ALTER TABLE test SET TBLPROPERTIES ('NO_AUTO_COMPACTION'='true'); > insert into test values (1); > alter table test compact 'major' and wait; > dfs -ls '/warehouse/tablespace/managed/hive/test'; > # drwxrwx---+ - hive hive 0 2021-08-09 12:26 > /warehouse/tablespace/managed/hive/test/base_008_v416 > select * from test; > # k=1 > #run 3 times so there's enough delta dirs, ie. 
6 (should just increase k by 1) > #basically just removes the row and adds a new row with k+1 value > MERGE INTO test AS T USING (select * from test union all select k+1 from > test) AS S > ON T.k=s.k > WHEN MATCHED THEN DELETE > WHEN not MATCHED THEN INSERT values (s.k); > select * from test; > #k=4 > dfs -ls '/warehouse/tablespace/managed/hive/test'; > #drwxrwx---+ - hive hive 0 2021-08-09 12:26 > /warehouse/tablespace/managed/hive/test/base_008_v416 > #drwxrwx---+ - hive hive 0 2021-08-09 12:28 > /warehouse/tablespace/managed/hive/test/delete_delta_009_009_0001 > #drwxrwx---+ - hive hive 0 2021-08-09 12:29 > /warehouse/tablespace/managed/hive/test/delete_delta_010_010_0001 > #drwxrwx---+ - hive hive 0 2021-08-09 12:29 > /warehouse/tablespace/managed/hive/test/delete_delta_011_011_0001 > #drwxrwx---+ - hive hive 0 2021-08-09 12:28 > /warehouse/tablespace/managed/hive/test/delta_009_009_0003 > #drwxrwx---+ - hive hive 0 2021-08-09 12:29 > /warehouse/tablespace/managed/hive/test/delta_010_010_0003 > #drwxrwx---+ - hive hive 0 2021-08-09 12:29 > /warehouse/tablespace/managed/hive/test/delta_011_011_0003 > alter table test compact 'major' and wait; > select * from test; > #result is empty > dfs -ls '/warehouse/tablespace/managed/hive/test'; > #2drwxrwx---+ - hive hive 0 2021-08-09 12:31 > /warehouse/tablespace/managed/hive/test/base_011_v428 > {code} > Some logs from the above example: > {code} > 2021-08-09 12:30:37,532 WARN > org.apache.hadoop.hive.ql.txn.compactor.CompactorMR: > [nightly-7x-us-2-2.nightly-7x-us-2.root.hwx.site-49_executor]: 6 delta files > found for default.test located at > hdfs://nightly-7x-us-2-2.nightly-7x-us-2.root.hwx.site:8020/warehouse/tablespace/managed/hive/test! > This is likely a sign
[jira] [Work logged] (HIVE-25441) Incorrect deltas split for sub-compactions when using `hive.compactor.max.num.delta`
[ https://issues.apache.org/jira/browse/HIVE-25441?focusedWorklogId=636952&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-636952 ] ASF GitHub Bot logged work on HIVE-25441: - Author: ASF GitHub Bot Created on: 11/Aug/21 15:50 Start Date: 11/Aug/21 15:50 Worklog Time Spent: 10m Work Description: deniskuzZ commented on a change in pull request #2579: URL: https://github.com/apache/hive/pull/2579#discussion_r686960430 ## File path: itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCompactor.java ## @@ -781,6 +782,67 @@ public void autoCompactOnStreamingIngestWithDynamicPartition() throws Exception } } + @Test + public void testNoDataLossWhenMaxNumDeltaIsUsed() throws Exception { +String dbName = "default"; +String tblName = "cws"; +executeStatementOnDriver("drop table if exists " + tblName, driver); + +executeStatementOnDriver("CREATE TABLE " + tblName + "(a INT, b STRING) " + + " STORED AS ORC TBLPROPERTIES ('transactional'='true')", driver); +executeStatementOnDriver("insert into " + tblName + " values (1, 'a')", driver); +executeStatementOnDriver("insert into " + tblName + " values (3, 'b')", driver); + +runMajorCompaction(dbName, tblName); +runCleaner(conf); + +for (int i = 0; i < 3; i++) { + executeStatementOnDriver("MERGE INTO " + tblName + " AS T USING (" + +"select * from " + tblName + " union all select a+1, b from " + tblName + ") AS S " + +"ON T.a=s.a " + +"WHEN MATCHED THEN DELETE " + +"WHEN not MATCHED THEN INSERT values (s.a, s.b)", driver); +} + +driver.run("select a from " + tblName); +List<String> res = new ArrayList<>(); +driver.getFetchTask().fetch(res); +Assert.assertEquals(res, Arrays.asList("4", "6")); + +conf.setIntVar(HiveConf.ConfVars.COMPACTOR_MAX_NUM_DELTA, 5); +runMajorCompaction(dbName, tblName); + +List<String> matchesNotFound = new ArrayList<>(5); +matchesNotFound.add(AcidUtils.deleteDeltaSubdir(3, 4) + VISIBILITY_PATTERN); +matchesNotFound.add(AcidUtils.deltaSubdir(3, 4) + VISIBILITY_PATTERN); 
+matchesNotFound.add(AcidUtils.deleteDeltaSubdir(5, 5, 0)); +matchesNotFound.add(AcidUtils.deltaSubdir(5, 5, 1)); +matchesNotFound.add(AcidUtils.baseDir(5) + VISIBILITY_PATTERN); + +IMetaStoreClient msClient = new HiveMetaStoreClient(conf); +Table table = msClient.getTable(dbName, tblName); +msClient.close(); + +FileSystem fs = FileSystem.get(conf); +FileStatus[] stat = fs.listStatus(new Path(table.getSd().getLocation())); + +for (FileStatus f : stat) { + for (int j = 0; j < matchesNotFound.size(); j++) { +if (f.getPath().getName().matches(matchesNotFound.get(j))) { + matchesNotFound.remove(j); + break; +} + } +} +Assert.assertEquals("Matches Not Found: " + matchesNotFound.toArray(), 0, matchesNotFound.size()); Review comment: fixed -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 636952) Time Spent: 50m (was: 40m) > Incorrect deltas split for sub-compactions when using > `hive.compactor.max.num.delta` > > > Key: HIVE-25441 > URL: https://issues.apache.org/jira/browse/HIVE-25441 > Project: Hive > Issue Type: Task >Reporter: Denys Kuzmenko >Priority: Major > Labels: pull-request-available > Time Spent: 50m > Remaining Estimate: 0h > > {code} > #Repro steps: > #1./ set hive.compactor.max.num.delta to 5 on HMS > #2./ Set up the table > set hive.merge.cardinality.check=false; > create table test (k int); > ALTER TABLE test SET TBLPROPERTIES ('NO_AUTO_COMPACTION'='true'); > insert into test values (1); > alter table test compact 'major' and wait; > dfs -ls '/warehouse/tablespace/managed/hive/test'; > # drwxrwx---+ - hive hive 0 2021-08-09 12:26 > /warehouse/tablespace/managed/hive/test/base_008_v416 > select * from test; > # k=1 > #run 3 times so there's enough delta 
dirs, ie. 6 (should just increase k by 1) > #basically just removes the row and adds a new row with k+1 value > MERGE INTO test AS T USING (select * from test union all select k+1 from > test) AS S > ON T.k=s.k > WHEN MATCHED THEN DELETE > WHEN not MATCHED THEN INSERT values (s.k); > select * from test; > #k=4 > dfs -ls '/warehouse/tablespace/managed/hive/test'; > #drwxrwx---+ - hive hive 0
[jira] [Work logged] (HIVE-25441) Incorrect deltas split for sub-compactions when using `hive.compactor.max.num.delta`
[ https://issues.apache.org/jira/browse/HIVE-25441?focusedWorklogId=636951&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-636951 ] ASF GitHub Bot logged work on HIVE-25441: - Author: ASF GitHub Bot Created on: 11/Aug/21 15:44 Start Date: 11/Aug/21 15:44 Worklog Time Spent: 10m Work Description: deniskuzZ commented on a change in pull request #2579: URL: https://github.com/apache/hive/pull/2579#discussion_r686955507 ## File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorMR.java ## @@ -234,12 +234,24 @@ void run(HiveConf conf, String jobName, Table t, Partition p, StorageDescriptor "especially if this message repeats. Check that compaction is running properly. Check for any " + "runaway/mis-configured process writing to ACID tables, especially using Streaming Ingest API."); int numMinorCompactions = parsedDeltas.size() / maxDeltasToHandle; + parsedDeltas.sort(AcidUtils.ParsedDeltaLight::compareTo); + + int start = 0; + int end = maxDeltasToHandle; + for (int jobSubId = 0; jobSubId < numMinorCompactions; jobSubId++) { +while (parsedDeltas.get(end).getMinWriteId() == parsedDeltas.get(end - 1).getMinWriteId() && + parsedDeltas.get(end).getMaxWriteId() == parsedDeltas.get(end - 1).getMaxWriteId()) { Review comment: it removes deltas for statements that went over the batch size. ( [... delta_5_5_1 ] [delta_5_5_2, ...]) - we should process multiple statements for the same writeIds range in the same batch, in this scenario delta_5_5_1 should be included in the next batch. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 636951) Time Spent: 40m (was: 0.5h) > Incorrect deltas split for sub-compactions when using > `hive.compactor.max.num.delta` > > > Key: HIVE-25441 > URL: https://issues.apache.org/jira/browse/HIVE-25441 > Project: Hive > Issue Type: Task >Reporter: Denys Kuzmenko >Priority: Major > Labels: pull-request-available > Time Spent: 40m > Remaining Estimate: 0h > > {code} > #Repro steps: > #1./ set hive.compactor.max.num.delta to 5 on HMS > #2./ Set up the table > set hive.merge.cardinality.check=false; > create table test (k int); > ALTER TABLE test SET TBLPROPERTIES ('NO_AUTO_COMPACTION'='true'); > insert into test values (1); > alter table test compact 'major' and wait; > dfs -ls '/warehouse/tablespace/managed/hive/test'; > # drwxrwx---+ - hive hive 0 2021-08-09 12:26 > /warehouse/tablespace/managed/hive/test/base_008_v416 > select * from test; > # k=1 > #run 3 times so there's enough delta dirs, ie. 
6 (should just increase k by 1) > #basically just removes the row and adds a new row with k+1 value > MERGE INTO test AS T USING (select * from test union all select k+1 from > test) AS S > ON T.k=s.k > WHEN MATCHED THEN DELETE > WHEN not MATCHED THEN INSERT values (s.k); > select * from test; > #k=4 > dfs -ls '/warehouse/tablespace/managed/hive/test'; > #drwxrwx---+ - hive hive 0 2021-08-09 12:26 > /warehouse/tablespace/managed/hive/test/base_008_v416 > #drwxrwx---+ - hive hive 0 2021-08-09 12:28 > /warehouse/tablespace/managed/hive/test/delete_delta_009_009_0001 > #drwxrwx---+ - hive hive 0 2021-08-09 12:29 > /warehouse/tablespace/managed/hive/test/delete_delta_010_010_0001 > #drwxrwx---+ - hive hive 0 2021-08-09 12:29 > /warehouse/tablespace/managed/hive/test/delete_delta_011_011_0001 > #drwxrwx---+ - hive hive 0 2021-08-09 12:28 > /warehouse/tablespace/managed/hive/test/delta_009_009_0003 > #drwxrwx---+ - hive hive 0 2021-08-09 12:29 > /warehouse/tablespace/managed/hive/test/delta_010_010_0003 > #drwxrwx---+ - hive hive 0 2021-08-09 12:29 > /warehouse/tablespace/managed/hive/test/delta_011_011_0003 > alter table test compact 'major' and wait; > select * from test; > #result is empty > dfs -ls '/warehouse/tablespace/managed/hive/test'; > #2drwxrwx---+ - hive hive 0 2021-08-09 12:31 > /warehouse/tablespace/managed/hive/test/base_011_v428 > {code} > Some logs from the above example: > {code} > 2021-08-09 12:30:37,532 WARN > org.apache.hadoop.hive.ql.txn.compactor.CompactorMR: > [nightly-7x-us-2-2.nightly-7x-us-2.root.hwx.site-49_executor]: 6 delta files > found
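[Editor's illustration] The reviewer's explanation in the worklog above (deltas sharing one writeId range must stay in a single sub-compaction batch, so a delta like delta_5_5_1 is deferred to the next batch rather than split off) can be sketched as follows. This is a simplified model, not Hive's actual CompactorMR code: each delta is a plain {minWriteId, maxWriteId} pair and all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class DeltaBatchSketch {
    // Splits a sorted list of {minWriteId, maxWriteId} deltas into batches of
    // at most maxDeltas entries, pulling the boundary back whenever it would
    // split a run of deltas that share the same writeId range, so the whole
    // run lands in the next batch instead.
    static List<List<long[]>> split(List<long[]> deltas, int maxDeltas) {
        List<List<long[]>> batches = new ArrayList<>();
        int start = 0;
        while (start < deltas.size()) {
            int end = Math.min(start + maxDeltas, deltas.size());
            // back off while the boundary element has the same range as the
            // one before it, deferring the equal-range run to the next batch
            while (end > start + 1 && end < deltas.size()
                    && deltas.get(end)[0] == deltas.get(end - 1)[0]
                    && deltas.get(end)[1] == deltas.get(end - 1)[1]) {
                end--;
            }
            batches.add(new ArrayList<>(deltas.subList(start, end)));
            start = end;
        }
        return batches;
    }
}
```

With maxDeltas = 5 and ranges (1,1)…(4,4),(5,5),(5,5),(6,6), the first batch stops at four entries so both (5,5) deltas go into the second batch together.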
[jira] [Work logged] (HIVE-25441) Incorrect deltas split for sub-compactions when using `hive.compactor.max.num.delta`
[ https://issues.apache.org/jira/browse/HIVE-25441?focusedWorklogId=636928=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-636928 ] ASF GitHub Bot logged work on HIVE-25441: - Author: ASF GitHub Bot Created on: 11/Aug/21 15:10 Start Date: 11/Aug/21 15:10 Worklog Time Spent: 10m Work Description: klcopp commented on a change in pull request #2579: URL: https://github.com/apache/hive/pull/2579#discussion_r686914879 ## File path: itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCompactor.java ## @@ -781,6 +782,67 @@ public void autoCompactOnStreamingIngestWithDynamicPartition() throws Exception } } + @Test + public void testNoDataLossWhenMaxNumDeltaIsUsed() throws Exception { +String dbName = "default"; +String tblName = "cws"; +executeStatementOnDriver("drop table if exists " + tblName, driver); + +executeStatementOnDriver("CREATE TABLE " + tblName + "(a INT, b STRING) " + + " STORED AS ORC TBLPROPERTIES ('transactional'='true')", driver); +executeStatementOnDriver("insert into " + tblName + " values (1, 'a')", driver); +executeStatementOnDriver("insert into " + tblName + " values (3, 'b')", driver); + +runMajorCompaction(dbName, tblName); +runCleaner(conf); + +for (int i = 0; i < 3; i++) { + executeStatementOnDriver("MERGE INTO " + tblName + " AS T USING (" + +"select * from " + tblName + " union all select a+1, b from " + tblName + ") AS S " + +"ON T.a=s.a " + +"WHEN MATCHED THEN DELETE " + +"WHEN not MATCHED THEN INSERT values (s.a, s.b)", driver); +} + +driver.run("select a from " + tblName); +List res = new ArrayList<>(); +driver.getFetchTask().fetch(res); +Assert.assertEquals(res, Arrays.asList("4", "6")); + +conf.setIntVar(HiveConf.ConfVars.COMPACTOR_MAX_NUM_DELTA, 5); +runMajorCompaction(dbName, tblName); + +List matchesNotFound = new ArrayList<>(5); +matchesNotFound.add(AcidUtils.deleteDeltaSubdir(3, 4) + VISIBILITY_PATTERN); +matchesNotFound.add(AcidUtils.deltaSubdir(3, 4) + VISIBILITY_PATTERN); 
+matchesNotFound.add(AcidUtils.deleteDeltaSubdir(5, 5, 0)); +matchesNotFound.add(AcidUtils.deltaSubdir(5, 5, 1)); +matchesNotFound.add(AcidUtils.baseDir(5) + VISIBILITY_PATTERN); + +IMetaStoreClient msClient = new HiveMetaStoreClient(conf); +Table table = msClient.getTable(dbName, tblName); +msClient.close(); + +FileSystem fs = FileSystem.get(conf); +FileStatus[] stat = fs.listStatus(new Path(table.getSd().getLocation())); + +for (FileStatus f : stat) { + for (int j = 0; j < matchesNotFound.size(); j++) { +if (f.getPath().getName().matches(matchesNotFound.get(j))) { + matchesNotFound.remove(j); + break; +} + } +} +Assert.assertEquals("Matches Not Found: " + matchesNotFound.toArray(), 0, matchesNotFound.size()); Review comment: matchesNotFound.toArray() should be Arrays.toString(matchesNotFound.toArray()) to display contents ## File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorMR.java ## @@ -234,12 +234,24 @@ void run(HiveConf conf, String jobName, Table t, Partition p, StorageDescriptor "especially if this message repeats. Check that compaction is running properly. Check for any " + "runaway/mis-configured process writing to ACID tables, especially using Streaming Ingest API."); int numMinorCompactions = parsedDeltas.size() / maxDeltasToHandle; + parsedDeltas.sort(AcidUtils.ParsedDeltaLight::compareTo); + + int start = 0; + int end = maxDeltasToHandle; + for (int jobSubId = 0; jobSubId < numMinorCompactions; jobSubId++) { +while (parsedDeltas.get(end).getMinWriteId() == parsedDeltas.get(end - 1).getMinWriteId() && + parsedDeltas.get(end).getMaxWriteId() == parsedDeltas.get(end - 1).getMaxWriteId()) { Review comment: Sorry I don't get this part. What does it do? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 636928) Time Spent: 0.5h (was: 20m) > Incorrect deltas split for sub-compactions when using > `hive.compactor.max.num.delta` > > > Key: HIVE-25441 > URL: https://issues.apache.org/jira/browse/HIVE-25441 > Project: Hive > Issue Type: Task >Reporter: Denys Kuzmenko >Priority: Major >
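[Editor's illustration] The review suggestion above about wrapping `matchesNotFound.toArray()` in `Arrays.toString` is easy to demonstrate: concatenating an `Object[]` into a string only prints its array type and identity hash code, hiding the contents, while `Arrays.toString` renders the elements. A minimal standalone demo:

```java
import java.util.Arrays;
import java.util.List;

public class ToStringDemo {
    public static void main(String[] args) {
        List<String> missing = List.of("delta_5_5_1", "base_5");
        // Arrays inherit Object.toString(), so this prints something like
        // "[Ljava.lang.Object;@<hash>" rather than the elements:
        System.out.println("Matches Not Found: " + missing.toArray());
        // Arrays.toString renders the elements: "[delta_5_5_1, base_5]"
        System.out.println("Matches Not Found: " + Arrays.toString(missing.toArray()));
    }
}
```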
[jira] [Work logged] (HIVE-25429) Delta metrics collection may cause number of tez counters to exceed tez.counters.max limit
[ https://issues.apache.org/jira/browse/HIVE-25429?focusedWorklogId=636917&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-636917 ] ASF GitHub Bot logged work on HIVE-25429: - Author: ASF GitHub Bot Created on: 11/Aug/21 14:19 Start Date: 11/Aug/21 14:19 Worklog Time Spent: 10m Work Description: klcopp commented on a change in pull request #2563: URL: https://github.com/apache/hive/pull/2563#discussion_r686876814 ## File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/metrics/DeltaFilesMetricReporter.java ## @@ -136,67 +133,77 @@ public static synchronized void init(HiveConf conf) throws Exception { private void configure(HiveConf conf) throws Exception { acidMetricsExtEnabled = MetastoreConf.getBoolVar(conf, MetastoreConf.ConfVars.METASTORE_ACIDMETRICS_EXT_ON); +if (acidMetricsExtEnabled) { -deltasThreshold = HiveConf.getIntVar(conf, HiveConf.ConfVars.HIVE_TXN_ACID_METRICS_DELTA_NUM_THRESHOLD); -obsoleteDeltasThreshold = HiveConf.getIntVar(conf, HiveConf.ConfVars.HIVE_TXN_ACID_METRICS_OBSOLETE_DELTA_NUM_THRESHOLD); - -initCachesForMetrics(conf); -initObjectsForMetrics(); + initCachesForMetrics(conf); + initObjectsForMetrics(); -long reportingInterval = HiveConf.getTimeVar(conf, - HiveConf.ConfVars.HIVE_TXN_ACID_METRICS_REPORTING_INTERVAL, TimeUnit.SECONDS); + long reportingInterval = + HiveConf.getTimeVar(conf, HiveConf.ConfVars.HIVE_TXN_ACID_METRICS_REPORTING_INTERVAL, TimeUnit.SECONDS); -ThreadFactory threadFactory = - new ThreadFactoryBuilder() -.setDaemon(true) -.setNameFormat("DeltaFilesMetricReporter %d") -.build(); -executorService = Executors.newSingleThreadScheduledExecutor(threadFactory); -executorService.scheduleAtFixedRate( - new ReportingTask(), 0, reportingInterval, TimeUnit.SECONDS); + ThreadFactory threadFactory = + new ThreadFactoryBuilder().setDaemon(true).setNameFormat("DeltaFilesMetricReporter %d").build(); + executorService = Executors.newSingleThreadScheduledExecutor(threadFactory); + 
executorService.scheduleAtFixedRate(new ReportingTask(), 0, reportingInterval, TimeUnit.SECONDS); -LOG.info("Started DeltaFilesMetricReporter thread"); + LOG.info("Started DeltaFilesMetricReporter thread"); +} } public void submit(TezCounters counters) { if (acidMetricsExtEnabled) { - updateMetrics(NUM_OBSOLETE_DELTAS, -obsoleteDeltaCache, obsoleteDeltaTopN, obsoleteDeltasThreshold, -counters); - updateMetrics(NUM_DELTAS, -deltaCache, deltaTopN, deltasThreshold, -counters); - updateMetrics(NUM_SMALL_DELTAS, -smallDeltaCache, smallDeltaTopN, deltasThreshold, -counters); + updateMetrics(NUM_OBSOLETE_DELTAS, obsoleteDeltaCache, obsoleteDeltaTopN, counters); + updateMetrics(NUM_DELTAS, deltaCache, deltaTopN, counters); + updateMetrics(NUM_SMALL_DELTAS, smallDeltaCache, smallDeltaTopN, counters); } } + /** + * Copy counters to caches. + */ private void updateMetrics(DeltaFilesMetricType metric, Cache<String, Integer> cache, Queue<Pair<String, Integer>> topN, -int threshold, TezCounters counters) { -counters.getGroup(metric.value).forEach(counter -> { - Integer prev = cache.getIfPresent(counter.getName()); - if (prev != null && prev != counter.getValue()) { -cache.invalidate(counter.getName()); + TezCounters counters) { +try { + CounterGroup group = counters.getGroup(metric.value); + // if the group is empty, clear the cache + if (group.size() == 0) { +cache.invalidateAll(); + } else { +// if there is no counter corresponding to a cache entry, remove from cache +ConcurrentMap<String, Integer> cacheMap = cache.asMap(); +cacheMap.keySet().stream().filter(key -> counters.findCounter(group.getName(), key).getValue() == 0) +.forEach(cache::invalidate); } - if (counter.getValue() > threshold) { -if (topN.size() == maxCacheSize) { - Pair<String, Integer> lowest = topN.peek(); - if (lowest != null && counter.getValue() > lowest.getValue()) { -cache.invalidate(lowest.getKey()); - } -} -if (topN.size() < maxCacheSize) { - topN.add(Pair.of(counter.getName(), (int) counter.getValue())); - cache.put(counter.getName(), (int) counter.getValue()); + // 
update existing cache entries or add new entries + for (TezCounter counter : group) { +Integer prev = cache.getIfPresent(counter.getName()); +if (prev != null && prev != counter.getValue()) { + cache.invalidate(counter.getName()); } +topN.add(Pair.of(counter.getName(), (int) counter.getValue())); Review comment: mergeDeltaFilesStats filters for whether the partition
[jira] [Work started] (HIVE-25443) Arrow SerDe Cannot serialize/deserialize complex data types When there are more than 1024 values
[ https://issues.apache.org/jira/browse/HIVE-25443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HIVE-25443 started by Syed Shameerur Rahman. > Arrow SerDe Cannot serialize/deserialize complex data types When there are > more than 1024 values > > > Key: HIVE-25443 > URL: https://issues.apache.org/jira/browse/HIVE-25443 > Project: Hive > Issue Type: Bug > Components: Serializers/Deserializers >Affects Versions: 3.1.0, 3.0.0, 3.1.1, 3.1.2 >Reporter: Syed Shameerur Rahman >Assignee: Syed Shameerur Rahman >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 10m > Remaining Estimate: 0h > > Complex data types like MAP, STRUCT cannot be serialized/deserialized using > Arrow SerDe when there are more than 1024 values. This happens due to > ColumnVector always being initialized with a size of 1024. > Issue #1 : > https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/arrow/ArrowColumnarBatchSerDe.java#L213 > Issue #2 : > https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/arrow/ArrowColumnarBatchSerDe.java#L215 > Sample unit test to reproduce the case in TestArrowColumnarBatchSerDe : > {code:java} > @Test >public void testListBooleanWithMoreThan1024Values() throws SerDeException { > String[][] schema = { > {"boolean_list", "array<boolean>"}, > }; > > Object[][] rows = new Object[1025][1]; > for (int i = 0; i < 1025; i++) { >rows[i][0] = new BooleanWritable(true); > } > > initAndSerializeAndDeserialize(schema, toList(rows)); >} > > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
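[Editor's illustration] The failure mode described above (a vector sized to a hard-coded 1024-entry default overflowing once a 1025th value arrives) can be modeled with a small standalone sketch. The class and method names below are illustrative and deliberately not Hive's ColumnVector API; they only mirror the shape of the bug and of the proposed fix, which sizes the vector to the actual record count.

```java
public class FixedSizeVectorSketch {
    static final int DEFAULT_SIZE = 1024; // mirrors the hard-coded default

    // Copies values into a backing array. With sizeToInput=false the array is
    // always DEFAULT_SIZE long, so writing the 1025th value throws
    // ArrayIndexOutOfBoundsException; sizing to the input avoids the overflow.
    static boolean[] fill(boolean[] values, boolean sizeToInput) {
        boolean[] vector = new boolean[sizeToInput ? values.length : DEFAULT_SIZE];
        for (int i = 0; i < values.length; i++) {
            vector[i] = values[i];
        }
        return vector;
    }
}
```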
[jira] [Commented] (HIVE-25443) Arrow SerDe Cannot serialize/deserialize complex data types When there are more than 1024 values
[ https://issues.apache.org/jira/browse/HIVE-25443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17397249#comment-17397249 ] Syed Shameerur Rahman commented on HIVE-25443: -- [~kgyrtkirk] [~pgaref] Could you please review the pull request ? Thanks > Arrow SerDe Cannot serialize/deserialize complex data types When there are > more than 1024 values > > > Key: HIVE-25443 > URL: https://issues.apache.org/jira/browse/HIVE-25443 > Project: Hive > Issue Type: Bug > Components: Serializers/Deserializers >Affects Versions: 3.1.0, 3.0.0, 3.1.1, 3.1.2 >Reporter: Syed Shameerur Rahman >Assignee: Syed Shameerur Rahman >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 10m > Remaining Estimate: 0h > > Complex data types like MAP, STRUCT cannot be serialized/deserialzed using > Arrow SerDe when there are more than 1024 values. This happens due to > ColumnVector always being initialized with a size of 1024. > Issue #1 : > https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/arrow/ArrowColumnarBatchSerDe.java#L213 > Issue #2 : > https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/arrow/ArrowColumnarBatchSerDe.java#L215 > Sample unit test to reproduce the case in TestArrowColumnarBatchSerDe : > {code:java} > @Test >public void testListBooleanWithMoreThan1024Values() throws SerDeException { > String[][] schema = { > {"boolean_list", "array"}, > }; > > Object[][] rows = new Object[1025][1]; > for (int i = 0; i < 1025; i++) { >rows[i][0] = new BooleanWritable(true); > } > > initAndSerializeAndDeserialize(schema, toList(rows)); >} > > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25443) Arrow SerDe Cannot serialize/deserialize complex data types When there are more than 1024 values
[ https://issues.apache.org/jira/browse/HIVE-25443?focusedWorklogId=636846&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-636846 ] ASF GitHub Bot logged work on HIVE-25443: - Author: ASF GitHub Bot Created on: 11/Aug/21 10:12 Start Date: 11/Aug/21 10:12 Worklog Time Spent: 10m Work Description: shameersss1 opened a new pull request #2581: URL: https://github.com/apache/hive/pull/2581 …pes When there are more than 1024 values ### What changes were proposed in this pull request? Instead of initializing the ColumnVector with the default size of 1024, initialize it with the number of records actually required. ### Why are the changes needed? Changes are needed to allow Arrow SerDe to serialize/deserialize complex data types when there are more than 1024 values ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Unit tests were added to confirm the behaviour -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 636846) Remaining Estimate: 0h Time Spent: 10m > Arrow SerDe Cannot serialize/deserialize complex data types When there are > more than 1024 values > > > Key: HIVE-25443 > URL: https://issues.apache.org/jira/browse/HIVE-25443 > Project: Hive > Issue Type: Bug > Components: Serializers/Deserializers >Affects Versions: 3.1.0, 3.0.0, 3.1.1, 3.1.2 >Reporter: Syed Shameerur Rahman >Assignee: Syed Shameerur Rahman >Priority: Major > Fix For: 4.0.0 > > Time Spent: 10m > Remaining Estimate: 0h > > Complex data types like MAP, STRUCT cannot be serialized/deserialized using > Arrow SerDe when there are more than 1024 values. This happens due to > ColumnVector always being initialized with a size of 1024. 
> Issue #1 : > https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/arrow/ArrowColumnarBatchSerDe.java#L213 > Issue #2 : > https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/arrow/ArrowColumnarBatchSerDe.java#L215 > Sample unit test to reproduce the case in TestArrowColumnarBatchSerDe : > {code:java} > @Test >public void testListBooleanWithMoreThan1024Values() throws SerDeException { > String[][] schema = { > {"boolean_list", "array"}, > }; > > Object[][] rows = new Object[1025][1]; > for (int i = 0; i < 1025; i++) { >rows[i][0] = new BooleanWritable(true); > } > > initAndSerializeAndDeserialize(schema, toList(rows)); >} > > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
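The proposed fix above comes down to sizing the ColumnVector from the incoming batch rather than a fixed 1024-slot default. A minimal standalone sketch of that sizing rule (illustrative only, not Hive source; the class and method names are invented for this example):

```java
// Illustrative sketch, not Hive code: size the vector from the batch.
final class VectorSizing {
    static final int DEFAULT_SIZE = 1024; // mirrors the fixed default behind the bug

    // Capacity to allocate for a batch of rowCount values: never less than
    // the default, but grown to fit batches past 1024 rows.
    static int requiredCapacity(int rowCount) {
        return Math.max(DEFAULT_SIZE, rowCount);
    }

    public static void main(String[] args) {
        System.out.println(requiredCapacity(1025)); // large batches no longer overflow
        System.out.println(requiredCapacity(10));   // small batches keep the default
    }
}
```

Under this rule the 1025-value unit test above gets 1025 slots instead of overflowing a 1024-slot vector.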
[jira] [Updated] (HIVE-25443) Arrow SerDe Cannot serialize/deserialize complex data types When there are more than 1024 values
[ https://issues.apache.org/jira/browse/HIVE-25443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-25443: -- Labels: pull-request-available (was: ) > Arrow SerDe Cannot serialize/deserialize complex data types When there are > more than 1024 values > > > Key: HIVE-25443 > URL: https://issues.apache.org/jira/browse/HIVE-25443 > Project: Hive > Issue Type: Bug > Components: Serializers/Deserializers >Affects Versions: 3.1.0, 3.0.0, 3.1.1, 3.1.2 >Reporter: Syed Shameerur Rahman >Assignee: Syed Shameerur Rahman >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 10m > Remaining Estimate: 0h > > Complex data types like MAP, STRUCT cannot be serialized/deserialzed using > Arrow SerDe when there are more than 1024 values. This happens due to > ColumnVector always being initialized with a size of 1024. > Issue #1 : > https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/arrow/ArrowColumnarBatchSerDe.java#L213 > Issue #2 : > https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/arrow/ArrowColumnarBatchSerDe.java#L215 > Sample unit test to reproduce the case in TestArrowColumnarBatchSerDe : > {code:java} > @Test >public void testListBooleanWithMoreThan1024Values() throws SerDeException { > String[][] schema = { > {"boolean_list", "array"}, > }; > > Object[][] rows = new Object[1025][1]; > for (int i = 0; i < 1025; i++) { >rows[i][0] = new BooleanWritable(true); > } > > initAndSerializeAndDeserialize(schema, toList(rows)); >} > > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25429) Delta metrics collection may cause number of tez counters to exceed tez.counters.max limit
[ https://issues.apache.org/jira/browse/HIVE-25429?focusedWorklogId=636845=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-636845 ] ASF GitHub Bot logged work on HIVE-25429: - Author: ASF GitHub Bot Created on: 11/Aug/21 10:12 Start Date: 11/Aug/21 10:12 Worklog Time Spent: 10m Work Description: klcopp commented on a change in pull request #2563: URL: https://github.com/apache/hive/pull/2563#discussion_r686691938 ## File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/metrics/DeltaFilesMetricReporter.java ## @@ -136,67 +133,77 @@ public static synchronized void init(HiveConf conf) throws Exception { private void configure(HiveConf conf) throws Exception { acidMetricsExtEnabled = MetastoreConf.getBoolVar(conf, MetastoreConf.ConfVars.METASTORE_ACIDMETRICS_EXT_ON); +if (acidMetricsExtEnabled) { -deltasThreshold = HiveConf.getIntVar(conf, HiveConf.ConfVars.HIVE_TXN_ACID_METRICS_DELTA_NUM_THRESHOLD); -obsoleteDeltasThreshold = HiveConf.getIntVar(conf, HiveConf.ConfVars.HIVE_TXN_ACID_METRICS_OBSOLETE_DELTA_NUM_THRESHOLD); - -initCachesForMetrics(conf); -initObjectsForMetrics(); + initCachesForMetrics(conf); + initObjectsForMetrics(); -long reportingInterval = HiveConf.getTimeVar(conf, - HiveConf.ConfVars.HIVE_TXN_ACID_METRICS_REPORTING_INTERVAL, TimeUnit.SECONDS); + long reportingInterval = + HiveConf.getTimeVar(conf, HiveConf.ConfVars.HIVE_TXN_ACID_METRICS_REPORTING_INTERVAL, TimeUnit.SECONDS); -ThreadFactory threadFactory = - new ThreadFactoryBuilder() -.setDaemon(true) -.setNameFormat("DeltaFilesMetricReporter %d") -.build(); -executorService = Executors.newSingleThreadScheduledExecutor(threadFactory); -executorService.scheduleAtFixedRate( - new ReportingTask(), 0, reportingInterval, TimeUnit.SECONDS); + ThreadFactory threadFactory = + new ThreadFactoryBuilder().setDaemon(true).setNameFormat("DeltaFilesMetricReporter %d").build(); + executorService = Executors.newSingleThreadScheduledExecutor(threadFactory); + 
executorService.scheduleAtFixedRate(new ReportingTask(), 0, reportingInterval, TimeUnit.SECONDS); -LOG.info("Started DeltaFilesMetricReporter thread"); + LOG.info("Started DeltaFilesMetricReporter thread"); +} } public void submit(TezCounters counters) { if (acidMetricsExtEnabled) { - updateMetrics(NUM_OBSOLETE_DELTAS, -obsoleteDeltaCache, obsoleteDeltaTopN, obsoleteDeltasThreshold, -counters); - updateMetrics(NUM_DELTAS, -deltaCache, deltaTopN, deltasThreshold, -counters); - updateMetrics(NUM_SMALL_DELTAS, -smallDeltaCache, smallDeltaTopN, deltasThreshold, -counters); + updateMetrics(NUM_OBSOLETE_DELTAS, obsoleteDeltaCache, obsoleteDeltaTopN, counters); + updateMetrics(NUM_DELTAS, deltaCache, deltaTopN, counters); + updateMetrics(NUM_SMALL_DELTAS, smallDeltaCache, smallDeltaTopN, counters); } } + /** + * Copy counters to caches. + */ private void updateMetrics(DeltaFilesMetricType metric, Cache cache, Queue> topN, -int threshold, TezCounters counters) { -counters.getGroup(metric.value).forEach(counter -> { - Integer prev = cache.getIfPresent(counter.getName()); - if (prev != null && prev != counter.getValue()) { -cache.invalidate(counter.getName()); + TezCounters counters) { +try { + CounterGroup group = counters.getGroup(metric.value); + // if the group is empty, clear the cache + if (group.size() == 0) { +cache.invalidateAll(); + } else { +// if there is no counter corresponding to a cache entry, remove from cache +ConcurrentMap cacheMap = cache.asMap(); Review comment: As discussed offline, we will also collect input ReadEntities and update metrics based on those -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 636845) Time Spent: 40m (was: 0.5h) > Delta metrics collection may cause number of tez counters to exceed > tez.counters.max limit > -- > > Key: HIVE-25429 > URL: https://issues.apache.org/jira/browse/HIVE-25429 > Project: Hive > Issue Type: Sub-task > Components: Hive >Affects Versions: 4.0.0 >Reporter: Karen Coppage >Assignee: Karen Coppage >
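The review thread above reconciles the Guava-style cache of per-partition delta counts with the Tez counters of the latest query: an empty counter group clears the cache, entries with no matching counter are dropped, and changed values are refreshed. A minimal sketch of that reconciliation, with a plain Map standing in for the Guava cache (illustrative only, not the Hive implementation):

```java
import java.util.Map;

// Illustrative sketch, not Hive code: reconcile cached counts with the
// counter values reported by the latest query.
final class CounterSync {
    static void reconcile(Map<String, Integer> cache, Map<String, Integer> counters) {
        if (counters.isEmpty()) {
            cache.clear();                           // no counters at all: drop everything
            return;
        }
        cache.keySet().retainAll(counters.keySet()); // drop entries with no matching counter
        cache.putAll(counters);                      // add new entries, overwrite changed ones
    }
}
```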
[jira] [Work logged] (HIVE-25346) cleanTxnToWriteIdTable breaks SNAPSHOT isolation
[ https://issues.apache.org/jira/browse/HIVE-25346?focusedWorklogId=636842=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-636842 ] ASF GitHub Bot logged work on HIVE-25346: - Author: ASF GitHub Bot Created on: 11/Aug/21 10:08 Start Date: 11/Aug/21 10:08 Worklog Time Spent: 10m Work Description: deniskuzZ commented on pull request #2547: URL: https://github.com/apache/hive/pull/2547#issuecomment-896692730 > > > > Would be great if you could run the HMS benchmark and see if that has affected the commit step performance. See if the index on opType improves the situation. > > > > > > > > > There is no benchmark currently for the commitTxn() call. > > > > > > Could we create one or at least perform similar to https://issues.apache.org/jira/browse/HIVE-23104?focusedCommentId=17083005=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17083005 test? > > Based on the linked comment it is not clear to me what exactly was tested and how or what tools were used for it or what environment the tests were run in. I found no mention of the above details in the jira comments. You are not limited in a set of tools or approaches, free to choose any. This patch changes the way how INSERTs are handled at the commit step, so it would make sense to test the throughput of INSERT/UPDATE operations in multithreaded env. AFAIK in the above JIRA JMH tool was used and testing was done on a local env. We cannot merge this PR until we know the performance effect of the proposed change. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 636842) Time Spent: 4.5h (was: 4h 20m) > cleanTxnToWriteIdTable breaks SNAPSHOT isolation > > > Key: HIVE-25346 > URL: https://issues.apache.org/jira/browse/HIVE-25346 > Project: Hive > Issue Type: Bug >Reporter: Zoltan Chovan >Assignee: Zoltan Chovan >Priority: Major > Labels: pull-request-available > Time Spent: 4.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25438) Update partition column stats fails with invalid syntax error for MySql
[ https://issues.apache.org/jira/browse/HIVE-25438?focusedWorklogId=636837&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-636837 ] ASF GitHub Bot logged work on HIVE-25438: - Author: ASF GitHub Bot Created on: 11/Aug/21 09:53 Start Date: 11/Aug/21 09:53 Worklog Time Spent: 10m Work Description: maheshk114 commented on a change in pull request #2573: URL: https://github.com/apache/hive/pull/2573#discussion_r686678482 ## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/DirectSqlUpdateStat.java ## @@ -647,6 +653,12 @@ public long getNextCSIdForMPartitionColumnStatistics(long numStats) throws MetaE jdoConn = pm.getDataStoreConnection(); dbConn = (Connection) (jdoConn.getNativeConnection()); + if (sqlGenerator.getDbProduct().isMYSQL()) { Review comment: That will be costlier than setting the value in the DB, as we don't know how many SQL statements are getting executed inside a connection. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 636837) Time Spent: 50m (was: 40m) > Update partition column stats fails with invalid syntax error for MySql > --- > > Key: HIVE-25438 > URL: https://issues.apache.org/jira/browse/HIVE-25438 > Project: Hive > Issue Type: Sub-task >Reporter: mahesh kumar behera >Assignee: mahesh kumar behera >Priority: Major > Labels: pull-request-available > Time Spent: 50m > Remaining Estimate: 0h > > The quotes are not supported by mysql if ANSI_QUOTES is not set. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25438) Update partition column stats fails with invalid syntax error for MySql
[ https://issues.apache.org/jira/browse/HIVE-25438?focusedWorklogId=636835=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-636835 ] ASF GitHub Bot logged work on HIVE-25438: - Author: ASF GitHub Bot Created on: 11/Aug/21 09:52 Start Date: 11/Aug/21 09:52 Worklog Time Spent: 10m Work Description: maheshk114 commented on a change in pull request #2573: URL: https://github.com/apache/hive/pull/2573#discussion_r686677911 ## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/DirectSqlUpdateStat.java ## @@ -647,6 +653,12 @@ public long getNextCSIdForMPartitionColumnStatistics(long numStats) throws MetaE jdoConn = pm.getDataStoreConnection(); dbConn = (Connection) (jdoConn.getNativeConnection()); + if (sqlGenerator.getDbProduct().isMYSQL()) { +try (Statement stmt = dbConn.createStatement()) { + stmt.execute("SET @@session.sql_mode=ANSI_QUOTES"); +} + } + Review comment: done -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 636835) Time Spent: 40m (was: 0.5h) > Update partition column stats fails with invalid syntax error for MySql > --- > > Key: HIVE-25438 > URL: https://issues.apache.org/jira/browse/HIVE-25438 > Project: Hive > Issue Type: Sub-task >Reporter: mahesh kumar behera >Assignee: mahesh kumar behera >Priority: Major > Labels: pull-request-available > Time Spent: 40m > Remaining Estimate: 0h > > The quotes are not supported by mysql if ANSI_QUOTES is not set. -- This message was sent by Atlassian Jira (v8.3.4#803005)
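For context on the ANSI_QUOTES discussion above: the direct-SQL statements quote identifiers with double quotes, which stock MySQL parses as string literals unless the session sets sql_mode=ANSI_QUOTES, hence the per-session `SET @@session.sql_mode=ANSI_QUOTES` in the patch. A standalone sketch of the two quoting dialects (illustrative only; the helper is invented for this example):

```java
// Illustrative sketch: identifier quoting per dialect. With ANSI_QUOTES (or
// any ANSI-compliant database) "name" is an identifier; default MySQL treats
// "name" as a string literal and requires `name` instead.
final class IdentifierQuoting {
    static String quote(String identifier, boolean ansiQuotes) {
        return ansiQuotes ? "\"" + identifier + "\"" : "`" + identifier + "`";
    }

    public static void main(String[] args) {
        System.out.println(quote("PART_ID", true));  // "PART_ID"
        System.out.println(quote("PART_ID", false)); // `PART_ID`
    }
}
```

Setting the mode once per session, as the patch does, avoids rewriting every generated statement for the MySQL dialect.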
[jira] [Resolved] (HIVE-25375) Partition column rename support for Iceberg tables
[ https://issues.apache.org/jira/browse/HIVE-25375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ádám Szita resolved HIVE-25375. --- Fix Version/s: 4.0.0 Resolution: Fixed Committed to master, thanks for reviewing it [~kuczoram]! > Partition column rename support for Iceberg tables > -- > > Key: HIVE-25375 > URL: https://issues.apache.org/jira/browse/HIVE-25375 > Project: Hive > Issue Type: Bug >Reporter: Ádám Szita >Assignee: Ádám Szita >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 20m > Remaining Estimate: 0h > > Currently ALTER TABLE CHANGE COLUMN statement only updates the Iceberg-backed > table's schema, but not its partition spec. Updating the spec is required to > allow subsequent partition spec changes (like set partition spec calls) to > succeed. > Note: to do this in HiveIcebergMetaHook class we can't just create an > updateSchema and an updatePartitionSpec object, do the modifications and > commit both of them, as the last commit will fail due to invalid base. We > might want to do this in two separate steps instead. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25375) Partition column rename support for Iceberg tables
[ https://issues.apache.org/jira/browse/HIVE-25375?focusedWorklogId=636832=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-636832 ] ASF GitHub Bot logged work on HIVE-25375: - Author: ASF GitHub Bot Created on: 11/Aug/21 09:42 Start Date: 11/Aug/21 09:42 Worklog Time Spent: 10m Work Description: szlta merged pull request #2577: URL: https://github.com/apache/hive/pull/2577 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 636832) Time Spent: 20m (was: 10m) > Partition column rename support for Iceberg tables > -- > > Key: HIVE-25375 > URL: https://issues.apache.org/jira/browse/HIVE-25375 > Project: Hive > Issue Type: Bug >Reporter: Ádám Szita >Assignee: Ádám Szita >Priority: Major > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > Currently ALTER TABLE CHANGE COLUMN statement only updates the Iceberg-backed > table's schema, but not it's partition spec. Updating the spec is required to > allow subsequent partition spec changes (like set partition spec calls) to > succeed. > Note: to do this in HiveIcebergMetaHook class we can't just create an > updateSchema and an updatePartitionSpec object, do the modifications and > commit both of them, as the last commit will fail due to invalid base. We > might want to do this in two separate steps instead. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HIVE-25443) Arrow SerDe Cannot serialize/deserialize complex data types When there are more than 1024 values
[ https://issues.apache.org/jira/browse/HIVE-25443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Syed Shameerur Rahman reassigned HIVE-25443: > Arrow SerDe Cannot serialize/deserialize complex data types When there are > more than 1024 values > > > Key: HIVE-25443 > URL: https://issues.apache.org/jira/browse/HIVE-25443 > Project: Hive > Issue Type: Bug > Components: Serializers/Deserializers >Affects Versions: 3.1.2, 3.1.1, 3.0.0, 3.1.0 >Reporter: Syed Shameerur Rahman >Assignee: Syed Shameerur Rahman >Priority: Major > Fix For: 4.0.0 > > > Complex data types like MAP, STRUCT cannot be serialized/deserialzed using > Arrow SerDe when there are more than 1024 values. This happens due to > ColumnVector always being initialized with a size of 1024. > Issue #1 : > https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/arrow/ArrowColumnarBatchSerDe.java#L213 > Issue #2 : > https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/arrow/ArrowColumnarBatchSerDe.java#L215 > Sample unit test to reproduce the case in TestArrowColumnarBatchSerDe : > {code:java} > @Test >public void testListBooleanWithMoreThan1024Values() throws SerDeException { > String[][] schema = { > {"boolean_list", "array"}, > }; > > Object[][] rows = new Object[1025][1]; > for (int i = 0; i < 1025; i++) { >rows[i][0] = new BooleanWritable(true); > } > > initAndSerializeAndDeserialize(schema, toList(rows)); >} > > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-23688) Vectorization: IndexArrayOutOfBoundsException For map type column which includes null value
[ https://issues.apache.org/jira/browse/HIVE-23688?focusedWorklogId=636829=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-636829 ] ASF GitHub Bot logged work on HIVE-23688: - Author: ASF GitHub Bot Created on: 11/Aug/21 09:21 Start Date: 11/Aug/21 09:21 Worklog Time Spent: 10m Work Description: maheshk114 commented on a change in pull request #2479: URL: https://github.com/apache/hive/pull/2479#discussion_r686623069 ## File path: ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/VectorizedListColumnReader.java ## @@ -129,21 +131,16 @@ private boolean fetchNextValue(PrimitiveObjectInspector.PrimitiveCategory catego private void addElement(ListColumnVector lcv, List elements, PrimitiveObjectInspector.PrimitiveCategory category, int index) throws IOException { lcv.offsets[index] = elements.size(); -// Return directly if last value is null -if (definitionLevel < maxDefLevel) { - lcv.isNull[index] = true; - lcv.lengths[index] = 0; - // fetch the data from parquet data page for next call - fetchNextValue(category); - return; -} - do { // add all data for an element in ListColumnVector, get out the loop if there is no data or the data is for new element + if (definitionLevel < maxDefLevel) { +lcv.lengths[index] = 0; +lcv.isNull[index] = true; +lcv.noNulls = false; + } elements.add(lastValue); } while (fetchNextValue(category) && (repetitionLevel != 0)); -lcv.isNull[index] = false; lcv.lengths[index] = elements.size() - lcv.offsets[index]; Review comment: lcv.lengths[index] is over written ..in the loop for some condition its set to 0 ## File path: ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/VectorizedListColumnReader.java ## @@ -129,21 +131,16 @@ private boolean fetchNextValue(PrimitiveObjectInspector.PrimitiveCategory catego private void addElement(ListColumnVector lcv, List elements, PrimitiveObjectInspector.PrimitiveCategory category, int index) throws IOException { lcv.offsets[index] = elements.size(); -// Return directly if 
last value is null -if (definitionLevel < maxDefLevel) { - lcv.isNull[index] = true; - lcv.lengths[index] = 0; - // fetch the data from parquet data page for next call - fetchNextValue(category); - return; -} - do { // add all data for an element in ListColumnVector, get out the loop if there is no data or the data is for new element + if (definitionLevel < maxDefLevel) { +lcv.lengths[index] = 0; +lcv.isNull[index] = true; Review comment: why this has to be done in a loop ? ## File path: ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/VectorizedListColumnReader.java ## @@ -193,59 +190,59 @@ private List decodeDictionaryIds(PrimitiveObjectInspector.PrimitiveCategory cate case SHORT: resultList = new ArrayList(total); for (int i = 0; i < total; ++i) { -resultList.add(dictionary.readInteger(intList.get(i))); +resultList.add(intList.get(i) == null ? null : dictionary.readInteger(intList.get(i))); } break; case DATE: case INTERVAL_YEAR_MONTH: case LONG: resultList = new ArrayList(total); for (int i = 0; i < total; ++i) { -resultList.add(dictionary.readLong(intList.get(i))); +resultList.add(intList.get(i) == null ? null : dictionary.readLong(intList.get(i))); } break; case BOOLEAN: resultList = new ArrayList(total); for (int i = 0; i < total; ++i) { -resultList.add(dictionary.readBoolean(intList.get(i)) ? 1 : 0); +resultList.add(intList.get(i) == null ? 
null : dictionary.readBoolean(intList.get(i))); Review comment: instead of 0 or 1 ..value returned by readBoolean is used ## File path: ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/VectorizedListColumnReader.java ## @@ -129,21 +131,16 @@ private boolean fetchNextValue(PrimitiveObjectInspector.PrimitiveCategory catego private void addElement(ListColumnVector lcv, List elements, PrimitiveObjectInspector.PrimitiveCategory category, int index) throws IOException { lcv.offsets[index] = elements.size(); -// Return directly if last value is null -if (definitionLevel < maxDefLevel) { - lcv.isNull[index] = true; - lcv.lengths[index] = 0; - // fetch the data from parquet data page for next call - fetchNextValue(category); - return; -} - do { // add all data for an element in ListColumnVector, get out the loop if there is no data or the data is for new element + if (definitionLevel < maxDefLevel) { Review comment: in fetchNextvalue ..if (definitionLevel != maxDefLevel) {
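The review hunks above change the dictionary decode to tolerate null ids (null list elements) instead of dereferencing them unconditionally. The pattern, as a standalone sketch (illustrative only, not the Hive reader itself):

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch: null-aware dictionary decode. A null id denotes a
// null element and must bypass the dictionary lookup entirely.
final class NullAwareDecode {
    static List<Long> decode(List<Integer> ids, long[] dictionary) {
        List<Long> result = new ArrayList<>(ids.size());
        for (Integer id : ids) {
            // Pass nulls through; only non-null ids index the dictionary.
            result.add(id == null ? null : dictionary[id]);
        }
        return result;
    }
}
```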
[jira] [Updated] (HIVE-25334) Refactor UDF CAST( as TIMESTAMP)
[ https://issues.apache.org/jira/browse/HIVE-25334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sankar Hariappan updated HIVE-25334: Component/s: (was: HiveServer2) UDF > Refactor UDF CAST( as TIMESTAMP) > - > > Key: HIVE-25334 > URL: https://issues.apache.org/jira/browse/HIVE-25334 > Project: Hive > Issue Type: Sub-task > Components: UDF >Reporter: Ashish Sharma >Assignee: Ashish Sharma >Priority: Major > Labels: UDF, pull-request-available > Fix For: 4.0.0 > > Time Spent: 2h > Remaining Estimate: 0h > > Description > Refactor GenericUDFTimestamp.class > DOD > Refactor -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-25334) Refactor UDF CAST( as TIMESTAMP)
[ https://issues.apache.org/jira/browse/HIVE-25334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sankar Hariappan updated HIVE-25334: Labels: UDF pull-request-available (was: pull-request-available) > Refactor UDF CAST( as TIMESTAMP) > - > > Key: HIVE-25334 > URL: https://issues.apache.org/jira/browse/HIVE-25334 > Project: Hive > Issue Type: Sub-task >Reporter: Ashish Sharma >Assignee: Ashish Sharma >Priority: Major > Labels: UDF, pull-request-available > Fix For: 4.0.0 > > Time Spent: 2h > Remaining Estimate: 0h > > Description > Refactor GenericUDFTimestamp.class > DOD > Refactor -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HIVE-25334) Refactor UDF CAST( as TIMESTAMP)
[ https://issues.apache.org/jira/browse/HIVE-25334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sankar Hariappan resolved HIVE-25334. - Fix Version/s: 4.0.0 Resolution: Fixed PR merged to master. Thanks [~ashish-kumar-sharma] for the contribution and [~adeshrao] for the review! > Refactor UDF CAST( as TIMESTAMP) > - > > Key: HIVE-25334 > URL: https://issues.apache.org/jira/browse/HIVE-25334 > Project: Hive > Issue Type: Sub-task >Reporter: Ashish Sharma >Assignee: Ashish Sharma >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 2h > Remaining Estimate: 0h > > Description > Refactor GenericUDFTimestamp.class > DOD > Refactor -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-25334) Refactor UDF CAST( as TIMESTAMP)
[ https://issues.apache.org/jira/browse/HIVE-25334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sankar Hariappan updated HIVE-25334: Component/s: HiveServer2 > Refactor UDF CAST( as TIMESTAMP) > - > > Key: HIVE-25334 > URL: https://issues.apache.org/jira/browse/HIVE-25334 > Project: Hive > Issue Type: Sub-task > Components: HiveServer2 >Reporter: Ashish Sharma >Assignee: Ashish Sharma >Priority: Major > Labels: UDF, pull-request-available > Fix For: 4.0.0 > > Time Spent: 2h > Remaining Estimate: 0h > > Description > Refactor GenericUDFTimestamp.class > DOD > Refactor -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25334) Refactor UDF CAST( as TIMESTAMP)
[ https://issues.apache.org/jira/browse/HIVE-25334?focusedWorklogId=636798=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-636798 ] ASF GitHub Bot logged work on HIVE-25334: - Author: ASF GitHub Bot Created on: 11/Aug/21 08:02 Start Date: 11/Aug/21 08:02 Worklog Time Spent: 10m Work Description: sankarh merged pull request #2482: URL: https://github.com/apache/hive/pull/2482 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 636798) Time Spent: 2h (was: 1h 50m) > Refactor UDF CAST( as TIMESTAMP) > - > > Key: HIVE-25334 > URL: https://issues.apache.org/jira/browse/HIVE-25334 > Project: Hive > Issue Type: Sub-task >Reporter: Ashish Sharma >Assignee: Ashish Sharma >Priority: Major > Labels: pull-request-available > Time Spent: 2h > Remaining Estimate: 0h > > Description > Refactor GenericUDFTimestamp.class > DOD > Refactor -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25334) Refactor UDF CAST( as TIMESTAMP)
[ https://issues.apache.org/jira/browse/HIVE-25334?focusedWorklogId=636795=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-636795 ] ASF GitHub Bot logged work on HIVE-25334: - Author: ASF GitHub Bot Created on: 11/Aug/21 07:59 Start Date: 11/Aug/21 07:59 Worklog Time Spent: 10m Work Description: sankarh commented on a change in pull request #2482: URL: https://github.com/apache/hive/pull/2482#discussion_r686592413 ## File path: ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFTimestamp.java ## @@ -63,61 +71,44 @@ * otherwise, it's interpreted as timestamp in seconds. */ private boolean intToTimestampInSeconds = false; + private boolean strict = true; @Override public ObjectInspector initialize(ObjectInspector[] arguments) throws UDFArgumentException { -if (arguments.length < 1) { - throw new UDFArgumentLengthException( - "The function TIMESTAMP requires at least one argument, got " - + arguments.length); -} - -SessionState ss = SessionState.get(); -if (ss != null) { - intToTimestampInSeconds = ss.getConf().getBoolVar(ConfVars.HIVE_INT_TIMESTAMP_CONVERSION_IN_SECONDS); -} +checkArgsSize(arguments, 1, 1); +checkArgPrimitive(arguments, 0); +checkArgGroups(arguments, 0, tsInputTypes, STRING_GROUP, DATE_GROUP, NUMERIC_GROUP, VOID_GROUP, BOOLEAN_GROUP); -try { - argumentOI = (PrimitiveObjectInspector) arguments[0]; -} catch (ClassCastException e) { - throw new UDFArgumentException( - "The function TIMESTAMP takes only primitive types"); -} +strict = SessionState.get() != null ? SessionState.get().getConf() +.getBoolVar(ConfVars.HIVE_STRICT_TIMESTAMP_CONVERSION) : new HiveConf() +.getBoolVar(ConfVars.HIVE_STRICT_TIMESTAMP_CONVERSION); +intToTimestampInSeconds = SessionState.get() != null ? 
SessionState.get().getConf() +.getBoolVar(ConfVars.HIVE_INT_TIMESTAMP_CONVERSION_IN_SECONDS) : new HiveConf() +.getBoolVar(ConfVars.HIVE_INT_TIMESTAMP_CONVERSION_IN_SECONDS); -if (ss != null && ss.getConf().getBoolVar(ConfVars.HIVE_STRICT_TIMESTAMP_CONVERSION)) { - PrimitiveCategory category = argumentOI.getPrimitiveCategory(); - PrimitiveGrouping group = PrimitiveObjectInspectorUtils.getPrimitiveGrouping(category); - if (group == PrimitiveGrouping.NUMERIC_GROUP) { +if (strict) { + if (PrimitiveObjectInspectorUtils.getPrimitiveGrouping(tsInputTypes[0]) == PrimitiveGrouping.NUMERIC_GROUP) { Review comment: oh yes... my bad... It is disallowed only with strict=true. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 636795) Time Spent: 1h 50m (was: 1h 40m) > Refactor UDF CAST( as TIMESTAMP) > - > > Key: HIVE-25334 > URL: https://issues.apache.org/jira/browse/HIVE-25334 > Project: Hive > Issue Type: Sub-task >Reporter: Ashish Sharma >Assignee: Ashish Sharma >Priority: Major > Labels: pull-request-available > Time Spent: 1h 50m > Remaining Estimate: 0h > > Description > Refactor GenericUDFTimestamp.class > DOD > Refactor -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25334) Refactor UDF CAST( as TIMESTAMP)
[ https://issues.apache.org/jira/browse/HIVE-25334?focusedWorklogId=636792=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-636792 ] ASF GitHub Bot logged work on HIVE-25334: - Author: ASF GitHub Bot Created on: 11/Aug/21 07:56 Start Date: 11/Aug/21 07:56 Worklog Time Spent: 10m Work Description: ashish-kumar-sharma commented on a change in pull request #2482: URL: https://github.com/apache/hive/pull/2482#discussion_r686586962 ## File path: ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFTimestamp.java ## @@ -63,61 +71,44 @@ * otherwise, it's interpreted as timestamp in seconds. */ private boolean intToTimestampInSeconds = false; + private boolean strict = true; @Override public ObjectInspector initialize(ObjectInspector[] arguments) throws UDFArgumentException { -if (arguments.length < 1) { - throw new UDFArgumentLengthException( - "The function TIMESTAMP requires at least one argument, got " - + arguments.length); -} - -SessionState ss = SessionState.get(); -if (ss != null) { - intToTimestampInSeconds = ss.getConf().getBoolVar(ConfVars.HIVE_INT_TIMESTAMP_CONVERSION_IN_SECONDS); -} +checkArgsSize(arguments, 1, 1); +checkArgPrimitive(arguments, 0); +checkArgGroups(arguments, 0, tsInputTypes, STRING_GROUP, DATE_GROUP, NUMERIC_GROUP, VOID_GROUP, BOOLEAN_GROUP); -try { - argumentOI = (PrimitiveObjectInspector) arguments[0]; -} catch (ClassCastException e) { - throw new UDFArgumentException( - "The function TIMESTAMP takes only primitive types"); -} +strict = SessionState.get() != null ? SessionState.get().getConf() +.getBoolVar(ConfVars.HIVE_STRICT_TIMESTAMP_CONVERSION) : new HiveConf() +.getBoolVar(ConfVars.HIVE_STRICT_TIMESTAMP_CONVERSION); +intToTimestampInSeconds = SessionState.get() != null ? 
SessionState.get().getConf() +.getBoolVar(ConfVars.HIVE_INT_TIMESTAMP_CONVERSION_IN_SECONDS) : new HiveConf() +.getBoolVar(ConfVars.HIVE_INT_TIMESTAMP_CONVERSION_IN_SECONDS); -if (ss != null && ss.getConf().getBoolVar(ConfVars.HIVE_STRICT_TIMESTAMP_CONVERSION)) { - PrimitiveCategory category = argumentOI.getPrimitiveCategory(); - PrimitiveGrouping group = PrimitiveObjectInspectorUtils.getPrimitiveGrouping(category); - if (group == PrimitiveGrouping.NUMERIC_GROUP) { +if (strict) { + if (PrimitiveObjectInspectorUtils.getPrimitiveGrouping(tsInputTypes[0]) == PrimitiveGrouping.NUMERIC_GROUP) { Review comment: We do support timestamp to numeric conversion. checkout https://github.com/apache/hive/blob/7b3ecf617a6d46f48a3b6f77e0339fd4ad95a420/serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/primitive/PrimitiveObjectInspectorUtils.java#L1177. due to which we need to allow NUMERIC in checkArgGroups(). Also this behaviour is controlled by https://github.com/apache/hive/blob/master/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java#L1827. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 636792) Time Spent: 1h 40m (was: 1.5h) > Refactor UDF CAST( as TIMESTAMP) > - > > Key: HIVE-25334 > URL: https://issues.apache.org/jira/browse/HIVE-25334 > Project: Hive > Issue Type: Sub-task >Reporter: Ashish Sharma >Assignee: Ashish Sharma >Priority: Major > Labels: pull-request-available > Time Spent: 1h 40m > Remaining Estimate: 0h > > Description > Refactor GenericUDFTimestamp.class > DOD > Refactor -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25334) Refactor UDF CAST( as TIMESTAMP)
[ https://issues.apache.org/jira/browse/HIVE-25334?focusedWorklogId=636791=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-636791 ] ASF GitHub Bot logged work on HIVE-25334: - Author: ASF GitHub Bot Created on: 11/Aug/21 07:52 Start Date: 11/Aug/21 07:52 Worklog Time Spent: 10m Work Description: ashish-kumar-sharma commented on a change in pull request #2482: URL: https://github.com/apache/hive/pull/2482#discussion_r686586962 ## File path: ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFTimestamp.java ## @@ -63,61 +71,44 @@ * otherwise, it's interpreted as timestamp in seconds. */ private boolean intToTimestampInSeconds = false; + private boolean strict = true; @Override public ObjectInspector initialize(ObjectInspector[] arguments) throws UDFArgumentException { -if (arguments.length < 1) { - throw new UDFArgumentLengthException( - "The function TIMESTAMP requires at least one argument, got " - + arguments.length); -} - -SessionState ss = SessionState.get(); -if (ss != null) { - intToTimestampInSeconds = ss.getConf().getBoolVar(ConfVars.HIVE_INT_TIMESTAMP_CONVERSION_IN_SECONDS); -} +checkArgsSize(arguments, 1, 1); +checkArgPrimitive(arguments, 0); +checkArgGroups(arguments, 0, tsInputTypes, STRING_GROUP, DATE_GROUP, NUMERIC_GROUP, VOID_GROUP, BOOLEAN_GROUP); -try { - argumentOI = (PrimitiveObjectInspector) arguments[0]; -} catch (ClassCastException e) { - throw new UDFArgumentException( - "The function TIMESTAMP takes only primitive types"); -} +strict = SessionState.get() != null ? SessionState.get().getConf() +.getBoolVar(ConfVars.HIVE_STRICT_TIMESTAMP_CONVERSION) : new HiveConf() +.getBoolVar(ConfVars.HIVE_STRICT_TIMESTAMP_CONVERSION); +intToTimestampInSeconds = SessionState.get() != null ? 
SessionState.get().getConf() +.getBoolVar(ConfVars.HIVE_INT_TIMESTAMP_CONVERSION_IN_SECONDS) : new HiveConf() +.getBoolVar(ConfVars.HIVE_INT_TIMESTAMP_CONVERSION_IN_SECONDS); -if (ss != null && ss.getConf().getBoolVar(ConfVars.HIVE_STRICT_TIMESTAMP_CONVERSION)) { - PrimitiveCategory category = argumentOI.getPrimitiveCategory(); - PrimitiveGrouping group = PrimitiveObjectInspectorUtils.getPrimitiveGrouping(category); - if (group == PrimitiveGrouping.NUMERIC_GROUP) { +if (strict) { + if (PrimitiveObjectInspectorUtils.getPrimitiveGrouping(tsInputTypes[0]) == PrimitiveGrouping.NUMERIC_GROUP) { Review comment: We do support timestamp to numeric conversion. checkout https://github.com/apache/hive/blob/7b3ecf617a6d46f48a3b6f77e0339fd4ad95a420/serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/primitive/PrimitiveObjectInspectorUtils.java#L1177. due to which we need to allow NUMERIC in checkArgGroups(). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 636791) Time Spent: 1.5h (was: 1h 20m) > Refactor UDF CAST( as TIMESTAMP) > - > > Key: HIVE-25334 > URL: https://issues.apache.org/jira/browse/HIVE-25334 > Project: Hive > Issue Type: Sub-task >Reporter: Ashish Sharma >Assignee: Ashish Sharma >Priority: Major > Labels: pull-request-available > Time Spent: 1.5h > Remaining Estimate: 0h > > Description > Refactor GenericUDFTimestamp.class > DOD > Refactor -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25334) Refactor UDF CAST( as TIMESTAMP)
[ https://issues.apache.org/jira/browse/HIVE-25334?focusedWorklogId=636790=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-636790 ] ASF GitHub Bot logged work on HIVE-25334: - Author: ASF GitHub Bot Created on: 11/Aug/21 07:50 Start Date: 11/Aug/21 07:50 Worklog Time Spent: 10m Work Description: ashish-kumar-sharma commented on a change in pull request #2482: URL: https://github.com/apache/hive/pull/2482#discussion_r686585429 ## File path: ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDF.java ## @@ -434,10 +434,19 @@ protected void obtainTimestampConverter(ObjectInspector[] arguments, int i, case TIMESTAMP: case DATE: case TIMESTAMPLOCALTZ: +case INT: Review comment: There data are already implemented in https://github.com/apache/hive/blob/7b3ecf617a6d46f48a3b6f77e0339fd4ad95a420/serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/primitive/PrimitiveObjectInspectorUtils.java#L1177. I am just adding validation check to avoid runtime failures. So no test is required in this case. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 636790) Time Spent: 1h 20m (was: 1h 10m) > Refactor UDF CAST( as TIMESTAMP) > - > > Key: HIVE-25334 > URL: https://issues.apache.org/jira/browse/HIVE-25334 > Project: Hive > Issue Type: Sub-task >Reporter: Ashish Sharma >Assignee: Ashish Sharma >Priority: Major > Labels: pull-request-available > Time Spent: 1h 20m > Remaining Estimate: 0h > > Description > Refactor GenericUDFTimestamp.class > DOD > Refactor -- This message was sent by Atlassian Jira (v8.3.4#803005)
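The review thread above centers on the patched initialize() reading HIVE_STRICT_TIMESTAMP_CONVERSION and HIVE_INT_TIMESTAMP_CONVERSION_IN_SECONDS from the active SessionState when one exists, and from a freshly built HiveConf otherwise. That fallback pattern can be sketched in plain Java; the Conf class and session field below are illustrative stand-ins, not Hive's actual SessionState or HiveConf.

```java
import java.util.HashMap;
import java.util.Map;

public class ConfFallbackSketch {
    // Stand-in for HiveConf: boolean flags with a single built-in default.
    static class Conf {
        private final Map<String, Boolean> flags = new HashMap<>();
        private final boolean defaultValue;
        Conf(boolean defaultValue) { this.defaultValue = defaultValue; }
        Conf set(String key, boolean value) { flags.put(key, value); return this; }
        boolean getBoolVar(String key) { return flags.getOrDefault(key, defaultValue); }
    }

    // Stand-in for SessionState.get(): null when no session is active.
    static Conf session = null;

    // Mirrors the patch's ternary: use the session's conf if one exists,
    // otherwise fall back to a freshly constructed default conf.
    static boolean getBoolVar(String key, boolean builtInDefault) {
        Conf conf = (session != null) ? session : new Conf(builtInDefault);
        return conf.getBoolVar(key);
    }

    public static void main(String[] args) {
        // No active session: the built-in default applies.
        System.out.println(getBoolVar("hive.strict.timestamp.conversion", true));  // true
        // An active session can override the flag.
        session = new Conf(true).set("hive.strict.timestamp.conversion", false);
        System.out.println(getBoolVar("hive.strict.timestamp.conversion", true));  // false
    }
}
```

The design point under discussion is orthogonal to the lookup itself: with strict=true, NUMERIC_GROUP inputs are rejected at initialize() time rather than failing at runtime.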
[jira] [Work logged] (HIVE-25441) Incorrect deltas split for sub-compactions when using `hive.compactor.max.num.delta`
[ https://issues.apache.org/jira/browse/HIVE-25441?focusedWorklogId=636757=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-636757 ] ASF GitHub Bot logged work on HIVE-25441: - Author: ASF GitHub Bot Created on: 11/Aug/21 06:45 Start Date: 11/Aug/21 06:45 Worklog Time Spent: 10m Work Description: deniskuzZ commented on pull request #2579: URL: https://github.com/apache/hive/pull/2579#issuecomment-896546211 recheck -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 636757) Time Spent: 20m (was: 10m) > Incorrect deltas split for sub-compactions when using > `hive.compactor.max.num.delta` > > > Key: HIVE-25441 > URL: https://issues.apache.org/jira/browse/HIVE-25441 > Project: Hive > Issue Type: Task >Reporter: Denys Kuzmenko >Priority: Major > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > {code} > #Repro steps: > #1./ set hive.compactor.max.num.delta to 5 on HMS > #2./ Set up the table > set hive.merge.cardinality.check=false; > create table test (k int); > ALTER TABLE test SET TBLPROPERTIES ('NO_AUTO_COMPACTION'='true'); > insert into test values (1); > alter table test compact 'major' and wait; > dfs -ls '/warehouse/tablespace/managed/hive/test'; > # drwxrwx---+ - hive hive 0 2021-08-09 12:26 > /warehouse/tablespace/managed/hive/test/base_008_v416 > select * from test; > # k=1 > #run 3 times so there's enough delta dirs, ie. 
6 (should just increase k by 1) > #basically just removes the row and adds a new row with k+1 value > MERGE INTO test AS T USING (select * from test union all select k+1 from > test) AS S > ON T.k=s.k > WHEN MATCHED THEN DELETE > WHEN not MATCHED THEN INSERT values (s.k); > select * from test; > #k=4 > dfs -ls '/warehouse/tablespace/managed/hive/test'; > #drwxrwx---+ - hive hive 0 2021-08-09 12:26 > /warehouse/tablespace/managed/hive/test/base_008_v416 > #drwxrwx---+ - hive hive 0 2021-08-09 12:28 > /warehouse/tablespace/managed/hive/test/delete_delta_009_009_0001 > #drwxrwx---+ - hive hive 0 2021-08-09 12:29 > /warehouse/tablespace/managed/hive/test/delete_delta_010_010_0001 > #drwxrwx---+ - hive hive 0 2021-08-09 12:29 > /warehouse/tablespace/managed/hive/test/delete_delta_011_011_0001 > #drwxrwx---+ - hive hive 0 2021-08-09 12:28 > /warehouse/tablespace/managed/hive/test/delta_009_009_0003 > #drwxrwx---+ - hive hive 0 2021-08-09 12:29 > /warehouse/tablespace/managed/hive/test/delta_010_010_0003 > #drwxrwx---+ - hive hive 0 2021-08-09 12:29 > /warehouse/tablespace/managed/hive/test/delta_011_011_0003 > alter table test compact 'major' and wait; > select * from test; > #result is empty > dfs -ls '/warehouse/tablespace/managed/hive/test'; > #2drwxrwx---+ - hive hive 0 2021-08-09 12:31 > /warehouse/tablespace/managed/hive/test/base_011_v428 > {code} > Some logs from the above example: > {code} > 2021-08-09 12:30:37,532 WARN > org.apache.hadoop.hive.ql.txn.compactor.CompactorMR: > [nightly-7x-us-2-2.nightly-7x-us-2.root.hwx.site-49_executor]: 6 delta files > found for default.test located at > hdfs://nightly-7x-us-2-2.nightly-7x-us-2.root.hwx.site:8020/warehouse/tablespace/managed/hive/test! > This is likely a sign of misconfiguration, especially if this message > repeats. Check that compaction is running properly. Check for any > runaway/mis-configured process writing to ACID tables, especially using > Streaming Ingest API. 
> 2021-08-09 12:30:37,533 INFO > org.apache.hadoop.hive.ql.txn.compactor.CompactorMR: > [nightly-7x-us-2-2.nightly-7x-us-2.root.hwx.site-49_executor]: Submitting > MINOR compaction job > 'nightly-7x-us-2-2.nightly-7x-us-2.root.hwx.site-49-compactor-default.test_0' > to default queue. (current delta dirs count=5, obsolete delta dirs count=-1. > TxnIdRange[9,11] > 2021-08-09 12:30:38,003 INFO > org.apache.hadoop.hive.ql.txn.compactor.CompactorMR: > [nightly-7x-us-2-2.nightly-7x-us-2.root.hwx.site-49_executor]: Submitted > compaction job > 'nightly-7x-us-2-2.nightly-7x-us-2.root.hwx.site-49-compactor-default.test_0' > with jobID=job_1628497133224_0051 compaction ID=23 > #From app logs of the minor compaction, note that delta_011_011_0001 > is missing from the list > 2021-08-09
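The repro above ends with 6 delta directories while hive.compactor.max.num.delta is 5, and the compactor logs show delta_011_011_0001 missing from the minor-compaction batch. The invariant a correct split must preserve, namely that every delta directory lands in exactly one consecutive batch of at most max.num.delta entries, can be sketched as follows (illustrative only, not Hive's CompactorMR code):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class DeltaSplitSketch {
    // Split deltas into consecutive batches of at most maxNumDelta entries each;
    // no directory is dropped and none appears twice.
    static List<List<String>> splitIntoBatches(List<String> deltas, int maxNumDelta) {
        List<List<String>> batches = new ArrayList<>();
        for (int i = 0; i < deltas.size(); i += maxNumDelta) {
            batches.add(new ArrayList<>(
                    deltas.subList(i, Math.min(i + maxNumDelta, deltas.size()))));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<String> deltas = Arrays.asList(
                "delta_009_009", "delta_010_010", "delta_011_011",
                "delete_delta_009_009", "delete_delta_010_010", "delete_delta_011_011");
        // With max.num.delta = 5 and 6 deltas, a correct split yields two
        // batches of 5 and 1; the bug report suggests the last entry was lost.
        for (List<String> batch : splitIntoBatches(deltas, 5)) {
            System.out.println(batch);
        }
    }
}
```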
[jira] [Resolved] (HIVE-25409) group by the same column result sum result error
[ https://issues.apache.org/jira/browse/HIVE-25409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] liuguanghua resolved HIVE-25409. Resolution: Fixed > group by the same column result sum result error > > > Key: HIVE-25409 > URL: https://issues.apache.org/jira/browse/HIVE-25409 > Project: Hive > Issue Type: Bug > Components: Hive >Affects Versions: 1.2.1, 1.2.2 > Environment: hadoop 2.8.5 > hive 1.2.1 or hive 1.2.2 >Reporter: liuguanghua >Priority: Major > Fix For: 2.1.1 > > > create table test > ( > a string, > b string > ); > insert into table test values ('a','1'),('b','2'),('c','3'); > select a,a,sum(b) from test group by a,a; > > this will get the wrong answer. We expect the sum(b), but it will compute the > sum(a). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-25409) group by the same column result sum result error
[ https://issues.apache.org/jira/browse/HIVE-25409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17397090#comment-17397090 ] liuguanghua commented on HIVE-25409: I have already verified the problem with HIVE-14715, and it worked. Thank you very much. > group by the same column result sum result error > > > Key: HIVE-25409 > URL: https://issues.apache.org/jira/browse/HIVE-25409 > Project: Hive > Issue Type: Bug > Components: Hive >Affects Versions: 1.2.1, 1.2.2 > Environment: hadoop 2.8.5 > hive 1.2.1 or hive 1.2.2 >Reporter: liuguanghua >Priority: Major > Fix For: 2.1.1 > > > create table test > ( > a string, > b string > ); > insert into table test values ('a','1'),('b','2'),('c','3'); > select a,a,sum(b) from test group by a,a; > > this will get the wrong answer. We expect the sum(b), but it will compute the > sum(a). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-25409) group by the same column result sum result error
[ https://issues.apache.org/jira/browse/HIVE-25409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] liuguanghua updated HIVE-25409: --- Fix Version/s: 2.1.1 > group by the same column result sum result error > > > Key: HIVE-25409 > URL: https://issues.apache.org/jira/browse/HIVE-25409 > Project: Hive > Issue Type: Bug > Components: Hive >Affects Versions: 1.2.1, 1.2.2 > Environment: hadoop 2.8.5 > hive 1.2.1 or hive 1.2.2 >Reporter: liuguanghua >Priority: Major > Fix For: 2.1.1 > > > create table test > ( > a string, > b string > ); > insert into table test values ('a','1'),('b','2'),('c','3'); > select a,a,sum(b) from test group by a,a; > > this will get the wrong answer. We expect the sum(b), but it will compute the > sum(a). -- This message was sent by Atlassian Jira (v8.3.4#803005)
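The expected semantics behind the HIVE-25409 repro, namely that repeating a group-by key (`group by a, a`) must not change which column the aggregate is computed over, can be sketched with a toy in-memory aggregation (illustrative only, not Hive's GroupByOperator):

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class GroupBySketch {
    // A row of the repro table: test(a string, b int-valued string).
    static class Row {
        final String a;
        final int b;
        Row(String a, int b) { this.a = a; this.b = b; }
    }

    // sum(b) grouped by a. A repeated group-by key adds no new grouping
    // information, so `group by a` and `group by a, a` must agree.
    static Map<String, Integer> sumBGroupByA(List<Row> rows) {
        Map<String, Integer> out = new LinkedHashMap<>();
        for (Row r : rows) {
            out.merge(r.a, r.b, Integer::sum);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Row> test = Arrays.asList(new Row("a", 1), new Row("b", 2), new Row("c", 3));
        System.out.println(sumBGroupByA(test)); // {a=1, b=2, c=3}
    }
}
```

The reported bug (fixed via HIVE-14715) was that the duplicate key shifted expression resolution, so sum(a) was computed instead of sum(b).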
[jira] [Work logged] (HIVE-25403) Fix from_unixtime() to consider leap seconds
[ https://issues.apache.org/jira/browse/HIVE-25403?focusedWorklogId=636749=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-636749 ] ASF GitHub Bot logged work on HIVE-25403: - Author: ASF GitHub Bot Created on: 11/Aug/21 06:10 Start Date: 11/Aug/21 06:10 Worklog Time Spent: 10m Work Description: warriersruthi commented on pull request #2550: URL: https://github.com/apache/hive/pull/2550#issuecomment-896530150 Thank you @sankarh, @ashish-kumar-sharma & @adesh-rao for your detailed reviews and assistance! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 636749) Time Spent: 3h 50m (was: 3h 40m) > Fix from_unixtime() to consider leap seconds > - > > Key: HIVE-25403 > URL: https://issues.apache.org/jira/browse/HIVE-25403 > Project: Hive > Issue Type: Sub-task > Components: Hive >Affects Versions: 3.1.0, 3.1.1, 3.1.2 >Reporter: Sruthi Mooriyathvariam >Assignee: Sruthi Mooriyathvariam >Priority: Major > Labels: UDF, pull-request-available > Fix For: 4.0.0 > > Attachments: image-2021-07-29-14-42-49-806.png > > Time Spent: 3h 50m > Remaining Estimate: 0h > > The unix_timestamp() considers "leap seconds" while from_unixtime() does not, > which results in a wrong result, as below: > !image-2021-07-29-14-42-49-806.png! -- This message was sent by Atlassian Jira (v8.3.4#803005)
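The fix direction for HIVE-25403 is to make from_unixtime() and unix_timestamp() agree on the same epoch mapping. java.time maps epoch seconds on a leap-second-free (POSIX-style) timeline, so formatting and parsing through it round-trip cleanly; the sketch below illustrates that consistency and is not Hive's actual UDF code.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

public class FromUnixtimeSketch {
    static final DateTimeFormatter FMT = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    // from_unixtime-style formatting on the leap-second-free POSIX timeline.
    static String fromUnixtime(long epochSeconds) {
        return LocalDateTime.ofInstant(Instant.ofEpochSecond(epochSeconds), ZoneOffset.UTC)
                .format(FMT);
    }

    // unix_timestamp-style parsing with the same mapping, so the two round-trip.
    static long unixTimestamp(String ts) {
        return LocalDateTime.parse(ts, FMT).toInstant(ZoneOffset.UTC).getEpochSecond();
    }

    public static void main(String[] args) {
        long epoch = 1628668800L; // 2021-08-11 08:00:00 UTC
        String s = fromUnixtime(epoch);
        // Because both directions use the same timeline, the round-trip is exact.
        System.out.println(s + " -> " + unixTimestamp(s));
    }
}
```

The bug arose precisely because the two UDFs used inconsistent timelines, so unix_timestamp(from_unixtime(t)) could differ from t.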
[jira] [Resolved] (HIVE-25298) LAG function get java.lang.ClassCastException: org.apache.hadoop.hive.common.type.HiveDecimal cannot be cast to org.apache.hadoop.hive.serde2.io.HiveDecimalWritable
[ https://issues.apache.org/jira/browse/HIVE-25298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sankar Hariappan resolved HIVE-25298. - Resolution: Not A Bug Already fixed by https://issues.apache.org/jira/browse/HIVE-21104. > LAG function get java.lang.ClassCastException: > org.apache.hadoop.hive.common.type.HiveDecimal cannot be cast to > org.apache.hadoop.hive.serde2.io.HiveDecimalWritable > > > Key: HIVE-25298 > URL: https://issues.apache.org/jira/browse/HIVE-25298 > Project: Hive > Issue Type: Bug > Components: Hive >Affects Versions: 3.1.2 > Environment: Hive 3.1 >Reporter: wenjun ma >Priority: Major > > When we try to apply the LAG function with aggregation function (MAX), we got > java.lang.ClassCastException: org.apache.hadoop.hive.common.type.HiveDecimal > cannot be cast to org.apache.hadoop.hive.serde2.io.HiveDecimalWritable. > Reproduce steps: > # create table: > create table tbl1 (ACCT_NM string, ACCT_BAL decimal(15,2)) partitioned by > (DL_DATA_DT string) > # insert sample data: insert into tbl1 values ('acct1', 1000.00, > '2020-01-01'); > insert into tbl1 values ('acct1', 800.00, '2020-01-02'); > ## Run the following SQL: > {code:java} > select > test.ACCT_NM, > test.DL_DATA_DT, > test.MAX_ACCT_BAL, > LAG(test.MAX_ACCT_BAL,1,0) OVER (PARTITION BY test.ACCT_NM ORDER BY > test.DL_DATA_DT) AS PREV_USED_AMT > from ( > select > tbl1.ACCT_NM as ACCT_NM, > tbl1.DL_DATA_DT as DL_DATA_DT, > max(tbl1.ACCT_BAL) as MAX_ACCT_BAL > from tbl1 > group by tbl1.ACCT_NM, tbl1.DL_DATA_DT > ) test; > {code} > Full Stack: > ERROR : FAILED: Execution Error, return code 2 from > org.apache.hadoop.hive.ql.exec.tez.TezTask. 
Vertex failed, vertexName=Reducer > 2, vertexId=vertex_1624984332939_0003_5_01, diagnostics=[Task failed, > taskId=task_1624984332939_0003_5_01_21, diagnostics=[TaskAttempt 0 > failed, info=[Error: Error while running task ( failure ) : > attempt_1624984332939_0003_5_01_21_0:java.lang.RuntimeException: > java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: > Hive Runtime Error while processing row > at > org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:296) > at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250) > at > org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374) > at > org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73) > at > org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730) > at > org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61) > at > org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37) > at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36) > at > org.apache.hadoop.hive.llap.daemon.impl.StatsRecordingThreadPool$WrappedCallable.call(StatsRecordingThreadPool.java:118) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > Caused by: java.lang.RuntimeException: > org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while > processing row > at > 
org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecord(ReduceRecordSource.java:304) > at > org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.run(ReduceRecordProcessor.java:318) > at > org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:267) > ... 15 more > Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime > Error while processing row > at > org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource$GroupIterator.next(ReduceRecordSource.java:378) > at > org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecord(ReduceRecordSource.java:294) > ... 17 more > Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: > java.lang.ClassCastException: org.apache.hadoop.hive.common.type.HiveDecimal > cannot be cast to org.apache.hadoop.hive.serde2.io.HiveDecimalWritable > at > org.apache.hadoop.hive.ql.exec.GroupByOperator.process(GroupByOperator.java:795) > at >
[jira] [Updated] (HIVE-25298) LAG function get java.lang.ClassCastException: org.apache.hadoop.hive.common.type.HiveDecimal cannot be cast to org.apache.hadoop.hive.serde2.io.HiveDecimalWritable
[ https://issues.apache.org/jira/browse/HIVE-25298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sankar Hariappan updated HIVE-25298: Fix Version/s: 4.0.0 > LAG function get java.lang.ClassCastException: > org.apache.hadoop.hive.common.type.HiveDecimal cannot be cast to > org.apache.hadoop.hive.serde2.io.HiveDecimalWritable > > > Key: HIVE-25298 > URL: https://issues.apache.org/jira/browse/HIVE-25298 > Project: Hive > Issue Type: Bug > Components: Hive >Affects Versions: 3.1.2 > Environment: Hive 3.1 >Reporter: wenjun ma >Priority: Major > Fix For: 4.0.0 > > > When we try to apply the LAG function with aggregation function (MAX), we got > java.lang.ClassCastException: org.apache.hadoop.hive.common.type.HiveDecimal > cannot be cast to org.apache.hadoop.hive.serde2.io.HiveDecimalWritable. > Reproduce steps: > # create table: > create table tbl1 (ACCT_NM string, ACCT_BAL decimal(15,2)) partitioned by > (DL_DATA_DT string) > # insert sample data: insert into tbl1 values ('acct1', 1000.00, > '2020-01-01'); > insert into tbl1 values ('acct1', 800.00, '2020-01-02'); > ## Run the following SQL: > {code:java} > select > test.ACCT_NM, > test.DL_DATA_DT, > test.MAX_ACCT_BAL, > LAG(test.MAX_ACCT_BAL,1,0) OVER (PARTITION BY test.ACCT_NM ORDER BY > test.DL_DATA_DT) AS PREV_USED_AMT > from ( > select > tbl1.ACCT_NM as ACCT_NM, > tbl1.DL_DATA_DT as DL_DATA_DT, > max(tbl1.ACCT_BAL) as MAX_ACCT_BAL > from tbl1 > group by tbl1.ACCT_NM, tbl1.DL_DATA_DT > ) test; > {code} > Full Stack: > ERROR : FAILED: Execution Error, return code 2 from > org.apache.hadoop.hive.ql.exec.tez.TezTask. 
Vertex failed, vertexName=Reducer > 2, vertexId=vertex_1624984332939_0003_5_01, diagnostics=[Task failed, > taskId=task_1624984332939_0003_5_01_21, diagnostics=[TaskAttempt 0 > failed, info=[Error: Error while running task ( failure ) : > attempt_1624984332939_0003_5_01_21_0:java.lang.RuntimeException: > java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: > Hive Runtime Error while processing row > at > org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:296) > at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250) > at > org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374) > at > org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73) > at > org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730) > at > org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61) > at > org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37) > at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36) > at > org.apache.hadoop.hive.llap.daemon.impl.StatsRecordingThreadPool$WrappedCallable.call(StatsRecordingThreadPool.java:118) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > Caused by: java.lang.RuntimeException: > org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while > processing row > at > 
org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecord(ReduceRecordSource.java:304) > at > org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.run(ReduceRecordProcessor.java:318) > at > org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:267) > ... 15 more > Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime > Error while processing row > at > org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource$GroupIterator.next(ReduceRecordSource.java:378) > at > org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecord(ReduceRecordSource.java:294) > ... 17 more > Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: > java.lang.ClassCastException: org.apache.hadoop.hive.common.type.HiveDecimal > cannot be cast to org.apache.hadoop.hive.serde2.io.HiveDecimalWritable > at > org.apache.hadoop.hive.ql.exec.GroupByOperator.process(GroupByOperator.java:795) > at > org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource$GroupIterator.next(ReduceRecordSource.java:363) > ... 18 more
[jira] [Updated] (HIVE-25403) Fix from_unixtime() to consider leap seconds
[ https://issues.apache.org/jira/browse/HIVE-25403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sankar Hariappan updated HIVE-25403: Affects Version/s: 3.1.0 3.1.1 > Fix from_unixtime() to consider leap seconds > - > > Key: HIVE-25403 > URL: https://issues.apache.org/jira/browse/HIVE-25403 > Project: Hive > Issue Type: Sub-task > Components: Hive >Affects Versions: 3.1.0, 3.1.1, 3.1.2 >Reporter: Sruthi Mooriyathvariam >Assignee: Sruthi Mooriyathvariam >Priority: Major > Labels: UDF, pull-request-available > Fix For: 4.0.0 > > Attachments: image-2021-07-29-14-42-49-806.png > > Time Spent: 3h 40m > Remaining Estimate: 0h > > The unix_timestamp() considers "leap seconds" while from_unixtime() does not, > which results in a wrong result, as below: > !image-2021-07-29-14-42-49-806.png! -- This message was sent by Atlassian Jira (v8.3.4#803005)