[jira] [Updated] (HIVE-24434) Filter out materialized views for rewriting if plan pattern is not allowed
[ https://issues.apache.org/jira/browse/HIVE-24434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Krisztian Kasa updated HIVE-24434:
----------------------------------
Component/s: Materialized views

> Filter out materialized views for rewriting if plan pattern is not allowed
> --------------------------------------------------------------------------
>
> Key: HIVE-24434
> URL: https://issues.apache.org/jira/browse/HIVE-24434
> Project: Hive
> Issue Type: Improvement
> Components: Materialized views
> Affects Versions: 4.0.0
> Reporter: Krisztian Kasa
> Assignee: Krisztian Kasa
> Priority: Major
>
> Some materialized views are not enabled for Calcite-based rewriting. The rules
> for validating materialized views are implemented by HIVE-20748.
> Since text-based materialized view query rewrite doesn't have such
> limitations, some logic must be implemented to flag each materialized view as
> enabled for text-based rewrite only or for both.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
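As a sketch of what such flagging could look like (all names here are hypothetical, not Hive's actual API), each materialized view could carry a rewrite-mode flag that the Calcite-based rewriter filters on:

```java
import java.util.List;
import java.util.stream.Collectors;

public class MvRewriteFilterSketch {
    // Hypothetical flag: TEXT_ONLY when the plan pattern fails the HIVE-20748
    // validation rules, ALL when Calcite-based rewriting is also allowed.
    enum RewriteMode { TEXT_ONLY, ALL }

    static class MaterializedView {
        final String name;
        final RewriteMode mode;
        MaterializedView(String name, RewriteMode mode) { this.name = name; this.mode = mode; }
    }

    // Only MVs flagged ALL are handed to the Calcite-based rewriter; the
    // text-based rewrite can still consider every MV.
    static List<MaterializedView> calciteCandidates(List<MaterializedView> mvs) {
        return mvs.stream()
                  .filter(mv -> mv.mode == RewriteMode.ALL)
                  .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<MaterializedView> mvs = List.of(
            new MaterializedView("mv_agg", RewriteMode.ALL),
            new MaterializedView("mv_window", RewriteMode.TEXT_ONLY));
        System.out.println(calciteCandidates(mvs).size()); // prints 1
    }
}
```

The point of the flag is that filtering happens up front, before plan matching, rather than failing later inside the rewrite rules.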
[jira] [Assigned] (HIVE-24434) Filter out materialized views for rewriting if plan pattern is not allowed
[ https://issues.apache.org/jira/browse/HIVE-24434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Krisztian Kasa reassigned HIVE-24434:
-------------------------------------

> Filter out materialized views for rewriting if plan pattern is not allowed
> --------------------------------------------------------------------------
>
> Key: HIVE-24434
> URL: https://issues.apache.org/jira/browse/HIVE-24434
> Project: Hive
> Issue Type: Improvement
> Affects Versions: 4.0.0
> Reporter: Krisztian Kasa
> Assignee: Krisztian Kasa
> Priority: Major
>
> Some materialized views are not enabled for Calcite-based rewriting. The rules
> for validating materialized views are implemented by HIVE-20748.
> Since text-based materialized view query rewrite doesn't have such
> limitations, some logic must be implemented to flag each materialized view as
> enabled for text-based rewrite only or for both.
[jira] [Work logged] (HIVE-24397) Add the projection specification to the table request object and add placeholders in ObjectStore.java
[ https://issues.apache.org/jira/browse/HIVE-24397?focusedWorklogId=516926&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-516926 ]

ASF GitHub Bot logged work on HIVE-24397:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 26/Nov/20 06:45
Start Date: 26/Nov/20 06:45
Worklog Time Spent: 10m

Work Description: vnhive commented on a change in pull request #1681:
URL: https://github.com/apache/hive/pull/1681#discussion_r530803306

## File path: standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClient.java
## @@ -2360,6 +2360,20 @@ public Table getTable(String catName, String dbName, String tableName, String va
     return deepCopyTables(FilterUtils.filterTablesIfEnabled(isClientFilterEnabled, filterHook, tabs));
   }

+  @Override
+  public List<Table> getTableObjectsByRequest(GetTablesRequest req) throws TException {

Review comment: You are referring to SessionHiveMetaStoreClient, right? SessionHiveMetaStoreClient does not have an implementation for get_partitions_with_specs and piggybacks on the implementation in its superclass (HiveMetaStoreClient) via the inheritance hierarchy. I just followed the same pattern here. Also, it just returns a list of table objects (basically a read query), and should work the same across sessions, since it just returns persisted information.

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
-------------------
Worklog Id: (was: 516926)
Time Spent: 20m (was: 10m)

> Add the projection specification to the table request object and add
> placeholders in ObjectStore.java
> ---------------------------------------------------------------------
>
> Key: HIVE-24397
> URL: https://issues.apache.org/jira/browse/HIVE-24397
> Project: Hive
> Issue Type: Sub-task
> Components: Hive
> Reporter: Narayanan Venkateswaran
> Assignee: Narayanan Venkateswaran
> Priority: Minor
> Labels: pull-request-available
> Time Spent: 20m
> Remaining Estimate: 0h
>
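The inheritance pattern the review comment describes can be illustrated with a minimal, self-contained sketch (class and method shapes are simplified stand-ins, not Hive's real signatures): the session-scoped client declares no override for the read-only call, so invocations resolve to the base client's implementation.

```java
import java.util.List;

public class ClientInheritanceSketch {
    // Stand-in for HiveMetaStoreClient: implements the read-only request.
    static class BaseClient {
        List<String> getTableObjectsByRequest(String request) {
            return List.of("table_for_" + request); // placeholder lookup
        }
    }

    // Stand-in for SessionHiveMetaStoreClient: no override here, so the call
    // "piggybacks" on the superclass implementation, as the comment describes.
    static class SessionClient extends BaseClient { }

    public static void main(String[] args) {
        System.out.println(new SessionClient().getTableObjectsByRequest("db1"));
    }
}
```

This is why the comment argues the pattern is safe for a read-only query: there is no session-local state to override, so inheriting the superclass behavior gives identical results in every session.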
[jira] [Work logged] (HIVE-24144) getIdentifierQuoteString in HiveDatabaseMetaData returns incorrect value
[ https://issues.apache.org/jira/browse/HIVE-24144?focusedWorklogId=516902&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-516902 ] ASF GitHub Bot logged work on HIVE-24144: - Author: ASF GitHub Bot Created on: 26/Nov/20 02:40 Start Date: 26/Nov/20 02:40 Worklog Time Spent: 10m Work Description: jcamachor opened a new pull request #1487: URL: https://github.com/apache/hive/pull/1487 …incorrect value ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 516902) Time Spent: 50m (was: 40m) > getIdentifierQuoteString in HiveDatabaseMetaData returns incorrect value > > > Key: HIVE-24144 > URL: https://issues.apache.org/jira/browse/HIVE-24144 > Project: Hive > Issue Type: Bug > Components: JDBC, JDBC storage handler >Reporter: Jesus Camacho Rodriguez >Assignee: Jesus Camacho Rodriguez >Priority: Major > Labels: pull-request-available > Time Spent: 50m > Remaining Estimate: 0h > > {code} > public String getIdentifierQuoteString() throws SQLException { > return " "; > } > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
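For context, the JDBC contract for this method expects the actual quoting string, with a single space reserved for the case where identifier quoting is not supported; since Hive quotes identifiers with backticks, a plausible fix looks like the following sketch (an assumption about the eventual patch, not the committed code):

```java
public class QuoteStringSketch {
    // java.sql.DatabaseMetaData#getIdentifierQuoteString should return the
    // quoting string the engine actually uses, and " " only when identifier
    // quoting is unsupported. HiveServer2 quotes identifiers with backticks
    // (e.g. `my column`), so returning "`" is the sketched fix here.
    // (The real interface method also declares "throws SQLException";
    // omitted to keep this sketch self-contained.)
    public String getIdentifierQuoteString() {
        return "`";
    }

    public static void main(String[] args) {
        System.out.println(new QuoteStringSketch().getIdentifierQuoteString()); // prints `
    }
}
```

Returning " " matters in practice because tools such as the JDBC storage handler use this value to build quoted identifiers, and a space produces malformed SQL.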
[jira] [Updated] (HIVE-24433) AutoCompaction is not getting triggered for CamelCase Partition Values
[ https://issues.apache.org/jira/browse/HIVE-24433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Naresh P R updated HIVE-24433:
------------------------------
Description:

PartitionKeyValue is getting converted to lowercase in the 2 places below:
[https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java#L2728]
[https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java#L2851]

Because of this, the TXN_COMPONENTS & HIVE_LOCKS tables do not hold entries with the proper partition values. When the query completes, the entry moves from TXN_COMPONENTS to COMPLETED_TXN_COMPONENTS. Hive AutoCompaction will not recognize the partition & considers it an invalid partition.

{code:java}
create table abc(name string) partitioned by(city string) stored as orc tblproperties('transactional'='true');
insert into abc partition(city='Bangalore') values('aaa');
{code}

Example entry in COMPLETED_TXN_COMPONENTS:
{noformat}
+------------+---------------+------------+-----------------+----------------------+--------------+--------------------+
| CTC_TXNID  | CTC_DATABASE  | CTC_TABLE  | CTC_PARTITION   | CTC_TIMESTAMP        | CTC_WRITEID  | CTC_UPDATE_DELETE  |
+------------+---------------+------------+-----------------+----------------------+--------------+--------------------+
| 2          | default       | abc        | city=bangalore  | 2020-11-25 09:26:59  | 1            | N                  |
+------------+---------------+------------+-----------------+----------------------+--------------+--------------------+
{noformat}

AutoCompaction fails to get triggered, with the error below:
{code:java}
2020-11-25T09:35:10,364 INFO [Thread-9]: compactor.Initiator (Initiator.java:run(98)) - Checking to see if we should compact default.abc.city=bangalore
2020-11-25T09:35:10,380 INFO [Thread-9]: compactor.Initiator (Initiator.java:run(155)) - Can't find partition default.compaction_test.city=bhubaneshwar, assuming it has been dropped and moving on
{code}

I verified the 4 SQL statements below with my PR; all of them produced the correct PartitionKeyValue, i.e. COMPLETED_TXN_COMPONENTS.CTC_PARTITION="city=Bangalore":
{code:java}
insert into table abc PARTITION(CitY='Bangalore') values('Dan');
insert overwrite table abc partition(CiTy='Bangalore') select Name from abc;
update table abc set Name='xy' where CiTy='Bangalore';
delete from abc where CiTy='Bangalore';
{code}

> AutoCompaction is not getting triggered for CamelCase Partition Values
> ----------------------------------------------------------------------
>
> Key: HIVE-24433
> URL: https://issues.apache.org/jira/browse/HIVE-24433
> Project: Hive
> Issue Type: Bug
> Reporter: Naresh P R
> Assignee: Naresh P R
> Priority: Major
> Labels: pull-request-available
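The root cause can be reproduced with a small self-contained sketch (helper names are hypothetical): lowercasing the whole key=value string also lowercases the user-supplied value, so the compactor later looks up a partition name that doesn't exist, whereas normalizing only the key preserves the value.

```java
public class PartitionCaseSketch {
    // Buggy behavior: the whole "key=value" string is lowercased,
    // as in the two TxnHandler call sites linked above.
    static String lowercaseAll(String partName) {
        return partName.toLowerCase();
    }

    // Intended behavior: only the partition key is case-normalized;
    // the user-supplied partition value keeps its original case.
    static String lowercaseKeyOnly(String partName) {
        int eq = partName.indexOf('=');
        return partName.substring(0, eq).toLowerCase() + partName.substring(eq);
    }

    public static void main(String[] args) {
        System.out.println(lowercaseAll("CitY=Bangalore"));     // city=bangalore -> compactor misses the partition
        System.out.println(lowercaseKeyOnly("CitY=Bangalore")); // city=Bangalore -> matches the real partition
    }
}
```

Partition keys are case-insensitive column names in Hive, but partition values are data and are case-sensitive, which is why only the key side can safely be normalized.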
[jira] [Updated] (HIVE-24433) AutoCompaction is not getting triggered for CamelCase Partition Values
[ https://issues.apache.org/jira/browse/HIVE-24433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HIVE-24433:
----------------------------------
Labels: pull-request-available (was: )

> AutoCompaction is not getting triggered for CamelCase Partition Values
> ----------------------------------------------------------------------
>
> Key: HIVE-24433
> URL: https://issues.apache.org/jira/browse/HIVE-24433
> Project: Hive
> Issue Type: Bug
> Reporter: Naresh P R
> Assignee: Naresh P R
> Priority: Major
> Labels: pull-request-available
> Time Spent: 10m
> Remaining Estimate: 0h
>
[jira] [Work logged] (HIVE-24433) AutoCompaction is not getting triggered for CamelCase Partition Values
[ https://issues.apache.org/jira/browse/HIVE-24433?focusedWorklogId=516894&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-516894 ]

ASF GitHub Bot logged work on HIVE-24433:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 26/Nov/20 02:03
Start Date: 26/Nov/20 02:03
Worklog Time Spent: 10m

Work Description: nareshpr opened a new pull request #1712:
URL: https://github.com/apache/hive/pull/1712

Issue Time Tracking
-------------------
Worklog Id: (was: 516894)
Remaining Estimate: 0h
Time Spent: 10m

> AutoCompaction is not getting triggered for CamelCase Partition Values
> ----------------------------------------------------------------------
>
> Key: HIVE-24433
> URL: https://issues.apache.org/jira/browse/HIVE-24433
> Project: Hive
> Issue Type: Bug
> Reporter: Naresh P R
> Assignee: Naresh P R
> Priority: Major
> Time Spent: 10m
> Remaining Estimate: 0h
>
[jira] [Assigned] (HIVE-24433) AutoCompaction is not getting triggered for CamelCase Partition Values
[ https://issues.apache.org/jira/browse/HIVE-24433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Naresh P R reassigned HIVE-24433:
---------------------------------

> AutoCompaction is not getting triggered for CamelCase Partition Values
> ----------------------------------------------------------------------
>
> Key: HIVE-24433
> URL: https://issues.apache.org/jira/browse/HIVE-24433
> Project: Hive
> Issue Type: Bug
> Reporter: Naresh P R
> Assignee: Naresh P R
> Priority: Major
>
[jira] [Work logged] (HIVE-24144) getIdentifierQuoteString in HiveDatabaseMetaData returns incorrect value
[ https://issues.apache.org/jira/browse/HIVE-24144?focusedWorklogId=516872&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-516872 ]

ASF GitHub Bot logged work on HIVE-24144:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 26/Nov/20 00:42
Start Date: 26/Nov/20 00:42
Worklog Time Spent: 10m

Work Description: github-actions[bot] closed pull request #1487:
URL: https://github.com/apache/hive/pull/1487

Issue Time Tracking
-------------------
Worklog Id: (was: 516872)
Time Spent: 40m (was: 0.5h)

> getIdentifierQuoteString in HiveDatabaseMetaData returns incorrect value
> ------------------------------------------------------------------------
>
> Key: HIVE-24144
> URL: https://issues.apache.org/jira/browse/HIVE-24144
> Project: Hive
> Issue Type: Bug
> Components: JDBC, JDBC storage handler
> Reporter: Jesus Camacho Rodriguez
> Assignee: Jesus Camacho Rodriguez
> Priority: Major
> Labels: pull-request-available
> Time Spent: 40m
> Remaining Estimate: 0h
>
[jira] [Work logged] (HIVE-24073) Execution exception in sort-merge semijoin
[ https://issues.apache.org/jira/browse/HIVE-24073?focusedWorklogId=516871&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-516871 ]

ASF GitHub Bot logged work on HIVE-24073:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 26/Nov/20 00:42
Start Date: 26/Nov/20 00:42
Worklog Time Spent: 10m

Work Description: github-actions[bot] commented on pull request #1476:
URL: https://github.com/apache/hive/pull/1476#issuecomment-734010074

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Feel free to reach out on the d...@hive.apache.org list if the patch is in need of reviews.

Issue Time Tracking
-------------------
Worklog Id: (was: 516871)
Time Spent: 20m (was: 10m)

> Execution exception in sort-merge semijoin
> ------------------------------------------
>
> Key: HIVE-24073
> URL: https://issues.apache.org/jira/browse/HIVE-24073
> Project: Hive
> Issue Type: Bug
> Components: Operators
> Reporter: Jesus Camacho Rodriguez
> Assignee: mahesh kumar behera
> Priority: Major
> Labels: pull-request-available
> Time Spent: 20m
> Remaining Estimate: 0h
>
> Working on HIVE-24041, we trigger an additional SJ conversion that leads to
> this exception at execution time:
> {code}
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.hive.ql.metadata.HiveException: Attempting to overwrite nextKeyWritables[1]
>         at org.apache.hadoop.hive.ql.exec.GroupByOperator.closeOp(GroupByOperator.java:1063)
>         at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:685)
>         at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:707)
>         at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:707)
>         at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:707)
>         at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:707)
>         at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.close(MapRecordProcessor.java:462)
>         ... 16 more
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.hive.ql.metadata.HiveException: Attempting to overwrite nextKeyWritables[1]
>         at org.apache.hadoop.hive.ql.exec.GroupByOperator.flush(GroupByOperator.java:1037)
>         at org.apache.hadoop.hive.ql.exec.GroupByOperator.closeOp(GroupByOperator.java:1060)
>         ... 22 more
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Attempting to overwrite nextKeyWritables[1]
>         at org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.processKey(CommonMergeJoinOperator.java:564)
>         at org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.process(CommonMergeJoinOperator.java:243)
>         at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:887)
>         at org.apache.hadoop.hive.ql.exec.TezDummyStoreOperator.process(TezDummyStoreOperator.java:49)
>         at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:887)
>         at org.apache.hadoop.hive.ql.exec.GroupByOperator.forward(GroupByOperator.java:1003)
>         at org.apache.hadoop.hive.ql.exec.GroupByOperator.flush(GroupByOperator.java:1020)
>         ... 23 more
> {code}
> To reproduce, just set {{hive.auto.convert.sortmerge.join}} to {{true}} in
> the last query in {{auto_sortmerge_join_10.q}} after HIVE-24041 has been
> merged.
[jira] [Work logged] (HIVE-24324) Remove deprecated API usage from Avro
[ https://issues.apache.org/jira/browse/HIVE-24324?focusedWorklogId=516835&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-516835 ] ASF GitHub Bot logged work on HIVE-24324: - Author: ASF GitHub Bot Created on: 25/Nov/20 21:19 Start Date: 25/Nov/20 21:19 Worklog Time Spent: 10m Work Description: sunchao opened a new pull request #1711: URL: https://github.com/apache/hive/pull/1711 ### What changes were proposed in this pull request? This backports #1621 to branch-3.1. This mainly replaces `JsonProperties.getJsonProp` with `JsonProperties.getObjectProp`. Note that there's one place in `SchemaToTypeInfo` where we explicitly call `getIntValue` to forbid strings as precision/scale values (see [HIVE-7174](https://issues.apache.org/jira/browse/HIVE-7174)). To retain the old behavior, we check whether the returned object is of integer type and, if not, return a default of 0, following the `JsonNode` implementation. ### Why are the changes needed? `JsonProperties#getJsonProp` has been marked as deprecated in Avro 1.8 and removed since Avro 1.9. This replaces its usage with `getObjectProp`, which doesn't leak Jackson's Json nodes. This will help downstream apps depend on Hive while using a higher version of Avro, and also help Hive upgrade its own Avro version. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Existing tests. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 516835) Time Spent: 1h 10m (was: 1h) > Remove deprecated API usage from Avro > - > > Key: HIVE-24324 > URL: https://issues.apache.org/jira/browse/HIVE-24324 > Project: Hive > Issue Type: Improvement > Components: Avro >Reporter: Chao Sun >Assignee: Chao Sun >Priority: Major > Labels: pull-request-available > Fix For: 2.3.8, 4.0.0 > > Time Spent: 1h 10m > Remaining Estimate: 0h > > {{JsonProperties#getJsonProp}} has been marked as deprecated in Avro 1.8 and > removed since Avro 1.9. This replaces its usage with > {{getObjectProp}}, which doesn't leak Jackson's Json nodes. This will help > downstream apps depend on Hive while using a higher version of Avro, and > also help Hive upgrade its own Avro version. -- This message was sent by Atlassian Jira (v8.3.4#803005)
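The precision/scale coercion the PR above describes can be sketched in isolation. This is a hypothetical helper, not Hive's actual `SchemaToTypeInfo` code: Avro's `getObjectProp` returns a plain `Object`, so the old `JsonNode.getIntValue` defaulting (0 for non-numeric nodes) has to be reproduced by the caller.

```java
// Hypothetical helper illustrating the coercion described above. With the
// deprecated getJsonProp API, JsonNode.getIntValue() returned 0 for
// non-integer nodes; with getObjectProp we receive a plain Object and must
// reproduce that defaulting ourselves.
class AvroPropCoercion {
    /** Returns the property value as an int, or 0 if it is not an integer. */
    static int asIntOrZero(Object prop) {
        if (prop instanceof Integer) {
            return (Integer) prop;
        }
        // Mirror the old JsonNode behavior: strings and other types map to 0.
        return 0;
    }

    public static void main(String[] args) {
        System.out.println(asIntOrZero(10));   // integer precision is accepted
        System.out.println(asIntOrZero("10")); // string precision is rejected
    }
}
```

The effect is that a schema declaring `"precision": "10"` as a string still yields 0, matching the pre-1.9 behavior the backport wants to retain.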
[jira] [Updated] (HIVE-24414) Backport HIVE-19662 to branch-3.1
[ https://issues.apache.org/jira/browse/HIVE-24414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun updated HIVE-24414: Fix Version/s: 3.1.3 > Backport HIVE-19662 to branch-3.1 > - > > Key: HIVE-24414 > URL: https://issues.apache.org/jira/browse/HIVE-24414 > Project: Hive > Issue Type: Improvement >Reporter: Chao Sun >Assignee: Chao Sun >Priority: Major > Labels: pull-request-available > Fix For: 3.1.3 > > Time Spent: 50m > Remaining Estimate: 0h > > This JIRA proposes to backport HIVE-19662 to branch-3.1 and upgrade Avro to > 1.8.2. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24414) Backport HIVE-19662 to branch-3.1
[ https://issues.apache.org/jira/browse/HIVE-24414?focusedWorklogId=516832&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-516832 ] ASF GitHub Bot logged work on HIVE-24414: - Author: ASF GitHub Bot Created on: 25/Nov/20 21:15 Start Date: 25/Nov/20 21:15 Worklog Time Spent: 10m Work Description: sunchao merged pull request #1698: URL: https://github.com/apache/hive/pull/1698 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 516832) Time Spent: 40m (was: 0.5h) > Backport HIVE-19662 to branch-3.1 > - > > Key: HIVE-24414 > URL: https://issues.apache.org/jira/browse/HIVE-24414 > Project: Hive > Issue Type: Improvement >Reporter: Chao Sun >Assignee: Chao Sun >Priority: Major > Labels: pull-request-available > Time Spent: 40m > Remaining Estimate: 0h > > This JIRA proposes to backport HIVE-19662 to branch-3.1 and upgrade Avro to > 1.8.2. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HIVE-24414) Backport HIVE-19662 to branch-3.1
[ https://issues.apache.org/jira/browse/HIVE-24414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun resolved HIVE-24414. - Hadoop Flags: Reviewed Resolution: Fixed > Backport HIVE-19662 to branch-3.1 > - > > Key: HIVE-24414 > URL: https://issues.apache.org/jira/browse/HIVE-24414 > Project: Hive > Issue Type: Improvement >Reporter: Chao Sun >Assignee: Chao Sun >Priority: Major > Labels: pull-request-available > Time Spent: 50m > Remaining Estimate: 0h > > This JIRA proposes to backport HIVE-19662 to branch-3.1 and upgrade Avro to > 1.8.2. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24414) Backport HIVE-19662 to branch-3.1
[ https://issues.apache.org/jira/browse/HIVE-24414?focusedWorklogId=516833&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-516833 ] ASF GitHub Bot logged work on HIVE-24414: - Author: ASF GitHub Bot Created on: 25/Nov/20 21:15 Start Date: 25/Nov/20 21:15 Worklog Time Spent: 10m Work Description: sunchao commented on pull request #1698: URL: https://github.com/apache/hive/pull/1698#issuecomment-733948911 Thanks @aihuaxu ! merged to branch-3.1 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 516833) Time Spent: 50m (was: 40m) > Backport HIVE-19662 to branch-3.1 > - > > Key: HIVE-24414 > URL: https://issues.apache.org/jira/browse/HIVE-24414 > Project: Hive > Issue Type: Improvement >Reporter: Chao Sun >Assignee: Chao Sun >Priority: Major > Labels: pull-request-available > Time Spent: 50m > Remaining Estimate: 0h > > This JIRA proposes to backport HIVE-19662 to branch-3.1 and upgrade Avro to > 1.8.2. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-24431) Null Pointer exception while sending data to jdbc
[ https://issues.apache.org/jira/browse/HIVE-24431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fabien Carrion updated HIVE-24431: -- Description: I was receiving some null pointer while writing in db: {quote}{{ERROR : FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, vertexName=Reducer 3, vertexId=vertex_1604850281565_5081_1_02, diagnostics=[Task failed, taskId=task_1604850281565_5081_1_02_01, diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( failure ) : attempt_1604850281565_5081_1_02_01_0:java.lang.RuntimeException: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row}} {{ at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:296)}} {{ at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250)}} {{ at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)}} {{ at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)}} {{ at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)}} {{ at java.security.AccessController.doPrivileged(Native Method)}} {{ at javax.security.auth.Subject.doAs(Subject.java:422)}} {{ at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)}} {{ at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)}} {{ at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)}} {{ at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)}} {{ at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:125)}} {{ at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:69)}} {{ at 
com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:78)}} {{ at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)}} {{ at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)}} {{ at java.lang.Thread.run(Thread.java:745)}} {{Caused by: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row}} {{ at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecord(ReduceRecordSource.java:304)}} {{ at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.run(ReduceRecordProcessor.java:318)}} {{ at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:267)}} {{ ... 16 more}} {{Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row}} {{ at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource$GroupIterator.next(ReduceRecordSource.java:378)}} {{ at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecord(ReduceRecordSource.java:294)}} {{ ... 
18 more}} {{Caused by: java.lang.NullPointerException}} {{ at org.apache.hive.storage.jdbc.JdbcSerDe.serialize(JdbcSerDe.java:166)}} {{ at org.apache.hive.storage.jdbc.JdbcSerDe.serialize(JdbcSerDe.java:59)}} {{ at org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:961)}} {{ at org.apache.hadoop.hive.ql.exec.Operator.baseForward(Operator.java:994)}} {{ at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:940)}} {{ at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:927)}} {{ at org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:95)}} {{ at org.apache.hadoop.hive.ql.exec.Operator.baseForward(Operator.java:994)}} {{ at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:940)}} {{ at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:927)}} {{ at org.apache.hadoop.hive.ql.exec.PTFOperator$PTFInvocation.handleOutputRows(PTFOperator.java:337)}} {{ at org.apache.hadoop.hive.ql.exec.PTFOperator$PTFInvocation.processRow(PTFOperator.java:325)}} {{ at org.apache.hadoop.hive.ql.exec.PTFOperator.process(PTFOperator.java:139)}} {{ at org.apache.hadoop.hive.ql.exec.Operator.baseForward(Operator.java:994)}} {{ at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:940)}} {{ at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:927)}} {{ at org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:95)}} {{ at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource$GroupIterator.next(ReduceRecordSource.java:363)}} {{ ... 19 more}} {quote} I just add a check in the patch. was: I was receiving some null pointer while writing in db. I just add a check. > Null Pointer exception while sending data to jdbc > - > > Key: HIVE-24431
[jira] [Updated] (HIVE-24431) Null Pointer exception while sending data to jdbc
[ https://issues.apache.org/jira/browse/HIVE-24431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fabien Carrion updated HIVE-24431: -- Description: I was receiving some null pointer while writing in db: {quote}ERROR : FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, vertexName=Reducer 3, vertexId=vertex_1604850281565_5081_1_02, diagnostics=[Task failed, taskId=task_1604850281565_5081_1_02_01, diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( failure ) : attempt_1604850281565_5081_1_02_01_0:java.lang.RuntimeException: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:296) at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250) at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374) at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73) at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730) at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61) at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37) at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36) at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:125) at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:69) at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:78) at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) Caused by: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecord(ReduceRecordSource.java:304) at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.run(ReduceRecordProcessor.java:318) at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:267) ... 16 more Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource$GroupIterator.next(ReduceRecordSource.java:378) at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecord(ReduceRecordSource.java:294) ... 18 more Caused by: java.lang.NullPointerException at org.apache.hive.storage.jdbc.JdbcSerDe.serialize(JdbcSerDe.java:166) at org.apache.hive.storage.jdbc.JdbcSerDe.serialize(JdbcSerDe.java:59) at org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:961) at org.apache.hadoop.hive.ql.exec.Operator.baseForward(Operator.java:994) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:940) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:927) at org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:95) at org.apache.hadoop.hive.ql.exec.Operator.baseForward(Operator.java:994) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:940) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:927) at org.apache.hadoop.hive.ql.exec.PTFOperator$PTFInvocation.handleOutputRows(PTFOperator.java:337) at org.apache.hadoop.hive.ql.exec.PTFOperator$PTFInvocation.processRow(PTFOperator.java:325) at 
org.apache.hadoop.hive.ql.exec.PTFOperator.process(PTFOperator.java:139) at org.apache.hadoop.hive.ql.exec.Operator.baseForward(Operator.java:994) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:940) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:927) at org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:95) at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource$GroupIterator.next(ReduceRecordSource.java:363) ... 19 more {quote} I just add a check in the patch. was: I was receiving some null pointer while writing in db: {quote}{{ERROR : FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, vertexName=Reducer 3, vertexId=vertex_1604850281565_5081_1_02, diagnostics=[Task failed, taskId=task_1604850281565_5081_1_02_01, diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( failure ) : attempt
[jira] [Work logged] (HIVE-24432) Delete Notification Events in Batches
[ https://issues.apache.org/jira/browse/HIVE-24432?focusedWorklogId=516827&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-516827 ] ASF GitHub Bot logged work on HIVE-24432: - Author: ASF GitHub Bot Created on: 25/Nov/20 20:55 Start Date: 25/Nov/20 20:55 Worklog Time Spent: 10m Work Description: belugabehr opened a new pull request #1710: URL: https://github.com/apache/hive/pull/1710 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 516827) Remaining Estimate: 0h Time Spent: 10m > Delete Notification Events in Batches > - > > Key: HIVE-24432 > URL: https://issues.apache.org/jira/browse/HIVE-24432 > Project: Hive > Issue Type: Improvement >Affects Versions: 3.2.0 >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > Notification events are loaded in batches (reduces memory pressure on the > HMS), but all of the deletes happen in a single transaction which, when > deleting many records, can put a lot of pressure on the backend database. > Instead, delete events in batches (in different transactions) as well. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-24432) Delete Notification Events in Batches
[ https://issues.apache.org/jira/browse/HIVE-24432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-24432: -- Labels: pull-request-available (was: ) > Delete Notification Events in Batches > - > > Key: HIVE-24432 > URL: https://issues.apache.org/jira/browse/HIVE-24432 > Project: Hive > Issue Type: Improvement >Affects Versions: 3.2.0 >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > Notification events are loaded in batches (reduces memory pressure on the > HMS), but all of the deletes happen in a single transaction which, when > deleting many records, can put a lot of pressure on the backend database. > Instead, delete events in batches (in different transactions) as well. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HIVE-24432) Delete Notification Events in Batches
[ https://issues.apache.org/jira/browse/HIVE-24432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Mollitor reassigned HIVE-24432: - > Delete Notification Events in Batches > - > > Key: HIVE-24432 > URL: https://issues.apache.org/jira/browse/HIVE-24432 > Project: Hive > Issue Type: Improvement >Affects Versions: 3.2.0 >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Major > > Notification events are loaded in batches (reduces memory pressure on the > HMS), but all of the deletes happen in a single transaction which, when > deleting many records, can put a lot of pressure on the backend database. > Instead, delete events in batches (in different transactions) as well. -- This message was sent by Atlassian Jira (v8.3.4#803005)
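The batching idea in HIVE-24432 can be sketched like this. It is illustrative only: the real change lives in the metastore event cleaner, and the method names and batch size below are invented for the example.

```java
import java.util.Collections;
import java.util.List;

// Illustrative sketch of deleting notification events in fixed-size batches,
// each batch in its own transaction, instead of one huge delete. This is not
// the actual HMS code; the names here are invented.
class BatchedDelete {
    /** Partitions eventIds into batches and returns how many were issued. */
    static int deleteInBatches(List<Integer> eventIds, int batchSize) {
        int batches = 0;
        for (int from = 0; from < eventIds.size(); from += batchSize) {
            int to = Math.min(from + batchSize, eventIds.size());
            List<Integer> batch = eventIds.subList(from, to);
            // In real code: open a transaction, issue
            // DELETE FROM NOTIFICATION_LOG WHERE EVENT_ID IN (<batch>),
            // then commit before starting the next batch so the backend
            // database never holds one giant delete transaction.
            batches++;
        }
        return batches;
    }

    public static void main(String[] args) {
        // 2500 stale events deleted 1000 at a time -> 3 transactions.
        System.out.println(deleteInBatches(Collections.nCopies(2500, 0), 1000));
    }
}
```

The trade-off is the usual one for batched maintenance: shorter transactions and smaller lock footprints on the backend database, at the cost of the overall cleanup no longer being atomic.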
[jira] [Updated] (HIVE-24431) Null Pointer exception while sending data to jdbc
[ https://issues.apache.org/jira/browse/HIVE-24431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fabien Carrion updated HIVE-24431: -- Attachment: check_null.patch > Null Pointer exception while sending data to jdbc > - > > Key: HIVE-24431 > URL: https://issues.apache.org/jira/browse/HIVE-24431 > Project: Hive > Issue Type: Bug > Components: JDBC storage handler >Affects Versions: All Versions >Reporter: Fabien Carrion >Priority: Trivial > Attachments: check_null.patch > > > I was receiving a null pointer exception while writing to the db. > I just added a check. -- This message was sent by Atlassian Jira (v8.3.4#803005)
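The attached check_null.patch is not shown in this thread, but the guard it describes presumably looks something like the following sketch. The field handling is simplified and the names are invented; the real `JdbcSerDe` works with Hive object inspectors, not plain `Object`s.

```java
// Hypothetical sketch of a null guard in a serializer: emit a SQL NULL for
// absent field values instead of dereferencing them, which is what produced
// the NullPointerException in JdbcSerDe.serialize in the stack trace above.
class NullSafeSerialize {
    static String serializeField(Object fieldValue) {
        if (fieldValue == null) {
            return "NULL"; // write SQL NULL rather than throwing an NPE
        }
        return fieldValue.toString();
    }

    public static void main(String[] args) {
        System.out.println(serializeField("abc"));
        System.out.println(serializeField(null));
    }
}
```
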
[jira] [Work logged] (HIVE-24408) Upgrade Parquet to 1.11.1
[ https://issues.apache.org/jira/browse/HIVE-24408?focusedWorklogId=516803&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-516803 ] ASF GitHub Bot logged work on HIVE-24408: - Author: ASF GitHub Bot Created on: 25/Nov/20 18:39 Start Date: 25/Nov/20 18:39 Worklog Time Spent: 10m Work Description: sunchao commented on pull request #1692: URL: https://github.com/apache/hive/pull/1692#issuecomment-733884583 Thanks @jcamachor for merging! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 516803) Time Spent: 1h 40m (was: 1.5h) > Upgrade Parquet to 1.11.1 > - > > Key: HIVE-24408 > URL: https://issues.apache.org/jira/browse/HIVE-24408 > Project: Hive > Issue Type: Improvement >Reporter: Chao Sun >Assignee: Chao Sun >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 1h 40m > Remaining Estimate: 0h > > Parquet 1.11.1 has some bug fixes, so Hive should consider upgrading to it. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24430) DiskRangeInfo should make use of DiskRangeList
[ https://issues.apache.org/jira/browse/HIVE-24430?focusedWorklogId=516797&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-516797 ] ASF GitHub Bot logged work on HIVE-24430: - Author: ASF GitHub Bot Created on: 25/Nov/20 17:45 Start Date: 25/Nov/20 17:45 Worklog Time Spent: 10m Work Description: pgaref closed pull request #1707: URL: https://github.com/apache/hive/pull/1707 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 516797) Time Spent: 0.5h (was: 20m) > DiskRangeInfo should make use of DiskRangeList > -- > > Key: HIVE-24430 > URL: https://issues.apache.org/jira/browse/HIVE-24430 > Project: Hive > Issue Type: Sub-task >Reporter: Panagiotis Garefalakis >Assignee: Panagiotis Garefalakis >Priority: Trivial > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > DiskRangeInfo should make use of DiskRangeList instead of a plain List – > this will help us transition to ORC 1.6. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24430) DiskRangeInfo should make use of DiskRangeList
[ https://issues.apache.org/jira/browse/HIVE-24430?focusedWorklogId=516796&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-516796 ] ASF GitHub Bot logged work on HIVE-24430: - Author: ASF GitHub Bot Created on: 25/Nov/20 17:45 Start Date: 25/Nov/20 17:45 Worklog Time Spent: 10m Work Description: pgaref opened a new pull request #1709: URL: https://github.com/apache/hive/pull/1709 ### What changes were proposed in this pull request? Change DiskRangeInfo to use DiskRangeList instead of DiskRange ### Why are the changes needed? Transition to ORC 1.6, where DiskRangeList is the main class used. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Existing tests This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 516796) Time Spent: 20m (was: 10m) > DiskRangeInfo should make use of DiskRangeList > -- > > Key: HIVE-24430 > URL: https://issues.apache.org/jira/browse/HIVE-24430 > Project: Hive > Issue Type: Sub-task >Reporter: Panagiotis Garefalakis >Assignee: Panagiotis Garefalakis >Priority: Trivial > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > DiskRangeInfo should make use of DiskRangeList instead of a plain List – > this will help us transition to ORC 1.6. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-24409) Use LazyBinarySerDe2 in PlanUtils::getReduceValueTableDesc
[ https://issues.apache.org/jira/browse/HIVE-24409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-24409: -- Labels: pull-request-available (was: ) > Use LazyBinarySerDe2 in PlanUtils::getReduceValueTableDesc > -- > > Key: HIVE-24409 > URL: https://issues.apache.org/jira/browse/HIVE-24409 > Project: Hive > Issue Type: Improvement >Reporter: Rajesh Balamohan >Priority: Major > Labels: pull-request-available > Attachments: Screenshot 2020-11-23 at 10.52.49 AM.png > > Time Spent: 10m > Remaining Estimate: 0h > > !Screenshot 2020-11-23 at 10.52.49 AM.png|width=858,height=493! > Lines of interest: > [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/ReduceSinkOperator.java#L535] > (non-vectorized path due to stats) > > [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/plan/PlanUtils.java#L581] > > > > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24409) Use LazyBinarySerDe2 in PlanUtils::getReduceValueTableDesc
[ https://issues.apache.org/jira/browse/HIVE-24409?focusedWorklogId=516787&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-516787 ] ASF GitHub Bot logged work on HIVE-24409: - Author: ASF GitHub Bot Created on: 25/Nov/20 17:33 Start Date: 25/Nov/20 17:33 Worklog Time Spent: 10m Work Description: maheshk114 opened a new pull request #1708: URL: https://github.com/apache/hive/pull/1708 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 516787) Remaining Estimate: 0h Time Spent: 10m > Use LazyBinarySerDe2 in PlanUtils::getReduceValueTableDesc > -- > > Key: HIVE-24409 > URL: https://issues.apache.org/jira/browse/HIVE-24409 > Project: Hive > Issue Type: Improvement >Reporter: Rajesh Balamohan >Priority: Major > Attachments: Screenshot 2020-11-23 at 10.52.49 AM.png > > Time Spent: 10m > Remaining Estimate: 0h > > !Screenshot 2020-11-23 at 10.52.49 AM.png|width=858,height=493! > Lines of interest: > [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/ReduceSinkOperator.java#L535] > (non-vectorized path due to stats) > > [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/plan/PlanUtils.java#L581] > > > > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24424) Use PreparedStatements in DbNotificationListener getNextNLId
[ https://issues.apache.org/jira/browse/HIVE-24424?focusedWorklogId=516777&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-516777 ] ASF GitHub Bot logged work on HIVE-24424: - Author: ASF GitHub Bot Created on: 25/Nov/20 17:01 Start Date: 25/Nov/20 17:01 Worklog Time Spent: 10m Work Description: belugabehr opened a new pull request #1704: URL: https://github.com/apache/hive/pull/1704 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 516777) Time Spent: 0.5h (was: 20m) > Use PreparedStatements in DbNotificationListener getNextNLId > > > Key: HIVE-24424 > URL: https://issues.apache.org/jira/browse/HIVE-24424 > Project: Hive > Issue Type: Improvement >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Minor > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > Simplify the code, remove debug logging concatenation, and make it more > readable. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24424) Use PreparedStatements in DbNotificationListener getNextNLId
[ https://issues.apache.org/jira/browse/HIVE-24424?focusedWorklogId=516776&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-516776 ] ASF GitHub Bot logged work on HIVE-24424: - Author: ASF GitHub Bot Created on: 25/Nov/20 17:00 Start Date: 25/Nov/20 17:00 Worklog Time Spent: 10m Work Description: belugabehr closed pull request #1704: URL: https://github.com/apache/hive/pull/1704 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 516776) Time Spent: 20m (was: 10m) > Use PreparedStatements in DbNotificationListener getNextNLId > > > Key: HIVE-24424 > URL: https://issues.apache.org/jira/browse/HIVE-24424 > Project: Hive > Issue Type: Improvement >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Minor > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > Simplify the code, remove debug logging concatenation, and make it more > readable. -- This message was sent by Atlassian Jira (v8.3.4#803005)
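The change described in HIVE-24424 is the standard move from string-concatenated SQL to a parameterized statement. A sketch of the pattern follows; the table and column names are placeholders, not the actual HMS schema, and `getNextId` is an invented method name.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

// Illustrative pattern only: a PreparedStatement with a bound parameter
// replaces building the query by string concatenation. The JDBC driver
// handles quoting/escaping of the bound value.
class NextIdFetch {
    static final String SELECT_NEXT_ID =
        "SELECT \"NEXT_VAL\" FROM \"SEQUENCE_TABLE\" WHERE \"SEQUENCE_NAME\" = ?";

    static long getNextId(Connection conn, String sequenceName) throws SQLException {
        try (PreparedStatement stmt = conn.prepareStatement(SELECT_NEXT_ID)) {
            stmt.setString(1, sequenceName); // bind instead of concatenate
            try (ResultSet rs = stmt.executeQuery()) {
                return rs.next() ? rs.getLong(1) : 1L;
            }
        }
    }

    public static void main(String[] args) {
        // No database here; just show that the query text carries a
        // placeholder instead of a concatenated value.
        System.out.println(SELECT_NEXT_ID);
    }
}
```

Besides readability, this also avoids re-parsing the SQL on every call and removes any per-call string building in debug log paths.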
[jira] [Work logged] (HIVE-19253) HMS ignores tableType property for external tables
[ https://issues.apache.org/jira/browse/HIVE-19253?focusedWorklogId=516771&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-516771 ] ASF GitHub Bot logged work on HIVE-19253: - Author: ASF GitHub Bot Created on: 25/Nov/20 16:57 Start Date: 25/Nov/20 16:57 Worklog Time Spent: 10m Work Description: szehon-ho edited a comment on pull request #1537: URL: https://github.com/apache/hive/pull/1537#issuecomment-733051181 Sorry I've been away awhile. Thanks Naveen for taking a look. I had some free time today to take a look at the TestHiveMetstoreTransformer but am still a bit lost. I tried to set the table type to ManagedTable as you suggest, but the MetastoreDefaultTransformer actually transforms it back to an External table by the time the assert happens (actually this should probably be done not via the properties but via the modeled TableType, but in this case it doesn't matter). Code: https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/MetastoreDefaultTransformer.java#L596. So after that, it runs the asserts, which fail as they seem to be testing for ManagedTable, unless I am mistaken. If you have some time to let me know anything else to try, I would appreciate it. I haven't taken a look at Miguel's comments yet. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 516771) Time Spent: 2h 10m (was: 2h) > HMS ignores tableType property for external tables > -- > > Key: HIVE-19253 > URL: https://issues.apache.org/jira/browse/HIVE-19253 > Project: Hive > Issue Type: Bug > Components: Metastore >Affects Versions: 3.1.0, 3.0.0, 4.0.0 >Reporter: Alex Kolbasov >Assignee: Vihang Karajgaonkar >Priority: Major > Labels: newbie, pull-request-available > Attachments: HIVE-19253.01.patch, HIVE-19253.02.patch, > HIVE-19253.03.patch, HIVE-19253.03.patch, HIVE-19253.04.patch, > HIVE-19253.05.patch, HIVE-19253.06.patch, HIVE-19253.07.patch, > HIVE-19253.08.patch, HIVE-19253.09.patch, HIVE-19253.10.patch, > HIVE-19253.11.patch, HIVE-19253.12.patch > > Time Spent: 2h 10m > Remaining Estimate: 0h > > When someone creates a table using the Thrift API they may think that setting > tableType to {{EXTERNAL_TABLE}} creates an external table. And boom - their > table is gone later because HMS will silently change it to a managed table. > Here is the offending code: > {code:java} > private MTable convertToMTable(Table tbl) throws InvalidObjectException, > MetaException { > ... > // If the table has property EXTERNAL set, update table type > // accordingly > String tableType = tbl.getTableType(); > boolean isExternal = > Boolean.parseBoolean(tbl.getParameters().get("EXTERNAL")); > if (TableType.MANAGED_TABLE.toString().equals(tableType)) { > if (isExternal) { > tableType = TableType.EXTERNAL_TABLE.toString(); > } > } > if (TableType.EXTERNAL_TABLE.toString().equals(tableType)) { > if (!isExternal) { // Here! > tableType = TableType.MANAGED_TABLE.toString(); > } > } > {code} > So if the EXTERNAL parameter is not set, the table type is changed to managed > even if it was external in the first place - which is wrong. > Moreover, in some places the code looks at the table property to decide the table > type and in other places it looks at the parameter. 
HMS should really make up its mind which > one to use. -- This message was sent by Atlassian Jira (v8.3.4#803005)
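The branch quoted in the report can be isolated into a standalone sketch (plain Java; `resolveTableType` and the bare string constants are hypothetical stand-ins for the HMS code, not its real API). It reproduces the silent downgrade described above: a table declared EXTERNAL_TABLE but missing the EXTERNAL=TRUE parameter comes back managed.

```java
import java.util.HashMap;
import java.util.Map;

public class TableTypeFlip {
    // Mirrors the quoted convertToMTable logic: the EXTERNAL parameter,
    // not the declared tableType, gets the final word.
    static String resolveTableType(String tableType, Map<String, String> params) {
        boolean isExternal = Boolean.parseBoolean(params.get("EXTERNAL"));
        if ("MANAGED_TABLE".equals(tableType) && isExternal) {
            return "EXTERNAL_TABLE";
        }
        if ("EXTERNAL_TABLE".equals(tableType) && !isExternal) {
            // The silent flip the report complains about.
            return "MANAGED_TABLE";
        }
        return tableType;
    }

    public static void main(String[] args) {
        // Caller asked for an external table but never set EXTERNAL=TRUE:
        System.out.println(resolveTableType("EXTERNAL_TABLE", new HashMap<>()));
        // prints MANAGED_TABLE
    }
}
```

Clients that set only tableType therefore lose the external flag, which is why the report argues HMS should settle on a single source of truth.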
[jira] [Work logged] (HIVE-24051) Hive lineage information exposed in ExecuteWithHookContext
[ https://issues.apache.org/jira/browse/HIVE-24051?focusedWorklogId=516769&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-516769 ] ASF GitHub Bot logged work on HIVE-24051: - Author: ASF GitHub Bot Created on: 25/Nov/20 16:56 Start Date: 25/Nov/20 16:56 Worklog Time Spent: 10m Work Description: szehon-ho commented on pull request #1413: URL: https://github.com/apache/hive/pull/1413#issuecomment-733828648 Thanks a lot @sunchao ! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 516769) Time Spent: 2h (was: 1h 50m) > Hive lineage information exposed in ExecuteWithHookContext > -- > > Key: HIVE-24051 > URL: https://issues.apache.org/jira/browse/HIVE-24051 > Project: Hive > Issue Type: Improvement > Components: Configuration >Reporter: Szehon Ho >Assignee: Szehon Ho >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Attachments: HIVE-24051.patch > > Time Spent: 2h > Remaining Estimate: 0h > > The lineage information is not populated unless certain hooks are enabled. > However, this is a bit fragile, and no way for another hook that we write to > get this information. This proposes a flag to enable this instead. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work started] (HIVE-24430) DiskRangeInfo should make use of DiskRangeList
[ https://issues.apache.org/jira/browse/HIVE-24430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HIVE-24430 started by Panagiotis Garefalakis. - > DiskRangeInfo should make use of DiskRangeList > -- > > Key: HIVE-24430 > URL: https://issues.apache.org/jira/browse/HIVE-24430 > Project: Hive > Issue Type: Sub-task >Reporter: Panagiotis Garefalakis >Assignee: Panagiotis Garefalakis >Priority: Trivial > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > DiskRangeInfo should make use of DiskRangeList instead of List<DiskRange> – > this will help us transition to ORC 1.6. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-24430) DiskRangeInfo should make use of DiskRangeList
[ https://issues.apache.org/jira/browse/HIVE-24430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-24430: -- Labels: pull-request-available (was: ) > DiskRangeInfo should make use of DiskRangeList > -- > > Key: HIVE-24430 > URL: https://issues.apache.org/jira/browse/HIVE-24430 > Project: Hive > Issue Type: Sub-task >Reporter: Panagiotis Garefalakis >Assignee: Panagiotis Garefalakis >Priority: Trivial > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > DiskRangeInfo should make use of DiskRangeList instead of List<DiskRange> – > this will help us transition to ORC 1.6. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24430) DiskRangeInfo should make use of DiskRangeList
[ https://issues.apache.org/jira/browse/HIVE-24430?focusedWorklogId=516759&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-516759 ] ASF GitHub Bot logged work on HIVE-24430: - Author: ASF GitHub Bot Created on: 25/Nov/20 16:46 Start Date: 25/Nov/20 16:46 Worklog Time Spent: 10m Work Description: pgaref opened a new pull request #1707: URL: https://github.com/apache/hive/pull/1707 ### What changes were proposed in this pull request? Change DiskRangeInfo to use DiskRangeList instead of DiskRange ### Why are the changes needed? Transition to ORC 1.6 where DiskRangeList is the main class used. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Existing tests This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 516759) Remaining Estimate: 0h Time Spent: 10m > DiskRangeInfo should make use of DiskRangeList > -- > > Key: HIVE-24430 > URL: https://issues.apache.org/jira/browse/HIVE-24430 > Project: Hive > Issue Type: Sub-task >Reporter: Panagiotis Garefalakis >Assignee: Panagiotis Garefalakis >Priority: Trivial > Time Spent: 10m > Remaining Estimate: 0h > > DiskRangeInfo should make use of DiskRangeList instead of List<DiskRange> – > this will help us transition to ORC 1.6. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HIVE-24430) DiskRangeInfo should make use of DiskRangeList
[ https://issues.apache.org/jira/browse/HIVE-24430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Panagiotis Garefalakis reassigned HIVE-24430: - > DiskRangeInfo should make use of DiskRangeList > -- > > Key: HIVE-24430 > URL: https://issues.apache.org/jira/browse/HIVE-24430 > Project: Hive > Issue Type: Sub-task >Reporter: Panagiotis Garefalakis >Assignee: Panagiotis Garefalakis >Priority: Trivial > > DiskRangeInfo should make use of DiskRangeList instead of List<DiskRange> – > this will help us transition to ORC 1.6. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24414) Backport HIVE-19662 to branch-3.1
[ https://issues.apache.org/jira/browse/HIVE-24414?focusedWorklogId=516754&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-516754 ] ASF GitHub Bot logged work on HIVE-24414: - Author: ASF GitHub Bot Created on: 25/Nov/20 16:31 Start Date: 25/Nov/20 16:31 Worklog Time Spent: 10m Work Description: aihuaxu commented on pull request #1698: URL: https://github.com/apache/hive/pull/1698#issuecomment-733814196 Thanks @sunchao for working on this. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 516754) Time Spent: 0.5h (was: 20m) > Backport HIVE-19662 to branch-3.1 > - > > Key: HIVE-24414 > URL: https://issues.apache.org/jira/browse/HIVE-24414 > Project: Hive > Issue Type: Improvement >Reporter: Chao Sun >Assignee: Chao Sun >Priority: Major > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > This JIRA proposes to backport HIVE-19662 to branch-3.1 and upgrade Avro to > 1.8.2. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-23553) Bump ORC version to 1.6
[ https://issues.apache.org/jira/browse/HIVE-23553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Panagiotis Garefalakis updated HIVE-23553: -- Description: Apache Hive is currently on 1.5.X version and in order to take advantage of the latest ORC improvements such as column encryption we have to bump to 1.6.X. https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12343288&styleName=&projectId=12318320&Create=Create&atl_token=A5KQ-2QAV-T4JA-FDED_4ae78f19321c7fb1e7f337fba1dd90af751d8810_lin Even though ORC reader could work out of the box, HIVE LLAP is heavily depending on internal ORC APIs e.g., to retrieve and store File Footers, Tails, streams – un/compress RG data etc. As there were many internal changes from 1.5 to 1.6 (Input stream offsets, relative BufferChunks etc.) the upgrade is not straightforward. This Umbrella Jira tracks this upgrade effort. was: Apache Hive is currently on 1.5.X version and in order to take advantage of the latest ORC improvements such as column encryption we have to bump to 1.6.X. Even though ORC reader could work out of the box, HIVE LLAP is heavily depending on internal ORC APIs e.g., to retrieve and store File Footers, Tails, streams – un/compress RG data etc. As there were many internal changes from 1.5 to 1.6 (Input stream offsets, relative BufferChunks etc.) the upgrade is not straightforward. This Umbrella Jira tracks this upgrade effort. > Bump ORC version to 1.6 > --- > > Key: HIVE-23553 > URL: https://issues.apache.org/jira/browse/HIVE-23553 > Project: Hive > Issue Type: Improvement >Reporter: Panagiotis Garefalakis >Assignee: Panagiotis Garefalakis >Priority: Major > > Apache Hive is currently on 1.5.X version and in order to take advantage of > the latest ORC improvements such as column encryption we have to bump to > 1.6.X. 
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12343288&styleName=&projectId=12318320&Create=Create&atl_token=A5KQ-2QAV-T4JA-FDED_4ae78f19321c7fb1e7f337fba1dd90af751d8810_lin > Even though ORC reader could work out of the box, HIVE LLAP is heavily > depending on internal ORC APIs e.g., to retrieve and store File Footers, > Tails, streams – un/compress RG data etc. As there were many internal changes > from 1.5 to 1.6 (Input stream offsets, relative BufferChunks etc.) the > upgrade is not straightforward. > This Umbrella Jira tracks this upgrade effort. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-23553) Bump ORC version to 1.6
[ https://issues.apache.org/jira/browse/HIVE-23553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Panagiotis Garefalakis updated HIVE-23553: -- Description: Apache Hive is currently on 1.5.X version and in order to take advantage of the latest ORC improvements such as column encryption we have to bump to 1.6.X. Even though ORC reader could work out of the box, HIVE LLAP is heavily depending on internal ORC APIs e.g., to retrieve and store File Footers, Tails, streams – un/compress RG data etc. As there were many internal changes from 1.5 to 1.6 (Input stream offsets, relative BufferChunks etc.) the upgrade is not straightforward. This Umbrella Jira tracks this upgrade effort. was: > Bump ORC version to 1.6 > --- > > Key: HIVE-23553 > URL: https://issues.apache.org/jira/browse/HIVE-23553 > Project: Hive > Issue Type: Sub-task >Reporter: Panagiotis Garefalakis >Assignee: Panagiotis Garefalakis >Priority: Major > > Apache Hive is currently on 1.5.X version and in order to take advantage of > the latest ORC improvements such as column encryption we have to bump to > 1.6.X. > Even though ORC reader could work out of the box, HIVE LLAP is heavily > depending on internal ORC APIs e.g., to retrieve and store File Footers, > Tails, streams – un/compress RG data etc. As there were many internal changes > from 1.5 to 1.6 (Input stream offsets, relative BufferChunks etc.) the > upgrade is not straightforward. > This Umbrella Jira tracks this upgrade effort. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work started] (HIVE-23553) Bump ORC version to 1.6
[ https://issues.apache.org/jira/browse/HIVE-23553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HIVE-23553 started by Panagiotis Garefalakis. - > Bump ORC version to 1.6 > --- > > Key: HIVE-23553 > URL: https://issues.apache.org/jira/browse/HIVE-23553 > Project: Hive > Issue Type: Sub-task >Reporter: Panagiotis Garefalakis >Assignee: Panagiotis Garefalakis >Priority: Major > > Apache Hive is currently on 1.5.X version and in order to take advantage of > the latest ORC improvements such as column encryption we have to bump to > 1.6.X. > Even though ORC reader could work out of the box, HIVE LLAP is heavily > depending on internal ORC APIs e.g., to retrieve and store File Footers, > Tails, streams – un/compress RG data etc. As there were many internal changes > from 1.5 to 1.6 (Input stream offsets, relative BufferChunks etc.) the > upgrade is not straightforward. > This Umbrella Jira tracks this upgrade effort. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-23553) Bump ORC version to 1.6
[ https://issues.apache.org/jira/browse/HIVE-23553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Panagiotis Garefalakis updated HIVE-23553: -- Parent: (was: HIVE-22731) Issue Type: Improvement (was: Sub-task) > Bump ORC version to 1.6 > --- > > Key: HIVE-23553 > URL: https://issues.apache.org/jira/browse/HIVE-23553 > Project: Hive > Issue Type: Improvement >Reporter: Panagiotis Garefalakis >Assignee: Panagiotis Garefalakis >Priority: Major > > Apache Hive is currently on 1.5.X version and in order to take advantage of > the latest ORC improvements such as column encryption we have to bump to > 1.6.X. > Even though ORC reader could work out of the box, HIVE LLAP is heavily > depending on internal ORC APIs e.g., to retrieve and store File Footers, > Tails, streams – un/compress RG data etc. As there were many internal changes > from 1.5 to 1.6 (Input stream offsets, relative BufferChunks etc.) the > upgrade is not straightforward. > This Umbrella Jira tracks this upgrade effort. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24410) Query-based compaction hangs because of doAs
[ https://issues.apache.org/jira/browse/HIVE-24410?focusedWorklogId=516730&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-516730 ] ASF GitHub Bot logged work on HIVE-24410: - Author: ASF GitHub Bot Created on: 25/Nov/20 15:18 Start Date: 25/Nov/20 15:18 Worklog Time Spent: 10m Work Description: pvary commented on a change in pull request #1693: URL: https://github.com/apache/hive/pull/1693#discussion_r530450573 ## File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Worker.java ## @@ -531,29 +531,26 @@ protected Boolean findNextCompactionAndExecute(boolean computeStats) throws Inte final StatsUpdater su = computeStats ? StatsUpdater.init(ci, msc.findColumnsWithStats( CompactionInfo.compactionInfoToStruct(ci)), conf, runJobAsSelf(ci.runAs) ? ci.runAs : t.getOwner()) : null; - final CompactorMR mr = new CompactorMR(); + try { -if (runJobAsSelf(ci.runAs)) { - mr.run(conf, jobName.toString(), t, p, sd, tblValidWriteIds, ci, su, msc, dir); +failCompactionIfSetForTest(); Review comment: Thanks @klcopp! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 516730) Time Spent: 2h 50m (was: 2h 40m) > Query-based compaction hangs because of doAs > > > Key: HIVE-24410 > URL: https://issues.apache.org/jira/browse/HIVE-24410 > Project: Hive > Issue Type: Bug >Reporter: Karen Coppage >Assignee: Karen Coppage >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 2h 50m > Remaining Estimate: 0h > > QB compaction runs within a doas +and+ hive.server2.enable.doAs is set to > true (as of HIVE-24089). 
On a secure cluster with Worker threads running in > HS2, this results in HMS client not receiving a login context during > compaction queries, so kerberos prompts for a login via stdin which causes > the worker thread to hang until it times out: > {code:java} > "node-x.com-44_executor" #47 daemon prio=1 os_prio=0 tid=0x01506000 > nid=0x1348 runnable [0x7f1beea95000] >java.lang.Thread.State: RUNNABLE > at java.io.FileInputStream.readBytes(Native Method) > at java.io.FileInputStream.read(FileInputStream.java:255) > at java.io.BufferedInputStream.read1(BufferedInputStream.java:284) > at java.io.BufferedInputStream.read(BufferedInputStream.java:345) > - locked <0x9fa38c90> (a java.io.BufferedInputStream) > at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:284) > at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:326) > at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178) > - locked <0x8c7d5010> (a java.io.InputStreamReader) > at java.io.InputStreamReader.read(InputStreamReader.java:184) > at java.io.BufferedReader.fill(BufferedReader.java:161) > at java.io.BufferedReader.readLine(BufferedReader.java:324) > - locked <0x8c7d5010> (a java.io.InputStreamReader) > at java.io.BufferedReader.readLine(BufferedReader.java:389) > at > com.sun.security.auth.callback.TextCallbackHandler.readLine(TextCallbackHandler.java:153) > at > com.sun.security.auth.callback.TextCallbackHandler.handle(TextCallbackHandler.java:120) > at > com.sun.security.auth.module.Krb5LoginModule.promptForName(Krb5LoginModule.java:862) > at > com.sun.security.auth.module.Krb5LoginModule.attemptAuthentication(Krb5LoginModule.java:708) > at > com.sun.security.auth.module.Krb5LoginModule.login(Krb5LoginModule.java:617) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at 
java.lang.reflect.Method.invoke(Method.java:498) > at > javax.security.auth.login.LoginContext.invoke(LoginContext.java:755) > at > javax.security.auth.login.LoginContext.access$000(LoginContext.java:195) > at javax.security.auth.login.LoginContext$4.run(LoginContext.java:682) > at javax.security.auth.login.LoginContext$4.run(LoginContext.java:680) > at java.security.AccessController.doPrivileged(Native Method) > at > javax.security.auth.login.LoginContex
[jira] [Work logged] (HIVE-24410) Query-based compaction hangs because of doAs
[ https://issues.apache.org/jira/browse/HIVE-24410?focusedWorklogId=516721&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-516721 ] ASF GitHub Bot logged work on HIVE-24410: - Author: ASF GitHub Bot Created on: 25/Nov/20 14:54 Start Date: 25/Nov/20 14:54 Worklog Time Spent: 10m Work Description: klcopp commented on a change in pull request #1693: URL: https://github.com/apache/hive/pull/1693#discussion_r530432552 ## File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Worker.java ## @@ -531,29 +531,26 @@ protected Boolean findNextCompactionAndExecute(boolean computeStats) throws Inte final StatsUpdater su = computeStats ? StatsUpdater.init(ci, msc.findColumnsWithStats( CompactionInfo.compactionInfoToStruct(ci)), conf, runJobAsSelf(ci.runAs) ? ci.runAs : t.getOwner()) : null; - final CompactorMR mr = new CompactorMR(); + try { -if (runJobAsSelf(ci.runAs)) { - mr.run(conf, jobName.toString(), t, p, sd, tblValidWriteIds, ci, su, msc, dir); +failCompactionIfSetForTest(); Review comment: Done: HIVE-24429 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 516721) Time Spent: 2h 40m (was: 2.5h) > Query-based compaction hangs because of doAs > > > Key: HIVE-24410 > URL: https://issues.apache.org/jira/browse/HIVE-24410 > Project: Hive > Issue Type: Bug >Reporter: Karen Coppage >Assignee: Karen Coppage >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 2h 40m > Remaining Estimate: 0h > > QB compaction runs within a doas +and+ hive.server2.enable.doAs is set to > true (as of HIVE-24089). 
On a secure cluster with Worker threads running in > HS2, this results in HMS client not receiving a login context during > compaction queries, so kerberos prompts for a login via stdin which causes > the worker thread to hang until it times out: > {code:java} > "node-x.com-44_executor" #47 daemon prio=1 os_prio=0 tid=0x01506000 > nid=0x1348 runnable [0x7f1beea95000] >java.lang.Thread.State: RUNNABLE > at java.io.FileInputStream.readBytes(Native Method) > at java.io.FileInputStream.read(FileInputStream.java:255) > at java.io.BufferedInputStream.read1(BufferedInputStream.java:284) > at java.io.BufferedInputStream.read(BufferedInputStream.java:345) > - locked <0x9fa38c90> (a java.io.BufferedInputStream) > at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:284) > at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:326) > at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178) > - locked <0x8c7d5010> (a java.io.InputStreamReader) > at java.io.InputStreamReader.read(InputStreamReader.java:184) > at java.io.BufferedReader.fill(BufferedReader.java:161) > at java.io.BufferedReader.readLine(BufferedReader.java:324) > - locked <0x8c7d5010> (a java.io.InputStreamReader) > at java.io.BufferedReader.readLine(BufferedReader.java:389) > at > com.sun.security.auth.callback.TextCallbackHandler.readLine(TextCallbackHandler.java:153) > at > com.sun.security.auth.callback.TextCallbackHandler.handle(TextCallbackHandler.java:120) > at > com.sun.security.auth.module.Krb5LoginModule.promptForName(Krb5LoginModule.java:862) > at > com.sun.security.auth.module.Krb5LoginModule.attemptAuthentication(Krb5LoginModule.java:708) > at > com.sun.security.auth.module.Krb5LoginModule.login(Krb5LoginModule.java:617) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at 
java.lang.reflect.Method.invoke(Method.java:498) > at > javax.security.auth.login.LoginContext.invoke(LoginContext.java:755) > at > javax.security.auth.login.LoginContext.access$000(LoginContext.java:195) > at javax.security.auth.login.LoginContext$4.run(LoginContext.java:682) > at javax.security.auth.login.LoginContext$4.run(LoginContext.java:680) > at java.security.AccessController.doPrivileged(Native Method) > at > javax.security.auth.login.LoginContex
[jira] [Commented] (HIVE-24429) Figure out a better way to test failed compactions
[ https://issues.apache.org/jira/browse/HIVE-24429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17238778#comment-17238778 ] Karen Coppage commented on HIVE-24429: -- Requested by [~pvary] at https://github.com/apache/hive/pull/1693 > Figure out a better way to test failed compactions > -- > > Key: HIVE-24429 > URL: https://issues.apache.org/jira/browse/HIVE-24429 > Project: Hive > Issue Type: Improvement >Reporter: Karen Coppage >Priority: Major > > This block is executed during compaction: > {code:java} > if(conf.getBoolVar(HiveConf.ConfVars.HIVE_IN_TEST) && > conf.getBoolVar(HiveConf.ConfVars.HIVETESTMODEFAILCOMPACTION)) { > throw new > RuntimeException(HiveConf.ConfVars.HIVETESTMODEFAILCOMPACTION.name() + > "=true"); > }{code} > We should figure out a better way to test failed compaction than including > test code in the source. -- This message was sent by Atlassian Jira (v8.3.4#803005)
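One direction the ticket could take (a sketch only; `FailurePoint`, `beforeCompaction`, and `runCompaction` are hypothetical names, not Hive code) is a fault-injection seam: production code calls a no-op hook, and only a test swaps in a throwing implementation, so no HIVE_IN_TEST flag check has to live in the compaction path.

```java
public class CompactionSeam {
    /** Hypothetical fault-injection point; the default is a no-op. */
    public interface FailurePoint {
        void check();
    }

    // Production never touches this; a test may replace it with a
    // throwing implementation to simulate a failed compaction.
    public static volatile FailurePoint beforeCompaction = () -> { };

    public static String runCompaction() {
        beforeCompaction.check(); // injected failures surface here
        return "compacted";
    }
}
```

The trade-off is one extra indirection in the hot path versus keeping test-only branches out of the shipped source, which is exactly the concern the ticket raises.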
[jira] [Updated] (HIVE-24418) there is an error "java.lang.IllegalArgumentException: No columns to insert" when the result data is empty
[ https://issues.apache.org/jira/browse/HIVE-24418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] HuiyuZhou updated HIVE-24418: - Description: I created an external Hive table linked to HBase. When I use hsql to insert data into HBase, there is an error "java.lang.IllegalArgumentException: No columns to insert". I searched for the reason and found that the HBase client does not allow a put in which every column except the row key is empty. Please follow the link for the HBase validatePut function. [https://stackoverflow.com/questions/56073332/why-hbase-client-put-object-expecting-at-least-a-column-to-be-added-before-subm] I want to find a configuration to skip the error for my hsql, but it seems there is no configuration for it. [https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-HBaseStorageHandler] was: I created an external Hive table linked to HBase. When I use hsql to insert data into HBase, there is an error "java.lang.IllegalArgumentException: No columns to insert". I searched for the reason and found that the HBase client does not allow a put in which every column except the row key is empty. I also tried "set hyperbase.fill.null.enable=true" to skip the error for my hsql, but it doesn't work. How can I avoid the error? Is this a bug? > there is an error "java.lang.IllegalArgumentException: No columns to insert" > when the result data is empty > -- > > Key: HIVE-24418 > URL: https://issues.apache.org/jira/browse/HIVE-24418 > Project: Hive > Issue Type: Bug > Components: HBase Handler >Affects Versions: 1.1.0 >Reporter: HuiyuZhou >Priority: Major > > I created an external Hive table linked to HBase. When I use hsql to insert > data into HBase, there is an error "java.lang.IllegalArgumentException: No > columns to insert". I searched for the reason and found that the HBase client does not > allow a put in which every column except the row key is empty. > Please follow the link for the HBase validatePut function. 
> [https://stackoverflow.com/questions/56073332/why-hbase-client-put-object-expecting-at-least-a-column-to-be-added-before-subm] > > I want to find a configuration to skip the error for my hsql, but it seems > there is no configuration for it. > [https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-HBaseStorageHandler] > -- This message was sent by Atlassian Jira (v8.3.4#803005)
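The client-side check behind the linked question can be mimicked in a few lines (a plain-Java stand-in, not the real org.apache.hadoop.hbase.client API): a put that carries a row key but no column cells, which is what a row whose non-key columns are all NULL becomes, is rejected before it ever reaches a region server.

```java
import java.util.List;

public class PutCheck {
    // Stand-in for HBase's client-side Put validation: a mutation must
    // contain at least one column cell in addition to the row key.
    static void validatePut(byte[] rowKey, List<byte[]> cells) {
        if (rowKey == null || rowKey.length == 0) {
            throw new IllegalArgumentException("Row key must be non-empty");
        }
        if (cells == null || cells.isEmpty()) {
            throw new IllegalArgumentException("No columns to insert");
        }
    }
}
```

Under that assumption, the practical workaround on the Hive side is to guarantee that each emitted row carries at least one non-NULL mapped column (or to filter such rows out), rather than to look for an HBaseStorageHandler switch, since the validation happens inside the HBase client itself.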
[jira] [Work logged] (HIVE-24410) Query-based compaction hangs because of doAs
[ https://issues.apache.org/jira/browse/HIVE-24410?focusedWorklogId=516713&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-516713 ] ASF GitHub Bot logged work on HIVE-24410: - Author: ASF GitHub Bot Created on: 25/Nov/20 14:39 Start Date: 25/Nov/20 14:39 Worklog Time Spent: 10m Work Description: pvary commented on a change in pull request #1693: URL: https://github.com/apache/hive/pull/1693#discussion_r530421562 ## File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Worker.java ## @@ -531,29 +531,26 @@ protected Boolean findNextCompactionAndExecute(boolean computeStats) throws Inte final StatsUpdater su = computeStats ? StatsUpdater.init(ci, msc.findColumnsWithStats( CompactionInfo.compactionInfoToStruct(ci)), conf, runJobAsSelf(ci.runAs) ? ci.runAs : t.getOwner()) : null; - final CompactorMR mr = new CompactorMR(); + try { -if (runJobAsSelf(ci.runAs)) { - mr.run(conf, jobName.toString(), t, p, sd, tblValidWriteIds, ci, su, msc, dir); +failCompactionIfSetForTest(); Review comment: Yes, please raise a jira! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 516713) Time Spent: 2.5h (was: 2h 20m) > Query-based compaction hangs because of doAs > > > Key: HIVE-24410 > URL: https://issues.apache.org/jira/browse/HIVE-24410 > Project: Hive > Issue Type: Bug >Reporter: Karen Coppage >Assignee: Karen Coppage >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 2.5h > Remaining Estimate: 0h > > QB compaction runs within a doas +and+ hive.server2.enable.doAs is set to > true (as of HIVE-24089). 
On a secure cluster with Worker threads running in > HS2, this results in HMS client not receiving a login context during > compaction queries, so kerberos prompts for a login via stdin which causes > the worker thread to hang until it times out: > {code:java} > "node-x.com-44_executor" #47 daemon prio=1 os_prio=0 tid=0x01506000 > nid=0x1348 runnable [0x7f1beea95000] >java.lang.Thread.State: RUNNABLE > at java.io.FileInputStream.readBytes(Native Method) > at java.io.FileInputStream.read(FileInputStream.java:255) > at java.io.BufferedInputStream.read1(BufferedInputStream.java:284) > at java.io.BufferedInputStream.read(BufferedInputStream.java:345) > - locked <0x9fa38c90> (a java.io.BufferedInputStream) > at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:284) > at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:326) > at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178) > - locked <0x8c7d5010> (a java.io.InputStreamReader) > at java.io.InputStreamReader.read(InputStreamReader.java:184) > at java.io.BufferedReader.fill(BufferedReader.java:161) > at java.io.BufferedReader.readLine(BufferedReader.java:324) > - locked <0x8c7d5010> (a java.io.InputStreamReader) > at java.io.BufferedReader.readLine(BufferedReader.java:389) > at > com.sun.security.auth.callback.TextCallbackHandler.readLine(TextCallbackHandler.java:153) > at > com.sun.security.auth.callback.TextCallbackHandler.handle(TextCallbackHandler.java:120) > at > com.sun.security.auth.module.Krb5LoginModule.promptForName(Krb5LoginModule.java:862) > at > com.sun.security.auth.module.Krb5LoginModule.attemptAuthentication(Krb5LoginModule.java:708) > at > com.sun.security.auth.module.Krb5LoginModule.login(Krb5LoginModule.java:617) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at 
java.lang.reflect.Method.invoke(Method.java:498) > at > javax.security.auth.login.LoginContext.invoke(LoginContext.java:755) > at > javax.security.auth.login.LoginContext.access$000(LoginContext.java:195) > at javax.security.auth.login.LoginContext$4.run(LoginContext.java:682) > at javax.security.auth.login.LoginContext$4.run(LoginContext.java:680) > at java.security.AccessController.doPrivileged(Native Method) > at > javax.security.auth.login.Login
[jira] [Updated] (HIVE-24418) there is an error "java.lang.IllegalArgumentException: No columns to insert" when the result data is empty
[ https://issues.apache.org/jira/browse/HIVE-24418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] HuiyuZhou updated HIVE-24418: - Affects Version/s: (was: 1.1.1) 1.1.0 > there is an error "java.lang.IllegalArgumentException: No columns to insert" > when the result data is empty > -- > > Key: HIVE-24418 > URL: https://issues.apache.org/jira/browse/HIVE-24418 > Project: Hive > Issue Type: Bug > Components: HBase Handler >Affects Versions: 1.1.0 >Reporter: HuiyuZhou >Priority: Major > > I created an external Hive table linked to HBase. When I use HiveQL to insert > data into HBase, I get the error "java.lang.IllegalArgumentException: No > columns to insert". I looked into the cause and found that the HBase client does not > allow inserting a row in which every column except the rowkey is empty. > I also tried "set hyperbase.fill.null.enable=true" to skip the error in > my query, but it doesn't work. How can I avoid the error? > Is this a bug? -- This message was sent by Atlassian Jira (v8.3.4#803005)
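The constraint the reporter ran into can be worked around on the client side by filtering out rows whose non-rowkey columns are all NULL before they reach the handler. The sketch below is illustrative only: `NullRowFilter` and `isInsertable` are hypothetical names, not part of the actual HBaseStorageHandler API, and `hyperbase.fill.null.enable` appears to be a vendor-specific setting rather than an Apache Hive one.

```java
import java.util.Arrays;
import java.util.Objects;

// Hypothetical pre-filter for the constraint described above: the HBase
// client rejects a Put that carries no column values, so rows whose
// non-rowkey columns are all NULL must be skipped (or given a placeholder
// column) before insertion.
public class NullRowFilter {
    /** Returns true if the row has a rowkey and at least one non-null column value. */
    public static boolean isInsertable(Object rowKey, Object... columnValues) {
        if (rowKey == null) {
            return false; // the rowkey itself is mandatory
        }
        return Arrays.stream(columnValues).anyMatch(Objects::nonNull);
    }
}
```

An equivalent fix purely in the query is to add a predicate such as `WHERE col1 IS NOT NULL OR col2 IS NOT NULL` to the insert's select, so all-null rows never reach the HBase handler.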
[jira] [Work logged] (HIVE-24410) Query-based compaction hangs because of doAs
[ https://issues.apache.org/jira/browse/HIVE-24410?focusedWorklogId=516681&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-516681 ] ASF GitHub Bot logged work on HIVE-24410: - Author: ASF GitHub Bot Created on: 25/Nov/20 13:32 Start Date: 25/Nov/20 13:32 Worklog Time Spent: 10m Work Description: klcopp commented on a change in pull request #1693: URL: https://github.com/apache/hive/pull/1693#discussion_r530375823 ## File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Worker.java ## @@ -531,29 +531,26 @@ protected Boolean findNextCompactionAndExecute(boolean computeStats) throws Inte final StatsUpdater su = computeStats ? StatsUpdater.init(ci, msc.findColumnsWithStats( CompactionInfo.compactionInfoToStruct(ci)), conf, runJobAsSelf(ci.runAs) ? ci.runAs : t.getOwner()) : null; - final CompactorMR mr = new CompactorMR(); + try { -if (runJobAsSelf(ci.runAs)) { - mr.run(conf, jobName.toString(), t, p, sd, tblValidWriteIds, ci, su, msc, dir); +failCompactionIfSetForTest(); Review comment: Yes, it's test code and it used to be in CompactorMr#run, I just refactored it here. Great question, shall I raise a jira for it? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 516681) Time Spent: 2h 20m (was: 2h 10m) > Query-based compaction hangs because of doAs > > > Key: HIVE-24410 > URL: https://issues.apache.org/jira/browse/HIVE-24410 > Project: Hive > Issue Type: Bug >Reporter: Karen Coppage >Assignee: Karen Coppage >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 2h 20m > Remaining Estimate: 0h > > QB compaction runs within a doas +and+ hive.server2.enable.doAs is set to > true (as of HIVE-24089). 
On a secure cluster with Worker threads running in > HS2, this results in HMS client not receiving a login context during > compaction queries, so kerberos prompts for a login via stdin which causes > the worker thread to hang until it times out: > {code:java} > "node-x.com-44_executor" #47 daemon prio=1 os_prio=0 tid=0x01506000 > nid=0x1348 runnable [0x7f1beea95000] >java.lang.Thread.State: RUNNABLE > at java.io.FileInputStream.readBytes(Native Method) > at java.io.FileInputStream.read(FileInputStream.java:255) > at java.io.BufferedInputStream.read1(BufferedInputStream.java:284) > at java.io.BufferedInputStream.read(BufferedInputStream.java:345) > - locked <0x9fa38c90> (a java.io.BufferedInputStream) > at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:284) > at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:326) > at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178) > - locked <0x8c7d5010> (a java.io.InputStreamReader) > at java.io.InputStreamReader.read(InputStreamReader.java:184) > at java.io.BufferedReader.fill(BufferedReader.java:161) > at java.io.BufferedReader.readLine(BufferedReader.java:324) > - locked <0x8c7d5010> (a java.io.InputStreamReader) > at java.io.BufferedReader.readLine(BufferedReader.java:389) > at > com.sun.security.auth.callback.TextCallbackHandler.readLine(TextCallbackHandler.java:153) > at > com.sun.security.auth.callback.TextCallbackHandler.handle(TextCallbackHandler.java:120) > at > com.sun.security.auth.module.Krb5LoginModule.promptForName(Krb5LoginModule.java:862) > at > com.sun.security.auth.module.Krb5LoginModule.attemptAuthentication(Krb5LoginModule.java:708) > at > com.sun.security.auth.module.Krb5LoginModule.login(Krb5LoginModule.java:617) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at 
java.lang.reflect.Method.invoke(Method.java:498) > at > javax.security.auth.login.LoginContext.invoke(LoginContext.java:755) > at > javax.security.auth.login.LoginContext.access$000(LoginContext.java:195) > at javax.security.auth.login.LoginContext$4.run(LoginContext.java:682) > at javax.security.auth.login.LoginContext$4.run(LoginContext.java:680) >
[jira] [Assigned] (HIVE-24428) Concurrent add_partitions requests may lead to data loss
[ https://issues.apache.org/jira/browse/HIVE-24428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltan Haindrich reassigned HIVE-24428: --- > Concurrent add_partitions requests may lead to data loss > > > Key: HIVE-24428 > URL: https://issues.apache.org/jira/browse/HIVE-24428 > Project: Hive > Issue Type: Bug >Reporter: Zoltan Haindrich >Assignee: Zoltan Haindrich >Priority: Major > > In case multiple clients are adding partitions to the same table, and the > same partition is being added concurrently, there is a chance that the data dir is removed > after the other client has already written its data: > https://github.com/apache/hive/blob/5e96b14a2357c66a0640254d5414bc706d8be852/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java#L3958 -- This message was sent by Atlassian Jira (v8.3.4#803005)
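The hazard described above is a classic check-then-act race: the losing client rolls back by deleting the partition directory after the winning client has already written into it. A minimal in-memory model of the safer rule (only roll back a directory you created yourself) is sketched below. All names are illustrative; this is not the HiveMetaStore code.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy model of concurrent add_partitions. The "metastore" is a set of
// registered partitions and the "warehouse" maps directories to files.
public class AddPartitionRace {
    public final Set<String> metastore = new HashSet<>();
    public final Map<String, List<String>> warehouse = new HashMap<>();

    /** Returns true if this client successfully registered the partition. */
    public boolean addPartition(String dir, String part, List<String> files) {
        boolean madeDir = !warehouse.containsKey(dir);
        warehouse.computeIfAbsent(dir, k -> new ArrayList<>()).addAll(files);
        boolean registered = metastore.add(part); // false if already registered
        if (!registered && madeDir) {
            // Roll back only a directory this client created itself.
            // Deleting the directory unconditionally here is the data-loss
            // hazard: it would wipe the files the winning client wrote.
            warehouse.remove(dir);
        }
        return registered;
    }
}
```

Even with this guard, a window remains between creating the directory and registering the partition, so the real fix likely needs the metastore registration to act as the arbiter before any cleanup is attempted.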
[jira] [Work logged] (HIVE-24410) Query-based compaction hangs because of doAs
[ https://issues.apache.org/jira/browse/HIVE-24410?focusedWorklogId=516677&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-516677 ] ASF GitHub Bot logged work on HIVE-24410: - Author: ASF GitHub Bot Created on: 25/Nov/20 13:27 Start Date: 25/Nov/20 13:27 Worklog Time Spent: 10m Work Description: pvary commented on a change in pull request #1693: URL: https://github.com/apache/hive/pull/1693#discussion_r530372646 ## File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Worker.java ## @@ -531,29 +531,26 @@ protected Boolean findNextCompactionAndExecute(boolean computeStats) throws Inte final StatsUpdater su = computeStats ? StatsUpdater.init(ci, msc.findColumnsWithStats( CompactionInfo.compactionInfoToStruct(ci)), conf, runJobAsSelf(ci.runAs) ? ci.runAs : t.getOwner()) : null; - final CompactorMR mr = new CompactorMR(); + try { -if (runJobAsSelf(ci.runAs)) { - mr.run(conf, jobName.toString(), t, p, sd, tblValidWriteIds, ci, su, msc, dir); +failCompactionIfSetForTest(); Review comment: Is this test code? Could we find another way to test this? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 516677) Time Spent: 2h 10m (was: 2h) > Query-based compaction hangs because of doAs > > > Key: HIVE-24410 > URL: https://issues.apache.org/jira/browse/HIVE-24410 > Project: Hive > Issue Type: Bug >Reporter: Karen Coppage >Assignee: Karen Coppage >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 2h 10m > Remaining Estimate: 0h > > QB compaction runs within a doas +and+ hive.server2.enable.doAs is set to > true (as of HIVE-24089). 
On a secure cluster with Worker threads running in > HS2, this results in HMS client not receiving a login context during > compaction queries, so kerberos prompts for a login via stdin which causes > the worker thread to hang until it times out: > {code:java} > "node-x.com-44_executor" #47 daemon prio=1 os_prio=0 tid=0x01506000 > nid=0x1348 runnable [0x7f1beea95000] >java.lang.Thread.State: RUNNABLE > at java.io.FileInputStream.readBytes(Native Method) > at java.io.FileInputStream.read(FileInputStream.java:255) > at java.io.BufferedInputStream.read1(BufferedInputStream.java:284) > at java.io.BufferedInputStream.read(BufferedInputStream.java:345) > - locked <0x9fa38c90> (a java.io.BufferedInputStream) > at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:284) > at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:326) > at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178) > - locked <0x8c7d5010> (a java.io.InputStreamReader) > at java.io.InputStreamReader.read(InputStreamReader.java:184) > at java.io.BufferedReader.fill(BufferedReader.java:161) > at java.io.BufferedReader.readLine(BufferedReader.java:324) > - locked <0x8c7d5010> (a java.io.InputStreamReader) > at java.io.BufferedReader.readLine(BufferedReader.java:389) > at > com.sun.security.auth.callback.TextCallbackHandler.readLine(TextCallbackHandler.java:153) > at > com.sun.security.auth.callback.TextCallbackHandler.handle(TextCallbackHandler.java:120) > at > com.sun.security.auth.module.Krb5LoginModule.promptForName(Krb5LoginModule.java:862) > at > com.sun.security.auth.module.Krb5LoginModule.attemptAuthentication(Krb5LoginModule.java:708) > at > com.sun.security.auth.module.Krb5LoginModule.login(Krb5LoginModule.java:617) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at 
java.lang.reflect.Method.invoke(Method.java:498) > at > javax.security.auth.login.LoginContext.invoke(LoginContext.java:755) > at > javax.security.auth.login.LoginContext.access$000(LoginContext.java:195) > at javax.security.auth.login.LoginContext$4.run(LoginContext.java:682) > at javax.security.auth.login.LoginContext$4.run(LoginContext.java:680) > at java.security.AccessController.doPrivileged(Native Method) > a
[jira] [Work logged] (HIVE-24274) Implement Query Text based MaterializedView rewrite
[ https://issues.apache.org/jira/browse/HIVE-24274?focusedWorklogId=516640&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-516640 ] ASF GitHub Bot logged work on HIVE-24274: - Author: ASF GitHub Bot Created on: 25/Nov/20 12:22 Start Date: 25/Nov/20 12:22 Worklog Time Spent: 10m Work Description: kasakrisz opened a new pull request #1706: URL: https://github.com/apache/hive/pull/1706 ### What changes were proposed in this pull request? * Add feature: Enable materialized view rewrite of a query if the query text is the same as the query defined in the materialized view. * Enable unparsing for all queries in order to generate the expanded query text for comparison. * Refactor and extend the `HiveMaterializedViewsRegistry` with the lookup by query text functionality. ### Why are the changes needed? This patch provides an alternative way to rewrite queries using materialized views. Materialized view query definitions have some limitations: for example, they can't contain the `UNION` or `SORT BY` operators. These are enabled when using the text based rewrite. ### Does this PR introduce _any_ user-facing change? In some cases rewrite was not possible because of the limitations mentioned above. With this patch the rewriting will be executed, and this affects the output of the `EXPLAIN` and `EXPLAIN CBO` commands: instead of the original query plan a scan on the materialized view will appear. ### How was this patch tested? ``` mvn test -DskipSparkTests -Dtest=TestMiniLlapLocalCliDriver -Dqfile=mv_rewrite_by_text.q,masking_14.q,masking_mv.q,schq_materialized.q,sketches_materialized_view_safety.q -pl itests/qtest -Pitests mvn test -Dtest=TestMaterializedViewsCache -pl ql ``` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 516640) Time Spent: 20m (was: 10m) > Implement Query Text based MaterializedView rewrite > --- > > Key: HIVE-24274 > URL: https://issues.apache.org/jira/browse/HIVE-24274 > Project: Hive > Issue Type: Improvement >Reporter: Krisztian Kasa >Assignee: Krisztian Kasa >Priority: Major > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > Besides the way queries are currently rewritten to use materialized views in > Hive, this project provides an alternative: > Compare the query text with the stored materialized views' query text. If we > find a match, the original query's logical plan can be replaced by a scan on > the materialized view. > - Only materialized views which are enabled to rewrite can participate > - Use the existing *HiveMaterializedViewsRegistry* through the *Hive* object by > adding a lookup method by query text. > - There might be more than one materialized view with the same query > text. In this case choose the first valid one. > - Validation can be done by calling > *Hive.validateMaterializedViewsFromRegistry()* > - The scope of this first patch is limited to rewriting queries whose entire > text can be matched. > - Use the expanded query text (fully qualified column and table names) for > comparison -- This message was sent by Atlassian Jira (v8.3.4#803005)
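The lookup-by-query-text idea described above can be sketched as a map keyed by the expanded query text, where a lookup returns the first registered view that still passes validation. This is a minimal illustration only; the class and method names are hypothetical, not the actual HiveMaterializedViewsRegistry API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Predicate;

// Sketch of a registry keyed by *expanded* query text (fully qualified
// column and table names), so textually identical queries hit the same entry.
public class MaterializedViewsCacheSketch {
    private final Map<String, List<String>> byQueryText = new ConcurrentHashMap<>();

    public void register(String expandedQueryText, String mvName) {
        byQueryText.computeIfAbsent(expandedQueryText, k -> new ArrayList<>()).add(mvName);
    }

    /** Returns the first valid materialized view matching the query text, if any. */
    public Optional<String> lookup(String expandedQueryText, Predicate<String> isValid) {
        return byQueryText.getOrDefault(expandedQueryText, List.of()).stream()
                .filter(isValid)       // e.g. backed by validateMaterializedViewsFromRegistry
                .findFirst();
    }
}
```

The validity predicate is where "choose the first valid one" comes in: several views can share a query text, and stale or disabled ones are skipped at lookup time.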
[jira] [Work logged] (HIVE-24410) Query-based compaction hangs because of doAs
[ https://issues.apache.org/jira/browse/HIVE-24410?focusedWorklogId=516639&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-516639 ] ASF GitHub Bot logged work on HIVE-24410: - Author: ASF GitHub Bot Created on: 25/Nov/20 12:20 Start Date: 25/Nov/20 12:20 Worklog Time Spent: 10m Work Description: klcopp commented on a change in pull request #1693: URL: https://github.com/apache/hive/pull/1693#discussion_r530332125 ## File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Worker.java ## @@ -590,6 +587,36 @@ public Object run() throws Exception { return true; } + private void failCompactionIfSetForTest() { +if(conf.getBoolVar(HiveConf.ConfVars.HIVE_IN_TEST) && conf.getBoolVar(HiveConf.ConfVars.HIVETESTMODEFAILCOMPACTION)) { + throw new RuntimeException(HiveConf.ConfVars.HIVETESTMODEFAILCOMPACTION.name() + "=true"); +} + } + + private void runCompactionViaMrJob(CompactionInfo ci, Table t, Partition p, StorageDescriptor sd, + ValidCompactorWriteIdList tblValidWriteIds, StringBuilder jobName, AcidUtils.Directory dir, StatsUpdater su) + throws IOException, HiveException, InterruptedException { +final CompactorMR mr = new CompactorMR(); +if (runJobAsSelf(ci.runAs)) { + mr.run(conf, jobName.toString(), t, p, sd, tblValidWriteIds, ci, su, msc, dir); +} else { + UserGroupInformation ugi = UserGroupInformation.createProxyUser(ci.runAs, + UserGroupInformation.getLoginUser()); + final Partition fp = p; Review comment: Done! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 516639) Time Spent: 2h (was: 1h 50m) > Query-based compaction hangs because of doAs > > > Key: HIVE-24410 > URL: https://issues.apache.org/jira/browse/HIVE-24410 > Project: Hive > Issue Type: Bug >Reporter: Karen Coppage >Assignee: Karen Coppage >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 2h > Remaining Estimate: 0h > > QB compaction runs within a doas +and+ hive.server2.enable.doAs is set to > true (as of HIVE-24089). On a secure cluster with Worker threads running in > HS2, this results in HMS client not receiving a login context during > compaction queries, so kerberos prompts for a login via stdin which causes > the worker thread to hang until it times out: > {code:java} > "node-x.com-44_executor" #47 daemon prio=1 os_prio=0 tid=0x01506000 > nid=0x1348 runnable [0x7f1beea95000] >java.lang.Thread.State: RUNNABLE > at java.io.FileInputStream.readBytes(Native Method) > at java.io.FileInputStream.read(FileInputStream.java:255) > at java.io.BufferedInputStream.read1(BufferedInputStream.java:284) > at java.io.BufferedInputStream.read(BufferedInputStream.java:345) > - locked <0x9fa38c90> (a java.io.BufferedInputStream) > at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:284) > at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:326) > at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178) > - locked <0x8c7d5010> (a java.io.InputStreamReader) > at java.io.InputStreamReader.read(InputStreamReader.java:184) > at java.io.BufferedReader.fill(BufferedReader.java:161) > at java.io.BufferedReader.readLine(BufferedReader.java:324) > - locked <0x8c7d5010> (a java.io.InputStreamReader) > at java.io.BufferedReader.readLine(BufferedReader.java:389) > at > com.sun.security.auth.callback.TextCallbackHandler.readLine(TextCallbackHandler.java:153) > at > 
com.sun.security.auth.callback.TextCallbackHandler.handle(TextCallbackHandler.java:120) > at > com.sun.security.auth.module.Krb5LoginModule.promptForName(Krb5LoginModule.java:862) > at > com.sun.security.auth.module.Krb5LoginModule.attemptAuthentication(Krb5LoginModule.java:708) > at > com.sun.security.auth.module.Krb5LoginModule.login(Krb5LoginModule.java:617) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > javax.security.auth.login.
[jira] [Updated] (HIVE-24383) Add Table type to HPL/SQL
[ https://issues.apache.org/jira/browse/HIVE-24383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Attila Magyar updated HIVE-24383: - Parent: HIVE-24427 Issue Type: Sub-task (was: Improvement) > Add Table type to HPL/SQL > - > > Key: HIVE-24383 > URL: https://issues.apache.org/jira/browse/HIVE-24383 > Project: Hive > Issue Type: Sub-task > Components: hpl/sql >Reporter: Attila Magyar >Assignee: Attila Magyar >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-24315) Improve validation and semantic analysis in HPL/SQL
[ https://issues.apache.org/jira/browse/HIVE-24315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Attila Magyar updated HIVE-24315: - Parent: HIVE-24427 Issue Type: Sub-task (was: Improvement) > Improve validation and semantic analysis in HPL/SQL > > > Key: HIVE-24315 > URL: https://issues.apache.org/jira/browse/HIVE-24315 > Project: Hive > Issue Type: Sub-task > Components: hpl/sql >Reporter: Attila Magyar >Assignee: Attila Magyar >Priority: Major > > There are some known issues that need to be fixed. For example it seems that > the arity of a function is not checked when calling it, and the same is true for > parameter types. Calling an undefined function evaluates to null, and > sometimes it seems that incorrect syntax is silently ignored. > In cases like this a helpful error message would be expected, though we > should also consider how PL/SQL works and maintain compatibility. -- This message was sent by Atlassian Jira (v8.3.4#803005)
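The kind of call-site validation the issue asks for can be sketched as a single check: resolve the function by name, compare its declared parameter list with the call-site argument count, and produce a pointed diagnostic instead of silently evaluating to null. The names below are hypothetical, not HPL/SQL internals.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Illustrative arity check for a procedural-SQL interpreter: the map holds
// each declared function's parameter names.
public class ArityCheckSketch {
    /** Returns a diagnostic if the call is invalid, or empty if it is fine. */
    public static Optional<String> checkCall(Map<String, List<String>> declared,
                                             String name, int argCount) {
        List<String> params = declared.get(name);
        if (params == null) {
            return Optional.of("function " + name + " is not defined");
        }
        if (params.size() != argCount) {
            return Optional.of("function " + name + " expects " + params.size()
                    + " argument(s), got " + argCount);
        }
        return Optional.empty();
    }
}
```

Type checking of the arguments would follow the same pattern, with the map carrying declared parameter types instead of just names.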
[jira] [Updated] (HIVE-24346) Store HPL/SQL packages into HMS
[ https://issues.apache.org/jira/browse/HIVE-24346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Attila Magyar updated HIVE-24346: - Parent: HIVE-24427 Issue Type: Sub-task (was: New Feature) > Store HPL/SQL packages into HMS > --- > > Key: HIVE-24346 > URL: https://issues.apache.org/jira/browse/HIVE-24346 > Project: Hive > Issue Type: Sub-task > Components: hpl/sql, Metastore >Reporter: Attila Magyar >Assignee: Attila Magyar >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-24217) HMS storage backend for HPL/SQL stored procedures
[ https://issues.apache.org/jira/browse/HIVE-24217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Attila Magyar updated HIVE-24217: - Parent: HIVE-24427 Issue Type: Sub-task (was: Bug) > HMS storage backend for HPL/SQL stored procedures > - > > Key: HIVE-24217 > URL: https://issues.apache.org/jira/browse/HIVE-24217 > Project: Hive > Issue Type: Sub-task > Components: Hive, hpl/sql, Metastore >Reporter: Attila Magyar >Assignee: Attila Magyar >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Attachments: HPL_SQL storedproc HMS storage.pdf > > Time Spent: 4h 50m > Remaining Estimate: 0h > > HPL/SQL procedures are currently stored in text files. The goal of this Jira > is to implement a Metastore backend for storing and loading these procedures. > This is an incremental step towards having fully capable stored procedures in > Hive. > > See the attached design for more information. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-24230) Integrate HPL/SQL into HiveServer2
[ https://issues.apache.org/jira/browse/HIVE-24230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Attila Magyar updated HIVE-24230: - Parent: HIVE-24427 Issue Type: Sub-task (was: Bug) > Integrate HPL/SQL into HiveServer2 > -- > > Key: HIVE-24230 > URL: https://issues.apache.org/jira/browse/HIVE-24230 > Project: Hive > Issue Type: Sub-task > Components: HiveServer2, hpl/sql >Reporter: Attila Magyar >Assignee: Attila Magyar >Priority: Major > Labels: pull-request-available > Time Spent: 2h 50m > Remaining Estimate: 0h > > HPL/SQL is a standalone command line program that can store and load scripts > from text files, or from Hive Metastore (since HIVE-24217). Currently HPL/SQL > depends on Hive and not the other way around. > Changing the dependency order between HPL/SQL and HiveServer would open up > some possibilities which are currently not feasible to implement. For example > one might want to use a third party SQL tool to run selects on stored > procedure (or rather function in this case) outputs. > {code:java} > SELECT * from myStoredProcedure(1, 2); {code} > HPL/SQL doesn’t have a JDBC interface and it’s not a daemon, so this would not > work with the current architecture. > Another important factor is performance. Declarative SQL commands are sent to > Hive via JDBC by HPL/SQL. The integration would make it possible to drop JDBC > and use HiveServer’s internal API for compilation and execution. > The third factor is that existing tools like Beeline or Hue cannot be used > with HPL/SQL since it has its own, separate CLI. > > To make it easier to implement, we keep things separate internally at > first, by introducing a Hive session level JDBC parameter. > {code:java} > jdbc:hive2://localhost:1/default;hplsqlMode=true {code} > > The hplsqlMode indicates that we are in procedural SQL mode where the user > can create and call stored procedures. HPLSQL allows you to write any kind of > procedural statement at the top level. 
This patch doesn't limit this, but it > might be better to eventually restrict what statements are allowed outside of > stored procedures. > > Since HPLSQL and Hive are running in the same process, there is no need to use > the JDBC driver between them. The patch adds an abstraction with 2 different > implementations, one for executing queries on JDBC (for keeping the existing > behaviour) and another one for directly calling Hive's compiler. In HPLSQL > mode the latter is used. > Internally, a new operation (HplSqlOperation) and operation type > (PROCEDURAL_SQL) were added, which work similarly to the SQLOperation but > use the hplsql interpreter to execute arbitrary scripts. This operation > might spawn new SQLOperations. > For example consider the following statement: > {code:java} > FOR i in 1..10 LOOP > SELECT * FROM table > END LOOP;{code} > We send this to Beeline while we're in hplsql mode. Hive will create a hplsql > interpreter and store it in the session state. A new HplSqlOperation is > created to run the script on the interpreter. > HPLSQL knows how to execute the for loop, but it will call Hive to run the > select expression. The HplSqlOperation is notified when the select reads a > row and accumulates the rows into a RowSet (memory consumption needs to be > considered here) which can be retrieved via thrift from the client side. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
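The control flow described for the FOR-loop example can be modeled in a few lines: the procedural interpreter owns the loop, each iteration hands a query off to the engine, and the rows come back into one accumulated result set for the client. This toy mirrors only the shape of that interaction; it is not the HplSqlOperation code, and the memory-growth caveat from the description applies to the accumulator here too.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Toy model: the interpreter drives the loop, the engine runs each query,
// and all rows are gathered into one result set for the client.
public class ProceduralLoopSketch {
    public static List<String> runForLoop(int from, int to,
                                          Function<Integer, List<String>> runQuery) {
        List<String> rowSet = new ArrayList<>(); // grows with the total result size
        for (int i = from; i <= to; i++) {
            rowSet.addAll(runQuery.apply(i));    // each iteration is a nested SQL operation
        }
        return rowSet;
    }
}
```

In the real design the `runQuery` role is played by the spawned SQLOperations, and the accumulated rows are what the client fetches over Thrift.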
[jira] [Assigned] (HIVE-24427) HPL/SQL improvements
[ https://issues.apache.org/jira/browse/HIVE-24427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Attila Magyar reassigned HIVE-24427: > HPL/SQL improvements > > > Key: HIVE-24427 > URL: https://issues.apache.org/jira/browse/HIVE-24427 > Project: Hive > Issue Type: Improvement > Components: hpl/sql >Reporter: Attila Magyar >Assignee: Attila Magyar >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24410) Query-based compaction hangs because of doAs
[ https://issues.apache.org/jira/browse/HIVE-24410?focusedWorklogId=516610&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-516610 ] ASF GitHub Bot logged work on HIVE-24410: - Author: ASF GitHub Bot Created on: 25/Nov/20 11:06 Start Date: 25/Nov/20 11:06 Worklog Time Spent: 10m Work Description: pvargacl commented on pull request #1693: URL: https://github.com/apache/hive/pull/1693#issuecomment-733638721 LGTM +1 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 516610) Time Spent: 1h 50m (was: 1h 40m) > Query-based compaction hangs because of doAs > > > Key: HIVE-24410 > URL: https://issues.apache.org/jira/browse/HIVE-24410 > Project: Hive > Issue Type: Bug >Reporter: Karen Coppage >Assignee: Karen Coppage >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 1h 50m > Remaining Estimate: 0h > > QB compaction runs within a doas +and+ hive.server2.enable.doAs is set to > true (as of HIVE-24089). 
On a secure cluster with Worker threads running in > HS2, this results in HMS client not receiving a login context during > compaction queries, so kerberos prompts for a login via stdin which causes > the worker thread to hang until it times out: > {code:java} > "node-x.com-44_executor" #47 daemon prio=1 os_prio=0 tid=0x01506000 > nid=0x1348 runnable [0x7f1beea95000] >java.lang.Thread.State: RUNNABLE > at java.io.FileInputStream.readBytes(Native Method) > at java.io.FileInputStream.read(FileInputStream.java:255) > at java.io.BufferedInputStream.read1(BufferedInputStream.java:284) > at java.io.BufferedInputStream.read(BufferedInputStream.java:345) > - locked <0x9fa38c90> (a java.io.BufferedInputStream) > at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:284) > at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:326) > at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178) > - locked <0x8c7d5010> (a java.io.InputStreamReader) > at java.io.InputStreamReader.read(InputStreamReader.java:184) > at java.io.BufferedReader.fill(BufferedReader.java:161) > at java.io.BufferedReader.readLine(BufferedReader.java:324) > - locked <0x8c7d5010> (a java.io.InputStreamReader) > at java.io.BufferedReader.readLine(BufferedReader.java:389) > at > com.sun.security.auth.callback.TextCallbackHandler.readLine(TextCallbackHandler.java:153) > at > com.sun.security.auth.callback.TextCallbackHandler.handle(TextCallbackHandler.java:120) > at > com.sun.security.auth.module.Krb5LoginModule.promptForName(Krb5LoginModule.java:862) > at > com.sun.security.auth.module.Krb5LoginModule.attemptAuthentication(Krb5LoginModule.java:708) > at > com.sun.security.auth.module.Krb5LoginModule.login(Krb5LoginModule.java:617) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at 
java.lang.reflect.Method.invoke(Method.java:498) > at > javax.security.auth.login.LoginContext.invoke(LoginContext.java:755) > at > javax.security.auth.login.LoginContext.access$000(LoginContext.java:195) > at javax.security.auth.login.LoginContext$4.run(LoginContext.java:682) > at javax.security.auth.login.LoginContext$4.run(LoginContext.java:680) > at java.security.AccessController.doPrivileged(Native Method) > at > javax.security.auth.login.LoginContext.invokePriv(LoginContext.java:680) > at javax.security.auth.login.LoginContext.login(LoginContext.java:587) > at sun.security.jgss.GSSUtil.login(GSSUtil.java:258) > at sun.security.jgss.krb5.Krb5Util.getInitialTicket(Krb5Util.java:175) > at > sun.security.jgss.krb5.Krb5InitCredential$1.run(Krb5InitCredential.java:341) > at > sun.security.jgss.krb5.Krb5InitCredential$1.run(Krb5InitCredential.java:337) > at java.security.AccessController.doPrivileged(Native Method) > at > sun.security.jgss.krb5.Krb5InitCredential.getTgt(Krb5InitCredential.java:336) > at > sun.security.jgss.krb5.Krb5InitCredential.getInstance(Krb5InitCredentia
[jira] [Work logged] (HIVE-24403) change min_history_level schema change to be compatible with previous version
[ https://issues.apache.org/jira/browse/HIVE-24403?focusedWorklogId=516609&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-516609 ] ASF GitHub Bot logged work on HIVE-24403: - Author: ASF GitHub Bot Created on: 25/Nov/20 10:58 Start Date: 25/Nov/20 10:58 Worklog Time Spent: 10m Work Description: deniskuzZ commented on a change in pull request #1688: URL: https://github.com/apache/hive/pull/1688#discussion_r530284495 ## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/DatabaseProduct.java ## @@ -186,6 +186,19 @@ public boolean isDeadlock(SQLException e) { || e.getMessage().contains("can't serialize access for this transaction"; } + /** + * Is the given exception a table not found exception + * @param e Exception + * @return + */ + public boolean isTableNotExists(SQLException e) { Review comment: maybe rename to `isTableNotExistsError` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 516609) Time Spent: 3h (was: 2h 50m) > change min_history_level schema change to be compatible with previous version > - > > Key: HIVE-24403 > URL: https://issues.apache.org/jira/browse/HIVE-24403 > Project: Hive > Issue Type: Improvement > Components: Metastore >Reporter: Peter Varga >Assignee: Peter Varga >Priority: Major > Labels: pull-request-available > Time Spent: 3h > Remaining Estimate: 0h > > In some configurations the HMS backend DB is used by HMS services with > different versions. > HIVE-23107 dropped the min_history_level table from the backend DB making > the new schema version incompatible with the older HMS services. 
> It is possible to modify that change to keep the compatibility:
> * Keep the min_history_level table
> * Add the new fields for the compaction_queue the same way
> * Create a feature flag for min_history_level and if it is on:
> * Keep the logic inserting to the table during openTxn
> * Keep the logic removing the records at commitTxn and abortTxn
> * Change the logic in the cleaner, to get the highwatermark the old way
> * But still change it to not start the cleaning before that
> * The txn_to_write_id table cleaning can work the new way in the new version and the old way in the old version
> * This feature flag can be set up automatically based on the existence of the min_history_level table; this way, if the table is dropped, all HMS-s can switch to the new functionality without a restart
-- This message was sent by Atlassian Jira (v8.3.4#803005)
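The last point above — auto-detecting the feature flag from the existence of the MIN_HISTORY_LEVEL table — can be sketched roughly as follows. This is a hypothetical illustration only, using Python's sqlite3 in place of the HMS backend DB; the flag name follows the patch's `useMinHistoryLevel`, and the probe query is an assumption, not the actual HMS code:

```python
import sqlite3

def use_min_history_level(conn):
    # Hypothetical sketch: enable the legacy code path only while the
    # MIN_HISTORY_LEVEL table still exists in the backend DB.
    try:
        conn.execute('SELECT 1 FROM "MIN_HISTORY_LEVEL" LIMIT 1')
        return True
    except sqlite3.OperationalError:  # "no such table" -> table was dropped
        return False

conn = sqlite3.connect(":memory:")
before = use_min_history_level(conn)   # table absent: flag off
conn.execute('CREATE TABLE "MIN_HISTORY_LEVEL" ("MHL_MIN_OPEN_TXNID" INTEGER)')
after = use_min_history_level(conn)    # table present: flag on
```

Because the probe runs against the live schema, every HMS instance re-evaluates the flag the same way, which is what lets them all switch behavior once the table is eventually dropped.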
[jira] [Work logged] (HIVE-24249) Create View fails if a materialized view exists with the same query
[ https://issues.apache.org/jira/browse/HIVE-24249?focusedWorklogId=516606&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-516606 ] ASF GitHub Bot logged work on HIVE-24249: - Author: ASF GitHub Bot Created on: 25/Nov/20 10:50 Start Date: 25/Nov/20 10:50 Worklog Time Spent: 10m Work Description: kasakrisz merged pull request #1696: URL: https://github.com/apache/hive/pull/1696 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 516606) Time Spent: 20m (was: 10m)
> Create View fails if a materialized view exists with the same query
> ---
>
> Key: HIVE-24249
> URL: https://issues.apache.org/jira/browse/HIVE-24249
> Project: Hive
> Issue Type: Bug
> Reporter: Krisztian Kasa
> Assignee: Krisztian Kasa
> Priority: Major
> Labels: pull-request-available
> Time Spent: 20m
> Remaining Estimate: 0h
>
> {code:java}
> create table t1(col0 int) STORED AS ORC
> TBLPROPERTIES ('transactional'='true');
> create materialized view mv1 as
> select * from t1 where col0 > 2;
> create view v1 as
> select sub.* from (select * from t1 where col0 > 2) sub
> where sub.col0 = 10;
> {code}
> The planner realizes that the view definition has a subquery which matches the materialized view query and replaces it with the materialized view scan.
> {code:java} > HiveProject($f0=[CAST(10):INTEGER]) > HiveFilter(condition=[=(10, $0)]) > HiveTableScan(table=[[default, mv1]], table:alias=[default.mv1]) > {code} > Then exception is thrown: > {code:java} > org.apache.hadoop.hive.ql.parse.SemanticException: View definition > references materialized view default.mv1 > at > org.apache.hadoop.hive.ql.ddl.view.create.CreateViewAnalyzer.validateCreateView(CreateViewAnalyzer.java:211) > at > org.apache.hadoop.hive.ql.ddl.view.create.CreateViewAnalyzer.analyzeInternal(CreateViewAnalyzer.java:99) > at > org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:301) > at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:223) > at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:104) > at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:174) > at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:415) > at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:364) > at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:358) > at > org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:125) > at > org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:229) > at > org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:258) > at org.apache.hadoop.hive.cli.CliDriver.processCmd1(CliDriver.java:203) > at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:129) > at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:424) > at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:355) > at > org.apache.hadoop.hive.ql.QTestUtil.executeClientInternal(QTestUtil.java:744) > at org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:714) > at > org.apache.hadoop.hive.cli.control.CoreCliDriver.runTest(CoreCliDriver.java:170) > at > org.apache.hadoop.hive.cli.control.CliAdapter.runTest(CliAdapter.java:157) > at > 
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver(TestMiniLlapLocalCliDriver.java:62) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at > org.apache.hadoop.hive.cli.control.CliAdapter$2$1.evaluate(CliAdapter.java:135) > at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306) > at
[jira] [Resolved] (HIVE-24249) Create View fails if a materialized view exists with the same query
[ https://issues.apache.org/jira/browse/HIVE-24249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krisztian Kasa resolved HIVE-24249. --- Resolution: Fixed Pushed to master, thanks [~jcamachorodriguez] for review. > Create View fails if a materialized view exists with the same query > --- > > Key: HIVE-24249 > URL: https://issues.apache.org/jira/browse/HIVE-24249 > Project: Hive > Issue Type: Bug >Reporter: Krisztian Kasa >Assignee: Krisztian Kasa >Priority: Major > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > {code:java} > create table t1(col0 int) STORED AS ORC > TBLPROPERTIES ('transactional'='true'); > create materialized view mv1 as > select * from t1 where col0 > 2; > create view v1 as > select sub.* from (select * from t1 where col0 > 2) sub > where sub.col0 = 10; > {code} > The planner realize that the view definition has a subquery which match the > materialized view query and replaces it to the materialized view scan. > {code:java} > HiveProject($f0=[CAST(10):INTEGER]) > HiveFilter(condition=[=(10, $0)]) > HiveTableScan(table=[[default, mv1]], table:alias=[default.mv1]) > {code} > Then exception is thrown: > {code:java} > org.apache.hadoop.hive.ql.parse.SemanticException: View definition > references materialized view default.mv1 > at > org.apache.hadoop.hive.ql.ddl.view.create.CreateViewAnalyzer.validateCreateView(CreateViewAnalyzer.java:211) > at > org.apache.hadoop.hive.ql.ddl.view.create.CreateViewAnalyzer.analyzeInternal(CreateViewAnalyzer.java:99) > at > org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:301) > at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:223) > at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:104) > at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:174) > at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:415) > at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:364) > at 
org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:358) > at > org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:125) > at > org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:229) > at > org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:258) > at org.apache.hadoop.hive.cli.CliDriver.processCmd1(CliDriver.java:203) > at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:129) > at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:424) > at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:355) > at > org.apache.hadoop.hive.ql.QTestUtil.executeClientInternal(QTestUtil.java:744) > at org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:714) > at > org.apache.hadoop.hive.cli.control.CoreCliDriver.runTest(CoreCliDriver.java:170) > at > org.apache.hadoop.hive.cli.control.CliAdapter.runTest(CliAdapter.java:157) > at > org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver(TestMiniLlapLocalCliDriver.java:62) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at > org.apache.hadoop.hive.cli.control.CliAdapter$2$1.evaluate(CliAdapter.java:135) > at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306) > at > org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100) > at 
org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63) > at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331) > at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79) > at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329) > at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66) > at or
[jira] [Work logged] (HIVE-24403) change min_history_level schema change to be compatible with previous version
[ https://issues.apache.org/jira/browse/HIVE-24403?focusedWorklogId=516604&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-516604 ] ASF GitHub Bot logged work on HIVE-24403: - Author: ASF GitHub Bot Created on: 25/Nov/20 10:46 Start Date: 25/Nov/20 10:46 Worklog Time Spent: 10m Work Description: deniskuzZ commented on a change in pull request #1688: URL: https://github.com/apache/hive/pull/1688#discussion_r530268290 ## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/CompactionTxnHandler.java ## @@ -1166,6 +1166,55 @@ public long findMinOpenTxnIdForCleaner() throws MetaException { } } + /** + * Returns the min txnid seen open by any active transaction + * @deprecated remove when min_history_level is dropped + * @return txnId + * @throws MetaException ex + */ + @Override + @RetrySemantics.Idempotent + @Deprecated + public long findMinTxnIdSeenOpen() throws MetaException { +if (!useMinHistoryLevel) { + return -1L; +} +Connection dbConn = null; +try { + try { +dbConn = getDbConn(Connection.TRANSACTION_READ_COMMITTED); +long minOpenTxn; +try (Statement stmt = dbConn.createStatement()) { + try (ResultSet rs = stmt.executeQuery("SELECT MIN(\"MHL_MIN_OPEN_TXNID\") FROM \"MIN_HISTORY_LEVEL\"")) { +if (!rs.next()) { + throw new IllegalStateException("Scalar query returned no rows?!"); Review comment: is this even possible? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 516604) Time Spent: 2h 50m (was: 2h 40m) > change min_history_level schema change to be compatible with previous version > - > > Key: HIVE-24403 > URL: https://issues.apache.org/jira/browse/HIVE-24403 > Project: Hive > Issue Type: Improvement > Components: Metastore >Reporter: Peter Varga >Assignee: Peter Varga >Priority: Major > Labels: pull-request-available > Time Spent: 2h 50m > Remaining Estimate: 0h > > In some configurations the HMS backend DB is used by HMS services with > different versions. > HIVE-23107 dropped the min_history_level table from the backend DB making > the new schema version incompatible with the older HMS services. > It is possible to modify that change to keep the compatibility > * Keep the min_history_level table > * Add the new fields for the compaction_queue the same way > * Create a feature flag for min_history_level and if it is on > * Keep the logic inserting to the table during openTxn > * Keep the logic removing the records at commitTxn and abortTxn > * Change the logic in the cleaner, to get the highwatermark the old way > * But still change it to not start the cleaning before that > * The txn_to_write_id table cleaning can work the new way in the new version > and the old way in the old version > * This feature flag can be automatically setup based on the existence of the > min_history level table, this way if the table will be dropped all HMS-s can > switch to the new functionality without restart -- This message was sent by Atlassian Jira (v8.3.4#803005)
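The reviewer's question above ("is this even possible?") about the `if (!rs.next())` branch is well-founded: an aggregate query without GROUP BY always returns exactly one row, even over an empty table — the MIN is simply NULL. A minimal demonstration, with Python's sqlite3 standing in for the metastore backend (an analogy, not the HMS code path):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE "MIN_HISTORY_LEVEL" ("MHL_MIN_OPEN_TXNID" INTEGER)')

# An aggregate without GROUP BY yields exactly one row even on an empty
# table, so a scalar MIN query can never return zero rows; the minimum
# over no rows is reported as NULL.
rows = conn.execute(
    'SELECT MIN("MHL_MIN_OPEN_TXNID") FROM "MIN_HISTORY_LEVEL"').fetchall()
# rows == [(None,)]: one row, NULL minimum
```

This suggests the defensive IllegalStateException is unreachable in practice, though keeping it costs nothing.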
[jira] [Work logged] (HIVE-24403) change min_history_level schema change to be compatible with previous version
[ https://issues.apache.org/jira/browse/HIVE-24403?focusedWorklogId=516600&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-516600 ] ASF GitHub Bot logged work on HIVE-24403: - Author: ASF GitHub Bot Created on: 25/Nov/20 10:32 Start Date: 25/Nov/20 10:32 Worklog Time Spent: 10m Work Description: deniskuzZ commented on a change in pull request #1688: URL: https://github.com/apache/hive/pull/1688#discussion_r530268290 ## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/CompactionTxnHandler.java ## @@ -1166,6 +1166,55 @@ public long findMinOpenTxnIdForCleaner() throws MetaException { } } + /** + * Returns the min txnid seen open by any active transaction + * @deprecated remove when min_history_level is dropped + * @return txnId + * @throws MetaException ex + */ + @Override + @RetrySemantics.Idempotent + @Deprecated + public long findMinTxnIdSeenOpen() throws MetaException { +if (!useMinHistoryLevel) { + return -1L; +} +Connection dbConn = null; +try { + try { +dbConn = getDbConn(Connection.TRANSACTION_READ_COMMITTED); +long minOpenTxn; +try (Statement stmt = dbConn.createStatement()) { + try (ResultSet rs = stmt.executeQuery("SELECT MIN(\"MHL_MIN_OPEN_TXNID\") FROM \"MIN_HISTORY_LEVEL\"")) { +if (!rs.next()) { + throw new IllegalStateException("Scalar query returned no rows?!"); Review comment: is this even possible? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 516600) Time Spent: 2h 40m (was: 2.5h) > change min_history_level schema change to be compatible with previous version > - > > Key: HIVE-24403 > URL: https://issues.apache.org/jira/browse/HIVE-24403 > Project: Hive > Issue Type: Improvement > Components: Metastore >Reporter: Peter Varga >Assignee: Peter Varga >Priority: Major > Labels: pull-request-available > Time Spent: 2h 40m > Remaining Estimate: 0h > > In some configurations the HMS backend DB is used by HMS services with > different versions. > HIVE-23107 dropped the min_history_level table from the backend DB making > the new schema version incompatible with the older HMS services. > It is possible to modify that change to keep the compatibility > * Keep the min_history_level table > * Add the new fields for the compaction_queue the same way > * Create a feature flag for min_history_level and if it is on > * Keep the logic inserting to the table during openTxn > * Keep the logic removing the records at commitTxn and abortTxn > * Change the logic in the cleaner, to get the highwatermark the old way > * But still change it to not start the cleaning before that > * The txn_to_write_id table cleaning can work the new way in the new version > and the old way in the old version > * This feature flag can be automatically setup based on the existence of the > min_history level table, this way if the table will be dropped all HMS-s can > switch to the new functionality without restart -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24403) change min_history_level schema change to be compatible with previous version
[ https://issues.apache.org/jira/browse/HIVE-24403?focusedWorklogId=516599&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-516599 ] ASF GitHub Bot logged work on HIVE-24403: - Author: ASF GitHub Bot Created on: 25/Nov/20 10:31 Start Date: 25/Nov/20 10:31 Worklog Time Spent: 10m Work Description: deniskuzZ commented on a change in pull request #1688: URL: https://github.com/apache/hive/pull/1688#discussion_r530238192 ## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnDbUtil.java ## @@ -232,15 +232,23 @@ public static void cleanDb(Configuration conf) throws Exception { success &= truncateTable(conn, conf, stmt, "WRITE_SET"); success &= truncateTable(conn, conf, stmt, "REPL_TXN_MAP"); success &= truncateTable(conn, conf, stmt, "MATERIALIZATION_REBUILD_LOCKS"); + success &= truncateTable(conn, conf, stmt, "MIN_HISTORY_LEVEL"); try { -resetTxnSequence(conn, conf, stmt); -stmt.executeUpdate("INSERT INTO \"NEXT_LOCK_ID\" VALUES(1)"); -stmt.executeUpdate("INSERT INTO \"NEXT_COMPACTION_QUEUE_ID\" VALUES(1)"); - } catch (SQLException e) { -if (!getTableNotExistsErrorCodes().contains(e.getSQLState())) { - LOG.error("Error initializing sequence values", e); - success = false; +String dbProduct = conn.getMetaData().getDatabaseProductName(); +DatabaseProduct databaseProduct = determineDatabaseProduct(dbProduct, conf); +try { + resetTxnSequence(databaseProduct, stmt); + stmt.executeUpdate("INSERT INTO \"NEXT_LOCK_ID\" VALUES(1)"); + stmt.executeUpdate("INSERT INTO \"NEXT_COMPACTION_QUEUE_ID\" VALUES(1)"); +} catch (SQLException e) { + if (!databaseProduct.isTableNotExists(e)) { Review comment: Previous version was much more readable and concise. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 516599) Time Spent: 2.5h (was: 2h 20m) > change min_history_level schema change to be compatible with previous version > - > > Key: HIVE-24403 > URL: https://issues.apache.org/jira/browse/HIVE-24403 > Project: Hive > Issue Type: Improvement > Components: Metastore >Reporter: Peter Varga >Assignee: Peter Varga >Priority: Major > Labels: pull-request-available > Time Spent: 2.5h > Remaining Estimate: 0h > > In some configurations the HMS backend DB is used by HMS services with > different versions. > HIVE-23107 dropped the min_history_level table from the backend DB making > the new schema version incompatible with the older HMS services. > It is possible to modify that change to keep the compatibility > * Keep the min_history_level table > * Add the new fields for the compaction_queue the same way > * Create a feature flag for min_history_level and if it is on > * Keep the logic inserting to the table during openTxn > * Keep the logic removing the records at commitTxn and abortTxn > * Change the logic in the cleaner, to get the highwatermark the old way > * But still change it to not start the cleaning before that > * The txn_to_write_id table cleaning can work the new way in the new version > and the old way in the old version > * This feature flag can be automatically setup based on the existence of the > min_history level table, this way if the table will be dropped all HMS-s can > switch to the new functionality without restart -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24389) Trailing zeros of constant decimal numbers are removed
[ https://issues.apache.org/jira/browse/HIVE-24389?focusedWorklogId=516598&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-516598 ] ASF GitHub Bot logged work on HIVE-24389: - Author: ASF GitHub Bot Created on: 25/Nov/20 10:29 Start Date: 25/Nov/20 10:29 Worklog Time Spent: 10m Work Description: kasakrisz commented on a change in pull request #1676: URL: https://github.com/apache/hive/pull/1676#discussion_r530266544 ## File path: ql/src/test/results/clientpositive/llap/materialized_view_rewrite_window.q.out ## @@ -166,7 +166,7 @@ POSTHOOK: Input: arc_view@wealth A masked pattern was here CBO PLAN: HiveSortLimit(sort0=[$0], dir0=[ASC]) - HiveProject(quartile=[$0], total=[$1]) + HiveProject(quartile=[CAST($0):DECIMAL(12, 1)], total=[$1]) Review comment: The optimizer rewrites this query to use the materialized view `mv_tv_view_data_av1`. The plan of the mv with this patch is changed from
```
HiveTableScan(table=[[arc_view, mv_tv_view_data_av1]], table:alias=[mv_tv_view_data_av1])
```
to
```
LogicalProject(quartile=[CAST($0):DECIMAL(12, 1)], total=[$1])
  HiveTableScan(table=[[arc_view, mv_tv_view_data_av1]], table:alias=[mv_tv_view_data_av1])
```
The mv definition contains a constant value cast to Decimal `cast(1.5 as decimal(9,4))`:
```
create materialized view mv_tv_view_data_av1 stored as orc
TBLPROPERTIES ('transactional'='true') as
select t.quartile, max(t.total_views) total from wealth t2,
(select total_views `total_views`,
sum(cast(1.5 as decimal(9,4))) over (order by total_views) as quartile,
program from tv_view_data) t
where t.program=t2.watches group by quartile;
```
We need the project with the cast on top of the mv scan because the mv table schema is different from the query schema.
RowTypes after the patch:
```
viewscan rowType: RecordType(DECIMAL(12, 4) quartile, BIGINT total)
queryRel rowType: RecordType(DECIMAL(12, 1) quartile, BIGINT $f1)
```
before the patch:
```
viewscan rowType: RecordType(DECIMAL(12, 1) quartile, BIGINT total)
queryRel rowType: RecordType(DECIMAL(12, 1) quartile, BIGINT $f1)
```
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 516598) Time Spent: 1h 50m (was: 1h 40m)
> Trailing zeros of constant decimal numbers are removed
> --
>
> Key: HIVE-24389
> URL: https://issues.apache.org/jira/browse/HIVE-24389
> Project: Hive
> Issue Type: Bug
> Components: Types
> Reporter: Krisztian Kasa
> Assignee: Krisztian Kasa
> Priority: Major
> Labels: pull-request-available
> Time Spent: 1h 50m
> Remaining Estimate: 0h
>
> In some cases Hive removes trailing zeros of constant decimal numbers:
> {code}
> select cast(1.1 as decimal(22, 2))
> 1.1
> {code}
> In this case *WritableConstantHiveDecimalObjectInspector* is used and this object inspector takes its wrapped HiveDecimal scale instead of the scale specified in the wrapped typeinfo:
> {code}
> this = {WritableConstantHiveDecimalObjectInspector@14415}
> value = {HiveDecimalWritable@14426} "1.1"
> typeInfo = {DecimalTypeInfo@14421} "decimal(22,2)"
> {code}
> However, in case of an expression with an aggregate function *WritableHiveDecimalObjectInspector* is used:
> {code}
> select cast(sum(1.1) as decimal(22, 2))
> 1.10
> {code}
> {code}
> o = {HiveDecimalWritable@16633} "1.1"
> oi = {WritableHiveDecimalObjectInspector@16634}
> typeInfo = {DecimalTypeInfo@16640} "decimal(22,2)"
> {code}
> Casting the expressions to string:
> {code:java}
> select cast(cast(1.1 as decimal(22, 2)) as string), cast(cast(sum(1.1) as decimal(22, 2)) as string)
> 1.1 1.10
> {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
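The behavior the issue above expects — a cast to decimal(22, 2) keeping scale 2, so that 1.1 renders as 1.10 — can be illustrated with Python's Decimal. This is an analogy for the scale semantics only, not Hive's HiveDecimal implementation:

```python
from decimal import Decimal

def cast_to_scale(value, scale):
    # Illustrative stand-in for CAST(x AS DECIMAL(p, scale)): quantizing
    # to the target scale keeps trailing zeros in the result.
    return Decimal(value).quantize(Decimal(1).scaleb(-scale))

one_ten = cast_to_scale("1.1", 2)
# str(one_ten) is "1.10": scale 2 is preserved, trailing zero included
```

The bug is that the constant object inspector uses the wrapped value's own scale (1) rather than the typeinfo's declared scale (2), so it behaves like `Decimal("1.1")` with no quantize step.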
[jira] [Work logged] (HIVE-24403) change min_history_level schema change to be compatible with previous version
[ https://issues.apache.org/jira/browse/HIVE-24403?focusedWorklogId=516596&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-516596 ] ASF GitHub Bot logged work on HIVE-24403: - Author: ASF GitHub Bot Created on: 25/Nov/20 10:27 Start Date: 25/Nov/20 10:27 Worklog Time Spent: 10m Work Description: deniskuzZ commented on a change in pull request #1688: URL: https://github.com/apache/hive/pull/1688#discussion_r530250594 ## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java ## @@ -670,6 +725,8 @@ public OpenTxnsResponse openTxns(OpenTxnRequest rqst) throws MetaException { assert txnIds.size() == numTxns; + addTxnToMinHistoryLevel(dbConn, txnIds, minOpenTxnId); Review comment: why not to embed getMinOpenTxnIdWaterMark(dbConn) inside of addTxnToMinHistoryLevel and remove above minOpenTxnId block? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 516596) Time Spent: 2h 20m (was: 2h 10m) > change min_history_level schema change to be compatible with previous version > - > > Key: HIVE-24403 > URL: https://issues.apache.org/jira/browse/HIVE-24403 > Project: Hive > Issue Type: Improvement > Components: Metastore >Reporter: Peter Varga >Assignee: Peter Varga >Priority: Major > Labels: pull-request-available > Time Spent: 2h 20m > Remaining Estimate: 0h > > In some configurations the HMS backend DB is used by HMS services with > different versions. > HIVE-23107 dropped the min_history_level table from the backend DB making > the new schema version incompatible with the older HMS services. 
> It is possible to modify that change to keep the compatibility > * Keep the min_history_level table > * Add the new fields for the compaction_queue the same way > * Create a feature flag for min_history_level and if it is on > * Keep the logic inserting to the table during openTxn > * Keep the logic removing the records at commitTxn and abortTxn > * Change the logic in the cleaner, to get the highwatermark the old way > * But still change it to not start the cleaning before that > * The txn_to_write_id table cleaning can work the new way in the new version > and the old way in the old version > * This feature flag can be automatically setup based on the existence of the > min_history level table, this way if the table will be dropped all HMS-s can > switch to the new functionality without restart -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24245) Vectorized PTF with count and distinct over partition producing incorrect results.
[ https://issues.apache.org/jira/browse/HIVE-24245?focusedWorklogId=516595&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-516595 ] ASF GitHub Bot logged work on HIVE-24245: - Author: ASF GitHub Bot Created on: 25/Nov/20 10:26 Start Date: 25/Nov/20 10:26 Worklog Time Spent: 10m Work Description: pgaref commented on pull request #1649: URL: https://github.com/apache/hive/pull/1649#issuecomment-733617093 Thanks for the update @abstractdog ! +1 tests pending This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 516595) Time Spent: 1h 40m (was: 1.5h)
> Vectorized PTF with count and distinct over partition producing incorrect results.
> --
>
> Key: HIVE-24245
> URL: https://issues.apache.org/jira/browse/HIVE-24245
> Project: Hive
> Issue Type: Bug
> Components: Hive, PTF-Windowing, Vectorization
> Affects Versions: 3.1.0, 3.1.2
> Reporter: Chiran Ravani
> Assignee: László Bodor
> Priority: Critical
> Labels: pull-request-available
> Time Spent: 1h 40m
> Remaining Estimate: 0h
>
> Vectorized PTF for count and distinct over partition is broken. It produces incorrect results.
> Below is the test case.
> {code}
> CREATE TABLE bigd781b_new (
> id int,
> txt1 string,
> txt2 string,
> cda_date int,
> cda_job_name varchar(12));
> INSERT INTO bigd781b_new VALUES
> (1,'2010005759','7164335675012038',20200528,'load1'),
> (2,'2010005759','7164335675012038',20200528,'load2');
> {code}
> Running the query below produces incorrect results
> {code}
> SELECT
> txt1,
> txt2,
> count(distinct txt1) over(partition by txt1) as n,
> count(distinct txt2) over(partition by txt2) as m
> FROM bigd781b_new
> {code}
> as below.
> {code}
> +-------------+-------------------+----+----+
> |    txt1     |       txt2        | n  | m  |
> +-------------+-------------------+----+----+
> | 2010005759  | 7164335675012038  | 2  | 2  |
> | 2010005759  | 7164335675012038  | 2  | 2  |
> +-------------+-------------------+----+----+
> {code}
> While the correct output would be
> {code}
> +-------------+-------------------+----+----+
> |    txt1     |       txt2        | n  | m  |
> +-------------+-------------------+----+----+
> | 2010005759  | 7164335675012038  | 1  | 1  |
> | 2010005759  | 7164335675012038  | 1  | 1  |
> +-------------+-------------------+----+----+
> {code}
> The problem does not appear after setting the below property:
> set hive.vectorized.execution.ptf.enabled=false;
-- This message was sent by Atlassian Jira (v8.3.4#803005)
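The correct semantics described above — `count(distinct col) over (partition by col)` tags every row with the number of distinct values inside its partition, which is 1 here since the partitioning column is also the counted column — can be sketched outside Hive. A plain-Python illustration on the issue's test data (not Hive's vectorized PTF code):

```python
from collections import defaultdict

# Rows of bigd781b_new as (id, txt1, txt2)
rows = [
    (1, "2010005759", "7164335675012038"),
    (2, "2010005759", "7164335675012038"),
]

def count_distinct_over_partition(rows, col):
    # count(distinct c) over (partition by c): collect the distinct values
    # per partition key, then tag each row with its partition's count.
    distinct = defaultdict(set)
    for row in rows:
        distinct[row[col]].add(row[col])
    return [len(distinct[row[col]]) for row in rows]

n = count_distinct_over_partition(rows, 1)  # over txt1
m = count_distinct_over_partition(rows, 2)  # over txt2
```

Both columns yield [1, 1] — matching the "correct output" table in the issue, not the 2s the vectorized path produced.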
[jira] [Work logged] (HIVE-24403) change min_history_level schema change to be compatible with previous version
[ https://issues.apache.org/jira/browse/HIVE-24403?focusedWorklogId=516594&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-516594 ] ASF GitHub Bot logged work on HIVE-24403: - Author: ASF GitHub Bot Created on: 25/Nov/20 10:26 Start Date: 25/Nov/20 10:26 Worklog Time Spent: 10m Work Description: deniskuzZ commented on a change in pull request #1688: URL: https://github.com/apache/hive/pull/1688#discussion_r530250594 ## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java ## @@ -670,6 +725,8 @@ public OpenTxnsResponse openTxns(OpenTxnRequest rqst) throws MetaException { assert txnIds.size() == numTxns; + addTxnToMinHistoryLevel(dbConn, txnIds, minOpenTxnId); Review comment: why not to put this under above useMinHistoryLevel check? or even embed getMinOpenTxnIdWaterMark(dbConn) inside of addTxnToMinHistoryLevel? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 516594) Time Spent: 2h 10m (was: 2h) > change min_history_level schema change to be compatible with previous version > - > > Key: HIVE-24403 > URL: https://issues.apache.org/jira/browse/HIVE-24403 > Project: Hive > Issue Type: Improvement > Components: Metastore >Reporter: Peter Varga >Assignee: Peter Varga >Priority: Major > Labels: pull-request-available > Time Spent: 2h 10m > Remaining Estimate: 0h > > In some configurations the HMS backend DB is used by HMS services with > different versions. > HIVE-23107 dropped the min_history_level table from the backend DB making > the new schema version incompatible with the older HMS services. 
> It is possible to modify that change to keep the compatibility > * Keep the min_history_level table > * Add the new fields for the compaction_queue the same way > * Create a feature flag for min_history_level and if it is on > * Keep the logic inserting to the table during openTxn > * Keep the logic removing the records at commitTxn and abortTxn > * Change the logic in the cleaner, to get the highwatermark the old way > * But still change it to not start the cleaning before that > * The txn_to_write_id table cleaning can work the new way in the new version > and the old way in the old version > * This feature flag can be automatically setup based on the existence of the > min_history level table, this way if the table will be dropped all HMS-s can > switch to the new functionality without restart -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HIVE-23965) Improve plan regression tests using TPCDS30TB metastore dump and custom configs
[ https://issues.apache.org/jira/browse/HIVE-23965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltan Haindrich reassigned HIVE-23965: --- Assignee: Stamatis Zampetakis (was: Zoltan Haindrich) > Improve plan regression tests using TPCDS30TB metastore dump and custom > configs > --- > > Key: HIVE-23965 > URL: https://issues.apache.org/jira/browse/HIVE-23965 > Project: Hive > Issue Type: Improvement >Reporter: Stamatis Zampetakis >Assignee: Stamatis Zampetakis >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Attachments: master355.tgz > > Time Spent: 5h 20m > Remaining Estimate: 0h > > The existing regression tests (HIVE-12586) based on TPC-DS have certain > shortcomings: > The table statistics do not reflect cardinalities from a specific TPC-DS > scale factor (SF). Some tables are from a 30TB dataset, others from a 200GB > dataset, and others from a 3GB dataset. This mix leads to plans that may > never appear when using an actual TPC-DS dataset. > The existing statistics do not contain information about partitions, something > that can have a big impact on the resulting plans. > The existing regression tests rely more or less on the default > configuration (hive-site.xml). In real-life scenarios, though, some of the > configurations differ and may impact the choices of the optimizer. > This issue aims to address the above shortcomings by using a curated > TPCDS30TB metastore dump along with some custom hive configurations. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-23965) Improve plan regression tests using TPCDS30TB metastore dump and custom configs
[ https://issues.apache.org/jira/browse/HIVE-23965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltan Haindrich updated HIVE-23965: Attachment: master355.tgz > Improve plan regression tests using TPCDS30TB metastore dump and custom > configs > --- > > Key: HIVE-23965 > URL: https://issues.apache.org/jira/browse/HIVE-23965 > Project: Hive > Issue Type: Improvement >Reporter: Stamatis Zampetakis >Assignee: Zoltan Haindrich >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Attachments: master355.tgz > > Time Spent: 5h 20m > Remaining Estimate: 0h > > The existing regression tests (HIVE-12586) based on TPC-DS have certain > shortcomings: > The table statistics do not reflect cardinalities from a specific TPC-DS > scale factor (SF). Some tables are from a 30TB dataset, others from a 200GB > dataset, and others from a 3GB dataset. This mix leads to plans that may > never appear when using an actual TPC-DS dataset. > The existing statistics do not contain information about partitions, something > that can have a big impact on the resulting plans. > The existing regression tests rely more or less on the default > configuration (hive-site.xml). In real-life scenarios, though, some of the > configurations differ and may impact the choices of the optimizer. > This issue aims to address the above shortcomings by using a curated > TPCDS30TB metastore dump along with some custom hive configurations. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Reopened] (HIVE-23965) Improve plan regression tests using TPCDS30TB metastore dump and custom configs
[ https://issues.apache.org/jira/browse/HIVE-23965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltan Haindrich reopened HIVE-23965: - I've reverted the patch for now because it has exposed some issues with our test environment (master builds got stuck) > Improve plan regression tests using TPCDS30TB metastore dump and custom > configs > --- > > Key: HIVE-23965 > URL: https://issues.apache.org/jira/browse/HIVE-23965 > Project: Hive > Issue Type: Improvement >Reporter: Stamatis Zampetakis >Assignee: Stamatis Zampetakis >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Attachments: master355.tgz > > Time Spent: 5h 20m > Remaining Estimate: 0h > > The existing regression tests (HIVE-12586) based on TPC-DS have certain > shortcomings: > The table statistics do not reflect cardinalities from a specific TPC-DS > scale factor (SF). Some tables are from a 30TB dataset, others from a 200GB > dataset, and others from a 3GB dataset. This mix leads to plans that may > never appear when using an actual TPC-DS dataset. > The existing statistics do not contain information about partitions, something > that can have a big impact on the resulting plans. > The existing regression tests rely more or less on the default > configuration (hive-site.xml). In real-life scenarios, though, some of the > configurations differ and may impact the choices of the optimizer. > This issue aims to address the above shortcomings by using a curated > TPCDS30TB metastore dump along with some custom hive configurations. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HIVE-23965) Improve plan regression tests using TPCDS30TB metastore dump and custom configs
[ https://issues.apache.org/jira/browse/HIVE-23965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltan Haindrich reassigned HIVE-23965: --- Assignee: Zoltan Haindrich (was: Stamatis Zampetakis) > Improve plan regression tests using TPCDS30TB metastore dump and custom > configs > --- > > Key: HIVE-23965 > URL: https://issues.apache.org/jira/browse/HIVE-23965 > Project: Hive > Issue Type: Improvement >Reporter: Stamatis Zampetakis >Assignee: Zoltan Haindrich >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Attachments: master355.tgz > > Time Spent: 5h 20m > Remaining Estimate: 0h > > The existing regression tests (HIVE-12586) based on TPC-DS have certain > shortcomings: > The table statistics do not reflect cardinalities from a specific TPC-DS > scale factor (SF). Some tables are from a 30TB dataset, others from a 200GB > dataset, and others from a 3GB dataset. This mix leads to plans that may > never appear when using an actual TPC-DS dataset. > The existing statistics do not contain information about partitions, something > that can have a big impact on the resulting plans. > The existing regression tests rely more or less on the default > configuration (hive-site.xml). In real-life scenarios, though, some of the > configurations differ and may impact the choices of the optimizer. > This issue aims to address the above shortcomings by using a curated > TPCDS30TB metastore dump along with some custom hive configurations. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24245) Vectorized PTF with count and distinct over partition producing incorrect results.
[ https://issues.apache.org/jira/browse/HIVE-24245?focusedWorklogId=516592&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-516592 ] ASF GitHub Bot logged work on HIVE-24245: - Author: ASF GitHub Bot Created on: 25/Nov/20 10:21 Start Date: 25/Nov/20 10:21 Worklog Time Spent: 10m Work Description: pgaref commented on a change in pull request #1649: URL: https://github.com/apache/hive/pull/1649#discussion_r530261010 ## File path: ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFCount.java ## @@ -180,14 +181,13 @@ public void iterate(AggregationBuffer agg, Object[] parameters) if (((CountAgg) agg).uniqueObjects == null) { ((CountAgg) agg).uniqueObjects = new HashSet(); } - HashSet uniqueObjs = ((CountAgg) agg).uniqueObjects; + Set uniqueObjs = ((CountAgg) agg).uniqueObjects; ObjectInspectorObject obj = new ObjectInspectorObject( ObjectInspectorUtils.copyToStandardObject(parameters, inputOI, ObjectInspectorCopyOption.JAVA), outputOI); - if (!uniqueObjs.contains(obj)) { -uniqueObjs.add(obj); - } else { + boolean inserted = uniqueObjs.add(obj); + if (!inserted){ Review comment: Thanks for fixing this This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 516592) Time Spent: 1h 20m (was: 1h 10m) > Vectorized PTF with count and distinct over partition producing incorrect > results. > -- > > Key: HIVE-24245 > URL: https://issues.apache.org/jira/browse/HIVE-24245 > Project: Hive > Issue Type: Bug > Components: Hive, PTF-Windowing, Vectorization >Affects Versions: 3.1.0, 3.1.2 >Reporter: Chiran Ravani >Assignee: László Bodor >Priority: Critical > Labels: pull-request-available > Time Spent: 1h 20m > Remaining Estimate: 0h > > Vectorized PTF for count and distinct over partition is broken. 
It produces > incorrect results. > Below is the test case. > {code} > CREATE TABLE bigd781b_new ( > id int, > txt1 string, > txt2 string, > cda_date int, > cda_job_name varchar(12)); > INSERT INTO bigd781b_new VALUES > (1,'2010005759','7164335675012038',20200528,'load1'), > (2,'2010005759','7164335675012038',20200528,'load2'); > {code} > Running below query produces incorrect results > {code} > SELECT > txt1, > txt2, > count(distinct txt1) over(partition by txt1) as n, > count(distinct txt2) over(partition by txt2) as m > FROM bigd781b_new > {code} > as below. > {code} > +-+---+++ > |txt1 | txt2| n | m | > +-+---+++ > | 2010005759 | 7164335675012038 | 2 | 2 | > | 2010005759 | 7164335675012038 | 2 | 2 | > +-+---+++ > {code} > While the correct output would be > {code} > +-+---+++ > |txt1 | txt2| n | m | > +-+---+++ > | 2010005759 | 7164335675012038 | 1 | 1 | > | 2010005759 | 7164335675012038 | 1 | 1 | > +-+---+++ > {code} > The problem does not appear after setting below property > set hive.vectorized.execution.ptf.enabled=false; -- This message was sent by Atlassian Jira (v8.3.4#803005)
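[Editorial note] The expected window semantics from the reproduction above can be checked outside Hive with a small sketch. This is not Hive's vectorized PTF code; the class and method names are hypothetical, and the two rows mirror the bigd781b_new INSERT:

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

public class CountDistinctOverPartition {

    // count(distinct col0) over (partition by col0): for each partition key,
    // count the distinct values of col0 seen inside that partition
    static Map<String, Integer> countDistinctPerPartition(List<String[]> rows) {
        return rows.stream().collect(
                Collectors.groupingBy((String[] r) -> r[0],
                        Collectors.collectingAndThen(
                                Collectors.mapping((String[] r) -> r[0], Collectors.toSet()),
                                Set::size)));
    }

    public static void main(String[] args) {
        // the two (txt1, txt2) rows from the reproduction
        List<String[]> rows = List.of(
                new String[]{"2010005759", "7164335675012038"},
                new String[]{"2010005759", "7164335675012038"});

        // both rows fall into the same txt1 partition, which holds exactly
        // one distinct txt1 value, so the expected window result is 1
        System.out.println(countDistinctPerPartition(rows).get("2010005759")); // prints 1
    }
}
```

The sketch makes the bug concrete: the correct per-partition distinct count for these rows is 1, while the vectorized PTF path returned 2.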
[jira] [Work logged] (HIVE-24245) Vectorized PTF with count and distinct over partition producing incorrect results.
[ https://issues.apache.org/jira/browse/HIVE-24245?focusedWorklogId=516593&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-516593 ] ASF GitHub Bot logged work on HIVE-24245: - Author: ASF GitHub Bot Created on: 25/Nov/20 10:21 Start Date: 25/Nov/20 10:21 Worklog Time Spent: 10m Work Description: pgaref commented on a change in pull request #1649: URL: https://github.com/apache/hive/pull/1649#discussion_r530261010 ## File path: ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFCount.java ## @@ -180,14 +181,13 @@ public void iterate(AggregationBuffer agg, Object[] parameters) if (((CountAgg) agg).uniqueObjects == null) { ((CountAgg) agg).uniqueObjects = new HashSet(); } - HashSet uniqueObjs = ((CountAgg) agg).uniqueObjects; + Set uniqueObjs = ((CountAgg) agg).uniqueObjects; ObjectInspectorObject obj = new ObjectInspectorObject( ObjectInspectorUtils.copyToStandardObject(parameters, inputOI, ObjectInspectorCopyOption.JAVA), outputOI); - if (!uniqueObjs.contains(obj)) { -uniqueObjs.add(obj); - } else { + boolean inserted = uniqueObjs.add(obj); + if (!inserted){ Review comment: Thanks for taking care of this This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 516593) Time Spent: 1.5h (was: 1h 20m) > Vectorized PTF with count and distinct over partition producing incorrect > results. > -- > > Key: HIVE-24245 > URL: https://issues.apache.org/jira/browse/HIVE-24245 > Project: Hive > Issue Type: Bug > Components: Hive, PTF-Windowing, Vectorization >Affects Versions: 3.1.0, 3.1.2 >Reporter: Chiran Ravani >Assignee: László Bodor >Priority: Critical > Labels: pull-request-available > Time Spent: 1.5h > Remaining Estimate: 0h > > Vectorized PTF for count and distinct over partition is broken. 
It produces > incorrect results. > Below is the test case. > {code} > CREATE TABLE bigd781b_new ( > id int, > txt1 string, > txt2 string, > cda_date int, > cda_job_name varchar(12)); > INSERT INTO bigd781b_new VALUES > (1,'2010005759','7164335675012038',20200528,'load1'), > (2,'2010005759','7164335675012038',20200528,'load2'); > {code} > Running below query produces incorrect results > {code} > SELECT > txt1, > txt2, > count(distinct txt1) over(partition by txt1) as n, > count(distinct txt2) over(partition by txt2) as m > FROM bigd781b_new > {code} > as below. > {code} > +-+---+++ > |txt1 | txt2| n | m | > +-+---+++ > | 2010005759 | 7164335675012038 | 2 | 2 | > | 2010005759 | 7164335675012038 | 2 | 2 | > +-+---+++ > {code} > While the correct output would be > {code} > +-+---+++ > |txt1 | txt2| n | m | > +-+---+++ > | 2010005759 | 7164335675012038 | 1 | 1 | > | 2010005759 | 7164335675012038 | 1 | 1 | > +-+---+++ > {code} > The problem does not appear after setting below property > set hive.vectorized.execution.ptf.enabled=false; -- This message was sent by Atlassian Jira (v8.3.4#803005)
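[Editorial note] The GenericUDAFCount refactor in the diffs above relies on a standard java.util.Set contract: add returns false when the element is already present, so the separate contains() lookup can be dropped. A minimal self-contained illustration (the value is just one of the sample strings, not Hive data):

```java
import java.util.HashSet;
import java.util.Set;

public class SetAddIdiom {
    public static void main(String[] args) {
        Set<String> uniqueObjs = new HashSet<>();

        // First insert of a value: add returns true because the set changed
        boolean inserted = uniqueObjs.add("7164335675012038");
        System.out.println(inserted); // prints true

        // Second insert of the same value: add returns false (already present),
        // so the contains()-then-add() pair collapses into a single call
        inserted = uniqueObjs.add("7164335675012038");
        System.out.println(inserted); // prints false
    }
}
```

Besides being shorter, the single add() call does one hash lookup instead of two, which matters in a per-row aggregation loop.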
[jira] [Work logged] (HIVE-24403) change min_history_level schema change to be compatible with previous version
[ https://issues.apache.org/jira/browse/HIVE-24403?focusedWorklogId=516591&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-516591 ] ASF GitHub Bot logged work on HIVE-24403: - Author: ASF GitHub Bot Created on: 25/Nov/20 10:19 Start Date: 25/Nov/20 10:19 Worklog Time Spent: 10m Work Description: deniskuzZ commented on a change in pull request #1688: URL: https://github.com/apache/hive/pull/1688#discussion_r530253708 ## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java ## @@ -5094,6 +5153,99 @@ public void countOpenTxns() throws MetaException { } } + /** + * Add min history level entry for each generated txn record + * @param dbConn Connection + * @param txnIds new transaction ids + * @deprecated Remove this method when min_history_level table is dropped + * @throws SQLException ex + */ + @Deprecated + private void addTxnToMinHistoryLevel(Connection dbConn, List txnIds, long minOpenTxnId) throws SQLException { +if (!useMinHistoryLevel) { + return; +} +// Need to register minimum open txnid for current transactions into MIN_HISTORY table. +try (Statement stmt = dbConn.createStatement()) { + + List rows = txnIds.stream().map(txnId -> txnId + ", " + minOpenTxnId).collect(Collectors.toList()); + + // Insert transaction entries into MIN_HISTORY_LEVEL. 
+ List inserts = + sqlGenerator.createInsertValuesStmt("\"MIN_HISTORY_LEVEL\" (\"MHL_TXNID\", \"MHL_MIN_OPEN_TXNID\")", rows); + for (String insert : inserts) { +LOG.debug("Going to execute insert <" + insert + ">"); +stmt.execute(insert); + } + LOG.info("Added entries to MIN_HISTORY_LEVEL for current txns: (" + txnIds + ") with min_open_txn: " + minOpenTxnId); +} catch (SQLException e) { + if (dbProduct.isTableNotExists(e)) { +// If the table does not exists anymore, we disable the flag and start to work the new way +// This enables to switch to the new functionality without a restart +useMinHistoryLevel = false; Review comment: Are you covering the case that schema change doesn't force any restart? Lot's of code duplication, can you wrap needed methods with aspect? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 516591) Time Spent: 2h (was: 1h 50m) > change min_history_level schema change to be compatible with previous version > - > > Key: HIVE-24403 > URL: https://issues.apache.org/jira/browse/HIVE-24403 > Project: Hive > Issue Type: Improvement > Components: Metastore >Reporter: Peter Varga >Assignee: Peter Varga >Priority: Major > Labels: pull-request-available > Time Spent: 2h > Remaining Estimate: 0h > > In some configurations the HMS backend DB is used by HMS services with > different versions. > HIVE-23107 dropped the min_history_level table from the backend DB making > the new schema version incompatible with the older HMS services. 
> It is possible to modify that change to keep the compatibility: > * Keep the min_history_level table > * Add the new fields for the compaction_queue the same way > * Create a feature flag for min_history_level, and if it is on: > * Keep the logic inserting into the table during openTxn > * Keep the logic removing the records at commitTxn and abortTxn > * Change the logic in the cleaner to get the highwatermark the old way > * But still change it to not start the cleaning before that > * The txn_to_write_id table cleaning can work the new way in the new version > and the old way in the old version > * This feature flag can be set up automatically based on the existence of the > min_history_level table; this way, if the table is dropped, all HMS instances can > switch to the new functionality without a restart -- This message was sent by Atlassian Jira (v8.3.4#803005)
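[Editorial note] The fallback being reviewed above — disabling the min_history_level path when the backing table has been dropped, so every HMS instance switches behaviour without a restart — can be sketched as follows. This is an illustration, not the actual TxnHandler code: the class name, the boolean parameter standing in for a failed JDBC insert, and the "42S02" SQLState are all assumptions:

```java
import java.sql.SQLException;

public class MinHistoryFallbackSketch {
    // stand-in for the useMinHistoryLevel feature flag from the discussion
    private boolean useMinHistoryLevel = true;

    // Simulated insert path: if MIN_HISTORY_LEVEL no longer exists, flip the
    // flag off instead of propagating the error, so the new behaviour takes
    // over without a service restart.
    void addTxnToMinHistoryLevel(boolean tableDropped) throws SQLException {
        if (!useMinHistoryLevel) {
            return; // feature already disabled, nothing to record
        }
        try {
            if (tableDropped) {
                // stand-in for the JDBC insert failing; "42S02" is a common
                // "table not found" SQLState, used here purely for illustration
                throw new SQLException("MIN_HISTORY_LEVEL does not exist", "42S02");
            }
            // the real code would insert (MHL_TXNID, MHL_MIN_OPEN_TXNID) rows here
        } catch (SQLException e) {
            if ("42S02".equals(e.getSQLState())) {
                useMinHistoryLevel = false; // switch to the new behaviour
            } else {
                throw e; // unrelated failures still surface
            }
        }
    }

    boolean isUseMinHistoryLevel() {
        return useMinHistoryLevel;
    }

    public static void main(String[] args) throws SQLException {
        MinHistoryFallbackSketch handler = new MinHistoryFallbackSketch();
        handler.addTxnToMinHistoryLevel(true);   // table is gone: flag flips off
        System.out.println(handler.isUseMinHistoryLevel()); // prints false
    }
}
```

The key design point is that a missing table is treated as a signal rather than an error: once any instance observes it, that instance permanently stops using the old path.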
[jira] [Work logged] (HIVE-24403) change min_history_level schema change to be compatible with previous version
[ https://issues.apache.org/jira/browse/HIVE-24403?focusedWorklogId=516590&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-516590 ] ASF GitHub Bot logged work on HIVE-24403: - Author: ASF GitHub Bot Created on: 25/Nov/20 10:11 Start Date: 25/Nov/20 10:11 Worklog Time Spent: 10m Work Description: deniskuzZ commented on a change in pull request #1688: URL: https://github.com/apache/hive/pull/1688#discussion_r530253708 ## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java ## @@ -5094,6 +5153,99 @@ public void countOpenTxns() throws MetaException { } } + /** + * Add min history level entry for each generated txn record + * @param dbConn Connection + * @param txnIds new transaction ids + * @deprecated Remove this method when min_history_level table is dropped + * @throws SQLException ex + */ + @Deprecated + private void addTxnToMinHistoryLevel(Connection dbConn, List txnIds, long minOpenTxnId) throws SQLException { +if (!useMinHistoryLevel) { + return; +} +// Need to register minimum open txnid for current transactions into MIN_HISTORY table. +try (Statement stmt = dbConn.createStatement()) { + + List rows = txnIds.stream().map(txnId -> txnId + ", " + minOpenTxnId).collect(Collectors.toList()); + + // Insert transaction entries into MIN_HISTORY_LEVEL. 
+ List inserts = + sqlGenerator.createInsertValuesStmt("\"MIN_HISTORY_LEVEL\" (\"MHL_TXNID\", \"MHL_MIN_OPEN_TXNID\")", rows); + for (String insert : inserts) { +LOG.debug("Going to execute insert <" + insert + ">"); +stmt.execute(insert); + } + LOG.info("Added entries to MIN_HISTORY_LEVEL for current txns: (" + txnIds + ") with min_open_txn: " + minOpenTxnId); +} catch (SQLException e) { + if (dbProduct.isTableNotExists(e)) { +// If the table does not exists anymore, we disable the flag and start to work the new way +// This enables to switch to the new functionality without a restart +useMinHistoryLevel = false; Review comment: Why is this needed? Are you covering the case that schema change is done while old HMS is still running? Lot's of code duplication, can you wrap needed methods with aspect? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 516590) Time Spent: 1h 50m (was: 1h 40m) > change min_history_level schema change to be compatible with previous version > - > > Key: HIVE-24403 > URL: https://issues.apache.org/jira/browse/HIVE-24403 > Project: Hive > Issue Type: Improvement > Components: Metastore >Reporter: Peter Varga >Assignee: Peter Varga >Priority: Major > Labels: pull-request-available > Time Spent: 1h 50m > Remaining Estimate: 0h > > In some configurations the HMS backend DB is used by HMS services with > different versions. > HIVE-23107 dropped the min_history_level table from the backend DB making > the new schema version incompatible with the older HMS services. 
> It is possible to modify that change to keep the compatibility: > * Keep the min_history_level table > * Add the new fields for the compaction_queue the same way > * Create a feature flag for min_history_level, and if it is on: > * Keep the logic inserting into the table during openTxn > * Keep the logic removing the records at commitTxn and abortTxn > * Change the logic in the cleaner to get the highwatermark the old way > * But still change it to not start the cleaning before that > * The txn_to_write_id table cleaning can work the new way in the new version > and the old way in the old version > * This feature flag can be set up automatically based on the existence of the > min_history_level table; this way, if the table is dropped, all HMS instances can > switch to the new functionality without a restart -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24403) change min_history_level schema change to be compatible with previous version
[ https://issues.apache.org/jira/browse/HIVE-24403?focusedWorklogId=516589&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-516589 ] ASF GitHub Bot logged work on HIVE-24403: - Author: ASF GitHub Bot Created on: 25/Nov/20 10:10 Start Date: 25/Nov/20 10:10 Worklog Time Spent: 10m Work Description: deniskuzZ commented on a change in pull request #1688: URL: https://github.com/apache/hive/pull/1688#discussion_r530253708 ## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java ## @@ -5094,6 +5153,99 @@ public void countOpenTxns() throws MetaException { } } + /** + * Add min history level entry for each generated txn record + * @param dbConn Connection + * @param txnIds new transaction ids + * @deprecated Remove this method when min_history_level table is dropped + * @throws SQLException ex + */ + @Deprecated + private void addTxnToMinHistoryLevel(Connection dbConn, List txnIds, long minOpenTxnId) throws SQLException { +if (!useMinHistoryLevel) { + return; +} +// Need to register minimum open txnid for current transactions into MIN_HISTORY table. +try (Statement stmt = dbConn.createStatement()) { + + List rows = txnIds.stream().map(txnId -> txnId + ", " + minOpenTxnId).collect(Collectors.toList()); + + // Insert transaction entries into MIN_HISTORY_LEVEL. 
+ List inserts = + sqlGenerator.createInsertValuesStmt("\"MIN_HISTORY_LEVEL\" (\"MHL_TXNID\", \"MHL_MIN_OPEN_TXNID\")", rows); + for (String insert : inserts) { +LOG.debug("Going to execute insert <" + insert + ">"); +stmt.execute(insert); + } + LOG.info("Added entries to MIN_HISTORY_LEVEL for current txns: (" + txnIds + ") with min_open_txn: " + minOpenTxnId); +} catch (SQLException e) { + if (dbProduct.isTableNotExists(e)) { +// If the table does not exists anymore, we disable the flag and start to work the new way +// This enables to switch to the new functionality without a restart +useMinHistoryLevel = false; Review comment: Why is this needed? Are you covering the case that schema change is done while old HMS is still running? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 516589) Time Spent: 1h 40m (was: 1.5h) > change min_history_level schema change to be compatible with previous version > - > > Key: HIVE-24403 > URL: https://issues.apache.org/jira/browse/HIVE-24403 > Project: Hive > Issue Type: Improvement > Components: Metastore >Reporter: Peter Varga >Assignee: Peter Varga >Priority: Major > Labels: pull-request-available > Time Spent: 1h 40m > Remaining Estimate: 0h > > In some configurations the HMS backend DB is used by HMS services with > different versions. > HIVE-23107 dropped the min_history_level table from the backend DB making > the new schema version incompatible with the older HMS services. 
> It is possible to modify that change to keep the compatibility: > * Keep the min_history_level table > * Add the new fields for the compaction_queue the same way > * Create a feature flag for min_history_level, and if it is on: > * Keep the logic inserting into the table during openTxn > * Keep the logic removing the records at commitTxn and abortTxn > * Change the logic in the cleaner to get the highwatermark the old way > * But still change it to not start the cleaning before that > * The txn_to_write_id table cleaning can work the new way in the new version > and the old way in the old version > * This feature flag can be set up automatically based on the existence of the > min_history_level table; this way, if the table is dropped, all HMS instances can > switch to the new functionality without a restart -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24403) change min_history_level schema change to be compatible with previous version
[ https://issues.apache.org/jira/browse/HIVE-24403?focusedWorklogId=516585&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-516585 ] ASF GitHub Bot logged work on HIVE-24403: - Author: ASF GitHub Bot Created on: 25/Nov/20 10:05 Start Date: 25/Nov/20 10:05 Worklog Time Spent: 10m Work Description: deniskuzZ commented on a change in pull request #1688: URL: https://github.com/apache/hive/pull/1688#discussion_r530250594 ## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java ## @@ -670,6 +725,8 @@ public OpenTxnsResponse openTxns(OpenTxnRequest rqst) throws MetaException { assert txnIds.size() == numTxns; + addTxnToMinHistoryLevel(dbConn, txnIds, minOpenTxnId); Review comment: why not to put this under above useMinHistoryLevel check? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 516585) Time Spent: 1.5h (was: 1h 20m) > change min_history_level schema change to be compatible with previous version > - > > Key: HIVE-24403 > URL: https://issues.apache.org/jira/browse/HIVE-24403 > Project: Hive > Issue Type: Improvement > Components: Metastore >Reporter: Peter Varga >Assignee: Peter Varga >Priority: Major > Labels: pull-request-available > Time Spent: 1.5h > Remaining Estimate: 0h > > In some configurations the HMS backend DB is used by HMS services with > different versions. > HIVE-23107 dropped the min_history_level table from the backend DB making > the new schema version incompatible with the older HMS services. 
> It is possible to modify that change to keep the compatibility: > * Keep the min_history_level table > * Add the new fields for the compaction_queue the same way > * Create a feature flag for min_history_level, and if it is on: > * Keep the logic inserting into the table during openTxn > * Keep the logic removing the records at commitTxn and abortTxn > * Change the logic in the cleaner to get the highwatermark the old way > * But still change it to not start the cleaning before that > * The txn_to_write_id table cleaning can work the new way in the new version > and the old way in the old version > * This feature flag can be set up automatically based on the existence of the > min_history_level table; this way, if the table is dropped, all HMS instances can > switch to the new functionality without a restart -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24403) change min_history_level schema change to be compatible with previous version
[ https://issues.apache.org/jira/browse/HIVE-24403?focusedWorklogId=516583&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-516583 ] ASF GitHub Bot logged work on HIVE-24403: - Author: ASF GitHub Bot Created on: 25/Nov/20 10:01 Start Date: 25/Nov/20 10:01 Worklog Time Spent: 10m Work Description: deniskuzZ commented on a change in pull request #1688: URL: https://github.com/apache/hive/pull/1688#discussion_r530247484 ## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java ## @@ -390,6 +404,42 @@ public void setConf(Configuration conf){ } } + /** + * Check if min_history_level table is usable + * @return + * @throws MetaException + */ + private boolean checkMinHistoryLevelTable(boolean configValue) throws MetaException { +if (!configValue) { + // don't check it if disabled + return false; +} +Connection dbConn = null; +boolean tableExists = true; +try { + dbConn = getDbConn(Connection.TRANSACTION_READ_COMMITTED); + try (Statement stmt = dbConn.createStatement()) { +// Dummy query to see if table exists +try (ResultSet rs = stmt.executeQuery("SELECT MIN(\"MHL_MIN_OPEN_TXNID\") FROM \"MIN_HISTORY_LEVEL\"")) { Review comment: you can just select 1 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 516583) Time Spent: 1h 20m (was: 1h 10m) > change min_history_level schema change to be compatible with previous version > - > > Key: HIVE-24403 > URL: https://issues.apache.org/jira/browse/HIVE-24403 > Project: Hive > Issue Type: Improvement > Components: Metastore >Reporter: Peter Varga >Assignee: Peter Varga >Priority: Major > Labels: pull-request-available > Time Spent: 1h 20m > Remaining Estimate: 0h > > In some configurations the HMS backend DB is used by HMS services with > different versions. > HIVE-23107 dropped the min_history_level table from the backend DB making > the new schema version incompatible with the older HMS services. > It is possible to modify that change to keep the compatibility > * Keep the min_history_level table > * Add the new fields for the compaction_queue the same way > * Create a feature flag for min_history_level and if it is on > * Keep the logic inserting to the table during openTxn > * Keep the logic removing the records at commitTxn and abortTxn > * Change the logic in the cleaner, to get the highwatermark the old way > * But still change it to not start the cleaning before that > * The txn_to_write_id table cleaning can work the new way in the new version > and the old way in the old version > * This feature flag can be automatically setup based on the existence of the > min_history level table, this way if the table will be dropped all HMS-s can > switch to the new functionality without restart -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24403) change min_history_level schema change to be compatible with previous version
[ https://issues.apache.org/jira/browse/HIVE-24403?focusedWorklogId=516579&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-516579 ] ASF GitHub Bot logged work on HIVE-24403: - Author: ASF GitHub Bot Created on: 25/Nov/20 09:54 Start Date: 25/Nov/20 09:54 Worklog Time Spent: 10m Work Description: deniskuzZ commented on a change in pull request #1688: URL: https://github.com/apache/hive/pull/1688#discussion_r530242536 ## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnDbUtil.java ## @@ -385,6 +391,26 @@ public static String queryToString(Configuration conf, String query, boolean inc return sb.toString(); } + /** + * This is only for testing, it does not use the connectionPool from TxnHandler! + * @param conf + * @param query + * @throws Exception + */ + @VisibleForTesting + public static void executeUpdate(Configuration conf, String query) Review comment: That's not a test class. Production code becomes massive because of that. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 516579) Time Spent: 1h 10m (was: 1h) > change min_history_level schema change to be compatible with previous version > - > > Key: HIVE-24403 > URL: https://issues.apache.org/jira/browse/HIVE-24403 > Project: Hive > Issue Type: Improvement > Components: Metastore >Reporter: Peter Varga >Assignee: Peter Varga >Priority: Major > Labels: pull-request-available > Time Spent: 1h 10m > Remaining Estimate: 0h > > In some configurations the HMS backend DB is used by HMS services with > different versions. > HIVE-23107 dropped the min_history_level table from the backend DB making > the new schema version incompatible with the older HMS services. 
> It is possible to modify that change to keep compatibility: > * Keep the min_history_level table > * Add the new fields for the compaction_queue the same way > * Create a feature flag for min_history_level, and if it is on: > * Keep the logic that inserts into the table during openTxn > * Keep the logic that removes the records at commitTxn and abortTxn > * Change the logic in the cleaner to get the high watermark the old way > * But still change it to not start the cleaning before that > * The txn_to_write_id table cleaning can work the new way in the new version > and the old way in the old version > * This feature flag can be automatically set up based on the existence of the > min_history_level table; this way, if the table is dropped, all HMS instances can > switch to the new functionality without a restart -- This message was sent by Atlassian Jira (v8.3.4#803005)
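The auto-configured feature flag in the last bullet reduces to a small decision rule. The sketch below uses hypothetical names, not Hive's actual configuration API: an explicit metastore setting wins, and when it is unset the flag simply follows whether the MIN_HISTORY_LEVEL table still exists, so dropping the table flips every HMS to the new behavior without a restart.

```java
import java.util.Optional;

// Hypothetical sketch of the auto-configured feature flag described above.
public class MinHistoryLevelFlag {

    // configured: an explicit setting, if any; tableExists: whether
    // MIN_HISTORY_LEVEL is present in the backend DB.
    static boolean useMinHistoryLevel(Optional<Boolean> configured, boolean tableExists) {
        return configured.orElse(tableExists);
    }

    public static void main(String[] args) {
        System.out.println(useMinHistoryLevel(Optional.empty(), true));   // table present
        System.out.println(useMinHistoryLevel(Optional.empty(), false));  // table dropped
        System.out.println(useMinHistoryLevel(Optional.of(false), true)); // explicit off wins
    }
}
```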
[jira] [Work logged] (HIVE-24403) change min_history_level schema change to be compatible with previous version
[ https://issues.apache.org/jira/browse/HIVE-24403?focusedWorklogId=516576&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-516576 ] ASF GitHub Bot logged work on HIVE-24403: - Author: ASF GitHub Bot Created on: 25/Nov/20 09:47 Start Date: 25/Nov/20 09:47 Worklog Time Spent: 10m Work Description: deniskuzZ commented on a change in pull request #1688: URL: https://github.com/apache/hive/pull/1688#discussion_r530238192 ## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnDbUtil.java ## @@ -232,15 +232,23 @@ public static void cleanDb(Configuration conf) throws Exception { success &= truncateTable(conn, conf, stmt, "WRITE_SET"); success &= truncateTable(conn, conf, stmt, "REPL_TXN_MAP"); success &= truncateTable(conn, conf, stmt, "MATERIALIZATION_REBUILD_LOCKS"); + success &= truncateTable(conn, conf, stmt, "MIN_HISTORY_LEVEL"); try { -resetTxnSequence(conn, conf, stmt); -stmt.executeUpdate("INSERT INTO \"NEXT_LOCK_ID\" VALUES(1)"); -stmt.executeUpdate("INSERT INTO \"NEXT_COMPACTION_QUEUE_ID\" VALUES(1)"); - } catch (SQLException e) { -if (!getTableNotExistsErrorCodes().contains(e.getSQLState())) { - LOG.error("Error initializing sequence values", e); - success = false; +String dbProduct = conn.getMetaData().getDatabaseProductName(); +DatabaseProduct databaseProduct = determineDatabaseProduct(dbProduct, conf); +try { + resetTxnSequence(databaseProduct, stmt); + stmt.executeUpdate("INSERT INTO \"NEXT_LOCK_ID\" VALUES(1)"); + stmt.executeUpdate("INSERT INTO \"NEXT_COMPACTION_QUEUE_ID\" VALUES(1)"); +} catch (SQLException e) { + if (!databaseProduct.isTableNotExists(e)) { Review comment: Previous version was much more readable and concise. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 516576) Time Spent: 1h (was: 50m) > change min_history_level schema change to be compatible with previous version > - > > Key: HIVE-24403 > URL: https://issues.apache.org/jira/browse/HIVE-24403 > Project: Hive > Issue Type: Improvement > Components: Metastore >Reporter: Peter Varga >Assignee: Peter Varga >Priority: Major > Labels: pull-request-available > Time Spent: 1h > Remaining Estimate: 0h -- This message was sent by Atlassian Jira (v8.3.4#803005)
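The earlier pattern the reviewer prefers, matching `e.getSQLState()` against a set of known error codes, looks roughly like the following. The SQLState values here are illustrative examples (e.g. "42S02" for MySQL, "42P01" for PostgreSQL); the real list differs per database product.

```java
import java.sql.SQLException;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class TableNotExistsCheck {
    // Illustrative SQLState codes for "table does not exist".
    static final Set<String> TABLE_NOT_EXISTS_STATES =
            new HashSet<>(Arrays.asList("42S02", "42P01", "42Y55"));

    // Mirrors the old getTableNotExistsErrorCodes().contains(...) shape.
    static boolean isTableNotExists(SQLException e) {
        return TABLE_NOT_EXISTS_STATES.contains(e.getSQLState());
    }

    public static void main(String[] args) {
        // SQLException(reason, sqlState) lets us simulate driver errors.
        System.out.println(isTableNotExists(new SQLException("no such table", "42S02")));
        System.out.println(isTableNotExists(new SQLException("syntax error", "42601")));
    }
}
```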
[jira] [Work logged] (HIVE-24403) change min_history_level schema change to be compatible with previous version
[ https://issues.apache.org/jira/browse/HIVE-24403?focusedWorklogId=516575&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-516575 ] ASF GitHub Bot logged work on HIVE-24403: - Author: ASF GitHub Bot Created on: 25/Nov/20 09:42 Start Date: 25/Nov/20 09:42 Worklog Time Spent: 10m Work Description: deniskuzZ commented on a change in pull request #1688: URL: https://github.com/apache/hive/pull/1688#discussion_r530225330 ## File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Cleaner.java ## @@ -98,16 +98,24 @@ public void run() { handle = txnHandler.getMutexAPI().acquireLock(TxnStore.MUTEX_KEY.Cleaner.name()); startedAt = System.currentTimeMillis(); long minOpenTxnId = txnHandler.findMinOpenTxnIdForCleaner(); + long minTxnIdSeenOpen = txnHandler.findMinTxnIdSeenOpen(); Review comment: could we skip this extra db call if we have `metastore.txn.use.minhistorylevel=false` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 516575) Time Spent: 50m (was: 40m) > change min_history_level schema change to be compatible with previous version > - > > Key: HIVE-24403 > URL: https://issues.apache.org/jira/browse/HIVE-24403 > Project: Hive > Issue Type: Improvement > Components: Metastore >Reporter: Peter Varga >Assignee: Peter Varga >Priority: Major > Labels: pull-request-available > Time Spent: 50m > Remaining Estimate: 0h > > In some configurations the HMS backend DB is used by HMS services with > different versions. > HIVE-23107 dropped the min_history_level table from the backend DB making > the new schema version incompatible with the older HMS services. 
-- This message was sent by Atlassian Jira (v8.3.4#803005)
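The reviewer's suggestion, skipping the extra round trip when `metastore.txn.use.minhistorylevel=false`, amounts to a guard like the one below. This is a sketch with stand-in names; `findMinTxnIdSeenOpen` here is a counting stub, not Hive's TxnStore.

```java
public class CleanerGuard {
    static int dbCalls = 0;

    // Stub standing in for the extra metastore query flagged in the review.
    static long findMinTxnIdSeenOpen() {
        dbCalls++;
        return 42L;
    }

    static long minTxnIdSeenOpen(boolean useMinHistoryLevel) {
        // When the feature is off the value is never consulted,
        // so the DB call can be skipped entirely.
        return useMinHistoryLevel ? findMinTxnIdSeenOpen() : -1L;
    }

    public static void main(String[] args) {
        System.out.println(minTxnIdSeenOpen(false)); // flag off: no call issued
        System.out.println(dbCalls);
        System.out.println(minTxnIdSeenOpen(true));  // flag on: one call issued
        System.out.println(dbCalls);
    }
}
```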
[jira] [Work logged] (HIVE-24403) change min_history_level schema change to be compatible with previous version
[ https://issues.apache.org/jira/browse/HIVE-24403?focusedWorklogId=516572&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-516572 ] ASF GitHub Bot logged work on HIVE-24403: - Author: ASF GitHub Bot Created on: 25/Nov/20 09:29 Start Date: 25/Nov/20 09:29 Worklog Time Spent: 10m Work Description: deniskuzZ commented on a change in pull request #1688: URL: https://github.com/apache/hive/pull/1688#discussion_r530225330 ## File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Cleaner.java ## @@ -98,16 +98,24 @@ public void run() { handle = txnHandler.getMutexAPI().acquireLock(TxnStore.MUTEX_KEY.Cleaner.name()); startedAt = System.currentTimeMillis(); long minOpenTxnId = txnHandler.findMinOpenTxnIdForCleaner(); + long minTxnIdSeenOpen = txnHandler.findMinTxnIdSeenOpen(); Review comment: could we skip this extra db call if we have `metastore.txn.use.minhistorylevel=false` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 516572) Time Spent: 40m (was: 0.5h) > change min_history_level schema change to be compatible with previous version > - > > Key: HIVE-24403 > URL: https://issues.apache.org/jira/browse/HIVE-24403 > Project: Hive > Issue Type: Improvement > Components: Metastore >Reporter: Peter Varga >Assignee: Peter Varga >Priority: Major > Labels: pull-request-available > Time Spent: 40m > Remaining Estimate: 0h > > In some configurations the HMS backend DB is used by HMS services with > different versions. > HIVE-23107 dropped the min_history_level table from the backend DB making > the new schema version incompatible with the older HMS services. 
-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-24314) compactor.Cleaner should not set state "mark cleaned" if it didn't remove any files
[ https://issues.apache.org/jira/browse/HIVE-24314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17238618#comment-17238618 ] Karen Coppage commented on HIVE-24314: -- Committed to master Nov 3, 2020. Thanks for the reviews [~pvargacl] and [~kuczoram]! > compactor.Cleaner should not set state "mark cleaned" if it didn't remove any > files > --- > > Key: HIVE-24314 > URL: https://issues.apache.org/jira/browse/HIVE-24314 > Project: Hive > Issue Type: Bug >Reporter: Karen Coppage >Assignee: Karen Coppage >Priority: Major > Labels: pull-request-available > Time Spent: 50m > Remaining Estimate: 0h > > If the Cleaner didn't remove any files, don't mark the compaction queue entry > as "succeeded" but instead leave it in "ready for cleaning" state for later > cleaning. If it removed at least one file, then mark the compaction queue entry as > "succeeded". This is a partial fix, HIVE-24291 is the complete fix. -- This message was sent by Atlassian Jira (v8.3.4#803005)
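The rule in the description, leaving the entry in "ready for cleaning" when nothing was removed and marking it "succeeded" otherwise, reduces to a small state decision. Names below are illustrative, not the actual Cleaner API.

```java
import java.util.Collections;
import java.util.List;

public class CleanerMarking {
    enum State { READY_FOR_CLEANING, SUCCEEDED }

    // Mark "succeeded" only when the clean pass actually removed files;
    // otherwise keep the queue entry around for a later cleaning attempt.
    static State afterClean(List<String> removedFiles) {
        return removedFiles.isEmpty() ? State.READY_FOR_CLEANING : State.SUCCEEDED;
    }

    public static void main(String[] args) {
        System.out.println(afterClean(Collections.emptyList()));
        System.out.println(afterClean(Collections.singletonList("delta_0000005_0000010")));
    }
}
```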
[jira] [Resolved] (HIVE-24314) compactor.Cleaner should not set state "mark cleaned" if it didn't remove any files
[ https://issues.apache.org/jira/browse/HIVE-24314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karen Coppage resolved HIVE-24314. -- Fix Version/s: 4.0.0 Resolution: Fixed > compactor.Cleaner should not set state "mark cleaned" if it didn't remove any > files > --- > > Key: HIVE-24314 > URL: https://issues.apache.org/jira/browse/HIVE-24314 > Project: Hive > Issue Type: Bug >Reporter: Karen Coppage >Assignee: Karen Coppage >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 50m > Remaining Estimate: 0h -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24245) Vectorized PTF with count and distinct over partition producing incorrect results.
[ https://issues.apache.org/jira/browse/HIVE-24245?focusedWorklogId=516529&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-516529 ] ASF GitHub Bot logged work on HIVE-24245: - Author: ASF GitHub Bot Created on: 25/Nov/20 08:27 Start Date: 25/Nov/20 08:27 Worklog Time Spent: 10m Work Description: abstractdog commented on a change in pull request #1649: URL: https://github.com/apache/hive/pull/1649#discussion_r530185546 ## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/vector/ptf/VectorPTFEvaluatorCountDistinct.java ## @@ -0,0 +1,112 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.hadoop.hive.ql.exec.vector.ptf; + +import java.util.HashSet; +import java.util.Set; + +import org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector; +import org.apache.hadoop.hive.ql.exec.vector.ColumnVector; +import org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch; +import org.apache.hadoop.hive.ql.exec.vector.expressions.VectorExpression; +import org.apache.hadoop.hive.ql.metadata.HiveException; +import org.apache.hadoop.hive.ql.plan.ptf.WindowFrameDef; +import org.apache.hive.common.util.Murmur3; + +import com.google.common.base.Preconditions; + +/** + * This class evaluates count(column) for a PTF group where a distinct keyword is applied to the + * partitioning column itself, e.g.: + * + * SELECT + * txt1, + * txt2, + * count(distinct txt1) over(partition by txt1) as n, + * count(distinct txt2) over(partition by txt2) as m + * FROM example; + * + * In this case, the framework is still supposed to ensure sorting + * on the key (let's say txt1 for the first Reducer stage), but the original + * VectorPTFEvaluatorCount is not aware that a distinct keyword was applied + * to the key column. This case would be simple, because such function should + * return 1 every time. However, that's just a corner-case, a real scenario is + * when the partitioning column is not the same. 
In such cases, a real count + * distinct implementation is needed: + * + * SELECT + * txt1, + * txt2, + * count(distinct txt2) over(partition by txt1) as n, + * count(distinct txt1) over(partition by txt2) as m + * FROM example; + */ +public abstract class VectorPTFEvaluatorCountDistinct extends VectorPTFEvaluatorCount { + + protected Set uniqueObjects; + + public VectorPTFEvaluatorCountDistinct(WindowFrameDef windowFrameDef, + VectorExpression inputVecExpr, int outputColumnNum) { +super(windowFrameDef, inputVecExpr, outputColumnNum); +resetEvaluator(); + } + + @Override + public void evaluateGroupBatch(VectorizedRowBatch batch) throws HiveException { + +evaluateInputExpr(batch); + +// We do not filter when PTF is in reducer. +Preconditions.checkState(!batch.selectedInUse); + +final int size = batch.size; +if (size == 0) { + return; +} +ColumnVector colVector = batch.cols[inputColumnNum]; +if (colVector.isRepeating) { + if (colVector.noNulls || !colVector.isNull[0]) { +countValue(colVector, 0); + } +} else { + boolean[] batchIsNull = colVector.isNull; + for (int i = 0; i < size; i++) { +if (!batchIsNull[i]) { + countValue(colVector, i); +} + } +} + } + + protected void countValue(ColumnVector colVector, int i) { +Object value = getValue(colVector, i); +if (!uniqueObjects.contains(value)) { + uniqueObjects.add(value); Review comment: btw, I copied this wrong approach from GenericUDAFCount, I'm fixing it there also This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 516529) Time Spent: 1h 10m (was: 1h) > Vectorized PTF with count and distinct over partition producing incorrect > results. > -
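As the later review comments on this pull request note, the `contains()` pre-check in `countValue` is redundant: `java.util.Set.add` already reports whether the element was newly inserted, so the distinct-counting logic can rely on its return value alone.

```java
import java.util.HashSet;
import java.util.Set;

public class SetAddDemo {
    public static void main(String[] args) {
        Set<Object> uniqueObjects = new HashSet<>();
        // add() returns true only the first time a value is seen,
        // so no separate contains() lookup is needed.
        System.out.println(uniqueObjects.add("7164335675012038"));
        System.out.println(uniqueObjects.add("7164335675012038"));
        System.out.println(uniqueObjects.size());
    }
}
```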
[jira] [Work logged] (HIVE-24245) Vectorized PTF with count and distinct over partition producing incorrect results.
[ https://issues.apache.org/jira/browse/HIVE-24245?focusedWorklogId=516526&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-516526 ] ASF GitHub Bot logged work on HIVE-24245: - Author: ASF GitHub Bot Created on: 25/Nov/20 08:21 Start Date: 25/Nov/20 08:21 Worklog Time Spent: 10m Work Description: abstractdog commented on a change in pull request #1649: URL: https://github.com/apache/hive/pull/1649#discussion_r530182438 ## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/vector/ptf/VectorPTFEvaluatorBytesCountDistinct.java ## @@ -0,0 +1,42 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.hadoop.hive.ql.exec.vector.ptf; + +import org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector; +import org.apache.hadoop.hive.ql.exec.vector.ColumnVector; +import org.apache.hadoop.hive.ql.exec.vector.expressions.VectorExpression; +import org.apache.hadoop.hive.ql.plan.ptf.WindowFrameDef; +import org.apache.hive.common.util.Murmur3; + +/** + * Bytes (String) implementation for VectorPTFEvaluatorCountDistinct. 
+ */ +public class VectorPTFEvaluatorBytesCountDistinct extends VectorPTFEvaluatorCountDistinct { + + public VectorPTFEvaluatorBytesCountDistinct(WindowFrameDef windowFrameDef, + VectorExpression inputVecExpr, int outputColumnNum) { +super(windowFrameDef, inputVecExpr, outputColumnNum); +resetEvaluator(); + } + + protected Object getValue(ColumnVector colVector, int i) { +BytesColumnVector inV = (BytesColumnVector) colVector; +return Murmur3.hash32(inV.vector[i], inV.start[i], inV.length[i], Murmur3.DEFAULT_SEED); Review comment: I used hashing for memory considerations, I was hoping that this way the unique set will consume less memory (storing hashes instead of strings of arbitrary length)...now I think that we could be fine with storing strings and prevent additional hashing, as we tend to optimize CPU cycles instead of memory in the very-first round...I'll change this to new String(byte[]) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 516526) Time Spent: 1h (was: 50m) > Vectorized PTF with count and distinct over partition producing incorrect > results. > -- > > Key: HIVE-24245 > URL: https://issues.apache.org/jira/browse/HIVE-24245 > Project: Hive > Issue Type: Bug > Components: Hive, PTF-Windowing, Vectorization >Affects Versions: 3.1.0, 3.1.2 >Reporter: Chiran Ravani >Assignee: László Bodor >Priority: Critical > Labels: pull-request-available > Time Spent: 1h > Remaining Estimate: 0h > > Vectorized PTF for count and distinct over partition is broken. It produces > incorrect results. > Below is the test case. 
> {code} > CREATE TABLE bigd781b_new ( > id int, > txt1 string, > txt2 string, > cda_date int, > cda_job_name varchar(12)); > INSERT INTO bigd781b_new VALUES > (1,'2010005759','7164335675012038',20200528,'load1'), > (2,'2010005759','7164335675012038',20200528,'load2'); > {code} > Running the query below produces incorrect results > {code} > SELECT > txt1, > txt2, > count(distinct txt1) over(partition by txt1) as n, > count(distinct txt2) over(partition by txt2) as m > FROM bigd781b_new > {code} > as below. > {code} > +-------------+-------------------+---+---+ > | txt1 | txt2 | n | m | > +-------------+-------------------+---+---+ > | 2010005759 | 7164335675012038 | 2 | 2 | > | 2010005759 | 7164335675012038 | 2 | 2 | > +-------------+-------------------+---+---+ > {code} > While the correct output would be > {code} > +
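The expected semantics from the report can be checked in plain Java: both sample rows share a single txt1 value (and a single txt2 value), so count(distinct ...) over each partition should be 1 per row, not the 2 shown in the broken output above.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class CountDistinctOverPartition {
    public static void main(String[] args) {
        // The two rows from the bug report: (txt1, txt2)
        List<String[]> rows = Arrays.asList(
                new String[] {"2010005759", "7164335675012038"},
                new String[] {"2010005759", "7164335675012038"});

        // count(distinct txt1) over (partition by txt1):
        // collect distinct txt1 values per txt1 partition, then emit per row.
        Map<String, Set<String>> distinctPerPartition = new HashMap<>();
        for (String[] r : rows) {
            distinctPerPartition.computeIfAbsent(r[0], k -> new HashSet<>()).add(r[0]);
        }
        for (String[] r : rows) {
            System.out.println(distinctPerPartition.get(r[0]).size());
        }
    }
}
```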
[jira] [Work logged] (HIVE-24245) Vectorized PTF with count and distinct over partition producing incorrect results.
[ https://issues.apache.org/jira/browse/HIVE-24245?focusedWorklogId=516522&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-516522 ] ASF GitHub Bot logged work on HIVE-24245: - Author: ASF GitHub Bot Created on: 25/Nov/20 08:17 Start Date: 25/Nov/20 08:17 Worklog Time Spent: 10m Work Description: abstractdog commented on a change in pull request #1649: URL: https://github.com/apache/hive/pull/1649#discussion_r530179906 ## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/vector/ptf/VectorPTFEvaluatorCountDistinct.java ## @@ -0,0 +1,112 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.hadoop.hive.ql.exec.vector.ptf; + +import java.util.HashSet; +import java.util.Set; + +import org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector; +import org.apache.hadoop.hive.ql.exec.vector.ColumnVector; +import org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch; +import org.apache.hadoop.hive.ql.exec.vector.expressions.VectorExpression; +import org.apache.hadoop.hive.ql.metadata.HiveException; +import org.apache.hadoop.hive.ql.plan.ptf.WindowFrameDef; +import org.apache.hive.common.util.Murmur3; + +import com.google.common.base.Preconditions; + +/** + * This class evaluates count(column) for a PTF group where a distinct keyword is applied to the + * partitioning column itself, e.g.: + * + * SELECT + * txt1, + * txt2, + * count(distinct txt1) over(partition by txt1) as n, + * count(distinct txt2) over(partition by txt2) as m + * FROM example; + * + * In this case, the framework is still supposed to ensure sorting + * on the key (let's say txt1 for the first Reducer stage), but the original + * VectorPTFEvaluatorCount is not aware that a distinct keyword was applied + * to the key column. This case would be simple, because such function should + * return 1 every time. However, that's just a corner-case, a real scenario is + * when the partitioning column is not the same. 
In such cases, a real count + * distinct implementation is needed: + * + * SELECT + * txt1, + * txt2, + * count(distinct txt2) over(partition by txt1) as n, + * count(distinct txt1) over(partition by txt2) as m + * FROM example; + */ +public abstract class VectorPTFEvaluatorCountDistinct extends VectorPTFEvaluatorCount { + + protected Set uniqueObjects; + + public VectorPTFEvaluatorCountDistinct(WindowFrameDef windowFrameDef, + VectorExpression inputVecExpr, int outputColumnNum) { +super(windowFrameDef, inputVecExpr, outputColumnNum); +resetEvaluator(); + } + + @Override + public void evaluateGroupBatch(VectorizedRowBatch batch) throws HiveException { + +evaluateInputExpr(batch); + +// We do not filter when PTF is in reducer. +Preconditions.checkState(!batch.selectedInUse); + +final int size = batch.size; +if (size == 0) { + return; +} +ColumnVector colVector = batch.cols[inputColumnNum]; +if (colVector.isRepeating) { + if (colVector.noNulls || !colVector.isNull[0]) { +countValue(colVector, 0); + } +} else { + boolean[] batchIsNull = colVector.isNull; + for (int i = 0; i < size; i++) { +if (!batchIsNull[i]) { + countValue(colVector, i); Review comment: right! updating it This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 516522) Time Spent: 50m (was: 40m) > Vectorized PTF with count and distinct over partition producing incorrect > results. > -- > > Key: HIVE-24245 > URL: https://issues.apache.org/jira/browse/HIVE-24245 > Project: Hive > Issue Type: Bug > Components: Hive, PTF-Windowing, Vectorization >
[jira] [Work logged] (HIVE-24245) Vectorized PTF with count and distinct over partition producing incorrect results.
[ https://issues.apache.org/jira/browse/HIVE-24245?focusedWorklogId=516520&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-516520 ] ASF GitHub Bot logged work on HIVE-24245: - Author: ASF GitHub Bot Created on: 25/Nov/20 08:14 Start Date: 25/Nov/20 08:14 Worklog Time Spent: 10m Work Description: abstractdog commented on a change in pull request #1649: URL: https://github.com/apache/hive/pull/1649#discussion_r530178604 ## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/vector/ptf/VectorPTFEvaluatorCountDistinct.java ## @@ -0,0 +1,112 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.hadoop.hive.ql.exec.vector.ptf; + +import java.util.HashSet; +import java.util.Set; + +import org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector; +import org.apache.hadoop.hive.ql.exec.vector.ColumnVector; +import org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch; +import org.apache.hadoop.hive.ql.exec.vector.expressions.VectorExpression; +import org.apache.hadoop.hive.ql.metadata.HiveException; +import org.apache.hadoop.hive.ql.plan.ptf.WindowFrameDef; +import org.apache.hive.common.util.Murmur3; + +import com.google.common.base.Preconditions; + +/** + * This class evaluates count(column) for a PTF group where a distinct keyword is applied to the + * partitioning column itself, e.g.: + * + * SELECT + * txt1, + * txt2, + * count(distinct txt1) over(partition by txt1) as n, + * count(distinct txt2) over(partition by txt2) as m + * FROM example; + * + * In this case, the framework is still supposed to ensure sorting + * on the key (let's say txt1 for the first Reducer stage), but the original + * VectorPTFEvaluatorCount is not aware that a distinct keyword was applied + * to the key column. This case would be simple, because such function should + * return 1 every time. However, that's just a corner-case, a real scenario is + * when the partitioning column is not the same. 
In such cases, a real count + * distinct implementation is needed: + * + * SELECT + * txt1, + * txt2, + * count(distinct txt2) over(partition by txt1) as n, + * count(distinct txt1) over(partition by txt2) as m + * FROM example; + */ +public abstract class VectorPTFEvaluatorCountDistinct extends VectorPTFEvaluatorCount { + + protected Set uniqueObjects; + + public VectorPTFEvaluatorCountDistinct(WindowFrameDef windowFrameDef, + VectorExpression inputVecExpr, int outputColumnNum) { +super(windowFrameDef, inputVecExpr, outputColumnNum); +resetEvaluator(); + } + + @Override + public void evaluateGroupBatch(VectorizedRowBatch batch) throws HiveException { + +evaluateInputExpr(batch); + +// We do not filter when PTF is in reducer. +Preconditions.checkState(!batch.selectedInUse); + +final int size = batch.size; +if (size == 0) { + return; +} +ColumnVector colVector = batch.cols[inputColumnNum]; +if (colVector.isRepeating) { + if (colVector.noNulls || !colVector.isNull[0]) { +countValue(colVector, 0); + } +} else { + boolean[] batchIsNull = colVector.isNull; + for (int i = 0; i < size; i++) { +if (!batchIsNull[i]) { + countValue(colVector, i); +} + } +} + } + + protected void countValue(ColumnVector colVector, int i) { +Object value = getValue(colVector, i); +if (!uniqueObjects.contains(value)) { + uniqueObjects.add(value); Review comment: yeah, thanks, forgot that Set takes care of uniqueness :) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 516520) Time Spent: 40m (was: 0.5h) > Vectorized PTF with count and distinct over partition producing incorrect > results. > ---