[jira] [Updated] (HIVE-24434) Filter out materialized views for rewriting if plan pattern is not allowed
[ https://issues.apache.org/jira/browse/HIVE-24434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Krisztian Kasa updated HIVE-24434:
----------------------------------
Component/s: Materialized views

> Filter out materialized views for rewriting if plan pattern is not allowed
> --------------------------------------------------------------------------
>
> Key: HIVE-24434
> URL: https://issues.apache.org/jira/browse/HIVE-24434
> Project: Hive
> Issue Type: Improvement
> Components: Materialized views
> Affects Versions: 4.0.0
> Reporter: Krisztian Kasa
> Assignee: Krisztian Kasa
> Priority: Major
>
> Some materialized views are not enabled for Calcite-based rewriting. The rules
> for validating materialized views are implemented by HIVE-20748.
> Since text-based materialized view query rewrite doesn't have such
> limitations, some logic must be implemented to flag each materialized view as
> enabled for text-based rewrite only or for both.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
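As a sketch of what such flagging could look like (all names here are hypothetical, not Hive's actual API), each materialized view could carry a rewrite-mode flag that the Calcite-based rewriter filters on:

```java
import java.util.List;
import java.util.stream.Collectors;

public class MvRewriteFilterSketch {
    // Hypothetical flag: TEXT_ONLY when the plan pattern fails the HIVE-20748
    // validation rules, ALL when Calcite-based rewriting is also allowed.
    enum RewriteMode { TEXT_ONLY, ALL }

    static class MaterializedView {
        final String name;
        final RewriteMode mode;
        MaterializedView(String name, RewriteMode mode) { this.name = name; this.mode = mode; }
    }

    // Only MVs flagged ALL are handed to the Calcite-based rewriter; the
    // text-based rewrite can still consider every MV.
    static List<MaterializedView> calciteCandidates(List<MaterializedView> mvs) {
        return mvs.stream()
                  .filter(mv -> mv.mode == RewriteMode.ALL)
                  .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<MaterializedView> mvs = List.of(
            new MaterializedView("mv_agg", RewriteMode.ALL),
            new MaterializedView("mv_window", RewriteMode.TEXT_ONLY));
        System.out.println(calciteCandidates(mvs).size()); // prints 1
    }
}
```

The point of the flag is that filtering happens up front, before plan matching, rather than failing later inside the rewrite rules.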
[jira] [Assigned] (HIVE-24434) Filter out materialized views for rewriting if plan pattern is not allowed
[ https://issues.apache.org/jira/browse/HIVE-24434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Krisztian Kasa reassigned HIVE-24434:
-------------------------------------

> Filter out materialized views for rewriting if plan pattern is not allowed
> --------------------------------------------------------------------------
>
> Key: HIVE-24434
> URL: https://issues.apache.org/jira/browse/HIVE-24434
> Project: Hive
> Issue Type: Improvement
> Affects Versions: 4.0.0
> Reporter: Krisztian Kasa
> Assignee: Krisztian Kasa
> Priority: Major
>
> Some materialized views are not enabled for Calcite-based rewriting. The rules
> for validating materialized views are implemented by HIVE-20748.
> Since text-based materialized view query rewrite doesn't have such
> limitations, some logic must be implemented to flag each materialized view as
> enabled for text-based rewrite only or for both.
[jira] [Work logged] (HIVE-24397) Add the projection specification to the table request object and add placeholders in ObjectStore.java
[ https://issues.apache.org/jira/browse/HIVE-24397?focusedWorklogId=516926&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-516926 ]

ASF GitHub Bot logged work on HIVE-24397:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 26/Nov/20 06:45
Start Date: 26/Nov/20 06:45
Worklog Time Spent: 10m

Work Description: vnhive commented on a change in pull request #1681:
URL: https://github.com/apache/hive/pull/1681#discussion_r530803306

## File path: standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClient.java
## @@ -2360,6 +2360,20 @@ public Table getTable(String catName, String dbName, String tableName, String va
     return deepCopyTables(FilterUtils.filterTablesIfEnabled(isClientFilterEnabled, filterHook, tabs));
   }

+  @Override
+  public List<Table> getTableObjectsByRequest(GetTablesRequest req) throws TException {

Review comment: You are referring to SessionHiveMetaStoreClient, right? SessionHiveMetaStoreClient does not have an implementation for get_partitions_with_specs and piggybacks on the implementation in its superclass (HiveMetaStoreClient) via the inheritance hierarchy. I just followed the same pattern here. Also, it just returns a list of table objects (basically a read query), and should work the same across sessions, since it just returns persisted information.

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
-------------------
Worklog Id: (was: 516926)
Time Spent: 20m (was: 10m)

> Add the projection specification to the table request object and add
> placeholders in ObjectStore.java
> ---------------------------------------------------------------------
>
> Key: HIVE-24397
> URL: https://issues.apache.org/jira/browse/HIVE-24397
> Project: Hive
> Issue Type: Sub-task
> Components: Hive
> Reporter: Narayanan Venkateswaran
> Assignee: Narayanan Venkateswaran
> Priority: Minor
> Labels: pull-request-available
> Time Spent: 20m
> Remaining Estimate: 0h
>
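The inheritance pattern the review comment describes can be illustrated with a minimal, self-contained sketch (class and method shapes are simplified stand-ins, not Hive's real signatures): the session-scoped client declares no override for the read-only call, so invocations resolve to the base client's implementation.

```java
import java.util.List;

public class ClientInheritanceSketch {
    // Stand-in for HiveMetaStoreClient: implements the read-only request.
    static class BaseClient {
        List<String> getTableObjectsByRequest(String request) {
            return List.of("table_for_" + request); // placeholder lookup
        }
    }

    // Stand-in for SessionHiveMetaStoreClient: no override here, so the call
    // "piggybacks" on the superclass implementation, as the comment describes.
    static class SessionClient extends BaseClient { }

    public static void main(String[] args) {
        System.out.println(new SessionClient().getTableObjectsByRequest("db1"));
    }
}
```

This is why the comment argues the pattern is safe for a read-only query: there is no session-local state to override, so inheriting the superclass behavior gives identical results in every session.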
[jira] [Work logged] (HIVE-24144) getIdentifierQuoteString in HiveDatabaseMetaData returns incorrect value
[ https://issues.apache.org/jira/browse/HIVE-24144?focusedWorklogId=516902&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-516902 ] ASF GitHub Bot logged work on HIVE-24144: - Author: ASF GitHub Bot Created on: 26/Nov/20 02:40 Start Date: 26/Nov/20 02:40 Worklog Time Spent: 10m Work Description: jcamachor opened a new pull request #1487: URL: https://github.com/apache/hive/pull/1487 …incorrect value ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 516902) Time Spent: 50m (was: 40m) > getIdentifierQuoteString in HiveDatabaseMetaData returns incorrect value > > > Key: HIVE-24144 > URL: https://issues.apache.org/jira/browse/HIVE-24144 > Project: Hive > Issue Type: Bug > Components: JDBC, JDBC storage handler >Reporter: Jesus Camacho Rodriguez >Assignee: Jesus Camacho Rodriguez >Priority: Major > Labels: pull-request-available > Time Spent: 50m > Remaining Estimate: 0h > > {code} > public String getIdentifierQuoteString() throws SQLException { > return " "; > } > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
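For context, the JDBC contract for this method expects the actual quoting string, with a single space reserved for the case where identifier quoting is not supported; since Hive quotes identifiers with backticks, a plausible fix looks like the following sketch (an assumption about the eventual patch, not the committed code):

```java
public class QuoteStringSketch {
    // java.sql.DatabaseMetaData#getIdentifierQuoteString should return the
    // quoting string the engine actually uses, and " " only when identifier
    // quoting is unsupported. HiveServer2 quotes identifiers with backticks
    // (e.g. `my column`), so returning "`" is the sketched fix here.
    // (The real interface method also declares "throws SQLException";
    // omitted to keep this sketch self-contained.)
    public String getIdentifierQuoteString() {
        return "`";
    }

    public static void main(String[] args) {
        System.out.println(new QuoteStringSketch().getIdentifierQuoteString()); // prints `
    }
}
```

Returning " " matters in practice because tools such as the JDBC storage handler use this value to build quoted identifiers, and a space produces malformed SQL.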
[jira] [Updated] (HIVE-24433) AutoCompaction is not getting triggered for CamelCase Partition Values
[ https://issues.apache.org/jira/browse/HIVE-24433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Naresh P R updated HIVE-24433:
------------------------------
Description:

PartitionKeyValue is getting converted to lowercase in the 2 places below:
[https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java#L2728]
[https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java#L2851]

Because of this, the TXN_COMPONENTS & HIVE_LOCKS tables do not hold entries with the proper partition values. When the query completes, the entry moves from TXN_COMPONENTS to COMPLETED_TXN_COMPONENTS. Hive AutoCompaction will not recognize the partition & considers it an invalid partition.

{code:java}
create table abc(name string) partitioned by(city string) stored as orc tblproperties('transactional'='true');
insert into abc partition(city='Bangalore') values('aaa');
{code}

Example entry in COMPLETED_TXN_COMPONENTS:
{noformat}
+------------+---------------+------------+-----------------+----------------------+--------------+--------------------+
| CTC_TXNID  | CTC_DATABASE  | CTC_TABLE  | CTC_PARTITION   | CTC_TIMESTAMP        | CTC_WRITEID  | CTC_UPDATE_DELETE  |
+------------+---------------+------------+-----------------+----------------------+--------------+--------------------+
| 2          | default       | abc        | city=bangalore  | 2020-11-25 09:26:59  | 1            | N                  |
+------------+---------------+------------+-----------------+----------------------+--------------+--------------------+
{noformat}

AutoCompaction fails to get triggered, with the error below:
{code:java}
2020-11-25T09:35:10,364 INFO [Thread-9]: compactor.Initiator (Initiator.java:run(98)) - Checking to see if we should compact default.abc.city=bangalore
2020-11-25T09:35:10,380 INFO [Thread-9]: compactor.Initiator (Initiator.java:run(155)) - Can't find partition default.compaction_test.city=bhubaneshwar, assuming it has been dropped and moving on
{code}

I verified the 4 SQL statements below with my PR; all of them produced the correct PartitionKeyValue, i.e. COMPLETED_TXN_COMPONENTS.CTC_PARTITION="city=Bangalore":
{code:java}
insert into table abc PARTITION(CitY='Bangalore') values('Dan');
insert overwrite table abc partition(CiTy='Bangalore') select Name from abc;
update table abc set Name='xy' where CiTy='Bangalore';
delete from abc where CiTy='Bangalore';
{code}

> AutoCompaction is not getting triggered for CamelCase Partition Values
> ----------------------------------------------------------------------
>
> Key: HIVE-24433
> URL: https://issues.apache.org/jira/browse/HIVE-24433
> Project: Hive
> Issue Type: Bug
> Reporter: Naresh P R
> Assignee: Naresh P R
> Priority: Major
> Labels: pull-request-available
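The root cause can be reproduced with a small self-contained sketch (helper names are hypothetical): lowercasing the whole key=value string also lowercases the user-supplied value, so the compactor later looks up a partition name that doesn't exist, whereas normalizing only the key preserves the value.

```java
public class PartitionCaseSketch {
    // Buggy behavior: the whole "key=value" string is lowercased,
    // as in the two TxnHandler call sites linked above.
    static String lowercaseAll(String partName) {
        return partName.toLowerCase();
    }

    // Intended behavior: only the partition key is case-normalized;
    // the user-supplied partition value keeps its original case.
    static String lowercaseKeyOnly(String partName) {
        int eq = partName.indexOf('=');
        return partName.substring(0, eq).toLowerCase() + partName.substring(eq);
    }

    public static void main(String[] args) {
        System.out.println(lowercaseAll("CitY=Bangalore"));     // city=bangalore -> compactor misses the partition
        System.out.println(lowercaseKeyOnly("CitY=Bangalore")); // city=Bangalore -> matches the real partition
    }
}
```

Partition keys are case-insensitive column names in Hive, but partition values are data and are case-sensitive, which is why only the key side can safely be normalized.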
[jira] [Updated] (HIVE-24433) AutoCompaction is not getting triggered for CamelCase Partition Values
[ https://issues.apache.org/jira/browse/HIVE-24433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HIVE-24433:
----------------------------------
Labels: pull-request-available (was: )

> AutoCompaction is not getting triggered for CamelCase Partition Values
> ----------------------------------------------------------------------
>
> Key: HIVE-24433
> URL: https://issues.apache.org/jira/browse/HIVE-24433
> Project: Hive
> Issue Type: Bug
> Reporter: Naresh P R
> Assignee: Naresh P R
> Priority: Major
> Labels: pull-request-available
> Time Spent: 10m
> Remaining Estimate: 0h
>
[jira] [Work logged] (HIVE-24433) AutoCompaction is not getting triggered for CamelCase Partition Values
[ https://issues.apache.org/jira/browse/HIVE-24433?focusedWorklogId=516894&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-516894 ]

ASF GitHub Bot logged work on HIVE-24433:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 26/Nov/20 02:03
Start Date: 26/Nov/20 02:03
Worklog Time Spent: 10m

Work Description: nareshpr opened a new pull request #1712:
URL: https://github.com/apache/hive/pull/1712

Issue Time Tracking
-------------------
Worklog Id: (was: 516894)
Remaining Estimate: 0h
Time Spent: 10m

> AutoCompaction is not getting triggered for CamelCase Partition Values
> ----------------------------------------------------------------------
>
> Key: HIVE-24433
> URL: https://issues.apache.org/jira/browse/HIVE-24433
> Project: Hive
> Issue Type: Bug
> Reporter: Naresh P R
> Assignee: Naresh P R
> Priority: Major
> Time Spent: 10m
> Remaining Estimate: 0h
>
[jira] [Assigned] (HIVE-24433) AutoCompaction is not getting triggered for CamelCase Partition Values
[ https://issues.apache.org/jira/browse/HIVE-24433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Naresh P R reassigned HIVE-24433:
---------------------------------

> AutoCompaction is not getting triggered for CamelCase Partition Values
> ----------------------------------------------------------------------
>
> Key: HIVE-24433
> URL: https://issues.apache.org/jira/browse/HIVE-24433
> Project: Hive
> Issue Type: Bug
> Reporter: Naresh P R
> Assignee: Naresh P R
> Priority: Major
>
[jira] [Work logged] (HIVE-24144) getIdentifierQuoteString in HiveDatabaseMetaData returns incorrect value
[ https://issues.apache.org/jira/browse/HIVE-24144?focusedWorklogId=516872&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-516872 ]

ASF GitHub Bot logged work on HIVE-24144:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 26/Nov/20 00:42
Start Date: 26/Nov/20 00:42
Worklog Time Spent: 10m

Work Description: github-actions[bot] closed pull request #1487:
URL: https://github.com/apache/hive/pull/1487

Issue Time Tracking
-------------------
Worklog Id: (was: 516872)
Time Spent: 40m (was: 0.5h)

> getIdentifierQuoteString in HiveDatabaseMetaData returns incorrect value
> ------------------------------------------------------------------------
>
> Key: HIVE-24144
> URL: https://issues.apache.org/jira/browse/HIVE-24144
> Project: Hive
> Issue Type: Bug
> Components: JDBC, JDBC storage handler
> Reporter: Jesus Camacho Rodriguez
> Assignee: Jesus Camacho Rodriguez
> Priority: Major
> Labels: pull-request-available
> Time Spent: 40m
> Remaining Estimate: 0h
>
[jira] [Work logged] (HIVE-24073) Execution exception in sort-merge semijoin
[ https://issues.apache.org/jira/browse/HIVE-24073?focusedWorklogId=516871&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-516871 ]

ASF GitHub Bot logged work on HIVE-24073:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 26/Nov/20 00:42
Start Date: 26/Nov/20 00:42
Worklog Time Spent: 10m

Work Description: github-actions[bot] commented on pull request #1476:
URL: https://github.com/apache/hive/pull/1476#issuecomment-734010074

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Feel free to reach out on the d...@hive.apache.org list if the patch is in need of reviews.

Issue Time Tracking
-------------------
Worklog Id: (was: 516871)
Time Spent: 20m (was: 10m)

> Execution exception in sort-merge semijoin
> ------------------------------------------
>
> Key: HIVE-24073
> URL: https://issues.apache.org/jira/browse/HIVE-24073
> Project: Hive
> Issue Type: Bug
> Components: Operators
> Reporter: Jesus Camacho Rodriguez
> Assignee: mahesh kumar behera
> Priority: Major
> Labels: pull-request-available
> Time Spent: 20m
> Remaining Estimate: 0h
>
> Working on HIVE-24041, we trigger an additional SJ conversion that leads to
> this exception at execution time:
> {code}
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.hive.ql.metadata.HiveException: Attempting to overwrite nextKeyWritables[1]
>         at org.apache.hadoop.hive.ql.exec.GroupByOperator.closeOp(GroupByOperator.java:1063)
>         at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:685)
>         at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:707)
>         at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:707)
>         at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:707)
>         at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:707)
>         at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.close(MapRecordProcessor.java:462)
>         ... 16 more
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.hive.ql.metadata.HiveException: Attempting to overwrite nextKeyWritables[1]
>         at org.apache.hadoop.hive.ql.exec.GroupByOperator.flush(GroupByOperator.java:1037)
>         at org.apache.hadoop.hive.ql.exec.GroupByOperator.closeOp(GroupByOperator.java:1060)
>         ... 22 more
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Attempting to overwrite nextKeyWritables[1]
>         at org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.processKey(CommonMergeJoinOperator.java:564)
>         at org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.process(CommonMergeJoinOperator.java:243)
>         at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:887)
>         at org.apache.hadoop.hive.ql.exec.TezDummyStoreOperator.process(TezDummyStoreOperator.java:49)
>         at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:887)
>         at org.apache.hadoop.hive.ql.exec.GroupByOperator.forward(GroupByOperator.java:1003)
>         at org.apache.hadoop.hive.ql.exec.GroupByOperator.flush(GroupByOperator.java:1020)
>         ... 23 more
> {code}
> To reproduce, just set {{hive.auto.convert.sortmerge.join}} to {{true}} in
> the last query in {{auto_sortmerge_join_10.q}} after HIVE-24041 has been
> merged.
[jira] [Work logged] (HIVE-24324) Remove deprecated API usage from Avro
[ https://issues.apache.org/jira/browse/HIVE-24324?focusedWorklogId=516835&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-516835 ] ASF GitHub Bot logged work on HIVE-24324: - Author: ASF GitHub Bot Created on: 25/Nov/20 21:19 Start Date: 25/Nov/20 21:19 Worklog Time Spent: 10m Work Description: sunchao opened a new pull request #1711: URL: https://github.com/apache/hive/pull/1711 ### What changes were proposed in this pull request? This backports #1621 to branch-3.1. This mainly replaces `JsonProperties.getJsonProp` with `JsonProperties.getObjectProp`. Note that there's one place in `SchemaToTypeInfo` where we explicitly call `getIntValue` to forbid strings as precision/scale values (see [HIVE-7174](https://issues.apache.org/jira/browse/HIVE-7174)). To retain the old behavior, we check whether the returned object is of integer type and, if not, return a default of 0, following the `JsonNode` implementation. ### Why are the changes needed? `JsonProperties#getJsonProp` has been marked as deprecated in Avro 1.8 and removed since Avro 1.9. This replaces its usage with `getObjectProp`, which doesn't leak Jackson's Json nodes. This will help downstream apps depend on Hive while using a higher version of Avro, and also help Hive upgrade its own Avro version. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Existing tests. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 516835) Time Spent: 1h 10m (was: 1h) > Remove deprecated API usage from Avro > - > > Key: HIVE-24324 > URL: https://issues.apache.org/jira/browse/HIVE-24324 > Project: Hive > Issue Type: Improvement > Components: Avro >Reporter: Chao Sun >Assignee: Chao Sun >Priority: Major > Labels: pull-request-available > Fix For: 2.3.8, 4.0.0 > > Time Spent: 1h 10m > Remaining Estimate: 0h > > {{JsonProperties#getJsonProp}} has been marked as deprecated in Avro 1.8 and > removed since Avro 1.9. This replaces its usage with > {{getObjectProp}}, which doesn't leak Jackson's Json nodes. This will help > downstream apps depend on Hive while using a higher version of Avro, and > also help Hive upgrade its own Avro version. -- This message was sent by Atlassian Jira (v8.3.4#803005)
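The precision/scale coercion the PR above describes can be sketched in isolation. This is a hypothetical helper, not Hive's actual `SchemaToTypeInfo` code: Avro's `getObjectProp` returns a plain `Object`, so the old `JsonNode.getIntValue` defaulting (0 for non-numeric nodes) has to be reproduced by the caller.

```java
// Hypothetical helper illustrating the coercion described above. With the
// deprecated getJsonProp API, JsonNode.getIntValue() returned 0 for
// non-integer nodes; with getObjectProp we receive a plain Object and must
// reproduce that defaulting ourselves.
class AvroPropCoercion {
    /** Returns the property value as an int, or 0 if it is not an integer. */
    static int asIntOrZero(Object prop) {
        if (prop instanceof Integer) {
            return (Integer) prop;
        }
        // Mirror the old JsonNode behavior: strings and other types map to 0.
        return 0;
    }

    public static void main(String[] args) {
        System.out.println(asIntOrZero(10));   // integer precision is accepted
        System.out.println(asIntOrZero("10")); // string precision is rejected
    }
}
```

The effect is that a schema declaring `"precision": "10"` as a string still yields 0, matching the pre-1.9 behavior the backport wants to retain.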
[jira] [Updated] (HIVE-24414) Backport HIVE-19662 to branch-3.1
[ https://issues.apache.org/jira/browse/HIVE-24414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun updated HIVE-24414: Fix Version/s: 3.1.3 > Backport HIVE-19662 to branch-3.1 > - > > Key: HIVE-24414 > URL: https://issues.apache.org/jira/browse/HIVE-24414 > Project: Hive > Issue Type: Improvement >Reporter: Chao Sun >Assignee: Chao Sun >Priority: Major > Labels: pull-request-available > Fix For: 3.1.3 > > Time Spent: 50m > Remaining Estimate: 0h > > This JIRA proposes to backport HIVE-19662 to branch-3.1 and upgrade Avro to > 1.8.2. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24414) Backport HIVE-19662 to branch-3.1
[ https://issues.apache.org/jira/browse/HIVE-24414?focusedWorklogId=516832&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-516832 ] ASF GitHub Bot logged work on HIVE-24414: - Author: ASF GitHub Bot Created on: 25/Nov/20 21:15 Start Date: 25/Nov/20 21:15 Worklog Time Spent: 10m Work Description: sunchao merged pull request #1698: URL: https://github.com/apache/hive/pull/1698 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 516832) Time Spent: 40m (was: 0.5h) > Backport HIVE-19662 to branch-3.1 > - > > Key: HIVE-24414 > URL: https://issues.apache.org/jira/browse/HIVE-24414 > Project: Hive > Issue Type: Improvement >Reporter: Chao Sun >Assignee: Chao Sun >Priority: Major > Labels: pull-request-available > Time Spent: 40m > Remaining Estimate: 0h > > This JIRA proposes to backport HIVE-19662 to branch-3.1 and upgrade Avro to > 1.8.2. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HIVE-24414) Backport HIVE-19662 to branch-3.1
[ https://issues.apache.org/jira/browse/HIVE-24414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun resolved HIVE-24414. - Hadoop Flags: Reviewed Resolution: Fixed > Backport HIVE-19662 to branch-3.1 > - > > Key: HIVE-24414 > URL: https://issues.apache.org/jira/browse/HIVE-24414 > Project: Hive > Issue Type: Improvement >Reporter: Chao Sun >Assignee: Chao Sun >Priority: Major > Labels: pull-request-available > Time Spent: 50m > Remaining Estimate: 0h > > This JIRA proposes to backport HIVE-19662 to branch-3.1 and upgrade Avro to > 1.8.2. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24414) Backport HIVE-19662 to branch-3.1
[ https://issues.apache.org/jira/browse/HIVE-24414?focusedWorklogId=516833&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-516833 ] ASF GitHub Bot logged work on HIVE-24414: - Author: ASF GitHub Bot Created on: 25/Nov/20 21:15 Start Date: 25/Nov/20 21:15 Worklog Time Spent: 10m Work Description: sunchao commented on pull request #1698: URL: https://github.com/apache/hive/pull/1698#issuecomment-733948911 Thanks @aihuaxu ! merged to branch-3.1 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 516833) Time Spent: 50m (was: 40m) > Backport HIVE-19662 to branch-3.1 > - > > Key: HIVE-24414 > URL: https://issues.apache.org/jira/browse/HIVE-24414 > Project: Hive > Issue Type: Improvement >Reporter: Chao Sun >Assignee: Chao Sun >Priority: Major > Labels: pull-request-available > Time Spent: 50m > Remaining Estimate: 0h > > This JIRA proposes to backport HIVE-19662 to branch-3.1 and upgrade Avro to > 1.8.2. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-24431) Null Pointer exception while sending data to jdbc
[ https://issues.apache.org/jira/browse/HIVE-24431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fabien Carrion updated HIVE-24431: -- Description: I was receiving some null pointer while writing in db: {quote}{{ERROR : FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, vertexName=Reducer 3, vertexId=vertex_1604850281565_5081_1_02, diagnostics=[Task failed, taskId=task_1604850281565_5081_1_02_01, diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( failure ) : attempt_1604850281565_5081_1_02_01_0:java.lang.RuntimeException: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row}} {{ at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:296)}} {{ at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250)}} {{ at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)}} {{ at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)}} {{ at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)}} {{ at java.security.AccessController.doPrivileged(Native Method)}} {{ at javax.security.auth.Subject.doAs(Subject.java:422)}} {{ at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)}} {{ at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)}} {{ at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)}} {{ at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)}} {{ at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:125)}} {{ at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:69)}} {{ at 
com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:78)}} {{ at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)}} {{ at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)}} {{ at java.lang.Thread.run(Thread.java:745)}} {{Caused by: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row}} {{ at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecord(ReduceRecordSource.java:304)}} {{ at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.run(ReduceRecordProcessor.java:318)}} {{ at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:267)}} {{ ... 16 more}} {{Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row}} {{ at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource$GroupIterator.next(ReduceRecordSource.java:378)}} {{ at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecord(ReduceRecordSource.java:294)}} {{ ... 
18 more}} {{Caused by: java.lang.NullPointerException}} {{ at org.apache.hive.storage.jdbc.JdbcSerDe.serialize(JdbcSerDe.java:166)}} {{ at org.apache.hive.storage.jdbc.JdbcSerDe.serialize(JdbcSerDe.java:59)}} {{ at org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:961)}} {{ at org.apache.hadoop.hive.ql.exec.Operator.baseForward(Operator.java:994)}} {{ at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:940)}} {{ at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:927)}} {{ at org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:95)}} {{ at org.apache.hadoop.hive.ql.exec.Operator.baseForward(Operator.java:994)}} {{ at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:940)}} {{ at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:927)}} {{ at org.apache.hadoop.hive.ql.exec.PTFOperator$PTFInvocation.handleOutputRows(PTFOperator.java:337)}} {{ at org.apache.hadoop.hive.ql.exec.PTFOperator$PTFInvocation.processRow(PTFOperator.java:325)}} {{ at org.apache.hadoop.hive.ql.exec.PTFOperator.process(PTFOperator.java:139)}} {{ at org.apache.hadoop.hive.ql.exec.Operator.baseForward(Operator.java:994)}} {{ at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:940)}} {{ at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:927)}} {{ at org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:95)}} {{ at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource$GroupIterator.next(ReduceRecordSource.java:363)}} {{ ... 19 more}} {quote} I just add a check in the patch. was: I was receiving some null pointer while writing in db. I just add a check. > Null Pointer exception while sending data to jdbc > - > > Key: HIVE-24431
[jira] [Updated] (HIVE-24431) Null Pointer exception while sending data to jdbc
[ https://issues.apache.org/jira/browse/HIVE-24431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fabien Carrion updated HIVE-24431: -- Description: I was receiving some null pointer while writing in db: {quote}ERROR : FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, vertexName=Reducer 3, vertexId=vertex_1604850281565_5081_1_02, diagnostics=[Task failed, taskId=task_1604850281565_5081_1_02_01, diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( failure ) : attempt_1604850281565_5081_1_02_01_0:java.lang.RuntimeException: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:296) at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250) at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374) at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73) at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730) at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61) at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37) at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36) at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:125) at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:69) at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:78) at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) Caused by: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecord(ReduceRecordSource.java:304) at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.run(ReduceRecordProcessor.java:318) at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:267) ... 16 more Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource$GroupIterator.next(ReduceRecordSource.java:378) at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecord(ReduceRecordSource.java:294) ... 18 more Caused by: java.lang.NullPointerException at org.apache.hive.storage.jdbc.JdbcSerDe.serialize(JdbcSerDe.java:166) at org.apache.hive.storage.jdbc.JdbcSerDe.serialize(JdbcSerDe.java:59) at org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:961) at org.apache.hadoop.hive.ql.exec.Operator.baseForward(Operator.java:994) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:940) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:927) at org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:95) at org.apache.hadoop.hive.ql.exec.Operator.baseForward(Operator.java:994) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:940) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:927) at org.apache.hadoop.hive.ql.exec.PTFOperator$PTFInvocation.handleOutputRows(PTFOperator.java:337) at org.apache.hadoop.hive.ql.exec.PTFOperator$PTFInvocation.processRow(PTFOperator.java:325) at 
org.apache.hadoop.hive.ql.exec.PTFOperator.process(PTFOperator.java:139) at org.apache.hadoop.hive.ql.exec.Operator.baseForward(Operator.java:994) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:940) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:927) at org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:95) at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource$GroupIterator.next(ReduceRecordSource.java:363) ... 19 more {quote} I just add a check in the patch. was: I was receiving some null pointer while writing in db: {quote}{{ERROR : FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, vertexName=Reducer 3, vertexId=vertex_1604850281565_5081_1_02, diagnostics=[Task failed, taskId=task_1604850281565_5081_1_02_01, diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( failure ) : attempt
[jira] [Work logged] (HIVE-24432) Delete Notification Events in Batches
[ https://issues.apache.org/jira/browse/HIVE-24432?focusedWorklogId=516827&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-516827 ] ASF GitHub Bot logged work on HIVE-24432: - Author: ASF GitHub Bot Created on: 25/Nov/20 20:55 Start Date: 25/Nov/20 20:55 Worklog Time Spent: 10m Work Description: belugabehr opened a new pull request #1710: URL: https://github.com/apache/hive/pull/1710 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 516827) Remaining Estimate: 0h Time Spent: 10m > Delete Notification Events in Batches > - > > Key: HIVE-24432 > URL: https://issues.apache.org/jira/browse/HIVE-24432 > Project: Hive > Issue Type: Improvement >Affects Versions: 3.2.0 >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > Notification events are loaded in batches (reduces memory pressure on the > HMS), but all of the deletes happen in a single transaction which, when > deleting many records, can put a lot of pressure on the backend database. > Instead, delete events in batches (in different transactions) as well. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-24432) Delete Notification Events in Batches
[ https://issues.apache.org/jira/browse/HIVE-24432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-24432: -- Labels: pull-request-available (was: ) > Delete Notification Events in Batches > - > > Key: HIVE-24432 > URL: https://issues.apache.org/jira/browse/HIVE-24432 > Project: Hive > Issue Type: Improvement >Affects Versions: 3.2.0 >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > Notification events are loaded in batches (reduces memory pressure on the > HMS), but all of the deletes happen in a single transaction which, when > deleting many records, can put a lot of pressure on the backend database. > Instead, delete events in batches (in different transactions) as well. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HIVE-24432) Delete Notification Events in Batches
[ https://issues.apache.org/jira/browse/HIVE-24432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Mollitor reassigned HIVE-24432: - > Delete Notification Events in Batches > - > > Key: HIVE-24432 > URL: https://issues.apache.org/jira/browse/HIVE-24432 > Project: Hive > Issue Type: Improvement >Affects Versions: 3.2.0 >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Major > > Notification events are loaded in batches (reduces memory pressure on the > HMS), but all of the deletes happen in a single transaction which, when > deleting many records, can put a lot of pressure on the backend database. > Instead, delete events in batches (in different transactions) as well. -- This message was sent by Atlassian Jira (v8.3.4#803005)
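The batching idea in HIVE-24432 can be sketched like this. It is illustrative only: the real change lives in the metastore event cleaner, and the method names and batch size below are invented for the example.

```java
import java.util.Collections;
import java.util.List;

// Illustrative sketch of deleting notification events in fixed-size batches,
// each batch in its own transaction, instead of one huge delete. This is not
// the actual HMS code; the names here are invented.
class BatchedDelete {
    /** Partitions eventIds into batches and returns how many were issued. */
    static int deleteInBatches(List<Integer> eventIds, int batchSize) {
        int batches = 0;
        for (int from = 0; from < eventIds.size(); from += batchSize) {
            int to = Math.min(from + batchSize, eventIds.size());
            List<Integer> batch = eventIds.subList(from, to);
            // In real code: open a transaction, issue
            // DELETE FROM NOTIFICATION_LOG WHERE EVENT_ID IN (<batch>),
            // then commit before starting the next batch so the backend
            // database never holds one giant delete transaction.
            batches++;
        }
        return batches;
    }

    public static void main(String[] args) {
        // 2500 stale events deleted 1000 at a time -> 3 transactions.
        System.out.println(deleteInBatches(Collections.nCopies(2500, 0), 1000));
    }
}
```

The trade-off is the usual one for batched maintenance: shorter transactions and smaller lock footprints on the backend database, at the cost of the overall cleanup no longer being atomic.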
[jira] [Updated] (HIVE-24431) Null Pointer exception while sending data to jdbc
[ https://issues.apache.org/jira/browse/HIVE-24431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fabien Carrion updated HIVE-24431: -- Attachment: check_null.patch > Null Pointer exception while sending data to jdbc > - > > Key: HIVE-24431 > URL: https://issues.apache.org/jira/browse/HIVE-24431 > Project: Hive > Issue Type: Bug > Components: JDBC storage handler >Affects Versions: All Versions >Reporter: Fabien Carrion >Priority: Trivial > Attachments: check_null.patch > > > I was receiving a null pointer exception while writing to the db. > I just added a check. -- This message was sent by Atlassian Jira (v8.3.4#803005)
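The attached check_null.patch is not shown in this thread, but the guard it describes presumably looks something like the following sketch. The field handling is simplified and the names are invented; the real `JdbcSerDe` works with Hive object inspectors, not plain `Object`s.

```java
// Hypothetical sketch of a null guard in a serializer: emit a SQL NULL for
// absent field values instead of dereferencing them, which is what produced
// the NullPointerException in JdbcSerDe.serialize in the stack trace above.
class NullSafeSerialize {
    static String serializeField(Object fieldValue) {
        if (fieldValue == null) {
            return "NULL"; // write SQL NULL rather than throwing an NPE
        }
        return fieldValue.toString();
    }

    public static void main(String[] args) {
        System.out.println(serializeField("abc"));
        System.out.println(serializeField(null));
    }
}
```
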
[jira] [Work logged] (HIVE-24408) Upgrade Parquet to 1.11.1
[ https://issues.apache.org/jira/browse/HIVE-24408?focusedWorklogId=516803&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-516803 ] ASF GitHub Bot logged work on HIVE-24408: - Author: ASF GitHub Bot Created on: 25/Nov/20 18:39 Start Date: 25/Nov/20 18:39 Worklog Time Spent: 10m Work Description: sunchao commented on pull request #1692: URL: https://github.com/apache/hive/pull/1692#issuecomment-733884583 Thanks @jcamachor for merging! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 516803) Time Spent: 1h 40m (was: 1.5h) > Upgrade Parquet to 1.11.1 > - > > Key: HIVE-24408 > URL: https://issues.apache.org/jira/browse/HIVE-24408 > Project: Hive > Issue Type: Improvement >Reporter: Chao Sun >Assignee: Chao Sun >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 1h 40m > Remaining Estimate: 0h > > Parquet 1.11.1 has some bug fixes, so Hive should consider upgrading to it. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24430) DiskRangeInfo should make use of DiskRangeList
[ https://issues.apache.org/jira/browse/HIVE-24430?focusedWorklogId=516797&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-516797 ] ASF GitHub Bot logged work on HIVE-24430: - Author: ASF GitHub Bot Created on: 25/Nov/20 17:45 Start Date: 25/Nov/20 17:45 Worklog Time Spent: 10m Work Description: pgaref closed pull request #1707: URL: https://github.com/apache/hive/pull/1707 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 516797) Time Spent: 0.5h (was: 20m) > DiskRangeInfo should make use of DiskRangeList > -- > > Key: HIVE-24430 > URL: https://issues.apache.org/jira/browse/HIVE-24430 > Project: Hive > Issue Type: Sub-task >Reporter: Panagiotis Garefalakis >Assignee: Panagiotis Garefalakis >Priority: Trivial > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > DiskRangeInfo should make use of DiskRangeList instead of a plain List – > this will help us transition to ORC 1.6. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24430) DiskRangeInfo should make use of DiskRangeList
[ https://issues.apache.org/jira/browse/HIVE-24430?focusedWorklogId=516796&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-516796 ] ASF GitHub Bot logged work on HIVE-24430: - Author: ASF GitHub Bot Created on: 25/Nov/20 17:45 Start Date: 25/Nov/20 17:45 Worklog Time Spent: 10m Work Description: pgaref opened a new pull request #1709: URL: https://github.com/apache/hive/pull/1709 ### What changes were proposed in this pull request? Change DiskRangeInfo to use DiskRangeList instead of DiskRange ### Why are the changes needed? Transition to ORC 1.6, where DiskRangeList is the main class used. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Existing tests This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 516796) Time Spent: 20m (was: 10m) > DiskRangeInfo should make use of DiskRangeList > -- > > Key: HIVE-24430 > URL: https://issues.apache.org/jira/browse/HIVE-24430 > Project: Hive > Issue Type: Sub-task >Reporter: Panagiotis Garefalakis >Assignee: Panagiotis Garefalakis >Priority: Trivial > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > DiskRangeInfo should make use of DiskRangeList instead of a plain List – > this will help us transition to ORC 1.6. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-24409) Use LazyBinarySerDe2 in PlanUtils::getReduceValueTableDesc
[ https://issues.apache.org/jira/browse/HIVE-24409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-24409: -- Labels: pull-request-available (was: ) > Use LazyBinarySerDe2 in PlanUtils::getReduceValueTableDesc > -- > > Key: HIVE-24409 > URL: https://issues.apache.org/jira/browse/HIVE-24409 > Project: Hive > Issue Type: Improvement >Reporter: Rajesh Balamohan >Priority: Major > Labels: pull-request-available > Attachments: Screenshot 2020-11-23 at 10.52.49 AM.png > > Time Spent: 10m > Remaining Estimate: 0h > > !Screenshot 2020-11-23 at 10.52.49 AM.png|width=858,height=493! > Lines of interest: > [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/ReduceSinkOperator.java#L535] > (non-vectorized path due to stats) > > [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/plan/PlanUtils.java#L581] > > > > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24409) Use LazyBinarySerDe2 in PlanUtils::getReduceValueTableDesc
[ https://issues.apache.org/jira/browse/HIVE-24409?focusedWorklogId=516787&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-516787 ] ASF GitHub Bot logged work on HIVE-24409: - Author: ASF GitHub Bot Created on: 25/Nov/20 17:33 Start Date: 25/Nov/20 17:33 Worklog Time Spent: 10m Work Description: maheshk114 opened a new pull request #1708: URL: https://github.com/apache/hive/pull/1708 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 516787) Remaining Estimate: 0h Time Spent: 10m > Use LazyBinarySerDe2 in PlanUtils::getReduceValueTableDesc > -- > > Key: HIVE-24409 > URL: https://issues.apache.org/jira/browse/HIVE-24409 > Project: Hive > Issue Type: Improvement >Reporter: Rajesh Balamohan >Priority: Major > Attachments: Screenshot 2020-11-23 at 10.52.49 AM.png > > Time Spent: 10m > Remaining Estimate: 0h > > !Screenshot 2020-11-23 at 10.52.49 AM.png|width=858,height=493! > Lines of interest: > [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/ReduceSinkOperator.java#L535] > (non-vectorized path due to stats) > > [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/plan/PlanUtils.java#L581] > > > > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24424) Use PreparedStatements in DbNotificationListener getNextNLId
[ https://issues.apache.org/jira/browse/HIVE-24424?focusedWorklogId=516777&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-516777 ] ASF GitHub Bot logged work on HIVE-24424: - Author: ASF GitHub Bot Created on: 25/Nov/20 17:01 Start Date: 25/Nov/20 17:01 Worklog Time Spent: 10m Work Description: belugabehr opened a new pull request #1704: URL: https://github.com/apache/hive/pull/1704 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 516777) Time Spent: 0.5h (was: 20m) > Use PreparedStatements in DbNotificationListener getNextNLId > > > Key: HIVE-24424 > URL: https://issues.apache.org/jira/browse/HIVE-24424 > Project: Hive > Issue Type: Improvement >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Minor > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > Simplify the code, remove debug logging concatenation, and make it more > readable. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24424) Use PreparedStatements in DbNotificationListener getNextNLId
[ https://issues.apache.org/jira/browse/HIVE-24424?focusedWorklogId=516776&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-516776 ] ASF GitHub Bot logged work on HIVE-24424: - Author: ASF GitHub Bot Created on: 25/Nov/20 17:00 Start Date: 25/Nov/20 17:00 Worklog Time Spent: 10m Work Description: belugabehr closed pull request #1704: URL: https://github.com/apache/hive/pull/1704 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 516776) Time Spent: 20m (was: 10m) > Use PreparedStatements in DbNotificationListener getNextNLId > > > Key: HIVE-24424 > URL: https://issues.apache.org/jira/browse/HIVE-24424 > Project: Hive > Issue Type: Improvement >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Minor > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > Simplify the code, remove debug logging concatenation, and make it more > readable. -- This message was sent by Atlassian Jira (v8.3.4#803005)
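The change described in HIVE-24424 is the standard move from string-concatenated SQL to a parameterized statement. A sketch of the pattern follows; the table and column names are placeholders, not the actual HMS schema, and `getNextId` is an invented method name.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

// Illustrative pattern only: a PreparedStatement with a bound parameter
// replaces building the query by string concatenation. The JDBC driver
// handles quoting/escaping of the bound value.
class NextIdFetch {
    static final String SELECT_NEXT_ID =
        "SELECT \"NEXT_VAL\" FROM \"SEQUENCE_TABLE\" WHERE \"SEQUENCE_NAME\" = ?";

    static long getNextId(Connection conn, String sequenceName) throws SQLException {
        try (PreparedStatement stmt = conn.prepareStatement(SELECT_NEXT_ID)) {
            stmt.setString(1, sequenceName); // bind instead of concatenate
            try (ResultSet rs = stmt.executeQuery()) {
                return rs.next() ? rs.getLong(1) : 1L;
            }
        }
    }

    public static void main(String[] args) {
        // No database here; just show that the query text carries a
        // placeholder instead of a concatenated value.
        System.out.println(SELECT_NEXT_ID);
    }
}
```

Besides readability, this also avoids re-parsing the SQL on every call and removes any per-call string building in debug log paths.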
[jira] [Work logged] (HIVE-19253) HMS ignores tableType property for external tables
[ https://issues.apache.org/jira/browse/HIVE-19253?focusedWorklogId=516771&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-516771 ] ASF GitHub Bot logged work on HIVE-19253: - Author: ASF GitHub Bot Created on: 25/Nov/20 16:57 Start Date: 25/Nov/20 16:57 Worklog Time Spent: 10m Work Description: szehon-ho edited a comment on pull request #1537: URL: https://github.com/apache/hive/pull/1537#issuecomment-733051181 Sorry I've been away awhile. Thanks Naveen for taking a look. I had some free time today to take a look at the TestHiveMetstoreTransformer but am still a bit lost. I tried to set the table type to ManagedTable as you suggest, but the MetastoreDefaultTransformer actually transforms it back to an External table by the time the assert happens (actually this should probably be done not via the properties but via the modeled TableType, but in this case it doesn't matter). Code: https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/MetastoreDefaultTransformer.java#L596. So after that, it runs the asserts, which fail as they seem to be testing for ManagedTable, unless I am mistaken. If you have some time to let me know anything else to try, I would appreciate it. I haven't taken a look at Miguel's comments yet. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 516771) Time Spent: 2h 10m (was: 2h) > HMS ignores tableType property for external tables > -- > > Key: HIVE-19253 > URL: https://issues.apache.org/jira/browse/HIVE-19253 > Project: Hive > Issue Type: Bug > Components: Metastore >Affects Versions: 3.1.0, 3.0.0, 4.0.0 >Reporter: Alex Kolbasov >Assignee: Vihang Karajgaonkar >Priority: Major > Labels: newbie, pull-request-available > Attachments: HIVE-19253.01.patch, HIVE-19253.02.patch, > HIVE-19253.03.patch, HIVE-19253.03.patch, HIVE-19253.04.patch, > HIVE-19253.05.patch, HIVE-19253.06.patch, HIVE-19253.07.patch, > HIVE-19253.08.patch, HIVE-19253.09.patch, HIVE-19253.10.patch, > HIVE-19253.11.patch, HIVE-19253.12.patch > > Time Spent: 2h 10m > Remaining Estimate: 0h > > When someone creates a table using the Thrift API they may think that setting > tableType to {{EXTERNAL_TABLE}} creates an external table. And boom - their > table is gone later because HMS will silently change it to a managed table. > Here is the offending code: > {code:java} > private MTable convertToMTable(Table tbl) throws InvalidObjectException, > MetaException { > ... > // If the table has property EXTERNAL set, update table type > // accordingly > String tableType = tbl.getTableType(); > boolean isExternal = > Boolean.parseBoolean(tbl.getParameters().get("EXTERNAL")); > if (TableType.MANAGED_TABLE.toString().equals(tableType)) { > if (isExternal) { > tableType = TableType.EXTERNAL_TABLE.toString(); > } > } > if (TableType.EXTERNAL_TABLE.toString().equals(tableType)) { > if (!isExternal) { // Here! > tableType = TableType.MANAGED_TABLE.toString(); > } > } > {code} > So if the EXTERNAL parameter is not set, the table type is changed to managed > even if it was external in the first place - which is wrong. > Moreover, in some places the code looks at the table property to decide the table > type and in other places it looks at the parameter. 
HMS should really make up its mind which > one to use. -- This message was sent by Atlassian Jira (v8.3.4#803005)
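The branch quoted in the report can be isolated into a standalone sketch (plain Java; `resolveTableType` and the bare string constants are hypothetical stand-ins for the HMS code, not its real API). It reproduces the silent downgrade described above: a table declared EXTERNAL_TABLE but missing the EXTERNAL=TRUE parameter comes back managed.

```java
import java.util.HashMap;
import java.util.Map;

public class TableTypeFlip {
    // Mirrors the quoted convertToMTable logic: the EXTERNAL parameter,
    // not the declared tableType, gets the final word.
    static String resolveTableType(String tableType, Map<String, String> params) {
        boolean isExternal = Boolean.parseBoolean(params.get("EXTERNAL"));
        if ("MANAGED_TABLE".equals(tableType) && isExternal) {
            return "EXTERNAL_TABLE";
        }
        if ("EXTERNAL_TABLE".equals(tableType) && !isExternal) {
            // The silent flip the report complains about.
            return "MANAGED_TABLE";
        }
        return tableType;
    }

    public static void main(String[] args) {
        // Caller asked for an external table but never set EXTERNAL=TRUE:
        System.out.println(resolveTableType("EXTERNAL_TABLE", new HashMap<>()));
        // prints MANAGED_TABLE
    }
}
```

Clients that set only tableType therefore lose the external flag, which is why the report argues HMS should settle on a single source of truth.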
[jira] [Work logged] (HIVE-24051) Hive lineage information exposed in ExecuteWithHookContext
[ https://issues.apache.org/jira/browse/HIVE-24051?focusedWorklogId=516769&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-516769 ] ASF GitHub Bot logged work on HIVE-24051: - Author: ASF GitHub Bot Created on: 25/Nov/20 16:56 Start Date: 25/Nov/20 16:56 Worklog Time Spent: 10m Work Description: szehon-ho commented on pull request #1413: URL: https://github.com/apache/hive/pull/1413#issuecomment-733828648 Thanks a lot @sunchao ! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 516769) Time Spent: 2h (was: 1h 50m) > Hive lineage information exposed in ExecuteWithHookContext > -- > > Key: HIVE-24051 > URL: https://issues.apache.org/jira/browse/HIVE-24051 > Project: Hive > Issue Type: Improvement > Components: Configuration >Reporter: Szehon Ho >Assignee: Szehon Ho >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Attachments: HIVE-24051.patch > > Time Spent: 2h > Remaining Estimate: 0h > > The lineage information is not populated unless certain hooks are enabled. > However, this is a bit fragile, and no way for another hook that we write to > get this information. This proposes a flag to enable this instead. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work started] (HIVE-24430) DiskRangeInfo should make use of DiskRangeList
[ https://issues.apache.org/jira/browse/HIVE-24430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HIVE-24430 started by Panagiotis Garefalakis. - > DiskRangeInfo should make use of DiskRangeList > -- > > Key: HIVE-24430 > URL: https://issues.apache.org/jira/browse/HIVE-24430 > Project: Hive > Issue Type: Sub-task >Reporter: Panagiotis Garefalakis >Assignee: Panagiotis Garefalakis >Priority: Trivial > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > DiskRangeInfo should make use of DiskRangeList instead of List<DiskRange> – > this will help us transition to ORC 1.6. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-24430) DiskRangeInfo should make use of DiskRangeList
[ https://issues.apache.org/jira/browse/HIVE-24430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-24430: -- Labels: pull-request-available (was: ) > DiskRangeInfo should make use of DiskRangeList > -- > > Key: HIVE-24430 > URL: https://issues.apache.org/jira/browse/HIVE-24430 > Project: Hive > Issue Type: Sub-task >Reporter: Panagiotis Garefalakis >Assignee: Panagiotis Garefalakis >Priority: Trivial > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > DiskRangeInfo should make use of DiskRangeList instead of List<DiskRange> – > this will help us transition to ORC 1.6. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24430) DiskRangeInfo should make use of DiskRangeList
[ https://issues.apache.org/jira/browse/HIVE-24430?focusedWorklogId=516759&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-516759 ] ASF GitHub Bot logged work on HIVE-24430: - Author: ASF GitHub Bot Created on: 25/Nov/20 16:46 Start Date: 25/Nov/20 16:46 Worklog Time Spent: 10m Work Description: pgaref opened a new pull request #1707: URL: https://github.com/apache/hive/pull/1707 ### What changes were proposed in this pull request? Change DiskRangeInfo to use DiskRangeList instead of DiskRange ### Why are the changes needed? Transition to ORC 1.6 where DiskRangeList is the main class used. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Existing tests This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 516759) Remaining Estimate: 0h Time Spent: 10m > DiskRangeInfo should make use of DiskRangeList > -- > > Key: HIVE-24430 > URL: https://issues.apache.org/jira/browse/HIVE-24430 > Project: Hive > Issue Type: Sub-task >Reporter: Panagiotis Garefalakis >Assignee: Panagiotis Garefalakis >Priority: Trivial > Time Spent: 10m > Remaining Estimate: 0h > > DiskRangeInfo should make use of DiskRangeList instead of List<DiskRange> – > this will help us transition to ORC 1.6. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HIVE-24430) DiskRangeInfo should make use of DiskRangeList
[ https://issues.apache.org/jira/browse/HIVE-24430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Panagiotis Garefalakis reassigned HIVE-24430: - > DiskRangeInfo should make use of DiskRangeList > -- > > Key: HIVE-24430 > URL: https://issues.apache.org/jira/browse/HIVE-24430 > Project: Hive > Issue Type: Sub-task >Reporter: Panagiotis Garefalakis >Assignee: Panagiotis Garefalakis >Priority: Trivial > > DiskRangeInfo should make use of DiskRangeList instead of List<DiskRange> – > this will help us transition to ORC 1.6. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24414) Backport HIVE-19662 to branch-3.1
[ https://issues.apache.org/jira/browse/HIVE-24414?focusedWorklogId=516754&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-516754 ] ASF GitHub Bot logged work on HIVE-24414: - Author: ASF GitHub Bot Created on: 25/Nov/20 16:31 Start Date: 25/Nov/20 16:31 Worklog Time Spent: 10m Work Description: aihuaxu commented on pull request #1698: URL: https://github.com/apache/hive/pull/1698#issuecomment-733814196 Thanks @sunchao for working on this. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 516754) Time Spent: 0.5h (was: 20m) > Backport HIVE-19662 to branch-3.1 > - > > Key: HIVE-24414 > URL: https://issues.apache.org/jira/browse/HIVE-24414 > Project: Hive > Issue Type: Improvement >Reporter: Chao Sun >Assignee: Chao Sun >Priority: Major > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > This JIRA proposes to backport HIVE-19662 to branch-3.1 and upgrade Avro to > 1.8.2. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-23553) Bump ORC version to 1.6
[ https://issues.apache.org/jira/browse/HIVE-23553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Panagiotis Garefalakis updated HIVE-23553: -- Description: Apache Hive is currently on 1.5.X version and in order to take advantage of the latest ORC improvements such as column encryption we have to bump to 1.6.X. https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12343288&styleName=&projectId=12318320&Create=Create&atl_token=A5KQ-2QAV-T4JA-FDED_4ae78f19321c7fb1e7f337fba1dd90af751d8810_lin Even though ORC reader could work out of the box, HIVE LLAP is heavily depending on internal ORC APIs e.g., to retrieve and store File Footers, Tails, streams – un/compress RG data etc. As there were many internal changes from 1.5 to 1.6 (Input stream offsets, relative BufferChunks etc.) the upgrade is not straightforward. This Umbrella Jira tracks this upgrade effort. was: Apache Hive is currently on 1.5.X version and in order to take advantage of the latest ORC improvements such as column encryption we have to bump to 1.6.X. Even though ORC reader could work out of the box, HIVE LLAP is heavily depending on internal ORC APIs e.g., to retrieve and store File Footers, Tails, streams – un/compress RG data etc. As there were many internal changes from 1.5 to 1.6 (Input stream offsets, relative BufferChunks etc.) the upgrade is not straightforward. This Umbrella Jira tracks this upgrade effort. > Bump ORC version to 1.6 > --- > > Key: HIVE-23553 > URL: https://issues.apache.org/jira/browse/HIVE-23553 > Project: Hive > Issue Type: Improvement >Reporter: Panagiotis Garefalakis >Assignee: Panagiotis Garefalakis >Priority: Major > > Apache Hive is currently on 1.5.X version and in order to take advantage of > the latest ORC improvements such as column encryption we have to bump to > 1.6.X. 
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12343288&styleName=&projectId=12318320&Create=Create&atl_token=A5KQ-2QAV-T4JA-FDED_4ae78f19321c7fb1e7f337fba1dd90af751d8810_lin > Even though ORC reader could work out of the box, HIVE LLAP is heavily > depending on internal ORC APIs e.g., to retrieve and store File Footers, > Tails, streams – un/compress RG data etc. As there were many internal changes > from 1.5 to 1.6 (Input stream offsets, relative BufferChunks etc.) the > upgrade is not straightforward. > This Umbrella Jira tracks this upgrade effort. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-23553) Bump ORC version to 1.6
[ https://issues.apache.org/jira/browse/HIVE-23553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Panagiotis Garefalakis updated HIVE-23553: -- Description: Apache Hive is currently on 1.5.X version and in order to take advantage of the latest ORC improvements such as column encryption we have to bump to 1.6.X. Even though ORC reader could work out of the box, HIVE LLAP is heavily depending on internal ORC APIs e.g., to retrieve and store File Footers, Tails, streams – un/compress RG data etc. As there were many internal changes from 1.5 to 1.6 (Input stream offsets, relative BufferChunks etc.) the upgrade is not straightforward. This Umbrella Jira tracks this upgrade effort. was: > Bump ORC version to 1.6 > --- > > Key: HIVE-23553 > URL: https://issues.apache.org/jira/browse/HIVE-23553 > Project: Hive > Issue Type: Sub-task >Reporter: Panagiotis Garefalakis >Assignee: Panagiotis Garefalakis >Priority: Major > > Apache Hive is currently on 1.5.X version and in order to take advantage of > the latest ORC improvements such as column encryption we have to bump to > 1.6.X. > Even though ORC reader could work out of the box, HIVE LLAP is heavily > depending on internal ORC APIs e.g., to retrieve and store File Footers, > Tails, streams – un/compress RG data etc. As there were many internal changes > from 1.5 to 1.6 (Input stream offsets, relative BufferChunks etc.) the > upgrade is not straightforward. > This Umbrella Jira tracks this upgrade effort. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work started] (HIVE-23553) Bump ORC version to 1.6
[ https://issues.apache.org/jira/browse/HIVE-23553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HIVE-23553 started by Panagiotis Garefalakis. - > Bump ORC version to 1.6 > --- > > Key: HIVE-23553 > URL: https://issues.apache.org/jira/browse/HIVE-23553 > Project: Hive > Issue Type: Sub-task >Reporter: Panagiotis Garefalakis >Assignee: Panagiotis Garefalakis >Priority: Major > > Apache Hive is currently on 1.5.X version and in order to take advantage of > the latest ORC improvements such as column encryption we have to bump to > 1.6.X. > Even though ORC reader could work out of the box, HIVE LLAP is heavily > depending on internal ORC APIs e.g., to retrieve and store File Footers, > Tails, streams – un/compress RG data etc. As there were many internal changes > from 1.5 to 1.6 (Input stream offsets, relative BufferChunks etc.) the > upgrade is not straightforward. > This Umbrella Jira tracks this upgrade effort. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-23553) Bump ORC version to 1.6
[ https://issues.apache.org/jira/browse/HIVE-23553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Panagiotis Garefalakis updated HIVE-23553: -- Parent: (was: HIVE-22731) Issue Type: Improvement (was: Sub-task) > Bump ORC version to 1.6 > --- > > Key: HIVE-23553 > URL: https://issues.apache.org/jira/browse/HIVE-23553 > Project: Hive > Issue Type: Improvement >Reporter: Panagiotis Garefalakis >Assignee: Panagiotis Garefalakis >Priority: Major > > Apache Hive is currently on 1.5.X version and in order to take advantage of > the latest ORC improvements such as column encryption we have to bump to > 1.6.X. > Even though ORC reader could work out of the box, HIVE LLAP is heavily > depending on internal ORC APIs e.g., to retrieve and store File Footers, > Tails, streams – un/compress RG data etc. As there were many internal changes > from 1.5 to 1.6 (Input stream offsets, relative BufferChunks etc.) the > upgrade is not straightforward. > This Umbrella Jira tracks this upgrade effort. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24410) Query-based compaction hangs because of doAs
[ https://issues.apache.org/jira/browse/HIVE-24410?focusedWorklogId=516730&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-516730 ] ASF GitHub Bot logged work on HIVE-24410: - Author: ASF GitHub Bot Created on: 25/Nov/20 15:18 Start Date: 25/Nov/20 15:18 Worklog Time Spent: 10m Work Description: pvary commented on a change in pull request #1693: URL: https://github.com/apache/hive/pull/1693#discussion_r530450573 ## File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Worker.java ## @@ -531,29 +531,26 @@ protected Boolean findNextCompactionAndExecute(boolean computeStats) throws Inte final StatsUpdater su = computeStats ? StatsUpdater.init(ci, msc.findColumnsWithStats( CompactionInfo.compactionInfoToStruct(ci)), conf, runJobAsSelf(ci.runAs) ? ci.runAs : t.getOwner()) : null; - final CompactorMR mr = new CompactorMR(); + try { -if (runJobAsSelf(ci.runAs)) { - mr.run(conf, jobName.toString(), t, p, sd, tblValidWriteIds, ci, su, msc, dir); +failCompactionIfSetForTest(); Review comment: Thanks @klcopp! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 516730) Time Spent: 2h 50m (was: 2h 40m) > Query-based compaction hangs because of doAs > > > Key: HIVE-24410 > URL: https://issues.apache.org/jira/browse/HIVE-24410 > Project: Hive > Issue Type: Bug >Reporter: Karen Coppage >Assignee: Karen Coppage >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 2h 50m > Remaining Estimate: 0h > > QB compaction runs within a doas +and+ hive.server2.enable.doAs is set to > true (as of HIVE-24089). 
On a secure cluster with Worker threads running in > HS2, this results in HMS client not receiving a login context during > compaction queries, so kerberos prompts for a login via stdin which causes > the worker thread to hang until it times out: > {code:java} > "node-x.com-44_executor" #47 daemon prio=1 os_prio=0 tid=0x01506000 > nid=0x1348 runnable [0x7f1beea95000] >java.lang.Thread.State: RUNNABLE > at java.io.FileInputStream.readBytes(Native Method) > at java.io.FileInputStream.read(FileInputStream.java:255) > at java.io.BufferedInputStream.read1(BufferedInputStream.java:284) > at java.io.BufferedInputStream.read(BufferedInputStream.java:345) > - locked <0x9fa38c90> (a java.io.BufferedInputStream) > at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:284) > at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:326) > at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178) > - locked <0x8c7d5010> (a java.io.InputStreamReader) > at java.io.InputStreamReader.read(InputStreamReader.java:184) > at java.io.BufferedReader.fill(BufferedReader.java:161) > at java.io.BufferedReader.readLine(BufferedReader.java:324) > - locked <0x8c7d5010> (a java.io.InputStreamReader) > at java.io.BufferedReader.readLine(BufferedReader.java:389) > at > com.sun.security.auth.callback.TextCallbackHandler.readLine(TextCallbackHandler.java:153) > at > com.sun.security.auth.callback.TextCallbackHandler.handle(TextCallbackHandler.java:120) > at > com.sun.security.auth.module.Krb5LoginModule.promptForName(Krb5LoginModule.java:862) > at > com.sun.security.auth.module.Krb5LoginModule.attemptAuthentication(Krb5LoginModule.java:708) > at > com.sun.security.auth.module.Krb5LoginModule.login(Krb5LoginModule.java:617) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at 
java.lang.reflect.Method.invoke(Method.java:498) > at > javax.security.auth.login.LoginContext.invoke(LoginContext.java:755) > at > javax.security.auth.login.LoginContext.access$000(LoginContext.java:195) > at javax.security.auth.login.LoginContext$4.run(LoginContext.java:682) > at javax.security.auth.login.LoginContext$4.run(LoginContext.java:680) > at java.security.AccessController.doPrivileged(Native Method) > at > javax.security.auth.login.LoginContex
[jira] [Work logged] (HIVE-24410) Query-based compaction hangs because of doAs
[ https://issues.apache.org/jira/browse/HIVE-24410?focusedWorklogId=516721&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-516721 ] ASF GitHub Bot logged work on HIVE-24410: - Author: ASF GitHub Bot Created on: 25/Nov/20 14:54 Start Date: 25/Nov/20 14:54 Worklog Time Spent: 10m Work Description: klcopp commented on a change in pull request #1693: URL: https://github.com/apache/hive/pull/1693#discussion_r530432552 ## File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Worker.java ## @@ -531,29 +531,26 @@ protected Boolean findNextCompactionAndExecute(boolean computeStats) throws Inte final StatsUpdater su = computeStats ? StatsUpdater.init(ci, msc.findColumnsWithStats( CompactionInfo.compactionInfoToStruct(ci)), conf, runJobAsSelf(ci.runAs) ? ci.runAs : t.getOwner()) : null; - final CompactorMR mr = new CompactorMR(); + try { -if (runJobAsSelf(ci.runAs)) { - mr.run(conf, jobName.toString(), t, p, sd, tblValidWriteIds, ci, su, msc, dir); +failCompactionIfSetForTest(); Review comment: Done: HIVE-24429 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 516721) Time Spent: 2h 40m (was: 2.5h) > Query-based compaction hangs because of doAs > > > Key: HIVE-24410 > URL: https://issues.apache.org/jira/browse/HIVE-24410 > Project: Hive > Issue Type: Bug >Reporter: Karen Coppage >Assignee: Karen Coppage >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 2h 40m > Remaining Estimate: 0h > > QB compaction runs within a doas +and+ hive.server2.enable.doAs is set to > true (as of HIVE-24089). 
On a secure cluster with Worker threads running in > HS2, this results in HMS client not receiving a login context during > compaction queries, so kerberos prompts for a login via stdin which causes > the worker thread to hang until it times out: > {code:java} > "node-x.com-44_executor" #47 daemon prio=1 os_prio=0 tid=0x01506000 > nid=0x1348 runnable [0x7f1beea95000] >java.lang.Thread.State: RUNNABLE > at java.io.FileInputStream.readBytes(Native Method) > at java.io.FileInputStream.read(FileInputStream.java:255) > at java.io.BufferedInputStream.read1(BufferedInputStream.java:284) > at java.io.BufferedInputStream.read(BufferedInputStream.java:345) > - locked <0x9fa38c90> (a java.io.BufferedInputStream) > at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:284) > at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:326) > at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178) > - locked <0x8c7d5010> (a java.io.InputStreamReader) > at java.io.InputStreamReader.read(InputStreamReader.java:184) > at java.io.BufferedReader.fill(BufferedReader.java:161) > at java.io.BufferedReader.readLine(BufferedReader.java:324) > - locked <0x8c7d5010> (a java.io.InputStreamReader) > at java.io.BufferedReader.readLine(BufferedReader.java:389) > at > com.sun.security.auth.callback.TextCallbackHandler.readLine(TextCallbackHandler.java:153) > at > com.sun.security.auth.callback.TextCallbackHandler.handle(TextCallbackHandler.java:120) > at > com.sun.security.auth.module.Krb5LoginModule.promptForName(Krb5LoginModule.java:862) > at > com.sun.security.auth.module.Krb5LoginModule.attemptAuthentication(Krb5LoginModule.java:708) > at > com.sun.security.auth.module.Krb5LoginModule.login(Krb5LoginModule.java:617) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at 
java.lang.reflect.Method.invoke(Method.java:498) > at > javax.security.auth.login.LoginContext.invoke(LoginContext.java:755) > at > javax.security.auth.login.LoginContext.access$000(LoginContext.java:195) > at javax.security.auth.login.LoginContext$4.run(LoginContext.java:682) > at javax.security.auth.login.LoginContext$4.run(LoginContext.java:680) > at java.security.AccessController.doPrivileged(Native Method) > at > javax.security.auth.login.LoginContex
[jira] [Commented] (HIVE-24429) Figure out a better way to test failed compactions
[ https://issues.apache.org/jira/browse/HIVE-24429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17238778#comment-17238778 ] Karen Coppage commented on HIVE-24429: -- Requested by [~pvary] at https://github.com/apache/hive/pull/1693 > Figure out a better way to test failed compactions > -- > > Key: HIVE-24429 > URL: https://issues.apache.org/jira/browse/HIVE-24429 > Project: Hive > Issue Type: Improvement >Reporter: Karen Coppage >Priority: Major > > This block is executed during compaction: > {code:java} > if(conf.getBoolVar(HiveConf.ConfVars.HIVE_IN_TEST) && > conf.getBoolVar(HiveConf.ConfVars.HIVETESTMODEFAILCOMPACTION)) { > throw new > RuntimeException(HiveConf.ConfVars.HIVETESTMODEFAILCOMPACTION.name() + > "=true"); > }{code} > We should figure out a better way to test failed compaction than including > test code in the source. -- This message was sent by Atlassian Jira (v8.3.4#803005)
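One direction the ticket could take (a sketch only; `FailurePoint`, `beforeCompaction`, and `runCompaction` are hypothetical names, not Hive code) is a fault-injection seam: production code calls a no-op hook, and only a test swaps in a throwing implementation, so no HIVE_IN_TEST flag check has to live in the compaction path.

```java
public class CompactionSeam {
    /** Hypothetical fault-injection point; the default is a no-op. */
    public interface FailurePoint {
        void check();
    }

    // Production never touches this; a test may replace it with a
    // throwing implementation to simulate a failed compaction.
    public static volatile FailurePoint beforeCompaction = () -> { };

    public static String runCompaction() {
        beforeCompaction.check(); // injected failures surface here
        return "compacted";
    }
}
```

The trade-off is one extra indirection in the hot path versus keeping test-only branches out of the shipped source, which is exactly the concern the ticket raises.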
[jira] [Updated] (HIVE-24418) there is an error "java.lang.IllegalArgumentException: No columns to insert" when the result data is empty
[ https://issues.apache.org/jira/browse/HIVE-24418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] HuiyuZhou updated HIVE-24418: - Description: I created an external Hive table linked to HBase. When I use hsql to insert data into HBase, there is an error "java.lang.IllegalArgumentException: No columns to insert". I searched for the reason and found that the HBase client does not allow a put in which every column except the row key is empty. Please follow the link for the HBase validatePut function. [https://stackoverflow.com/questions/56073332/why-hbase-client-put-object-expecting-at-least-a-column-to-be-added-before-subm] I want to find a configuration to skip the error for my hsql, but it seems there is no configuration for it. [https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-HBaseStorageHandler] was: I created an external Hive table linked to HBase. When I use hsql to insert data into HBase, there is an error "java.lang.IllegalArgumentException: No columns to insert". I searched for the reason and found that the HBase client does not allow a put in which every column except the row key is empty. I also tried "set hyperbase.fill.null.enable=true" to skip the error for my hsql, but it doesn't work. How can I avoid the error? Is this a bug? > there is an error "java.lang.IllegalArgumentException: No columns to insert" > when the result data is empty > -- > > Key: HIVE-24418 > URL: https://issues.apache.org/jira/browse/HIVE-24418 > Project: Hive > Issue Type: Bug > Components: HBase Handler >Affects Versions: 1.1.0 >Reporter: HuiyuZhou >Priority: Major > > I created an external Hive table linked to HBase. When I use hsql to insert > data into HBase, there is an error "java.lang.IllegalArgumentException: No > columns to insert". I searched for the reason and found that the HBase client does not > allow a put in which every column except the row key is empty. > Please follow the link for the HBase validatePut function. 
> [https://stackoverflow.com/questions/56073332/why-hbase-client-put-object-expecting-at-least-a-column-to-be-added-before-subm] > > I want to find a configuration to skip the error for my hsql, but it seems > there is no configuration for it. > [https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-HBaseStorageHandler] > -- This message was sent by Atlassian Jira (v8.3.4#803005)
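The client-side check behind the linked question can be mimicked in a few lines (a plain-Java stand-in, not the real org.apache.hadoop.hbase.client API): a put that carries a row key but no column cells, which is what a row whose non-key columns are all NULL becomes, is rejected before it ever reaches a region server.

```java
import java.util.List;

public class PutCheck {
    // Stand-in for HBase's client-side Put validation: a mutation must
    // contain at least one column cell in addition to the row key.
    static void validatePut(byte[] rowKey, List<byte[]> cells) {
        if (rowKey == null || rowKey.length == 0) {
            throw new IllegalArgumentException("Row key must be non-empty");
        }
        if (cells == null || cells.isEmpty()) {
            throw new IllegalArgumentException("No columns to insert");
        }
    }
}
```

Under that assumption, the practical workaround on the Hive side is to guarantee that each emitted row carries at least one non-NULL mapped column (or to filter such rows out), rather than to look for an HBaseStorageHandler switch, since the validation happens inside the HBase client itself.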
[jira] [Work logged] (HIVE-24410) Query-based compaction hangs because of doAs
[ https://issues.apache.org/jira/browse/HIVE-24410?focusedWorklogId=516713&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-516713 ] ASF GitHub Bot logged work on HIVE-24410: - Author: ASF GitHub Bot Created on: 25/Nov/20 14:39 Start Date: 25/Nov/20 14:39 Worklog Time Spent: 10m Work Description: pvary commented on a change in pull request #1693: URL: https://github.com/apache/hive/pull/1693#discussion_r530421562 ## File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Worker.java ## @@ -531,29 +531,26 @@ protected Boolean findNextCompactionAndExecute(boolean computeStats) throws Inte final StatsUpdater su = computeStats ? StatsUpdater.init(ci, msc.findColumnsWithStats( CompactionInfo.compactionInfoToStruct(ci)), conf, runJobAsSelf(ci.runAs) ? ci.runAs : t.getOwner()) : null; - final CompactorMR mr = new CompactorMR(); + try { -if (runJobAsSelf(ci.runAs)) { - mr.run(conf, jobName.toString(), t, p, sd, tblValidWriteIds, ci, su, msc, dir); +failCompactionIfSetForTest(); Review comment: Yes, please raise a jira! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 516713) Time Spent: 2.5h (was: 2h 20m) > Query-based compaction hangs because of doAs > > > Key: HIVE-24410 > URL: https://issues.apache.org/jira/browse/HIVE-24410 > Project: Hive > Issue Type: Bug >Reporter: Karen Coppage >Assignee: Karen Coppage >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 2.5h > Remaining Estimate: 0h > > QB compaction runs within a doas +and+ hive.server2.enable.doAs is set to > true (as of HIVE-24089). 
On a secure cluster with Worker threads running in > HS2, this results in HMS client not receiving a login context during > compaction queries, so kerberos prompts for a login via stdin which causes > the worker thread to hang until it times out: > {code:java} > "node-x.com-44_executor" #47 daemon prio=1 os_prio=0 tid=0x01506000 > nid=0x1348 runnable [0x7f1beea95000] >java.lang.Thread.State: RUNNABLE > at java.io.FileInputStream.readBytes(Native Method) > at java.io.FileInputStream.read(FileInputStream.java:255) > at java.io.BufferedInputStream.read1(BufferedInputStream.java:284) > at java.io.BufferedInputStream.read(BufferedInputStream.java:345) > - locked <0x9fa38c90> (a java.io.BufferedInputStream) > at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:284) > at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:326) > at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178) > - locked <0x8c7d5010> (a java.io.InputStreamReader) > at java.io.InputStreamReader.read(InputStreamReader.java:184) > at java.io.BufferedReader.fill(BufferedReader.java:161) > at java.io.BufferedReader.readLine(BufferedReader.java:324) > - locked <0x8c7d5010> (a java.io.InputStreamReader) > at java.io.BufferedReader.readLine(BufferedReader.java:389) > at > com.sun.security.auth.callback.TextCallbackHandler.readLine(TextCallbackHandler.java:153) > at > com.sun.security.auth.callback.TextCallbackHandler.handle(TextCallbackHandler.java:120) > at > com.sun.security.auth.module.Krb5LoginModule.promptForName(Krb5LoginModule.java:862) > at > com.sun.security.auth.module.Krb5LoginModule.attemptAuthentication(Krb5LoginModule.java:708) > at > com.sun.security.auth.module.Krb5LoginModule.login(Krb5LoginModule.java:617) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at 
java.lang.reflect.Method.invoke(Method.java:498) > at > javax.security.auth.login.LoginContext.invoke(LoginContext.java:755) > at > javax.security.auth.login.LoginContext.access$000(LoginContext.java:195) > at javax.security.auth.login.LoginContext$4.run(LoginContext.java:682) > at javax.security.auth.login.LoginContext$4.run(LoginContext.java:680) > at java.security.AccessController.doPrivileged(Native Method) > at > javax.security.auth.login.Login
[jira] [Updated] (HIVE-24418) there is an error "java.lang.IllegalArgumentException: No columns to insert" when the result data is empty
[ https://issues.apache.org/jira/browse/HIVE-24418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] HuiyuZhou updated HIVE-24418: - Affects Version/s: (was: 1.1.1) 1.1.0 > there is an error "java.lang.IllegalArgumentException: No columns to insert" > when the result data is empty > -- > > Key: HIVE-24418 > URL: https://issues.apache.org/jira/browse/HIVE-24418 > Project: Hive > Issue Type: Bug > Components: HBase Handler >Affects Versions: 1.1.0 >Reporter: HuiyuZhou >Priority: Major > > I created an external Hive table linked to HBase. When I use HiveQL to insert > data into HBase, I get the error "java.lang.IllegalArgumentException: No > columns to insert". I looked into the cause and found that the HBase client does not > allow inserting a row in which every column except the rowkey is empty. > I also tried "set hyperbase.fill.null.enable=true" to skip the error in > my query, but it doesn't work. How can I avoid the error? > Is this a bug? -- This message was sent by Atlassian Jira (v8.3.4#803005)
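The constraint the reporter ran into can be worked around on the client side by filtering out rows whose non-rowkey columns are all NULL before they reach the handler. The sketch below is illustrative only: `NullRowFilter` and `isInsertable` are hypothetical names, not part of the actual HBaseStorageHandler API, and `hyperbase.fill.null.enable` appears to be a vendor-specific setting rather than an Apache Hive one.

```java
import java.util.Arrays;
import java.util.Objects;

// Hypothetical pre-filter for the constraint described above: the HBase
// client rejects a Put that carries no column values, so rows whose
// non-rowkey columns are all NULL must be skipped (or given a placeholder
// column) before insertion.
public class NullRowFilter {
    /** Returns true if the row has a rowkey and at least one non-null column value. */
    public static boolean isInsertable(Object rowKey, Object... columnValues) {
        if (rowKey == null) {
            return false; // the rowkey itself is mandatory
        }
        return Arrays.stream(columnValues).anyMatch(Objects::nonNull);
    }
}
```

An equivalent fix purely in the query is to add a predicate such as `WHERE col1 IS NOT NULL OR col2 IS NOT NULL` to the insert's select, so all-null rows never reach the HBase handler.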
[jira] [Work logged] (HIVE-24410) Query-based compaction hangs because of doAs
[ https://issues.apache.org/jira/browse/HIVE-24410?focusedWorklogId=516681&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-516681 ] ASF GitHub Bot logged work on HIVE-24410: - Author: ASF GitHub Bot Created on: 25/Nov/20 13:32 Start Date: 25/Nov/20 13:32 Worklog Time Spent: 10m Work Description: klcopp commented on a change in pull request #1693: URL: https://github.com/apache/hive/pull/1693#discussion_r530375823 ## File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Worker.java ## @@ -531,29 +531,26 @@ protected Boolean findNextCompactionAndExecute(boolean computeStats) throws Inte final StatsUpdater su = computeStats ? StatsUpdater.init(ci, msc.findColumnsWithStats( CompactionInfo.compactionInfoToStruct(ci)), conf, runJobAsSelf(ci.runAs) ? ci.runAs : t.getOwner()) : null; - final CompactorMR mr = new CompactorMR(); + try { -if (runJobAsSelf(ci.runAs)) { - mr.run(conf, jobName.toString(), t, p, sd, tblValidWriteIds, ci, su, msc, dir); +failCompactionIfSetForTest(); Review comment: Yes, it's test code and it used to be in CompactorMr#run, I just refactored it here. Great question, shall I raise a jira for it? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 516681) Time Spent: 2h 20m (was: 2h 10m) > Query-based compaction hangs because of doAs > > > Key: HIVE-24410 > URL: https://issues.apache.org/jira/browse/HIVE-24410 > Project: Hive > Issue Type: Bug >Reporter: Karen Coppage >Assignee: Karen Coppage >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 2h 20m > Remaining Estimate: 0h > > QB compaction runs within a doas +and+ hive.server2.enable.doAs is set to > true (as of HIVE-24089). 
On a secure cluster with Worker threads running in > HS2, this results in HMS client not receiving a login context during > compaction queries, so kerberos prompts for a login via stdin which causes > the worker thread to hang until it times out: > {code:java} > "node-x.com-44_executor" #47 daemon prio=1 os_prio=0 tid=0x01506000 > nid=0x1348 runnable [0x7f1beea95000] >java.lang.Thread.State: RUNNABLE > at java.io.FileInputStream.readBytes(Native Method) > at java.io.FileInputStream.read(FileInputStream.java:255) > at java.io.BufferedInputStream.read1(BufferedInputStream.java:284) > at java.io.BufferedInputStream.read(BufferedInputStream.java:345) > - locked <0x9fa38c90> (a java.io.BufferedInputStream) > at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:284) > at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:326) > at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178) > - locked <0x8c7d5010> (a java.io.InputStreamReader) > at java.io.InputStreamReader.read(InputStreamReader.java:184) > at java.io.BufferedReader.fill(BufferedReader.java:161) > at java.io.BufferedReader.readLine(BufferedReader.java:324) > - locked <0x8c7d5010> (a java.io.InputStreamReader) > at java.io.BufferedReader.readLine(BufferedReader.java:389) > at > com.sun.security.auth.callback.TextCallbackHandler.readLine(TextCallbackHandler.java:153) > at > com.sun.security.auth.callback.TextCallbackHandler.handle(TextCallbackHandler.java:120) > at > com.sun.security.auth.module.Krb5LoginModule.promptForName(Krb5LoginModule.java:862) > at > com.sun.security.auth.module.Krb5LoginModule.attemptAuthentication(Krb5LoginModule.java:708) > at > com.sun.security.auth.module.Krb5LoginModule.login(Krb5LoginModule.java:617) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at 
java.lang.reflect.Method.invoke(Method.java:498) > at > javax.security.auth.login.LoginContext.invoke(LoginContext.java:755) > at > javax.security.auth.login.LoginContext.access$000(LoginContext.java:195) > at javax.security.auth.login.LoginContext$4.run(LoginContext.java:682) > at javax.security.auth.login.LoginContext$4.run(LoginContext.java:680) >
[jira] [Assigned] (HIVE-24428) Concurrent add_partitions requests may lead to data loss
[ https://issues.apache.org/jira/browse/HIVE-24428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltan Haindrich reassigned HIVE-24428: --- > Concurrent add_partitions requests may lead to data loss > > > Key: HIVE-24428 > URL: https://issues.apache.org/jira/browse/HIVE-24428 > Project: Hive > Issue Type: Bug >Reporter: Zoltan Haindrich >Assignee: Zoltan Haindrich >Priority: Major > > In case multiple clients are adding partitions to the same table, and the > same partition is being added concurrently, there is a chance that the data dir is removed > after the other client has already written its data: > https://github.com/apache/hive/blob/5e96b14a2357c66a0640254d5414bc706d8be852/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java#L3958 -- This message was sent by Atlassian Jira (v8.3.4#803005)
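The hazard described above is a classic check-then-act race: the losing client rolls back by deleting the partition directory after the winning client has already written into it. A minimal in-memory model of the safer rule (only roll back a directory you created yourself) is sketched below. All names are illustrative; this is not the HiveMetaStore code.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy model of concurrent add_partitions. The "metastore" is a set of
// registered partitions and the "warehouse" maps directories to files.
public class AddPartitionRace {
    public final Set<String> metastore = new HashSet<>();
    public final Map<String, List<String>> warehouse = new HashMap<>();

    /** Returns true if this client successfully registered the partition. */
    public boolean addPartition(String dir, String part, List<String> files) {
        boolean madeDir = !warehouse.containsKey(dir);
        warehouse.computeIfAbsent(dir, k -> new ArrayList<>()).addAll(files);
        boolean registered = metastore.add(part); // false if already registered
        if (!registered && madeDir) {
            // Roll back only a directory this client created itself.
            // Deleting the directory unconditionally here is the data-loss
            // hazard: it would wipe the files the winning client wrote.
            warehouse.remove(dir);
        }
        return registered;
    }
}
```

Even with this guard, a window remains between creating the directory and registering the partition, so the real fix likely needs the metastore registration to act as the arbiter before any cleanup is attempted.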
[jira] [Work logged] (HIVE-24410) Query-based compaction hangs because of doAs
[ https://issues.apache.org/jira/browse/HIVE-24410?focusedWorklogId=516677&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-516677 ] ASF GitHub Bot logged work on HIVE-24410: - Author: ASF GitHub Bot Created on: 25/Nov/20 13:27 Start Date: 25/Nov/20 13:27 Worklog Time Spent: 10m Work Description: pvary commented on a change in pull request #1693: URL: https://github.com/apache/hive/pull/1693#discussion_r530372646 ## File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Worker.java ## @@ -531,29 +531,26 @@ protected Boolean findNextCompactionAndExecute(boolean computeStats) throws Inte final StatsUpdater su = computeStats ? StatsUpdater.init(ci, msc.findColumnsWithStats( CompactionInfo.compactionInfoToStruct(ci)), conf, runJobAsSelf(ci.runAs) ? ci.runAs : t.getOwner()) : null; - final CompactorMR mr = new CompactorMR(); + try { -if (runJobAsSelf(ci.runAs)) { - mr.run(conf, jobName.toString(), t, p, sd, tblValidWriteIds, ci, su, msc, dir); +failCompactionIfSetForTest(); Review comment: Is this test code? Could we find another way to test this? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 516677) Time Spent: 2h 10m (was: 2h) > Query-based compaction hangs because of doAs > > > Key: HIVE-24410 > URL: https://issues.apache.org/jira/browse/HIVE-24410 > Project: Hive > Issue Type: Bug >Reporter: Karen Coppage >Assignee: Karen Coppage >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 2h 10m > Remaining Estimate: 0h > > QB compaction runs within a doas +and+ hive.server2.enable.doAs is set to > true (as of HIVE-24089). 
On a secure cluster with Worker threads running in > HS2, this results in HMS client not receiving a login context during > compaction queries, so kerberos prompts for a login via stdin which causes > the worker thread to hang until it times out: > {code:java} > "node-x.com-44_executor" #47 daemon prio=1 os_prio=0 tid=0x01506000 > nid=0x1348 runnable [0x7f1beea95000] >java.lang.Thread.State: RUNNABLE > at java.io.FileInputStream.readBytes(Native Method) > at java.io.FileInputStream.read(FileInputStream.java:255) > at java.io.BufferedInputStream.read1(BufferedInputStream.java:284) > at java.io.BufferedInputStream.read(BufferedInputStream.java:345) > - locked <0x9fa38c90> (a java.io.BufferedInputStream) > at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:284) > at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:326) > at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178) > - locked <0x8c7d5010> (a java.io.InputStreamReader) > at java.io.InputStreamReader.read(InputStreamReader.java:184) > at java.io.BufferedReader.fill(BufferedReader.java:161) > at java.io.BufferedReader.readLine(BufferedReader.java:324) > - locked <0x8c7d5010> (a java.io.InputStreamReader) > at java.io.BufferedReader.readLine(BufferedReader.java:389) > at > com.sun.security.auth.callback.TextCallbackHandler.readLine(TextCallbackHandler.java:153) > at > com.sun.security.auth.callback.TextCallbackHandler.handle(TextCallbackHandler.java:120) > at > com.sun.security.auth.module.Krb5LoginModule.promptForName(Krb5LoginModule.java:862) > at > com.sun.security.auth.module.Krb5LoginModule.attemptAuthentication(Krb5LoginModule.java:708) > at > com.sun.security.auth.module.Krb5LoginModule.login(Krb5LoginModule.java:617) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at 
java.lang.reflect.Method.invoke(Method.java:498) > at > javax.security.auth.login.LoginContext.invoke(LoginContext.java:755) > at > javax.security.auth.login.LoginContext.access$000(LoginContext.java:195) > at javax.security.auth.login.LoginContext$4.run(LoginContext.java:682) > at javax.security.auth.login.LoginContext$4.run(LoginContext.java:680) > at java.security.AccessController.doPrivileged(Native Method) > a
[jira] [Work logged] (HIVE-24274) Implement Query Text based MaterializedView rewrite
[ https://issues.apache.org/jira/browse/HIVE-24274?focusedWorklogId=516640&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-516640 ] ASF GitHub Bot logged work on HIVE-24274: - Author: ASF GitHub Bot Created on: 25/Nov/20 12:22 Start Date: 25/Nov/20 12:22 Worklog Time Spent: 10m Work Description: kasakrisz opened a new pull request #1706: URL: https://github.com/apache/hive/pull/1706 ### What changes were proposed in this pull request? * Add feature: Enable materialized view rewrite of a query if the query text is the same as the query defined in the materialized view. * Enable unparsing for all queries in order to generate the expanded query text for comparison. * Refactor and extend the `HiveMaterializedViewsRegistry` with the lookup by query text functionality. ### Why are the changes needed? This patch provides an alternative way to rewrite queries using materialized views. Materialized view query definitions have some limitations: for example, they can't contain the `UNION` or `SORT BY` operators. These are enabled when using the text based rewrite. ### Does this PR introduce _any_ user-facing change? In some cases rewrite was not possible because of the limitations mentioned above. With this patch the rewriting will be executed, and this affects the output of the `EXPLAIN` and `EXPLAIN CBO` commands: instead of the original query plan a scan on the materialized view will appear. ### How was this patch tested? ``` mvn test -DskipSparkTests -Dtest=TestMiniLlapLocalCliDriver -Dqfile=mv_rewrite_by_text.q,masking_14.q,masking_mv.q,schq_materialized.q,sketches_materialized_view_safety.q -pl itests/qtest -Pitests mvn test -Dtest=TestMaterializedViewsCache -pl ql ``` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 516640) Time Spent: 20m (was: 10m) > Implement Query Text based MaterializedView rewrite > --- > > Key: HIVE-24274 > URL: https://issues.apache.org/jira/browse/HIVE-24274 > Project: Hive > Issue Type: Improvement >Reporter: Krisztian Kasa >Assignee: Krisztian Kasa >Priority: Major > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > Besides the way queries are currently rewritten to use materialized views in > Hive, this project provides an alternative: > Compare the query text with the stored materialized views' query text. If we > find a match, the original query's logical plan can be replaced by a scan on > the materialized view. > - Only materialized views which are enabled to rewrite can participate > - Use the existing *HiveMaterializedViewsRegistry* through the *Hive* object by > adding a lookup method by query text. > - There might be more than one materialized view with the same query > text. In this case choose the first valid one. > - Validation can be done by calling > *Hive.validateMaterializedViewsFromRegistry()* > - The scope of this first patch is limited to rewriting queries whose entire > text can be matched. > - Use the expanded query text (fully qualified column and table names) for > comparison -- This message was sent by Atlassian Jira (v8.3.4#803005)
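The lookup-by-query-text idea described above can be sketched as a map keyed by the expanded query text, where a lookup returns the first registered view that still passes validation. This is a minimal illustration only; the class and method names are hypothetical, not the actual HiveMaterializedViewsRegistry API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Predicate;

// Sketch of a registry keyed by *expanded* query text (fully qualified
// column and table names), so textually identical queries hit the same entry.
public class MaterializedViewsCacheSketch {
    private final Map<String, List<String>> byQueryText = new ConcurrentHashMap<>();

    public void register(String expandedQueryText, String mvName) {
        byQueryText.computeIfAbsent(expandedQueryText, k -> new ArrayList<>()).add(mvName);
    }

    /** Returns the first valid materialized view matching the query text, if any. */
    public Optional<String> lookup(String expandedQueryText, Predicate<String> isValid) {
        return byQueryText.getOrDefault(expandedQueryText, List.of()).stream()
                .filter(isValid)       // e.g. backed by validateMaterializedViewsFromRegistry
                .findFirst();
    }
}
```

The validity predicate is where "choose the first valid one" comes in: several views can share a query text, and stale or disabled ones are skipped at lookup time.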
[jira] [Work logged] (HIVE-24410) Query-based compaction hangs because of doAs
[ https://issues.apache.org/jira/browse/HIVE-24410?focusedWorklogId=516639&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-516639 ] ASF GitHub Bot logged work on HIVE-24410: - Author: ASF GitHub Bot Created on: 25/Nov/20 12:20 Start Date: 25/Nov/20 12:20 Worklog Time Spent: 10m Work Description: klcopp commented on a change in pull request #1693: URL: https://github.com/apache/hive/pull/1693#discussion_r530332125 ## File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Worker.java ## @@ -590,6 +587,36 @@ public Object run() throws Exception { return true; } + private void failCompactionIfSetForTest() { +if(conf.getBoolVar(HiveConf.ConfVars.HIVE_IN_TEST) && conf.getBoolVar(HiveConf.ConfVars.HIVETESTMODEFAILCOMPACTION)) { + throw new RuntimeException(HiveConf.ConfVars.HIVETESTMODEFAILCOMPACTION.name() + "=true"); +} + } + + private void runCompactionViaMrJob(CompactionInfo ci, Table t, Partition p, StorageDescriptor sd, + ValidCompactorWriteIdList tblValidWriteIds, StringBuilder jobName, AcidUtils.Directory dir, StatsUpdater su) + throws IOException, HiveException, InterruptedException { +final CompactorMR mr = new CompactorMR(); +if (runJobAsSelf(ci.runAs)) { + mr.run(conf, jobName.toString(), t, p, sd, tblValidWriteIds, ci, su, msc, dir); +} else { + UserGroupInformation ugi = UserGroupInformation.createProxyUser(ci.runAs, + UserGroupInformation.getLoginUser()); + final Partition fp = p; Review comment: Done! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 516639) Time Spent: 2h (was: 1h 50m) > Query-based compaction hangs because of doAs > > > Key: HIVE-24410 > URL: https://issues.apache.org/jira/browse/HIVE-24410 > Project: Hive > Issue Type: Bug >Reporter: Karen Coppage >Assignee: Karen Coppage >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 2h > Remaining Estimate: 0h > > QB compaction runs within a doas +and+ hive.server2.enable.doAs is set to > true (as of HIVE-24089). On a secure cluster with Worker threads running in > HS2, this results in HMS client not receiving a login context during > compaction queries, so kerberos prompts for a login via stdin which causes > the worker thread to hang until it times out: > {code:java} > "node-x.com-44_executor" #47 daemon prio=1 os_prio=0 tid=0x01506000 > nid=0x1348 runnable [0x7f1beea95000] >java.lang.Thread.State: RUNNABLE > at java.io.FileInputStream.readBytes(Native Method) > at java.io.FileInputStream.read(FileInputStream.java:255) > at java.io.BufferedInputStream.read1(BufferedInputStream.java:284) > at java.io.BufferedInputStream.read(BufferedInputStream.java:345) > - locked <0x9fa38c90> (a java.io.BufferedInputStream) > at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:284) > at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:326) > at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178) > - locked <0x8c7d5010> (a java.io.InputStreamReader) > at java.io.InputStreamReader.read(InputStreamReader.java:184) > at java.io.BufferedReader.fill(BufferedReader.java:161) > at java.io.BufferedReader.readLine(BufferedReader.java:324) > - locked <0x8c7d5010> (a java.io.InputStreamReader) > at java.io.BufferedReader.readLine(BufferedReader.java:389) > at > com.sun.security.auth.callback.TextCallbackHandler.readLine(TextCallbackHandler.java:153) > at > 
com.sun.security.auth.callback.TextCallbackHandler.handle(TextCallbackHandler.java:120) > at > com.sun.security.auth.module.Krb5LoginModule.promptForName(Krb5LoginModule.java:862) > at > com.sun.security.auth.module.Krb5LoginModule.attemptAuthentication(Krb5LoginModule.java:708) > at > com.sun.security.auth.module.Krb5LoginModule.login(Krb5LoginModule.java:617) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > javax.security.auth.login.
[jira] [Updated] (HIVE-24383) Add Table type to HPL/SQL
[ https://issues.apache.org/jira/browse/HIVE-24383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Attila Magyar updated HIVE-24383: - Parent: HIVE-24427 Issue Type: Sub-task (was: Improvement) > Add Table type to HPL/SQL > - > > Key: HIVE-24383 > URL: https://issues.apache.org/jira/browse/HIVE-24383 > Project: Hive > Issue Type: Sub-task > Components: hpl/sql >Reporter: Attila Magyar >Assignee: Attila Magyar >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-24315) Improve validation and semantic analysis in HPL/SQL
[ https://issues.apache.org/jira/browse/HIVE-24315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Attila Magyar updated HIVE-24315: - Parent: HIVE-24427 Issue Type: Sub-task (was: Improvement) > Improve validation and semantic analysis in HPL/SQL > > > Key: HIVE-24315 > URL: https://issues.apache.org/jira/browse/HIVE-24315 > Project: Hive > Issue Type: Sub-task > Components: hpl/sql >Reporter: Attila Magyar >Assignee: Attila Magyar >Priority: Major > > There are some known issues that need to be fixed. For example it seems that > the arity of a function is not checked when calling it, and the same is true for > parameter types. Calling an undefined function evaluates to null, and > sometimes it seems that incorrect syntax is silently ignored. > In cases like this a helpful error message would be expected, though we > should also consider how PL/SQL works and maintain compatibility. -- This message was sent by Atlassian Jira (v8.3.4#803005)
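The kind of call-site validation the issue asks for can be sketched as a single check: resolve the function by name, compare its declared parameter list with the call-site argument count, and produce a pointed diagnostic instead of silently evaluating to null. The names below are hypothetical, not HPL/SQL internals.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Illustrative arity check for a procedural-SQL interpreter: the map holds
// each declared function's parameter names.
public class ArityCheckSketch {
    /** Returns a diagnostic if the call is invalid, or empty if it is fine. */
    public static Optional<String> checkCall(Map<String, List<String>> declared,
                                             String name, int argCount) {
        List<String> params = declared.get(name);
        if (params == null) {
            return Optional.of("function " + name + " is not defined");
        }
        if (params.size() != argCount) {
            return Optional.of("function " + name + " expects " + params.size()
                    + " argument(s), got " + argCount);
        }
        return Optional.empty();
    }
}
```

Type checking of the arguments would follow the same pattern, with the map carrying declared parameter types instead of just names.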
[jira] [Updated] (HIVE-24346) Store HPL/SQL packages into HMS
[ https://issues.apache.org/jira/browse/HIVE-24346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Attila Magyar updated HIVE-24346: - Parent: HIVE-24427 Issue Type: Sub-task (was: New Feature) > Store HPL/SQL packages into HMS > --- > > Key: HIVE-24346 > URL: https://issues.apache.org/jira/browse/HIVE-24346 > Project: Hive > Issue Type: Sub-task > Components: hpl/sql, Metastore >Reporter: Attila Magyar >Assignee: Attila Magyar >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-24217) HMS storage backend for HPL/SQL stored procedures
[ https://issues.apache.org/jira/browse/HIVE-24217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Attila Magyar updated HIVE-24217: - Parent: HIVE-24427 Issue Type: Sub-task (was: Bug) > HMS storage backend for HPL/SQL stored procedures > - > > Key: HIVE-24217 > URL: https://issues.apache.org/jira/browse/HIVE-24217 > Project: Hive > Issue Type: Sub-task > Components: Hive, hpl/sql, Metastore >Reporter: Attila Magyar >Assignee: Attila Magyar >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Attachments: HPL_SQL storedproc HMS storage.pdf > > Time Spent: 4h 50m > Remaining Estimate: 0h > > HPL/SQL procedures are currently stored in text files. The goal of this Jira > is to implement a Metastore backend for storing and loading these procedures. > This is an incremental step towards having fully capable stored procedures in > Hive. > > See the attached design for more information. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-24230) Integrate HPL/SQL into HiveServer2
[ https://issues.apache.org/jira/browse/HIVE-24230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Attila Magyar updated HIVE-24230: - Parent: HIVE-24427 Issue Type: Sub-task (was: Bug) > Integrate HPL/SQL into HiveServer2 > -- > > Key: HIVE-24230 > URL: https://issues.apache.org/jira/browse/HIVE-24230 > Project: Hive > Issue Type: Sub-task > Components: HiveServer2, hpl/sql >Reporter: Attila Magyar >Assignee: Attila Magyar >Priority: Major > Labels: pull-request-available > Time Spent: 2h 50m > Remaining Estimate: 0h > > HPL/SQL is a standalone command line program that can store and load scripts > from text files, or from Hive Metastore (since HIVE-24217). Currently HPL/SQL > depends on Hive and not the other way around. > Changing the dependency order between HPL/SQL and HiveServer would open up > some possibilities which are currently not feasible to implement. For example > one might want to use a third party SQL tool to run selects on stored > procedure (or rather function in this case) outputs. > {code:java} > SELECT * from myStoredProcedure(1, 2); {code} > HPL/SQL doesn’t have a JDBC interface and it’s not a daemon, so this would not > work with the current architecture. > Another important factor is performance. Declarative SQL commands are sent to > Hive via JDBC by HPL/SQL. The integration would make it possible to drop JDBC > and use HiveServer’s internal API for compilation and execution. > The third factor is that existing tools like Beeline or Hue cannot be used > with HPL/SQL since it has its own, separate CLI. > > To make it easier to implement, we keep things separate internally at > first, by introducing a Hive session level JDBC parameter. > {code:java} > jdbc:hive2://localhost:1/default;hplsqlMode=true {code} > > The hplsqlMode indicates that we are in procedural SQL mode where the user > can create and call stored procedures. HPLSQL allows you to write any kind of > procedural statement at the top level. 
This patch doesn't limit this, but it > might be better to eventually restrict what statements are allowed outside of > stored procedures. > > Since HPLSQL and Hive are running in the same process, there is no need to use > the JDBC driver between them. The patch adds an abstraction with 2 different > implementations, one for executing queries on JDBC (for keeping the existing > behaviour) and another one for directly calling Hive's compiler. In HPLSQL > mode the latter is used. > Internally, a new operation (HplSqlOperation) and operation type > (PROCEDURAL_SQL) were added, which work similarly to the SQLOperation but > use the hplsql interpreter to execute arbitrary scripts. This operation > might spawn new SQLOperations. > For example consider the following statement: > {code:java} > FOR i in 1..10 LOOP > SELECT * FROM table > END LOOP;{code} > We send this to Beeline while we're in hplsql mode. Hive will create a hplsql > interpreter and store it in the session state. A new HplSqlOperation is > created to run the script on the interpreter. > HPLSQL knows how to execute the for loop, but it will call Hive to run the > select expression. The HplSqlOperation is notified when the select reads a > row and accumulates the rows into a RowSet (memory consumption needs to be > considered here) which can be retrieved via thrift from the client side. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
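The control flow described for the FOR-loop example can be modeled in a few lines: the procedural interpreter owns the loop, each iteration hands a query off to the engine, and the rows come back into one accumulated result set for the client. This toy mirrors only the shape of that interaction; it is not the HplSqlOperation code, and the memory-growth caveat from the description applies to the accumulator here too.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Toy model: the interpreter drives the loop, the engine runs each query,
// and all rows are gathered into one result set for the client.
public class ProceduralLoopSketch {
    public static List<String> runForLoop(int from, int to,
                                          Function<Integer, List<String>> runQuery) {
        List<String> rowSet = new ArrayList<>(); // grows with the total result size
        for (int i = from; i <= to; i++) {
            rowSet.addAll(runQuery.apply(i));    // each iteration is a nested SQL operation
        }
        return rowSet;
    }
}
```

In the real design the `runQuery` role is played by the spawned SQLOperations, and the accumulated rows are what the client fetches over Thrift.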
[jira] [Assigned] (HIVE-24427) HPL/SQL improvements
[ https://issues.apache.org/jira/browse/HIVE-24427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Attila Magyar reassigned HIVE-24427: > HPL/SQL improvements > > > Key: HIVE-24427 > URL: https://issues.apache.org/jira/browse/HIVE-24427 > Project: Hive > Issue Type: Improvement > Components: hpl/sql >Reporter: Attila Magyar >Assignee: Attila Magyar >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24410) Query-based compaction hangs because of doAs
[ https://issues.apache.org/jira/browse/HIVE-24410?focusedWorklogId=516610&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-516610 ] ASF GitHub Bot logged work on HIVE-24410: - Author: ASF GitHub Bot Created on: 25/Nov/20 11:06 Start Date: 25/Nov/20 11:06 Worklog Time Spent: 10m Work Description: pvargacl commented on pull request #1693: URL: https://github.com/apache/hive/pull/1693#issuecomment-733638721 LGTM +1 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 516610) Time Spent: 1h 50m (was: 1h 40m) > Query-based compaction hangs because of doAs > > > Key: HIVE-24410 > URL: https://issues.apache.org/jira/browse/HIVE-24410 > Project: Hive > Issue Type: Bug >Reporter: Karen Coppage >Assignee: Karen Coppage >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 1h 50m > Remaining Estimate: 0h > > QB compaction runs within a doas +and+ hive.server2.enable.doAs is set to > true (as of HIVE-24089). 
On a secure cluster with Worker threads running in > HS2, this results in HMS client not receiving a login context during > compaction queries, so kerberos prompts for a login via stdin which causes > the worker thread to hang until it times out: > {code:java} > "node-x.com-44_executor" #47 daemon prio=1 os_prio=0 tid=0x01506000 > nid=0x1348 runnable [0x7f1beea95000] >java.lang.Thread.State: RUNNABLE > at java.io.FileInputStream.readBytes(Native Method) > at java.io.FileInputStream.read(FileInputStream.java:255) > at java.io.BufferedInputStream.read1(BufferedInputStream.java:284) > at java.io.BufferedInputStream.read(BufferedInputStream.java:345) > - locked <0x9fa38c90> (a java.io.BufferedInputStream) > at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:284) > at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:326) > at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178) > - locked <0x8c7d5010> (a java.io.InputStreamReader) > at java.io.InputStreamReader.read(InputStreamReader.java:184) > at java.io.BufferedReader.fill(BufferedReader.java:161) > at java.io.BufferedReader.readLine(BufferedReader.java:324) > - locked <0x8c7d5010> (a java.io.InputStreamReader) > at java.io.BufferedReader.readLine(BufferedReader.java:389) > at > com.sun.security.auth.callback.TextCallbackHandler.readLine(TextCallbackHandler.java:153) > at > com.sun.security.auth.callback.TextCallbackHandler.handle(TextCallbackHandler.java:120) > at > com.sun.security.auth.module.Krb5LoginModule.promptForName(Krb5LoginModule.java:862) > at > com.sun.security.auth.module.Krb5LoginModule.attemptAuthentication(Krb5LoginModule.java:708) > at > com.sun.security.auth.module.Krb5LoginModule.login(Krb5LoginModule.java:617) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at 
java.lang.reflect.Method.invoke(Method.java:498) > at > javax.security.auth.login.LoginContext.invoke(LoginContext.java:755) > at > javax.security.auth.login.LoginContext.access$000(LoginContext.java:195) > at javax.security.auth.login.LoginContext$4.run(LoginContext.java:682) > at javax.security.auth.login.LoginContext$4.run(LoginContext.java:680) > at java.security.AccessController.doPrivileged(Native Method) > at > javax.security.auth.login.LoginContext.invokePriv(LoginContext.java:680) > at javax.security.auth.login.LoginContext.login(LoginContext.java:587) > at sun.security.jgss.GSSUtil.login(GSSUtil.java:258) > at sun.security.jgss.krb5.Krb5Util.getInitialTicket(Krb5Util.java:175) > at > sun.security.jgss.krb5.Krb5InitCredential$1.run(Krb5InitCredential.java:341) > at > sun.security.jgss.krb5.Krb5InitCredential$1.run(Krb5InitCredential.java:337) > at java.security.AccessController.doPrivileged(Native Method) > at > sun.security.jgss.krb5.Krb5InitCredential.getTgt(Krb5InitCredential.java:336) > at > sun.security.jgss.krb5.Krb5InitCredential.getInstance(Krb5InitCredentia
[jira] [Work logged] (HIVE-24403) change min_history_level schema change to be compatible with previous version
[ https://issues.apache.org/jira/browse/HIVE-24403?focusedWorklogId=516609&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-516609 ] ASF GitHub Bot logged work on HIVE-24403: - Author: ASF GitHub Bot Created on: 25/Nov/20 10:58 Start Date: 25/Nov/20 10:58 Worklog Time Spent: 10m Work Description: deniskuzZ commented on a change in pull request #1688: URL: https://github.com/apache/hive/pull/1688#discussion_r530284495 ## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/DatabaseProduct.java ## @@ -186,6 +186,19 @@ public boolean isDeadlock(SQLException e) { || e.getMessage().contains("can't serialize access for this transaction"; } + /** + * Is the given exception a table not found exception + * @param e Exception + * @return + */ + public boolean isTableNotExists(SQLException e) { Review comment: maybe rename to `isTableNotExistsError` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 516609) Time Spent: 3h (was: 2h 50m) > change min_history_level schema change to be compatible with previous version > - > > Key: HIVE-24403 > URL: https://issues.apache.org/jira/browse/HIVE-24403 > Project: Hive > Issue Type: Improvement > Components: Metastore >Reporter: Peter Varga >Assignee: Peter Varga >Priority: Major > Labels: pull-request-available > Time Spent: 3h > Remaining Estimate: 0h > > In some configurations the HMS backend DB is used by HMS services with > different versions. > HIVE-23107 dropped the min_history_level table from the backend DB making > the new schema version incompatible with the older HMS services. 
> It is possible to modify that change to keep the compatibility:
> * Keep the min_history_level table
> * Add the new fields for the compaction_queue the same way
> * Create a feature flag for min_history_level and if it is on:
> * Keep the logic inserting to the table during openTxn
> * Keep the logic removing the records at commitTxn and abortTxn
> * Change the logic in the cleaner, to get the highwatermark the old way
> * But still change it to not start the cleaning before that
> * The txn_to_write_id table cleaning can work the new way in the new version and the old way in the old version
> * This feature flag can be set up automatically based on the existence of the min_history_level table; this way, if the table is dropped, all HMS-s can switch to the new functionality without a restart
-- This message was sent by Atlassian Jira (v8.3.4#803005)
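The last point above — auto-detecting the feature flag from the existence of the MIN_HISTORY_LEVEL table — can be sketched roughly as follows. This is a hypothetical illustration only, using Python's sqlite3 in place of the HMS backend DB; the flag name follows the patch's `useMinHistoryLevel`, and the probe query is an assumption, not the actual HMS code:

```python
import sqlite3

def use_min_history_level(conn):
    # Hypothetical sketch: enable the legacy code path only while the
    # MIN_HISTORY_LEVEL table still exists in the backend DB.
    try:
        conn.execute('SELECT 1 FROM "MIN_HISTORY_LEVEL" LIMIT 1')
        return True
    except sqlite3.OperationalError:  # "no such table" -> table was dropped
        return False

conn = sqlite3.connect(":memory:")
before = use_min_history_level(conn)   # table absent: flag off
conn.execute('CREATE TABLE "MIN_HISTORY_LEVEL" ("MHL_MIN_OPEN_TXNID" INTEGER)')
after = use_min_history_level(conn)    # table present: flag on
```

Because the probe runs against the live schema, every HMS instance re-evaluates the flag the same way, which is what lets them all switch behavior once the table is eventually dropped.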
[jira] [Work logged] (HIVE-24249) Create View fails if a materialized view exists with the same query
[ https://issues.apache.org/jira/browse/HIVE-24249?focusedWorklogId=516606&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-516606 ] ASF GitHub Bot logged work on HIVE-24249: - Author: ASF GitHub Bot Created on: 25/Nov/20 10:50 Start Date: 25/Nov/20 10:50 Worklog Time Spent: 10m Work Description: kasakrisz merged pull request #1696: URL: https://github.com/apache/hive/pull/1696 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 516606) Time Spent: 20m (was: 10m)
> Create View fails if a materialized view exists with the same query
> ---
>
> Key: HIVE-24249
> URL: https://issues.apache.org/jira/browse/HIVE-24249
> Project: Hive
> Issue Type: Bug
> Reporter: Krisztian Kasa
> Assignee: Krisztian Kasa
> Priority: Major
> Labels: pull-request-available
> Time Spent: 20m
> Remaining Estimate: 0h
>
> {code:java}
> create table t1(col0 int) STORED AS ORC
> TBLPROPERTIES ('transactional'='true');
> create materialized view mv1 as
> select * from t1 where col0 > 2;
> create view v1 as
> select sub.* from (select * from t1 where col0 > 2) sub
> where sub.col0 = 10;
> {code}
> The planner realizes that the view definition has a subquery which matches the materialized view query and replaces it with the materialized view scan.
> {code:java} > HiveProject($f0=[CAST(10):INTEGER]) > HiveFilter(condition=[=(10, $0)]) > HiveTableScan(table=[[default, mv1]], table:alias=[default.mv1]) > {code} > Then exception is thrown: > {code:java} > org.apache.hadoop.hive.ql.parse.SemanticException: View definition > references materialized view default.mv1 > at > org.apache.hadoop.hive.ql.ddl.view.create.CreateViewAnalyzer.validateCreateView(CreateViewAnalyzer.java:211) > at > org.apache.hadoop.hive.ql.ddl.view.create.CreateViewAnalyzer.analyzeInternal(CreateViewAnalyzer.java:99) > at > org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:301) > at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:223) > at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:104) > at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:174) > at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:415) > at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:364) > at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:358) > at > org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:125) > at > org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:229) > at > org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:258) > at org.apache.hadoop.hive.cli.CliDriver.processCmd1(CliDriver.java:203) > at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:129) > at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:424) > at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:355) > at > org.apache.hadoop.hive.ql.QTestUtil.executeClientInternal(QTestUtil.java:744) > at org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:714) > at > org.apache.hadoop.hive.cli.control.CoreCliDriver.runTest(CoreCliDriver.java:170) > at > org.apache.hadoop.hive.cli.control.CliAdapter.runTest(CliAdapter.java:157) > at > 
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver(TestMiniLlapLocalCliDriver.java:62) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at > org.apache.hadoop.hive.cli.control.CliAdapter$2$1.evaluate(CliAdapter.java:135) > at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306) > at
[jira] [Resolved] (HIVE-24249) Create View fails if a materialized view exists with the same query
[ https://issues.apache.org/jira/browse/HIVE-24249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krisztian Kasa resolved HIVE-24249. --- Resolution: Fixed Pushed to master, thanks [~jcamachorodriguez] for review. > Create View fails if a materialized view exists with the same query > --- > > Key: HIVE-24249 > URL: https://issues.apache.org/jira/browse/HIVE-24249 > Project: Hive > Issue Type: Bug >Reporter: Krisztian Kasa >Assignee: Krisztian Kasa >Priority: Major > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > {code:java} > create table t1(col0 int) STORED AS ORC > TBLPROPERTIES ('transactional'='true'); > create materialized view mv1 as > select * from t1 where col0 > 2; > create view v1 as > select sub.* from (select * from t1 where col0 > 2) sub > where sub.col0 = 10; > {code} > The planner realize that the view definition has a subquery which match the > materialized view query and replaces it to the materialized view scan. > {code:java} > HiveProject($f0=[CAST(10):INTEGER]) > HiveFilter(condition=[=(10, $0)]) > HiveTableScan(table=[[default, mv1]], table:alias=[default.mv1]) > {code} > Then exception is thrown: > {code:java} > org.apache.hadoop.hive.ql.parse.SemanticException: View definition > references materialized view default.mv1 > at > org.apache.hadoop.hive.ql.ddl.view.create.CreateViewAnalyzer.validateCreateView(CreateViewAnalyzer.java:211) > at > org.apache.hadoop.hive.ql.ddl.view.create.CreateViewAnalyzer.analyzeInternal(CreateViewAnalyzer.java:99) > at > org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:301) > at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:223) > at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:104) > at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:174) > at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:415) > at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:364) > at 
org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:358) > at > org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:125) > at > org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:229) > at > org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:258) > at org.apache.hadoop.hive.cli.CliDriver.processCmd1(CliDriver.java:203) > at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:129) > at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:424) > at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:355) > at > org.apache.hadoop.hive.ql.QTestUtil.executeClientInternal(QTestUtil.java:744) > at org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:714) > at > org.apache.hadoop.hive.cli.control.CoreCliDriver.runTest(CoreCliDriver.java:170) > at > org.apache.hadoop.hive.cli.control.CliAdapter.runTest(CliAdapter.java:157) > at > org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver(TestMiniLlapLocalCliDriver.java:62) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at > org.apache.hadoop.hive.cli.control.CliAdapter$2$1.evaluate(CliAdapter.java:135) > at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306) > at > org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100) > at 
org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63) > at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331) > at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79) > at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329) > at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66) > at or
[jira] [Work logged] (HIVE-24403) change min_history_level schema change to be compatible with previous version
[ https://issues.apache.org/jira/browse/HIVE-24403?focusedWorklogId=516604&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-516604 ] ASF GitHub Bot logged work on HIVE-24403: - Author: ASF GitHub Bot Created on: 25/Nov/20 10:46 Start Date: 25/Nov/20 10:46 Worklog Time Spent: 10m Work Description: deniskuzZ commented on a change in pull request #1688: URL: https://github.com/apache/hive/pull/1688#discussion_r530268290 ## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/CompactionTxnHandler.java ## @@ -1166,6 +1166,55 @@ public long findMinOpenTxnIdForCleaner() throws MetaException { } } + /** + * Returns the min txnid seen open by any active transaction + * @deprecated remove when min_history_level is dropped + * @return txnId + * @throws MetaException ex + */ + @Override + @RetrySemantics.Idempotent + @Deprecated + public long findMinTxnIdSeenOpen() throws MetaException { +if (!useMinHistoryLevel) { + return -1L; +} +Connection dbConn = null; +try { + try { +dbConn = getDbConn(Connection.TRANSACTION_READ_COMMITTED); +long minOpenTxn; +try (Statement stmt = dbConn.createStatement()) { + try (ResultSet rs = stmt.executeQuery("SELECT MIN(\"MHL_MIN_OPEN_TXNID\") FROM \"MIN_HISTORY_LEVEL\"")) { +if (!rs.next()) { + throw new IllegalStateException("Scalar query returned no rows?!"); Review comment: is this even possible? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 516604) Time Spent: 2h 50m (was: 2h 40m) > change min_history_level schema change to be compatible with previous version > - > > Key: HIVE-24403 > URL: https://issues.apache.org/jira/browse/HIVE-24403 > Project: Hive > Issue Type: Improvement > Components: Metastore >Reporter: Peter Varga >Assignee: Peter Varga >Priority: Major > Labels: pull-request-available > Time Spent: 2h 50m > Remaining Estimate: 0h > > In some configurations the HMS backend DB is used by HMS services with > different versions. > HIVE-23107 dropped the min_history_level table from the backend DB making > the new schema version incompatible with the older HMS services. > It is possible to modify that change to keep the compatibility > * Keep the min_history_level table > * Add the new fields for the compaction_queue the same way > * Create a feature flag for min_history_level and if it is on > * Keep the logic inserting to the table during openTxn > * Keep the logic removing the records at commitTxn and abortTxn > * Change the logic in the cleaner, to get the highwatermark the old way > * But still change it to not start the cleaning before that > * The txn_to_write_id table cleaning can work the new way in the new version > and the old way in the old version > * This feature flag can be automatically setup based on the existence of the > min_history level table, this way if the table will be dropped all HMS-s can > switch to the new functionality without restart -- This message was sent by Atlassian Jira (v8.3.4#803005)
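The reviewer's question above ("is this even possible?") about the `if (!rs.next())` branch is well-founded: an aggregate query without GROUP BY always returns exactly one row, even over an empty table — the MIN is simply NULL. A minimal demonstration, with Python's sqlite3 standing in for the metastore backend (an analogy, not the HMS code path):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE "MIN_HISTORY_LEVEL" ("MHL_MIN_OPEN_TXNID" INTEGER)')

# An aggregate without GROUP BY yields exactly one row even on an empty
# table, so a scalar MIN query can never return zero rows; the minimum
# over no rows is reported as NULL.
rows = conn.execute(
    'SELECT MIN("MHL_MIN_OPEN_TXNID") FROM "MIN_HISTORY_LEVEL"').fetchall()
# rows == [(None,)]: one row, NULL minimum
```

This suggests the defensive IllegalStateException is unreachable in practice, though keeping it costs nothing.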
[jira] [Work logged] (HIVE-24403) change min_history_level schema change to be compatible with previous version
[ https://issues.apache.org/jira/browse/HIVE-24403?focusedWorklogId=516600&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-516600 ] ASF GitHub Bot logged work on HIVE-24403: - Author: ASF GitHub Bot Created on: 25/Nov/20 10:32 Start Date: 25/Nov/20 10:32 Worklog Time Spent: 10m Work Description: deniskuzZ commented on a change in pull request #1688: URL: https://github.com/apache/hive/pull/1688#discussion_r530268290 ## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/CompactionTxnHandler.java ## @@ -1166,6 +1166,55 @@ public long findMinOpenTxnIdForCleaner() throws MetaException { } } + /** + * Returns the min txnid seen open by any active transaction + * @deprecated remove when min_history_level is dropped + * @return txnId + * @throws MetaException ex + */ + @Override + @RetrySemantics.Idempotent + @Deprecated + public long findMinTxnIdSeenOpen() throws MetaException { +if (!useMinHistoryLevel) { + return -1L; +} +Connection dbConn = null; +try { + try { +dbConn = getDbConn(Connection.TRANSACTION_READ_COMMITTED); +long minOpenTxn; +try (Statement stmt = dbConn.createStatement()) { + try (ResultSet rs = stmt.executeQuery("SELECT MIN(\"MHL_MIN_OPEN_TXNID\") FROM \"MIN_HISTORY_LEVEL\"")) { +if (!rs.next()) { + throw new IllegalStateException("Scalar query returned no rows?!"); Review comment: is this even possible? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 516600) Time Spent: 2h 40m (was: 2.5h) > change min_history_level schema change to be compatible with previous version > - > > Key: HIVE-24403 > URL: https://issues.apache.org/jira/browse/HIVE-24403 > Project: Hive > Issue Type: Improvement > Components: Metastore >Reporter: Peter Varga >Assignee: Peter Varga >Priority: Major > Labels: pull-request-available > Time Spent: 2h 40m > Remaining Estimate: 0h > > In some configurations the HMS backend DB is used by HMS services with > different versions. > HIVE-23107 dropped the min_history_level table from the backend DB making > the new schema version incompatible with the older HMS services. > It is possible to modify that change to keep the compatibility > * Keep the min_history_level table > * Add the new fields for the compaction_queue the same way > * Create a feature flag for min_history_level and if it is on > * Keep the logic inserting to the table during openTxn > * Keep the logic removing the records at commitTxn and abortTxn > * Change the logic in the cleaner, to get the highwatermark the old way > * But still change it to not start the cleaning before that > * The txn_to_write_id table cleaning can work the new way in the new version > and the old way in the old version > * This feature flag can be automatically setup based on the existence of the > min_history level table, this way if the table will be dropped all HMS-s can > switch to the new functionality without restart -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24403) change min_history_level schema change to be compatible with previous version
[ https://issues.apache.org/jira/browse/HIVE-24403?focusedWorklogId=516599&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-516599 ] ASF GitHub Bot logged work on HIVE-24403: - Author: ASF GitHub Bot Created on: 25/Nov/20 10:31 Start Date: 25/Nov/20 10:31 Worklog Time Spent: 10m Work Description: deniskuzZ commented on a change in pull request #1688: URL: https://github.com/apache/hive/pull/1688#discussion_r530238192 ## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnDbUtil.java ## @@ -232,15 +232,23 @@ public static void cleanDb(Configuration conf) throws Exception { success &= truncateTable(conn, conf, stmt, "WRITE_SET"); success &= truncateTable(conn, conf, stmt, "REPL_TXN_MAP"); success &= truncateTable(conn, conf, stmt, "MATERIALIZATION_REBUILD_LOCKS"); + success &= truncateTable(conn, conf, stmt, "MIN_HISTORY_LEVEL"); try { -resetTxnSequence(conn, conf, stmt); -stmt.executeUpdate("INSERT INTO \"NEXT_LOCK_ID\" VALUES(1)"); -stmt.executeUpdate("INSERT INTO \"NEXT_COMPACTION_QUEUE_ID\" VALUES(1)"); - } catch (SQLException e) { -if (!getTableNotExistsErrorCodes().contains(e.getSQLState())) { - LOG.error("Error initializing sequence values", e); - success = false; +String dbProduct = conn.getMetaData().getDatabaseProductName(); +DatabaseProduct databaseProduct = determineDatabaseProduct(dbProduct, conf); +try { + resetTxnSequence(databaseProduct, stmt); + stmt.executeUpdate("INSERT INTO \"NEXT_LOCK_ID\" VALUES(1)"); + stmt.executeUpdate("INSERT INTO \"NEXT_COMPACTION_QUEUE_ID\" VALUES(1)"); +} catch (SQLException e) { + if (!databaseProduct.isTableNotExists(e)) { Review comment: Previous version was much more readable and concise. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 516599) Time Spent: 2.5h (was: 2h 20m) > change min_history_level schema change to be compatible with previous version > - > > Key: HIVE-24403 > URL: https://issues.apache.org/jira/browse/HIVE-24403 > Project: Hive > Issue Type: Improvement > Components: Metastore >Reporter: Peter Varga >Assignee: Peter Varga >Priority: Major > Labels: pull-request-available > Time Spent: 2.5h > Remaining Estimate: 0h > > In some configurations the HMS backend DB is used by HMS services with > different versions. > HIVE-23107 dropped the min_history_level table from the backend DB making > the new schema version incompatible with the older HMS services. > It is possible to modify that change to keep the compatibility > * Keep the min_history_level table > * Add the new fields for the compaction_queue the same way > * Create a feature flag for min_history_level and if it is on > * Keep the logic inserting to the table during openTxn > * Keep the logic removing the records at commitTxn and abortTxn > * Change the logic in the cleaner, to get the highwatermark the old way > * But still change it to not start the cleaning before that > * The txn_to_write_id table cleaning can work the new way in the new version > and the old way in the old version > * This feature flag can be automatically setup based on the existence of the > min_history level table, this way if the table will be dropped all HMS-s can > switch to the new functionality without restart -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24389) Trailing zeros of constant decimal numbers are removed
[ https://issues.apache.org/jira/browse/HIVE-24389?focusedWorklogId=516598&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-516598 ] ASF GitHub Bot logged work on HIVE-24389: - Author: ASF GitHub Bot Created on: 25/Nov/20 10:29 Start Date: 25/Nov/20 10:29 Worklog Time Spent: 10m Work Description: kasakrisz commented on a change in pull request #1676: URL: https://github.com/apache/hive/pull/1676#discussion_r530266544 ## File path: ql/src/test/results/clientpositive/llap/materialized_view_rewrite_window.q.out ## @@ -166,7 +166,7 @@ POSTHOOK: Input: arc_view@wealth A masked pattern was here CBO PLAN: HiveSortLimit(sort0=[$0], dir0=[ASC]) - HiveProject(quartile=[$0], total=[$1]) + HiveProject(quartile=[CAST($0):DECIMAL(12, 1)], total=[$1]) Review comment: The optimizer rewrites this query to use the materialized view `mv_tv_view_data_av1`. The plan of the mv with this patch is changed from
```
HiveTableScan(table=[[arc_view, mv_tv_view_data_av1]], table:alias=[mv_tv_view_data_av1])
```
to
```
LogicalProject(quartile=[CAST($0):DECIMAL(12, 1)], total=[$1])
  HiveTableScan(table=[[arc_view, mv_tv_view_data_av1]], table:alias=[mv_tv_view_data_av1])
```
The mv definition contains a constant value cast to Decimal `cast(1.5 as decimal(9,4))`:
```
create materialized view mv_tv_view_data_av1 stored as orc
TBLPROPERTIES ('transactional'='true') as
select t.quartile, max(t.total_views) total from wealth t2,
(select total_views `total_views`,
sum(cast(1.5 as decimal(9,4))) over (order by total_views) as quartile,
program from tv_view_data) t
where t.program=t2.watches group by quartile;
```
We need the project with the cast on top of the mv scan because the mv table schema is different from the query schema.
RowTypes after the patch:
```
viewscan rowType: RecordType(DECIMAL(12, 4) quartile, BIGINT total)
queryRel rowType: RecordType(DECIMAL(12, 1) quartile, BIGINT $f1)
```
before the patch:
```
viewscan rowType: RecordType(DECIMAL(12, 1) quartile, BIGINT total)
queryRel rowType: RecordType(DECIMAL(12, 1) quartile, BIGINT $f1)
```
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 516598) Time Spent: 1h 50m (was: 1h 40m)
> Trailing zeros of constant decimal numbers are removed
> --
>
> Key: HIVE-24389
> URL: https://issues.apache.org/jira/browse/HIVE-24389
> Project: Hive
> Issue Type: Bug
> Components: Types
> Reporter: Krisztian Kasa
> Assignee: Krisztian Kasa
> Priority: Major
> Labels: pull-request-available
> Time Spent: 1h 50m
> Remaining Estimate: 0h
>
> In some cases Hive removes trailing zeros of constant decimal numbers:
> {code}
> select cast(1.1 as decimal(22, 2))
> 1.1
> {code}
> In this case *WritableConstantHiveDecimalObjectInspector* is used and this object inspector takes its wrapped HiveDecimal scale instead of the scale specified in the wrapped typeinfo:
> {code}
> this = {WritableConstantHiveDecimalObjectInspector@14415}
> value = {HiveDecimalWritable@14426} "1.1"
> typeInfo = {DecimalTypeInfo@14421} "decimal(22,2)"
> {code}
> However, in case of an expression with an aggregate function *WritableHiveDecimalObjectInspector* is used:
> {code}
> select cast(sum(1.1) as decimal(22, 2))
> 1.10
> {code}
> {code}
> o = {HiveDecimalWritable@16633} "1.1"
> oi = {WritableHiveDecimalObjectInspector@16634}
> typeInfo = {DecimalTypeInfo@16640} "decimal(22,2)"
> {code}
> Casting the expressions to string:
> {code:java}
> select cast(cast(1.1 as decimal(22, 2)) as string), cast(cast(sum(1.1) as decimal(22, 2)) as string)
> 1.1 1.10
> {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
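The behavior the issue above expects — a cast to decimal(22, 2) keeping scale 2, so that 1.1 renders as 1.10 — can be illustrated with Python's Decimal. This is an analogy for the scale semantics only, not Hive's HiveDecimal implementation:

```python
from decimal import Decimal

def cast_to_scale(value, scale):
    # Illustrative stand-in for CAST(x AS DECIMAL(p, scale)): quantizing
    # to the target scale keeps trailing zeros in the result.
    return Decimal(value).quantize(Decimal(1).scaleb(-scale))

one_ten = cast_to_scale("1.1", 2)
# str(one_ten) is "1.10": scale 2 is preserved, trailing zero included
```

The bug is that the constant object inspector uses the wrapped value's own scale (1) rather than the typeinfo's declared scale (2), so it behaves like `Decimal("1.1")` with no quantize step.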
[jira] [Work logged] (HIVE-24403) change min_history_level schema change to be compatible with previous version
[ https://issues.apache.org/jira/browse/HIVE-24403?focusedWorklogId=516596&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-516596 ] ASF GitHub Bot logged work on HIVE-24403: - Author: ASF GitHub Bot Created on: 25/Nov/20 10:27 Start Date: 25/Nov/20 10:27 Worklog Time Spent: 10m Work Description: deniskuzZ commented on a change in pull request #1688: URL: https://github.com/apache/hive/pull/1688#discussion_r530250594 ## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java ## @@ -670,6 +725,8 @@ public OpenTxnsResponse openTxns(OpenTxnRequest rqst) throws MetaException { assert txnIds.size() == numTxns; + addTxnToMinHistoryLevel(dbConn, txnIds, minOpenTxnId); Review comment: why not to embed getMinOpenTxnIdWaterMark(dbConn) inside of addTxnToMinHistoryLevel and remove above minOpenTxnId block? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 516596) Time Spent: 2h 20m (was: 2h 10m) > change min_history_level schema change to be compatible with previous version > - > > Key: HIVE-24403 > URL: https://issues.apache.org/jira/browse/HIVE-24403 > Project: Hive > Issue Type: Improvement > Components: Metastore >Reporter: Peter Varga >Assignee: Peter Varga >Priority: Major > Labels: pull-request-available > Time Spent: 2h 20m > Remaining Estimate: 0h > > In some configurations the HMS backend DB is used by HMS services with > different versions. > HIVE-23107 dropped the min_history_level table from the backend DB making > the new schema version incompatible with the older HMS services. 
> It is possible to modify that change to keep the compatibility > * Keep the min_history_level table > * Add the new fields for the compaction_queue the same way > * Create a feature flag for min_history_level and if it is on > * Keep the logic inserting to the table during openTxn > * Keep the logic removing the records at commitTxn and abortTxn > * Change the logic in the cleaner, to get the highwatermark the old way > * But still change it to not start the cleaning before that > * The txn_to_write_id table cleaning can work the new way in the new version > and the old way in the old version > * This feature flag can be automatically setup based on the existence of the > min_history level table, this way if the table will be dropped all HMS-s can > switch to the new functionality without restart -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24245) Vectorized PTF with count and distinct over partition producing incorrect results.
[ https://issues.apache.org/jira/browse/HIVE-24245?focusedWorklogId=516595&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-516595 ] ASF GitHub Bot logged work on HIVE-24245: - Author: ASF GitHub Bot Created on: 25/Nov/20 10:26 Start Date: 25/Nov/20 10:26 Worklog Time Spent: 10m Work Description: pgaref commented on pull request #1649: URL: https://github.com/apache/hive/pull/1649#issuecomment-733617093 Thanks for the update @abstractdog ! +1 tests pending This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 516595) Time Spent: 1h 40m (was: 1.5h)
> Vectorized PTF with count and distinct over partition producing incorrect results.
> --
>
> Key: HIVE-24245
> URL: https://issues.apache.org/jira/browse/HIVE-24245
> Project: Hive
> Issue Type: Bug
> Components: Hive, PTF-Windowing, Vectorization
> Affects Versions: 3.1.0, 3.1.2
> Reporter: Chiran Ravani
> Assignee: László Bodor
> Priority: Critical
> Labels: pull-request-available
> Time Spent: 1h 40m
> Remaining Estimate: 0h
>
> Vectorized PTF for count and distinct over partition is broken. It produces incorrect results.
> Below is the test case.
> {code}
> CREATE TABLE bigd781b_new (
> id int,
> txt1 string,
> txt2 string,
> cda_date int,
> cda_job_name varchar(12));
> INSERT INTO bigd781b_new VALUES
> (1,'2010005759','7164335675012038',20200528,'load1'),
> (2,'2010005759','7164335675012038',20200528,'load2');
> {code}
> Running the query below produces incorrect results
> {code}
> SELECT
> txt1,
> txt2,
> count(distinct txt1) over(partition by txt1) as n,
> count(distinct txt2) over(partition by txt2) as m
> FROM bigd781b_new
> {code}
> as below.
> {code}
> +-------------+-------------------+----+----+
> |    txt1     |       txt2        | n  | m  |
> +-------------+-------------------+----+----+
> | 2010005759  | 7164335675012038  | 2  | 2  |
> | 2010005759  | 7164335675012038  | 2  | 2  |
> +-------------+-------------------+----+----+
> {code}
> While the correct output would be
> {code}
> +-------------+-------------------+----+----+
> |    txt1     |       txt2        | n  | m  |
> +-------------+-------------------+----+----+
> | 2010005759  | 7164335675012038  | 1  | 1  |
> | 2010005759  | 7164335675012038  | 1  | 1  |
> +-------------+-------------------+----+----+
> {code}
> The problem does not appear after setting the below property:
> set hive.vectorized.execution.ptf.enabled=false;
-- This message was sent by Atlassian Jira (v8.3.4#803005)
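The correct semantics described above — `count(distinct col) over (partition by col)` tags every row with the number of distinct values inside its partition, which is 1 here since the partitioning column is also the counted column — can be sketched outside Hive. A plain-Python illustration on the issue's test data (not Hive's vectorized PTF code):

```python
from collections import defaultdict

# Rows of bigd781b_new as (id, txt1, txt2)
rows = [
    (1, "2010005759", "7164335675012038"),
    (2, "2010005759", "7164335675012038"),
]

def count_distinct_over_partition(rows, col):
    # count(distinct c) over (partition by c): collect the distinct values
    # per partition key, then tag each row with its partition's count.
    distinct = defaultdict(set)
    for row in rows:
        distinct[row[col]].add(row[col])
    return [len(distinct[row[col]]) for row in rows]

n = count_distinct_over_partition(rows, 1)  # over txt1
m = count_distinct_over_partition(rows, 2)  # over txt2
```

Both columns yield [1, 1] — matching the "correct output" table in the issue, not the 2s the vectorized path produced.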
[jira] [Work logged] (HIVE-24403) change min_history_level schema change to be compatible with previous version
[ https://issues.apache.org/jira/browse/HIVE-24403?focusedWorklogId=516594&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-516594 ] ASF GitHub Bot logged work on HIVE-24403: - Author: ASF GitHub Bot Created on: 25/Nov/20 10:26 Start Date: 25/Nov/20 10:26 Worklog Time Spent: 10m Work Description: deniskuzZ commented on a change in pull request #1688: URL: https://github.com/apache/hive/pull/1688#discussion_r530250594 ## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java ## @@ -670,6 +725,8 @@ public OpenTxnsResponse openTxns(OpenTxnRequest rqst) throws MetaException { assert txnIds.size() == numTxns; + addTxnToMinHistoryLevel(dbConn, txnIds, minOpenTxnId); Review comment: why not to put this under above useMinHistoryLevel check? or even embed getMinOpenTxnIdWaterMark(dbConn) inside of addTxnToMinHistoryLevel? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 516594) Time Spent: 2h 10m (was: 2h) > change min_history_level schema change to be compatible with previous version > - > > Key: HIVE-24403 > URL: https://issues.apache.org/jira/browse/HIVE-24403 > Project: Hive > Issue Type: Improvement > Components: Metastore >Reporter: Peter Varga >Assignee: Peter Varga >Priority: Major > Labels: pull-request-available > Time Spent: 2h 10m > Remaining Estimate: 0h > > In some configurations the HMS backend DB is used by HMS services with > different versions. > HIVE-23107 dropped the min_history_level table from the backend DB making > the new schema version incompatible with the older HMS services. 
> It is possible to modify that change to keep the compatibility > * Keep the min_history_level table > * Add the new fields for the compaction_queue the same way > * Create a feature flag for min_history_level and if it is on > * Keep the logic inserting to the table during openTxn > * Keep the logic removing the records at commitTxn and abortTxn > * Change the logic in the cleaner, to get the highwatermark the old way > * But still change it to not start the cleaning before that > * The txn_to_write_id table cleaning can work the new way in the new version > and the old way in the old version > * This feature flag can be automatically setup based on the existence of the > min_history level table, this way if the table will be dropped all HMS-s can > switch to the new functionality without restart -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HIVE-23965) Improve plan regression tests using TPCDS30TB metastore dump and custom configs
[ https://issues.apache.org/jira/browse/HIVE-23965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltan Haindrich reassigned HIVE-23965: --- Assignee: Stamatis Zampetakis (was: Zoltan Haindrich) > Improve plan regression tests using TPCDS30TB metastore dump and custom > configs > --- > > Key: HIVE-23965 > URL: https://issues.apache.org/jira/browse/HIVE-23965 > Project: Hive > Issue Type: Improvement >Reporter: Stamatis Zampetakis >Assignee: Stamatis Zampetakis >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Attachments: master355.tgz > > Time Spent: 5h 20m > Remaining Estimate: 0h > > The existing regression tests (HIVE-12586) based on TPC-DS have certain > shortcomings: > The table statistics do not reflect cardinalities from a specific TPC-DS > scale factor (SF). Some tables are from a 30TB dataset, others from a 200GB > dataset, and others from a 3GB dataset. This mix leads to plans that may > never appear when using an actual TPC-DS dataset. > The existing statistics do not contain information about partitions, something > that can have a big impact on the resulting plans. > The existing regression tests rely more or less on the default > configuration (hive-site.xml). In real-life scenarios, though, some of the > configurations differ and may impact the choices of the optimizer. > This issue aims to address the above shortcomings by using a curated > TPCDS30TB metastore dump along with some custom hive configurations. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-23965) Improve plan regression tests using TPCDS30TB metastore dump and custom configs
[ https://issues.apache.org/jira/browse/HIVE-23965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltan Haindrich updated HIVE-23965: Attachment: master355.tgz > Improve plan regression tests using TPCDS30TB metastore dump and custom > configs > --- > > Key: HIVE-23965 > URL: https://issues.apache.org/jira/browse/HIVE-23965 > Project: Hive > Issue Type: Improvement >Reporter: Stamatis Zampetakis >Assignee: Zoltan Haindrich >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Attachments: master355.tgz > > Time Spent: 5h 20m > Remaining Estimate: 0h > > The existing regression tests (HIVE-12586) based on TPC-DS have certain > shortcomings: > The table statistics do not reflect cardinalities from a specific TPC-DS > scale factor (SF). Some tables are from a 30TB dataset, others from a 200GB > dataset, and others from a 3GB dataset. This mix leads to plans that may > never appear when using an actual TPC-DS dataset. > The existing statistics do not contain information about partitions, something > that can have a big impact on the resulting plans. > The existing regression tests rely more or less on the default > configuration (hive-site.xml). In real-life scenarios, though, some of the > configurations differ and may impact the choices of the optimizer. > This issue aims to address the above shortcomings by using a curated > TPCDS30TB metastore dump along with some custom hive configurations. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Reopened] (HIVE-23965) Improve plan regression tests using TPCDS30TB metastore dump and custom configs
[ https://issues.apache.org/jira/browse/HIVE-23965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltan Haindrich reopened HIVE-23965: - I've reverted the patch for now because it has exposed some issues with our test environment (master builds got stuck) > Improve plan regression tests using TPCDS30TB metastore dump and custom > configs > --- > > Key: HIVE-23965 > URL: https://issues.apache.org/jira/browse/HIVE-23965 > Project: Hive > Issue Type: Improvement >Reporter: Stamatis Zampetakis >Assignee: Stamatis Zampetakis >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Attachments: master355.tgz > > Time Spent: 5h 20m > Remaining Estimate: 0h > > The existing regression tests (HIVE-12586) based on TPC-DS have certain > shortcomings: > The table statistics do not reflect cardinalities from a specific TPC-DS > scale factor (SF). Some tables are from a 30TB dataset, others from a 200GB > dataset, and others from a 3GB dataset. This mix leads to plans that may > never appear when using an actual TPC-DS dataset. > The existing statistics do not contain information about partitions, something > that can have a big impact on the resulting plans. > The existing regression tests rely more or less on the default > configuration (hive-site.xml). In real-life scenarios, though, some of the > configurations differ and may impact the choices of the optimizer. > This issue aims to address the above shortcomings by using a curated > TPCDS30TB metastore dump along with some custom hive configurations. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HIVE-23965) Improve plan regression tests using TPCDS30TB metastore dump and custom configs
[ https://issues.apache.org/jira/browse/HIVE-23965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltan Haindrich reassigned HIVE-23965: --- Assignee: Zoltan Haindrich (was: Stamatis Zampetakis) > Improve plan regression tests using TPCDS30TB metastore dump and custom > configs > --- > > Key: HIVE-23965 > URL: https://issues.apache.org/jira/browse/HIVE-23965 > Project: Hive > Issue Type: Improvement >Reporter: Stamatis Zampetakis >Assignee: Zoltan Haindrich >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Attachments: master355.tgz > > Time Spent: 5h 20m > Remaining Estimate: 0h > > The existing regression tests (HIVE-12586) based on TPC-DS have certain > shortcomings: > The table statistics do not reflect cardinalities from a specific TPC-DS > scale factor (SF). Some tables are from a 30TB dataset, others from a 200GB > dataset, and others from a 3GB dataset. This mix leads to plans that may > never appear when using an actual TPC-DS dataset. > The existing statistics do not contain information about partitions, something > that can have a big impact on the resulting plans. > The existing regression tests rely more or less on the default > configuration (hive-site.xml). In real-life scenarios, though, some of the > configurations differ and may impact the choices of the optimizer. > This issue aims to address the above shortcomings by using a curated > TPCDS30TB metastore dump along with some custom hive configurations. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24245) Vectorized PTF with count and distinct over partition producing incorrect results.
[ https://issues.apache.org/jira/browse/HIVE-24245?focusedWorklogId=516592&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-516592 ] ASF GitHub Bot logged work on HIVE-24245: - Author: ASF GitHub Bot Created on: 25/Nov/20 10:21 Start Date: 25/Nov/20 10:21 Worklog Time Spent: 10m Work Description: pgaref commented on a change in pull request #1649: URL: https://github.com/apache/hive/pull/1649#discussion_r530261010 ## File path: ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFCount.java ## @@ -180,14 +181,13 @@ public void iterate(AggregationBuffer agg, Object[] parameters) if (((CountAgg) agg).uniqueObjects == null) { ((CountAgg) agg).uniqueObjects = new HashSet(); } - HashSet uniqueObjs = ((CountAgg) agg).uniqueObjects; + Set uniqueObjs = ((CountAgg) agg).uniqueObjects; ObjectInspectorObject obj = new ObjectInspectorObject( ObjectInspectorUtils.copyToStandardObject(parameters, inputOI, ObjectInspectorCopyOption.JAVA), outputOI); - if (!uniqueObjs.contains(obj)) { -uniqueObjs.add(obj); - } else { + boolean inserted = uniqueObjs.add(obj); + if (!inserted){ Review comment: Thanks for fixing this This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 516592) Time Spent: 1h 20m (was: 1h 10m) > Vectorized PTF with count and distinct over partition producing incorrect > results. > -- > > Key: HIVE-24245 > URL: https://issues.apache.org/jira/browse/HIVE-24245 > Project: Hive > Issue Type: Bug > Components: Hive, PTF-Windowing, Vectorization >Affects Versions: 3.1.0, 3.1.2 >Reporter: Chiran Ravani >Assignee: László Bodor >Priority: Critical > Labels: pull-request-available > Time Spent: 1h 20m > Remaining Estimate: 0h > > Vectorized PTF for count and distinct over partition is broken. 
It produces > incorrect results. > Below is the test case. > {code} > CREATE TABLE bigd781b_new ( > id int, > txt1 string, > txt2 string, > cda_date int, > cda_job_name varchar(12)); > INSERT INTO bigd781b_new VALUES > (1,'2010005759','7164335675012038',20200528,'load1'), > (2,'2010005759','7164335675012038',20200528,'load2'); > {code} > Running below query produces incorrect results > {code} > SELECT > txt1, > txt2, > count(distinct txt1) over(partition by txt1) as n, > count(distinct txt2) over(partition by txt2) as m > FROM bigd781b_new > {code} > as below. > {code} > +-+---+++ > |txt1 | txt2| n | m | > +-+---+++ > | 2010005759 | 7164335675012038 | 2 | 2 | > | 2010005759 | 7164335675012038 | 2 | 2 | > +-+---+++ > {code} > While the correct output would be > {code} > +-+---+++ > |txt1 | txt2| n | m | > +-+---+++ > | 2010005759 | 7164335675012038 | 1 | 1 | > | 2010005759 | 7164335675012038 | 1 | 1 | > +-+---+++ > {code} > The problem does not appear after setting below property > set hive.vectorized.execution.ptf.enabled=false; -- This message was sent by Atlassian Jira (v8.3.4#803005)
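[Editorial note] The expected window semantics from the reproduction above can be checked outside Hive with a small sketch. This is not Hive's vectorized PTF code; the class and method names are hypothetical, and the two rows mirror the bigd781b_new INSERT:

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

public class CountDistinctOverPartition {

    // count(distinct col0) over (partition by col0): for each partition key,
    // count the distinct values of col0 seen inside that partition
    static Map<String, Integer> countDistinctPerPartition(List<String[]> rows) {
        return rows.stream().collect(
                Collectors.groupingBy((String[] r) -> r[0],
                        Collectors.collectingAndThen(
                                Collectors.mapping((String[] r) -> r[0], Collectors.toSet()),
                                Set::size)));
    }

    public static void main(String[] args) {
        // the two (txt1, txt2) rows from the reproduction
        List<String[]> rows = List.of(
                new String[]{"2010005759", "7164335675012038"},
                new String[]{"2010005759", "7164335675012038"});

        // both rows fall into the same txt1 partition, which holds exactly
        // one distinct txt1 value, so the expected window result is 1
        System.out.println(countDistinctPerPartition(rows).get("2010005759")); // prints 1
    }
}
```

The sketch makes the bug concrete: the correct per-partition distinct count for these rows is 1, while the vectorized PTF path returned 2.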
[jira] [Work logged] (HIVE-24245) Vectorized PTF with count and distinct over partition producing incorrect results.
[ https://issues.apache.org/jira/browse/HIVE-24245?focusedWorklogId=516593&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-516593 ] ASF GitHub Bot logged work on HIVE-24245: - Author: ASF GitHub Bot Created on: 25/Nov/20 10:21 Start Date: 25/Nov/20 10:21 Worklog Time Spent: 10m Work Description: pgaref commented on a change in pull request #1649: URL: https://github.com/apache/hive/pull/1649#discussion_r530261010 ## File path: ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFCount.java ## @@ -180,14 +181,13 @@ public void iterate(AggregationBuffer agg, Object[] parameters) if (((CountAgg) agg).uniqueObjects == null) { ((CountAgg) agg).uniqueObjects = new HashSet(); } - HashSet uniqueObjs = ((CountAgg) agg).uniqueObjects; + Set uniqueObjs = ((CountAgg) agg).uniqueObjects; ObjectInspectorObject obj = new ObjectInspectorObject( ObjectInspectorUtils.copyToStandardObject(parameters, inputOI, ObjectInspectorCopyOption.JAVA), outputOI); - if (!uniqueObjs.contains(obj)) { -uniqueObjs.add(obj); - } else { + boolean inserted = uniqueObjs.add(obj); + if (!inserted){ Review comment: Thanks for taking care of this This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 516593) Time Spent: 1.5h (was: 1h 20m) > Vectorized PTF with count and distinct over partition producing incorrect > results. > -- > > Key: HIVE-24245 > URL: https://issues.apache.org/jira/browse/HIVE-24245 > Project: Hive > Issue Type: Bug > Components: Hive, PTF-Windowing, Vectorization >Affects Versions: 3.1.0, 3.1.2 >Reporter: Chiran Ravani >Assignee: László Bodor >Priority: Critical > Labels: pull-request-available > Time Spent: 1.5h > Remaining Estimate: 0h > > Vectorized PTF for count and distinct over partition is broken. 
It produces > incorrect results. > Below is the test case. > {code} > CREATE TABLE bigd781b_new ( > id int, > txt1 string, > txt2 string, > cda_date int, > cda_job_name varchar(12)); > INSERT INTO bigd781b_new VALUES > (1,'2010005759','7164335675012038',20200528,'load1'), > (2,'2010005759','7164335675012038',20200528,'load2'); > {code} > Running below query produces incorrect results > {code} > SELECT > txt1, > txt2, > count(distinct txt1) over(partition by txt1) as n, > count(distinct txt2) over(partition by txt2) as m > FROM bigd781b_new > {code} > as below. > {code} > +-+---+++ > |txt1 | txt2| n | m | > +-+---+++ > | 2010005759 | 7164335675012038 | 2 | 2 | > | 2010005759 | 7164335675012038 | 2 | 2 | > +-+---+++ > {code} > While the correct output would be > {code} > +-+---+++ > |txt1 | txt2| n | m | > +-+---+++ > | 2010005759 | 7164335675012038 | 1 | 1 | > | 2010005759 | 7164335675012038 | 1 | 1 | > +-+---+++ > {code} > The problem does not appear after setting below property > set hive.vectorized.execution.ptf.enabled=false; -- This message was sent by Atlassian Jira (v8.3.4#803005)
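[Editorial note] The GenericUDAFCount refactor in the diffs above relies on a standard java.util.Set contract: add returns false when the element is already present, so the separate contains() lookup can be dropped. A minimal self-contained illustration (the value is just one of the sample strings, not Hive data):

```java
import java.util.HashSet;
import java.util.Set;

public class SetAddIdiom {
    public static void main(String[] args) {
        Set<String> uniqueObjs = new HashSet<>();

        // First insert of a value: add returns true because the set changed
        boolean inserted = uniqueObjs.add("7164335675012038");
        System.out.println(inserted); // prints true

        // Second insert of the same value: add returns false (already present),
        // so the contains()-then-add() pair collapses into a single call
        inserted = uniqueObjs.add("7164335675012038");
        System.out.println(inserted); // prints false
    }
}
```

Besides being shorter, the single add() call does one hash lookup instead of two, which matters in a per-row aggregation loop.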
[jira] [Work logged] (HIVE-24403) change min_history_level schema change to be compatible with previous version
[ https://issues.apache.org/jira/browse/HIVE-24403?focusedWorklogId=516591&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-516591 ] ASF GitHub Bot logged work on HIVE-24403: - Author: ASF GitHub Bot Created on: 25/Nov/20 10:19 Start Date: 25/Nov/20 10:19 Worklog Time Spent: 10m Work Description: deniskuzZ commented on a change in pull request #1688: URL: https://github.com/apache/hive/pull/1688#discussion_r530253708 ## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java ## @@ -5094,6 +5153,99 @@ public void countOpenTxns() throws MetaException { } } + /** + * Add min history level entry for each generated txn record + * @param dbConn Connection + * @param txnIds new transaction ids + * @deprecated Remove this method when min_history_level table is dropped + * @throws SQLException ex + */ + @Deprecated + private void addTxnToMinHistoryLevel(Connection dbConn, List txnIds, long minOpenTxnId) throws SQLException { +if (!useMinHistoryLevel) { + return; +} +// Need to register minimum open txnid for current transactions into MIN_HISTORY table. +try (Statement stmt = dbConn.createStatement()) { + + List rows = txnIds.stream().map(txnId -> txnId + ", " + minOpenTxnId).collect(Collectors.toList()); + + // Insert transaction entries into MIN_HISTORY_LEVEL. 
+ List inserts = + sqlGenerator.createInsertValuesStmt("\"MIN_HISTORY_LEVEL\" (\"MHL_TXNID\", \"MHL_MIN_OPEN_TXNID\")", rows); + for (String insert : inserts) { +LOG.debug("Going to execute insert <" + insert + ">"); +stmt.execute(insert); + } + LOG.info("Added entries to MIN_HISTORY_LEVEL for current txns: (" + txnIds + ") with min_open_txn: " + minOpenTxnId); +} catch (SQLException e) { + if (dbProduct.isTableNotExists(e)) { +// If the table does not exists anymore, we disable the flag and start to work the new way +// This enables to switch to the new functionality without a restart +useMinHistoryLevel = false; Review comment: Are you covering the case that schema change doesn't force any restart? Lot's of code duplication, can you wrap needed methods with aspect? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 516591) Time Spent: 2h (was: 1h 50m) > change min_history_level schema change to be compatible with previous version > - > > Key: HIVE-24403 > URL: https://issues.apache.org/jira/browse/HIVE-24403 > Project: Hive > Issue Type: Improvement > Components: Metastore >Reporter: Peter Varga >Assignee: Peter Varga >Priority: Major > Labels: pull-request-available > Time Spent: 2h > Remaining Estimate: 0h > > In some configurations the HMS backend DB is used by HMS services with > different versions. > HIVE-23107 dropped the min_history_level table from the backend DB making > the new schema version incompatible with the older HMS services. 
> It is possible to modify that change to keep the compatibility: > * Keep the min_history_level table > * Add the new fields for the compaction_queue the same way > * Create a feature flag for min_history_level, and if it is on: > * Keep the logic inserting into the table during openTxn > * Keep the logic removing the records at commitTxn and abortTxn > * Change the logic in the cleaner to get the highwatermark the old way > * But still change it to not start the cleaning before that > * The txn_to_write_id table cleaning can work the new way in the new version > and the old way in the old version > * This feature flag can be set up automatically based on the existence of the > min_history_level table; this way, if the table is dropped, all HMS instances can > switch to the new functionality without a restart -- This message was sent by Atlassian Jira (v8.3.4#803005)
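[Editorial note] The fallback being reviewed above — disabling the min_history_level path when the backing table has been dropped, so every HMS instance switches behaviour without a restart — can be sketched as follows. This is an illustration, not the actual TxnHandler code: the class name, the boolean parameter standing in for a failed JDBC insert, and the "42S02" SQLState are all assumptions:

```java
import java.sql.SQLException;

public class MinHistoryFallbackSketch {
    // stand-in for the useMinHistoryLevel feature flag from the discussion
    private boolean useMinHistoryLevel = true;

    // Simulated insert path: if MIN_HISTORY_LEVEL no longer exists, flip the
    // flag off instead of propagating the error, so the new behaviour takes
    // over without a service restart.
    void addTxnToMinHistoryLevel(boolean tableDropped) throws SQLException {
        if (!useMinHistoryLevel) {
            return; // feature already disabled, nothing to record
        }
        try {
            if (tableDropped) {
                // stand-in for the JDBC insert failing; "42S02" is a common
                // "table not found" SQLState, used here purely for illustration
                throw new SQLException("MIN_HISTORY_LEVEL does not exist", "42S02");
            }
            // the real code would insert (MHL_TXNID, MHL_MIN_OPEN_TXNID) rows here
        } catch (SQLException e) {
            if ("42S02".equals(e.getSQLState())) {
                useMinHistoryLevel = false; // switch to the new behaviour
            } else {
                throw e; // unrelated failures still surface
            }
        }
    }

    boolean isUseMinHistoryLevel() {
        return useMinHistoryLevel;
    }

    public static void main(String[] args) throws SQLException {
        MinHistoryFallbackSketch handler = new MinHistoryFallbackSketch();
        handler.addTxnToMinHistoryLevel(true);   // table is gone: flag flips off
        System.out.println(handler.isUseMinHistoryLevel()); // prints false
    }
}
```

The key design point is that a missing table is treated as a signal rather than an error: once any instance observes it, that instance permanently stops using the old path.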
[jira] [Work logged] (HIVE-24403) change min_history_level schema change to be compatible with previous version
[ https://issues.apache.org/jira/browse/HIVE-24403?focusedWorklogId=516590&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-516590 ] ASF GitHub Bot logged work on HIVE-24403: - Author: ASF GitHub Bot Created on: 25/Nov/20 10:11 Start Date: 25/Nov/20 10:11 Worklog Time Spent: 10m Work Description: deniskuzZ commented on a change in pull request #1688: URL: https://github.com/apache/hive/pull/1688#discussion_r530253708 ## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java ## @@ -5094,6 +5153,99 @@ public void countOpenTxns() throws MetaException { } } + /** + * Add min history level entry for each generated txn record + * @param dbConn Connection + * @param txnIds new transaction ids + * @deprecated Remove this method when min_history_level table is dropped + * @throws SQLException ex + */ + @Deprecated + private void addTxnToMinHistoryLevel(Connection dbConn, List txnIds, long minOpenTxnId) throws SQLException { +if (!useMinHistoryLevel) { + return; +} +// Need to register minimum open txnid for current transactions into MIN_HISTORY table. +try (Statement stmt = dbConn.createStatement()) { + + List rows = txnIds.stream().map(txnId -> txnId + ", " + minOpenTxnId).collect(Collectors.toList()); + + // Insert transaction entries into MIN_HISTORY_LEVEL. 
+ List inserts = + sqlGenerator.createInsertValuesStmt("\"MIN_HISTORY_LEVEL\" (\"MHL_TXNID\", \"MHL_MIN_OPEN_TXNID\")", rows); + for (String insert : inserts) { +LOG.debug("Going to execute insert <" + insert + ">"); +stmt.execute(insert); + } + LOG.info("Added entries to MIN_HISTORY_LEVEL for current txns: (" + txnIds + ") with min_open_txn: " + minOpenTxnId); +} catch (SQLException e) { + if (dbProduct.isTableNotExists(e)) { +// If the table does not exists anymore, we disable the flag and start to work the new way +// This enables to switch to the new functionality without a restart +useMinHistoryLevel = false; Review comment: Why is this needed? Are you covering the case that schema change is done while old HMS is still running? Lot's of code duplication, can you wrap needed methods with aspect? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 516590) Time Spent: 1h 50m (was: 1h 40m) > change min_history_level schema change to be compatible with previous version > - > > Key: HIVE-24403 > URL: https://issues.apache.org/jira/browse/HIVE-24403 > Project: Hive > Issue Type: Improvement > Components: Metastore >Reporter: Peter Varga >Assignee: Peter Varga >Priority: Major > Labels: pull-request-available > Time Spent: 1h 50m > Remaining Estimate: 0h > > In some configurations the HMS backend DB is used by HMS services with > different versions. > HIVE-23107 dropped the min_history_level table from the backend DB making > the new schema version incompatible with the older HMS services. 
> It is possible to modify that change to keep the compatibility: > * Keep the min_history_level table > * Add the new fields for the compaction_queue the same way > * Create a feature flag for min_history_level, and if it is on: > * Keep the logic inserting into the table during openTxn > * Keep the logic removing the records at commitTxn and abortTxn > * Change the logic in the cleaner to get the highwatermark the old way > * But still change it to not start the cleaning before that > * The txn_to_write_id table cleaning can work the new way in the new version > and the old way in the old version > * This feature flag can be set up automatically based on the existence of the > min_history_level table; this way, if the table is dropped, all HMS instances can > switch to the new functionality without a restart -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24403) change min_history_level schema change to be compatible with previous version
[ https://issues.apache.org/jira/browse/HIVE-24403?focusedWorklogId=516589&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-516589 ] ASF GitHub Bot logged work on HIVE-24403: - Author: ASF GitHub Bot Created on: 25/Nov/20 10:10 Start Date: 25/Nov/20 10:10 Worklog Time Spent: 10m Work Description: deniskuzZ commented on a change in pull request #1688: URL: https://github.com/apache/hive/pull/1688#discussion_r530253708 ## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java ## @@ -5094,6 +5153,99 @@ public void countOpenTxns() throws MetaException { } } + /** + * Add min history level entry for each generated txn record + * @param dbConn Connection + * @param txnIds new transaction ids + * @deprecated Remove this method when min_history_level table is dropped + * @throws SQLException ex + */ + @Deprecated + private void addTxnToMinHistoryLevel(Connection dbConn, List txnIds, long minOpenTxnId) throws SQLException { +if (!useMinHistoryLevel) { + return; +} +// Need to register minimum open txnid for current transactions into MIN_HISTORY table. +try (Statement stmt = dbConn.createStatement()) { + + List rows = txnIds.stream().map(txnId -> txnId + ", " + minOpenTxnId).collect(Collectors.toList()); + + // Insert transaction entries into MIN_HISTORY_LEVEL. 
+ List inserts = + sqlGenerator.createInsertValuesStmt("\"MIN_HISTORY_LEVEL\" (\"MHL_TXNID\", \"MHL_MIN_OPEN_TXNID\")", rows); + for (String insert : inserts) { +LOG.debug("Going to execute insert <" + insert + ">"); +stmt.execute(insert); + } + LOG.info("Added entries to MIN_HISTORY_LEVEL for current txns: (" + txnIds + ") with min_open_txn: " + minOpenTxnId); +} catch (SQLException e) { + if (dbProduct.isTableNotExists(e)) { +// If the table does not exists anymore, we disable the flag and start to work the new way +// This enables to switch to the new functionality without a restart +useMinHistoryLevel = false; Review comment: Why is this needed? Are you covering the case that schema change is done while old HMS is still running? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 516589) Time Spent: 1h 40m (was: 1.5h) > change min_history_level schema change to be compatible with previous version > - > > Key: HIVE-24403 > URL: https://issues.apache.org/jira/browse/HIVE-24403 > Project: Hive > Issue Type: Improvement > Components: Metastore >Reporter: Peter Varga >Assignee: Peter Varga >Priority: Major > Labels: pull-request-available > Time Spent: 1h 40m > Remaining Estimate: 0h > > In some configurations the HMS backend DB is used by HMS services with > different versions. > HIVE-23107 dropped the min_history_level table from the backend DB making > the new schema version incompatible with the older HMS services. 
> It is possible to modify that change to keep the compatibility: > * Keep the min_history_level table > * Add the new fields for the compaction_queue the same way > * Create a feature flag for min_history_level, and if it is on: > * Keep the logic inserting into the table during openTxn > * Keep the logic removing the records at commitTxn and abortTxn > * Change the logic in the cleaner to get the highwatermark the old way > * But still change it to not start the cleaning before that > * The txn_to_write_id table cleaning can work the new way in the new version > and the old way in the old version > * This feature flag can be set up automatically based on the existence of the > min_history_level table; this way, if the table is dropped, all HMS instances can > switch to the new functionality without a restart -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24403) change min_history_level schema change to be compatible with previous version
[ https://issues.apache.org/jira/browse/HIVE-24403?focusedWorklogId=516585&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-516585 ] ASF GitHub Bot logged work on HIVE-24403: - Author: ASF GitHub Bot Created on: 25/Nov/20 10:05 Start Date: 25/Nov/20 10:05 Worklog Time Spent: 10m Work Description: deniskuzZ commented on a change in pull request #1688: URL: https://github.com/apache/hive/pull/1688#discussion_r530250594 ## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java ## @@ -670,6 +725,8 @@ public OpenTxnsResponse openTxns(OpenTxnRequest rqst) throws MetaException { assert txnIds.size() == numTxns; + addTxnToMinHistoryLevel(dbConn, txnIds, minOpenTxnId); Review comment: why not to put this under above useMinHistoryLevel check? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 516585) Time Spent: 1.5h (was: 1h 20m) > change min_history_level schema change to be compatible with previous version > - > > Key: HIVE-24403 > URL: https://issues.apache.org/jira/browse/HIVE-24403 > Project: Hive > Issue Type: Improvement > Components: Metastore >Reporter: Peter Varga >Assignee: Peter Varga >Priority: Major > Labels: pull-request-available > Time Spent: 1.5h > Remaining Estimate: 0h > > In some configurations the HMS backend DB is used by HMS services with > different versions. > HIVE-23107 dropped the min_history_level table from the backend DB making > the new schema version incompatible with the older HMS services. 
> It is possible to modify that change to keep the compatibility: > * Keep the min_history_level table > * Add the new fields for the compaction_queue the same way > * Create a feature flag for min_history_level, and if it is on: > * Keep the logic inserting into the table during openTxn > * Keep the logic removing the records at commitTxn and abortTxn > * Change the logic in the cleaner to get the highwatermark the old way > * But still change it to not start the cleaning before that > * The txn_to_write_id table cleaning can work the new way in the new version > and the old way in the old version > * This feature flag can be set up automatically based on the existence of the > min_history_level table; this way, if the table is dropped, all HMS instances can > switch to the new functionality without a restart -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24403) change min_history_level schema change to be compatible with previous version
[ https://issues.apache.org/jira/browse/HIVE-24403?focusedWorklogId=516583&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-516583 ] ASF GitHub Bot logged work on HIVE-24403: - Author: ASF GitHub Bot Created on: 25/Nov/20 10:01 Start Date: 25/Nov/20 10:01 Worklog Time Spent: 10m Work Description: deniskuzZ commented on a change in pull request #1688: URL: https://github.com/apache/hive/pull/1688#discussion_r530247484 ## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java ## @@ -390,6 +404,42 @@ public void setConf(Configuration conf){ } } + /** + * Check if min_history_level table is usable + * @return + * @throws MetaException + */ + private boolean checkMinHistoryLevelTable(boolean configValue) throws MetaException { +if (!configValue) { + // don't check it if disabled + return false; +} +Connection dbConn = null; +boolean tableExists = true; +try { + dbConn = getDbConn(Connection.TRANSACTION_READ_COMMITTED); + try (Statement stmt = dbConn.createStatement()) { +// Dummy query to see if table exists +try (ResultSet rs = stmt.executeQuery("SELECT MIN(\"MHL_MIN_OPEN_TXNID\") FROM \"MIN_HISTORY_LEVEL\"")) { Review comment: you can just select 1 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 516583) Time Spent: 1h 20m (was: 1h 10m) > change min_history_level schema change to be compatible with previous version > - > > Key: HIVE-24403 > URL: https://issues.apache.org/jira/browse/HIVE-24403 > Project: Hive > Issue Type: Improvement > Components: Metastore >Reporter: Peter Varga >Assignee: Peter Varga >Priority: Major > Labels: pull-request-available > Time Spent: 1h 20m > Remaining Estimate: 0h > > In some configurations the HMS backend DB is used by HMS services with > different versions. > HIVE-23107 dropped the min_history_level table from the backend DB making > the new schema version incompatible with the older HMS services. > It is possible to modify that change to keep the compatibility > * Keep the min_history_level table > * Add the new fields for the compaction_queue the same way > * Create a feature flag for min_history_level and if it is on > * Keep the logic inserting to the table during openTxn > * Keep the logic removing the records at commitTxn and abortTxn > * Change the logic in the cleaner, to get the highwatermark the old way > * But still change it to not start the cleaning before that > * The txn_to_write_id table cleaning can work the new way in the new version > and the old way in the old version > * This feature flag can be automatically setup based on the existence of the > min_history level table, this way if the table will be dropped all HMS-s can > switch to the new functionality without restart -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24403) change min_history_level schema change to be compatible with previous version
[ https://issues.apache.org/jira/browse/HIVE-24403?focusedWorklogId=516579&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-516579 ] ASF GitHub Bot logged work on HIVE-24403: - Author: ASF GitHub Bot Created on: 25/Nov/20 09:54 Start Date: 25/Nov/20 09:54 Worklog Time Spent: 10m Work Description: deniskuzZ commented on a change in pull request #1688: URL: https://github.com/apache/hive/pull/1688#discussion_r530242536 ## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnDbUtil.java ## @@ -385,6 +391,26 @@ public static String queryToString(Configuration conf, String query, boolean inc return sb.toString(); } + /** + * This is only for testing, it does not use the connectionPool from TxnHandler! + * @param conf + * @param query + * @throws Exception + */ + @VisibleForTesting + public static void executeUpdate(Configuration conf, String query) Review comment: That's not a test class. Production code becomes massive because of that. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 516579) Time Spent: 1h 10m (was: 1h) > change min_history_level schema change to be compatible with previous version > - > > Key: HIVE-24403 > URL: https://issues.apache.org/jira/browse/HIVE-24403 > Project: Hive > Issue Type: Improvement > Components: Metastore >Reporter: Peter Varga >Assignee: Peter Varga >Priority: Major > Labels: pull-request-available > Time Spent: 1h 10m > Remaining Estimate: 0h > > In some configurations the HMS backend DB is used by HMS services with > different versions. > HIVE-23107 dropped the min_history_level table from the backend DB making > the new schema version incompatible with the older HMS services. 
> It is possible to modify that change to keep compatibility: > * Keep the min_history_level table > * Add the new fields for the compaction_queue the same way > * Create a feature flag for min_history_level, and if it is on: > * Keep the logic that inserts into the table during openTxn > * Keep the logic that removes the records at commitTxn and abortTxn > * Change the logic in the cleaner to get the high watermark the old way > * But still change it to not start the cleaning before that > * The txn_to_write_id table cleaning can work the new way in the new version > and the old way in the old version > * This feature flag can be automatically set up based on the existence of the > min_history_level table; this way, if the table is dropped, all HMS instances can > switch to the new functionality without a restart -- This message was sent by Atlassian Jira (v8.3.4#803005)
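The auto-configured feature flag in the last bullet reduces to a small decision rule. The sketch below uses hypothetical names, not Hive's actual configuration API: an explicit metastore setting wins, and when it is unset the flag simply follows whether the MIN_HISTORY_LEVEL table still exists, so dropping the table flips every HMS to the new behavior without a restart.

```java
import java.util.Optional;

// Hypothetical sketch of the auto-configured feature flag described above.
public class MinHistoryLevelFlag {

    // configured: an explicit setting, if any; tableExists: whether
    // MIN_HISTORY_LEVEL is present in the backend DB.
    static boolean useMinHistoryLevel(Optional<Boolean> configured, boolean tableExists) {
        return configured.orElse(tableExists);
    }

    public static void main(String[] args) {
        System.out.println(useMinHistoryLevel(Optional.empty(), true));   // table present
        System.out.println(useMinHistoryLevel(Optional.empty(), false));  // table dropped
        System.out.println(useMinHistoryLevel(Optional.of(false), true)); // explicit off wins
    }
}
```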
[jira] [Work logged] (HIVE-24403) change min_history_level schema change to be compatible with previous version
[ https://issues.apache.org/jira/browse/HIVE-24403?focusedWorklogId=516576&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-516576 ] ASF GitHub Bot logged work on HIVE-24403: - Author: ASF GitHub Bot Created on: 25/Nov/20 09:47 Start Date: 25/Nov/20 09:47 Worklog Time Spent: 10m Work Description: deniskuzZ commented on a change in pull request #1688: URL: https://github.com/apache/hive/pull/1688#discussion_r530238192 ## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnDbUtil.java ## @@ -232,15 +232,23 @@ public static void cleanDb(Configuration conf) throws Exception { success &= truncateTable(conn, conf, stmt, "WRITE_SET"); success &= truncateTable(conn, conf, stmt, "REPL_TXN_MAP"); success &= truncateTable(conn, conf, stmt, "MATERIALIZATION_REBUILD_LOCKS"); + success &= truncateTable(conn, conf, stmt, "MIN_HISTORY_LEVEL"); try { -resetTxnSequence(conn, conf, stmt); -stmt.executeUpdate("INSERT INTO \"NEXT_LOCK_ID\" VALUES(1)"); -stmt.executeUpdate("INSERT INTO \"NEXT_COMPACTION_QUEUE_ID\" VALUES(1)"); - } catch (SQLException e) { -if (!getTableNotExistsErrorCodes().contains(e.getSQLState())) { - LOG.error("Error initializing sequence values", e); - success = false; +String dbProduct = conn.getMetaData().getDatabaseProductName(); +DatabaseProduct databaseProduct = determineDatabaseProduct(dbProduct, conf); +try { + resetTxnSequence(databaseProduct, stmt); + stmt.executeUpdate("INSERT INTO \"NEXT_LOCK_ID\" VALUES(1)"); + stmt.executeUpdate("INSERT INTO \"NEXT_COMPACTION_QUEUE_ID\" VALUES(1)"); +} catch (SQLException e) { + if (!databaseProduct.isTableNotExists(e)) { Review comment: Previous version was much more readable and concise. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 516576) Time Spent: 1h (was: 50m) > change min_history_level schema change to be compatible with previous version > - > > Key: HIVE-24403 > URL: https://issues.apache.org/jira/browse/HIVE-24403 > Project: Hive > Issue Type: Improvement > Components: Metastore >Reporter: Peter Varga >Assignee: Peter Varga >Priority: Major > Labels: pull-request-available > Time Spent: 1h > Remaining Estimate: 0h -- This message was sent by Atlassian Jira (v8.3.4#803005)
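The earlier pattern the reviewer prefers, matching `e.getSQLState()` against a set of known error codes, looks roughly like the following. The SQLState values here are illustrative examples (e.g. "42S02" for MySQL, "42P01" for PostgreSQL); the real list differs per database product.

```java
import java.sql.SQLException;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class TableNotExistsCheck {
    // Illustrative SQLState codes for "table does not exist".
    static final Set<String> TABLE_NOT_EXISTS_STATES =
            new HashSet<>(Arrays.asList("42S02", "42P01", "42Y55"));

    // Mirrors the old getTableNotExistsErrorCodes().contains(...) shape.
    static boolean isTableNotExists(SQLException e) {
        return TABLE_NOT_EXISTS_STATES.contains(e.getSQLState());
    }

    public static void main(String[] args) {
        // SQLException(reason, sqlState) lets us simulate driver errors.
        System.out.println(isTableNotExists(new SQLException("no such table", "42S02")));
        System.out.println(isTableNotExists(new SQLException("syntax error", "42601")));
    }
}
```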
[jira] [Work logged] (HIVE-24403) change min_history_level schema change to be compatible with previous version
[ https://issues.apache.org/jira/browse/HIVE-24403?focusedWorklogId=516575&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-516575 ] ASF GitHub Bot logged work on HIVE-24403: - Author: ASF GitHub Bot Created on: 25/Nov/20 09:42 Start Date: 25/Nov/20 09:42 Worklog Time Spent: 10m Work Description: deniskuzZ commented on a change in pull request #1688: URL: https://github.com/apache/hive/pull/1688#discussion_r530225330 ## File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Cleaner.java ## @@ -98,16 +98,24 @@ public void run() { handle = txnHandler.getMutexAPI().acquireLock(TxnStore.MUTEX_KEY.Cleaner.name()); startedAt = System.currentTimeMillis(); long minOpenTxnId = txnHandler.findMinOpenTxnIdForCleaner(); + long minTxnIdSeenOpen = txnHandler.findMinTxnIdSeenOpen(); Review comment: could we skip this extra db call if we have `metastore.txn.use.minhistorylevel=false` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 516575) Time Spent: 50m (was: 40m) > change min_history_level schema change to be compatible with previous version > - > > Key: HIVE-24403 > URL: https://issues.apache.org/jira/browse/HIVE-24403 > Project: Hive > Issue Type: Improvement > Components: Metastore >Reporter: Peter Varga >Assignee: Peter Varga >Priority: Major > Labels: pull-request-available > Time Spent: 50m > Remaining Estimate: 0h > > In some configurations the HMS backend DB is used by HMS services with > different versions. > HIVE-23107 dropped the min_history_level table from the backend DB making > the new schema version incompatible with the older HMS services. 
-- This message was sent by Atlassian Jira (v8.3.4#803005)
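The reviewer's suggestion, skipping the extra round trip when `metastore.txn.use.minhistorylevel=false`, amounts to a guard like the one below. This is a sketch with stand-in names; `findMinTxnIdSeenOpen` here is a counting stub, not Hive's TxnStore.

```java
public class CleanerGuard {
    static int dbCalls = 0;

    // Stub standing in for the extra metastore query flagged in the review.
    static long findMinTxnIdSeenOpen() {
        dbCalls++;
        return 42L;
    }

    static long minTxnIdSeenOpen(boolean useMinHistoryLevel) {
        // When the feature is off the value is never consulted,
        // so the DB call can be skipped entirely.
        return useMinHistoryLevel ? findMinTxnIdSeenOpen() : -1L;
    }

    public static void main(String[] args) {
        System.out.println(minTxnIdSeenOpen(false)); // flag off: no call issued
        System.out.println(dbCalls);
        System.out.println(minTxnIdSeenOpen(true));  // flag on: one call issued
        System.out.println(dbCalls);
    }
}
```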
[jira] [Work logged] (HIVE-24403) change min_history_level schema change to be compatible with previous version
[ https://issues.apache.org/jira/browse/HIVE-24403?focusedWorklogId=516572&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-516572 ] ASF GitHub Bot logged work on HIVE-24403: - Author: ASF GitHub Bot Created on: 25/Nov/20 09:29 Start Date: 25/Nov/20 09:29 Worklog Time Spent: 10m Work Description: deniskuzZ commented on a change in pull request #1688: URL: https://github.com/apache/hive/pull/1688#discussion_r530225330 ## File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Cleaner.java ## @@ -98,16 +98,24 @@ public void run() { handle = txnHandler.getMutexAPI().acquireLock(TxnStore.MUTEX_KEY.Cleaner.name()); startedAt = System.currentTimeMillis(); long minOpenTxnId = txnHandler.findMinOpenTxnIdForCleaner(); + long minTxnIdSeenOpen = txnHandler.findMinTxnIdSeenOpen(); Review comment: could we skip this extra db call if we have `metastore.txn.use.minhistorylevel=false` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 516572) Time Spent: 40m (was: 0.5h) > change min_history_level schema change to be compatible with previous version > - > > Key: HIVE-24403 > URL: https://issues.apache.org/jira/browse/HIVE-24403 > Project: Hive > Issue Type: Improvement > Components: Metastore >Reporter: Peter Varga >Assignee: Peter Varga >Priority: Major > Labels: pull-request-available > Time Spent: 40m > Remaining Estimate: 0h > > In some configurations the HMS backend DB is used by HMS services with > different versions. > HIVE-23107 dropped the min_history_level table from the backend DB making > the new schema version incompatible with the older HMS services. 
-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-24314) compactor.Cleaner should not set state "mark cleaned" if it didn't remove any files
[ https://issues.apache.org/jira/browse/HIVE-24314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17238618#comment-17238618 ] Karen Coppage commented on HIVE-24314: -- Committed to master Nov 3, 2020. Thanks for the reviews [~pvargacl] and [~kuczoram]! > compactor.Cleaner should not set state "mark cleaned" if it didn't remove any > files > --- > > Key: HIVE-24314 > URL: https://issues.apache.org/jira/browse/HIVE-24314 > Project: Hive > Issue Type: Bug >Reporter: Karen Coppage >Assignee: Karen Coppage >Priority: Major > Labels: pull-request-available > Time Spent: 50m > Remaining Estimate: 0h > > If the Cleaner didn't remove any files, don't mark the compaction queue entry > as "succeeded" but instead leave it in "ready for cleaning" state for later > cleaning. If it removed at least one file, then mark the compaction queue entry as > "succeeded". This is a partial fix, HIVE-24291 is the complete fix. -- This message was sent by Atlassian Jira (v8.3.4#803005)
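The rule in the description, leaving the entry in "ready for cleaning" when nothing was removed and marking it "succeeded" otherwise, reduces to a small state decision. Names below are illustrative, not the actual Cleaner API.

```java
import java.util.Collections;
import java.util.List;

public class CleanerMarking {
    enum State { READY_FOR_CLEANING, SUCCEEDED }

    // Mark "succeeded" only when the clean pass actually removed files;
    // otherwise keep the queue entry around for a later cleaning attempt.
    static State afterClean(List<String> removedFiles) {
        return removedFiles.isEmpty() ? State.READY_FOR_CLEANING : State.SUCCEEDED;
    }

    public static void main(String[] args) {
        System.out.println(afterClean(Collections.emptyList()));
        System.out.println(afterClean(Collections.singletonList("delta_0000005_0000010")));
    }
}
```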
[jira] [Resolved] (HIVE-24314) compactor.Cleaner should not set state "mark cleaned" if it didn't remove any files
[ https://issues.apache.org/jira/browse/HIVE-24314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karen Coppage resolved HIVE-24314. -- Fix Version/s: 4.0.0 Resolution: Fixed > compactor.Cleaner should not set state "mark cleaned" if it didn't remove any > files > --- > > Key: HIVE-24314 > URL: https://issues.apache.org/jira/browse/HIVE-24314 > Project: Hive > Issue Type: Bug >Reporter: Karen Coppage >Assignee: Karen Coppage >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 50m > Remaining Estimate: 0h -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24245) Vectorized PTF with count and distinct over partition producing incorrect results.
[ https://issues.apache.org/jira/browse/HIVE-24245?focusedWorklogId=516529&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-516529 ] ASF GitHub Bot logged work on HIVE-24245: - Author: ASF GitHub Bot Created on: 25/Nov/20 08:27 Start Date: 25/Nov/20 08:27 Worklog Time Spent: 10m Work Description: abstractdog commented on a change in pull request #1649: URL: https://github.com/apache/hive/pull/1649#discussion_r530185546 ## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/vector/ptf/VectorPTFEvaluatorCountDistinct.java ## @@ -0,0 +1,112 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.hadoop.hive.ql.exec.vector.ptf; + +import java.util.HashSet; +import java.util.Set; + +import org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector; +import org.apache.hadoop.hive.ql.exec.vector.ColumnVector; +import org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch; +import org.apache.hadoop.hive.ql.exec.vector.expressions.VectorExpression; +import org.apache.hadoop.hive.ql.metadata.HiveException; +import org.apache.hadoop.hive.ql.plan.ptf.WindowFrameDef; +import org.apache.hive.common.util.Murmur3; + +import com.google.common.base.Preconditions; + +/** + * This class evaluates count(column) for a PTF group where a distinct keyword is applied to the + * partitioning column itself, e.g.: + * + * SELECT + * txt1, + * txt2, + * count(distinct txt1) over(partition by txt1) as n, + * count(distinct txt2) over(partition by txt2) as m + * FROM example; + * + * In this case, the framework is still supposed to ensure sorting + * on the key (let's say txt1 for the first Reducer stage), but the original + * VectorPTFEvaluatorCount is not aware that a distinct keyword was applied + * to the key column. This case would be simple, because such function should + * return 1 every time. However, that's just a corner-case, a real scenario is + * when the partitioning column is not the same. 
In such cases, a real count + * distinct implementation is needed: + * + * SELECT + * txt1, + * txt2, + * count(distinct txt2) over(partition by txt1) as n, + * count(distinct txt1) over(partition by txt2) as m + * FROM example; + */ +public abstract class VectorPTFEvaluatorCountDistinct extends VectorPTFEvaluatorCount { + + protected Set uniqueObjects; + + public VectorPTFEvaluatorCountDistinct(WindowFrameDef windowFrameDef, + VectorExpression inputVecExpr, int outputColumnNum) { +super(windowFrameDef, inputVecExpr, outputColumnNum); +resetEvaluator(); + } + + @Override + public void evaluateGroupBatch(VectorizedRowBatch batch) throws HiveException { + +evaluateInputExpr(batch); + +// We do not filter when PTF is in reducer. +Preconditions.checkState(!batch.selectedInUse); + +final int size = batch.size; +if (size == 0) { + return; +} +ColumnVector colVector = batch.cols[inputColumnNum]; +if (colVector.isRepeating) { + if (colVector.noNulls || !colVector.isNull[0]) { +countValue(colVector, 0); + } +} else { + boolean[] batchIsNull = colVector.isNull; + for (int i = 0; i < size; i++) { +if (!batchIsNull[i]) { + countValue(colVector, i); +} + } +} + } + + protected void countValue(ColumnVector colVector, int i) { +Object value = getValue(colVector, i); +if (!uniqueObjects.contains(value)) { + uniqueObjects.add(value); Review comment: btw, I copied this wrong approach from GenericUDAFCount, I'm fixing it there also This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 516529) Time Spent: 1h 10m (was: 1h) > Vectorized PTF with count and distinct over partition producing incorrect > results. > -
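As the later review comments on this pull request note, the `contains()` pre-check in `countValue` is redundant: `java.util.Set.add` already reports whether the element was newly inserted, so the distinct-counting logic can rely on its return value alone.

```java
import java.util.HashSet;
import java.util.Set;

public class SetAddDemo {
    public static void main(String[] args) {
        Set<Object> uniqueObjects = new HashSet<>();
        // add() returns true only the first time a value is seen,
        // so no separate contains() lookup is needed.
        System.out.println(uniqueObjects.add("7164335675012038"));
        System.out.println(uniqueObjects.add("7164335675012038"));
        System.out.println(uniqueObjects.size());
    }
}
```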
[jira] [Work logged] (HIVE-24245) Vectorized PTF with count and distinct over partition producing incorrect results.
[ https://issues.apache.org/jira/browse/HIVE-24245?focusedWorklogId=516526&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-516526 ] ASF GitHub Bot logged work on HIVE-24245: - Author: ASF GitHub Bot Created on: 25/Nov/20 08:21 Start Date: 25/Nov/20 08:21 Worklog Time Spent: 10m Work Description: abstractdog commented on a change in pull request #1649: URL: https://github.com/apache/hive/pull/1649#discussion_r530182438 ## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/vector/ptf/VectorPTFEvaluatorBytesCountDistinct.java ## @@ -0,0 +1,42 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.hadoop.hive.ql.exec.vector.ptf; + +import org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector; +import org.apache.hadoop.hive.ql.exec.vector.ColumnVector; +import org.apache.hadoop.hive.ql.exec.vector.expressions.VectorExpression; +import org.apache.hadoop.hive.ql.plan.ptf.WindowFrameDef; +import org.apache.hive.common.util.Murmur3; + +/** + * Bytes (String) implementation for VectorPTFEvaluatorCountDistinct. 
+ */ +public class VectorPTFEvaluatorBytesCountDistinct extends VectorPTFEvaluatorCountDistinct { + + public VectorPTFEvaluatorBytesCountDistinct(WindowFrameDef windowFrameDef, + VectorExpression inputVecExpr, int outputColumnNum) { +super(windowFrameDef, inputVecExpr, outputColumnNum); +resetEvaluator(); + } + + protected Object getValue(ColumnVector colVector, int i) { +BytesColumnVector inV = (BytesColumnVector) colVector; +return Murmur3.hash32(inV.vector[i], inV.start[i], inV.length[i], Murmur3.DEFAULT_SEED); Review comment: I used hashing for memory considerations, I was hoping that this way the unique set will consume less memory (storing hashes instead of strings of arbitrary length)...now I think that we could be fine with storing strings and prevent additional hashing, as we tend to optimize CPU cycles instead of memory in the very-first round...I'll change this to new String(byte[]) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 516526) Time Spent: 1h (was: 50m) > Vectorized PTF with count and distinct over partition producing incorrect > results. > -- > > Key: HIVE-24245 > URL: https://issues.apache.org/jira/browse/HIVE-24245 > Project: Hive > Issue Type: Bug > Components: Hive, PTF-Windowing, Vectorization >Affects Versions: 3.1.0, 3.1.2 >Reporter: Chiran Ravani >Assignee: László Bodor >Priority: Critical > Labels: pull-request-available > Time Spent: 1h > Remaining Estimate: 0h > > Vectorized PTF for count and distinct over partition is broken. It produces > incorrect results. > Below is the test case. 
> {code} > CREATE TABLE bigd781b_new ( > id int, > txt1 string, > txt2 string, > cda_date int, > cda_job_name varchar(12)); > INSERT INTO bigd781b_new VALUES > (1,'2010005759','7164335675012038',20200528,'load1'), > (2,'2010005759','7164335675012038',20200528,'load2'); > {code} > Running the query below produces incorrect results > {code} > SELECT > txt1, > txt2, > count(distinct txt1) over(partition by txt1) as n, > count(distinct txt2) over(partition by txt2) as m > FROM bigd781b_new > {code} > as below. > {code} > +-------------+-------------------+---+---+ > | txt1 | txt2 | n | m | > +-------------+-------------------+---+---+ > | 2010005759 | 7164335675012038 | 2 | 2 | > | 2010005759 | 7164335675012038 | 2 | 2 | > +-------------+-------------------+---+---+ > {code} > While the correct output would be > {code} > +
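The expected semantics from the report can be checked in plain Java: both sample rows share a single txt1 value (and a single txt2 value), so count(distinct ...) over each partition should be 1 per row, not the 2 shown in the broken output above.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class CountDistinctOverPartition {
    public static void main(String[] args) {
        // The two rows from the bug report: (txt1, txt2)
        List<String[]> rows = Arrays.asList(
                new String[] {"2010005759", "7164335675012038"},
                new String[] {"2010005759", "7164335675012038"});

        // count(distinct txt1) over (partition by txt1):
        // collect distinct txt1 values per txt1 partition, then emit per row.
        Map<String, Set<String>> distinctPerPartition = new HashMap<>();
        for (String[] r : rows) {
            distinctPerPartition.computeIfAbsent(r[0], k -> new HashSet<>()).add(r[0]);
        }
        for (String[] r : rows) {
            System.out.println(distinctPerPartition.get(r[0]).size());
        }
    }
}
```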
[jira] [Work logged] (HIVE-24245) Vectorized PTF with count and distinct over partition producing incorrect results.
[ https://issues.apache.org/jira/browse/HIVE-24245?focusedWorklogId=516522&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-516522 ] ASF GitHub Bot logged work on HIVE-24245: - Author: ASF GitHub Bot Created on: 25/Nov/20 08:17 Start Date: 25/Nov/20 08:17 Worklog Time Spent: 10m Work Description: abstractdog commented on a change in pull request #1649: URL: https://github.com/apache/hive/pull/1649#discussion_r530179906 ## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/vector/ptf/VectorPTFEvaluatorCountDistinct.java ## @@ -0,0 +1,112 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.hadoop.hive.ql.exec.vector.ptf; + +import java.util.HashSet; +import java.util.Set; + +import org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector; +import org.apache.hadoop.hive.ql.exec.vector.ColumnVector; +import org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch; +import org.apache.hadoop.hive.ql.exec.vector.expressions.VectorExpression; +import org.apache.hadoop.hive.ql.metadata.HiveException; +import org.apache.hadoop.hive.ql.plan.ptf.WindowFrameDef; +import org.apache.hive.common.util.Murmur3; + +import com.google.common.base.Preconditions; + +/** + * This class evaluates count(column) for a PTF group where a distinct keyword is applied to the + * partitioning column itself, e.g.: + * + * SELECT + * txt1, + * txt2, + * count(distinct txt1) over(partition by txt1) as n, + * count(distinct txt2) over(partition by txt2) as m + * FROM example; + * + * In this case, the framework is still supposed to ensure sorting + * on the key (let's say txt1 for the first Reducer stage), but the original + * VectorPTFEvaluatorCount is not aware that a distinct keyword was applied + * to the key column. This case would be simple, because such function should + * return 1 every time. However, that's just a corner-case, a real scenario is + * when the partitioning column is not the same. 
In such cases, a real count + * distinct implementation is needed: + * + * SELECT + * txt1, + * txt2, + * count(distinct txt2) over(partition by txt1) as n, + * count(distinct txt1) over(partition by txt2) as m + * FROM example; + */ +public abstract class VectorPTFEvaluatorCountDistinct extends VectorPTFEvaluatorCount { + + protected Set uniqueObjects; + + public VectorPTFEvaluatorCountDistinct(WindowFrameDef windowFrameDef, + VectorExpression inputVecExpr, int outputColumnNum) { +super(windowFrameDef, inputVecExpr, outputColumnNum); +resetEvaluator(); + } + + @Override + public void evaluateGroupBatch(VectorizedRowBatch batch) throws HiveException { + +evaluateInputExpr(batch); + +// We do not filter when PTF is in reducer. +Preconditions.checkState(!batch.selectedInUse); + +final int size = batch.size; +if (size == 0) { + return; +} +ColumnVector colVector = batch.cols[inputColumnNum]; +if (colVector.isRepeating) { + if (colVector.noNulls || !colVector.isNull[0]) { +countValue(colVector, 0); + } +} else { + boolean[] batchIsNull = colVector.isNull; + for (int i = 0; i < size; i++) { +if (!batchIsNull[i]) { + countValue(colVector, i); Review comment: right! updating it This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 516522) Time Spent: 50m (was: 40m) > Vectorized PTF with count and distinct over partition producing incorrect > results. > -- > > Key: HIVE-24245 > URL: https://issues.apache.org/jira/browse/HIVE-24245 > Project: Hive > Issue Type: Bug > Components: Hive, PTF-Windowing, Vectorization >
[jira] [Work logged] (HIVE-24245) Vectorized PTF with count and distinct over partition producing incorrect results.
[ https://issues.apache.org/jira/browse/HIVE-24245?focusedWorklogId=516520&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-516520 ] ASF GitHub Bot logged work on HIVE-24245: - Author: ASF GitHub Bot Created on: 25/Nov/20 08:14 Start Date: 25/Nov/20 08:14 Worklog Time Spent: 10m Work Description: abstractdog commented on a change in pull request #1649: URL: https://github.com/apache/hive/pull/1649#discussion_r530178604 ## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/vector/ptf/VectorPTFEvaluatorCountDistinct.java ## @@ -0,0 +1,112 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.hadoop.hive.ql.exec.vector.ptf; + +import java.util.HashSet; +import java.util.Set; + +import org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector; +import org.apache.hadoop.hive.ql.exec.vector.ColumnVector; +import org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch; +import org.apache.hadoop.hive.ql.exec.vector.expressions.VectorExpression; +import org.apache.hadoop.hive.ql.metadata.HiveException; +import org.apache.hadoop.hive.ql.plan.ptf.WindowFrameDef; +import org.apache.hive.common.util.Murmur3; + +import com.google.common.base.Preconditions; + +/** + * This class evaluates count(column) for a PTF group where a distinct keyword is applied to the + * partitioning column itself, e.g.: + * + * SELECT + * txt1, + * txt2, + * count(distinct txt1) over(partition by txt1) as n, + * count(distinct txt2) over(partition by txt2) as m + * FROM example; + * + * In this case, the framework is still supposed to ensure sorting + * on the key (let's say txt1 for the first Reducer stage), but the original + * VectorPTFEvaluatorCount is not aware that a distinct keyword was applied + * to the key column. This case would be simple, because such function should + * return 1 every time. However, that's just a corner-case, a real scenario is + * when the partitioning column is not the same. 
In such cases, a real count + * distinct implementation is needed: + * + * SELECT + * txt1, + * txt2, + * count(distinct txt2) over(partition by txt1) as n, + * count(distinct txt1) over(partition by txt2) as m + * FROM example; + */ +public abstract class VectorPTFEvaluatorCountDistinct extends VectorPTFEvaluatorCount { + + protected Set uniqueObjects; + + public VectorPTFEvaluatorCountDistinct(WindowFrameDef windowFrameDef, + VectorExpression inputVecExpr, int outputColumnNum) { +super(windowFrameDef, inputVecExpr, outputColumnNum); +resetEvaluator(); + } + + @Override + public void evaluateGroupBatch(VectorizedRowBatch batch) throws HiveException { + +evaluateInputExpr(batch); + +// We do not filter when PTF is in reducer. +Preconditions.checkState(!batch.selectedInUse); + +final int size = batch.size; +if (size == 0) { + return; +} +ColumnVector colVector = batch.cols[inputColumnNum]; +if (colVector.isRepeating) { + if (colVector.noNulls || !colVector.isNull[0]) { +countValue(colVector, 0); + } +} else { + boolean[] batchIsNull = colVector.isNull; + for (int i = 0; i < size; i++) { +if (!batchIsNull[i]) { + countValue(colVector, i); +} + } +} + } + + protected void countValue(ColumnVector colVector, int i) { +Object value = getValue(colVector, i); +if (!uniqueObjects.contains(value)) { + uniqueObjects.add(value); Review comment: yeah, thanks, forgot that Set takes care of uniqueness :) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 516520) Time Spent: 40m (was: 0.5h) > Vectorized PTF with count and distinct over partition producing incorrect > results. > ---