[jira] [Work logged] (HIVE-27288) Backport of HIVE-23262 : Remove dependency on activemq

2023-04-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27288?focusedWorklogId=859050&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-859050
 ]

ASF GitHub Bot logged work on HIVE-27288:
-

Author: ASF GitHub Bot
Created on: 26/Apr/23 05:46
Start Date: 26/Apr/23 05:46
Worklog Time Spent: 10m 
  Work Description: amanraj2520 commented on PR #4261:
URL: https://github.com/apache/hive/pull/4261#issuecomment-1522819769

   @vihangk1 Can you please approve and merge this?




Issue Time Tracking
---

Worklog Id: (was: 859050)
Time Spent: 20m  (was: 10m)

> Backport of HIVE-23262 : Remove dependency on activemq
> --
>
> Key: HIVE-27288
> URL: https://issues.apache.org/jira/browse/HIVE-27288
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Aman Raj
>Assignee: Aman Raj
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work logged] (HIVE-27186) A persistent property store

2023-04-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27186?focusedWorklogId=859049&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-859049
 ]

ASF GitHub Bot logged work on HIVE-27186:
-

Author: ASF GitHub Bot
Created on: 26/Apr/23 05:40
Start Date: 26/Apr/23 05:40
Worklog Time Spent: 10m 
  Work Description: sonarcloud[bot] commented on PR #4194:
URL: https://github.com/apache/hive/pull/4194#issuecomment-1522815351

   Kudos, SonarCloud Quality Gate passed!
   (https://sonarcloud.io/dashboard?id=apache_hive&pullRequest=4194)

   Bugs: 4 (rating E)
   Vulnerabilities: 0 (rating A)
   Security Hotspots: 1 (rating E)
   Code Smells: 103 (rating A)
   No Coverage information
   No Duplication information




Issue Time Tracking
---

Worklog Id: (was: 859049)
Time Spent: 13h 20m  (was: 13h 10m)

> A persistent property store 
> 
>
> Key: HIVE-27186
> URL: https://issues.apache.org/jira/browse/HIVE-27186
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Affects Versions: 4.0.0-alpha-2
>Reporter: Henri Biestro
>Assignee: Henri Biestro
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 13h 20m
>  Remaining Estimate: 0h
>
> WHAT
> A persistent property store usable as a support facility for any metadata 
> augmentation feature.
> WHY
> When adding new metadata-oriented features, we usually need to persist 
> information linking the feature data to the HiveMetaStore objects it applies 
> to. Any information related to a database, a table, or the cluster - such as 
> statistics, or any operational state or data (think rolling backup) - falls 
> into this use case.
> Typically, accommodating such a feature requires modifying the Metastore 
> database schema by adding or altering a table. It also usually implies 
> modifying the Thrift APIs to expose such metadata to consumers.
> The proposed feature aims to solve persistence and query/transport for 
> these types of use cases by exposing a 'key/(meta)value' store surfaced as a 
> property system.
> HOW
> A property-value model is the simple and generic exposed API.
> To provision for several usage scenarios, 
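The property-value model described above can be sketched as a minimal in-memory store. The class and method names below are illustrative assumptions for this digest, not the actual HIVE-27186 API; the real store persists to the Metastore database rather than a map.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/**
 * Minimal sketch of a key/(meta)value property store for metastore objects.
 * Keys are namespaced (e.g. "db/table") so feature metadata can be attached
 * to any HMS object without a schema change. Names are hypothetical.
 */
class PropertyStoreSketch {
    private final Map<String, Map<String, String>> store = new HashMap<>();

    /** Sets a property on the object identified by the namespace key. */
    void setProperty(String namespace, String key, String value) {
        store.computeIfAbsent(namespace, k -> new HashMap<>()).put(key, value);
    }

    /** Returns the property value for the namespaced object, if present. */
    Optional<String> getProperty(String namespace, String key) {
        return Optional.ofNullable(store.getOrDefault(namespace, Map.of()).get(key));
    }
}
```

The point of the namespacing is that a feature like rolling backup can record per-table state without a new Metastore table or a new Thrift call per feature.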

[jira] [Work logged] (HIVE-27292) Upgrade Zookeeper to 3.7.1

2023-04-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27292?focusedWorklogId=859048&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-859048
 ]

ASF GitHub Bot logged work on HIVE-27292:
-

Author: ASF GitHub Bot
Created on: 26/Apr/23 05:36
Start Date: 26/Apr/23 05:36
Worklog Time Spent: 10m 
  Work Description: amanraj2520 commented on PR #4264:
URL: https://github.com/apache/hive/pull/4264#issuecomment-1522812874

   @TuroczyX I raised this draft PR: https://github.com/apache/hive/pull/4270. I 
will upgrade it after testing. Meanwhile, if this 3.7.1 change looks good to you, 
can you please approve this?




Issue Time Tracking
---

Worklog Id: (was: 859048)
Time Spent: 1h  (was: 50m)

> Upgrade Zookeeper to 3.7.1
> --
>
> Key: HIVE-27292
> URL: https://issues.apache.org/jira/browse/HIVE-27292
> Project: Hive
>  Issue Type: Bug
>Reporter: Aman Raj
>Assignee: Aman Raj
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Upgrade Zookeeper from 3.6.3 to 3.7.1, since 3.6.3 has reached end of life. 
> https://endoflife.date/zookeeper





[jira] [Commented] (HIVE-27195) Drop table if Exists . fails during authorization for temporary tables

2023-04-25 Thread Srinivasu Majeti (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17716536#comment-17716536
 ] 

Srinivasu Majeti commented on HIVE-27195:
-

Hi [~rtrivedi12], is this planned to be fixed in the near future?

> Drop table if Exists . fails during authorization for 
> temporary tables
> ---
>
> Key: HIVE-27195
> URL: https://issues.apache.org/jira/browse/HIVE-27195
> Project: Hive
>  Issue Type: Bug
>Reporter: Riju Trivedi
>Assignee: Srinivasu Majeti
>Priority: Major
>
> https://issues.apache.org/jira/browse/HIVE-20051 handles skipping 
> authorization for temporary tables. But still, the drop table if Exists fails 
> with  HiveAccessControlException.
> Steps to Repro:
> {code:java}
> use test; CREATE TEMPORARY TABLE temp_table (id int);
> drop table if exists test.temp_table;
> Error: Error while compiling statement: FAILED: HiveAccessControlException 
> Permission denied: user [rtrivedi] does not have [DROP] privilege on 
> [test/temp_table] (state=42000,code=4) {code}
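The fix direction implied by HIVE-20051 - skipping authorization for temporary tables - can be sketched as a guard in the drop path. The class and method names here are hypothetical stand-ins, not Hive's actual authorizer API; the reported bug is that the DROP TABLE IF EXISTS path misses such a guard.

```java
/** Hypothetical sketch: skip [DROP] privilege checks for temporary tables. */
class DropAuthSketch {
    static class Table {
        final String name;
        final boolean temporary;
        Table(String name, boolean temporary) {
            this.name = name;
            this.temporary = temporary;
        }
    }

    /** Returns true if a [DROP] privilege check is required for the table. */
    static boolean requiresDropAuthorization(Table t) {
        // Temporary tables are session-local, so authorization is skipped;
        // per HIVE-27195, "drop table if exists" fails to apply this guard.
        return !t.temporary;
    }
}
```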





[jira] [Assigned] (HIVE-27195) Drop table if Exists . fails during authorization for temporary tables

2023-04-25 Thread Srinivasu Majeti (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Srinivasu Majeti reassigned HIVE-27195:
---

Assignee: Srinivasu Majeti  (was: Riju Trivedi)

> Drop table if Exists . fails during authorization for 
> temporary tables
> ---
>
> Key: HIVE-27195
> URL: https://issues.apache.org/jira/browse/HIVE-27195
> Project: Hive
>  Issue Type: Bug
>Reporter: Riju Trivedi
>Assignee: Srinivasu Majeti
>Priority: Major
>
> https://issues.apache.org/jira/browse/HIVE-20051 handles skipping 
> authorization for temporary tables. But still, the drop table if Exists fails 
> with  HiveAccessControlException.
> Steps to Repro:
> {code:java}
> use test; CREATE TEMPORARY TABLE temp_table (id int);
> drop table if exists test.temp_table;
> Error: Error while compiling statement: FAILED: HiveAccessControlException 
> Permission denied: user [rtrivedi] does not have [DROP] privilege on 
> [test/temp_table] (state=42000,code=4) {code}





[jira] [Work logged] (HIVE-27287) Upgrade Commons-text to 1.10.0 to fix CVE

2023-04-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27287?focusedWorklogId=859046&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-859046
 ]

ASF GitHub Bot logged work on HIVE-27287:
-

Author: ASF GitHub Bot
Created on: 26/Apr/23 05:13
Start Date: 26/Apr/23 05:13
Worklog Time Spent: 10m 
  Work Description: Aggarwal-Raghav commented on PR #4260:
URL: https://github.com/apache/hive/pull/4260#issuecomment-1522795258

   @deniskuzZ, can you help get this merged as well? Thanks!




Issue Time Tracking
---

Worklog Id: (was: 859046)
Time Spent: 1h 10m  (was: 1h)

> Upgrade Commons-text to 1.10.0 to fix CVE
> -
>
> Key: HIVE-27287
> URL: https://issues.apache.org/jira/browse/HIVE-27287
> Project: Hive
>  Issue Type: Improvement
>Reporter: Raghav Aggarwal
>Assignee: Raghav Aggarwal
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Apache Commons Text versions 1.5 through 1.9 are vulnerable to 
> [CVE-2022-42889|https://nvd.nist.gov/vuln/detail/CVE-2022-42889], which 
> involves potential script execution when processing untrusted input using 
> {{StringLookup}}. Direct and transitive references to Apache Commons Text 
> prior to 1.10.0 should be upgraded to avoid the default interpolation 
> behaviour.





[jira] [Work logged] (HIVE-27032) Introduce liquibase for HMS schema evolution

2023-04-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27032?focusedWorklogId=859045&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-859045
 ]

ASF GitHub Bot logged work on HIVE-27032:
-

Author: ASF GitHub Bot
Created on: 26/Apr/23 04:49
Start Date: 26/Apr/23 04:49
Worklog Time Spent: 10m 
  Work Description: sonarcloud[bot] commented on PR #4060:
URL: https://github.com/apache/hive/pull/4060#issuecomment-1522776244

   Kudos, SonarCloud Quality Gate passed!
   (https://sonarcloud.io/dashboard?id=apache_hive&pullRequest=4060)

   Bugs: 1 (rating C)
   Vulnerabilities: 0 (rating A)
   Security Hotspots: 5 (rating E)
   Code Smells: 208 (rating A)
   No Coverage information
   No Duplication information




Issue Time Tracking
---

Worklog Id: (was: 859045)
Time Spent: 3h 40m  (was: 3.5h)

> Introduce liquibase for HMS schema evolution
> 
>
> Key: HIVE-27032
> URL: https://issues.apache.org/jira/browse/HIVE-27032
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Végh
>Assignee: László Végh
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 3h 40m
>  Remaining Estimate: 0h
>
> Introduce Liquibase and replace the current upgrade procedure with it.
> The Schematool CLI API should remain untouched while, under the hood, 
> Liquibase is used for HMS schema evolution.
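Under Liquibase, each schema change becomes a declarative, versioned changeset in a changelog that Liquibase tracks and applies idempotently. The changeset below is a purely hypothetical illustration (the id, author, table, and column names are invented), not part of the HIVE-27032 patch:

```xml
<databaseChangeLog
    xmlns="http://www.liquibase.org/xml/ns/dbchangelog"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.liquibase.org/xml/ns/dbchangelog
        http://www.liquibase.org/xml/ns/dbchangelog/dbchangelog-4.0.xsd">
  <!-- Hypothetical example: add a column to the HMS TBLS table -->
  <changeSet id="hms-example-1" author="hypothetical">
    <addColumn tableName="TBLS">
      <column name="EXAMPLE_COL" type="varchar(256)"/>
    </addColumn>
  </changeSet>
</databaseChangeLog>
```

Liquibase records applied changesets in its own tracking table, which is what lets Schematool keep its CLI surface while delegating the actual upgrade bookkeeping.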





[jira] [Commented] (HIVE-18322) RetryingMetaStoreClient reconnect should not use ugi.doAs if not necessary

2023-04-25 Thread katty he (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-18322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17716515#comment-17716515
 ] 

katty he commented on HIVE-18322:
-

Is this issue stuck? I also hit this problem: "userX is not allowed to 
impersonate userX".

> RetryingMetaStoreClient reconnect should not use ugi.doAs if not necessary
> --
>
> Key: HIVE-18322
> URL: https://issues.apache.org/jira/browse/HIVE-18322
> Project: Hive
>  Issue Type: Bug
>Reporter: Thejas Nair
>Priority: Major
>
> As commented in HIVE-17853, RetryingMetaStoreClient should also check whether 
> the current user is the same as the original UGI user, and skip the ugi.doAs() 
> if it is. Otherwise, this can potentially cause problems where the 
> users are not privileged users (i.e., there is no intent to do a "doAs").
> Without such a check, you would get errors like "userX is not allowed to 
> impersonate userX".
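The check described above can be sketched as follows. Plain strings and a `Supplier` stand in for Hadoop's `UserGroupInformation` and its `doAs` mechanics; this is a simplified illustration of the control flow, not the real API.

```java
import java.util.List;
import java.util.function.Supplier;

/** Sketch: only wrap the reconnect in impersonation when users differ. */
class ReconnectSketch {
    static <T> T runAs(String currentUser, String originalUser,
                       Supplier<T> action, List<String> impersonationLog) {
        if (currentUser.equals(originalUser)) {
            // Same user: no impersonation needed. Skipping the doAs avoids
            // "userX is not allowed to impersonate userX" for non-privileged users.
            return action.get();
        }
        impersonationLog.add("doAs " + originalUser); // stand-in for ugi.doAs(...)
        return action.get();
    }
}
```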





[jira] [Work logged] (HIVE-27285) Add TableMeta ownership for filterTableMetas

2023-04-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27285?focusedWorklogId=859042&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-859042
 ]

ASF GitHub Bot logged work on HIVE-27285:
-

Author: ASF GitHub Bot
Created on: 26/Apr/23 03:43
Start Date: 26/Apr/23 03:43
Worklog Time Spent: 10m 
  Work Description: dengzhhu653 commented on code in PR #4258:
URL: https://github.com/apache/hive/pull/4258#discussion_r1177302927


##
ql/src/test/queries/clientpositive/authorization_privilege_objects.q:
##
@@ -0,0 +1,20 @@
+--! qt:authorizer
+set 
hive.security.authorization.manager=org.apache.hadoop.hive.ql.security.authorization.plugin.sqlstd.SQLStdHiveAuthorizerFactoryForTest;
+set test.hive.authz.sstd.validator.outputPrivObjs=true;
+set hive.test.authz.sstd.hs2.mode=true;
+set user.name=testuser;
+
+CREATE DATABASE test_db;
+CREATE TABLE test_privs(i int);
+set user.name=testuser2;
+CREATE TABLE test_privs2(s string, i int);
+set user.name=testuser;
+SHOW DATABASES;
+SHOW TABLES;

Review Comment:
   I see, we also push owner info to the authorization layer for 
`filter(List)`





Issue Time Tracking
---

Worklog Id: (was: 859042)
Time Spent: 2h 40m  (was: 2.5h)

> Add TableMeta ownership for filterTableMetas
> 
>
> Key: HIVE-27285
> URL: https://issues.apache.org/jira/browse/HIVE-27285
> Project: Hive
>  Issue Type: Improvement
>  Components: Standalone Metastore
>Affects Versions: 4.0.0
>Reporter: John Sherman
>Assignee: John Sherman
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> Currently TableMeta does not include ownership information which makes it 
> difficult for filterTableMetas to efficiently filter based on ${OWNER} 
> privileges.
> We should add ownership information to TableMeta and utilize it in 
> filterTableMetas authorization checks.
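Owner-based filtering can be sketched as below. The `TableMeta` shape and the filtering logic are simplified assumptions; the real filterTableMetas consults the configured authorizer and grants, not just ownership.

```java
import java.util.List;
import java.util.stream.Collectors;

/** Sketch: filter table metadata using the ownership field proposed above. */
class TableMetaFilterSketch {
    static class TableMeta {
        final String tableName;
        final String owner; // ownership info HIVE-27285 proposes to carry
        TableMeta(String tableName, String owner) {
            this.tableName = tableName;
            this.owner = owner;
        }
    }

    /**
     * Keeps tables the user owns. With the owner carried in TableMeta, the
     * ${OWNER} check needs no extra metastore round-trip per table.
     */
    static List<TableMeta> filterTableMetas(String user, List<TableMeta> metas) {
        return metas.stream()
                .filter(m -> user.equals(m.owner))
                .collect(Collectors.toList());
    }
}
```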





[jira] [Work logged] (HIVE-27273) Iceberg: Upgrade iceberg to 1.2.1

2023-04-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27273?focusedWorklogId=859037&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-859037
 ]

ASF GitHub Bot logged work on HIVE-27273:
-

Author: ASF GitHub Bot
Created on: 26/Apr/23 01:59
Start Date: 26/Apr/23 01:59
Worklog Time Spent: 10m 
  Work Description: zhangbutao commented on PR #4252:
URL: https://github.com/apache/hive/pull/4252#issuecomment-1522652634

   @InvisibleProgrammer Thanks for sharing your thoughts! The two commits 
`fede493d59f17ff2bfc0744b296d90bd36130386` and 
`333227fbd13821365cec1bdbfcb9314a239bea0f` are really hard to deal with. We should 
handle these commits carefully. I also hope the author of the two commits can give 
us some info to evaluate whether it is really necessary to port them into the 
current Hive `iceberg-handler` module @pvary .
   
   
   > And also, based on my experience, there can be old changes that haven't 
been ported at all.
   
   I think that is possible. It would be great if you can find these valuable old 
changes. Thanks for your hard work!




Issue Time Tracking
---

Worklog Id: (was: 859037)
Time Spent: 1.5h  (was: 1h 20m)

> Iceberg:  Upgrade iceberg to 1.2.1
> --
>
> Key: HIVE-27273
> URL: https://issues.apache.org/jira/browse/HIVE-27273
> Project: Hive
>  Issue Type: Improvement
>  Components: Iceberg integration
>Reporter: zhangbutao
>Assignee: zhangbutao
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> [https://iceberg.apache.org/releases/#121-release] Iceberg 1.2.1 (including 
> 1.2.0) brings many improvements, e.g. _branch commit_ and the 
> _{{position_deletes}}_ metadata table.





[jira] [Work logged] (HIVE-27281) Add ability of masking to Beeline q-tests

2023-04-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27281?focusedWorklogId=859036&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-859036
 ]

ASF GitHub Bot logged work on HIVE-27281:
-

Author: ASF GitHub Bot
Created on: 26/Apr/23 01:09
Start Date: 26/Apr/23 01:09
Worklog Time Spent: 10m 
  Work Description: sonarcloud[bot] commented on PR #4254:
URL: https://github.com/apache/hive/pull/4254#issuecomment-1522620645

   Kudos, SonarCloud Quality Gate passed!
   (https://sonarcloud.io/dashboard?id=apache_hive&pullRequest=4254)

   Bugs: 0 (rating A)
   Vulnerabilities: 0 (rating A)
   Security Hotspots: 0 (rating A)
   Code Smells: 0 (rating A)
   No Coverage information
   No Duplication information




Issue Time Tracking
---

Worklog Id: (was: 859036)
Time Spent: 40m  (was: 0.5h)

> Add ability of masking to Beeline q-tests
> -
>
> Key: HIVE-27281
> URL: https://issues.apache.org/jira/browse/HIVE-27281
> Project: Hive
>  Issue Type: Improvement
>Reporter: Dmitriy Fingerman
>Assignee: Dmitriy Fingerman
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (HIVE-27172) Add the HMS client connection timeout config

2023-04-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27172?focusedWorklogId=859035&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-859035
 ]

ASF GitHub Bot logged work on HIVE-27172:
-

Author: ASF GitHub Bot
Created on: 26/Apr/23 00:08
Start Date: 26/Apr/23 00:08
Worklog Time Spent: 10m 
  Work Description: sonarcloud[bot] commented on PR #4150:
URL: https://github.com/apache/hive/pull/4150#issuecomment-1522566937

   Kudos, SonarCloud Quality Gate passed!
   (https://sonarcloud.io/dashboard?id=apache_hive&pullRequest=4150)

   Bugs: 0 (rating A)
   Vulnerabilities: 0 (rating A)
   Security Hotspots: 0 (rating A)
   Code Smells: 1 (rating A)
   No Coverage information
   No Duplication information




Issue Time Tracking
---

Worklog Id: (was: 859035)
Time Spent: 2h 10m  (was: 2h)

> Add the HMS client connection timeout config
> 
>
> Key: HIVE-27172
> URL: https://issues.apache.org/jira/browse/HIVE-27172
> Project: Hive
>  Issue Type: Task
>  Components: Hive
>Reporter: Wechar
>Assignee: Wechar
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> Currently {{HiveMetastoreClient}} uses {{CLIENT_SOCKET_TIMEOUT}} as both the 
> socket timeout and the connection timeout, which makes it inconvenient for 
> users to set a smaller connection timeout.
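The distinction between the two timeouts can be illustrated with plain `java.net.Socket`, where the connect timeout and the read (socket) timeout are set independently; the Hive config names and wiring are not shown here, this is only the underlying mechanism.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

/** Sketch: separate connection timeout from read (socket) timeout. */
class TimeoutSketch {
    static Socket open(String host, int port, int connectTimeoutMs, int readTimeoutMs)
            throws IOException {
        Socket s = new Socket();
        // Bounds only how long establishing the TCP connection may take.
        s.connect(new InetSocketAddress(host, port), connectTimeoutMs);
        // Independently bounds how long a blocked read may wait afterwards.
        s.setSoTimeout(readTimeoutMs);
        return s;
    }
}
```

With a single config driving both values, a client that wants long-running calls (large read timeout) is forced to also wait that long to discover an unreachable metastore, which is the inconvenience the issue describes.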





[jira] [Comment Edited] (HIVE-27298) Provide implementation of HMS thrift service that throws UnsupportedOperationException

2023-04-25 Thread John Sherman (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17716473#comment-17716473
 ] 

John Sherman edited comment on HIVE-27298 at 4/26/23 12:03 AM:
---

I'm inclined to go with:
a) since AbstractThriftHiveMetastore provides no value otherwise (users should 
be implementing against ThriftHiveMetastore.Iface for similar guarantees that 
AbstractThriftHiveMetastore today provides). The class name may be confusing, 
but I also do not associate a class having Abstract in the name as being an 
abstract class (others might though).

However, b has the advantage that it doesn't change the behavior that 
downstream implementations may rely on (even if I think it is pointless). 
Though, I work on one of the downstream implementations (IMPALA) and 
AbstractThriftHiveMetastore was added to support our use case. I think Impala 
is the only user of this class but I can't prove that.

This comment from AbstractThriftHiveMetastore illustrates the intent that the 
current implementation actually does not provide:
{code}
/**
 * This abstract class can be extended by any remote cache that implements HMS 
APIs.
 * The idea behind introducing this abstract class is not to break the build of 
remote cache,
 * whenever we add new HMS APIs.
 */
{code}



> Provide implementation of HMS thrift service that throws 
> UnsupportedOperationException
> --
>
> Key: HIVE-27298
> URL: https://issues.apache.org/jira/browse/HIVE-27298
> Project: Hive
>  Issue Type: Improvement
>Reporter: John Sherman
>Priority: Major
>
> The intent of HIVE-25005 and the AbstractThriftHiveMetastore class was to provide 
> a default implementation for every HMS thrift method that throws 
> UnsupportedOperationException.
> However, HIVE-25005 made the class abstract, so a variety of HMS service 
> methods are not implemented in AbstractThriftHiveMetastore.
> We should either:
> a) remove abstract from the class definition AbstractThriftHiveMetastore
> or
> b) add a new class that is not abstract.





[jira] [Commented] (HIVE-27298) Provide implementation of HMS thrift service that throws UnsupportedOperationException

2023-04-25 Thread John Sherman (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17716473#comment-17716473
 ] 

John Sherman commented on HIVE-27298:
-

I'm inclined to go with:
a), since AbstractThriftHiveMetastore provides no value otherwise (users should 
implement against ThriftHiveMetastore.Iface for the same guarantees that 
AbstractThriftHiveMetastore provides today). The class name may be confusing, 
but I do not associate a class having "Abstract" in its name with being an 
abstract class (others might, though).

However, b) has the advantage that it does not change behavior that downstream 
implementations may rely on (even if I think that behavior is pointless). That 
said, I work on one of the downstream implementations (Impala), and 
AbstractThriftHiveMetastore was added to support our use case. I think Impala 
is the only user of this class, but I can't prove that.

> Provide implementation of HMS thrift service that throws 
> UnsupportedOperationException
> --
>
> Key: HIVE-27298
> URL: https://issues.apache.org/jira/browse/HIVE-27298
> Project: Hive
>  Issue Type: Improvement
>Reporter: John Sherman
>Priority: Major
>
> The intent of HIVE-25005 and AbstractThriftHiveMetastore class was to provide 
> default implementation for every HMS thrift method that throws 
> UnsupportedOperationException.
> However - HIVE-25005 made the class abstract, so as a result there are a 
> variety of HMS service methods that are not implemented in 
> AbstractThriftHiveMetastore.
> We should either:
> a) remove abstract from the class definition AbstractThriftHiveMetastore
> or
> b) add a new class that is not abstract.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HIVE-27298) Provide implementation of HMS thrift service that throws UnsupportedOperationException

2023-04-25 Thread John Sherman (Jira)
John Sherman created HIVE-27298:
---

 Summary: Provide implementation of HMS thrift service that throws 
UnsupportedOperationException
 Key: HIVE-27298
 URL: https://issues.apache.org/jira/browse/HIVE-27298
 Project: Hive
  Issue Type: Improvement
Reporter: John Sherman


The intent of HIVE-25005 and the AbstractThriftHiveMetastore class was to provide 
a default implementation for every HMS thrift method that throws 
UnsupportedOperationException.
However, HIVE-25005 made the class abstract, and as a result a variety of HMS 
service methods are not implemented in AbstractThriftHiveMetastore.

We should either:
a) remove abstract from the class definition AbstractThriftHiveMetastore
or
b) add a new class that is not abstract.
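A minimal sketch of option (b), under stated assumptions: the interface below is a hypothetical two-method stand-in for ThriftHiveMetastore.Iface (the real interface exposes hundreds of HMS thrift methods), and all names here are illustrative only, not Hive's actual API.

```java
// Hypothetical, two-method stand-in for ThriftHiveMetastore.Iface; the real
// interface exposes hundreds of HMS thrift methods.
interface MetastoreIface {
    String getTable(String db, String tbl);
    void createTable(String db, String tbl);
}

// Option (b): a concrete (non-abstract) default implementation that throws
// UnsupportedOperationException from every method, so downstream services
// can extend it and override only the methods they actually support.
class UnsupportedMetastore implements MetastoreIface {
    @Override
    public String getTable(String db, String tbl) {
        throw new UnsupportedOperationException("getTable");
    }

    @Override
    public void createTable(String db, String tbl) {
        throw new UnsupportedOperationException("createTable");
    }
}
```

A downstream implementation would subclass UnsupportedMetastore, overriding only the supported methods, while every unimplemented method keeps the throwing default.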



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work logged] (HIVE-27295) Improve docker logging in AbstractExternalDB

2023-04-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27295?focusedWorklogId=859034&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-859034
 ]

ASF GitHub Bot logged work on HIVE-27295:
-

Author: ASF GitHub Bot
Created on: 25/Apr/23 23:16
Start Date: 25/Apr/23 23:16
Worklog Time Spent: 10m 
  Work Description: sonarcloud[bot] commented on PR #4268:
URL: https://github.com/apache/hive/pull/4268#issuecomment-1522534529

   Kudos, SonarCloud Quality Gate passed! ([dashboard](https://sonarcloud.io/dashboard?id=apache_hive&pullRequest=4268))
   
   Bugs: 1 (rating C)
   Vulnerabilities: 0 (rating A)
   Security Hotspots: 0 (rating A)
   Code Smells: 0 (rating A)
   No Coverage information
   No Duplication information
   
   




Issue Time Tracking
---

Worklog Id: (was: 859034)
Time Spent: 1h 20m  (was: 1h 10m)

> Improve docker logging in AbstractExternalDB
> 
>
> Key: HIVE-27295
> URL: https://issues.apache.org/jira/browse/HIVE-27295
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> 1. While waiting for the docker container to start properly, we should print 
> the output of the docker logs command in every loop; otherwise we can miss 
> important information about the actual startup process if the docker container 
> was OOM-killed in the meantime. Moreover, we are currently not logging the 
> output at all in case of an error:
> https://github.com/apache/hive/blob/59058c65457fb7ab9d8575a555034e6633962661/itests/util/src/main/java/org/apache/hadoop/hive/ql/externalDB/AbstractExternalDB.java#L125-L127
> 2. We can include the output for docker events in the logs in case of an 
> error (like: oom killed container), which might contain useful information.
> We can have info like this:
> {code}
> 2023-04-25T08:47:08.852515314-07:00 container oom 
> 2ba12cd9cd844bb30b3158564bd68cd97f25e7a05172d111713ac9f7c1c0b1d4 
> (image=harbor.rke-us-west-04.kc.cloudera.com/docker_private_cache/cloudera_thirdparty/postgres:9.3,
>  name=qtestExternalDB-PostgresExternalDB)
> 

[jira] [Created] (HIVE-27297) KryoShim: handle and use kryo4 and kryo5 transparently

2023-04-25 Thread Jira
László Bodor created HIVE-27297:
---

 Summary: KryoShim: handle and use kryo4 and kryo5 transparently
 Key: HIVE-27297
 URL: https://issues.apache.org/jira/browse/HIVE-27297
 Project: Hive
  Issue Type: Improvement
Reporter: László Bodor






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work logged] (HIVE-27294) Remove redundant qt_database_all.q for memory consumption reasons

2023-04-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27294?focusedWorklogId=859032&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-859032
 ]

ASF GitHub Bot logged work on HIVE-27294:
-

Author: ASF GitHub Bot
Created on: 25/Apr/23 22:34
Start Date: 25/Apr/23 22:34
Worklog Time Spent: 10m 
  Work Description: abstractdog commented on PR #4267:
URL: https://github.com/apache/hive/pull/4267#issuecomment-1522505522

   thanks a lot @zabetak for the review, I'll merge this once tests pass




Issue Time Tracking
---

Worklog Id: (was: 859032)
Time Spent: 0.5h  (was: 20m)

> Remove redundant qt_database_all.q for memory consumption reasons
> -
>
> Key: HIVE-27294
> URL: https://issues.apache.org/jira/browse/HIVE-27294
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Currently, while running qt_database_all.q the qtest environment starts and 
> runs all the RDBMS docker containers at the same time in beforeTest, which 
> might end up in extreme memory consumption. This is suboptimal, and 
> considering that the test cases are all covered by single, separate qtests, 
> we can simply remove qt_database_all.q.
> {code}
> ./ql/src/test/queries/clientpositive/qt_database_postgres.q
> ./ql/src/test/queries/clientpositive/qt_database_oracle.q
> ./ql/src/test/queries/clientpositive/qt_database_mssql.q
> ./ql/src/test/queries/clientpositive/qt_database_mariadb.q
> ./ql/src/test/queries/clientpositive/qt_database_mysql.q
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-27294) Remove redundant qt_database_all.q for memory consumption reasons

2023-04-25 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-27294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor updated HIVE-27294:

Description: 
Currently, while running qt_database_all.q the qtest environment starts and 
runs all the RDBMS docker containers at the same time in beforeTest, which 
might end up in extreme memory consumption. This is suboptimal, and considering 
that the test cases are all covered by single, separate qtests, we can simply 
remove qt_database_all.q.

{code}
./ql/src/test/queries/clientpositive/qt_database_postgres.q
./ql/src/test/queries/clientpositive/qt_database_oracle.q
./ql/src/test/queries/clientpositive/qt_database_mssql.q
./ql/src/test/queries/clientpositive/qt_database_mariadb.q
./ql/src/test/queries/clientpositive/qt_database_mysql.q
{code}

  was:
Currently while running qt_database_all.q qtest environment starts and run all 
RDMBS docker containers at the same time in beforeTest, which might end up in 
extreme memory consumption. This is suboptimal, and considering that the test 
cases are all covered by single qtests, we can simply remove qt_database_all.q.

{code}
./ql/src/test/queries/clientpositive/qt_database_postgres.q
./ql/src/test/queries/clientpositive/qt_database_oracle.q
./ql/src/test/queries/clientpositive/qt_database_mssql.q
./ql/src/test/queries/clientpositive/qt_database_mariadb.q
./ql/src/test/queries/clientpositive/qt_database_mysql.q
{code}


> Remove redundant qt_database_all.q for memory consumption reasons
> -
>
> Key: HIVE-27294
> URL: https://issues.apache.org/jira/browse/HIVE-27294
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Currently, while running qt_database_all.q the qtest environment starts and 
> runs all the RDBMS docker containers at the same time in beforeTest, which 
> might end up in extreme memory consumption. This is suboptimal, and 
> considering that the test cases are all covered by single, separate qtests, 
> we can simply remove qt_database_all.q.
> {code}
> ./ql/src/test/queries/clientpositive/qt_database_postgres.q
> ./ql/src/test/queries/clientpositive/qt_database_oracle.q
> ./ql/src/test/queries/clientpositive/qt_database_mssql.q
> ./ql/src/test/queries/clientpositive/qt_database_mariadb.q
> ./ql/src/test/queries/clientpositive/qt_database_mysql.q
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work logged] (HIVE-27295) Improve docker logging in AbstractExternalDB

2023-04-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27295?focusedWorklogId=859030&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-859030
 ]

ASF GitHub Bot logged work on HIVE-27295:
-

Author: ASF GitHub Bot
Created on: 25/Apr/23 22:31
Start Date: 25/Apr/23 22:31
Worklog Time Spent: 10m 
  Work Description: abstractdog commented on code in PR #4268:
URL: https://github.com/apache/hive/pull/4268#discussion_r1177128998


##
itests/util/src/main/java/org/apache/hadoop/hive/ql/externalDB/AbstractExternalDB.java:
##
@@ -99,41 +98,52 @@ private ProcessResults runCmd(String[] cmd, long 
secondsToWait)
 reader = new BufferedReader(new 
InputStreamReader(proc.getErrorStream()));
 final StringBuilder errLines = new StringBuilder();
 reader.lines().forEach(s -> errLines.append(s).append('\n'));
-LOG.info("Result size: " + lines.length() + ";" + errLines.length());
+LOG.info("Result lines#: {}(stdout);{}(stderr)",lines.length(), 
errLines.length());
 return new ProcessResults(lines.toString(), errLines.toString(), 
proc.exitValue());
 }
 
-private int runCmdAndPrintStreams(String[] cmd, long secondsToWait)
+private ProcessResults runCmdAndPrintStreams(String[] cmd, long 
secondsToWait)
 throws InterruptedException, IOException {
 ProcessResults results = runCmd(cmd, secondsToWait);
 LOG.info("Stdout from proc: " + results.stdout);
 LOG.info("Stderr from proc: " + results.stderr);
-return results.rc;
+return results;
 }
 
 
 public void launchDockerContainer() throws Exception {
 runCmdAndPrintStreams(buildRmCmd(), 600);
-if (runCmdAndPrintStreams(buildRunCmd(), 600) != 0) {
+if (runCmdAndPrintStreams(buildRunCmd(), 600).rc != 0) {
 throw new RuntimeException("Unable to start docker container");
 }
 long startTime = System.currentTimeMillis();
 ProcessResults pr;
 do {
 Thread.sleep(1000);
-pr = runCmd(buildLogCmd(), 30);
+pr = runCmdAndPrintStreams(buildLogCmd(), 30);
 if (pr.rc != 0) {
-throw new RuntimeException("Failed to get docker logs");
+  printDockerEvents();
+  throw new RuntimeException("Failed to get docker logs");
 }
 } while (startTime + MAX_STARTUP_WAIT >= System.currentTimeMillis() && 
!isContainerReady(pr));
 if (startTime + MAX_STARTUP_WAIT < System.currentTimeMillis()) {
-throw new RuntimeException("Container failed to be ready in " + 
MAX_STARTUP_WAIT/1000 +
-" seconds");
+  printDockerEvents();
+  throw new RuntimeException("Container failed to be ready in " + 
MAX_STARTUP_WAIT/1000 +
+  " seconds, check docker logs output in hive logs");
 }
 }
 
+public void printDockerEvents() {
+  try {
+runCmdAndPrintStreams(new String[] { "docker", "events", "--since", 
"24h", "--until", "0s" }, 3);
+  } catch (Exception e) {
+LOG.info("Error while getting docker events, this was an analytical 
best effort step, no further actions...",

Review Comment:
   ack





Issue Time Tracking
---

Worklog Id: (was: 859030)
Time Spent: 1h 10m  (was: 1h)
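The runCmd/ProcessResults pattern shown in the diff above can be sketched in isolation. This is a simplified, hypothetical model (the class name Cmd and its shape are invented for illustration), not the actual AbstractExternalDB code, which additionally logs both streams via LOG.info and enforces a wait timeout.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.stream.Collectors;

// Hypothetical, simplified model of AbstractExternalDB's runCmd/ProcessResults.
class Cmd {
    static class ProcessResults {
        final String stdout;
        final String stderr;
        final int rc;
        ProcessResults(String stdout, String stderr, int rc) {
            this.stdout = stdout;
            this.stderr = stderr;
            this.rc = rc;
        }
    }

    // Runs the command, captures stdout and stderr, and returns all three
    // results so callers can both log the streams and check the exit code
    // (the motivation for returning ProcessResults instead of just the rc).
    static ProcessResults run(String... cmd) throws IOException, InterruptedException {
        Process proc = new ProcessBuilder(cmd).start();
        try (BufferedReader out = new BufferedReader(new InputStreamReader(proc.getInputStream()));
             BufferedReader err = new BufferedReader(new InputStreamReader(proc.getErrorStream()))) {
            // Sequential reads are fine for the small outputs expected here;
            // large outputs would need concurrent draining to avoid deadlock.
            String stdout = out.lines().collect(Collectors.joining("\n"));
            String stderr = err.lines().collect(Collectors.joining("\n"));
            proc.waitFor();
            return new ProcessResults(stdout, stderr, proc.exitValue());
        }
    }
}
```

Returning the full ProcessResults (rather than only the return code) is what lets the caller in the PR both print the docker logs output and decide whether the container is ready.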

> Improve docker logging in AbstractExternalDB
> 
>
> Key: HIVE-27295
> URL: https://issues.apache.org/jira/browse/HIVE-27295
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> 1. While waiting for docker container to start properly, we should print the 
> output of docker logs command in every loop, otherwise we can miss important 
> information about the actual startup process if the docker container was oom 
> killed in the meantime. Not to mention the fact that we're currently not 
> logging the output at all in case of an error:
> https://github.com/apache/hive/blob/59058c65457fb7ab9d8575a555034e6633962661/itests/util/src/main/java/org/apache/hadoop/hive/ql/externalDB/AbstractExternalDB.java#L125-L127
> 2. We can include the output for docker events in the logs in case of an 
> error (like: oom killed container), which might contain useful information.
> We can have info like this:
> {code}
> 2023-04-25T08:47:08.852515314-07:00 container oom 
> 2ba12cd9cd844bb30b3158564bd68cd97f25e7a05172d111713ac9f7c1c0b1d4 
> (image=harbor.rke-us-west-04.kc.cloudera.com/docker_private_cache/cloudera_thirdparty/postgres:9.3,
>  name=qtestExternalDB-PostgresExternalDB)
> 2023-04-25T08:47:08.893742200-07:00 container die 
> 

[jira] [Work logged] (HIVE-27295) Improve docker logging in AbstractExternalDB

2023-04-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27295?focusedWorklogId=859029&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-859029
 ]

ASF GitHub Bot logged work on HIVE-27295:
-

Author: ASF GitHub Bot
Created on: 25/Apr/23 22:30
Start Date: 25/Apr/23 22:30
Worklog Time Spent: 10m 
  Work Description: abstractdog commented on code in PR #4268:
URL: https://github.com/apache/hive/pull/4268#discussion_r1177128319


##
itests/util/src/main/java/org/apache/hadoop/hive/ql/externalDB/AbstractExternalDB.java:
##
@@ -99,41 +98,52 @@ private ProcessResults runCmd(String[] cmd, long 
secondsToWait)
 reader = new BufferedReader(new 
InputStreamReader(proc.getErrorStream()));
 final StringBuilder errLines = new StringBuilder();
 reader.lines().forEach(s -> errLines.append(s).append('\n'));
-LOG.info("Result size: " + lines.length() + ";" + errLines.length());
+LOG.info("Result lines#: {}(stdout);{}(stderr)",lines.length(), 
errLines.length());
 return new ProcessResults(lines.toString(), errLines.toString(), 
proc.exitValue());
 }
 
-private int runCmdAndPrintStreams(String[] cmd, long secondsToWait)
+private ProcessResults runCmdAndPrintStreams(String[] cmd, long 
secondsToWait)
 throws InterruptedException, IOException {
 ProcessResults results = runCmd(cmd, secondsToWait);
 LOG.info("Stdout from proc: " + results.stdout);
 LOG.info("Stderr from proc: " + results.stderr);
-return results.rc;
+return results;
 }
 
 
 public void launchDockerContainer() throws Exception {
 runCmdAndPrintStreams(buildRmCmd(), 600);
-if (runCmdAndPrintStreams(buildRunCmd(), 600) != 0) {
+if (runCmdAndPrintStreams(buildRunCmd(), 600).rc != 0) {
 throw new RuntimeException("Unable to start docker container");
 }
 long startTime = System.currentTimeMillis();
 ProcessResults pr;
 do {
 Thread.sleep(1000);
-pr = runCmd(buildLogCmd(), 30);
+pr = runCmdAndPrintStreams(buildLogCmd(), 30);
 if (pr.rc != 0) {
-throw new RuntimeException("Failed to get docker logs");
+  printDockerEvents();
+  throw new RuntimeException("Failed to get docker logs");
 }
 } while (startTime + MAX_STARTUP_WAIT >= System.currentTimeMillis() && 
!isContainerReady(pr));
 if (startTime + MAX_STARTUP_WAIT < System.currentTimeMillis()) {
-throw new RuntimeException("Container failed to be ready in " + 
MAX_STARTUP_WAIT/1000 +
-" seconds");
+  printDockerEvents();
+  throw new RuntimeException("Container failed to be ready in " + 
MAX_STARTUP_WAIT/1000 +
+  " seconds, check docker logs output in hive logs");
 }
 }
 
+public void printDockerEvents() {
+  try {
+runCmdAndPrintStreams(new String[] { "docker", "events", "--since", 
"24h", "--until", "0s" }, 3);
+  } catch (Exception e) {
+LOG.info("Error while getting docker events, this was an analytical 
best effort step, no further actions...",

Review Comment:
   ack, I'll edit this one too.
   Your message makes sense, except that "to resolve the issue" is not 100% 
accurate: the docker events command just lists the docker events so that we can 
guess what happened. So this would be the ideal wording:
   ```
   A problem was encountered while attempting to retrieve Docker events (the 
system made an analytical best effort to list the events to reveal the root 
cause). No further actions are necessary.
   ```





Issue Time Tracking
---

Worklog Id: (was: 859029)
Time Spent: 1h  (was: 50m)

> Improve docker logging in AbstractExternalDB
> 
>
> Key: HIVE-27295
> URL: https://issues.apache.org/jira/browse/HIVE-27295
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> 1. While waiting for docker container to start properly, we should print the 
> output of docker logs command in every loop, otherwise we can miss important 
> information about the actual startup process if the docker container was oom 
> killed in the meantime. Not to mention the fact that we're currently not 
> logging the output at all in case of an error:
> https://github.com/apache/hive/blob/59058c65457fb7ab9d8575a555034e6633962661/itests/util/src/main/java/org/apache/hadoop/hive/ql/externalDB/AbstractExternalDB.java#L125-L127
> 2. We can include the output for docker events in the logs in case of an 
> error (like: oom 

[jira] [Work logged] (HIVE-27295) Improve docker logging in AbstractExternalDB

2023-04-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27295?focusedWorklogId=859028&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-859028
 ]

ASF GitHub Bot logged work on HIVE-27295:
-

Author: ASF GitHub Bot
Created on: 25/Apr/23 22:25
Start Date: 25/Apr/23 22:25
Worklog Time Spent: 10m 
  Work Description: abstractdog commented on code in PR #4268:
URL: https://github.com/apache/hive/pull/4268#discussion_r1177123758


##
itests/util/src/main/java/org/apache/hadoop/hive/ql/externalDB/AbstractExternalDB.java:
##
@@ -99,41 +98,52 @@ private ProcessResults runCmd(String[] cmd, long 
secondsToWait)
 reader = new BufferedReader(new 
InputStreamReader(proc.getErrorStream()));
 final StringBuilder errLines = new StringBuilder();
 reader.lines().forEach(s -> errLines.append(s).append('\n'));
-LOG.info("Result size: " + lines.length() + ";" + errLines.length());
+LOG.info("Result lines#: {}(stdout);{}(stderr)",lines.length(), 
errLines.length());
 return new ProcessResults(lines.toString(), errLines.toString(), 
proc.exitValue());
 }
 
-private int runCmdAndPrintStreams(String[] cmd, long secondsToWait)
+private ProcessResults runCmdAndPrintStreams(String[] cmd, long 
secondsToWait)
 throws InterruptedException, IOException {
 ProcessResults results = runCmd(cmd, secondsToWait);
 LOG.info("Stdout from proc: " + results.stdout);
 LOG.info("Stderr from proc: " + results.stderr);
-return results.rc;
+return results;
 }
 
 
 public void launchDockerContainer() throws Exception {
 runCmdAndPrintStreams(buildRmCmd(), 600);
-if (runCmdAndPrintStreams(buildRunCmd(), 600) != 0) {
+if (runCmdAndPrintStreams(buildRunCmd(), 600).rc != 0) {
 throw new RuntimeException("Unable to start docker container");
 }
 long startTime = System.currentTimeMillis();
 ProcessResults pr;
 do {
 Thread.sleep(1000);
-pr = runCmd(buildLogCmd(), 30);
+pr = runCmdAndPrintStreams(buildLogCmd(), 30);
 if (pr.rc != 0) {
-throw new RuntimeException("Failed to get docker logs");
+  printDockerEvents();
+  throw new RuntimeException("Failed to get docker logs");
 }
 } while (startTime + MAX_STARTUP_WAIT >= System.currentTimeMillis() && 
!isContainerReady(pr));
 if (startTime + MAX_STARTUP_WAIT < System.currentTimeMillis()) {
-throw new RuntimeException("Container failed to be ready in " + 
MAX_STARTUP_WAIT/1000 +
-" seconds");
+  printDockerEvents();
+  throw new RuntimeException("Container failed to be ready in " + 
MAX_STARTUP_WAIT/1000 +

Review Comment:
   yeah, odd, I'll change it





Issue Time Tracking
---

Worklog Id: (was: 859028)
Time Spent: 50m  (was: 40m)
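The startup wait loop that the quoted exception message belongs to can be modeled roughly as follows. All names and the 30-second deadline are illustrative assumptions, not Hive's actual AbstractExternalDB implementation, which shells out to docker logs, logs every poll, and dumps docker events before throwing on timeout.

```java
import java.util.function.Predicate;
import java.util.function.Supplier;

// Hypothetical, simplified model of the container-startup wait loop; the
// deadline and all names here are illustrative, not Hive's actual code.
class StartupWaiter {
    static final long MAX_STARTUP_WAIT = 30_000L; // illustrative deadline (ms)

    // Polls logsProvider (standing in for "docker logs") once per interval,
    // so the latest log output is available on every iteration, until the
    // ready predicate matches or the deadline passes. Returns true if the
    // container became ready in time.
    static boolean waitUntilReady(Supplier<String> logsProvider,
                                  Predicate<String> ready,
                                  long pollMillis) throws InterruptedException {
        long startTime = System.currentTimeMillis();
        String logs;
        do {
            Thread.sleep(pollMillis);
            logs = logsProvider.get(); // inspect/print the logs on every loop
        } while (startTime + MAX_STARTUP_WAIT >= System.currentTimeMillis()
                && !ready.test(logs));
        return ready.test(logs);
    }
}
```

In the real code a false result corresponds to throwing the "Container failed to be ready in N seconds" RuntimeException discussed in the review.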

> Improve docker logging in AbstractExternalDB
> 
>
> Key: HIVE-27295
> URL: https://issues.apache.org/jira/browse/HIVE-27295
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> 1. While waiting for docker container to start properly, we should print the 
> output of docker logs command in every loop, otherwise we can miss important 
> information about the actual startup process if the docker container was oom 
> killed in the meantime. Not to mention the fact that we're currently not 
> logging the output at all in case of an error:
> https://github.com/apache/hive/blob/59058c65457fb7ab9d8575a555034e6633962661/itests/util/src/main/java/org/apache/hadoop/hive/ql/externalDB/AbstractExternalDB.java#L125-L127
> 2. We can include the output for docker events in the logs in case of an 
> error (like: oom killed container), which might contain useful information.
> We can have info like this:
> {code}
> 2023-04-25T08:47:08.852515314-07:00 container oom 
> 2ba12cd9cd844bb30b3158564bd68cd97f25e7a05172d111713ac9f7c1c0b1d4 
> (image=harbor.rke-us-west-04.kc.cloudera.com/docker_private_cache/cloudera_thirdparty/postgres:9.3,
>  name=qtestExternalDB-PostgresExternalDB)
> 2023-04-25T08:47:08.893742200-07:00 container die 
> 2ba12cd9cd844bb30b3158564bd68cd97f25e7a05172d111713ac9f7c1c0b1d4 (exitCode=1, 
> image=harbor.rke-us-west-04.kc.cloudera.com/docker_private_cache/cloudera_thirdparty/postgres:9.3,
>  name=qtestExternalDB-PostgresExternalDB)
> {code}
> 3. Consider adding a [--memory 
> option|https://docs.docker.com/config/containers/resource_constraints/] to 
> the docker run command with a reasonable value to 

[jira] [Work logged] (HIVE-27294) Remove redundant qt_database_all.q for memory consumption reasons

2023-04-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27294?focusedWorklogId=859025&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-859025
 ]

ASF GitHub Bot logged work on HIVE-27294:
-

Author: ASF GitHub Bot
Created on: 25/Apr/23 21:37
Start Date: 25/Apr/23 21:37
Worklog Time Spent: 10m 
  Work Description: sonarcloud[bot] commented on PR #4267:
URL: https://github.com/apache/hive/pull/4267#issuecomment-1522453921

   Kudos, SonarCloud Quality Gate passed! ([dashboard](https://sonarcloud.io/dashboard?id=apache_hive&pullRequest=4267))
   
   Bugs: 0 (rating A)
   Vulnerabilities: 0 (rating A)
   Security Hotspots: 0 (rating A)
   Code Smells: 3 (rating A)
   No Coverage information
   No Duplication information
   
   




Issue Time Tracking
---

Worklog Id: (was: 859025)
Time Spent: 20m  (was: 10m)

> Remove redundant qt_database_all.q for memory consumption reasons
> -
>
> Key: HIVE-27294
> URL: https://issues.apache.org/jira/browse/HIVE-27294
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Currently, while running qt_database_all.q, the qtest environment starts and 
> runs all RDBMS docker containers at the same time in beforeTest, which might 
> end up in extreme memory consumption. This is suboptimal, and considering that 
> the test cases are all covered by single qtests, we can simply remove 
> qt_database_all.q.
> {code}
> ./ql/src/test/queries/clientpositive/qt_database_postgres.q
> ./ql/src/test/queries/clientpositive/qt_database_oracle.q
> ./ql/src/test/queries/clientpositive/qt_database_mssql.q
> ./ql/src/test/queries/clientpositive/qt_database_mariadb.q
> ./ql/src/test/queries/clientpositive/qt_database_mysql.q
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work logged] (HIVE-27278) Simplify correlated queries with empty inputs

2023-04-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27278?focusedWorklogId=859019&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-859019
 ]

ASF GitHub Bot logged work on HIVE-27278:
-

Author: ASF GitHub Bot
Created on: 25/Apr/23 20:53
Start Date: 25/Apr/23 20:53
Worklog Time Spent: 10m 
  Work Description: sonarcloud[bot] commented on PR #4253:
URL: https://github.com/apache/hive/pull/4253#issuecomment-1522402023

   Kudos, SonarCloud Quality Gate passed! ([dashboard](https://sonarcloud.io/dashboard?id=apache_hive&pullRequest=4253))
   
   Bugs: 0 (rating A)
   Vulnerabilities: 0 (rating A)
   Security Hotspots: 0 (rating A)
   Code Smells: 10 (rating A)
   No Coverage information
   No Duplication information
   
   




Issue Time Tracking
---

Worklog Id: (was: 859019)
Time Spent: 2h 10m  (was: 2h)

> Simplify correlated queries with empty inputs
> -
>
> Key: HIVE-27278
> URL: https://issues.apache.org/jira/browse/HIVE-27278
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> The correlated query below will not produce any results, no matter the content 
> of the tables.
> {code:sql}
> create table t1 (id int, val varchar(10)) stored as orc TBLPROPERTIES 
> ('transactional'='true');
> create table t2 (id int, val varchar(10)) stored as orc TBLPROPERTIES 
> ('transactional'='true');
> EXPLAIN CBO SELECT id FROM t1 WHERE NULL IN (SELECT NULL FROM t2 where t1.id 
> = t2.id);
> {code}
> The CBO is able to derive that part of the query is empty and ends up with 
> the following plan.
> {noformat}
> CBO PLAN:
> HiveProject(id=[$0])
>   LogicalCorrelate(correlation=[$cor0], joinType=[semi], requiredColumns=[{}])
> HiveTableScan(table=[[default, t1]], table:alias=[t1])
> HiveValues(tuples=[[]])
> {noformat}
> The presence of LogicalCorrelate is not only redundant but also problematic, 
> since many parts of the optimizer assume that queries are decorrelated and do 
> not know how to handle the LogicalCorrelate.
> In the presence of views the 

[jira] [Work logged] (HIVE-27273) Iceberg: Upgrade iceberg to 1.2.1

2023-04-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27273?focusedWorklogId=859017&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-859017
 ]

ASF GitHub Bot logged work on HIVE-27273:
-

Author: ASF GitHub Bot
Created on: 25/Apr/23 20:38
Start Date: 25/Apr/23 20:38
Worklog Time Spent: 10m 
  Work Description: InvisibleProgrammer commented on PR #4252:
URL: https://github.com/apache/hive/pull/4252#issuecomment-1522384762

   I started to play with the porting, let me share some extra information and 
details: 
   
   There are two modules that we 'copy' into Hive: `mr` and `hive-metastore`. 
   
   When we do a Hive-Iceberg upgrade, it is worth checking whether there is anything 
that should be ported from there as well; otherwise, we can get unexpected 
behavior. 
   
   I was able to narrow down the promising commits to 7: 
   
   To gather them, I got the git commits between the 1.1.0 and 1.2.1 tags but 
only for the mr and hive-metastore folders with those commands: 
   
   ```
   zsoltmiskolczi@zsmiskolczi-MBP16 iceberg % git log apache-iceberg-1.1.0...apache-iceberg-1.2.1 --pretty=tformat:"%H %s %cs" --reverse
   ```

Issue Time Tracking
---

Worklog Id: (was: 859017)
Time Spent: 1h 20m  (was: 1h 10m)

> Iceberg:  Upgrade iceberg to 1.2.1
> --
>
> Key: HIVE-27273
> URL: https://issues.apache.org/jira/browse/HIVE-27273
> Project: Hive
>  Issue Type: Improvement
>  Components: Iceberg integration
>Reporter: zhangbutao
>Assignee: zhangbutao
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> [https://iceberg.apache.org/releases/#121-release] Iceberg 1.2.1 (including 
> 1.2.0) has lots of improvements, e.g. _branch commit_ and the 
> _{{position_deletes}} metadata table._



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work logged] (HIVE-27273) Iceberg: Upgrade iceberg to 1.2.1

2023-04-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27273?focusedWorklogId=859011&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-859011
 ]

ASF GitHub Bot logged work on HIVE-27273:
-

Author: ASF GitHub Bot
Created on: 25/Apr/23 20:08
Start Date: 25/Apr/23 20:08
Worklog Time Spent: 10m 
  Work Description: sonarcloud[bot] commented on PR #4252:
URL: https://github.com/apache/hive/pull/4252#issuecomment-1522352631

   Kudos, SonarCloud Quality Gate passed! ([dashboard](https://sonarcloud.io/dashboard?id=apache_hive&pullRequest=4252))
   
   - [0 Bugs](https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=4252&resolved=false&types=BUG)
   - [0 Vulnerabilities](https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=4252&resolved=false&types=VULNERABILITY)
   - [0 Security Hotspots](https://sonarcloud.io/project/security_hotspots?id=apache_hive&pullRequest=4252&resolved=false&types=SECURITY_HOTSPOT)
   - [3 Code Smells](https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=4252&resolved=false&types=CODE_SMELL)
   - No Coverage information
   - No Duplication information
   
   




Issue Time Tracking
---

Worklog Id: (was: 859011)
Time Spent: 1h 10m  (was: 1h)

> Iceberg:  Upgrade iceberg to 1.2.1
> --
>
> Key: HIVE-27273
> URL: https://issues.apache.org/jira/browse/HIVE-27273
> Project: Hive
>  Issue Type: Improvement
>  Components: Iceberg integration
>Reporter: zhangbutao
>Assignee: zhangbutao
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> [https://iceberg.apache.org/releases/#121-release] Iceberg 1.2.1 (including 
> 1.2.0) has lots of improvements, e.g. _branch commit_ and the 
> _{{position_deletes}} metadata table._



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work logged] (HIVE-27292) Upgrade Zookeeper to 3.7.1

2023-04-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27292?focusedWorklogId=859009&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-859009
 ]

ASF GitHub Bot logged work on HIVE-27292:
-

Author: ASF GitHub Bot
Created on: 25/Apr/23 19:21
Start Date: 25/Apr/23 19:21
Worklog Time Spent: 10m 
  Work Description: TuroczyX commented on PR #4264:
URL: https://github.com/apache/hive/pull/4264#issuecomment-1522296277

   Agree with your points. Let's merge this one, because we could just win with 
this, and let's have a follow-up PR with 3.8.1.
   
   Btw: 3.7 was an even bigger release :) But I guess it should work; as I 
remember, we did not have too many problems with Zookeeper versions in the past.  




Issue Time Tracking
---

Worklog Id: (was: 859009)
Time Spent: 50m  (was: 40m)

> Upgrade Zookeeper to 3.7.1
> --
>
> Key: HIVE-27292
> URL: https://issues.apache.org/jira/browse/HIVE-27292
> Project: Hive
>  Issue Type: Bug
>Reporter: Aman Raj
>Assignee: Aman Raj
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Upgrade Zookeeper from 3.6.3 to 3.7.1 since 3.6.3 has reached end of life. 
> https://endoflife.date/zookeeper



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work logged] (HIVE-27295) Improve docker logging in AbstractExternalDB

2023-04-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27295?focusedWorklogId=859008&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-859008
 ]

ASF GitHub Bot logged work on HIVE-27295:
-

Author: ASF GitHub Bot
Created on: 25/Apr/23 19:17
Start Date: 25/Apr/23 19:17
Worklog Time Spent: 10m 
  Work Description: TuroczyX commented on code in PR #4268:
URL: https://github.com/apache/hive/pull/4268#discussion_r1176940073


##
itests/util/src/main/java/org/apache/hadoop/hive/ql/externalDB/AbstractExternalDB.java:
##
@@ -99,41 +98,52 @@ private ProcessResults runCmd(String[] cmd, long 
secondsToWait)
 reader = new BufferedReader(new 
InputStreamReader(proc.getErrorStream()));
 final StringBuilder errLines = new StringBuilder();
 reader.lines().forEach(s -> errLines.append(s).append('\n'));
-LOG.info("Result size: " + lines.length() + ";" + errLines.length());
+LOG.info("Result lines#: {}(stdout);{}(stderr)",lines.length(), 
errLines.length());
 return new ProcessResults(lines.toString(), errLines.toString(), 
proc.exitValue());
 }
 
-private int runCmdAndPrintStreams(String[] cmd, long secondsToWait)
+private ProcessResults runCmdAndPrintStreams(String[] cmd, long 
secondsToWait)
 throws InterruptedException, IOException {
 ProcessResults results = runCmd(cmd, secondsToWait);
 LOG.info("Stdout from proc: " + results.stdout);
 LOG.info("Stderr from proc: " + results.stderr);
-return results.rc;
+return results;
 }
 
 
 public void launchDockerContainer() throws Exception {
 runCmdAndPrintStreams(buildRmCmd(), 600);
-if (runCmdAndPrintStreams(buildRunCmd(), 600) != 0) {
+if (runCmdAndPrintStreams(buildRunCmd(), 600).rc != 0) {
 throw new RuntimeException("Unable to start docker container");
 }
 long startTime = System.currentTimeMillis();
 ProcessResults pr;
 do {
 Thread.sleep(1000);
-pr = runCmd(buildLogCmd(), 30);
+pr = runCmdAndPrintStreams(buildLogCmd(), 30);
 if (pr.rc != 0) {
-throw new RuntimeException("Failed to get docker logs");
+  printDockerEvents();
+  throw new RuntimeException("Failed to get docker logs");
 }
 } while (startTime + MAX_STARTUP_WAIT >= System.currentTimeMillis() && 
!isContainerReady(pr));
 if (startTime + MAX_STARTUP_WAIT < System.currentTimeMillis()) {
-throw new RuntimeException("Container failed to be ready in " + 
MAX_STARTUP_WAIT/1000 +
-" seconds");
+  printDockerEvents();
+  throw new RuntimeException("Container failed to be ready in " + 
MAX_STARTUP_WAIT/1000 +
+  " seconds, check docker logs output in hive logs");
 }
 }
 
+public void printDockerEvents() {
+  try {
+runCmdAndPrintStreams(new String[] { "docker", "events", "--since", 
"24h", "--until", "0s" }, 3);
+  } catch (Exception e) {
+LOG.info("Error while getting docker events, this was an analytical 
best effort step, no further actions...",

Review Comment:
   It should be a warning. I guess it is more than an info, but not as important 
as an error. In an info message, the word "Error" could be confusing.





Issue Time Tracking
---

Worklog Id: (was: 859008)
Time Spent: 40m  (was: 0.5h)

> Improve docker logging in AbstractExternalDB
> 
>
> Key: HIVE-27295
> URL: https://issues.apache.org/jira/browse/HIVE-27295
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> 1. While waiting for the docker container to start properly, we should print the 
> output of the docker logs command in every loop; otherwise we can miss important 
> information about the actual startup process if the docker container was 
> OOM-killed in the meantime. Not to mention that we're currently not 
> logging the output at all in case of an error:
> https://github.com/apache/hive/blob/59058c65457fb7ab9d8575a555034e6633962661/itests/util/src/main/java/org/apache/hadoop/hive/ql/externalDB/AbstractExternalDB.java#L125-L127
> 2. We can include the output of docker events in the logs in case of an 
> error (like an OOM-killed container), which might contain useful information.
> We can have info like this:
> {code}
> 2023-04-25T08:47:08.852515314-07:00 container oom 
> 2ba12cd9cd844bb30b3158564bd68cd97f25e7a05172d111713ac9f7c1c0b1d4 
> 

[jira] [Work logged] (HIVE-27295) Improve docker logging in AbstractExternalDB

2023-04-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27295?focusedWorklogId=859007&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-859007
 ]

ASF GitHub Bot logged work on HIVE-27295:
-

Author: ASF GitHub Bot
Created on: 25/Apr/23 19:13
Start Date: 25/Apr/23 19:13
Worklog Time Spent: 10m 
  Work Description: TuroczyX commented on code in PR #4268:
URL: https://github.com/apache/hive/pull/4268#discussion_r1176935426


##
itests/util/src/main/java/org/apache/hadoop/hive/ql/externalDB/AbstractExternalDB.java:
##
@@ -99,41 +98,52 @@ private ProcessResults runCmd(String[] cmd, long 
secondsToWait)
 reader = new BufferedReader(new 
InputStreamReader(proc.getErrorStream()));
 final StringBuilder errLines = new StringBuilder();
 reader.lines().forEach(s -> errLines.append(s).append('\n'));
-LOG.info("Result size: " + lines.length() + ";" + errLines.length());
+LOG.info("Result lines#: {}(stdout);{}(stderr)",lines.length(), 
errLines.length());
 return new ProcessResults(lines.toString(), errLines.toString(), 
proc.exitValue());
 }
 
-private int runCmdAndPrintStreams(String[] cmd, long secondsToWait)
+private ProcessResults runCmdAndPrintStreams(String[] cmd, long 
secondsToWait)
 throws InterruptedException, IOException {
 ProcessResults results = runCmd(cmd, secondsToWait);
 LOG.info("Stdout from proc: " + results.stdout);
 LOG.info("Stderr from proc: " + results.stderr);
-return results.rc;
+return results;
 }
 
 
 public void launchDockerContainer() throws Exception {
 runCmdAndPrintStreams(buildRmCmd(), 600);
-if (runCmdAndPrintStreams(buildRunCmd(), 600) != 0) {
+if (runCmdAndPrintStreams(buildRunCmd(), 600).rc != 0) {
 throw new RuntimeException("Unable to start docker container");
 }
 long startTime = System.currentTimeMillis();
 ProcessResults pr;
 do {
 Thread.sleep(1000);
-pr = runCmd(buildLogCmd(), 30);
+pr = runCmdAndPrintStreams(buildLogCmd(), 30);
 if (pr.rc != 0) {
-throw new RuntimeException("Failed to get docker logs");
+  printDockerEvents();
+  throw new RuntimeException("Failed to get docker logs");
 }
 } while (startTime + MAX_STARTUP_WAIT >= System.currentTimeMillis() && 
!isContainerReady(pr));
 if (startTime + MAX_STARTUP_WAIT < System.currentTimeMillis()) {
-throw new RuntimeException("Container failed to be ready in " + 
MAX_STARTUP_WAIT/1000 +
-" seconds");
+  printDockerEvents();
+  throw new RuntimeException("Container failed to be ready in " + 
MAX_STARTUP_WAIT/1000 +
+  " seconds, check docker logs output in hive logs");
 }
 }
 
+public void printDockerEvents() {
+  try {
+runCmdAndPrintStreams(new String[] { "docker", "events", "--since", 
"24h", "--until", "0s" }, 3);
+  } catch (Exception e) {
+LOG.info("Error while getting docker events, this was an analytical 
best effort step, no further actions...",

Review Comment:
   Alternative text: There was a problem encountered while attempting to 
retrieve Docker events. The system made an analytical best effort to resolve 
the issue, and at this point, no further actions are necessary.





Issue Time Tracking
---

Worklog Id: (was: 859007)
Time Spent: 0.5h  (was: 20m)

> Improve docker logging in AbstractExternalDB
> 
>
> Key: HIVE-27295
> URL: https://issues.apache.org/jira/browse/HIVE-27295
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> 1. While waiting for the docker container to start properly, we should print the 
> output of the docker logs command in every loop; otherwise we can miss important 
> information about the actual startup process if the docker container was 
> OOM-killed in the meantime. Not to mention that we're currently not 
> logging the output at all in case of an error:
> https://github.com/apache/hive/blob/59058c65457fb7ab9d8575a555034e6633962661/itests/util/src/main/java/org/apache/hadoop/hive/ql/externalDB/AbstractExternalDB.java#L125-L127
> 2. We can include the output of docker events in the logs in case of an 
> error (like an OOM-killed container), which might contain useful information.
> We can have info like this:
> {code}
> 2023-04-25T08:47:08.852515314-07:00 container oom 
> 2ba12cd9cd844bb30b3158564bd68cd97f25e7a05172d111713ac9f7c1c0b1d4 
> 

[jira] [Work logged] (HIVE-27295) Improve docker logging in AbstractExternalDB

2023-04-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27295?focusedWorklogId=859006&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-859006
 ]

ASF GitHub Bot logged work on HIVE-27295:
-

Author: ASF GitHub Bot
Created on: 25/Apr/23 19:11
Start Date: 25/Apr/23 19:11
Worklog Time Spent: 10m 
  Work Description: TuroczyX commented on code in PR #4268:
URL: https://github.com/apache/hive/pull/4268#discussion_r1176932328


##
itests/util/src/main/java/org/apache/hadoop/hive/ql/externalDB/AbstractExternalDB.java:
##
@@ -99,41 +98,52 @@ private ProcessResults runCmd(String[] cmd, long 
secondsToWait)
 reader = new BufferedReader(new 
InputStreamReader(proc.getErrorStream()));
 final StringBuilder errLines = new StringBuilder();
 reader.lines().forEach(s -> errLines.append(s).append('\n'));
-LOG.info("Result size: " + lines.length() + ";" + errLines.length());
+LOG.info("Result lines#: {}(stdout);{}(stderr)",lines.length(), 
errLines.length());
 return new ProcessResults(lines.toString(), errLines.toString(), 
proc.exitValue());
 }
 
-private int runCmdAndPrintStreams(String[] cmd, long secondsToWait)
+private ProcessResults runCmdAndPrintStreams(String[] cmd, long 
secondsToWait)
 throws InterruptedException, IOException {
 ProcessResults results = runCmd(cmd, secondsToWait);
 LOG.info("Stdout from proc: " + results.stdout);
 LOG.info("Stderr from proc: " + results.stderr);
-return results.rc;
+return results;
 }
 
 
 public void launchDockerContainer() throws Exception {
 runCmdAndPrintStreams(buildRmCmd(), 600);
-if (runCmdAndPrintStreams(buildRunCmd(), 600) != 0) {
+if (runCmdAndPrintStreams(buildRunCmd(), 600).rc != 0) {
 throw new RuntimeException("Unable to start docker container");
 }
 long startTime = System.currentTimeMillis();
 ProcessResults pr;
 do {
 Thread.sleep(1000);
-pr = runCmd(buildLogCmd(), 30);
+pr = runCmdAndPrintStreams(buildLogCmd(), 30);
 if (pr.rc != 0) {
-throw new RuntimeException("Failed to get docker logs");
+  printDockerEvents();
+  throw new RuntimeException("Failed to get docker logs");
 }
 } while (startTime + MAX_STARTUP_WAIT >= System.currentTimeMillis() && 
!isContainerReady(pr));
 if (startTime + MAX_STARTUP_WAIT < System.currentTimeMillis()) {
-throw new RuntimeException("Container failed to be ready in " + 
MAX_STARTUP_WAIT/1000 +
-" seconds");
+  printDockerEvents();
+  throw new RuntimeException("Container failed to be ready in " + 
MAX_STARTUP_WAIT/1000 +

Review Comment:
   A bit of an odd exception message, minor stuff I know. Suggested wording:
   "Container initialization failed within x seconds. Please check the hive 
logs."
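
   The diff under review above waits for the container by polling `docker logs` in a bounded loop until either the container reports ready or a startup deadline expires. A minimal, hypothetical sketch of that bounded-polling pattern (simplified names, not the actual Hive code):

   ```java
   import java.util.function.BooleanSupplier;

   public class StartupWaiter {
       // Polls readyCheck once per interval until it returns true or the deadline
       // passes; returns whether the condition became true within maxWaitMillis.
       public static boolean waitUntilReady(BooleanSupplier readyCheck, long maxWaitMillis,
                                            long intervalMillis) throws InterruptedException {
           long deadline = System.currentTimeMillis() + maxWaitMillis;
           while (System.currentTimeMillis() < deadline) {
               if (readyCheck.getAsBoolean()) {
                   return true;
               }
               Thread.sleep(intervalMillis);
           }
           // One final check so a condition that turned true on the last
           // iteration is not reported as a timeout.
           return readyCheck.getAsBoolean();
       }
   }
   ```

   The real implementation additionally captures and logs the process output on every iteration, which is exactly what this change improves.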





Issue Time Tracking
---

Worklog Id: (was: 859006)
Time Spent: 20m  (was: 10m)

> Improve docker logging in AbstractExternalDB
> 
>
> Key: HIVE-27295
> URL: https://issues.apache.org/jira/browse/HIVE-27295
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> 1. While waiting for the docker container to start properly, we should print the 
> output of the docker logs command in every loop; otherwise we can miss important 
> information about the actual startup process if the docker container was 
> OOM-killed in the meantime. Not to mention that we're currently not 
> logging the output at all in case of an error:
> https://github.com/apache/hive/blob/59058c65457fb7ab9d8575a555034e6633962661/itests/util/src/main/java/org/apache/hadoop/hive/ql/externalDB/AbstractExternalDB.java#L125-L127
> 2. We can include the output of docker events in the logs in case of an 
> error (like an OOM-killed container), which might contain useful information.
> We can have info like this:
> {code}
> 2023-04-25T08:47:08.852515314-07:00 container oom 
> 2ba12cd9cd844bb30b3158564bd68cd97f25e7a05172d111713ac9f7c1c0b1d4 
> (image=harbor.rke-us-west-04.kc.cloudera.com/docker_private_cache/cloudera_thirdparty/postgres:9.3,
>  name=qtestExternalDB-PostgresExternalDB)
> 2023-04-25T08:47:08.893742200-07:00 container die 
> 2ba12cd9cd844bb30b3158564bd68cd97f25e7a05172d111713ac9f7c1c0b1d4 (exitCode=1, 
> image=harbor.rke-us-west-04.kc.cloudera.com/docker_private_cache/cloudera_thirdparty/postgres:9.3,
>  name=qtestExternalDB-PostgresExternalDB)
> {code}
> 3. Consider adding a [--memory 
> 

[jira] [Work logged] (HIVE-27285) Add TableMeta ownership for filterTableMetas

2023-04-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27285?focusedWorklogId=859002&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-859002
 ]

ASF GitHub Bot logged work on HIVE-27285:
-

Author: ASF GitHub Bot
Created on: 25/Apr/23 18:31
Start Date: 25/Apr/23 18:31
Worklog Time Spent: 10m 
  Work Description: jfsii commented on code in PR #4258:
URL: https://github.com/apache/hive/pull/4258#discussion_r1176886829


##
standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/MetaStoreFilterHook.java:
##
@@ -85,15 +85,13 @@ default List<String> filterCatalogs(List<String> catalogs) throws MetaException
   List<String> filterTableNames(String catName, String dbName, List<String> tableList)
   throws MetaException;
 
-  // Previously this was handled by filterTableNames.  But it can't be anymore 
because we can no
-  // longer depend on a 1-1 mapping between table name and entry in the list.
  /**
* Filter a list of TableMeta objects.
* @param tableMetas list of TableMetas to filter
* @return filtered table metas
* @throws MetaException something went wrong
*/
-  List<TableMeta> filterTableMetas(String catName, String dbName, List<TableMeta> tableMetas) throws MetaException;
+  List<TableMeta> filterTableMetas(List<TableMeta> tableMetas) throws MetaException;

Review Comment:
   I've made this change - I hope it matches the intent of your comment.





Issue Time Tracking
---

Worklog Id: (was: 859002)
Time Spent: 2.5h  (was: 2h 20m)

> Add TableMeta ownership for filterTableMetas
> 
>
> Key: HIVE-27285
> URL: https://issues.apache.org/jira/browse/HIVE-27285
> Project: Hive
>  Issue Type: Improvement
>  Components: Standalone Metastore
>Affects Versions: 4.0.0
>Reporter: John Sherman
>Assignee: John Sherman
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> Currently TableMeta does not include ownership information, which makes it 
> difficult for filterTableMetas to efficiently filter based on ${OWNER} 
> privileges.
> We should add ownership information to TableMeta and utilize it in 
> filterTableMetas authorization checks.
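
As a rough illustration of the idea in this ticket (filtering on ownership carried in the metadata itself), here is a hypothetical sketch; `TableMeta` and `getOwnerName()` below are simplified stand-ins, not the actual Hive Metastore classes:

```java
import java.util.List;
import java.util.stream.Collectors;

public class OwnerFilterSketch {
    // Simplified stand-in for the Metastore TableMeta with an owner field added.
    public static class TableMeta {
        private final String tableName;
        private final String ownerName;
        public TableMeta(String tableName, String ownerName) {
            this.tableName = tableName;
            this.ownerName = ownerName;
        }
        public String getTableName() { return tableName; }
        public String getOwnerName() { return ownerName; }
    }

    // Keep only the tables owned by the current user; in a real filter hook the
    // non-owned entries would fall through to a full privilege check instead.
    public static List<TableMeta> filterByOwner(List<TableMeta> metas, String currentUser) {
        return metas.stream()
                .filter(m -> currentUser.equals(m.getOwnerName()))
                .collect(Collectors.toList());
    }
}
```

Having the owner on the TableMeta itself avoids one metastore round trip per table when the authorizer only needs the ${OWNER} check.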



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work logged] (HIVE-27285) Add TableMeta ownership for filterTableMetas

2023-04-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27285?focusedWorklogId=859001&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-859001
 ]

ASF GitHub Bot logged work on HIVE-27285:
-

Author: ASF GitHub Bot
Created on: 25/Apr/23 18:30
Start Date: 25/Apr/23 18:30
Worklog Time Spent: 10m 
  Work Description: jfsii commented on code in PR #4258:
URL: https://github.com/apache/hive/pull/4258#discussion_r1176886121


##
ql/src/java/org/apache/hadoop/hive/ql/security/authorization/plugin/AuthorizationMetaStoreFilterHook.java:
##
@@ -133,25 +138,45 @@ private List<HivePrivilegeObject> getHivePrivObjects(String dbName, List
 return objs;
   }
 
-  private List<HivePrivilegeObject> getHivePrivObjects(List<TableMeta> tableList) {
+  private HivePrivilegeObject createPrivilegeObject(String dbName, String 
tableName, String owner,

Review Comment:
   Done.





Issue Time Tracking
---

Worklog Id: (was: 859001)
Time Spent: 2h 20m  (was: 2h 10m)

> Add TableMeta ownership for filterTableMetas
> 
>
> Key: HIVE-27285
> URL: https://issues.apache.org/jira/browse/HIVE-27285
> Project: Hive
>  Issue Type: Improvement
>  Components: Standalone Metastore
>Affects Versions: 4.0.0
>Reporter: John Sherman
>Assignee: John Sherman
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> Currently TableMeta does not include ownership information, which makes it 
> difficult for filterTableMetas to efficiently filter based on ${OWNER} 
> privileges.
> We should add ownership information to TableMeta and utilize it in 
> filterTableMetas authorization checks.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (HIVE-27277) Set up github actions workflow to build and push docker image to docker hub

2023-04-25 Thread Simhadri Govindappa (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17716378#comment-17716378
 ] 

Simhadri Govindappa edited comment on HIVE-27277 at 4/25/23 6:04 PM:
-

 

INFRA-24505 : Docker Repo created for apache hive: 
[https://hub.docker.com/r/apache/hive]

 


was (Author: simhadri-g):
Docker Repo created for apache hive: [https://hub.docker.com/r/apache/hive]

 

> Set up github actions workflow to build and push docker image to docker hub
> ---
>
> Key: HIVE-27277
> URL: https://issues.apache.org/jira/browse/HIVE-27277
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Simhadri Govindappa
>Assignee: Simhadri Govindappa
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-27277) Set up github actions workflow to build and push docker image to docker hub

2023-04-25 Thread Simhadri Govindappa (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17716378#comment-17716378
 ] 

Simhadri Govindappa commented on HIVE-27277:


Docker Repo created for apache hive: [https://hub.docker.com/r/apache/hive]

 

> Set up github actions workflow to build and push docker image to docker hub
> ---
>
> Key: HIVE-27277
> URL: https://issues.apache.org/jira/browse/HIVE-27277
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Simhadri Govindappa
>Assignee: Simhadri Govindappa
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work logged] (HIVE-27290) Upgrade com.jayway.jsonpath » json-path to 2.8.0 to fix CVEs

2023-04-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27290?focusedWorklogId=858997&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-858997
 ]

ASF GitHub Bot logged work on HIVE-27290:
-

Author: ASF GitHub Bot
Created on: 25/Apr/23 18:00
Start Date: 25/Apr/23 18:00
Worklog Time Spent: 10m 
  Work Description: sonarcloud[bot] commented on PR #4266:
URL: https://github.com/apache/hive/pull/4266#issuecomment-1522194954

   Kudos, SonarCloud Quality Gate passed! ([dashboard](https://sonarcloud.io/dashboard?id=apache_hive&pullRequest=4266))
   
   - [0 Bugs](https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=4266&resolved=false&types=BUG)
   - [0 Vulnerabilities](https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=4266&resolved=false&types=VULNERABILITY)
   - [0 Security Hotspots](https://sonarcloud.io/project/security_hotspots?id=apache_hive&pullRequest=4266&resolved=false&types=SECURITY_HOTSPOT)
   - [3 Code Smells](https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=4266&resolved=false&types=CODE_SMELL)
   - No Coverage information
   - No Duplication information
   
   




Issue Time Tracking
---

Worklog Id: (was: 858997)
Time Spent: 0.5h  (was: 20m)

> Upgrade com.jayway.jsonpath » json-path to 2.8.0 to fix CVEs
> 
>
> Key: HIVE-27290
> URL: https://issues.apache.org/jira/browse/HIVE-27290
> Project: Hive
>  Issue Type: Task
>Reporter: Devaspati Krishnatri
>Assignee: Devaspati Krishnatri
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work logged] (HIVE-27238) Avoid Calcite Code generation for RelMetaDataProvider on every query

2023-04-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27238?focusedWorklogId=858996=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-858996
 ]

ASF GitHub Bot logged work on HIVE-27238:
-

Author: ASF GitHub Bot
Created on: 25/Apr/23 17:56
Start Date: 25/Apr/23 17:56
Worklog Time Spent: 10m 
  Work Description: kasakrisz commented on code in PR #4212:
URL: https://github.com/apache/hive/pull/4212#discussion_r1176852722


##
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/HiveDefaultRelMetadataProvider.java:
##
@@ -128,4 +139,28 @@ public static void 
initializeMetadataProviderClass(List
 // This will register the classes in the default Hive implementation
 DEFAULT.register(nodeClasses);
   }
+
+  public static synchronized HiveDefaultRelMetadataProvider get(HiveConf 
hiveConf,
+  List> nodeClasses) {
+Map confKey = getConfKey(hiveConf);
+if (ALL_PROVIDERS.containsKey(confKey)) {
+  return ALL_PROVIDERS.get(confKey);
+}
+
+HiveDefaultRelMetadataProvider newProvider =
+new HiveDefaultRelMetadataProvider(hiveConf, nodeClasses);
+ALL_PROVIDERS.put(confKey, newProvider);
+return newProvider;
+  }
+
+  private static Map getConfKey(HiveConf conf) {
+ImmutableMap.Builder bldr = new 
ImmutableMap.Builder<>();
+bldr.put(HiveConf.ConfVars.HIVE_EXECUTION_ENGINE,
+conf.getVar(HiveConf.ConfVars.HIVE_EXECUTION_ENGINE));
+bldr.put(HiveConf.ConfVars.HIVE_CBO_EXTENDED_COST_MODEL,
+conf.getBoolVar(HiveConf.ConfVars.HIVE_CBO_EXTENDED_COST_MODEL));
+bldr.put(HiveConf.ConfVars.MAPREDMAXSPLITSIZE,
+conf.getLongVar(HiveConf.ConfVars.MAPREDMAXSPLITSIZE));

Review Comment:
   Can these settings be changed without an hs2 restart?





Issue Time Tracking
---

Worklog Id: (was: 858996)
Time Spent: 40m  (was: 0.5h)

> Avoid Calcite Code generation for RelMetaDataProvider on every query
> 
>
> Key: HIVE-27238
> URL: https://issues.apache.org/jira/browse/HIVE-27238
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Reporter: Steve Carlin
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> In CalcitePlanner, we instantiate a new CachingRelMetadataProvider on every 
> query.  Calcite keys its handler cache by provider to avoid generating a new 
> MetadataHandler class each time.  But because we create a fresh provider per 
> query, the cache never gets a hit, so we keep instantiating new 
> MetadataHandlers.
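The fix direction described above amounts to memoizing the provider per distinct configuration. A minimal sketch of that pattern (class and key names hypothetical, not Hive's actual code):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch (not Hive's actual classes): cache one provider per
// distinct configuration key so the code-generated MetadataHandlers behind
// it are reused across queries instead of being rebuilt every time.
public class ProviderCache {
    private static final Map<String, Object> PROVIDERS = new ConcurrentHashMap<>();

    // Returns the cached provider for this config key, creating it only once.
    public static Object get(String confKey) {
        return PROVIDERS.computeIfAbsent(confKey, k -> new Object() /* build real provider here */);
    }
}
```

With identical config keys the same instance comes back, so Calcite's handler cache finally gets hits.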



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work logged] (HIVE-27285) Add TableMeta ownership for filterTableMetas

2023-04-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27285?focusedWorklogId=858995=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-858995
 ]

ASF GitHub Bot logged work on HIVE-27285:
-

Author: ASF GitHub Bot
Created on: 25/Apr/23 17:46
Start Date: 25/Apr/23 17:46
Worklog Time Spent: 10m 
  Work Description: saihemanth-cloudera commented on code in PR #4258:
URL: https://github.com/apache/hive/pull/4258#discussion_r1176842363


##
ql/src/java/org/apache/hadoop/hive/ql/security/authorization/plugin/AuthorizationMetaStoreFilterHook.java:
##
@@ -133,25 +138,45 @@ private List 
getHivePrivObjects(String dbName, List
 return objs;
   }
 
-  private List getHivePrivObjects(List tableList) {
+  private HivePrivilegeObject createPrivilegeObject(String dbName, String 
tableName, String owner,

Review Comment:
   nit: `createPrivilegeObjectForTable`?





Issue Time Tracking
---

Worklog Id: (was: 858995)
Time Spent: 2h 10m  (was: 2h)

> Add TableMeta ownership for filterTableMetas
> 
>
> Key: HIVE-27285
> URL: https://issues.apache.org/jira/browse/HIVE-27285
> Project: Hive
>  Issue Type: Improvement
>  Components: Standalone Metastore
>Affects Versions: 4.0.0
>Reporter: John Sherman
>Assignee: John Sherman
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> Currently TableMeta does not include ownership information which makes it 
> difficult for filterTableMetas to efficiently filter based on ${OWNER} 
> privileges.
> We should add ownership information to TableMeta and utilizing it in 
> filterTableMetas authorization checks.
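As an illustration of what the proposal enables (types and field names hypothetical, not the actual TableMeta API), carrying the owner on the metadata lets the filter run as a cheap in-memory pass:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: once TableMeta carries an owner field, ${OWNER}-based
// filtering becomes an in-memory pass instead of a per-table metastore call.
public class TableMetaFilter {
    public static class TableMeta {
        public final String db;
        public final String table;
        public final String owner;
        public TableMeta(String db, String table, String owner) {
            this.db = db;
            this.table = table;
            this.owner = owner;
        }
    }

    // Keeps only the tables owned by 'user'; no per-table authorization calls.
    public static List<TableMeta> filterByOwner(List<TableMeta> metas, String user) {
        List<TableMeta> out = new ArrayList<>();
        for (TableMeta m : metas) {
            if (user.equals(m.owner)) {
                out.add(m);
            }
        }
        return out;
    }
}
```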



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work logged] (HIVE-27289) Check for proxy hosts with subnet in IP results in exception

2023-04-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27289?focusedWorklogId=858993=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-858993
 ]

ASF GitHub Bot logged work on HIVE-27289:
-

Author: ASF GitHub Bot
Created on: 25/Apr/23 17:43
Start Date: 25/Apr/23 17:43
Worklog Time Spent: 10m 
  Work Description: nrg4878 commented on code in PR #4263:
URL: https://github.com/apache/hive/pull/4263#discussion_r1176839576


##
jdbc-handler/src/main/java/org/apache/hive/storage/jdbc/dao/TeradataDatabaseAccessor.java:
##
@@ -0,0 +1,41 @@
+/*
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hive.storage.jdbc.dao;
+
+/**
+ * Teradata specific data accessor. This is needed because Teradata JDBC 
drivers do not support generic LIMIT and OFFSET
+ * escape functions
+ */
+public class TeradataDatabaseAccessor extends GenericJdbcDatabaseAccessor {
+
+  @Override

Review Comment:
   Fixed.
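For context, Teradata's dialect uses `SELECT TOP n` rather than a generic `LIMIT n` clause, which is presumably what the overridden accessor handles. A minimal sketch of such a rewrite (class and method names hypothetical, not the actual Hive implementation):

```java
// Hypothetical sketch of a LIMIT-to-TOP rewrite for a Teradata-style dialect.
// Assumes the incoming query starts with SELECT and has no existing TOP clause.
public class TeradataLimitRewriter {
    // Rewrites "SELECT ..." into "SELECT TOP <limit> ..." because the dialect
    // does not support the generic LIMIT escape function.
    public static String addLimit(String sql, int limit) {
        if (sql.regionMatches(true, 0, "SELECT", 0, 6)) {
            return "SELECT TOP " + limit + sql.substring(6);
        }
        return sql;  // leave non-SELECT statements untouched
    }
}
```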





Issue Time Tracking
---

Worklog Id: (was: 858993)
Time Spent: 1h  (was: 50m)

> Check for proxy hosts with subnet in IP results in exception 
> -
>
> Key: HIVE-27289
> URL: https://issues.apache.org/jira/browse/HIVE-27289
> Project: Hive
>  Issue Type: Bug
>  Components: Standalone Metastore
>Affects Versions: 4.0.0-alpha-2
>Reporter: Naveen Gangam
>Assignee: Naveen Gangam
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> When running schematool for sysdb setup in a kerberized environment, the 
> check to see if the user is a super user fails when 
> hadoop.proxyuser.hive.hosts contains a subnet in its IP addresses (for 
> example: 192.168.0.3/23).
> {noformat}
> exec 
> /opt/cloudera/parcels/CDH-7.1.8-1.cdh7.1.8.p11.35002917/lib/hadoop/bin/hadoop 
> jar 
> /opt/cloudera/parcels/CDH-7.1.8-1.cdh7.1.8.p11.35002917/lib/hive/lib/hive-cli-3.1.3000.7.1.8.11-3.jar
>  org.apache.hive.beeline.schematool.HiveSchemaTool -verbose -dbType hive 
> -metaDbType mysql -initOrUpgradeSchema
> WARNING: Use "yarn jar" to launch YARN applications.
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in 
> [jar:file:/opt/cloudera/parcels/CDH-7.1.8-1.cdh7.1.8.p11.35002917/jars/log4j-slf4j-impl-2.17.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in 
> [jar:file:/opt/cloudera/parcels/CDH-7.1.8-1.cdh7.1.8.p11.35002917/jars/slf4j-log4j12-1.7.30.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an 
> explanation.
> SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
> Hive Session ID = 93d863d8-cbe6-47fc-8817-49074841f9f1
> WARNING: An illegal reflective access operation has occurred
> WARNING: Illegal reflective access by 
> org.apache.hadoop.hive.common.StringInternUtils 
> (file:/opt/cloudera/parcels/CDH-7.1.8-1.cdh7.1.8.p11.35002917/jars/hive-common-3.1.3000.7.1.8.11-3.jar)
>  to field java.net.URI.string
> WARNING: Please consider reporting this to the maintainers of 
> org.apache.hadoop.hive.common.StringInternUtils
> WARNING: Use --illegal-access=warn to enable warnings of further illegal 
> reflective access operations
> WARNING: All illegal access operations will be denied in a future release
> 23/04/22 12:01:38 ERROR metastore.HiveMetaStoreAuthorizer: [main]: 
> HiveMetaStoreAuthorizer.onEvent(): failed
> java.lang.IllegalArgumentException: Could not parse []
>   at 
> org.apache.commons.net.util.SubnetUtils.toInteger(SubnetUtils.java:287) 
> ~[commons-net-3.6.jar:3.6]
>   at 
> org.apache.commons.net.util.SubnetUtils.access$400(SubnetUtils.java:27) 
> ~[commons-net-3.6.jar:3.6]
>   at 
> org.apache.commons.net.util.SubnetUtils$SubnetInfo.isInRange(SubnetUtils.java:125)
>  ~[commons-net-3.6.jar:3.6]
>   at org.apache.hadoop.util.MachineList.includes(MachineList.java:155) 
> ~[hadoop-common-3.1.1.7.1.8.11-3.jar:?]
>   at 
> org.apache.hadoop.hive.metastore.utils.MetaStoreUtils.checkUserHasHostProxyPrivileges(MetaStoreUtils.java:1347)
>  ~[hive-standalone-metastore-3.1.3000.7.1.8.11-3.jar:3.1.3000.7.1.8.11-3]
>   at 
> 
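The failing check above is plain CIDR containment (the log's "Could not parse []" suggests an empty host entry reached the parser). As an illustration only — Hive delegates to Hadoop's MachineList and commons-net SubnetUtils rather than using code like this — an IPv4 subnet match such as 192.168.0.3/23 can be computed as:

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Illustration only: IPv4 CIDR containment, the kind of check MachineList
// performs (via commons-net SubnetUtils) for entries like 192.168.0.3/23.
public class CidrCheck {
    private static int toInt(String ip) {
        try {
            byte[] b = InetAddress.getByName(ip).getAddress();  // assumes IPv4 (4 bytes)
            return ((b[0] & 0xFF) << 24) | ((b[1] & 0xFF) << 16)
                 | ((b[2] & 0xFF) << 8) | (b[3] & 0xFF);
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException("Could not parse [" + ip + "]", e);
        }
    }

    // True if 'ip' falls inside 'cidr', e.g. "192.168.0.3/23".
    public static boolean isInRange(String cidr, String ip) {
        String[] parts = cidr.split("/");
        int prefix = Integer.parseInt(parts[1]);
        int mask = prefix == 0 ? 0 : -1 << (32 - prefix);
        return (toInt(parts[0]) & mask) == (toInt(ip) & mask);
    }
}
```

A /23 network spans two adjacent /24 blocks, so 192.168.0.3/23 covers 192.168.0.0 through 192.168.1.255.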

[jira] [Work logged] (HIVE-27289) Check for proxy hosts with subnet in IP results in exception

2023-04-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27289?focusedWorklogId=858992=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-858992
 ]

ASF GitHub Bot logged work on HIVE-27289:
-

Author: ASF GitHub Bot
Created on: 25/Apr/23 17:40
Start Date: 25/Apr/23 17:40
Worklog Time Spent: 10m 
  Work Description: nrg4878 commented on code in PR #4263:
URL: https://github.com/apache/hive/pull/4263#discussion_r1176836581


##
jdbc-handler/src/main/java/org/apache/hive/storage/jdbc/dao/TeradataDatabaseAccessor.java:
##
@@ -0,0 +1,41 @@
+/*
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hive.storage.jdbc.dao;
+
+/**
+ * Teradata specific data accessor. This is needed because Teradata JDBC 
drivers do not support generic LIMIT and OFFSET
+ * escape functions
+ */
+public class TeradataDatabaseAccessor extends GenericJdbcDatabaseAccessor {
+
+  @Override

Review Comment:
   Not a nit. There should only be one file in this change, not sure how others 
got pulled in. Let me fix that.





Issue Time Tracking
---

Worklog Id: (was: 858992)
Time Spent: 50m  (was: 40m)

> Check for proxy hosts with subnet in IP results in exception 
> -
>
> Key: HIVE-27289
> URL: https://issues.apache.org/jira/browse/HIVE-27289
> Project: Hive
>  Issue Type: Bug
>  Components: Standalone Metastore
>Affects Versions: 4.0.0-alpha-2
>Reporter: Naveen Gangam
>Assignee: Naveen Gangam
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> When running schematool for sysdb setup in a kerberized environment, the 
> check to see if the user is a super user fails when 
> hadoop.proxyuser.hive.hosts contains a subnet in its IP addresses (for 
> example: 192.168.0.3/23).
> {noformat}
> exec 
> /opt/cloudera/parcels/CDH-7.1.8-1.cdh7.1.8.p11.35002917/lib/hadoop/bin/hadoop 
> jar 
> /opt/cloudera/parcels/CDH-7.1.8-1.cdh7.1.8.p11.35002917/lib/hive/lib/hive-cli-3.1.3000.7.1.8.11-3.jar
>  org.apache.hive.beeline.schematool.HiveSchemaTool -verbose -dbType hive 
> -metaDbType mysql -initOrUpgradeSchema
> WARNING: Use "yarn jar" to launch YARN applications.
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in 
> [jar:file:/opt/cloudera/parcels/CDH-7.1.8-1.cdh7.1.8.p11.35002917/jars/log4j-slf4j-impl-2.17.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in 
> [jar:file:/opt/cloudera/parcels/CDH-7.1.8-1.cdh7.1.8.p11.35002917/jars/slf4j-log4j12-1.7.30.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an 
> explanation.
> SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
> Hive Session ID = 93d863d8-cbe6-47fc-8817-49074841f9f1
> WARNING: An illegal reflective access operation has occurred
> WARNING: Illegal reflective access by 
> org.apache.hadoop.hive.common.StringInternUtils 
> (file:/opt/cloudera/parcels/CDH-7.1.8-1.cdh7.1.8.p11.35002917/jars/hive-common-3.1.3000.7.1.8.11-3.jar)
>  to field java.net.URI.string
> WARNING: Please consider reporting this to the maintainers of 
> org.apache.hadoop.hive.common.StringInternUtils
> WARNING: Use --illegal-access=warn to enable warnings of further illegal 
> reflective access operations
> WARNING: All illegal access operations will be denied in a future release
> 23/04/22 12:01:38 ERROR metastore.HiveMetaStoreAuthorizer: [main]: 
> HiveMetaStoreAuthorizer.onEvent(): failed
> java.lang.IllegalArgumentException: Could not parse []
>   at 
> org.apache.commons.net.util.SubnetUtils.toInteger(SubnetUtils.java:287) 
> ~[commons-net-3.6.jar:3.6]
>   at 
> org.apache.commons.net.util.SubnetUtils.access$400(SubnetUtils.java:27) 
> ~[commons-net-3.6.jar:3.6]
>   at 
> org.apache.commons.net.util.SubnetUtils$SubnetInfo.isInRange(SubnetUtils.java:125)
>  ~[commons-net-3.6.jar:3.6]
>   at org.apache.hadoop.util.MachineList.includes(MachineList.java:155) 
> ~[hadoop-common-3.1.1.7.1.8.11-3.jar:?]
>   at 
> org.apache.hadoop.hive.metastore.utils.MetaStoreUtils.checkUserHasHostProxyPrivileges(MetaStoreUtils.java:1347)
>  

[jira] [Work logged] (HIVE-27186) A persistent property store

2023-04-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27186?focusedWorklogId=858990=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-858990
 ]

ASF GitHub Bot logged work on HIVE-27186:
-

Author: ASF GitHub Bot
Created on: 25/Apr/23 17:13
Start Date: 25/Apr/23 17:13
Worklog Time Spent: 10m 
  Work Description: sonarcloud[bot] commented on PR #4194:
URL: https://github.com/apache/hive/pull/4194#issuecomment-1522141536

   Kudos, SonarCloud Quality Gate passed!  [![Quality Gate 
passed](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/checks/QualityGateBadge/passed-16px.png
 'Quality Gate 
passed')](https://sonarcloud.io/dashboard?id=apache_hive=4194)
   
   
[![Bug](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/common/bug-16px.png
 
'Bug')](https://sonarcloud.io/project/issues?id=apache_hive=4194=false=BUG)
 
[![E](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/checks/RatingBadge/E-16px.png
 
'E')](https://sonarcloud.io/project/issues?id=apache_hive=4194=false=BUG)
 [4 
Bugs](https://sonarcloud.io/project/issues?id=apache_hive=4194=false=BUG)
  
   
[![Vulnerability](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/common/vulnerability-16px.png
 
'Vulnerability')](https://sonarcloud.io/project/issues?id=apache_hive=4194=false=VULNERABILITY)
 
[![A](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/checks/RatingBadge/A-16px.png
 
'A')](https://sonarcloud.io/project/issues?id=apache_hive=4194=false=VULNERABILITY)
 [0 
Vulnerabilities](https://sonarcloud.io/project/issues?id=apache_hive=4194=false=VULNERABILITY)
  
   [![Security 
Hotspot](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/common/security_hotspot-16px.png
 'Security 
Hotspot')](https://sonarcloud.io/project/security_hotspots?id=apache_hive=4194=false=SECURITY_HOTSPOT)
 
[![E](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/checks/RatingBadge/E-16px.png
 
'E')](https://sonarcloud.io/project/security_hotspots?id=apache_hive=4194=false=SECURITY_HOTSPOT)
 [1 Security 
Hotspot](https://sonarcloud.io/project/security_hotspots?id=apache_hive=4194=false=SECURITY_HOTSPOT)
  
   [![Code 
Smell](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/common/code_smell-16px.png
 'Code 
Smell')](https://sonarcloud.io/project/issues?id=apache_hive=4194=false=CODE_SMELL)
 
[![A](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/checks/RatingBadge/A-16px.png
 
'A')](https://sonarcloud.io/project/issues?id=apache_hive=4194=false=CODE_SMELL)
 [105 Code 
Smells](https://sonarcloud.io/project/issues?id=apache_hive=4194=false=CODE_SMELL)
   
   [![No Coverage 
information](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/checks/CoverageChart/NoCoverageInfo-16px.png
 'No Coverage 
information')](https://sonarcloud.io/component_measures?id=apache_hive=4194=coverage=list)
 No Coverage information  
   [![No Duplication 
information](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/checks/Duplications/NoDuplicationInfo-16px.png
 'No Duplication 
information')](https://sonarcloud.io/component_measures?id=apache_hive=4194=duplicated_lines_density=list)
 No Duplication information
   
   




Issue Time Tracking
---

Worklog Id: (was: 858990)
Time Spent: 13h 10m  (was: 13h)

> A persistent property store 
> 
>
> Key: HIVE-27186
> URL: https://issues.apache.org/jira/browse/HIVE-27186
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Affects Versions: 4.0.0-alpha-2
>Reporter: Henri Biestro
>Assignee: Henri Biestro
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 13h 10m
>  Remaining Estimate: 0h
>
> WHAT
> A persistent property store usable as a support facility for any metadata 
> augmentation feature.
> WHY
> When adding new meta-data oriented features, we usually need to persist 
> information linking the feature data and the HiveMetaStore objects it applies 
> to. Any information related to a database, a table, or the cluster - like 
> statistics, for example, or any operational state or data (think rolling 
> backup) - falls into this use-case.
> Typically, accommodating such a feature requires modifying the Metastore 
> database schema by adding or altering a table. It also usually implies 
> modifying the thrift APIs to expose such meta-data to consumers.
> The proposed feature solves persistence and query/transport for these types 
> of use-cases by exposing a 'key/(meta)value' store surfaced as a property 
> system.
> HOW
> A property-value model is the simple and generic exposed API.
> To provision for several usage scenarios, the 

[jira] [Work logged] (HIVE-27058) Backport of HIVE-24316: Upgrade ORC from 1.5.6 to 1.5.8 in branch-3.1

2023-04-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27058?focusedWorklogId=858988=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-858988
 ]

ASF GitHub Bot logged work on HIVE-27058:
-

Author: ASF GitHub Bot
Created on: 25/Apr/23 17:11
Start Date: 25/Apr/23 17:11
Worklog Time Spent: 10m 
  Work Description: Diksha628 commented on PR #4192:
URL: https://github.com/apache/hive/pull/4192#issuecomment-1522138799

   @vihangk1, @sankarh, can you please merge this?
   




Issue Time Tracking
---

Worklog Id: (was: 858988)
Time Spent: 20m  (was: 10m)

> Backport of HIVE-24316: Upgrade ORC from 1.5.6 to 1.5.8 in branch-3.1
> -
>
> Key: HIVE-27058
> URL: https://issues.apache.org/jira/browse/HIVE-27058
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Diksha
>Assignee: Diksha
>Priority: Major
>  Labels: hive-3.2.0-must, pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Backport of HIVE-24316: Upgrade ORC from 1.5.6 to 1.5.8 in branch-3.1



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work logged] (HIVE-27172) Add the HMS client connection timeout config

2023-04-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27172?focusedWorklogId=858987=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-858987
 ]

ASF GitHub Bot logged work on HIVE-27172:
-

Author: ASF GitHub Bot
Created on: 25/Apr/23 17:08
Start Date: 25/Apr/23 17:08
Worklog Time Spent: 10m 
  Work Description: wecharyu commented on code in PR #4150:
URL: https://github.com/apache/hive/pull/4150#discussion_r1176806249


##
standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/conf/MetastoreConf.java:
##
@@ -402,6 +402,8 @@ public enum ConfVars {
 "has an infinite lifetime."),
 CLIENT_SOCKET_TIMEOUT("metastore.client.socket.timeout", 
"hive.metastore.client.socket.timeout", 600,
 TimeUnit.SECONDS, "MetaStore Client socket timeout in seconds"),
+CLIENT_CONNECTION_TIMEOUT("metastore.client.connection.timeout", 
"hive.metastore.client.connection.timeout", 10,

Review Comment:
   Yes, I have changed the default connectionTimeout to be the same as 
socketTimeout. IMHO, though, that is not good practice: if we set 
socketTimeout but do not set connectionTimeout, the connection falls back to 
the long default timeout. I think we should change the default behavior so 
that socketTimeout = connectionTimeout.





Issue Time Tracking
---

Worklog Id: (was: 858987)
Time Spent: 2h  (was: 1h 50m)

> Add the HMS client connection timeout config
> 
>
> Key: HIVE-27172
> URL: https://issues.apache.org/jira/browse/HIVE-27172
> Project: Hive
>  Issue Type: Task
>  Components: Hive
>Reporter: Wechar
>Assignee: Wechar
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> Currently {{HiveMetastoreClient}} uses {{CLIENT_SOCKET_TIMEOUT}} as both the 
> socket timeout and the connection timeout, which makes it inconvenient for 
> users to set a smaller connection timeout.
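As background for the proposed config split, at the socket level the connect timeout and the read (socket) timeout are independent knobs. A minimal stdlib sketch (not the Thrift transport Hive actually uses; host and port are omitted as hypothetical):

```java
import java.io.IOException;
import java.net.Socket;

// Sketch only: HiveMetastoreClient goes through a Thrift transport, but the
// underlying idea is the same - the connect timeout and the read (socket)
// timeout are configured separately.
public class TimeoutDemo {
    // Configures the read timeout and returns the effective value in millis.
    public static int configure(int connectTimeoutMs, int socketTimeoutMs) {
        try (Socket s = new Socket()) {
            s.setSoTimeout(socketTimeoutMs);  // read timeout once connected
            // The connect timeout is applied at connect time (host/port hypothetical):
            // s.connect(new InetSocketAddress(host, port), connectTimeoutMs);
            return s.getSoTimeout();
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }
}
```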



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work logged] (HIVE-27292) Upgrade Zookeeper to 3.7.1

2023-04-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27292?focusedWorklogId=858985=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-858985
 ]

ASF GitHub Bot logged work on HIVE-27292:
-

Author: ASF GitHub Bot
Created on: 25/Apr/23 16:49
Start Date: 25/Apr/23 16:49
Worklog Time Spent: 10m 
  Work Description: amanraj2520 commented on PR #4264:
URL: https://github.com/apache/hive/pull/4264#issuecomment-1522112435

   @TuroczyX I agree we could upgrade to 3.8.x, but in my view that would be a 
major version upgrade and would require some time to test. Personally, I have 
no issue with it; it is just that I ran all the tests locally against 3.7.1. I 
suggest we create a follow-up ticket to upgrade to 3.8.x, but if the community 
agrees, I can bump it now. Let me know your thoughts.




Issue Time Tracking
---

Worklog Id: (was: 858985)
Time Spent: 40m  (was: 0.5h)

> Upgrade Zookeeper to 3.7.1
> --
>
> Key: HIVE-27292
> URL: https://issues.apache.org/jira/browse/HIVE-27292
> Project: Hive
>  Issue Type: Bug
>Reporter: Aman Raj
>Assignee: Aman Raj
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Upgrade Zookeper from 3.6.3 to 3.7.1 since 3.6.3 is in end of life. 
> https://endoflife.date/zookeeper



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work logged] (HIVE-27295) Improve docker logging in AbstractExternalDB

2023-04-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27295?focusedWorklogId=858982=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-858982
 ]

ASF GitHub Bot logged work on HIVE-27295:
-

Author: ASF GitHub Bot
Created on: 25/Apr/23 16:15
Start Date: 25/Apr/23 16:15
Worklog Time Spent: 10m 
  Work Description: abstractdog opened a new pull request, #4268:
URL: https://github.com/apache/hive/pull/4268

   
   
   ### What changes were proposed in this pull request?
   
   
   
   ### Why are the changes needed?
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   
   ### How was this patch tested?
   
   




Issue Time Tracking
---

Worklog Id: (was: 858982)
Remaining Estimate: 0h
Time Spent: 10m

> Improve docker logging in AbstractExternalDB
> 
>
> Key: HIVE-27295
> URL: https://issues.apache.org/jira/browse/HIVE-27295
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> 1. While waiting for the docker container to start properly, we should print 
> the output of the docker logs command in every loop iteration; otherwise we 
> can miss important information about the actual startup process if the 
> container was oom-killed in the meantime. Moreover, we currently do not log 
> the output at all in case of an error:
> https://github.com/apache/hive/blob/59058c65457fb7ab9d8575a555034e6633962661/itests/util/src/main/java/org/apache/hadoop/hive/ql/externalDB/AbstractExternalDB.java#L125-L127
> 2. We can include the output for docker events in the logs in case of an 
> error (like: oom killed container), which might contain useful information.
> We can have info like this:
> {code}
> 2023-04-25T08:47:08.852515314-07:00 container oom 
> 2ba12cd9cd844bb30b3158564bd68cd97f25e7a05172d111713ac9f7c1c0b1d4 
> (image=harbor.rke-us-west-04.kc.cloudera.com/docker_private_cache/cloudera_thirdparty/postgres:9.3,
>  name=qtestExternalDB-PostgresExternalDB)
> 2023-04-25T08:47:08.893742200-07:00 container die 
> 2ba12cd9cd844bb30b3158564bd68cd97f25e7a05172d111713ac9f7c1c0b1d4 (exitCode=1, 
> image=harbor.rke-us-west-04.kc.cloudera.com/docker_private_cache/cloudera_thirdparty/postgres:9.3,
>  name=qtestExternalDB-PostgresExternalDB)
> {code}
> 3. Consider adding a [--memory 
> option|https://docs.docker.com/config/containers/resource_constraints/] to 
> the docker run command with a reasonable value to make the RDBMS docker image 
> running process stable and independent from system settings.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-27295) Improve docker logging in AbstractExternalDB

2023-04-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-27295:
--
Labels: pull-request-available  (was: )

> Improve docker logging in AbstractExternalDB
> 
>
> Key: HIVE-27295
> URL: https://issues.apache.org/jira/browse/HIVE-27295
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> 1. While waiting for the docker container to start properly, we should print 
> the output of the docker logs command in every loop iteration; otherwise we 
> can miss important information about the actual startup process if the 
> container was oom-killed in the meantime. Moreover, we currently do not log 
> the output at all in case of an error:
> https://github.com/apache/hive/blob/59058c65457fb7ab9d8575a555034e6633962661/itests/util/src/main/java/org/apache/hadoop/hive/ql/externalDB/AbstractExternalDB.java#L125-L127
> 2. We can include the output for docker events in the logs in case of an 
> error (like: oom killed container), which might contain useful information.
> We can have info like this:
> {code}
> 2023-04-25T08:47:08.852515314-07:00 container oom 
> 2ba12cd9cd844bb30b3158564bd68cd97f25e7a05172d111713ac9f7c1c0b1d4 
> (image=harbor.rke-us-west-04.kc.cloudera.com/docker_private_cache/cloudera_thirdparty/postgres:9.3,
>  name=qtestExternalDB-PostgresExternalDB)
> 2023-04-25T08:47:08.893742200-07:00 container die 
> 2ba12cd9cd844bb30b3158564bd68cd97f25e7a05172d111713ac9f7c1c0b1d4 (exitCode=1, 
> image=harbor.rke-us-west-04.kc.cloudera.com/docker_private_cache/cloudera_thirdparty/postgres:9.3,
>  name=qtestExternalDB-PostgresExternalDB)
> {code}
> 3. Consider adding a [--memory 
> option|https://docs.docker.com/config/containers/resource_constraints/] to 
> the docker run command with a reasonable value to make the RDBMS docker image 
> running process stable and independent from system settings.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-27295) Improve docker logging in AbstractExternalDB

2023-04-25 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-27295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor updated HIVE-27295:

Description: 
1. While waiting for the docker container to start properly, we should print 
the output of the docker logs command in every loop iteration; otherwise we 
can miss important information about the actual startup process if the 
container was oom-killed in the meantime. Moreover, we currently do not log 
the output at all in case of an error:
https://github.com/apache/hive/blob/59058c65457fb7ab9d8575a555034e6633962661/itests/util/src/main/java/org/apache/hadoop/hive/ql/externalDB/AbstractExternalDB.java#L125-L127

2. We can include the output for docker events in the logs in case of an error 
(like: oom killed container), which might contain useful information.
We can have info like this:
{code}
2023-04-25T08:47:08.852515314-07:00 container oom 
2ba12cd9cd844bb30b3158564bd68cd97f25e7a05172d111713ac9f7c1c0b1d4 
(image=harbor.rke-us-west-04.kc.cloudera.com/docker_private_cache/cloudera_thirdparty/postgres:9.3,
 name=qtestExternalDB-PostgresExternalDB)
2023-04-25T08:47:08.893742200-07:00 container die 
2ba12cd9cd844bb30b3158564bd68cd97f25e7a05172d111713ac9f7c1c0b1d4 (exitCode=1, 
image=harbor.rke-us-west-04.kc.cloudera.com/docker_private_cache/cloudera_thirdparty/postgres:9.3,
 name=qtestExternalDB-PostgresExternalDB)
{code}

3. Consider adding a [--memory 
option|https://docs.docker.com/config/containers/resource_constraints/] to the 
docker run command with a reasonable value to make the RDBMS docker image 
running process stable and independent from system settings.

  was:
1. While waiting for the docker container to start properly, we should print 
the output of the docker logs command in every loop iteration; otherwise we 
can miss important information about the actual startup process if the 
container was oom-killed in the meantime. Moreover, we currently do not log 
the output at all in case of an error:
https://github.com/apache/hive/blob/59058c65457fb7ab9d8575a555034e6633962661/itests/util/src/main/java/org/apache/hadoop/hive/ql/externalDB/AbstractExternalDB.java#L125-L127

2. We can include the output for docker events in the logs in case of an error 
(like: oom killed container), which might contain useful information.

3. Consider adding a [--memory 
option|https://docs.docker.com/config/containers/resource_constraints/] to the 
docker run command with a reasonable value to make the RDBMS docker image 
running process stable and independent from system settings.


> Improve docker logging in AbstractExternalDB
> 
>
> Key: HIVE-27295
> URL: https://issues.apache.org/jira/browse/HIVE-27295
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>
> 1. While waiting for the docker container to start properly, we should print the 
> output of the docker logs command in every loop, otherwise we can miss important 
> information about the actual startup process if the docker container was oom 
> killed in the meantime. Not to mention the fact that we're currently not 
> logging the output at all in case of an error:
> https://github.com/apache/hive/blob/59058c65457fb7ab9d8575a555034e6633962661/itests/util/src/main/java/org/apache/hadoop/hive/ql/externalDB/AbstractExternalDB.java#L125-L127
> 2. We can include the output for docker events in the logs in case of an 
> error (like: oom killed container), which might contain useful information.
> We can have info like this:
> {code}
> 2023-04-25T08:47:08.852515314-07:00 container oom 
> 2ba12cd9cd844bb30b3158564bd68cd97f25e7a05172d111713ac9f7c1c0b1d4 
> (image=harbor.rke-us-west-04.kc.cloudera.com/docker_private_cache/cloudera_thirdparty/postgres:9.3,
>  name=qtestExternalDB-PostgresExternalDB)
> 2023-04-25T08:47:08.893742200-07:00 container die 
> 2ba12cd9cd844bb30b3158564bd68cd97f25e7a05172d111713ac9f7c1c0b1d4 (exitCode=1, 
> image=harbor.rke-us-west-04.kc.cloudera.com/docker_private_cache/cloudera_thirdparty/postgres:9.3,
>  name=qtestExternalDB-PostgresExternalDB)
> {code}
> 3. Consider adding a [--memory 
> option|https://docs.docker.com/config/containers/resource_constraints/] to 
> the docker run command with a reasonable value to make the RDBMS docker image 
> running process stable and independent from system settings.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work logged] (HIVE-27278) Simplify correlated queries with empty inputs

2023-04-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27278?focusedWorklogId=858978&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-858978
 ]

ASF GitHub Bot logged work on HIVE-27278:
-

Author: ASF GitHub Bot
Created on: 25/Apr/23 15:29
Start Date: 25/Apr/23 15:29
Worklog Time Spent: 10m 
  Work Description: zabetak commented on PR #4253:
URL: https://github.com/apache/hive/pull/4253#issuecomment-1522002557

   > Does any follow-up required once 
[CALCITE-5568](https://issues.apache.org/jira/browse/CALCITE-5568) gets 
resolved?
   Probably will need to apply the respective changes in `HiveRelDecorrelator`. 
Logged https://issues.apache.org/jira/browse/HIVE-27296 for tracking this down.




Issue Time Tracking
---

Worklog Id: (was: 858978)
Time Spent: 2h  (was: 1h 50m)

> Simplify correlated queries with empty inputs
> -
>
> Key: HIVE-27278
> URL: https://issues.apache.org/jira/browse/HIVE-27278
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> The correlated query below will not produce any result no matter the content 
> of the table.
> {code:sql}
> create table t1 (id int, val varchar(10)) stored as orc TBLPROPERTIES 
> ('transactional'='true');
> create table t2 (id int, val varchar(10)) stored as orc TBLPROPERTIES 
> ('transactional'='true');
> EXPLAIN CBO SELECT id FROM t1 WHERE NULL IN (SELECT NULL FROM t2 where t1.id 
> = t2.id);
> {code}
> The CBO is able to derive that part of the query is empty and ends up with 
> the following plan.
> {noformat}
> CBO PLAN:
> HiveProject(id=[$0])
>   LogicalCorrelate(correlation=[$cor0], joinType=[semi], requiredColumns=[{}])
> HiveTableScan(table=[[default, t1]], table:alias=[t1])
> HiveValues(tuples=[[]])
> {noformat}
> The presence of LogicalCorrelate is first redundant but also problematic 
> since many parts of the optimizer assume that queries are decorrelated and do 
> not know how to handle the LogicalCorrelate.
> In the presence of views the same query can lead to the following exception 
> during compilation.
> {code:sql}
> CREATE MATERIALIZED VIEW v1 AS SELECT id FROM t2;
> EXPLAIN CBO SELECT id FROM t1 WHERE NULL IN (SELECT NULL FROM t2 where t1.id 
> = t2.id);
> {code}
> {noformat}
> org.apache.calcite.plan.RelOptPlanner$CannotPlanException: There are not 
> enough rules to produce a node with desired properties: convention=HIVE, 
> sort=[], dist=any. All the inputs have relevant nodes, however the cost is 
> still infinite.
> Root: rel#185:RelSubset#3.HIVE.[].any
> Original rel:
> HiveProject(id=[$0]): rowcount = 4.0, cumulative cost = {20.0 rows, 13.0 cpu, 
> 0.0 io}, id = 178
>   LogicalCorrelate(correlation=[$cor0], joinType=[semi], 
> requiredColumns=[{}]): rowcount = 4.0, cumulative cost = {16.0 rows, 9.0 cpu, 
> 0.0 io}, id = 176
> HiveTableScan(table=[[default, t1]], table:alias=[t1]): rowcount = 4.0, 
> cumulative cost = {4.0 rows, 5.0 cpu, 0.0 io}, id = 111
> HiveValues(tuples=[[]]): rowcount = 1.0, cumulative cost = {1.0 rows, 1.0 
> cpu, 0.0 io}, id = 139
> Sets:
> Set#0, type: RecordType(INTEGER id, VARCHAR(10) val, BIGINT 
> BLOCK__OFFSET__INSIDE__FILE, VARCHAR(2147483647) INPUT__FILE__NAME, 
> RecordType(BIGINT writeid, INTEGER bucketid, BIGINT rowid) ROW__ID, BOOLEAN 
> ROW__IS__DELETED)
>   rel#180:RelSubset#0.HIVE.[].any, best=rel#111
>   rel#111:HiveTableScan.HIVE.[].any(table=[default, 
> t1],htColumns=[0, 1, 2, 3, 4, 
> 5],insideView=false,plKey=default.t1;,table:alias=t1,tableScanTrait=null), 
> rowcount=4.0, cumulative cost={4.0 rows, 5.0 cpu, 0.0 io}
> Set#1, type: RecordType(NULL _o__c0)
>   rel#181:RelSubset#1.HIVE.[].any, best=rel#139
>   rel#139:HiveValues.HIVE.[].any(type=RecordType(NULL 
> _o__c0),tuples=[]), rowcount=1.0, cumulative cost={1.0 rows, 1.0 cpu, 0.0 io}
> Set#2, type: RecordType(INTEGER id, VARCHAR(10) val, BIGINT 
> BLOCK__OFFSET__INSIDE__FILE, VARCHAR(2147483647) INPUT__FILE__NAME, 
> RecordType(BIGINT writeid, INTEGER bucketid, BIGINT rowid) ROW__ID, BOOLEAN 
> ROW__IS__DELETED)
>   rel#183:RelSubset#2.NONE.[].any, best=null
>   
> rel#182:LogicalCorrelate.NONE.[].any(left=RelSubset#180,right=RelSubset#181,correlation=$cor0,joinType=semi,requiredColumns={}),
>  rowcount=4.0, cumulative cost={inf}
> Set#3, type: RecordType(INTEGER id)
>   rel#185:RelSubset#3.HIVE.[].any, best=null
>   
> rel#184:HiveProject.HIVE.[].any(input=RelSubset#183,inputs=0,synthetic=false),
>  rowcount=4.0, cumulative cost={inf}
> Graphviz:
> digraph G {
>   root 

[jira] [Updated] (HIVE-27296) HiveRelDecorrelator does not handle correlation with Values

2023-04-25 Thread Stamatis Zampetakis (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stamatis Zampetakis updated HIVE-27296:
---
Description: 
The {{HiveRelDecorrelator}} does not cope well with {{Values}} expressions and 
when such expression exists in the plan it fails to remove the respective 
{{{}Correlate{}}}.

In HIVE-27278, we discovered a query that has a correlation over an empty 
{{Values}} expression.
{code:sql}
EXPLAIN CBO SELECT id FROM t1 WHERE NULL IN (SELECT NULL FROM t2 where t1.id = 
t2.id);{code}
The CBO plan after decorrelation is shown below.
{noformat}
HiveProject(id=[$0])
  LogicalCorrelate(correlation=[$cor0], joinType=[semi], requiredColumns=[{}])
    HiveTableScan(table=[[default, t1]], table:alias=[t1])
    HiveValues(tuples=[[]])
{noformat}
Although, in HIVE-27278 we could find a solution for a plan that contains an 
empty {{Values}} there can be queries with correlations on non-empty {{Values}} 
and for those we don't have a solution at the moment.

Normally after decorrelation we shouldn't have any {{Correlate}} expressions in 
the plan.

The problem starts from 
[HiveRelDecorrelator.decorrelate(Values)|https://github.com/apache/hive/blob/59058c65457fb7ab9d8575a555034e6633962661/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveRelDecorrelator.java#L471]
 that returns null when it encounters the {{Values}} expression.

Later, in 
[HiveRelDecorrelator.decorrelate(Correlate)|https://github.com/apache/hive/blob/59058c65457fb7ab9d8575a555034e6633962661/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveRelDecorrelator.java#L1247]
 it will bail out when treating the {{Correlate}} since one of the inputs is 
not rewritten.

The problem is still there in latest Calcite (CALCITE-5568).

  was:
The {{HiveRelDecorrelator}} does not cope well with {{Values}} expressions and 
when such expression exists in the plan it fails to remove the respective 
{{Correlate}}.

In HIVE-27298, we discovered a query that has a correlation over an empty 
{{Values}} expression.
{code:sql}
EXPLAIN CBO SELECT id FROM t1 WHERE NULL IN (SELECT NULL FROM t2 where t1.id = 
t2.id);{code}

The CBO plan after decorrelation is shown below.
{noformat}
HiveProject(id=[$0])
  LogicalCorrelate(correlation=[$cor0], joinType=[semi], requiredColumns=[{}])
    HiveTableScan(table=[[default, t1]], table:alias=[t1])
    HiveValues(tuples=[[]])
{noformat}

Although, in HIVE-27298 we could find a solution for a plan that contains an 
empty {{Values}} there can be queries with correlations on non-empty {{Values}} 
and for those we don't have a solution at the moment.

Normally after decorrelation we shouldn't have any {{Correlate}} expressions in 
the plan.

The problem starts from 
[HiveRelDecorrelator.decorrelate(Values)|https://github.com/apache/hive/blob/59058c65457fb7ab9d8575a555034e6633962661/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveRelDecorrelator.java#L471]
 that returns null when it encounters the {{Values}} expression.

Later, in 
[HiveRelDecorrelator.decorrelate(Correlate)|https://github.com/apache/hive/blob/59058c65457fb7ab9d8575a555034e6633962661/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveRelDecorrelator.java#L1247]
 it will bail out when treating the {{Correlate}} since one of the inputs is 
not rewritten. 

The problem is still there in latest Calcite (CALCITE-5568).


> HiveRelDecorrelator does not handle correlation with Values
> ---
>
> Key: HIVE-27296
> URL: https://issues.apache.org/jira/browse/HIVE-27296
> Project: Hive
>  Issue Type: Bug
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>
> The {{HiveRelDecorrelator}} does not cope well with {{Values}} expressions 
> and when such expression exists in the plan it fails to remove the respective 
> {{{}Correlate{}}}.
> In HIVE-27278, we discovered a query that has a correlation over an empty 
> {{Values}} expression.
> {code:sql}
> EXPLAIN CBO SELECT id FROM t1 WHERE NULL IN (SELECT NULL FROM t2 where t1.id 
> = t2.id);{code}
> The CBO plan after decorrelation is shown below.
> {noformat}
> HiveProject(id=[$0])
>   LogicalCorrelate(correlation=[$cor0], joinType=[semi], requiredColumns=[{}])
>     HiveTableScan(table=[[default, t1]], table:alias=[t1])
>     HiveValues(tuples=[[]])
> {noformat}
> Although, in HIVE-27278 we could find a solution for a plan that contains an 
> empty {{Values}} there can be queries with correlations on non-empty 
> {{Values}} and for those we don't have a solution at the moment.
> Normally after decorrelation we shouldn't have any {{Correlate}} expressions 
> in the plan.
> The problem starts from 
> 

[jira] [Created] (HIVE-27296) HiveRelDecorrelator does not handle correlation with Values

2023-04-25 Thread Stamatis Zampetakis (Jira)
Stamatis Zampetakis created HIVE-27296:
--

 Summary: HiveRelDecorrelator does not handle correlation with 
Values
 Key: HIVE-27296
 URL: https://issues.apache.org/jira/browse/HIVE-27296
 Project: Hive
  Issue Type: Bug
Reporter: Stamatis Zampetakis
Assignee: Stamatis Zampetakis


The {{HiveRelDecorrelator}} does not cope well with {{Values}} expressions and 
when such expression exists in the plan it fails to remove the respective 
{{Correlate}}.

In HIVE-27298, we discovered a query that has a correlation over an empty 
{{Values}} expression.
{code:sql}
EXPLAIN CBO SELECT id FROM t1 WHERE NULL IN (SELECT NULL FROM t2 where t1.id = 
t2.id);{code}

The CBO plan after decorrelation is shown below.
{noformat}
HiveProject(id=[$0])
  LogicalCorrelate(correlation=[$cor0], joinType=[semi], requiredColumns=[{}])
    HiveTableScan(table=[[default, t1]], table:alias=[t1])
    HiveValues(tuples=[[]])
{noformat}

Although, in HIVE-27298 we could find a solution for a plan that contains an 
empty {{Values}} there can be queries with correlations on non-empty {{Values}} 
and for those we don't have a solution at the moment.

Normally after decorrelation we shouldn't have any {{Correlate}} expressions in 
the plan.

The problem starts from 
[HiveRelDecorrelator.decorrelate(Values)|https://github.com/apache/hive/blob/59058c65457fb7ab9d8575a555034e6633962661/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveRelDecorrelator.java#L471]
 that returns null when it encounters the {{Values}} expression.

Later, in 
[HiveRelDecorrelator.decorrelate(Correlate)|https://github.com/apache/hive/blob/59058c65457fb7ab9d8575a555034e6633962661/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveRelDecorrelator.java#L1247]
 it will bail out when treating the {{Correlate}} since one of the inputs is 
not rewritten. 

The problem is still there in latest Calcite (CALCITE-5568).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-27295) Improve docker logging in AbstractExternalDB

2023-04-25 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-27295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor updated HIVE-27295:

Description: 
1. While waiting for the docker container to start properly, we should print the 
output of the docker logs command in every loop, otherwise we can miss important 
information about the actual startup process if the docker container was oom 
killed in the meantime. Not to mention the fact that we're currently not 
logging the output at all in case of an error:
https://github.com/apache/hive/blob/59058c65457fb7ab9d8575a555034e6633962661/itests/util/src/main/java/org/apache/hadoop/hive/ql/externalDB/AbstractExternalDB.java#L125-L127

2. We can include the output for docker events in the logs in case of an error 
(like: oom killed container), which might contain useful information.

3. Consider adding a [--memory 
option|https://docs.docker.com/config/containers/resource_constraints/] to the 
docker run command with a reasonable value to make the RDBMS docker image 
running process stable and independent from system settings.

  was:
1. While waiting for the docker container to start properly, we should print the 
output of the docker logs command in every loop, otherwise we can miss important 
information if the docker container is oom killed in the meantime. Not to 
mention the fact that we're currently not logging the output at all in case of 
an error:
https://github.com/apache/hive/blob/59058c65457fb7ab9d8575a555034e6633962661/itests/util/src/main/java/org/apache/hadoop/hive/ql/externalDB/AbstractExternalDB.java#L125-L127

2. We can include the output for docker events in the logs in case of an error 
(like: oom killed container), which might contain useful information.

3. Consider adding a [--memory 
option|https://docs.docker.com/config/containers/resource_constraints/] to the 
docker run command with a reasonable value to make the RDBMS docker image 
running process stable and independent from system settings.


> Improve docker logging in AbstractExternalDB
> 
>
> Key: HIVE-27295
> URL: https://issues.apache.org/jira/browse/HIVE-27295
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>
> 1. While waiting for the docker container to start properly, we should print the 
> output of the docker logs command in every loop, otherwise we can miss important 
> information about the actual startup process if the docker container was oom 
> killed in the meantime. Not to mention the fact that we're currently not 
> logging the output at all in case of an error:
> https://github.com/apache/hive/blob/59058c65457fb7ab9d8575a555034e6633962661/itests/util/src/main/java/org/apache/hadoop/hive/ql/externalDB/AbstractExternalDB.java#L125-L127
> 2. We can include the output for docker events in the logs in case of an 
> error (like: oom killed container), which might contain useful information.
> 3. Consider adding a [--memory 
> option|https://docs.docker.com/config/containers/resource_constraints/] to 
> the docker run command with a reasonable value to make the RDBMS docker image 
> running process stable and independent from system settings.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-27295) Improve docker logging in AbstractExternalDB

2023-04-25 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-27295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor updated HIVE-27295:

Description: 
1. While waiting for the docker container to start properly, we should print the 
output of the docker logs command in every loop, otherwise we can miss important 
information if the docker container is oom killed in the meantime. Not to 
mention the fact that we're currently not logging the output at all in case of 
an error:
https://github.com/apache/hive/blob/59058c65457fb7ab9d8575a555034e6633962661/itests/util/src/main/java/org/apache/hadoop/hive/ql/externalDB/AbstractExternalDB.java#L125-L127

2. We can include the output for docker events in the logs in case of an error 
(like: oom killed container), which might contain useful information.

3. Consider adding a [--memory 
option|https://docs.docker.com/config/containers/resource_constraints/] to the 
docker run command with a reasonable value to make the RDBMS docker image 
running process stable and independent from system settings.

> Improve docker logging in AbstractExternalDB
> 
>
> Key: HIVE-27295
> URL: https://issues.apache.org/jira/browse/HIVE-27295
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>
> 1. While waiting for the docker container to start properly, we should print the 
> output of the docker logs command in every loop, otherwise we can miss important 
> information if the docker container is oom killed in the meantime. Not to 
> mention the fact that we're currently not logging the output at all in case 
> of an error:
> https://github.com/apache/hive/blob/59058c65457fb7ab9d8575a555034e6633962661/itests/util/src/main/java/org/apache/hadoop/hive/ql/externalDB/AbstractExternalDB.java#L125-L127
> 2. We can include the output for docker events in the logs in case of an 
> error (like: oom killed container), which might contain useful information.
> 3. Consider adding a [--memory 
> option|https://docs.docker.com/config/containers/resource_constraints/] to 
> the docker run command with a reasonable value to make the RDBMS docker image 
> running process stable and independent from system settings.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work logged] (HIVE-27278) Simplify correlated queries with empty inputs

2023-04-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27278?focusedWorklogId=858966&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-858966
 ]

ASF GitHub Bot logged work on HIVE-27278:
-

Author: ASF GitHub Bot
Created on: 25/Apr/23 15:04
Start Date: 25/Apr/23 15:04
Worklog Time Spent: 10m 
  Work Description: zabetak commented on code in PR #4253:
URL: https://github.com/apache/hive/pull/4253#discussion_r1176658557


##
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveRemoveEmptySingleRules.java:
##
@@ -192,6 +178,79 @@ public interface JoinRightEmptyRuleConfig extends 
PruneEmptyRule.Config {
 }
   }
 
+  private static RelNode padWithNulls(RelBuilder builder, RelNode input, RelDataType resultType,
+      boolean leftPadding) {
+    int padding = resultType.getFieldCount() - input.getRowType().getFieldCount();
+    List<RexNode> nullLiterals = Collections.nCopies(padding, builder.literal(null));
+    builder.push(input);
+    if (leftPadding) {
+      builder.project(concat(nullLiterals, builder.fields()));
+    } else {
+      builder.project(concat(builder.fields(), nullLiterals));
+    }
+    return builder.convert(resultType, true).build();
+  }
+
+  public static final RelOptRule CORRELATE_RIGHT_INSTANCE = RelRule.Config.EMPTY
+      .withOperandSupplier(b0 ->
+          b0.operand(Correlate.class).inputs(
+              b1 -> b1.operand(RelNode.class).anyInputs(),
+              b2 -> b2.operand(Values.class).predicate(Values::isEmpty).noInputs()))
+      .withDescription("PruneEmptyCorrelate(right)")
+      .withRelBuilderFactory(HiveRelFactories.HIVE_BUILDER)
+      .as(CorrelateEmptyRuleConfig.class)
+      .toRule();
+  public static final RelOptRule CORRELATE_LEFT_INSTANCE = RelRule.Config.EMPTY
+      .withOperandSupplier(b0 ->
+          b0.operand(Correlate.class).inputs(
+              b1 -> b1.operand(Values.class).predicate(Values::isEmpty).noInputs(),
+              b2 -> b2.operand(RelNode.class).anyInputs()))
+      .withDescription("PruneEmptyCorrelate(left)")
+      .withRelBuilderFactory(HiveRelFactories.HIVE_BUILDER)
+      .as(CorrelateEmptyRuleConfig.class)
+      .toRule();
+  
+  /** Configuration for rule that prunes a correlate if one of its inputs is 
empty. */
+  public interface CorrelateEmptyRuleConfig extends PruneEmptyRule.Config {

Review Comment:
   The proposed refactoring makes sense and the code is indeed more readable. 
Fixed in 
https://github.com/apache/hive/pull/4253/commits/7cf9e827d7b06b15157825ca65ca40486338841f
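The null-padding idea in padWithNulls above can be illustrated without Calcite: widen a row to a target arity by prepending or appending nulls. This is a plain-Java stand-in for illustration only, not the RelBuilder-based implementation from the patch:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class PadWithNullsSketch {
    // Plain-Java analogue of padWithNulls: extend a row to targetArity by adding
    // nulls on the left (when the left input was pruned) or on the right.
    static List<Object> padWithNulls(List<Object> row, int targetArity, boolean leftPadding) {
        int padding = targetArity - row.size();              // fields to add
        List<Object> nulls = Collections.nCopies(padding, null);
        List<Object> result = new ArrayList<>();
        if (leftPadding) {
            result.addAll(nulls);
            result.addAll(row);
        } else {
            result.addAll(row);
            result.addAll(nulls);
        }
        return result;
    }
}
```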





Issue Time Tracking
---

Worklog Id: (was: 858966)
Time Spent: 1h 50m  (was: 1h 40m)

> Simplify correlated queries with empty inputs
> -
>
> Key: HIVE-27278
> URL: https://issues.apache.org/jira/browse/HIVE-27278
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> The correlated query below will not produce any result no matter the content 
> of the table.
> {code:sql}
> create table t1 (id int, val varchar(10)) stored as orc TBLPROPERTIES 
> ('transactional'='true');
> create table t2 (id int, val varchar(10)) stored as orc TBLPROPERTIES 
> ('transactional'='true');
> EXPLAIN CBO SELECT id FROM t1 WHERE NULL IN (SELECT NULL FROM t2 where t1.id 
> = t2.id);
> {code}
> The CBO is able to derive that part of the query is empty and ends up with 
> the following plan.
> {noformat}
> CBO PLAN:
> HiveProject(id=[$0])
>   LogicalCorrelate(correlation=[$cor0], joinType=[semi], requiredColumns=[{}])
> HiveTableScan(table=[[default, t1]], table:alias=[t1])
> HiveValues(tuples=[[]])
> {noformat}
> The presence of LogicalCorrelate is first redundant but also problematic 
> since many parts of the optimizer assume that queries are decorrelated and do 
> not know how to handle the LogicalCorrelate.
> In the presence of views the same query can lead to the following exception 
> during compilation.
> {code:sql}
> CREATE MATERIALIZED VIEW v1 AS SELECT id FROM t2;
> EXPLAIN CBO SELECT id FROM t1 WHERE NULL IN (SELECT NULL FROM t2 where t1.id 
> = t2.id);
> {code}
> {noformat}
> org.apache.calcite.plan.RelOptPlanner$CannotPlanException: There are not 
> enough rules to produce a node with desired properties: convention=HIVE, 
> sort=[], dist=any. All the inputs have relevant nodes, however the cost is 
> still infinite.
> Root: rel#185:RelSubset#3.HIVE.[].any
> Original rel:
> HiveProject(id=[$0]): rowcount = 4.0, cumulative cost = {20.0 rows, 13.0 cpu, 
> 0.0 io}, id = 178
>   LogicalCorrelate(correlation=[$cor0], joinType=[semi], 
> requiredColumns=[{}]): 

[jira] [Work logged] (HIVE-27186) A persistent property store

2023-04-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27186?focusedWorklogId=858965&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-858965
 ]

ASF GitHub Bot logged work on HIVE-27186:
-

Author: ASF GitHub Bot
Created on: 25/Apr/23 15:03
Start Date: 25/Apr/23 15:03
Worklog Time Spent: 10m 
  Work Description: henrib commented on code in PR #4194:
URL: https://github.com/apache/hive/pull/4194#discussion_r1176656863


##
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java:
##
@@ -5665,6 +5678,212 @@ private String getGuidFromDB() throws MetaException {
 return null;
   }
 
+  @Override
+  public boolean runInTransaction(Runnable exec) throws MetaException {
+boolean success = false;
+Transaction tx = null;
+try {
+  if (openTransaction()) {
+exec.run();
+success = commitTransaction();
+  }
+} catch (Exception e) {
+  LOG.warn("Metastore operation failed", e);
+} finally {
+  rollbackAndCleanup(success, null);
+}
+return success;
+  }
+
+  @Override
+  public boolean dropProperties(String key) throws MetaException {
+boolean success = false;
+Transaction tx = null;
+Query query = null;
+try {
+  if (openTransaction()) {
+query = pm.newQuery(MMetastoreDBProperties.class, "this.propertyKey == 
key");
+query.declareParameters("java.lang.String key");
Collection<MMetastoreDBProperties> properties = (Collection<MMetastoreDBProperties>) query.execute(key);
+if (!properties.isEmpty()) {
+  pm.deletePersistentAll(properties);
+}
+success = commitTransaction();
+  }
+} catch (Exception e) {
+  LOG.warn("Metastore property drop failed", e);
+} finally {
+  rollbackAndCleanup(success, query);
+}
+return success;
+  }
+
+  @Override
+  public MMetastoreDBProperties putProperties(String key, String value, String 
description,  byte[] content) throws MetaException {
+boolean success = false;
+try {
+  if (openTransaction()) {
+//pm.currentTransaction().setOptimistic(false);
+// fetch first to determine new vs update
+MMetastoreDBProperties properties = doGetProperties(key, null);
+final boolean newInstance;
+if (properties == null) {
+  newInstance = true;
+  properties = new MMetastoreDBProperties();
+  properties.setPropertykey(key);
+} else {
+  newInstance = false;
+}
+properties.setDescription(description);
+properties.setPropertyValue(value);
+properties.setPropertyContent(content);
+LOG.debug("Attempting to add property {} for the metastore db", key);
+properties.setDescription("Metastore property "
++ (newInstance ? "created" : "updated")
++ " " + LocalDateTime.now().format(DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss.SSS")));
+if (newInstance) {
+  pm.makePersistent(properties);
+}
+success = commitTransaction();
+if (success) {
+  LOG.info("Metastore property {} created successfully", key);
+  return properties;
+}
+  }
+} catch (Exception e) {
+  LOG.warn("Metastore property save failed", e);
+} finally {
+  rollbackAndCleanup(success, null);
+}
+return null;
+  }
+
+  @Override
+  public boolean renameProperties(String mapKey, String newKey) throws 
MetaException {
+boolean success = false;
+Transaction tx = null;
+Query query = null;

Review Comment:
   query is cleaned up in finally block; QueryWrapper would not take care of 
the tx aspect.
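The open/commit/rollback contract that runInTransaction implements in the diff above can be sketched with stand-in classes (the real code goes through JDO's Transaction and ObjectStore.rollbackAndCleanup; the Store class here is purely illustrative):

```java
public class TxSketch {
    // Minimal stand-in for the transaction state managed by ObjectStore.
    static class Store {
        boolean committed, rolledBack;
        boolean openTransaction() { return true; }
        boolean commitTransaction() { committed = true; return true; }
        // Mirrors rollbackAndCleanup: roll back only if commit did not happen.
        void rollbackAndCleanup(boolean success) { if (!success) rolledBack = true; }
    }

    // Same shape as ObjectStore.runInTransaction: run the work inside a
    // transaction, commit on success, and always clean up in finally.
    static boolean runInTransaction(Store store, Runnable exec) {
        boolean success = false;
        try {
            if (store.openTransaction()) {
                exec.run();
                success = store.commitTransaction();
            }
        } catch (Exception e) {
            // the real code logs here: LOG.warn("Metastore operation failed", e)
        } finally {
            store.rollbackAndCleanup(success);
        }
        return success;
    }
}
```

A failing Runnable leaves the store rolled back and returns false; a successful one commits, which is exactly the behavior the finally-based cleanup guarantees.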





Issue Time Tracking
---

Worklog Id: (was: 858965)
Time Spent: 13h  (was: 12h 50m)

> A persistent property store 
> 
>
> Key: HIVE-27186
> URL: https://issues.apache.org/jira/browse/HIVE-27186
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Affects Versions: 4.0.0-alpha-2
>Reporter: Henri Biestro
>Assignee: Henri Biestro
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 13h
>  Remaining Estimate: 0h
>
> WHAT
> A persistent property store usable as a support facility for any metadata 
> augmentation feature.
> WHY
> When adding new meta-data oriented features, we usually need to persist 
> information linking the feature data and the HiveMetaStore objects it applies 
> to. Any information related to a database, a table or the cluster - like 
> statistics for example or any operational data state or data (think rolling 
> backup) -  fall in this use-case.
> Typically, accommodating such a feature requires modifying the Metastore 
> database schema by adding or altering a table. It also usually 

[jira] [Created] (HIVE-27295) Improve docker logging in AbstractExternalDB

2023-04-25 Thread Jira
László Bodor created HIVE-27295:
---

 Summary: Improve docker logging in AbstractExternalDB
 Key: HIVE-27295
 URL: https://issues.apache.org/jira/browse/HIVE-27295
 Project: Hive
  Issue Type: Improvement
Reporter: László Bodor






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (HIVE-27295) Improve docker logging in AbstractExternalDB

2023-04-25 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-27295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor reassigned HIVE-27295:
---

Assignee: László Bodor

> Improve docker logging in AbstractExternalDB
> 
>
> Key: HIVE-27295
> URL: https://issues.apache.org/jira/browse/HIVE-27295
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work started] (HIVE-27295) Improve docker logging in AbstractExternalDB

2023-04-25 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-27295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-27295 started by László Bodor.
---
> Improve docker logging in AbstractExternalDB
> 
>
> Key: HIVE-27295
> URL: https://issues.apache.org/jira/browse/HIVE-27295
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work logged] (HIVE-27294) Remove redundant qt_database_all.q for memory consumption reasons

2023-04-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27294?focusedWorklogId=858964&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-858964
 ]

ASF GitHub Bot logged work on HIVE-27294:
-

Author: ASF GitHub Bot
Created on: 25/Apr/23 15:00
Start Date: 25/Apr/23 15:00
Worklog Time Spent: 10m 
  Work Description: abstractdog opened a new pull request, #4267:
URL: https://github.com/apache/hive/pull/4267

   
   
   
   ### What changes were proposed in this pull request?
   
   
   
   ### Why are the changes needed?
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   
   ### How was this patch tested?
   
   




Issue Time Tracking
---

Worklog Id: (was: 858964)
Remaining Estimate: 0h
Time Spent: 10m

> Remove redundant qt_database_all.q for memory consumption reasons
> -
>
> Key: HIVE-27294
> URL: https://issues.apache.org/jira/browse/HIVE-27294
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently while running qt_database_all.q the qtest environment starts and runs 
> all RDBMS docker containers at the same time in beforeTest, which might end 
> up in extreme memory consumption. This is suboptimal, and considering that 
> the test cases are all covered by single qtests, we can simply remove 
> qt_database_all.q.
> {code}
> ./ql/src/test/queries/clientpositive/qt_database_postgres.q
> ./ql/src/test/queries/clientpositive/qt_database_oracle.q
> ./ql/src/test/queries/clientpositive/qt_database_mssql.q
> ./ql/src/test/queries/clientpositive/qt_database_mariadb.q
> ./ql/src/test/queries/clientpositive/qt_database_mysql.q
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-27294) Remove redundant qt_database_all.q for memory consumption reasons

2023-04-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-27294:
--
Labels: pull-request-available  (was: )

> Remove redundant qt_database_all.q for memory consumption reasons
> -
>
> Key: HIVE-27294
> URL: https://issues.apache.org/jira/browse/HIVE-27294
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently while running qt_database_all.q the qtest environment starts and runs 
> all RDBMS docker containers at the same time in beforeTest, which might end 
> up in extreme memory consumption. This is suboptimal, and considering that 
> the test cases are all covered by individual qtests, we can simply remove 
> qt_database_all.q.
> {code}
> ./ql/src/test/queries/clientpositive/qt_database_postgres.q
> ./ql/src/test/queries/clientpositive/qt_database_oracle.q
> ./ql/src/test/queries/clientpositive/qt_database_mssql.q
> ./ql/src/test/queries/clientpositive/qt_database_mariadb.q
> ./ql/src/test/queries/clientpositive/qt_database_mysql.q
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-27294) Remove redundant qt_database_all.q for memory consumption reasons

2023-04-25 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-27294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor updated HIVE-27294:

Description: Currently while running qt_database_all.q the qtest environment 
starts and runs all RDBMS docker containers which might end up in extreme memory 
consumption. This is suboptimal, and considering that the test cases are all 
covered by individual qtests, we can simply remove 

> Remove redundant qt_database_all.q for memory consumption reasons
> -
>
> Key: HIVE-27294
> URL: https://issues.apache.org/jira/browse/HIVE-27294
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>
> Currently while running qt_database_all.q the qtest environment starts and runs 
> all RDBMS docker containers which might end up in extreme memory consumption. 
> This is suboptimal, and considering that the test cases are all covered by 
> individual qtests, we can simply remove 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-27294) Remove redundant qt_database_all.q for memory consumption reasons

2023-04-25 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-27294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor updated HIVE-27294:

Description: 
Currently while running qt_database_all.q the qtest environment starts and runs all 
RDBMS docker containers at the same time in beforeTest, which might end up in 
extreme memory consumption. This is suboptimal, and considering that the test 
cases are all covered by individual qtests, we can simply remove qt_database_all.q.

{code}
./ql/src/test/queries/clientpositive/qt_database_postgres.q
./ql/src/test/queries/clientpositive/qt_database_oracle.q
./ql/src/test/queries/clientpositive/qt_database_mssql.q
./ql/src/test/queries/clientpositive/qt_database_mariadb.q
./ql/src/test/queries/clientpositive/qt_database_mysql.q
{code}

  was:Currently while running qt_database_all.q the qtest environment starts and 
runs all RDBMS docker containers which might end up in extreme memory 
consumption. This is suboptimal, and considering that the test cases are all 
covered by individual qtests, we can simply remove 


> Remove redundant qt_database_all.q for memory consumption reasons
> -
>
> Key: HIVE-27294
> URL: https://issues.apache.org/jira/browse/HIVE-27294
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>
> Currently while running qt_database_all.q the qtest environment starts and runs 
> all RDBMS docker containers at the same time in beforeTest, which might end 
> up in extreme memory consumption. This is suboptimal, and considering that 
> the test cases are all covered by individual qtests, we can simply remove 
> qt_database_all.q.
> {code}
> ./ql/src/test/queries/clientpositive/qt_database_postgres.q
> ./ql/src/test/queries/clientpositive/qt_database_oracle.q
> ./ql/src/test/queries/clientpositive/qt_database_mssql.q
> ./ql/src/test/queries/clientpositive/qt_database_mariadb.q
> ./ql/src/test/queries/clientpositive/qt_database_mysql.q
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HIVE-27294) Remove redundant qt_database_all.q for memory consumption reasons

2023-04-25 Thread Jira
László Bodor created HIVE-27294:
---

 Summary: Remove redundant qt_database_all.q for memory consumption 
reasons
 Key: HIVE-27294
 URL: https://issues.apache.org/jira/browse/HIVE-27294
 Project: Hive
  Issue Type: Improvement
Reporter: László Bodor






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (HIVE-27294) Remove redundant qt_database_all.q for memory consumption reasons

2023-04-25 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-27294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor reassigned HIVE-27294:
---

Assignee: László Bodor

> Remove redundant qt_database_all.q for memory consumption reasons
> -
>
> Key: HIVE-27294
> URL: https://issues.apache.org/jira/browse/HIVE-27294
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work logged] (HIVE-27285) Add TableMeta ownership for filterTableMetas

2023-04-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27285?focusedWorklogId=858958&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-858958
 ]

ASF GitHub Bot logged work on HIVE-27285:
-

Author: ASF GitHub Bot
Created on: 25/Apr/23 14:49
Start Date: 25/Apr/23 14:49
Worklog Time Spent: 10m 
  Work Description: jfsii commented on code in PR #4258:
URL: https://github.com/apache/hive/pull/4258#discussion_r1176613457


##
standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/MetaStoreFilterHook.java:
##
@@ -85,15 +85,13 @@ default List<String> filterCatalogs(List<String> catalogs) throws MetaException
   List<String> filterTableNames(String catName, String dbName, List<String> tableList)
   throws MetaException;
 
-  // Previously this was handled by filterTableNames.  But it can't be anymore because we can no
-  // longer depend on a 1-1 mapping between table name and entry in the list.
   /**
   * Filter a list of TableMeta objects.
   * @param tableMetas list of TableMetas to filter
   * @return filtered table metas
   * @throws MetaException something went wrong
   */
-  List<TableMeta> filterTableMetas(String catName, String dbName, List<TableMeta> tableMetas) throws MetaException;
+  List<TableMeta> filterTableMetas(List<TableMeta> tableMetas) throws MetaException;

Review Comment:
   I'll make this change. I had hoped to just drop this interface method since 
I didn't see any indication of other systems implementing MetaStoreFilterHooks 
(I generally think the less dead code lying around, the better), but I guess it 
is better to be safe here.
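To make the ownership discussion concrete, the following is a minimal, self-contained sketch of what owner-based filtering of table metadata could look like. `TableMeta` below is a hypothetical stand-in record (not the real `org.apache.hadoop.hive.metastore.api.TableMeta`), and `filterByOwner` is an illustrative helper, not part of the actual `MetaStoreFilterHook` interface:

```java
import java.util.List;
import java.util.stream.Collectors;

public class OwnerFilterSketch {
    // Hypothetical stand-in for the metastore TableMeta; only the fields
    // relevant to ownership-based filtering are modeled here.
    record TableMeta(String catName, String dbName, String tableName, String ownerName) {}

    // Keep only the tables owned by the current user, mirroring the kind of
    // ${OWNER}-privilege check that filterTableMetas could perform once
    // ownership information is carried in TableMeta.
    static List<TableMeta> filterByOwner(List<TableMeta> tableMetas, String currentUser) {
        return tableMetas.stream()
                .filter(m -> currentUser.equals(m.ownerName()))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<TableMeta> metas = List.of(
                new TableMeta("hive", "default", "test_privs", "testuser"),
                new TableMeta("hive", "default", "test_privs2", "testuser2"));
        // Only test_privs is owned by testuser, so one entry survives.
        System.out.println(filterByOwner(metas, "testuser").size()); // prints 1
    }
}
```

Carrying the owner inside `TableMeta` lets a hook like this run without one metastore round trip per table, which is the efficiency argument made in the issue description.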



##
ql/src/test/queries/clientpositive/authorization_privilege_objects.q:
##
@@ -0,0 +1,20 @@
+--! qt:authorizer
+set 
hive.security.authorization.manager=org.apache.hadoop.hive.ql.security.authorization.plugin.sqlstd.SQLStdHiveAuthorizerFactoryForTest;
+set test.hive.authz.sstd.validator.outputPrivObjs=true;
+set hive.test.authz.sstd.hs2.mode=true;
+set user.name=testuser;
+
+CREATE DATABASE test_db;
+CREATE TABLE test_privs(i int);
+set user.name=testuser2;
+CREATE TABLE test_privs2(s string, i int);
+set user.name=testuser;
+SHOW DATABASES;
+SHOW TABLES;

Review Comment:
   I am unsure what you are trying to highlight.
   
   The SHOW TABLEs might hit getTableMeta - however the purpose of this test 
isn't to specifically test getTableMeta, it is to show the actual 
HivePrivilegeObjects that end up getting passed to the authorization 
implementation(s). I couldn't find any other test that verified the 
HivePrivilegeObject being generated for various commands. I added a few other 
query types - SELECTs/INSERTs for example to add some coverage for them, but 
this test could be expanded to include many more statements (I just felt maybe 
trying to cover them all is a bit out of scope for this PR).



##
standalone-metastore/metastore-server/src/test/java/org/apache/hadoop/hive/metastore/minihms/AbstractMetaStoreService.java:
##
@@ -99,7 +99,7 @@ public void start(Map<String, String> metastoreOverlay,
   * @return The client connected to this service
   * @throws MetaException if any Exception occurs during client configuration
   */
-  public IMetaStoreClient getClient() throws MetaException {

Review Comment:
   It allows access to methods like setProcessorCapabilities in the children 
tests. I did not see any indication the tests were specifically designed to 
test the IMetaStoreClient, so I felt it was safe to expose HiveMetaStoreClient 
to have better access to HMSClient methods.





Issue Time Tracking
---

Worklog Id: (was: 858958)
Time Spent: 2h  (was: 1h 50m)

> Add TableMeta ownership for filterTableMetas
> 
>
> Key: HIVE-27285
> URL: https://issues.apache.org/jira/browse/HIVE-27285
> Project: Hive
>  Issue Type: Improvement
>  Components: Standalone Metastore
>Affects Versions: 4.0.0
>Reporter: John Sherman
>Assignee: John Sherman
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> Currently TableMeta does not include ownership information which makes it 
> difficult for filterTableMetas to efficiently filter based on ${OWNER} 
> privileges.
> We should add ownership information to TableMeta and utilizing it in 
> filterTableMetas authorization checks.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-25953) Drop HiveRelMdPredicates::getPredicates(Join...) to use that of RelMdPredicates

2023-04-25 Thread Stamatis Zampetakis (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-25953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17716305#comment-17716305
 ] 

Stamatis Zampetakis commented on HIVE-25953:


The differences between 
[HiveRelMdPredicates|https://github.com/apache/hive/blob/ac48a8b080648096b545034882003ff7847d60b8/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/stats/HiveRelMdPredicates.java]
 and 
[RelMdPredicates|https://github.com/apache/calcite/blob/calcite-1.25.0/core/src/main/java/org/apache/calcite/rel/metadata/RelMdPredicates.java]
 (on Calcite 1.25.0) as far as it concerns the {{RelOptPredicateList 
getPredicates(Join join, RelMetadataQuery mq)}} and directly related data 
structures used by this method are outlined below.

+Hive only+
 * Possibility to infer predicates from ANTI joins (HIVE-23716)
 * Support pull-up of predicates without input references (HIVE-13803)

+Calcite only+
 * Using object equality instead of string equality when comparing RexNode 
expressions (CALCITE-2632)
 * Explicit simplification on predicates pulled from joins inputs 
(CALCITE-2205/CALCITE-2604)

In order to safely drop the in-house implementation of {{RelOptPredicateList 
getPredicates(Join join, RelMetadataQuery mq)}} and use the one from Calcite we 
should port the Hive specific changes to Calcite (assuming that the Calcite 
only changes are always beneficial).

The possibility to infer predicates from ANTI joins is an improvement that will 
land soon in Calcite (CALCITE-5675).

The pull-up of predicates without input references is debatable and probably 
should be dropped from Hive rather than landing in Calcite. The feature was 
introduced explicitly by HIVE-13803 in an attempt to pull "false" predicates 
(which essentially do not reference any input) from one side of the join and 
propagate them into the other side of the join. However, after HIVE-26524 the 
pruning rules are able to remove completely entire joins so the false 
predicates do not ever appear in the plan.

Someone can argue that we can still have predicates that do not reference any 
inputs (such as UNIX_TIMESTAMP() > 1681909077836) but the benefits of moving 
them around in the plan is less obvious. Consider the following SQL query and 
the respective plan.
{code:sql}
EXPLAIN CBO SELECT * 
FROM (SELECT ename, did FROM emp WHERE UNIX_TIMESTAMP() > 1681909077836) e
INNER JOIN dept d ON d.did = e.did;
{code}
{noformat}
HiveJoin(condition=[=($2, $1)], joinType=[inner], algorithm=[none], cost=[not 
available])
  HiveProject(ename=[$1], did=[$2])
    HiveFilter(condition=[AND(>(UNIX_TIMESTAMP(), 1681909077836), IS NOT 
NULL($2))])
      HiveTableScan(table=[[default, emp]], table:alias=[emp])
  HiveProject(did=[$0], dname=[$1])
    HiveFilter(condition=[AND(>(UNIX_TIMESTAMP(), 1681909077836), IS NOT 
NULL($0))])
      HiveTableScan(table=[[default, dept]], table:alias=[d])
{noformat}
Observe that due to special Hive logic of pulling predicates we can pull 
{{>(UNIX_TIMESTAMP(), 1681909077836)}} from the left side of the join and push 
it to the right. Note that this pull/push logic is only valid for deterministic 
predicates 
([https://github.com/apache/calcite/blob/e7375ae745ec18ce9df68b4945bb521ae49a053c/core/src/main/java/org/apache/calcite/sql/SqlOperator.java#L1048]).
 If the predicate is not deterministic then it is not valid to transfer the 
predicate above a filter 
([https://github.com/apache/calcite/blob/e7375ae745ec18ce9df68b4945bb521ae49a053c/core/src/main/java/org/apache/calcite/rel/metadata/RelMdPredicates.java#L305]);
 consider for example {{RAND()}}.
      
Based on the definition of a deterministic operator, the same input always 
gives the same output. This means that a function that does not reference any 
inputs and at the same time is deterministic can be evaluated statically at 
compile time; the result will be either true or false and will be further 
simplified and disappear from the plan.

In the end, the special logic for handling predicates without input references 
is unnecessary when certain classic rules are present. The reduction and pruning 
rules are present in Hive, so the pull/push logic is redundant. One caveat is 
the UNIX_TIMESTAMP() > 1681909077836 example shown above: even though it is 
deterministic, it is not reduced to a constant due to HIVE-27291; other similar 
UDFs such as CURRENT_TIMESTAMP, CURRENT_DATE, etc., are reduced as expected. 
HIVE-27291 is an edge case and should be fixed in the near future, but it is not 
really blocking for this ticket.
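The argument above, that a deterministic predicate with no input references can be evaluated once at plan time and folded to a constant, can be illustrated with a toy reducer. This is an illustration of the idea only, not Calcite's or Hive's actual constant-folding code, and all names in it are made up:

```java
import java.util.function.Supplier;

public class PredicateReducerSketch {
    // Toy model of the reduction argued above: a predicate that is
    // deterministic and references no inputs is evaluated once and replaced
    // by the TRUE/FALSE literal; anything else stays in the plan unchanged.
    static String reduce(boolean deterministic, boolean referencesInputs,
                         Supplier<Boolean> eval, String original) {
        if (deterministic && !referencesInputs) {
            return eval.get() ? "TRUE" : "FALSE"; // folded at "compile" time
        }
        return original; // e.g. RAND() < 0.5 must survive as-is
    }

    public static void main(String[] args) {
        // A UNIX_TIMESTAMP() > 1681909077836 style check, folded to a literal.
        System.out.println(reduce(true, false,
                () -> 2_000_000_000_000L > 1_681_909_077_836L,
                "UNIX_TIMESTAMP() > 1681909077836")); // prints TRUE
        // A non-deterministic predicate is left in place.
        System.out.println(reduce(false, false, () -> true, "RAND() < 0.5"));
    }
}
```

Once such folding happens, the pulled-up constant either disappears (TRUE) or prunes the whole branch (FALSE), which is why the dedicated pull/push logic for input-free predicates becomes redundant.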

> Drop HiveRelMdPredicates::getPredicates(Join...) to use that of 
> RelMdPredicates
> ---
>
> Key: HIVE-25953
> URL: https://issues.apache.org/jira/browse/HIVE-25953
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO
>Affects Versions: 4.0.0
>   

[jira] [Work logged] (HIVE-27032) Introduce liquibase for HMS schema evolution

2023-04-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27032?focusedWorklogId=858953&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-858953
 ]

ASF GitHub Bot logged work on HIVE-27032:
-

Author: ASF GitHub Bot
Created on: 25/Apr/23 14:35
Start Date: 25/Apr/23 14:35
Worklog Time Spent: 10m 
  Work Description: sonarcloud[bot] commented on PR #4060:
URL: https://github.com/apache/hive/pull/4060#issuecomment-1521901272

   Kudos, SonarCloud Quality Gate passed!  [Quality Gate 
passed](https://sonarcloud.io/dashboard?id=apache_hive&pullRequest=4060)
   
   [1 Bug](https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=4060&resolved=false&types=BUG)
   [0 Vulnerabilities](https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=4060&resolved=false&types=VULNERABILITY)
   [5 Security Hotspots](https://sonarcloud.io/project/security_hotspots?id=apache_hive&pullRequest=4060&resolved=false&types=SECURITY_HOTSPOT)
   [211 Code Smells](https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=4060&resolved=false&types=CODE_SMELL)
   
   No Coverage information
   No Duplication information




Issue Time Tracking
---

Worklog Id: (was: 858953)
Time Spent: 3.5h  (was: 3h 20m)

> Introduce liquibase for HMS schema evolution
> 
>
> Key: HIVE-27032
> URL: https://issues.apache.org/jira/browse/HIVE-27032
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Végh
>Assignee: László Végh
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> Introduce liquibase, and replace current upgrade procedure with it.
> The Schematool CLI API should remain untouched, while under the hood, 
> liquibase should be used for HMS schema evolution.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work logged] (HIVE-27163) Column stats are not getting published after an insert query into an external table with custom location

2023-04-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27163?focusedWorklogId=858948&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-858948
 ]

ASF GitHub Bot logged work on HIVE-27163:
-

Author: ASF GitHub Bot
Created on: 25/Apr/23 14:04
Start Date: 25/Apr/23 14:04
Worklog Time Spent: 10m 
  Work Description: dengzhhu653 commented on PR #4228:
URL: https://github.com/apache/hive/pull/4228#issuecomment-1521850342

   > @dengzhhu653 Could you please describe the root cause and the fix of this 
issue in the description of the PR or the Jira. It would be useful to 
understand some details when someone bumps into this in the future.
   
   Sure, will do that




Issue Time Tracking
---

Worklog Id: (was: 858948)
Time Spent: 1h 40m  (was: 1.5h)

> Column stats are not getting published after an insert query into an external 
> table with custom location
> 
>
> Key: HIVE-27163
> URL: https://issues.apache.org/jira/browse/HIVE-27163
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Taraka Rama Rao Lethavadla
>Assignee: Zhihua Deng
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> Test case details are below
> *test.q*
> {noformat}
> set hive.stats.column.autogather=true;
> set hive.stats.autogather=true;
> dfs ${system:test.dfs.mkdir} ${system:test.tmp.dir}/test;
> create external table test_custom(age int, name string) stored as orc 
> location '/tmp/test';
> insert into test_custom select 1, 'test';
> desc formatted test_custom age;{noformat}
> *test.q.out*
>  
>  
> {noformat}
>  A masked pattern was here 
> PREHOOK: type: CREATETABLE
>  A masked pattern was here 
> PREHOOK: Output: database:default
> PREHOOK: Output: default@test_custom
>  A masked pattern was here 
> POSTHOOK: type: CREATETABLE
>  A masked pattern was here 
> POSTHOOK: Output: database:default
> POSTHOOK: Output: default@test_custom
> PREHOOK: query: insert into test_custom select 1, 'test'
> PREHOOK: type: QUERY
> PREHOOK: Input: _dummy_database@_dummy_table
> PREHOOK: Output: default@test_custom
> POSTHOOK: query: insert into test_custom select 1, 'test'
> POSTHOOK: type: QUERY
> POSTHOOK: Input: _dummy_database@_dummy_table
> POSTHOOK: Output: default@test_custom
> POSTHOOK: Lineage: test_custom.age SIMPLE []
> POSTHOOK: Lineage: test_custom.name SIMPLE []
> PREHOOK: query: desc formatted test_custom age
> PREHOOK: type: DESCTABLE
> PREHOOK: Input: default@test_custom
> POSTHOOK: query: desc formatted test_custom age
> POSTHOOK: type: DESCTABLE
> POSTHOOK: Input: default@test_custom
> col_name                age
> data_type               int
> min
> max
> num_nulls
> distinct_count
> avg_col_len
> max_col_len
> num_trues
> num_falses
> bit_vector
> comment                 from deserializer{noformat}
> As we can see from desc formatted output, column stats were not populated
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work logged] (HIVE-26982) Select * from a table containing timestamp column with default defined using TIMESTAMPLOCALTZ fails with error " ORC doesn't handle primitive category TIMESTAMPLOCALT

2023-04-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26982?focusedWorklogId=858947&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-858947
 ]

ASF GitHub Bot logged work on HIVE-26982:
-

Author: ASF GitHub Bot
Created on: 25/Apr/23 13:41
Start Date: 25/Apr/23 13:41
Worklog Time Spent: 10m 
  Work Description: sonarcloud[bot] commented on PR #4265:
URL: https://github.com/apache/hive/pull/4265#issuecomment-1521812340

   Kudos, SonarCloud Quality Gate passed!  [Quality Gate 
passed](https://sonarcloud.io/dashboard?id=apache_hive&pullRequest=4265)
   
   [0 Bugs](https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=4265&resolved=false&types=BUG)
   [0 Vulnerabilities](https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=4265&resolved=false&types=VULNERABILITY)
   [0 Security Hotspots](https://sonarcloud.io/project/security_hotspots?id=apache_hive&pullRequest=4265&resolved=false&types=SECURITY_HOTSPOT)
   [5 Code Smells](https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=4265&resolved=false&types=CODE_SMELL)
   
   No Coverage information
   No Duplication information




Issue Time Tracking
---

Worklog Id: (was: 858947)
Time Spent: 0.5h  (was: 20m)

> Select * from a table containing timestamp column with default defined using 
> TIMESTAMPLOCALTZ fails with error " ORC doesn't handle primitive category 
> TIMESTAMPLOCALTZ"
> 
>
> Key: HIVE-26982
> URL: https://issues.apache.org/jira/browse/HIVE-26982
> Project: Hive
>  Issue Type: Bug
>  Components: ORC
>Reporter: Dharmik Thakkar
>Assignee: Zoltán Rátkai
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Select * from a table containing timestamp column with default defined using 
> TIMESTAMPLOCALTZ fails with error " ORC doesn't handle primitive category 
> TIMESTAMPLOCALTZ"
> Logs
> {code:java}
> 2023-01-24T20:37:48,831 INFO  [pool-2-thread-1] jdbc.TestDriver: Beginning 
> Test at 2023-01-24 20:37:48,831
> 2023-01-24T20:37:48,833 INFO  [pool-2-thread-1] jdbc.TestDriver: BEGIN MAIN
> 2023-01-24T20:37:48,834 INFO  [pool-9-thread-1] jdbc.TestDriver: Running 
> SessionGroup{name='SG_JZSL3SA0OG', initialDelay=0, repeats=1, repeatDelay=0}
> 2023-01-24T20:37:48,834 INFO  [pool-9-thread-1] jdbc.TestDriver: Connecting 
> as user 'hrt_qa'
> 2023-01-24T20:37:49,173 INFO  [pool-9-thread-1] jdbc.TestDriver: Query: drop 
> table if exists 

[jira] [Work logged] (HIVE-27287) Upgrade Commons-text to 1.10.0 to fix CVE

2023-04-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27287?focusedWorklogId=858944&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-858944
 ]

ASF GitHub Bot logged work on HIVE-27287:
-

Author: ASF GitHub Bot
Created on: 25/Apr/23 13:20
Start Date: 25/Apr/23 13:20
Worklog Time Spent: 10m 
  Work Description: TuroczyX commented on PR #4260:
URL: https://github.com/apache/hive/pull/4260#issuecomment-1521779927

   +1




Issue Time Tracking
---

Worklog Id: (was: 858944)
Time Spent: 1h  (was: 50m)

> Upgrade Commons-text to 1.10.0 to fix CVE
> -
>
> Key: HIVE-27287
> URL: https://issues.apache.org/jira/browse/HIVE-27287
> Project: Hive
>  Issue Type: Improvement
>Reporter: Raghav Aggarwal
>Assignee: Raghav Aggarwal
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Apache Commons Text versions prior to 1.8 are vulnerable to 
> [CVE-2022-42889|https://nvd.nist.gov/vuln/detail/CVE-2022-42889], which 
> involves potential script execution when processing untrusted input using 
> {{StringLookup}}. Direct and transitive references to Apache Commons Text 
> prior to 1.10.0 should be upgraded to avoid the default interpolation 
> behaviour.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work logged] (HIVE-27290) Upgrade com.jayway.jsonpath » json-path to 2.8.0 to fix CVEs

2023-04-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27290?focusedWorklogId=858943&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-858943
 ]

ASF GitHub Bot logged work on HIVE-27290:
-

Author: ASF GitHub Bot
Created on: 25/Apr/23 13:17
Start Date: 25/Apr/23 13:17
Worklog Time Spent: 10m 
  Work Description: TuroczyX commented on PR #4266:
URL: https://github.com/apache/hive/pull/4266#issuecomment-1521775847

   LGTM. Also fix several CVE's https://www.cve.org/CVERecord?id=CVE-2023-1370 




Issue Time Tracking
---

Worklog Id: (was: 858943)
Time Spent: 20m  (was: 10m)

> Upgrade com.jayway.jsonpath » json-path to 2.8.0 to fix CVEs
> 
>
> Key: HIVE-27290
> URL: https://issues.apache.org/jira/browse/HIVE-27290
> Project: Hive
>  Issue Type: Task
>Reporter: Devaspati Krishnatri
>Assignee: Devaspati Krishnatri
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work logged] (HIVE-27292) Upgrade Zookeeper to 3.7.1

2023-04-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27292?focusedWorklogId=858942&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-858942
 ]

ASF GitHub Bot logged work on HIVE-27292:
-

Author: ASF GitHub Bot
Created on: 25/Apr/23 13:15
Start Date: 25/Apr/23 13:15
Worklog Time Spent: 10m 
  Work Description: TuroczyX commented on PR #4264:
URL: https://github.com/apache/hive/pull/4264#issuecomment-1521774166

   The change is legit, but wouldn't it be better to go to 3.8.1? 
   3.7.1 will be deprecated two weeks from now.




Issue Time Tracking
---

Worklog Id: (was: 858942)
Time Spent: 0.5h  (was: 20m)

> Upgrade Zookeeper to 3.7.1
> --
>
> Key: HIVE-27292
> URL: https://issues.apache.org/jira/browse/HIVE-27292
> Project: Hive
>  Issue Type: Bug
>Reporter: Aman Raj
>Assignee: Aman Raj
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Upgrade Zookeper from 3.6.3 to 3.7.1 since 3.6.3 is in end of life. 
> https://endoflife.date/zookeeper



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work logged] (HIVE-27286) Upgrade jettison version to 1.5.4 to address CVE-2023-1436

2023-04-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27286?focusedWorklogId=858941&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-858941
 ]

ASF GitHub Bot logged work on HIVE-27286:
-

Author: ASF GitHub Bot
Created on: 25/Apr/23 13:14
Start Date: 25/Apr/23 13:14
Worklog Time Spent: 10m 
  Work Description: shreeyasand commented on PR #4259:
URL: https://github.com/apache/hive/pull/4259#issuecomment-1521772435

   @TuroczyX could you please help merge this PR?




Issue Time Tracking
---

Worklog Id: (was: 858941)
Time Spent: 1h 50m  (was: 1h 40m)

> Upgrade jettison version to 1.5.4 to address CVE-2023-1436
> --
>
> Key: HIVE-27286
> URL: https://issues.apache.org/jira/browse/HIVE-27286
> Project: Hive
>  Issue Type: Improvement
>Reporter: Sand Shreeya
>Assignee: Sand Shreeya
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work logged] (HIVE-26982) Select * from a table containing timestamp column with default defined using TIMESTAMPLOCALTZ fails with error " ORC doesn't handle primitive category TIMESTAMPLOCALT

2023-04-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26982?focusedWorklogId=858939&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-858939
 ]

ASF GitHub Bot logged work on HIVE-26982:
-

Author: ASF GitHub Bot
Created on: 25/Apr/23 13:13
Start Date: 25/Apr/23 13:13
Worklog Time Spent: 10m 
  Work Description: TuroczyX commented on PR #4265:
URL: https://github.com/apache/hive/pull/4265#issuecomment-1521770868

   The code is straightforward and seems legit. Could you please write a test to cover it?




Issue Time Tracking
---

Worklog Id: (was: 858939)
Time Spent: 20m  (was: 10m)

> Select * from a table containing timestamp column with default defined using 
> TIMESTAMPLOCALTZ fails with error " ORC doesn't handle primitive category 
> TIMESTAMPLOCALTZ"
> 
>
> Key: HIVE-26982
> URL: https://issues.apache.org/jira/browse/HIVE-26982
> Project: Hive
>  Issue Type: Bug
>  Components: ORC
>Reporter: Dharmik Thakkar
>Assignee: Zoltán Rátkai
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Select * from a table containing timestamp column with default defined using 
> TIMESTAMPLOCALTZ fails with error " ORC doesn't handle primitive category 
> TIMESTAMPLOCALTZ"
> Logs
> {code:java}
> 2023-01-24T20:37:48,831 INFO  [pool-2-thread-1] jdbc.TestDriver: Beginning 
> Test at 2023-01-24 20:37:48,831
> 2023-01-24T20:37:48,833 INFO  [pool-2-thread-1] jdbc.TestDriver: BEGIN MAIN
> 2023-01-24T20:37:48,834 INFO  [pool-9-thread-1] jdbc.TestDriver: Running 
> SessionGroup{name='SG_JZSL3SA0OG', initialDelay=0, repeats=1, repeatDelay=0}
> 2023-01-24T20:37:48,834 INFO  [pool-9-thread-1] jdbc.TestDriver: Connecting 
> as user 'hrt_qa'
> 2023-01-24T20:37:49,173 INFO  [pool-9-thread-1] jdbc.TestDriver: Query: drop 
> table if exists t1_default
> 2023-01-24T20:37:49,237 INFO  [Thread-64] jdbc.TestDriver: INFO  : Compiling 
> command(queryId=hive_20230124203749_09b0f95f-4cf1-4c2c-9f08-1b91fdb4a6ca): 
> drop table if exists t1_default
> 2023-01-24T20:37:49,299 INFO  [Thread-64] jdbc.TestDriver: INFO  : Semantic 
> Analysis Completed (retrial = false)
> 2023-01-24T20:37:49,299 INFO  [Thread-64] jdbc.TestDriver: INFO  : Created 
> Hive schema: Schema(fieldSchemas:null, properties:null)
> 2023-01-24T20:37:49,299 INFO  [Thread-64] jdbc.TestDriver: INFO  : Completed 
> compiling 
> command(queryId=hive_20230124203749_09b0f95f-4cf1-4c2c-9f08-1b91fdb4a6ca); 
> Time taken: 0.031 seconds
> 2023-01-24T20:37:49,299 INFO  [Thread-64] jdbc.TestDriver: INFO  : Executing 
> command(queryId=hive_20230124203749_09b0f95f-4cf1-4c2c-9f08-1b91fdb4a6ca): 
> drop table if exists t1_default
> 2023-01-24T20:37:49,299 INFO  [Thread-64] jdbc.TestDriver: INFO  : Starting 
> task [Stage-0:DDL] in serial mode
> 2023-01-24T20:37:49,299 INFO  [Thread-64] jdbc.TestDriver: INFO  : Completed 
> executing 
> command(queryId=hive_20230124203749_09b0f95f-4cf1-4c2c-9f08-1b91fdb4a6ca); 
> Time taken: 0.012 seconds
> 2023-01-24T20:37:49,299 INFO  [Thread-64] jdbc.TestDriver: INFO  : OK
> 2023-01-24T20:37:49,416 INFO  [pool-9-thread-1] jdbc.TestDriver: No output to 
> verify
> 2023-01-24T20:37:49,416 INFO  [pool-9-thread-1] jdbc.TestDriver: Query: 
> create table t1_default ( t tinyint default 1Y,   si smallint default 1S, 
> i int default 1,b bigint default 1L, f double default 
> double(5.7), d double, s varchar(25) default cast('col1' as 
> varchar(25)), dc decimal(38,18), bo varchar(5), v varchar(25),
>  c char(25) default cast('var1' as char(25)), ts timestamp DEFAULT 
> TIMESTAMP'2016-02-22 12:45:07.0', dt date default 
> cast('2015-03-12' as DATE), tz timestamp with local time zone DEFAULT 
> TIMESTAMPLOCALTZ'2016-01-03 12:26:34 America/Los_Angeles') STORED AS 
> TEXTFILE
> 2023-01-24T20:37:49,476 INFO  [Thread-65] jdbc.TestDriver: INFO  : Compiling 
> command(queryId=hive_20230124203749_75ffcf31-6bd6-46d7-ba02-f39efb2c4279): 
> create table t1_default ( t tinyint default 1Y,   si smallint default 1S, 
> i int default 1,b bigint default 1L, f double default 
> double(5.7), d double, s varchar(25) default cast('col1' as 
> varchar(25)), dc decimal(38,18), bo varchar(5), v varchar(25),
>  c char(25) default cast('var1' as char(25)), ts timestamp DEFAULT 
> TIMESTAMP'2016-02-22 12:45:07.0', dt date default 
> cast('2015-03-12' as DATE), tz timestamp with local time zone DEFAULT 
> TIMESTAMPLOCALTZ'2016-01-03 12:26:34 America/Los_Angeles') STORED AS 
> TEXTFILE
> 

[jira] [Work logged] (HIVE-27286) Upgrade jettison version to 1.5.4 to address CVE-2023-1436

2023-04-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27286?focusedWorklogId=858938=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-858938
 ]

ASF GitHub Bot logged work on HIVE-27286:
-

Author: ASF GitHub Bot
Created on: 25/Apr/23 13:12
Start Date: 25/Apr/23 13:12
Worklog Time Spent: 10m 
  Work Description: shreeyasand commented on PR #4259:
URL: https://github.com/apache/hive/pull/4259#issuecomment-1521769280

   @TuroczyX could you please help merge this PR?




Issue Time Tracking
---

Worklog Id: (was: 858938)
Time Spent: 1h 40m  (was: 1.5h)

> Upgrade jettison version to 1.5.4 to address CVE-2023-1436
> --
>
> Key: HIVE-27286
> URL: https://issues.apache.org/jira/browse/HIVE-27286
> Project: Hive
>  Issue Type: Improvement
>Reporter: Sand Shreeya
>Assignee: Sand Shreeya
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work logged] (HIVE-27292) Upgrade Zookeeper to 3.7.1

2023-04-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27292?focusedWorklogId=858936=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-858936
 ]

ASF GitHub Bot logged work on HIVE-27292:
-

Author: ASF GitHub Bot
Created on: 25/Apr/23 13:10
Start Date: 25/Apr/23 13:10
Worklog Time Spent: 10m 
  Work Description: sonarcloud[bot] commented on PR #4264:
URL: https://github.com/apache/hive/pull/4264#issuecomment-1521765614

   Kudos, SonarCloud Quality Gate passed!  [![Quality Gate 
passed](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/checks/QualityGateBadge/passed-16px.png
 'Quality Gate 
passed')](https://sonarcloud.io/dashboard?id=apache_hive=4264)
   
   
[![Bug](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/common/bug-16px.png
 
'Bug')](https://sonarcloud.io/project/issues?id=apache_hive=4264=false=BUG)
 
[![A](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/checks/RatingBadge/A-16px.png
 
'A')](https://sonarcloud.io/project/issues?id=apache_hive=4264=false=BUG)
 [0 
Bugs](https://sonarcloud.io/project/issues?id=apache_hive=4264=false=BUG)
  
   
[![Vulnerability](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/common/vulnerability-16px.png
 
'Vulnerability')](https://sonarcloud.io/project/issues?id=apache_hive=4264=false=VULNERABILITY)
 
[![A](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/checks/RatingBadge/A-16px.png
 
'A')](https://sonarcloud.io/project/issues?id=apache_hive=4264=false=VULNERABILITY)
 [0 
Vulnerabilities](https://sonarcloud.io/project/issues?id=apache_hive=4264=false=VULNERABILITY)
  
   [![Security 
Hotspot](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/common/security_hotspot-16px.png
 'Security 
Hotspot')](https://sonarcloud.io/project/security_hotspots?id=apache_hive=4264=false=SECURITY_HOTSPOT)
 
[![A](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/checks/RatingBadge/A-16px.png
 
'A')](https://sonarcloud.io/project/security_hotspots?id=apache_hive=4264=false=SECURITY_HOTSPOT)
 [0 Security 
Hotspots](https://sonarcloud.io/project/security_hotspots?id=apache_hive=4264=false=SECURITY_HOTSPOT)
  
   [![Code 
Smell](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/common/code_smell-16px.png
 'Code 
Smell')](https://sonarcloud.io/project/issues?id=apache_hive=4264=false=CODE_SMELL)
 
[![A](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/checks/RatingBadge/A-16px.png
 
'A')](https://sonarcloud.io/project/issues?id=apache_hive=4264=false=CODE_SMELL)
 [3 Code 
Smells](https://sonarcloud.io/project/issues?id=apache_hive=4264=false=CODE_SMELL)
   
   [![No Coverage 
information](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/checks/CoverageChart/NoCoverageInfo-16px.png
 'No Coverage 
information')](https://sonarcloud.io/component_measures?id=apache_hive=4264=coverage=list)
 No Coverage information  
   [![No Duplication 
information](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/checks/Duplications/NoDuplicationInfo-16px.png
 'No Duplication 
information')](https://sonarcloud.io/component_measures?id=apache_hive=4264=duplicated_lines_density=list)
 No Duplication information
   
   




Issue Time Tracking
---

Worklog Id: (was: 858936)
Time Spent: 20m  (was: 10m)

> Upgrade Zookeeper to 3.7.1
> --
>
> Key: HIVE-27292
> URL: https://issues.apache.org/jira/browse/HIVE-27292
> Project: Hive
>  Issue Type: Bug
>Reporter: Aman Raj
>Assignee: Aman Raj
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Upgrade ZooKeeper from 3.6.3 to 3.7.1 since 3.6.3 has reached end of life. 
> https://endoflife.date/zookeeper



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work logged] (HIVE-27163) Column stats are not getting published after an insert query into an external table with custom location

2023-04-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27163?focusedWorklogId=858931=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-858931
 ]

ASF GitHub Bot logged work on HIVE-27163:
-

Author: ASF GitHub Bot
Created on: 25/Apr/23 12:57
Start Date: 25/Apr/23 12:57
Worklog Time Spent: 10m 
  Work Description: kasakrisz commented on PR #4228:
URL: https://github.com/apache/hive/pull/4228#issuecomment-1521745434

   @dengzhhu653 
   Could you please describe the root cause and the fix for this issue in the 
description of the PR or the Jira? It would be useful for understanding the 
details when someone bumps into this in the future.




Issue Time Tracking
---

Worklog Id: (was: 858931)
Time Spent: 1.5h  (was: 1h 20m)

> Column stats are not getting published after an insert query into an external 
> table with custom location
> 
>
> Key: HIVE-27163
> URL: https://issues.apache.org/jira/browse/HIVE-27163
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Taraka Rama Rao Lethavadla
>Assignee: Zhihua Deng
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Test case details are below
> *test.q*
> {noformat}
> set hive.stats.column.autogather=true;
> set hive.stats.autogather=true;
> dfs ${system:test.dfs.mkdir} ${system:test.tmp.dir}/test;
> create external table test_custom(age int, name string) stored as orc 
> location '/tmp/test';
> insert into test_custom select 1, 'test';
> desc formatted test_custom age;{noformat}
> *test.q.out*
>  
>  
> {noformat}
>  A masked pattern was here 
> PREHOOK: type: CREATETABLE
>  A masked pattern was here 
> PREHOOK: Output: database:default
> PREHOOK: Output: default@test_custom
>  A masked pattern was here 
> POSTHOOK: type: CREATETABLE
>  A masked pattern was here 
> POSTHOOK: Output: database:default
> POSTHOOK: Output: default@test_custom
> PREHOOK: query: insert into test_custom select 1, 'test'
> PREHOOK: type: QUERY
> PREHOOK: Input: _dummy_database@_dummy_table
> PREHOOK: Output: default@test_custom
> POSTHOOK: query: insert into test_custom select 1, 'test'
> POSTHOOK: type: QUERY
> POSTHOOK: Input: _dummy_database@_dummy_table
> POSTHOOK: Output: default@test_custom
> POSTHOOK: Lineage: test_custom.age SIMPLE []
> POSTHOOK: Lineage: test_custom.name SIMPLE []
> PREHOOK: query: desc formatted test_custom age
> PREHOOK: type: DESCTABLE
> PREHOOK: Input: default@test_custom
> POSTHOOK: query: desc formatted test_custom age
> POSTHOOK: type: DESCTABLE
> POSTHOOK: Input: default@test_custom
> col_name                age
> data_type               int
> min
> max
> num_nulls
> distinct_count
> avg_col_len
> max_col_len
> num_trues
> num_falses
> bit_vector
> comment                 from deserializer{noformat}
> As we can see from desc formatted output, column stats were not populated
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work logged] (HIVE-27163) Column stats are not getting published after an insert query into an external table with custom location

2023-04-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27163?focusedWorklogId=858928=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-858928
 ]

ASF GitHub Bot logged work on HIVE-27163:
-

Author: ASF GitHub Bot
Created on: 25/Apr/23 12:54
Start Date: 25/Apr/23 12:54
Worklog Time Spent: 10m 
  Work Description: kasakrisz commented on code in PR #4228:
URL: https://github.com/apache/hive/pull/4228#discussion_r1176442212


##
ql/src/java/org/apache/hadoop/hive/ql/ddl/table/create/CreateTableDesc.java:
##
@@ -921,14 +924,23 @@ public Table toTable(HiveConf conf) throws HiveException {
 // When replicating the statistics for a table will be obtained from the 
source. Do not
 // reset it on replica.
 if (replicationSpec == null || !replicationSpec.isInReplicationScope()) {
-  if (!this.isCTAS && (tbl.getPath() == null || (!isExternal() && 
tbl.isEmpty( {
-if (!tbl.isPartitioned() && 
conf.getBoolVar(HiveConf.ConfVars.HIVESTATSAUTOGATHER)) {
-  
StatsSetupConst.setStatsStateForCreateTable(tbl.getTTable().getParameters(),
-  MetaStoreUtils.getColumnNames(tbl.getCols()), 
StatsSetupConst.TRUE);
-}
-  } else {
-
StatsSetupConst.setStatsStateForCreateTable(tbl.getTTable().getParameters(), 
null,
-StatsSetupConst.FALSE);
+  // Remove COLUMN_STATS_ACCURATE=true from table's parameter, let the HMS 
determine if
+  // there is need to add column stats dependent on the table's location 
in case the metastore transformer
+  // sets or alters the table's location.
+  
StatsSetupConst.setStatsStateForCreateTable(tbl.getTTable().getParameters(), 
null,
+  StatsSetupConst.FALSE);
+  if (!this.isCTAS && !tbl.isPartitioned() && !tbl.isTemporary() &&
+  conf.getBoolVar(HiveConf.ConfVars.HIVESTATSAUTOGATHER)) {
+// Put the flag into the dictionary in order not to pollute the table,
+// ObjectDictionary is meant to convey repetitive messages.
+ObjectDictionary dictionary = tbl.getTTable().isSetDictionary() ?
+tbl.getTTable().getDictionary() : new ObjectDictionary();
+String cols = 
MetaStoreUtils.getColumnNames(tbl.getCols()).stream().collect(Collectors.joining("\0"));

Review Comment:
   Please extract `"\0"` to a constant.
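
   The `"\0"` delimiter works here because SQL column names cannot contain a
NUL character, so the packed string splits back losslessly. A minimal sketch of
the round trip (the constant name is hypothetical, and this is plain Java
standing in for the Hive code, not the actual patch):

```java
import java.util.Arrays;
import java.util.List;

public class NulDelimiterDemo {
    // Hypothetical name for the constant the review asks for.
    static final String COLUMN_NAME_DELIMITER = "\0";

    // Pack column names into a single string, as the diff above does
    // before storing them in the ObjectDictionary.
    static String join(List<String> cols) {
        return String.join(COLUMN_NAME_DELIMITER, cols);
    }

    // Recover the original column-name list on the metastore side.
    static List<String> split(String packed) {
        return Arrays.asList(packed.split(COLUMN_NAME_DELIMITER, -1));
    }

    public static void main(String[] args) {
        List<String> cols = Arrays.asList("age", "name");
        String packed = join(cols);
        assert split(packed).equals(cols);
        System.out.println(split(packed));  // [age, name]
    }
}
```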



##
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/utils/MetaStoreServerUtils.java:
##
@@ -508,6 +511,48 @@ public static void clearQuickStats(Map<String, String> params) {
 params.remove(StatsSetupConst.NUM_ERASURE_CODED_FILES);
   }
 
+  public static void updateTableStatsForCreateTable(Warehouse wh, Database db, 
Table tbl,
+  EnvironmentContext envContext, Configuration conf, Path tblPath, boolean 
newDir)
+  throws MetaException {
+// If the created table is a view, skip generating the stats
+if (MetaStoreUtils.isView(tbl)) {
+  return;
+}
+assert tblPath != null;
+if (tbl.isSetDictionary() && tbl.getDictionary().getValues() != null) {
+  List values = tbl.getDictionary().getValues().
+  remove(StatsSetupConst.STATS_FOR_CREATE_TABLE);
+  java.nio.ByteBuffer buffer;
+  if (values != null && values.size() > 0 && (buffer = 
values.get(0)).hasArray()) {
+String val = new String(buffer.array(), StandardCharsets.UTF_8);
+if (StatsSetupConst.TRUE.equals(val)) {
+  try {
+boolean isIcebergTable =
+
HiveMetaHook.ICEBERG.equalsIgnoreCase(tbl.getParameters().get(HiveMetaHook.TABLE_TYPE));
+PathFilter pathFilter = isIcebergTable ?
+path -> !"metadata".equals(path.getName()) : 
FileUtils.HIDDEN_FILES_PATH_FILTER;

Review Comment:
   This part is too Iceberg specific. Is it possible to get this filter via 
storage handler api?



##
ql/src/java/org/apache/hadoop/hive/ql/ddl/table/create/CreateTableDesc.java:
##
@@ -921,14 +924,23 @@ public Table toTable(HiveConf conf) throws HiveException {
 // When replicating the statistics for a table will be obtained from the 
source. Do not
 // reset it on replica.
 if (replicationSpec == null || !replicationSpec.isInReplicationScope()) {
-  if (!this.isCTAS && (tbl.getPath() == null || (!isExternal() && 
tbl.isEmpty( {
-if (!tbl.isPartitioned() && 
conf.getBoolVar(HiveConf.ConfVars.HIVESTATSAUTOGATHER)) {
-  
StatsSetupConst.setStatsStateForCreateTable(tbl.getTTable().getParameters(),
-  MetaStoreUtils.getColumnNames(tbl.getCols()), 
StatsSetupConst.TRUE);
-}
-  } else {
-
StatsSetupConst.setStatsStateForCreateTable(tbl.getTTable().getParameters(), 
null,
-StatsSetupConst.FALSE);
+  // Remove COLUMN_STATS_ACCURATE=true from table's parameter, let the HMS 
determine if
+  // there is need to add column 

[jira] [Updated] (HIVE-27290) Upgrade com.jayway.jsonpath » json-path to 2.8.0 to fix CVEs

2023-04-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-27290:
--
Labels: pull-request-available  (was: )

> Upgrade com.jayway.jsonpath » json-path to 2.8.0 to fix CVEs
> 
>
> Key: HIVE-27290
> URL: https://issues.apache.org/jira/browse/HIVE-27290
> Project: Hive
>  Issue Type: Task
>Reporter: Devaspati Krishnatri
>Assignee: Devaspati Krishnatri
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work logged] (HIVE-27290) Upgrade com.jayway.jsonpath » json-path to 2.8.0 to fix CVEs

2023-04-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27290?focusedWorklogId=858923=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-858923
 ]

ASF GitHub Bot logged work on HIVE-27290:
-

Author: ASF GitHub Bot
Created on: 25/Apr/23 12:46
Start Date: 25/Apr/23 12:46
Worklog Time Spent: 10m 
  Work Description: ReshmaFegade2022 opened a new pull request, #4266:
URL: https://github.com/apache/hive/pull/4266

   
   
   ### What changes were proposed in this pull request?
   Upgrade json-path to 2.8.0 so the dependent json-smart lib also gets 
upgraded to 2.4.10.
   
   
   ### Why are the changes needed?
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   
   ### How was this patch tested?
   
   




Issue Time Tracking
---

Worklog Id: (was: 858923)
Remaining Estimate: 0h
Time Spent: 10m

> Upgrade com.jayway.jsonpath » json-path to 2.8.0 to fix CVEs
> 
>
> Key: HIVE-27290
> URL: https://issues.apache.org/jira/browse/HIVE-27290
> Project: Hive
>  Issue Type: Task
>Reporter: Devaspati Krishnatri
>Assignee: Devaspati Krishnatri
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-27293) Vectorization: Incorrect results with nvl for ORC table

2023-04-25 Thread Riju Trivedi (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Riju Trivedi updated HIVE-27293:

Attachment: esource.txt

> Vectorization: Incorrect results with nvl for ORC table
> ---
>
> Key: HIVE-27293
> URL: https://issues.apache.org/jira/browse/HIVE-27293
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 4.0.0-alpha-2
>Reporter: Riju Trivedi
>Priority: Major
> Attachments: esource.txt, vectorization_nvl.q
>
>
> Attached repro.q file and data file used to reproduce the issue.
> {code:java}
> Insert overwrite table etarget
> select mt.*, floor(rand() * 1) as bdata_no from (select nvl(np.client_id,' 
> '),nvl(np.id_enddate,cast(0 as decimal(10,0))),nvl(np.client_gender,' 
> '),nvl(np.birthday,cast(0 as decimal(10,0))),nvl(np.nationality,' 
> '),nvl(np.address_zipcode,' '),nvl(np.income,cast(0 as 
> decimal(15,2))),nvl(np.address,' '),nvl(np.part_date,cast(0 as int)) from 
> (select * from esource where part_date = 20230414) np) mt;
>  {code}
> Outcome:
> {code:java}
> select client_id,birthday,income from etarget; 
> 15678   0  0.00
> 67891  19313  -1.00
> 12345  0  0.00{code}
> Expected Result :
> {code:java}
> select client_id,birthday,income from etarget; 
> 12345 19613 -1.00
> 67891 19313 -1.00 
> 15678 0  0.00{code}
> Disabling hive.vectorized.use.vectorized.input.format produces correct output.
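> 
> The scalar semantics that the vectorized path appears to violate here are
> simple: `nvl(value, default)` must return the default only when the value is
> null, so a non-null `birthday` such as 19313 must survive unchanged. A
> minimal sketch of those expected semantics (illustration only, not Hive's
> actual `GenericUDFNvl` implementation):
> {code:java}
public class NvlDemo {
    // SQL nvl(value, default): return default only when value is null.
    static <T> T nvl(T value, T defaultValue) {
        return value != null ? value : defaultValue;
    }

    public static void main(String[] args) {
        // A non-null input must pass through untouched.
        assert nvl(19313, 0).equals(19313);
        // Only a null input falls back to the default.
        assert nvl((Integer) null, 0).equals(0);
        System.out.println(nvl(19313, 0) + " " + nvl((Integer) null, 0));
    }
}
> {code}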



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-27293) Vectorization: Incorrect results with nvl for ORC table

2023-04-25 Thread Riju Trivedi (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Riju Trivedi updated HIVE-27293:

Description: 
Attached repro.q file and data file used to reproduce the issue.
{code:java}
Insert overwrite table etarget
select mt.*, floor(rand() * 1) as bdata_no from (select nvl(np.client_id,' 
'),nvl(np.id_enddate,cast(0 as decimal(10,0))),nvl(np.client_gender,' 
'),nvl(np.birthday,cast(0 as decimal(10,0))),nvl(np.nationality,' 
'),nvl(np.address_zipcode,' '),nvl(np.income,cast(0 as 
decimal(15,2))),nvl(np.address,' '),nvl(np.part_date,cast(0 as int)) from 
(select * from esource where part_date = 20230414) np) mt;
 {code}
Outcome:
{code:java}
select client_id,birthday,income from etarget; 
15678   0  0.00
67891  19313  -1.00
12345  0  0.00{code}
Expected Result :
{code:java}
select client_id,birthday,income from etarget; 
12345 19613 -1.00
67891 19313 -1.00 
15678 0  0.00{code}
Disabling hive.vectorized.use.vectorized.input.format produces correct output.

  was:
Attached repro.q file and data file used to reproduce the issue.
{code:java}
Insert overwrite table etarget
select mt.*, floor(rand() * 1) as bdata_no from (select nvl(np.client_id,' 
'),nvl(np.id_enddate,cast(0 as decimal(10,0))),nvl(np.client_gender,' 
'),nvl(np.birthday,cast(0 as decimal(10,0))),nvl(np.nationality,' 
'),nvl(np.address_zipcode,' '),nvl(np.income,cast(0 as 
decimal(15,2))),nvl(np.address,' '),nvl(np.part_date,cast(0 as int)) from 
(select * from esource where part_date = 20230414) np) mt;
 {code}
Outcome:
{code:java}
select client_id,birthday,income from etarget;
889004570706    0   0.00
889004570838    19880313    -1.00
889005389931    0   0.00 {code}
Expected Result :
{code:java}
select client_id,birthday,income from etarget;
889004570706    0   0.00
889004570838    19880313    -1.00
889005389931    19880613    -1.00 {code}
Disabling hive.vectorized.use.vectorized.input.format produces correct output.


> Vectorization: Incorrect results with nvl for ORC table
> ---
>
> Key: HIVE-27293
> URL: https://issues.apache.org/jira/browse/HIVE-27293
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 4.0.0-alpha-2
>Reporter: Riju Trivedi
>Priority: Major
> Attachments: vectorization_nvl.q
>
>
> Attached repro.q file and data file used to reproduce the issue.
> {code:java}
> Insert overwrite table etarget
> select mt.*, floor(rand() * 1) as bdata_no from (select nvl(np.client_id,' 
> '),nvl(np.id_enddate,cast(0 as decimal(10,0))),nvl(np.client_gender,' 
> '),nvl(np.birthday,cast(0 as decimal(10,0))),nvl(np.nationality,' 
> '),nvl(np.address_zipcode,' '),nvl(np.income,cast(0 as 
> decimal(15,2))),nvl(np.address,' '),nvl(np.part_date,cast(0 as int)) from 
> (select * from esource where part_date = 20230414) np) mt;
>  {code}
> Outcome:
> {code:java}
> select client_id,birthday,income from etarget; 
> 15678   0  0.00
> 67891  19313  -1.00
> 12345  0  0.00{code}
> Expected Result :
> {code:java}
> select client_id,birthday,income from etarget; 
> 12345 19613 -1.00
> 67891 19313 -1.00 
> 15678 0  0.00{code}
> Disabling hive.vectorized.use.vectorized.input.format produces correct output.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-27293) Vectorization: Incorrect results with nvl for ORC table

2023-04-25 Thread Riju Trivedi (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Riju Trivedi updated HIVE-27293:

Attachment: (was: esource.txt)

> Vectorization: Incorrect results with nvl for ORC table
> ---
>
> Key: HIVE-27293
> URL: https://issues.apache.org/jira/browse/HIVE-27293
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 4.0.0-alpha-2
>Reporter: Riju Trivedi
>Priority: Major
> Attachments: vectorization_nvl.q
>
>
> Attached repro.q file and data file used to reproduce the issue.
> {code:java}
> Insert overwrite table etarget
> select mt.*, floor(rand() * 1) as bdata_no from (select nvl(np.client_id,' 
> '),nvl(np.id_enddate,cast(0 as decimal(10,0))),nvl(np.client_gender,' 
> '),nvl(np.birthday,cast(0 as decimal(10,0))),nvl(np.nationality,' 
> '),nvl(np.address_zipcode,' '),nvl(np.income,cast(0 as 
> decimal(15,2))),nvl(np.address,' '),nvl(np.part_date,cast(0 as int)) from 
> (select * from esource where part_date = 20230414) np) mt;
>  {code}
> Outcome:
> {code:java}
> select client_id,birthday,income from etarget;
> 889004570706    0   0.00
> 889004570838    19880313    -1.00
> 889005389931    0   0.00 {code}
> Expected Result :
> {code:java}
> select client_id,birthday,income from etarget;
> 889004570706    0   0.00
> 889004570838    19880313    -1.00
> 889005389931    19880613    -1.00 {code}
> Disabling hive.vectorized.use.vectorized.input.format produces correct output.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work logged] (HIVE-27271) Client connection to HS2 fails when transportMode=http, ssl=true, sslTrustStore specified without trustStorePassword in the JDBC URL

2023-04-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27271?focusedWorklogId=858911=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-858911
 ]

ASF GitHub Bot logged work on HIVE-27271:
-

Author: ASF GitHub Bot
Created on: 25/Apr/23 11:36
Start Date: 25/Apr/23 11:36
Worklog Time Spent: 10m 
  Work Description: sonarcloud[bot] commented on PR #4262:
URL: https://github.com/apache/hive/pull/4262#issuecomment-1521638877

   Kudos, SonarCloud Quality Gate passed!  [![Quality Gate 
passed](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/checks/QualityGateBadge/passed-16px.png
 'Quality Gate 
passed')](https://sonarcloud.io/dashboard?id=apache_hive=4262)
   
   
[![Bug](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/common/bug-16px.png
 
'Bug')](https://sonarcloud.io/project/issues?id=apache_hive=4262=false=BUG)
 
[![A](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/checks/RatingBadge/A-16px.png
 
'A')](https://sonarcloud.io/project/issues?id=apache_hive=4262=false=BUG)
 [0 
Bugs](https://sonarcloud.io/project/issues?id=apache_hive=4262=false=BUG)
  
   
[![Vulnerability](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/common/vulnerability-16px.png
 
'Vulnerability')](https://sonarcloud.io/project/issues?id=apache_hive=4262=false=VULNERABILITY)
 
[![A](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/checks/RatingBadge/A-16px.png
 
'A')](https://sonarcloud.io/project/issues?id=apache_hive=4262=false=VULNERABILITY)
 [0 
Vulnerabilities](https://sonarcloud.io/project/issues?id=apache_hive=4262=false=VULNERABILITY)
  
   [![Security 
Hotspot](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/common/security_hotspot-16px.png
 'Security 
Hotspot')](https://sonarcloud.io/project/security_hotspots?id=apache_hive=4262=false=SECURITY_HOTSPOT)
 
[![A](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/checks/RatingBadge/A-16px.png
 
'A')](https://sonarcloud.io/project/security_hotspots?id=apache_hive=4262=false=SECURITY_HOTSPOT)
 [0 Security 
Hotspots](https://sonarcloud.io/project/security_hotspots?id=apache_hive=4262=false=SECURITY_HOTSPOT)
  
   [![Code 
Smell](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/common/code_smell-16px.png
 'Code 
Smell')](https://sonarcloud.io/project/issues?id=apache_hive=4262=false=CODE_SMELL)
 
[![A](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/checks/RatingBadge/A-16px.png
 
'A')](https://sonarcloud.io/project/issues?id=apache_hive=4262=false=CODE_SMELL)
 [0 Code 
Smells](https://sonarcloud.io/project/issues?id=apache_hive=4262=false=CODE_SMELL)
   
   [![No Coverage 
information](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/checks/CoverageChart/NoCoverageInfo-16px.png
 'No Coverage 
information')](https://sonarcloud.io/component_measures?id=apache_hive=4262=coverage=list)
 No Coverage information  
   [![No Duplication 
information](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/checks/Duplications/NoDuplicationInfo-16px.png
 'No Duplication 
information')](https://sonarcloud.io/component_measures?id=apache_hive=4262=duplicated_lines_density=list)
 No Duplication information
   
   




Issue Time Tracking
---

Worklog Id: (was: 858911)
Time Spent: 0.5h  (was: 20m)

> Client connection to HS2 fails when transportMode=http, ssl=true, 
> sslTrustStore specified without trustStorePassword in the JDBC URL
> 
>
> Key: HIVE-27271
> URL: https://issues.apache.org/jira/browse/HIVE-27271
> Project: Hive
>  Issue Type: Bug
>Reporter: Venugopal Reddy K
>Assignee: Venugopal Reddy K
>Priority: Major
>  Labels: pull-request-available
> Attachments: image-2023-04-19-14-27-23-665.png
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> *[Description]*
> Client connection to HS2 fails when transportMode=http, ssl=true, and 
> sslTrustStore is specified without trustStorePassword in the JDBC URL, 
> whereas with transportMode=binary the connection succeeds without 
> trustStorePassword in the connection URL.
> trustStorePassword is not a necessary parameter in the connection URL; a 
> connection can be established without it.
> From the javadocs 
> [Link|https://docs.oracle.com/javase/7/docs/api/java/security/KeyStore.html#load(java.io.InputStream,%20char%5B%5D)]
>  A password may be given to unlock the keystore (e.g. the keystore resides on 
> a hardware token device), or to check the integrity of the keystore data. If 
> a password is not given for integrity checking, then integrity checking is 

[jira] [Work logged] (HIVE-27278) Simplify correlated queries with empty inputs

2023-04-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27278?focusedWorklogId=858912=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-858912
 ]

ASF GitHub Bot logged work on HIVE-27278:
-

Author: ASF GitHub Bot
Created on: 25/Apr/23 11:36
Start Date: 25/Apr/23 11:36
Worklog Time Spent: 10m 
  Work Description: kasakrisz commented on code in PR #4253:
URL: https://github.com/apache/hive/pull/4253#discussion_r1176385809


##
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveRemoveEmptySingleRules.java:
##
@@ -192,6 +178,79 @@ public interface JoinRightEmptyRuleConfig extends 
PruneEmptyRule.Config {
 }
   }
 
+  private static RelNode padWithNulls(RelBuilder builder, RelNode input, 
RelDataType resultType,
+  boolean leftPadding) {
+int padding = resultType.getFieldCount() - 
input.getRowType().getFieldCount();
+List nullLiterals = Collections.nCopies(padding, 
builder.literal(null));
+builder.push(input);
+if (leftPadding) {
+  builder.project(concat(nullLiterals, builder.fields()));
+} else {
+  builder.project(concat(builder.fields(), nullLiterals));
+}
+return builder.convert(resultType, true).build();
+  }
+
+  public static final RelOptRule CORRELATE_RIGHT_INSTANCE = 
RelRule.Config.EMPTY
+  .withOperandSupplier(b0 ->
+  b0.operand(Correlate.class).inputs(
+  b1 -> b1.operand(RelNode.class).anyInputs(),
+  b2 -> 
b2.operand(Values.class).predicate(Values::isEmpty).noInputs()))
+  .withDescription("PruneEmptyCorrelate(right)")
+  .withRelBuilderFactory(HiveRelFactories.HIVE_BUILDER)
+  .as(CorrelateEmptyRuleConfig.class)
+  .toRule();
+  public static final RelOptRule CORRELATE_LEFT_INSTANCE = RelRule.Config.EMPTY
+  .withOperandSupplier(b0 ->
+  b0.operand(Correlate.class).inputs(
+  b1 -> 
b1.operand(Values.class).predicate(Values::isEmpty).noInputs(),
+  b2 -> b2.operand(RelNode.class).anyInputs()))
+  .withDescription("PruneEmptyCorrelate(left)")
+  .withRelBuilderFactory(HiveRelFactories.HIVE_BUILDER)
+  .as(CorrelateEmptyRuleConfig.class)
+  .toRule();
+  
+  /** Configuration for rule that prunes a correlate if one of its inputs is 
empty. */
+  public interface CorrelateEmptyRuleConfig extends PruneEmptyRule.Config {

Review Comment:
   When the `Values` is on the left side you only need the `Correlate`. The 
`if` also goes away at the Calcite upgrade.
   ```
   onMatch(RelOptRuleCall call) {
 final Correlate corr = call.rel(0);
 RelBuilder b = call.builder();
 call.transformTo(b.push(corr).empty().build());
   }
   ```
   I leave this up to you. :)
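The suggested `onMatch` relies on semi-join semantics: with an empty right input a semi join can never match, so the whole `Correlate` collapses to empty. A minimal sketch of that semantics over plain lists (class and method names are illustrative, not Calcite API):

```java
import java.util.ArrayList;
import java.util.List;

public class SemiJoinEmptyDemo {
    // Semi-join: keep each left row that has at least one match on the right.
    static List<Integer> semiJoin(List<Integer> left, List<Integer> right) {
        List<Integer> out = new ArrayList<>();
        for (Integer l : left) {
            if (right.contains(l)) {
                out.add(l);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // With an empty right input the semi-join can never match,
        // so the result is empty regardless of the left side.
        System.out.println(semiJoin(List.of(1, 2, 3), List.of()));  // []
    }
}
```

This is why the rule can safely replace the entire correlate with an empty relation when the right child is empty `Values`.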





Issue Time Tracking
---

Worklog Id: (was: 858912)
Time Spent: 1.5h  (was: 1h 20m)

> Simplify correlated queries with empty inputs
> -
>
> Key: HIVE-27278
> URL: https://issues.apache.org/jira/browse/HIVE-27278
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> The correlated query below will not produce any result no matter the content 
> of the table.
> {code:sql}
> create table t1 (id int, val varchar(10)) stored as orc TBLPROPERTIES 
> ('transactional'='true');
> create table t2 (id int, val varchar(10)) stored as orc TBLPROPERTIES 
> ('transactional'='true');
> EXPLAIN CBO SELECT id FROM t1 WHERE NULL IN (SELECT NULL FROM t2 where t1.id 
> = t2.id);
> {code}
> The CBO is able to derive that part of the query is empty and ends up with 
> the following plan.
> {noformat}
> CBO PLAN:
> HiveProject(id=[$0])
>   LogicalCorrelate(correlation=[$cor0], joinType=[semi], requiredColumns=[{}])
> HiveTableScan(table=[[default, t1]], table:alias=[t1])
> HiveValues(tuples=[[]])
> {noformat}
> The presence of LogicalCorrelate is first redundant but also problematic 
> since many parts of the optimizer assume that queries are decorrelated and do 
> not know how to handle the LogicalCorrelate.
> In the presence of views the same query can lead to the following exception 
> during compilation.
> {code:sql}
> CREATE MATERIALIZED VIEW v1 AS SELECT id FROM t2;
> EXPLAIN CBO SELECT id FROM t1 WHERE NULL IN (SELECT NULL FROM t2 where t1.id 
> = t2.id);
> {code}
> {noformat}
> org.apache.calcite.plan.RelOptPlanner$CannotPlanException: There are not 
> enough rules to produce a node with desired properties: convention=HIVE, 
> sort=[], dist=any. All the inputs have relevant nodes, however the cost is 
> still infinite.
> Root: rel#185:RelSubset#3.HIVE.[].any
> Original rel:
> HiveProject(id=[$0]): rowcount = 4.0, 

[jira] [Work logged] (HIVE-27278) Simplify correlated queries with empty inputs

2023-04-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27278?focusedWorklogId=858913&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-858913
 ]

ASF GitHub Bot logged work on HIVE-27278:
-

Author: ASF GitHub Bot
Created on: 25/Apr/23 11:36
Start Date: 25/Apr/23 11:36
Worklog Time Spent: 10m 
  Work Description: kasakrisz commented on PR #4253:
URL: https://github.com/apache/hive/pull/4253#issuecomment-1521639063

   @zabetak 
   Is any follow-up required once 
[CALCITE-5568](https://issues.apache.org/jira/browse/CALCITE-5568) gets 
resolved?
   In the meantime you can merge this patch.




Issue Time Tracking
---

Worklog Id: (was: 858913)
Time Spent: 1h 40m  (was: 1.5h)

> Simplify correlated queries with empty inputs
> -
>
> Key: HIVE-27278
> URL: https://issues.apache.org/jira/browse/HIVE-27278
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> The correlated query below will not produce any result no matter the content 
> of the table.
> {code:sql}
> create table t1 (id int, val varchar(10)) stored as orc TBLPROPERTIES 
> ('transactional'='true');
> create table t2 (id int, val varchar(10)) stored as orc TBLPROPERTIES 
> ('transactional'='true');
> EXPLAIN CBO SELECT id FROM t1 WHERE NULL IN (SELECT NULL FROM t2 where t1.id 
> = t2.id);
> {code}
> The CBO is able to derive that part of the query is empty and ends up with 
> the following plan.
> {noformat}
> CBO PLAN:
> HiveProject(id=[$0])
>   LogicalCorrelate(correlation=[$cor0], joinType=[semi], requiredColumns=[{}])
> HiveTableScan(table=[[default, t1]], table:alias=[t1])
> HiveValues(tuples=[[]])
> {noformat}
> The presence of LogicalCorrelate is first redundant but also problematic 
> since many parts of the optimizer assume that queries are decorrelated and do 
> not know how to handle the LogicalCorrelate.
> In the presence of views the same query can lead to the following exception 
> during compilation.
> {code:sql}
> CREATE MATERIALIZED VIEW v1 AS SELECT id FROM t2;
> EXPLAIN CBO SELECT id FROM t1 WHERE NULL IN (SELECT NULL FROM t2 where t1.id 
> = t2.id);
> {code}
> {noformat}
> org.apache.calcite.plan.RelOptPlanner$CannotPlanException: There are not 
> enough rules to produce a node with desired properties: convention=HIVE, 
> sort=[], dist=any. All the inputs have relevant nodes, however the cost is 
> still infinite.
> Root: rel#185:RelSubset#3.HIVE.[].any
> Original rel:
> HiveProject(id=[$0]): rowcount = 4.0, cumulative cost = {20.0 rows, 13.0 cpu, 
> 0.0 io}, id = 178
>   LogicalCorrelate(correlation=[$cor0], joinType=[semi], 
> requiredColumns=[{}]): rowcount = 4.0, cumulative cost = {16.0 rows, 9.0 cpu, 
> 0.0 io}, id = 176
> HiveTableScan(table=[[default, t1]], table:alias=[t1]): rowcount = 4.0, 
> cumulative cost = {4.0 rows, 5.0 cpu, 0.0 io}, id = 111
> HiveValues(tuples=[[]]): rowcount = 1.0, cumulative cost = {1.0 rows, 1.0 
> cpu, 0.0 io}, id = 139
> Sets:
> Set#0, type: RecordType(INTEGER id, VARCHAR(10) val, BIGINT 
> BLOCK__OFFSET__INSIDE__FILE, VARCHAR(2147483647) INPUT__FILE__NAME, 
> RecordType(BIGINT writeid, INTEGER bucketid, BIGINT rowid) ROW__ID, BOOLEAN 
> ROW__IS__DELETED)
>   rel#180:RelSubset#0.HIVE.[].any, best=rel#111
>   rel#111:HiveTableScan.HIVE.[].any(table=[default, 
> t1],htColumns=[0, 1, 2, 3, 4, 
> 5],insideView=false,plKey=default.t1;,table:alias=t1,tableScanTrait=null), 
> rowcount=4.0, cumulative cost={4.0 rows, 5.0 cpu, 0.0 io}
> Set#1, type: RecordType(NULL _o__c0)
>   rel#181:RelSubset#1.HIVE.[].any, best=rel#139
>   rel#139:HiveValues.HIVE.[].any(type=RecordType(NULL 
> _o__c0),tuples=[]), rowcount=1.0, cumulative cost={1.0 rows, 1.0 cpu, 0.0 io}
> Set#2, type: RecordType(INTEGER id, VARCHAR(10) val, BIGINT 
> BLOCK__OFFSET__INSIDE__FILE, VARCHAR(2147483647) INPUT__FILE__NAME, 
> RecordType(BIGINT writeid, INTEGER bucketid, BIGINT rowid) ROW__ID, BOOLEAN 
> ROW__IS__DELETED)
>   rel#183:RelSubset#2.NONE.[].any, best=null
>   
> rel#182:LogicalCorrelate.NONE.[].any(left=RelSubset#180,right=RelSubset#181,correlation=$cor0,joinType=semi,requiredColumns={}),
>  rowcount=4.0, cumulative cost={inf}
> Set#3, type: RecordType(INTEGER id)
>   rel#185:RelSubset#3.HIVE.[].any, best=null
>   
> rel#184:HiveProject.HIVE.[].any(input=RelSubset#183,inputs=0,synthetic=false),
>  rowcount=4.0, cumulative cost={inf}
> Graphviz:
> digraph G {
>   root [style=filled,label="Root"];
>   subgraph cluster0{
>   label="Set 0 RecordType(INTEGER id, VARCHAR(10) 

[jira] [Work logged] (HIVE-27273) Iceberg: Upgrade iceberg to 1.2.1

2023-04-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27273?focusedWorklogId=858910&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-858910
 ]

ASF GitHub Bot logged work on HIVE-27273:
-

Author: ASF GitHub Bot
Created on: 25/Apr/23 11:32
Start Date: 25/Apr/23 11:32
Worklog Time Spent: 10m 
  Work Description: sonarcloud[bot] commented on PR #4252:
URL: https://github.com/apache/hive/pull/4252#issuecomment-1521634951

   Kudos, SonarCloud Quality Gate passed!  [![Quality Gate passed](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/checks/QualityGateBadge/passed-16px.png 'Quality Gate passed')](https://sonarcloud.io/dashboard?id=apache_hive&pullRequest=4252)
   
   [![Bug](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/common/bug-16px.png 'Bug')](https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=4252&resolved=false&types=BUG) [![A](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/checks/RatingBadge/A-16px.png 'A')](https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=4252&resolved=false&types=BUG) [0 Bugs](https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=4252&resolved=false&types=BUG)  
   [![Vulnerability](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/common/vulnerability-16px.png 'Vulnerability')](https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=4252&resolved=false&types=VULNERABILITY) [![A](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/checks/RatingBadge/A-16px.png 'A')](https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=4252&resolved=false&types=VULNERABILITY) [0 Vulnerabilities](https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=4252&resolved=false&types=VULNERABILITY)  
   [![Security Hotspot](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/common/security_hotspot-16px.png 'Security Hotspot')](https://sonarcloud.io/project/security_hotspots?id=apache_hive&pullRequest=4252&resolved=false&types=SECURITY_HOTSPOT) [![A](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/checks/RatingBadge/A-16px.png 'A')](https://sonarcloud.io/project/security_hotspots?id=apache_hive&pullRequest=4252&resolved=false&types=SECURITY_HOTSPOT) [0 Security Hotspots](https://sonarcloud.io/project/security_hotspots?id=apache_hive&pullRequest=4252&resolved=false&types=SECURITY_HOTSPOT)  
   [![Code Smell](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/common/code_smell-16px.png 'Code Smell')](https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=4252&resolved=false&types=CODE_SMELL) [![A](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/checks/RatingBadge/A-16px.png 'A')](https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=4252&resolved=false&types=CODE_SMELL) [0 Code Smells](https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=4252&resolved=false&types=CODE_SMELL)
   
   [![No Coverage information](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/checks/CoverageChart/NoCoverageInfo-16px.png 'No Coverage information')](https://sonarcloud.io/component_measures?id=apache_hive&pullRequest=4252&metric=coverage&view=list) No Coverage information  
   [![No Duplication information](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/checks/Duplications/NoDuplicationInfo-16px.png 'No Duplication information')](https://sonarcloud.io/component_measures?id=apache_hive&pullRequest=4252&metric=duplicated_lines_density&view=list) No Duplication information
   
   




Issue Time Tracking
---

Worklog Id: (was: 858910)
Time Spent: 1h  (was: 50m)

> Iceberg:  Upgrade iceberg to 1.2.1
> --
>
> Key: HIVE-27273
> URL: https://issues.apache.org/jira/browse/HIVE-27273
> Project: Hive
>  Issue Type: Improvement
>  Components: Iceberg integration
>Reporter: zhangbutao
>Assignee: zhangbutao
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> [https://iceberg.apache.org/releases/#121-release] Iceberg 1.2.1 (including 
> 1.2.0) has lots of improvements, e.g. _branch commit_ and the 
> _{{position_deletes}} metadata table_.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-27293) Vectorization: Incorrect results with nvl for ORC table

2023-04-25 Thread Riju Trivedi (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Riju Trivedi updated HIVE-27293:

Attachment: (was: vectorization_nvl.q)

> Vectorization: Incorrect results with nvl for ORC table
> ---
>
> Key: HIVE-27293
> URL: https://issues.apache.org/jira/browse/HIVE-27293
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 4.0.0-alpha-2
>Reporter: Riju Trivedi
>Priority: Major
> Attachments: esource.txt, vectorization_nvl.q
>
>
> Attached repro.q file and data file used to reproduce the issue.
> {code:java}
> Insert overwrite table etarget
> select mt.*, floor(rand() * 1) as bdata_no from (select nvl(np.client_id,' 
> '),nvl(np.id_enddate,cast(0 as decimal(10,0))),nvl(np.client_gender,' 
> '),nvl(np.birthday,cast(0 as decimal(10,0))),nvl(np.nationality,' 
> '),nvl(np.address_zipcode,' '),nvl(np.income,cast(0 as 
> decimal(15,2))),nvl(np.address,' '),nvl(np.part_date,cast(0 as int)) from 
> (select * from esource where part_date = 20230414) np) mt;
>  {code}
> Outcome:
> {code:java}
> select client_id,birthday,income from etarget;
> 889004570706    0   0.00
> 889004570838    19880313    -1.00
> 889005389931    0   0.00 {code}
> Expected Result :
> {code:java}
> select client_id,birthday,income from etarget;
> 889004570706    0   0.00
> 889004570838    19880313    -1.00
> 889005389931    19880613    -1.00 {code}
> Disabling hive.vectorized.use.vectorized.input.format produces correct output.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
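For context, `nvl(value, default)` should return the default only when the value is NULL; the bug report above shows the vectorized ORC path replacing non-null values too. A minimal sketch of the intended semantics (hypothetical class name, not Hive's UDF implementation):

```java
public class NvlDemo {
    // nvl(value, dflt): return dflt only when value is null, value otherwise.
    static <T> T nvl(T value, T dflt) {
        return value != null ? value : dflt;
    }

    public static void main(String[] args) {
        System.out.println(nvl((Integer) null, 0)); // 0
        System.out.println(nvl(19880613, 0));       // 19880613 (must not be replaced)
    }
}
```

In the reported bug, a non-null `birthday` such as 19880613 came back as the default 0 when vectorization was enabled, violating the second case above.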


[jira] [Updated] (HIVE-27293) Vectorization: Incorrect results with nvl for ORC table

2023-04-25 Thread Riju Trivedi (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Riju Trivedi updated HIVE-27293:

Attachment: vectorization_nvl.q

> Vectorization: Incorrect results with nvl for ORC table
> ---
>
> Key: HIVE-27293
> URL: https://issues.apache.org/jira/browse/HIVE-27293
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 4.0.0-alpha-2
>Reporter: Riju Trivedi
>Priority: Major
> Attachments: esource.txt, vectorization_nvl.q
>
>
> Attached repro.q file and data file used to reproduce the issue.
> {code:java}
> Insert overwrite table etarget
> select mt.*, floor(rand() * 1) as bdata_no from (select nvl(np.client_id,' 
> '),nvl(np.id_enddate,cast(0 as decimal(10,0))),nvl(np.client_gender,' 
> '),nvl(np.birthday,cast(0 as decimal(10,0))),nvl(np.nationality,' 
> '),nvl(np.address_zipcode,' '),nvl(np.income,cast(0 as 
> decimal(15,2))),nvl(np.address,' '),nvl(np.part_date,cast(0 as int)) from 
> (select * from esource where part_date = 20230414) np) mt;
>  {code}
> Outcome:
> {code:java}
> select client_id,birthday,income from etarget;
> 889004570706    0   0.00
> 889004570838    19880313    -1.00
> 889005389931    0   0.00 {code}
> Expected Result :
> {code:java}
> select client_id,birthday,income from etarget;
> 889004570706    0   0.00
> 889004570838    19880313    -1.00
> 889005389931    19880613    -1.00 {code}
> Disabling hive.vectorized.use.vectorized.input.format produces correct output.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-27293) Vectorization: Incorrect results with nvl for ORC table

2023-04-25 Thread Riju Trivedi (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Riju Trivedi updated HIVE-27293:

Attachment: (was: vector_nvl.q)

> Vectorization: Incorrect results with nvl for ORC table
> ---
>
> Key: HIVE-27293
> URL: https://issues.apache.org/jira/browse/HIVE-27293
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 4.0.0-alpha-2
>Reporter: Riju Trivedi
>Priority: Major
> Attachments: esource.txt, vectorization_nvl.q
>
>
> Attached repro.q file and data file used to reproduce the issue.
> {code:java}
> Insert overwrite table etarget
> select mt.*, floor(rand() * 1) as bdata_no from (select nvl(np.client_id,' 
> '),nvl(np.id_enddate,cast(0 as decimal(10,0))),nvl(np.client_gender,' 
> '),nvl(np.birthday,cast(0 as decimal(10,0))),nvl(np.nationality,' 
> '),nvl(np.address_zipcode,' '),nvl(np.income,cast(0 as 
> decimal(15,2))),nvl(np.address,' '),nvl(np.part_date,cast(0 as int)) from 
> (select * from esource where part_date = 20230414) np) mt;
>  {code}
> Outcome:
> {code:java}
> select client_id,birthday,income from etarget;
> 889004570706    0   0.00
> 889004570838    19880313    -1.00
> 889005389931    0   0.00 {code}
> Expected Result :
> {code:java}
> select client_id,birthday,income from etarget;
> 889004570706    0   0.00
> 889004570838    19880313    -1.00
> 889005389931    19880613    -1.00 {code}
> Disabling hive.vectorized.use.vectorized.input.format produces correct output.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-27293) Vectorization: Incorrect results with nvl for ORC table

2023-04-25 Thread Riju Trivedi (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Riju Trivedi updated HIVE-27293:

Attachment: vectorization_nvl.q

> Vectorization: Incorrect results with nvl for ORC table
> ---
>
> Key: HIVE-27293
> URL: https://issues.apache.org/jira/browse/HIVE-27293
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 4.0.0-alpha-2
>Reporter: Riju Trivedi
>Priority: Major
> Attachments: esource.txt, vectorization_nvl.q
>
>
> Attached repro.q file and data file used to reproduce the issue.
> {code:java}
> Insert overwrite table etarget
> select mt.*, floor(rand() * 1) as bdata_no from (select nvl(np.client_id,' 
> '),nvl(np.id_enddate,cast(0 as decimal(10,0))),nvl(np.client_gender,' 
> '),nvl(np.birthday,cast(0 as decimal(10,0))),nvl(np.nationality,' 
> '),nvl(np.address_zipcode,' '),nvl(np.income,cast(0 as 
> decimal(15,2))),nvl(np.address,' '),nvl(np.part_date,cast(0 as int)) from 
> (select * from esource where part_date = 20230414) np) mt;
>  {code}
> Outcome:
> {code:java}
> select client_id,birthday,income from etarget;
> 889004570706    0   0.00
> 889004570838    19880313    -1.00
> 889005389931    0   0.00 {code}
> Expected Result :
> {code:java}
> select client_id,birthday,income from etarget;
> 889004570706    0   0.00
> 889004570838    19880313    -1.00
> 889005389931    19880613    -1.00 {code}
> Disabling hive.vectorized.use.vectorized.input.format produces correct output.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-27293) Vectorization: Incorrect results with nvl for ORC table

2023-04-25 Thread Riju Trivedi (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Riju Trivedi updated HIVE-27293:

Attachment: vector_nvl.q

> Vectorization: Incorrect results with nvl for ORC table
> ---
>
> Key: HIVE-27293
> URL: https://issues.apache.org/jira/browse/HIVE-27293
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 4.0.0-alpha-2
>Reporter: Riju Trivedi
>Priority: Major
> Attachments: esource.txt, vector_nvl.q
>
>
> Attached repro.q file and data file used to reproduce the issue.
> {code:java}
> Insert overwrite table etarget
> select mt.*, floor(rand() * 1) as bdata_no from (select nvl(np.client_id,' 
> '),nvl(np.id_enddate,cast(0 as decimal(10,0))),nvl(np.client_gender,' 
> '),nvl(np.birthday,cast(0 as decimal(10,0))),nvl(np.nationality,' 
> '),nvl(np.address_zipcode,' '),nvl(np.income,cast(0 as 
> decimal(15,2))),nvl(np.address,' '),nvl(np.part_date,cast(0 as int)) from 
> (select * from esource where part_date = 20230414) np) mt;
>  {code}
> Outcome:
> {code:java}
> select client_id,birthday,income from etarget;
> 889004570706    0   0.00
> 889004570838    19880313    -1.00
> 889005389931    0   0.00 {code}
> Expected Result :
> {code:java}
> select client_id,birthday,income from etarget;
> 889004570706    0   0.00
> 889004570838    19880313    -1.00
> 889005389931    19880613    -1.00 {code}
> Disabling hive.vectorized.use.vectorized.input.format produces correct output.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-27293) Vectorization: Incorrect results with nvl for ORC table

2023-04-25 Thread Riju Trivedi (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Riju Trivedi updated HIVE-27293:

Attachment: esource.txt

> Vectorization: Incorrect results with nvl for ORC table
> ---
>
> Key: HIVE-27293
> URL: https://issues.apache.org/jira/browse/HIVE-27293
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 4.0.0-alpha-2
>Reporter: Riju Trivedi
>Priority: Major
> Attachments: esource.txt
>
>
> Attached repro.q file and data file used to reproduce the issue.
> {code:java}
> Insert overwrite table etarget
> select mt.*, floor(rand() * 1) as bdata_no from (select nvl(np.client_id,' 
> '),nvl(np.id_enddate,cast(0 as decimal(10,0))),nvl(np.client_gender,' 
> '),nvl(np.birthday,cast(0 as decimal(10,0))),nvl(np.nationality,' 
> '),nvl(np.address_zipcode,' '),nvl(np.income,cast(0 as 
> decimal(15,2))),nvl(np.address,' '),nvl(np.part_date,cast(0 as int)) from 
> (select * from esource where part_date = 20230414) np) mt;
>  {code}
> Outcome:
> {code:java}
> select client_id,birthday,income from etarget;
> 889004570706    0   0.00
> 889004570838    19880313    -1.00
> 889005389931    0   0.00 {code}
> Expected Result :
> {code:java}
> select client_id,birthday,income from etarget;
> 889004570706    0   0.00
> 889004570838    19880313    -1.00
> 889005389931    19880613    -1.00 {code}
> Disabling hive.vectorized.use.vectorized.input.format produces correct output.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HIVE-27293) Vectorization: Incorrect results with nvl for ORC table

2023-04-25 Thread Riju Trivedi (Jira)
Riju Trivedi created HIVE-27293:
---

 Summary: Vectorization: Incorrect results with nvl for ORC table
 Key: HIVE-27293
 URL: https://issues.apache.org/jira/browse/HIVE-27293
 Project: Hive
  Issue Type: Bug
  Components: Hive
Affects Versions: 4.0.0-alpha-2
Reporter: Riju Trivedi
 Attachments: esource.txt

Attached repro.q file and data file used to reproduce the issue.
{code:java}
Insert overwrite table etarget
select mt.*, floor(rand() * 1) as bdata_no from (select nvl(np.client_id,' 
'),nvl(np.id_enddate,cast(0 as decimal(10,0))),nvl(np.client_gender,' 
'),nvl(np.birthday,cast(0 as decimal(10,0))),nvl(np.nationality,' 
'),nvl(np.address_zipcode,' '),nvl(np.income,cast(0 as 
decimal(15,2))),nvl(np.address,' '),nvl(np.part_date,cast(0 as int)) from 
(select * from esource where part_date = 20230414) np) mt;
 {code}
Outcome:
{code:java}
select client_id,birthday,income from etarget;
889004570706    0   0.00
889004570838    19880313    -1.00
889005389931    0   0.00 {code}
Expected Result :
{code:java}
select client_id,birthday,income from etarget;
889004570706    0   0.00
889004570838    19880313    -1.00
889005389931    19880613    -1.00 {code}
Disabling hive.vectorized.use.vectorized.input.format produces correct output.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work logged] (HIVE-27285) Add TableMeta ownership for filterTableMetas

2023-04-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27285?focusedWorklogId=858897&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-858897
 ]

ASF GitHub Bot logged work on HIVE-27285:
-

Author: ASF GitHub Bot
Created on: 25/Apr/23 10:35
Start Date: 25/Apr/23 10:35
Worklog Time Spent: 10m 
  Work Description: dengzhhu653 commented on PR #4258:
URL: https://github.com/apache/hive/pull/4258#issuecomment-1521560192

   The change overall looks good to me; I left some minor comments.




Issue Time Tracking
---

Worklog Id: (was: 858897)
Time Spent: 1h 50m  (was: 1h 40m)

> Add TableMeta ownership for filterTableMetas
> 
>
> Key: HIVE-27285
> URL: https://issues.apache.org/jira/browse/HIVE-27285
> Project: Hive
>  Issue Type: Improvement
>  Components: Standalone Metastore
>Affects Versions: 4.0.0
>Reporter: John Sherman
>Assignee: John Sherman
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> Currently TableMeta does not include ownership information which makes it 
> difficult for filterTableMetas to efficiently filter based on ${OWNER} 
> privileges.
> We should add ownership information to TableMeta and utilizing it in 
> filterTableMetas authorization checks.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
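The proposal above can be illustrated with a small sketch: once `TableMeta` carries an owner, an ${OWNER}-based filter becomes a simple comparison against the current user. All names below are hypothetical stand-ins, not the actual Hive API:

```java
import java.util.List;

public class OwnerFilterDemo {
    // Hypothetical stand-in for TableMeta once it carries an owner field.
    record TableMeta(String tableName, String ownerName) {}

    // Sketch of an ${OWNER}-style check: keep only tables owned by the current user.
    static List<TableMeta> filterByOwner(List<TableMeta> metas, String currentUser) {
        return metas.stream()
                .filter(m -> currentUser.equals(m.ownerName()))
                .toList();
    }

    public static void main(String[] args) {
        List<TableMeta> metas = List.of(
                new TableMeta("test_privs", "testuser"),
                new TableMeta("test_privs2", "testuser2"));
        // Only the table owned by "testuser" survives the filter.
        System.out.println(filterByOwner(metas, "testuser"));
    }
}
```

Without the owner field, a filter hook would have to fetch each table object from the metastore just to learn its owner, which is what makes the current filterTableMetas inefficient for ${OWNER} privileges.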


[jira] [Work logged] (HIVE-27285) Add TableMeta ownership for filterTableMetas

2023-04-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27285?focusedWorklogId=858896&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-858896
 ]

ASF GitHub Bot logged work on HIVE-27285:
-

Author: ASF GitHub Bot
Created on: 25/Apr/23 10:34
Start Date: 25/Apr/23 10:34
Worklog Time Spent: 10m 
  Work Description: dengzhhu653 commented on code in PR #4258:
URL: https://github.com/apache/hive/pull/4258#discussion_r1176325528


##
standalone-metastore/metastore-server/src/test/java/org/apache/hadoop/hive/metastore/minihms/AbstractMetaStoreService.java:
##
@@ -99,7 +99,7 @@ public void start(Map 
metastoreOverlay,
* @return The client connected to this service
* @throws MetaException if any Exception occurs during client configuration
*/
-  public IMetaStoreClient getClient() throws MetaException {

Review Comment:
   nit: Is there any particular reason to change it to return 
`HiveMetaStoreClient`?





Issue Time Tracking
---

Worklog Id: (was: 858896)
Time Spent: 1h 40m  (was: 1.5h)

> Add TableMeta ownership for filterTableMetas
> 
>
> Key: HIVE-27285
> URL: https://issues.apache.org/jira/browse/HIVE-27285
> Project: Hive
>  Issue Type: Improvement
>  Components: Standalone Metastore
>Affects Versions: 4.0.0
>Reporter: John Sherman
>Assignee: John Sherman
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> Currently TableMeta does not include ownership information which makes it 
> difficult for filterTableMetas to efficiently filter based on ${OWNER} 
> privileges.
> We should add ownership information to TableMeta and utilizing it in 
> filterTableMetas authorization checks.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work logged] (HIVE-27285) Add TableMeta ownership for filterTableMetas

2023-04-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27285?focusedWorklogId=858895&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-858895
 ]

ASF GitHub Bot logged work on HIVE-27285:
-

Author: ASF GitHub Bot
Created on: 25/Apr/23 10:33
Start Date: 25/Apr/23 10:33
Worklog Time Spent: 10m 
  Work Description: dengzhhu653 commented on code in PR #4258:
URL: https://github.com/apache/hive/pull/4258#discussion_r1176324646


##
standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/MetaStoreFilterHook.java:
##
@@ -85,15 +85,13 @@ default List<String> filterCatalogs(List<String> catalogs) throws MetaException
  List<String> filterTableNames(String catName, String dbName, List<String> tableList)
  throws MetaException;
 
-  // Previously this was handled by filterTableNames.  But it can't be anymore 
because we can no
-  // longer depend on a 1-1 mapping between table name and entry in the list.
   /**
* Filter a list of TableMeta objects.
* @param tableMetas list of TableMetas to filter
* @return filtered table metas
* @throws MetaException something went wrong
*/
-  List<TableMeta> filterTableMetas(String catName,String dbName,List<TableMeta> tableMetas) throws MetaException;
+  List<TableMeta> filterTableMetas(List<TableMeta> tableMetas) throws MetaException;

Review Comment:
   nit: I'm not sure this is the right way; perhaps we can mark the original 
method as deprecated and print a WARN message.





Issue Time Tracking
---

Worklog Id: (was: 858895)
Time Spent: 1.5h  (was: 1h 20m)

> Add TableMeta ownership for filterTableMetas
> 
>
> Key: HIVE-27285
> URL: https://issues.apache.org/jira/browse/HIVE-27285
> Project: Hive
>  Issue Type: Improvement
>  Components: Standalone Metastore
>Affects Versions: 4.0.0
>Reporter: John Sherman
>Assignee: John Sherman
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Currently TableMeta does not include ownership information which makes it 
> difficult for filterTableMetas to efficiently filter based on ${OWNER} 
> privileges.
> We should add ownership information to TableMeta and utilizing it in 
> filterTableMetas authorization checks.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work logged] (HIVE-27285) Add TableMeta ownership for filterTableMetas

2023-04-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27285?focusedWorklogId=858894&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-858894
 ]

ASF GitHub Bot logged work on HIVE-27285:
-

Author: ASF GitHub Bot
Created on: 25/Apr/23 10:29
Start Date: 25/Apr/23 10:29
Worklog Time Spent: 10m 
  Work Description: dengzhhu653 commented on code in PR #4258:
URL: https://github.com/apache/hive/pull/4258#discussion_r1176320733


##
ql/src/test/queries/clientpositive/authorization_privilege_objects.q:
##
@@ -0,0 +1,20 @@
+--! qt:authorizer
+set 
hive.security.authorization.manager=org.apache.hadoop.hive.ql.security.authorization.plugin.sqlstd.SQLStdHiveAuthorizerFactoryForTest;
+set test.hive.authz.sstd.validator.outputPrivObjs=true;
+set hive.test.authz.sstd.hs2.mode=true;
+set user.name=testuser;
+
+CREATE DATABASE test_db;
+CREATE TABLE test_privs(i int);
+set user.name=testuser2;
+CREATE TABLE test_privs2(s string, i int);
+set user.name=testuser;
+SHOW DATABASES;
+SHOW TABLES;

Review Comment:
   nit: I'm wondering whether these queries actually reach the `getTableMeta` method.



##
ql/src/test/queries/clientpositive/authorization_privilege_objects.q:
##
@@ -0,0 +1,20 @@
+--! qt:authorizer
+set 
hive.security.authorization.manager=org.apache.hadoop.hive.ql.security.authorization.plugin.sqlstd.SQLStdHiveAuthorizerFactoryForTest;
+set test.hive.authz.sstd.validator.outputPrivObjs=true;
+set hive.test.authz.sstd.hs2.mode=true;
+set user.name=testuser;
+
+CREATE DATABASE test_db;
+CREATE TABLE test_privs(i int);
+set user.name=testuser2;
+CREATE TABLE test_privs2(s string, i int);
+set user.name=testuser;
+SHOW DATABASES;
+SHOW TABLES;

Review Comment:
   nit: I'm wondering whether these queries actually reach the `getTableMeta` method.





Issue Time Tracking
---

Worklog Id: (was: 858894)
Time Spent: 1h 20m  (was: 1h 10m)

> Add TableMeta ownership for filterTableMetas
> 
>
> Key: HIVE-27285
> URL: https://issues.apache.org/jira/browse/HIVE-27285
> Project: Hive
>  Issue Type: Improvement
>  Components: Standalone Metastore
>Affects Versions: 4.0.0
>Reporter: John Sherman
>Assignee: John Sherman
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Currently TableMeta does not include ownership information which makes it 
> difficult for filterTableMetas to efficiently filter based on ${OWNER} 
> privileges.
> We should add ownership information to TableMeta and utilizing it in 
> filterTableMetas authorization checks.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work logged] (HIVE-27278) Simplify correlated queries with empty inputs

2023-04-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27278?focusedWorklogId=858889=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-858889
 ]

ASF GitHub Bot logged work on HIVE-27278:
-

Author: ASF GitHub Bot
Created on: 25/Apr/23 10:12
Start Date: 25/Apr/23 10:12
Worklog Time Spent: 10m 
  Work Description: zabetak commented on code in PR #4253:
URL: https://github.com/apache/hive/pull/4253#discussion_r1176303967


##
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveRemoveEmptySingleRules.java:
##
@@ -192,6 +178,79 @@ public interface JoinRightEmptyRuleConfig extends 
PruneEmptyRule.Config {
 }
   }
 
+  private static RelNode padWithNulls(RelBuilder builder, RelNode input, 
RelDataType resultType,
+  boolean leftPadding) {
+int padding = resultType.getFieldCount() - 
input.getRowType().getFieldCount();
+List<RexNode> nullLiterals = Collections.nCopies(padding, builder.literal(null));
+builder.push(input);
+if (leftPadding) {
+  builder.project(concat(nullLiterals, builder.fields()));
+} else {
+  builder.project(concat(builder.fields(), nullLiterals));
+}
+return builder.convert(resultType, true).build();
+  }
+
+  public static final RelOptRule CORRELATE_RIGHT_INSTANCE = 
RelRule.Config.EMPTY
+  .withOperandSupplier(b0 ->
+  b0.operand(Correlate.class).inputs(
+  b1 -> b1.operand(RelNode.class).anyInputs(),
+  b2 -> 
b2.operand(Values.class).predicate(Values::isEmpty).noInputs()))
+  .withDescription("PruneEmptyCorrelate(right)")
+  .withRelBuilderFactory(HiveRelFactories.HIVE_BUILDER)
+  .as(CorrelateEmptyRuleConfig.class)
+  .toRule();
+  public static final RelOptRule CORRELATE_LEFT_INSTANCE = RelRule.Config.EMPTY
+  .withOperandSupplier(b0 ->
+  b0.operand(Correlate.class).inputs(
+  b1 -> 
b1.operand(Values.class).predicate(Values::isEmpty).noInputs(),
+  b2 -> b2.operand(RelNode.class).anyInputs()))
+  .withDescription("PruneEmptyCorrelate(left)")
+  .withRelBuilderFactory(HiveRelFactories.HIVE_BUILDER)
+  .as(CorrelateEmptyRuleConfig.class)
+  .toRule();
+  
+  /** Configuration for rule that prunes a correlate if one of its inputs is 
empty. */
+  public interface CorrelateEmptyRuleConfig extends PruneEmptyRule.Config {

Review Comment:
   If split in separate classes we would have to duplicate the following 
fragment:
   ```
   @Override
   default PruneEmptyRule toRule() {
 return new PruneEmptyRule(this) {
   @Override
   public void onMatch(RelOptRuleCall call) {
 if (Bug.CALCITE_5669_FIXED) {
   throw new IllegalStateException(
   "Class is redundant after fix is merged into Calcite");
 }
 final Correlate corr = call.rel(0);
 final RelNode left = call.rel(1);
 final RelNode right = call.rel(2);
 final RelNode newRel;
 RelBuilder b = call.builder();
   ```
   Don't have a strong opinion on this; let me know what you prefer and I will do 
the refactoring.



##
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveRemoveEmptySingleRules.java:
##
@@ -192,6 +178,79 @@ public interface JoinRightEmptyRuleConfig extends 
PruneEmptyRule.Config {
 }
   }
 
+  private static RelNode padWithNulls(RelBuilder builder, RelNode input, 
RelDataType resultType,
+  boolean leftPadding) {
+int padding = resultType.getFieldCount() - 
input.getRowType().getFieldCount();
+List<RexNode> nullLiterals = Collections.nCopies(padding, builder.literal(null));
+builder.push(input);
+if (leftPadding) {
+  builder.project(concat(nullLiterals, builder.fields()));
+} else {
+  builder.project(concat(builder.fields(), nullLiterals));
+}
+return builder.convert(resultType, true).build();
+  }
+
+  public static final RelOptRule CORRELATE_RIGHT_INSTANCE = 
RelRule.Config.EMPTY
+  .withOperandSupplier(b0 ->
+  b0.operand(Correlate.class).inputs(
+  b1 -> b1.operand(RelNode.class).anyInputs(),
+  b2 -> 
b2.operand(Values.class).predicate(Values::isEmpty).noInputs()))
+  .withDescription("PruneEmptyCorrelate(right)")
+  .withRelBuilderFactory(HiveRelFactories.HIVE_BUILDER)
+  .as(CorrelateEmptyRuleConfig.class)
+  .toRule();
+  public static final RelOptRule CORRELATE_LEFT_INSTANCE = RelRule.Config.EMPTY
+  .withOperandSupplier(b0 ->
+  b0.operand(Correlate.class).inputs(
+  b1 -> 
b1.operand(Values.class).predicate(Values::isEmpty).noInputs(),
+  b2 -> b2.operand(RelNode.class).anyInputs()))
+  .withDescription("PruneEmptyCorrelate(left)")
+  .withRelBuilderFactory(HiveRelFactories.HIVE_BUILDER)
+  .as(CorrelateEmptyRuleConfig.class)
+  

[jira] [Work logged] (HIVE-27278) Simplify correlated queries with empty inputs

2023-04-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27278?focusedWorklogId=858887=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-858887
 ]

ASF GitHub Bot logged work on HIVE-27278:
-

Author: ASF GitHub Bot
Created on: 25/Apr/23 10:07
Start Date: 25/Apr/23 10:07
Worklog Time Spent: 10m 
  Work Description: zabetak commented on code in PR #4253:
URL: https://github.com/apache/hive/pull/4253#discussion_r1176298845


##
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveRemoveEmptySingleRules.java:
##
@@ -192,6 +178,79 @@ public interface JoinRightEmptyRuleConfig extends 
PruneEmptyRule.Config {
 }
   }
 
+  private static RelNode padWithNulls(RelBuilder builder, RelNode input, 
RelDataType resultType,
+  boolean leftPadding) {

Review Comment:
   This will lead to duplicating some code that I would like to avoid, hence 
this refactoring.





Issue Time Tracking
---

Worklog Id: (was: 858887)
Time Spent: 1h 10m  (was: 1h)

> Simplify correlated queries with empty inputs
> -
>
> Key: HIVE-27278
> URL: https://issues.apache.org/jira/browse/HIVE-27278
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> The correlated query below will not produce any result no matter the content 
> of the table.
> {code:sql}
> create table t1 (id int, val varchar(10)) stored as orc TBLPROPERTIES 
> ('transactional'='true');
> create table t2 (id int, val varchar(10)) stored as orc TBLPROPERTIES 
> ('transactional'='true');
> EXPLAIN CBO SELECT id FROM t1 WHERE NULL IN (SELECT NULL FROM t2 where t1.id 
> = t2.id);
> {code}
> The CBO is able to derive that part of the query is empty and ends up with 
> the following plan.
> {noformat}
> CBO PLAN:
> HiveProject(id=[$0])
>   LogicalCorrelate(correlation=[$cor0], joinType=[semi], requiredColumns=[{}])
> HiveTableScan(table=[[default, t1]], table:alias=[t1])
> HiveValues(tuples=[[]])
> {noformat}
> The presence of LogicalCorrelate is first redundant but also problematic 
> since many parts of the optimizer assume that queries are decorrelated and do 
> not know how to handle the LogicalCorrelate.
> In the presence of views the same query can lead to the following exception 
> during compilation.
> {code:sql}
> CREATE MATERIALIZED VIEW v1 AS SELECT id FROM t2;
> EXPLAIN CBO SELECT id FROM t1 WHERE NULL IN (SELECT NULL FROM t2 where t1.id 
> = t2.id);
> {code}
> {noformat}
> org.apache.calcite.plan.RelOptPlanner$CannotPlanException: There are not 
> enough rules to produce a node with desired properties: convention=HIVE, 
> sort=[], dist=any. All the inputs have relevant nodes, however the cost is 
> still infinite.
> Root: rel#185:RelSubset#3.HIVE.[].any
> Original rel:
> HiveProject(id=[$0]): rowcount = 4.0, cumulative cost = {20.0 rows, 13.0 cpu, 
> 0.0 io}, id = 178
>   LogicalCorrelate(correlation=[$cor0], joinType=[semi], 
> requiredColumns=[{}]): rowcount = 4.0, cumulative cost = {16.0 rows, 9.0 cpu, 
> 0.0 io}, id = 176
> HiveTableScan(table=[[default, t1]], table:alias=[t1]): rowcount = 4.0, 
> cumulative cost = {4.0 rows, 5.0 cpu, 0.0 io}, id = 111
> HiveValues(tuples=[[]]): rowcount = 1.0, cumulative cost = {1.0 rows, 1.0 
> cpu, 0.0 io}, id = 139
> Sets:
> Set#0, type: RecordType(INTEGER id, VARCHAR(10) val, BIGINT 
> BLOCK__OFFSET__INSIDE__FILE, VARCHAR(2147483647) INPUT__FILE__NAME, 
> RecordType(BIGINT writeid, INTEGER bucketid, BIGINT rowid) ROW__ID, BOOLEAN 
> ROW__IS__DELETED)
>   rel#180:RelSubset#0.HIVE.[].any, best=rel#111
>   rel#111:HiveTableScan.HIVE.[].any(table=[default, 
> t1],htColumns=[0, 1, 2, 3, 4, 
> 5],insideView=false,plKey=default.t1;,table:alias=t1,tableScanTrait=null), 
> rowcount=4.0, cumulative cost={4.0 rows, 5.0 cpu, 0.0 io}
> Set#1, type: RecordType(NULL _o__c0)
>   rel#181:RelSubset#1.HIVE.[].any, best=rel#139
>   rel#139:HiveValues.HIVE.[].any(type=RecordType(NULL 
> _o__c0),tuples=[]), rowcount=1.0, cumulative cost={1.0 rows, 1.0 cpu, 0.0 io}
> Set#2, type: RecordType(INTEGER id, VARCHAR(10) val, BIGINT 
> BLOCK__OFFSET__INSIDE__FILE, VARCHAR(2147483647) INPUT__FILE__NAME, 
> RecordType(BIGINT writeid, INTEGER bucketid, BIGINT rowid) ROW__ID, BOOLEAN 
> ROW__IS__DELETED)
>   rel#183:RelSubset#2.NONE.[].any, best=null
>   
> rel#182:LogicalCorrelate.NONE.[].any(left=RelSubset#180,right=RelSubset#181,correlation=$cor0,joinType=semi,requiredColumns={}),
>  rowcount=4.0, cumulative cost={inf}
> Set#3, type: RecordType(INTEGER id)
>   rel#185:RelSubset#3.HIVE.[].any, best=null
>   

[jira] [Work logged] (HIVE-27278) Simplify correlated queries with empty inputs

2023-04-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27278?focusedWorklogId=858886=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-858886
 ]

ASF GitHub Bot logged work on HIVE-27278:
-

Author: ASF GitHub Bot
Created on: 25/Apr/23 10:05
Start Date: 25/Apr/23 10:05
Worklog Time Spent: 10m 
  Work Description: zabetak commented on PR #4253:
URL: https://github.com/apache/hive/pull/4253#issuecomment-1521522423

   > This is a good contribution to Calcite and the patch looks good to me.
   > But normally we should not have LogicalCorrelate after the decorrelation 
step in Hive. Did you check why it failed for the query mentioned in 
the jira?
   
   The decorrelator does not cope well with values. Initially, it will return 
null when it encounters the `LogicalValues` 
(https://github.com/apache/hive/blob/59058c65457fb7ab9d8575a555034e6633962661/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveRelDecorrelator.java#L471)
 and then it will bail out when treating the `LogicalCorrelate` 
(https://github.com/apache/hive/blob/59058c65457fb7ab9d8575a555034e6633962661/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveRelDecorrelator.java#L1247)
 since one of the inputs is not rewritten. The problem is still there in recent 
Calcite (CALCITE-5568).
   
   We can/could treat the problem in the decorrelator in two ways:
   a) By changing the `decorrelateRel` method(s); for sure the one for 
`decorrelateRel(LogicalCorrelate)`
   b) Putting the new rules and some more (reductions) inside 
`HiveRelDecorrelator#decorrelate(root)` 
(https://github.com/apache/hive/blob/59058c65457fb7ab9d8575a555034e6633962661/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveRelDecorrelator.java#L238).
   
   @kasakrisz Let me know if you prefer that we explore one of the alternative 
paths mentioned above now or if we could postpone for a follow-up.
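   The semantics the prune-empty correlate rules implement can be sketched outside 
Calcite (a toy row model for illustration only, not the actual RelBuilder/RelNode 
API): a semi or inner correlate over an empty right input is itself empty, while a 
left correlate keeps every left row, padded with nulls for the right-side columns, 
mirroring what padWithNulls does on the plan:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class PruneEmptyCorrelateSketch {
    // Rows are plain value lists; an empty input is an empty list of rows.
    // Models the rewrite applied when the right input of a correlate is empty.
    static List<List<Object>> correlateWithEmptyRight(
            String joinType, List<List<Object>> left, int rightFieldCount) {
        switch (joinType) {
            case "semi":
            case "inner":
                // No right rows means no matches: the whole correlate is empty.
                return Collections.emptyList();
            case "left": {
                // Keep each left row, null-padded to the full output row type.
                List<List<Object>> out = new ArrayList<>();
                for (List<Object> row : left) {
                    List<Object> padded = new ArrayList<>(row);
                    padded.addAll(Collections.nCopies(rightFieldCount, null));
                    out.add(padded);
                }
                return out;
            }
            default:
                throw new IllegalArgumentException("unsupported join type: " + joinType);
        }
    }

    public static void main(String[] args) {
        List<List<Object>> left = new ArrayList<>();
        left.add(new ArrayList<>(List.of(1, "a")));
        System.out.println(correlateWithEmptyRight("semi", left, 2).size());  // prints 0
        System.out.println(correlateWithEmptyRight("left", left, 2).get(0));  // prints [1, a, null, null]
    }
}
```

This is exactly why the semi correlate in the example query collapses to an empty 
result once the right side is known to be `HiveValues(tuples=[[]])`.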




Issue Time Tracking
---

Worklog Id: (was: 858886)
Time Spent: 1h  (was: 50m)

> Simplify correlated queries with empty inputs
> -
>
> Key: HIVE-27278
> URL: https://issues.apache.org/jira/browse/HIVE-27278
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> The correlated query below will not produce any result no matter the content 
> of the table.
> {code:sql}
> create table t1 (id int, val varchar(10)) stored as orc TBLPROPERTIES 
> ('transactional'='true');
> create table t2 (id int, val varchar(10)) stored as orc TBLPROPERTIES 
> ('transactional'='true');
> EXPLAIN CBO SELECT id FROM t1 WHERE NULL IN (SELECT NULL FROM t2 where t1.id 
> = t2.id);
> {code}
> The CBO is able to derive that part of the query is empty and ends up with 
> the following plan.
> {noformat}
> CBO PLAN:
> HiveProject(id=[$0])
>   LogicalCorrelate(correlation=[$cor0], joinType=[semi], requiredColumns=[{}])
> HiveTableScan(table=[[default, t1]], table:alias=[t1])
> HiveValues(tuples=[[]])
> {noformat}
> The presence of LogicalCorrelate is first redundant but also problematic 
> since many parts of the optimizer assume that queries are decorrelated and do 
> not know how to handle the LogicalCorrelate.
> In the presence of views the same query can lead to the following exception 
> during compilation.
> {code:sql}
> CREATE MATERIALIZED VIEW v1 AS SELECT id FROM t2;
> EXPLAIN CBO SELECT id FROM t1 WHERE NULL IN (SELECT NULL FROM t2 where t1.id 
> = t2.id);
> {code}
> {noformat}
> org.apache.calcite.plan.RelOptPlanner$CannotPlanException: There are not 
> enough rules to produce a node with desired properties: convention=HIVE, 
> sort=[], dist=any. All the inputs have relevant nodes, however the cost is 
> still infinite.
> Root: rel#185:RelSubset#3.HIVE.[].any
> Original rel:
> HiveProject(id=[$0]): rowcount = 4.0, cumulative cost = {20.0 rows, 13.0 cpu, 
> 0.0 io}, id = 178
>   LogicalCorrelate(correlation=[$cor0], joinType=[semi], 
> requiredColumns=[{}]): rowcount = 4.0, cumulative cost = {16.0 rows, 9.0 cpu, 
> 0.0 io}, id = 176
> HiveTableScan(table=[[default, t1]], table:alias=[t1]): rowcount = 4.0, 
> cumulative cost = {4.0 rows, 5.0 cpu, 0.0 io}, id = 111
> HiveValues(tuples=[[]]): rowcount = 1.0, cumulative cost = {1.0 rows, 1.0 
> cpu, 0.0 io}, id = 139
> Sets:
> Set#0, type: RecordType(INTEGER id, VARCHAR(10) val, BIGINT 
> BLOCK__OFFSET__INSIDE__FILE, VARCHAR(2147483647) INPUT__FILE__NAME, 
> RecordType(BIGINT writeid, INTEGER bucketid, BIGINT rowid) ROW__ID, BOOLEAN 
> ROW__IS__DELETED)
>   rel#180:RelSubset#0.HIVE.[].any, best=rel#111
>   

[jira] [Work logged] (HIVE-26982) Select * from a table containing timestamp column with default defined using TIMESTAMPLOCALTZ fails with error " ORC doesn't handle primitive category TIMESTAMPLOCALT

2023-04-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26982?focusedWorklogId=858885=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-858885
 ]

ASF GitHub Bot logged work on HIVE-26982:
-

Author: ASF GitHub Bot
Created on: 25/Apr/23 09:56
Start Date: 25/Apr/23 09:56
Worklog Time Spent: 10m 
  Work Description: zratkai opened a new pull request, #4265:
URL: https://github.com/apache/hive/pull/4265

   
   ### What changes were proposed in this pull request?
   This PR fixes the issue: "ORC doesn't handle primitive category 
TIMESTAMPLOCALTZ".
   
   
   ### Why are the changes needed?
   Fix TIMESTAMPLOCALTZ type handling.
   
   
   ### Does this PR introduce _any_ user-facing change?
   Yes. If a user has a table containing a timestamp column with a default defined 
using TIMESTAMPLOCALTZ, this fixes the "ORC doesn't handle primitive category 
TIMESTAMPLOCALTZ" exception.
   
   
   ### How was this patch tested?
   Jenkins pipeline.
   




Issue Time Tracking
---

Worklog Id: (was: 858885)
Remaining Estimate: 0h
Time Spent: 10m

> Select * from a table containing timestamp column with default defined using 
> TIMESTAMPLOCALTZ fails with error " ORC doesn't handle primitive category 
> TIMESTAMPLOCALTZ"
> 
>
> Key: HIVE-26982
> URL: https://issues.apache.org/jira/browse/HIVE-26982
> Project: Hive
>  Issue Type: Bug
>  Components: ORC
>Reporter: Dharmik Thakkar
>Assignee: Zoltán Rátkai
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Select * from a table containing timestamp column with default defined using 
> TIMESTAMPLOCALTZ fails with error " ORC doesn't handle primitive category 
> TIMESTAMPLOCALTZ"
> Logs
> {code:java}
> 2023-01-24T20:37:48,831 INFO  [pool-2-thread-1] jdbc.TestDriver: Beginning 
> Test at 2023-01-24 20:37:48,831
> 2023-01-24T20:37:48,833 INFO  [pool-2-thread-1] jdbc.TestDriver: BEGIN MAIN
> 2023-01-24T20:37:48,834 INFO  [pool-9-thread-1] jdbc.TestDriver: Running 
> SessionGroup{name='SG_JZSL3SA0OG', initialDelay=0, repeats=1, repeatDelay=0}
> 2023-01-24T20:37:48,834 INFO  [pool-9-thread-1] jdbc.TestDriver: Connecting 
> as user 'hrt_qa'
> 2023-01-24T20:37:49,173 INFO  [pool-9-thread-1] jdbc.TestDriver: Query: drop 
> table if exists t1_default
> 2023-01-24T20:37:49,237 INFO  [Thread-64] jdbc.TestDriver: INFO  : Compiling 
> command(queryId=hive_20230124203749_09b0f95f-4cf1-4c2c-9f08-1b91fdb4a6ca): 
> drop table if exists t1_default
> 2023-01-24T20:37:49,299 INFO  [Thread-64] jdbc.TestDriver: INFO  : Semantic 
> Analysis Completed (retrial = false)
> 2023-01-24T20:37:49,299 INFO  [Thread-64] jdbc.TestDriver: INFO  : Created 
> Hive schema: Schema(fieldSchemas:null, properties:null)
> 2023-01-24T20:37:49,299 INFO  [Thread-64] jdbc.TestDriver: INFO  : Completed 
> compiling 
> command(queryId=hive_20230124203749_09b0f95f-4cf1-4c2c-9f08-1b91fdb4a6ca); 
> Time taken: 0.031 seconds
> 2023-01-24T20:37:49,299 INFO  [Thread-64] jdbc.TestDriver: INFO  : Executing 
> command(queryId=hive_20230124203749_09b0f95f-4cf1-4c2c-9f08-1b91fdb4a6ca): 
> drop table if exists t1_default
> 2023-01-24T20:37:49,299 INFO  [Thread-64] jdbc.TestDriver: INFO  : Starting 
> task [Stage-0:DDL] in serial mode
> 2023-01-24T20:37:49,299 INFO  [Thread-64] jdbc.TestDriver: INFO  : Completed 
> executing 
> command(queryId=hive_20230124203749_09b0f95f-4cf1-4c2c-9f08-1b91fdb4a6ca); 
> Time taken: 0.012 seconds
> 2023-01-24T20:37:49,299 INFO  [Thread-64] jdbc.TestDriver: INFO  : OK
> 2023-01-24T20:37:49,416 INFO  [pool-9-thread-1] jdbc.TestDriver: No output to 
> verify
> 2023-01-24T20:37:49,416 INFO  [pool-9-thread-1] jdbc.TestDriver: Query: 
> create table t1_default ( t tinyint default 1Y,   si smallint default 1S, 
> i int default 1,b bigint default 1L, f double default 
> double(5.7), d double, s varchar(25) default cast('col1' as 
> varchar(25)), dc decimal(38,18), bo varchar(5), v varchar(25),
>  c char(25) default cast('var1' as char(25)), ts timestamp DEFAULT 
> TIMESTAMP'2016-02-22 12:45:07.0', dt date default 
> cast('2015-03-12' as DATE), tz timestamp with local time zone DEFAULT 
> TIMESTAMPLOCALTZ'2016-01-03 12:26:34 America/Los_Angeles') STORED AS 
> TEXTFILE
> 2023-01-24T20:37:49,476 INFO  [Thread-65] jdbc.TestDriver: INFO  : Compiling 
> command(queryId=hive_20230124203749_75ffcf31-6bd6-46d7-ba02-f39efb2c4279): 
> create table t1_default ( t tinyint default 1Y,   si smallint default 1S, 
> i int default 1,b bigint default 1L, f double default 
> double(5.7), d double, s 
