[jira] [Created] (HIVE-27355) Iceberg: Create table can be slow due to file listing for stats

2023-05-17 Thread Rajesh Balamohan (Jira)
Rajesh Balamohan created HIVE-27355:
---

 Summary: Iceberg: Create table can be slow due to file listing for 
stats
 Key: HIVE-27355
 URL: https://issues.apache.org/jira/browse/HIVE-27355
 Project: Hive
  Issue Type: Improvement
  Components: Iceberg integration
Reporter: Rajesh Balamohan


The stack trace can differ on the Hive master branch, but the issue is the same: 
stats need not be populated for Iceberg tables, yet creation (e.g. CTAS) 
currently performs recursive file listings, causing delays during table creation.

 
{noformat}
at org.apache.hadoop.hive.common.FileUtils.listStatusRecursively(FileUtils.java:329)
at org.apache.hadoop.hive.common.FileUtils.listStatusRecursively(FileUtils.java:330)
at org.apache.hadoop.hive.common.FileUtils.listStatusRecursively(FileUtils.java:330)
at org.apache.hadoop.hive.common.HiveStatsUtils.getFileStatusRecurse(HiveStatsUtils.java:61)
at org.apache.hadoop.hive.metastore.Warehouse.getFileStatusesForUnpartitionedTable(Warehouse.java:581)
at org.apache.hadoop.hive.metastore.MetaStoreUtils.updateTableStatsFast(MetaStoreUtils.java:201)
at org.apache.hadoop.hive.metastore.MetaStoreUtils.updateTableStatsFast(MetaStoreUtils.java:194)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.create_table_core(HiveMetaStore.java:1445)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.create_table_with_environment_context(HiveMetaStore.java:1502)
at sun.reflect.GeneratedMethodAccessor118.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:148)
at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:107)
at com.sun.proxy.$Proxy49.create_table_with_environment_context(Unknown Source)
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.create_table_with_environment_context(HiveMetaStoreClient.java:2419)
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.createTable(HiveMetaStoreClient.java:755)
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.createTable(HiveMetaStoreClient.java:743)
at sun.reflect.GeneratedMethodAccessor117.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
 {noformat}
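
One possible mitigation, sketched under an assumption: HMS can be asked to skip 
the fast stats update via the StatsSetupConst.DO_NOT_UPDATE_STATS marker, so 
setting it for Iceberg tables would avoid the recursive listing (illustrative 
helper, not the committed fix).
{code:java}
import org.apache.hadoop.hive.common.StatsSetupConst;
import org.apache.hadoop.hive.metastore.api.Table;

public class IcebergStatsSkipSketch {
  // Marks the table so the metastore's fast stats update (and with it the
  // recursive file listing) is skipped, assuming HMS honors the marker on
  // this code path.
  static void skipFileListingStats(Table tbl) {
    tbl.putToParameters(StatsSetupConst.DO_NOT_UPDATE_STATS, StatsSetupConst.TRUE);
  }
}
{code}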



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HIVE-27354) Iceberg listing all files during commit can cause delays in large tables

2023-05-17 Thread Rajesh Balamohan (Jira)
Rajesh Balamohan created HIVE-27354:
---

 Summary: Iceberg listing all files during commit can cause delays 
in large tables
 Key: HIVE-27354
 URL: https://issues.apache.org/jira/browse/HIVE-27354
 Project: Hive
  Issue Type: Improvement
Reporter: Rajesh Balamohan


When committing a table during create table, Iceberg invokes HMS APIs. These 
internally list all files in the table folder to update stats, which is not 
needed for Iceberg tables.

Following is the stack trace for later reference.

{noformat}
at org.apache.hadoop.hive.common.FileUtils.listStatusRecursively(FileUtils.java:329)
at org.apache.hadoop.hive.common.FileUtils.listStatusRecursively(FileUtils.java:330)
at org.apache.hadoop.hive.common.FileUtils.listStatusRecursively(FileUtils.java:330)
at org.apache.hadoop.hive.common.HiveStatsUtils.getFileStatusRecurse(HiveStatsUtils.java:61)
at org.apache.hadoop.hive.metastore.Warehouse.getFileStatusesForUnpartitionedTable(Warehouse.java:581)
at org.apache.hadoop.hive.metastore.MetaStoreUtils.updateTableStatsFast(MetaStoreUtils.java:201)
at org.apache.hadoop.hive.metastore.MetaStoreUtils.updateTableStatsFast(MetaStoreUtils.java:194)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.create_table_core(HiveMetaStore.java:1445)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.create_table_with_environment_context(HiveMetaStore.java:1502)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:148)
at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:107)
at com.sun.proxy.$Proxy69.create_table_with_environment_context(Unknown Source)
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.create_table_with_environment_context(HiveMetaStoreClient.java:2419)
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.createTable(HiveMetaStoreClient.java:755)
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.createTable(HiveMetaStoreClient.java:743)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
...
...
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:173)
at com.sun.proxy.$Proxy70.createTable(Unknown Source)
at org.apache.iceberg.hive.HiveTableOperations.lambda$persistTable$4(HiveTableOperations.java:405)
at org.apache.iceberg.hive.HiveTableOperations$$Lambda$4533/374509974.run(Unknown Source)
at org.apache.iceberg.ClientPoolImpl.run(ClientPoolImpl.java:58)
at org.apache.iceberg.ClientPoolImpl.run(ClientPoolImpl.java:51)
at org.apache.iceberg.hive.CachedClientPool.run(CachedClientPool.java:82)
at org.apache.iceberg.hive.HiveTableOperations.persistTable(HiveTableOperations.java:403)
at org.apache.iceberg.hive.HiveTableOperations.doCommit(HiveTableOperations.java:327)
at org.apache.iceberg.BaseMetastoreTableOperations.commit(BaseMetastoreTableOperations.java:135)
{noformat}
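
A hedged sketch of how a caller such as Iceberg could opt out per request, 
assuming the create_table_with_environment_context path in the trace checks the 
DO_NOT_UPDATE_STATS key of the supplied EnvironmentContext:
{code:java}
import org.apache.hadoop.hive.common.StatsSetupConst;
import org.apache.hadoop.hive.metastore.api.EnvironmentContext;

public class SkipStatsEnvContextSketch {
  // Builds an EnvironmentContext asking the metastore not to run the fast
  // stats update (and therefore not to list all files under the table dir).
  static EnvironmentContext noStatsUpdateContext() {
    EnvironmentContext ctx = new EnvironmentContext();
    ctx.putToProperties(StatsSetupConst.DO_NOT_UPDATE_STATS, StatsSetupConst.TRUE);
    return ctx;
  }
}
{code}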



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-14907) Hive Metastore should use repeatable-read consistency level

2023-05-17 Thread Michael Smith (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-14907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17723749#comment-17723749
 ] 

Michael Smith commented on HIVE-14907:
--

The current version of DataNucleus fails if repeatable-read is configured while 
using an Oracle database. It should fall back to serializable, but DataNucleus 
incorrectly assumes that Oracle supports repeatable-read.
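
For reference, a minimal configuration sketch of how the isolation level is 
selected, assuming the standard DataNucleus property name that the metastore 
passes through:
{noformat}
<!-- hive-site.xml / metastore-site.xml (sketch) -->
<property>
  <name>datanucleus.transactionIsolation</name>
  <value>repeatable-read</value>
</property>
{noformat}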

> Hive Metastore should use repeatable-read consistency level
> ---
>
> Key: HIVE-14907
> URL: https://issues.apache.org/jira/browse/HIVE-14907
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 2.2.0
>Reporter: Lenni Kuff
>Priority: Major
>
> Currently HMS uses the "read-committed" consistency level, which is the 
> default for DataNucleus. This can cause problems since each transaction can 
> see updates from other transactions, so it is very difficult to reason about 
> any code that reads multiple pieces of data.
> Instead it should use "repeatable-read" consistency, which guarantees that a 
> transaction only sees the state at its beginning plus any updates it makes 
> itself.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-27317) Temporary (local) session files cleanup improvements

2023-05-17 Thread Sercan Tekin (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sercan Tekin updated HIVE-27317:

Fix Version/s: 4.0.0

> Temporary (local) session files cleanup improvements
> 
>
> Key: HIVE-27317
> URL: https://issues.apache.org/jira/browse/HIVE-27317
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 3.1.3
>Reporter: Sercan Tekin
>Assignee: Sercan Tekin
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: HIVE-27317.patch
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> When a Hive session is killed, the shutdown hook gets no chance to clean up 
> tmp files.
> There is a Hive service to clean residual files 
> (https://issues.apache.org/jira/browse/HIVE-13429), and later on its execution 
> was scheduled inside HS2 (https://issues.apache.org/jira/browse/HIVE-15068) to 
> make sure no temp file is left behind. But this service cleans up only 
> HDFS temp files; there are still residual files/dirs in the 
> *HiveConf.ConfVars.LOCALSCRATCHDIR* location, as follows:
> {code:java}
> > ll /tmp/user/97c4ef50-5e80-480e-a6f0-4f779050852b*
> drwx-- 2 user user 4096 Oct 29 10:09 97c4ef50-5e80-480e-a6f0-4f779050852b
> -rw--- 1 user user    0 Oct 29 10:09 
> 97c4ef50-5e80-480e-a6f0-4f779050852b10571819313894728966.pipeout
> -rw--- 1 user user    0 Oct 29 10:09 
> 97c4ef50-5e80-480e-a6f0-4f779050852b16013956055489853961.pipeout
> -rw--- 1 user user    0 Oct 29 10:09 
> 97c4ef50-5e80-480e-a6f0-4f779050852b4383913570068173450.pipeout
> -rw--- 1 user user    0 Oct 29 10:09 
> 97c4ef50-5e80-480e-a6f0-4f779050852b889740171428672108.pipeout {code}
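
A rough sketch of the kind of local cleanup this improvement targets; the age 
check and the .pipeout/session-directory matching are illustrative assumptions, 
not the actual patch:
{code:java}
import java.io.File;

public class LocalScratchCleanupSketch {
  // Removes old session directories and .pipeout files left under the
  // local scratch dir (the HiveConf.ConfVars.LOCALSCRATCHDIR location).
  static void cleanLocalScratch(File localScratchDir, long maxAgeMs) {
    long cutoff = System.currentTimeMillis() - maxAgeMs;
    File[] entries = localScratchDir.listFiles();
    if (entries == null) {
      return;
    }
    for (File f : entries) {
      if (f.lastModified() < cutoff
          && (f.isDirectory() || f.getName().endsWith(".pipeout"))) {
        deleteRecursively(f);
      }
    }
  }

  private static void deleteRecursively(File f) {
    File[] children = f.listFiles();
    if (children != null) {
      for (File c : children) {
        deleteRecursively(c);
      }
    }
    f.delete();
  }
}
{code}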



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (HIVE-27322) Iceberg: metadata location overrides can cause data breach

2023-05-17 Thread Janos Kovacs (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17723728#comment-17723728
 ] 

Janos Kovacs edited comment on HIVE-27322 at 5/17/23 9:04 PM:
--

Found that even with metadata_location set, the authorizer gets the wrong location:
{noformat}
2023-05-17 19:38:51,867 INFO  org.apache.hadoop.hive.ql.Driver: 
[a49356b4-1b7a-4c9d-9b70-81af12c0465f HiveServer2-Handler-Pool: Thread-253]: 
Compiling 
command(queryId=hive_20230517193851_8b9f0ad7-2ae1-4078-b76a-e51c31321b0b): 
CREATE EXTERNAL TABLE default.policytestth (txt string, secret string) STORED 
BY ICEBERG 
TBLPROPERTIES (
  
'metadata_location'='hdfs://test.local.host:8020/warehouse/tablespace/external/hive/policytest/metadata/1-a3e46c1b-318b-4b46-886a-c6ea591f63c1.metadata.json')
...
2023-05-17 19:38:51,898 DEBUG 
org.apache.iceberg.mr.hive.HiveIcebergStorageHandler: 
[a49356b4-1b7a-4c9d-9b70-81af12c0465f HiveServer2-Handler-Pool: Thread-253]: 
Iceberg storage handler authorization URI 
iceberg://default/policytestth?snapshot=%2Fwarehouse%2Ftablespace%2Fexternal%2Fhive%2Fpolicytestth%2Fmetadata%2Fdummy.metadata.json
{noformat}
 



> Iceberg: metadata location overrides can cause data breach
> --
>
> Key: HIVE-27322
> URL: https://issues.apache.org/jira/browse/HIVE-27322
> Project: Hive
>  Issue Type: Bug
>  Components: Iceberg integration
>Affects Versions: 4.0.0-alpha-2
>Reporter: Janos Kovacs
>Priority: Blocker
>
> Set to bug/blocker instead of enhancement due to its security-related nature; 
> Hive 4 should not be released without a fix for this. Please reset if needed.
>  
> Context: 
>  * There are some core tables with sensitive data that users can only query 
> with data masking enforced (e.g. via Ranger). Let's assume this is the 
> `default.icebergsecured` table.
>  * An end-user can only access the masked form of the sensitive data as 
> expected...
>  * The users also have privilege to create new tables in their own sandbox 
> databases - let's assume this is the `default.trojanhorse` table for now.
>  * The user can create a malicious table that exposes the sensitive data 
> unmasked, leading to a possible data breach.
>  * Hive runs with doAs=false to be able to enforce FGAC and to prevent the 
> need for direct end-user file-system access.
> Repro:
>  * First make sure the data is secured by the masking policy:
> {noformat}
> 
> beeline -e "
> DROP TABLE IF EXISTS default.icebergsecured PURGE;
> CREATE EXTERNAL TABLE default.icebergsecured (txt string, secret string) 
> STORED BY ICEBERG;
> INSERT INTO default.icebergsecured VALUES ('You might be allowed to see 
> this.','You are NOT allowed to see this!');
> "
> 
> beeline -e "
> SELECT * FROM default.icebergsecured;
> "
> +++
> | icebergsecured.txt | icebergsecured.secret  |
> +++
> | You might be allowed to see this.  | MASKED BY RANGER FOR SECURITY  |
> +++
> {noformat}
>  * Now let the user create the malicious table exposing the sensitive data:
> {noformat}
> 
> SECURED_META_LOCATION=$(HADOOP_CLIENT_OPTS="-Djline.terminal=jline.UnsupportedTerminal"
>  beeline -e "DESCRIBE FORMATTED default.icebergsecured;" 2>/dev/null |grep 
> metadata_location  |grep -v previous_metadata_location | awk '{print $5}')
> beeline -e "
> DROP TABLE IF EXISTS default.trojanhorse;
> CREATE EXTERNAL TABLE default.trojanhorse (txt string, secret string) STORED 
> BY ICEBERG
> TBLPROPERTIES (
>   'metadata_location'='${SECURED_META_LOCATION}');
> SELECT * FROM default.trojanhorse;
> "
> ++---+
> | 

[jira] [Commented] (HIVE-27322) Iceberg: metadata location overrides can cause data breach

2023-05-17 Thread Janos Kovacs (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17723728#comment-17723728
 ] 

Janos Kovacs commented on HIVE-27322:
-

Found that even with metadata_location set, the authorizer gets the wrong location:
{noformat}
2023-05-17 19:38:51,867 INFO  org.apache.hadoop.hive.ql.Driver: 
[a49356b4-1b7a-4c9d-9b70-81af12c0465f HiveServer2-Handler-Pool: Thread-253]: 
Compiling 
command(queryId=hive_20230517193851_8b9f0ad7-2ae1-4078-b76a-e51c31321b0b): 
CREATE EXTERNAL TABLE default.policytestth (txt string, secret string) STORED 
BY ICEBERG 
TBLPROPERTIES (
  
'metadata_location'='hdfs://test.local.host:8020/warehouse/tablespace/external/hive/policytest/metadata/1-a3e46c1b-318b-4b46-886a-c6ea591f63c1.metadata.json')
...
2023-05-17 19:38:51,898 DEBUG 
org.apache.iceberg.mr.hive.HiveIcebergStorageHandler: 
[a49356b4-1b7a-4c9d-9b70-81af12c0465f HiveServer2-Handler-Pool: Thread-253]: 
Iceberg storage handler authorization URI 
iceberg://default/policytestth?snapshot=%2Fwarehouse%2Ftablespace%2Fexternal%2Fhive%2Fpolicytestth%2Fmetadata%2Fdummy.metadata.json
{noformat}
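
A hedged sketch of the fix direction (names illustrative, not Hive's actual 
code): the authorization URI should be derived from the explicit 
metadata_location override when one is supplied, instead of the table-default 
dummy path shown in the log above:
{code:java}
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class IcebergAuthUriSketch {
  // Builds the iceberg:// authorization URI; an explicit metadata_location
  // override must be the path that gets authorized.
  static String authUri(String db, String table,
                        String metadataLocationOverride, String defaultPath) {
    String path = metadataLocationOverride != null
        ? metadataLocationOverride
        : defaultPath;
    return "iceberg://" + db + "/" + table + "?snapshot="
        + URLEncoder.encode(path, StandardCharsets.UTF_8);
  }
}
{code}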
 

> Iceberg: metadata location overrides can cause data breach
> --
>
> Key: HIVE-27322
> URL: https://issues.apache.org/jira/browse/HIVE-27322
> Project: Hive
>  Issue Type: Bug
>  Components: Iceberg integration
>Affects Versions: 4.0.0-alpha-2
>Reporter: Janos Kovacs
>Priority: Blocker
>
> Set to bug/blocker instead of enhancement due to its security-related nature; 
> Hive 4 should not be released without a fix for this. Please reset if needed.
>  
> Context: 
>  * There are some core tables with sensitive data that users can only query 
> with data masking enforced (e.g. via Ranger). Let's assume this is the 
> `default.icebergsecured` table.
>  * An end-user can only access the masked form of the sensitive data as 
> expected...
>  * The users also have privilege to create new tables in their own sandbox 
> databases - let's assume this is the `default.trojanhorse` table for now.
>  * The user can create a malicious table that exposes the sensitive data 
> unmasked, leading to a possible data breach.
>  * Hive runs with doAs=false to be able to enforce FGAC and to prevent the 
> need for direct end-user file-system access.
> Repro:
>  * First make sure the data is secured by the masking policy:
> {noformat}
> 
> beeline -e "
> DROP TABLE IF EXISTS default.icebergsecured PURGE;
> CREATE EXTERNAL TABLE default.icebergsecured (txt string, secret string) 
> STORED BY ICEBERG;
> INSERT INTO default.icebergsecured VALUES ('You might be allowed to see 
> this.','You are NOT allowed to see this!');
> "
> 
> beeline -e "
> SELECT * FROM default.icebergsecured;
> "
> +++
> | icebergsecured.txt | icebergsecured.secret  |
> +++
> | You might be allowed to see this.  | MASKED BY RANGER FOR SECURITY  |
> +++
> {noformat}
>  * Now let the user create the malicious table exposing the sensitive data:
> {noformat}
> 
> SECURED_META_LOCATION=$(HADOOP_CLIENT_OPTS="-Djline.terminal=jline.UnsupportedTerminal"
>  beeline -e "DESCRIBE FORMATTED default.icebergsecured;" 2>/dev/null |grep 
> metadata_location  |grep -v previous_metadata_location | awk '{print $5}')
> beeline -e "
> DROP TABLE IF EXISTS default.trojanhorse;
> CREATE EXTERNAL TABLE default.trojanhorse (txt string, secret string) STORED 
> BY ICEBERG
> TBLPROPERTIES (
>   'metadata_location'='${SECURED_META_LOCATION}');
> SELECT * FROM default.trojanhorse;
> "
> ++---+
> |  trojanhorse.txt   |trojanhorse.secret |
> ++---+
> | You might be allowed to see this.  | You are not allowed to see this!  |
> ++---+
> {noformat}
>  
> Currently - after HIVE-26707 - the rwstorage authorization only has either 
> the dummy path or the explicit path set for uri:  
> {noformat}
> Permission denied: user [oozie] does not have [RWSTORAGE] privilege on 
> [iceberg://default/trojanhorse?snapshot=%2Fwarehouse%2Ftablespace%2Fexternal%2Fhive%2Ftrojanhorse%2Fmetadata%2Fdummy.metadata.json]
> Permission denied: user [oozie] does not have [RWSTORAGE] privilege on 
> [iceberg://default/trojanhorse?snapshot=%2Fwarehouse%2Ftablespace%2Fexternal%2Fhive%2Ficebergsecured%2Fmetadata%2F1-f4c2a428-30ce-4afd-82ff-d46ecbf02244.metadata.json]
>  
> {noformat}
> This can be used only to decide whether a user is allowed to create 
> iceberg tables 

[jira] [Resolved] (HIVE-27308) Exposing client keystore and truststore passwords in the JDBC URL can be a security concern

2023-05-17 Thread Sai Hemanth Gantasala (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sai Hemanth Gantasala resolved HIVE-27308.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

> Exposing client keystore and truststore passwords in the JDBC URL can be a 
> security concern
> ---
>
> Key: HIVE-27308
> URL: https://issues.apache.org/jira/browse/HIVE-27308
> Project: Hive
>  Issue Type: Improvement
>Reporter: Venugopal Reddy K
>Assignee: Venugopal Reddy K
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> At present, we may have the following keystore and truststore passwords in 
> the JDBC URL.
>  # trustStorePassword
>  # keyStorePassword
>  # zooKeeperTruststorePassword
>  # zooKeeperKeystorePassword
> Exposing these passwords in the URL can be a security concern. We can hide 
> all of them from the JDBC URL by protecting them in a local JCEKS keystore 
> file and passing the JCEKS file in the URL instead.
> 1. Leverage the hadoop credential provider 
> [Link|https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/CredentialProviderAPI.html#Overview].
>  Create aliases for these passwords in a local JCE keystore like below, storing 
> all the passwords in the same JCEKS file.
> {{hadoop credential create *keyStorePassword* -value 
> FDUxmzTxW15xWoaCk6GxLlaoHjnjV9H7iHqCIDxTwoq -provider 
> localjceks://file/tmp/store/client_creds.jceks}}
> 2. Add a new option *storePasswordPath* to the JDBC URL that points to the 
> local JCE keystore file storing the password aliases. When an existing 
> password option is present in the URL, it takes preference and the 
> corresponding alias is not fetched from the local JCEKS; when a password 
> option is absent from the URL, the password is fetched from the local JCEKS.
> JDBC URL may look like: 
> {{beeline -u 
> "jdbc:hive2://kvr-host:10001/default;retries=5;ssl=true;sslTrustStore=/tmp/truststore.jks;transportMode=http;httpPath=cliservice;twoWay=true;sslKeyStore=/tmp/keystore.jks;{*}storePasswordPath=localjceks://file/tmp/client_creds.jceks;{*}"}}
> 3. Hive JDBC can fetch the passwords with 
> [Configuration.getPassword|https://hadoop.apache.org/docs/stable/api/org/apache/hadoop/conf/Configuration.html#getPassword-java.lang.String-]
>  API
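
A minimal sketch of that lookup on the driver side, assuming the proposed 
*storePasswordPath* value is a Hadoop credential-provider URI (only 
Configuration.getPassword and the provider-path key are existing API; the 
helper itself is illustrative):
{code:java}
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;

public class JceksPasswordLookupSketch {
  // Resolves a password alias (e.g. "trustStorePassword") from the JCEKS
  // file referenced by the storePasswordPath URL option.
  static String resolvePassword(String jceksPath, String alias) throws IOException {
    Configuration conf = new Configuration(false);
    conf.set("hadoop.security.credential.provider.path", jceksPath);
    char[] pw = conf.getPassword(alias);
    return pw == null ? null : new String(pw);
  }
}
{code}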



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-27308) Exposing client keystore and truststore passwords in the JDBC URL can be a security concern

2023-05-17 Thread Sai Hemanth Gantasala (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17723538#comment-17723538
 ] 

Sai Hemanth Gantasala commented on HIVE-27308:
--

[~VenuReddy] - Thanks for your contribution. The patch has been merged into the 
master branch.

> Exposing client keystore and truststore passwords in the JDBC URL can be a 
> security concern
> ---
>
> Key: HIVE-27308
> URL: https://issues.apache.org/jira/browse/HIVE-27308
> Project: Hive
>  Issue Type: Improvement
>Reporter: Venugopal Reddy K
>Assignee: Venugopal Reddy K
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> At present, we may have the following keystore and truststore passwords in 
> the JDBC URL.
>  # trustStorePassword
>  # keyStorePassword
>  # zooKeeperTruststorePassword
>  # zooKeeperKeystorePassword
> Exposing these passwords in the URL can be a security concern. We can hide 
> all of them from the JDBC URL by protecting them in a local JCEKS keystore 
> file and passing the JCEKS file in the URL instead.
> 1. Leverage the hadoop credential provider 
> [Link|https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/CredentialProviderAPI.html#Overview].
>  Create aliases for these passwords in a local JCE keystore like below, storing 
> all the passwords in the same JCEKS file.
> {{hadoop credential create *keyStorePassword* -value 
> FDUxmzTxW15xWoaCk6GxLlaoHjnjV9H7iHqCIDxTwoq -provider 
> localjceks://file/tmp/store/client_creds.jceks}}
> 2. Add a new option *storePasswordPath* to the JDBC URL that points to the 
> local JCE keystore file storing the password aliases. When an existing 
> password option is present in the URL, it takes preference and the 
> corresponding alias is not fetched from the local JCEKS; when a password 
> option is absent from the URL, the password is fetched from the local JCEKS.
> JDBC URL may look like: 
> {{beeline -u 
> "jdbc:hive2://kvr-host:10001/default;retries=5;ssl=true;sslTrustStore=/tmp/truststore.jks;transportMode=http;httpPath=cliservice;twoWay=true;sslKeyStore=/tmp/keystore.jks;{*}storePasswordPath=localjceks://file/tmp/client_creds.jceks;{*}"}}
> 3. Hive JDBC can fetch the passwords with 
> [Configuration.getPassword|https://hadoop.apache.org/docs/stable/api/org/apache/hadoop/conf/Configuration.html#getPassword-java.lang.String-]
>  API



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (HIVE-27326) Hive Authorizer not receiving resource information for few alter queries causing authorization check to fail

2023-05-17 Thread Riju Trivedi (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Riju Trivedi reassigned HIVE-27326:
---

Assignee: Riju Trivedi

> Hive Authorizer not receiving resource information for few alter queries 
> causing authorization check to fail
> 
>
> Key: HIVE-27326
> URL: https://issues.apache.org/jira/browse/HIVE-27326
> Project: Hive
>  Issue Type: Bug
>  Components: Authorization
>Affects Versions: 3.1.2
>Reporter: Jai Patel
>Assignee: Riju Trivedi
>Priority: Major
>
> We have a Ranger plugin implemented for HiveService which uses the hook 
> provided by the HiveService, i.e. the "{*}checkPrivileges{*}" method in 
> "org.apache.hadoop.hive.ql.security.authorization.plugin.HiveAuthorizer.java" 
> - 
> [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/security/authorization/plugin/HiveAuthorizer.java#L163].
> We do authorization based on the information provided in the *inputObjs* and 
> *outputObjs* parameters. 
> This *works fine* for the normal alter query like -
> {code:java}
> ALTER TABLE hr ADD COLUMNS (country VARCHAR(255)){code}
> Logs -
> {code:java}
> 2023-05-08T14:31:40,505 DEBUG [c85f84fd-85d6-4e1a-ae72-ea07323e1a93 
> HiveServer2-Handler-Pool: Thread-90] 
> ranger.authorization.hive.authorizer.RangerHiveAuthorizer: 
> 'checkPrivileges':{'hiveOpType':ALTERTABLE_ADDCOLS, 
> 'inputHObjs':['HivePrivilegeObject':{'type':TABLE_OR_VIEW, 'dbName':test, 
> 'objectType':TABLE_OR_VIEW, 'objectName':hr, 'columns':[], 'partKeys':[], 
> 'commandParams':[], 'actionType':OTHER}], 
> 'outputHObjs':['HivePrivilegeObject':{'type':TABLE_OR_VIEW, 'dbName':test, 
> 'objectType':TABLE_OR_VIEW, 'objectName':hr, 'columns':[], 'partKeys':[], 
> 'commandParams':[], 'actionType':OTHER}], 
> 'context':{'clientType':HIVESERVER2, 'commandString':ALTER TABLE hr ADD 
> COLUMNS (country VARCHAR(255)), 'ipAddress':172.18.0.1, 
> 'forwardedAddresses':null, 
> 'sessionString':c85f84fd-85d6-4e1a-ae72-ea07323e1a93}, 'user':root, 
> 'groups':[root]}
> {code}
>  
> *But for the alter queries below, we are not getting the db and table 
> information:*
> Query 1 -
> {code:java}
> ALTER TABLE hr ADD CONSTRAINT unique_key_const UNIQUE (c0) DISABLE 
> NOVALIDATE;{code}
> LOGS -
> {code:java}
> 2023-05-08T12:14:22,502 DEBUG [c0c66e4e-3014-4258-8e1a-7b689c2fbe6d 
> HiveServer2-Handler-Pool: Thread-90] 
> ranger.authorization.hive.authorizer.RangerHiveAuthorizer: 
> 'checkPrivileges':{'hiveOpType':ALTERTABLE_ADDCONSTRAINT, 'inputHObjs':[], 
> 'outputHObjs':[], 'context':{'clientType':HIVESERVER2, 'commandString':ALTER 
> TABLE hr ADD CONSTRAINT unique_key_const1 UNIQUE (c0) DISABLE NOVALIDATE, 
> 'ipAddress':172.18.0.1, 'forwardedAddresses':null, 'sessionString':c0c66{code}
> Query 2 -
> {code:java}
> ALTER TABLE temp PARTITION (c1=1) COMPACT 'minor';{code}
> Logs -
> {code:java}
> 2023-05-08T12:16:30,595 DEBUG [c0c66e4e-3014-4258-8e1a-7b689c2fbe6d 
> HiveServer2-Handler-Pool: Thread-90] 
> ranger.authorization.hive.authorizer.RangerHiveAuthorizer: 
> 'checkPrivileges':{'hiveOpType':ALTERTABLE_COMPACT, 'inputHObjs':[], 
> 'outputHObjs':[], 'context':
> {'clientType':HIVESERVER2, 'commandString':ALTER TABLE temp PARTITION (c1=1) 
> COMPACT 'minor', 'ipAddress':172.18.0.1, 'forwardedAddresses':null, 
> 'sessionString':c0c66e4e-3014-4258-8e1a-7b689c2fbe6d}
> , 'user':root, 'groups':[root]}
> {code}
>  
>  
> As you can see in the logs, we are getting empty inputHObjs and outputHObjs 
> for ALTER TABLE ADD CONSTRAINT and ALTER TABLE ... COMPACT, whereas ALTER 
> TABLE ADD COLUMNS populates them and hence works fine.
> Can we fix this so that proper authorization can be performed on these queries?
>  
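
A minimal sketch of why empty input/output objects break resource-based 
authorization; the plugin class is illustrative, only the checkPrivileges 
signature follows the hook referenced above:
{code:java}
import java.util.List;
import org.apache.hadoop.hive.ql.security.authorization.plugin.HiveAccessControlException;
import org.apache.hadoop.hive.ql.security.authorization.plugin.HiveAuthzContext;
import org.apache.hadoop.hive.ql.security.authorization.plugin.HiveOperationType;
import org.apache.hadoop.hive.ql.security.authorization.plugin.HivePrivilegeObject;

public class ExampleResourceAuthorizer {
  public void checkPrivileges(HiveOperationType opType,
                              List<HivePrivilegeObject> inputHObjs,
                              List<HivePrivilegeObject> outputHObjs,
                              HiveAuthzContext context) throws HiveAccessControlException {
    if (inputHObjs.isEmpty() && outputHObjs.isEmpty()) {
      // ALTERTABLE_ADDCONSTRAINT / ALTERTABLE_COMPACT arrive like this today:
      // with no db/table objects there is nothing to match a policy against.
      throw new HiveAccessControlException("No resource objects for " + opType);
    }
    // ... evaluate each HivePrivilegeObject against the policy store ...
  }
}
{code}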



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HIVE-27353) Show saved snapshot of materialized view source tables

2023-05-17 Thread Krisztian Kasa (Jira)
Krisztian Kasa created HIVE-27353:
-

 Summary: Show saved snapshot of materialized view source tables
 Key: HIVE-27353
 URL: https://issues.apache.org/jira/browse/HIVE-27353
 Project: Hive
  Issue Type: Improvement
  Components: Materialized views
Reporter: Krisztian Kasa
Assignee: Krisztian Kasa


HIVE-25745 introduced a new section into
{code:java}
DESCRIBE FORMATTED <table_name>;
{code}
output:
{code:java}
# Materialized View Source table information 
Table name  I/U/D since last rebuild 
hive.default.src_txn0/0/0
hive.default.src_txn_2  0/0/0
{code}
Unfortunately, transactional stats are not reliable because such stats are 
supposed to be saved along with basic stats.
If something blocks saving the stats, like
{code:java}
set hive.stats.autogather=false;
{code}
basic stats can still be refreshed using
{code:java}
analyze table <table_name> compute statistics;
{code}
but this won't collect and update transactional stats, since the number of rows 
affected by recent transactions is no longer available and cannot be calculated.

 

The goal of this jira is to print the saved snapshot information of each source 
table instead (see the mock-up after this list):
* writeId in case of native acid tables
* snapshotId in case of iceberg tables
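
A hypothetical mock-up of the proposed section (values and layout are 
illustrative only):
{code:java}
# Materialized View Source table information 
Table name              Saved snapshot 
hive.default.src_txn    writeId=42 
hive.default.ice_src    snapshotId=8574963519345671234 
{code}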
 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-27351) Hive metastore metrics is not getting initialised even if metrics is enabled

2023-05-17 Thread Mohammad Arshad (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17723444#comment-17723444
 ] 

Mohammad Arshad commented on HIVE-27351:


HIVE-19989 fixed this issue in the master branch.

> Hive metastore metrics is not getting initialised even if metrics is enabled
> 
>
> Key: HIVE-27351
> URL: https://issues.apache.org/jira/browse/HIVE-27351
> Project: Hive
>  Issue Type: Bug
>  Components: Standalone Metastore
>Reporter: Mohammad Arshad
>Assignee: Mohammad Arshad
>Priority: Major
>
> Configured the following two properties in hivemetastore-site.xml:
> {noformat}
> <property>
>   <name>hive.metastore.metrics.enabled</name>
>   <value>true</value>
> </property>
> <property>
>   <name>hive.service.metrics.hadoop2.component</name>
>   <value>hivemetastore</value>
> </property>
> {noformat}
> hadoop-metrics2-hivemetastore.properties is available in the configs, but the 
> hivemetastore metrics system is not initialised and is not emitting metrics.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HIVE-27350) Unable to Create Table with 380+ Columns

2023-05-17 Thread Karthik (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik resolved HIVE-27350.

Target Version/s: 3.1.2
Assignee: Karthik
  Resolution: Fixed

This is not an issue and can be ignored. The exception is caused by a duplicate 
column in the Storage Descriptor.
We were able to find this only when we upgraded to 4.0.0-alpha-2, where the 
exception was clearer since COLUMN_NAME is unique in the COLUMNS_V2 table.
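
A small illustrative check (not Hive code) for spotting the root cause 
described above, assuming the thrift StorageDescriptor/FieldSchema API:
{code:java}
import java.util.HashSet;
import java.util.Set;
import org.apache.hadoop.hive.metastore.api.FieldSchema;
import org.apache.hadoop.hive.metastore.api.StorageDescriptor;

public class DuplicateColumnCheckSketch {
  // Returns the column names that appear more than once in the Storage
  // Descriptor; these violate the COLUMNS_V2 unique key on insert.
  static Set<String> duplicateColumns(StorageDescriptor sd) {
    Set<String> seen = new HashSet<>();
    Set<String> dups = new HashSet<>();
    for (FieldSchema col : sd.getCols()) {
      if (!seen.add(col.getName().toLowerCase())) {
        dups.add(col.getName());
      }
    }
    return dups;
  }
}
{code}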

> Unable to Create Table with 380+ Columns
> 
>
> Key: HIVE-27350
> URL: https://issues.apache.org/jira/browse/HIVE-27350
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore, Standalone Metastore
>Affects Versions: 3.1.2
> Environment: Production
>Reporter: Karthik
>Assignee: Karthik
>Priority: Blocker
>
> When we try to create an Iceberg table via Hive MetaStore with 388 columns, 
> we get the below exception:
> {code:java}
> org.apache.hadoop.hive.metastore.api.MetaException: Add request failed : INSERT INTO `COLUMNS_V2` (`CD_ID`,`COMMENT`,`COLUMN_NAME`,`TYPE_NAME`,`INTEGER_IDX`) VALUES (?,?,?,?,?)
>   at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$create_table_with_environment_context_result$create_table_with_environment_context_resultStandardScheme.read(ThriftHiveMetastore.java:54908) ~[hive-standalone-metastore-3.1.3.jar:3.1.3]
>   at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$create_table_with_environment_context_result$create_table_with_environment_context_resultStandardScheme.read(ThriftHiveMetastore.java:54876) ~[hive-standalone-metastore-3.1.3.jar:3.1.3]
>   at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$create_table_with_environment_context_result.read(ThriftHiveMetastore.java:54802) ~[hive-standalone-metastore-3.1.3.jar:3.1.3]
>   at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:86) ~[libthrift-0.9.3.jar:0.9.3]
>   at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_create_table_with_environment_context(ThriftHiveMetastore.java:1556) ~[hive-standalone-metastore-3.1.3.jar:3.1.3]
>   at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.create_table_with_environment_context(ThriftHiveMetastore.java:1542) ~[hive-standalone-metastore-3.1.3.jar:3.1.3]
>   at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.create_table_with_environment_context(HiveMetaStoreClient.java:2867) ~[hive-standalone-metastore-3.1.3.jar:3.1.3]
>   at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.createTable(HiveMetaStoreClient.java:837) ~[hive-standalone-metastore-3.1.3.jar:3.1.3]
>   at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.createTable(HiveMetaStoreClient.java:822) ~[hive-standalone-metastore-3.1.3.jar:3.1.3]
>   at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:?]
>   at jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77) ~[?:?]
>   at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:?]
>   at java.lang.reflect.Method.invoke(Method.java:568) ~[?:?]
>   at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:208) ~[hive-standalone-metastore-3.1.3.jar:3.1.3]
>   at jdk.proxy2.$Proxy20.createTable(Unknown Source) ~[?:?]
>   at org.apache.iceberg.hive.HiveTableOperations.lambda$persistTable$4(HiveTableOperations.java:405) ~[iceberg-spark-runtime-3.2_2.12-1.1.0.jar:?]
>   at org.apache.iceberg.ClientPoolImpl.run(ClientPoolImpl.java:58) ~[iceberg-spark-runtime-3.2_2.12-1.1.0.jar:?]
>   at org.apache.iceberg.ClientPoolImpl.run(ClientPoolImpl.java:51) ~[iceberg-spark-runtime-3.2_2.12-1.1.0.jar:?]
>   at org.apache.iceberg.hive.CachedClientPool.run(CachedClientPool.java:82) ~[iceberg-spark-runtime-3.2_2.12-1.1.0.jar:?]
>   at org.apache.iceberg.hive.HiveTableOperations.persistTable(HiveTableOperations.java:403) ~[iceberg-spark-runtime-3.2_2.12-1.1.0.jar:?]
>   at org.apache.iceberg.hive.HiveTableOperations.doCommit(HiveTableOperations.java:327) ~[iceberg-spark-runtime-3.2_2.12-1.1.0.jar:?]
>   at org.apache.iceberg.BaseMetastoreTableOperations.commit(BaseMetastoreTableOperations.java:135) ~[iceberg-spark-runtime-3.2_2.12-1.1.0.jar:?]
>   at org.apache.iceberg.BaseMetastoreCatalog$BaseMetastoreCatalogTableBuilder.create(BaseMetastoreCatalog.java:196) ~[iceberg-spark-runtime-3.2_2.12-1.1.0.jar:?]
> {code}
> When we reduce the column count to 380, the table is created. Our metastore 
> DB is Postgres.
> We validated the SD_PARAMS and SERDE_PARAMS tables; both have PARAM_VALUE as 
> a TEXT datatype.
> This is specific to the COLUMNS_V2 table. Changed TYPE_NAME to TEXT type as 

[jira] [Commented] (HIVE-27329) Document usage of the image

2023-05-17 Thread Zhihua Deng (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17723398#comment-17723398
 ] 

Zhihua Deng commented on HIVE-27329:


Thank you for the contribution, [~simhadri-g]!

> Document usage of the image
> ---
>
> Key: HIVE-27329
> URL: https://issues.apache.org/jira/browse/HIVE-27329
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Zhihua Deng
>Assignee: Simhadri Govindappa
>Priority: Major
> Fix For: 4.0.0
>
>
> After we pushed the image to docker hub, it would be good to update 
> https://cwiki.apache.org/confluence/display/Hive/GettingStarted for using the 
> image.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HIVE-27330) Compaction entry dequeue order

2023-05-17 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-27330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Végh resolved HIVE-27330.

Resolution: Fixed

Merged to master. [~kokila19], [~kkasa], [~InvisibleProgrammer], Attila 
Turoczy, thanks for the review!

> Compaction entry dequeue order
> --
>
> Key: HIVE-27330
> URL: https://issues.apache.org/jira/browse/HIVE-27330
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Reporter: László Végh
>Assignee: László Végh
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Compaction entries are currently dequeued in an unordered way, which makes the 
> actual ordering DB dependent. The dequeue should be done in an ordered way: 
> ascending by the compaction id.
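
A hedged sketch of the intended selection order; table and column names follow 
the standard metastore schema, while the state literal and the helper itself 
are illustrative:
{code:java}
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class OrderedDequeueSketch {
  // Picks the next compaction entry deterministically: lowest id first,
  // instead of whatever order the backing DB happens to return.
  static Long nextCompactionId(Connection metastoreDb) throws SQLException {
    String sql = "SELECT \"CQ_ID\" FROM \"COMPACTION_QUEUE\""
        + " WHERE \"CQ_STATE\" = 'i' ORDER BY \"CQ_ID\" ASC";
    try (PreparedStatement ps = metastoreDb.prepareStatement(sql);
         ResultSet rs = ps.executeQuery()) {
      return rs.next() ? rs.getLong(1) : null;
    }
  }
}
{code}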



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HIVE-27352) Support both LDAP and Kerberos Auth in HS2

2023-05-17 Thread Zhihua Deng (Jira)
Zhihua Deng created HIVE-27352:
--

 Summary: Support both LDAP and Kerberos Auth in HS2
 Key: HIVE-27352
 URL: https://issues.apache.org/jira/browse/HIVE-27352
 Project: Hive
  Issue Type: Improvement
  Components: HiveServer2
Reporter: Zhihua Deng
Assignee: Zhihua Deng


Currently, HS2 supports a single form of auth in binary mode, and only a 
limited set of multiple auth methods in http mode. Some analysis tools based on 
Hive JDBC support mixing both Kerberos and LDAP, but HS2 does not offer this 
combined auth type in either http or binary mode.
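
For context, a deployment currently selects one mechanism via a single property 
(sketch; supporting both would mean accepting either mechanism behind the same 
endpoint):
{noformat}
<!-- hive-site.xml: today's single-mechanism selection -->
<property>
  <name>hive.server2.authentication</name>
  <value>KERBEROS</value>  <!-- or LDAP, but not both at once -->
</property>
{noformat}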



--
This message was sent by Atlassian Jira
(v8.20.10#820010)