[jira] [Work logged] (HIVE-26157) Change Iceberg storage handler authz URI to metadata location
[ https://issues.apache.org/jira/browse/HIVE-26157?focusedWorklogId=770038=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-770038 ] ASF GitHub Bot logged work on HIVE-26157: - Author: ASF GitHub Bot Created on: 13/May/22 06:44 Start Date: 13/May/22 06:44 Worklog Time Spent: 10m Work Description: lcspinter merged PR #3226: URL: https://github.com/apache/hive/pull/3226 Issue Time Tracking --- Worklog Id: (was: 770038) Time Spent: 4h 20m (was: 4h 10m) > Change Iceberg storage handler authz URI to metadata location > - > > Key: HIVE-26157 > URL: https://issues.apache.org/jira/browse/HIVE-26157 > Project: Hive > Issue Type: Improvement >Reporter: László Pintér >Assignee: László Pintér >Priority: Major > Labels: pull-request-available > Time Spent: 4h 20m > Remaining Estimate: 0h > > In HIVE-25964, the authz URI has been changed to "iceberg://db.table". > It is possible to set the metadata pointers of table A to point to table B, > and therefore you could read table B's data via querying table A. > {code:sql} > alter table A set tblproperties > ('metadata_location'='/path/to/B/snapshot.json', > 'previous_metadata_location'='/path/to/B/prev_snapshot.json'); {code} -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Work logged] (HIVE-26157) Change Iceberg storage handler authz URI to metadata location
[ https://issues.apache.org/jira/browse/HIVE-26157?focusedWorklogId=769026=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-769026 ] ASF GitHub Bot logged work on HIVE-26157: - Author: ASF GitHub Bot Created on: 11/May/22 12:01 Start Date: 11/May/22 12:01 Worklog Time Spent: 10m Work Description: lcspinter commented on code in PR #3226: URL: https://github.com/apache/hive/pull/3226#discussion_r870214863 ## iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java: ## @@ -457,9 +459,24 @@ public boolean isValidMetadataTable(String metaTableName) { public URI getURIForAuth(org.apache.hadoop.hive.metastore.api.Table hmsTable) throws URISyntaxException { String dbName = hmsTable.getDbName(); String tableName = hmsTable.getTableName(); -return new URI(ICEBERG_URI_PREFIX + dbName + "/" + tableName); +StringBuilder authURI = new StringBuilder(ICEBERG_URI_PREFIX).append(dbName).append("/").append(tableName) +.append("?snapshot="); +Optional locationProperty = SessionStateUtil.getProperty(conf, hive_metastoreConstants.META_TABLE_LOCATION); +if (locationProperty.isPresent()) { + Preconditions.checkArgument(locationProperty.get() != null, Review Comment: I could remove it, but I think it doesn't hurt if we have an additional check validating the location. Issue Time Tracking --- Worklog Id: (was: 769026) Time Spent: 4h 10m (was: 4h) > Change Iceberg storage handler authz URI to metadata location > - > > Key: HIVE-26157 > URL: https://issues.apache.org/jira/browse/HIVE-26157 > Project: Hive > Issue Type: Improvement >Reporter: László Pintér >Assignee: László Pintér >Priority: Major > Labels: pull-request-available > Time Spent: 4h 10m > Remaining Estimate: 0h > > In HIVE-25964, the authz URI has been changed to "iceberg://db.table". > It is possible to set the metadata pointers of table A to point to table B, > and therefore you could read table B's data via querying table A. > {code:sql} > alter table A set tblproperties > ('metadata_location'='/path/to/B/snapshot.json', > 'previous_metadata_location'='/path/to/B/prev_snapshot.json'); {code} -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Work logged] (HIVE-26157) Change Iceberg storage handler authz URI to metadata location
[ https://issues.apache.org/jira/browse/HIVE-26157?focusedWorklogId=769009=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-769009 ] ASF GitHub Bot logged work on HIVE-26157: - Author: ASF GitHub Bot Created on: 11/May/22 11:38 Start Date: 11/May/22 11:38 Worklog Time Spent: 10m Work Description: pvary commented on code in PR #3226: URL: https://github.com/apache/hive/pull/3226#discussion_r870195167 ## iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java: ## @@ -457,9 +459,24 @@ public boolean isValidMetadataTable(String metaTableName) { public URI getURIForAuth(org.apache.hadoop.hive.metastore.api.Table hmsTable) throws URISyntaxException { String dbName = hmsTable.getDbName(); String tableName = hmsTable.getTableName(); -return new URI(ICEBERG_URI_PREFIX + dbName + "/" + tableName); +StringBuilder authURI = new StringBuilder(ICEBERG_URI_PREFIX).append(dbName).append("/").append(tableName) +.append("?snapshot="); +Optional locationProperty = SessionStateUtil.getProperty(conf, hive_metastoreConstants.META_TABLE_LOCATION); +if (locationProperty.isPresent()) { + Preconditions.checkArgument(locationProperty.get() != null, Review Comment: Shall we remove it then? Issue Time Tracking --- Worklog Id: (was: 769009) Time Spent: 4h (was: 3h 50m) > Change Iceberg storage handler authz URI to metadata location > - > > Key: HIVE-26157 > URL: https://issues.apache.org/jira/browse/HIVE-26157 > Project: Hive > Issue Type: Improvement >Reporter: László Pintér >Assignee: László Pintér >Priority: Major > Labels: pull-request-available > Time Spent: 4h > Remaining Estimate: 0h > > In HIVE-25964, the authz URI has been changed to "iceberg://db.table". > It is possible to set the metadata pointers of table A to point to table B, > and therefore you could read table B's data via querying table A. > {code:sql} > alter table A set tblproperties > ('metadata_location'='/path/to/B/snapshot.json', > 'previous_metadata_location'='/path/to/B/prev_snapshot.json'); {code} -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Work logged] (HIVE-26157) Change Iceberg storage handler authz URI to metadata location
[ https://issues.apache.org/jira/browse/HIVE-26157?focusedWorklogId=768925=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-768925 ] ASF GitHub Bot logged work on HIVE-26157: - Author: ASF GitHub Bot Created on: 11/May/22 09:26 Start Date: 11/May/22 09:26 Worklog Time Spent: 10m Work Description: lcspinter commented on code in PR #3226: URL: https://github.com/apache/hive/pull/3226#discussion_r870076272 ## iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java: ## @@ -457,9 +459,24 @@ public boolean isValidMetadataTable(String metaTableName) { public URI getURIForAuth(org.apache.hadoop.hive.metastore.api.Table hmsTable) throws URISyntaxException { String dbName = hmsTable.getDbName(); String tableName = hmsTable.getTableName(); -return new URI(ICEBERG_URI_PREFIX + dbName + "/" + tableName); +StringBuilder authURI = new StringBuilder(ICEBERG_URI_PREFIX).append(dbName).append("/").append(tableName) +.append("?snapshot="); +Optional locationProperty = SessionStateUtil.getProperty(conf, hive_metastoreConstants.META_TABLE_LOCATION); +if (locationProperty.isPresent()) { + Preconditions.checkArgument(locationProperty.get() != null, Review Comment: I believe this shouldn't happen anymore. Before my fix in `SessionStateUtil#getQueryState()` could have returned null values. Issue Time Tracking --- Worklog Id: (was: 768925) Time Spent: 3h 50m (was: 3h 40m) > Change Iceberg storage handler authz URI to metadata location > - > > Key: HIVE-26157 > URL: https://issues.apache.org/jira/browse/HIVE-26157 > Project: Hive > Issue Type: Improvement >Reporter: László Pintér >Assignee: László Pintér >Priority: Major > Labels: pull-request-available > Time Spent: 3h 50m > Remaining Estimate: 0h > > In HIVE-25964, the authz URI has been changed to "iceberg://db.table". > It is possible to set the metadata pointers of table A to point to table B, > and therefore you could read table B's data via querying table A. > {code:sql} > alter table A set tblproperties > ('metadata_location'='/path/to/B/snapshot.json', > 'previous_metadata_location'='/path/to/B/prev_snapshot.json'); {code} -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Work logged] (HIVE-26157) Change Iceberg storage handler authz URI to metadata location
[ https://issues.apache.org/jira/browse/HIVE-26157?focusedWorklogId=768867=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-768867 ] ASF GitHub Bot logged work on HIVE-26157: - Author: ASF GitHub Bot Created on: 11/May/22 07:39 Start Date: 11/May/22 07:39 Worklog Time Spent: 10m Work Description: pvary commented on code in PR #3226: URL: https://github.com/apache/hive/pull/3226#discussion_r869971298 ## iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java: ## @@ -457,9 +459,24 @@ public boolean isValidMetadataTable(String metaTableName) { public URI getURIForAuth(org.apache.hadoop.hive.metastore.api.Table hmsTable) throws URISyntaxException { String dbName = hmsTable.getDbName(); String tableName = hmsTable.getTableName(); -return new URI(ICEBERG_URI_PREFIX + dbName + "/" + tableName); +StringBuilder authURI = new StringBuilder(ICEBERG_URI_PREFIX).append(dbName).append("/").append(tableName) +.append("?snapshot="); +Optional locationProperty = SessionStateUtil.getProperty(conf, hive_metastoreConstants.META_TABLE_LOCATION); +if (locationProperty.isPresent()) { + Preconditions.checkArgument(locationProperty.get() != null, + "Table location is not set " + "in SessionState. Authorization URI cannot be supplied."); + // this property is set during the create operation before the hive table was created + // we are returning a dummy iceberg metadata file + authURI.append(URI.create(locationProperty.get()).getPath()).append("/metadata/dummy.metadata.json"); + return new URI(HiveConf.EncoderDecoderFactory.URL_ENCODER_DECODER.encode(authURI.toString())); +} Review Comment: nit: maybe use an `else`? I usually try to avoid using return in the middle of a method and using the return statement in place of else statements. ``` if () { return... } else { return... } ``` For me this is easier to read. What do you think? Issue Time Tracking --- Worklog Id: (was: 768867) Time Spent: 3h 40m (was: 3.5h) > Change Iceberg storage handler authz URI to metadata location > - > > Key: HIVE-26157 > URL: https://issues.apache.org/jira/browse/HIVE-26157 > Project: Hive > Issue Type: Improvement >Reporter: László Pintér >Assignee: László Pintér >Priority: Major > Labels: pull-request-available > Time Spent: 3h 40m > Remaining Estimate: 0h > > In HIVE-25964, the authz URI has been changed to "iceberg://db.table". > It is possible to set the metadata pointers of table A to point to table B, > and therefore you could read table B's data via querying table A. > {code:sql} > alter table A set tblproperties > ('metadata_location'='/path/to/B/snapshot.json', > 'previous_metadata_location'='/path/to/B/prev_snapshot.json'); {code} -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Work logged] (HIVE-26157) Change Iceberg storage handler authz URI to metadata location
[ https://issues.apache.org/jira/browse/HIVE-26157?focusedWorklogId=768865=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-768865 ] ASF GitHub Bot logged work on HIVE-26157: - Author: ASF GitHub Bot Created on: 11/May/22 07:37 Start Date: 11/May/22 07:37 Worklog Time Spent: 10m Work Description: pvary commented on code in PR #3226: URL: https://github.com/apache/hive/pull/3226#discussion_r869968879 ## iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java: ## @@ -457,9 +459,24 @@ public boolean isValidMetadataTable(String metaTableName) { public URI getURIForAuth(org.apache.hadoop.hive.metastore.api.Table hmsTable) throws URISyntaxException { String dbName = hmsTable.getDbName(); String tableName = hmsTable.getTableName(); -return new URI(ICEBERG_URI_PREFIX + dbName + "/" + tableName); +StringBuilder authURI = new StringBuilder(ICEBERG_URI_PREFIX).append(dbName).append("/").append(tableName) +.append("?snapshot="); +Optional locationProperty = SessionStateUtil.getProperty(conf, hive_metastoreConstants.META_TABLE_LOCATION); +if (locationProperty.isPresent()) { + Preconditions.checkArgument(locationProperty.get() != null, Review Comment: How can this happen? Issue Time Tracking --- Worklog Id: (was: 768865) Time Spent: 3h 20m (was: 3h 10m) > Change Iceberg storage handler authz URI to metadata location > - > > Key: HIVE-26157 > URL: https://issues.apache.org/jira/browse/HIVE-26157 > Project: Hive > Issue Type: Improvement >Reporter: László Pintér >Assignee: László Pintér >Priority: Major > Labels: pull-request-available > Time Spent: 3h 20m > Remaining Estimate: 0h > > In HIVE-25964, the authz URI has been changed to "iceberg://db.table". > It is possible to set the metadata pointers of table A to point to table B, > and therefore you could read table B's data via querying table A. > {code:sql} > alter table A set tblproperties > ('metadata_location'='/path/to/B/snapshot.json', > 'previous_metadata_location'='/path/to/B/prev_snapshot.json'); {code} -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Work logged] (HIVE-26157) Change Iceberg storage handler authz URI to metadata location
[ https://issues.apache.org/jira/browse/HIVE-26157?focusedWorklogId=768866=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-768866 ] ASF GitHub Bot logged work on HIVE-26157: - Author: ASF GitHub Bot Created on: 11/May/22 07:37 Start Date: 11/May/22 07:37 Worklog Time Spent: 10m Work Description: pvary commented on code in PR #3226: URL: https://github.com/apache/hive/pull/3226#discussion_r869969179 ## iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java: ## @@ -457,9 +459,24 @@ public boolean isValidMetadataTable(String metaTableName) { public URI getURIForAuth(org.apache.hadoop.hive.metastore.api.Table hmsTable) throws URISyntaxException { String dbName = hmsTable.getDbName(); String tableName = hmsTable.getTableName(); -return new URI(ICEBERG_URI_PREFIX + dbName + "/" + tableName); +StringBuilder authURI = new StringBuilder(ICEBERG_URI_PREFIX).append(dbName).append("/").append(tableName) +.append("?snapshot="); +Optional locationProperty = SessionStateUtil.getProperty(conf, hive_metastoreConstants.META_TABLE_LOCATION); +if (locationProperty.isPresent()) { + Preconditions.checkArgument(locationProperty.get() != null, + "Table location is not set " + "in SessionState. Authorization URI cannot be supplied."); + // this property is set during the create operation before the hive table was created + // we are returning a dummy iceberg metadata file + authURI.append(URI.create(locationProperty.get()).getPath()).append("/metadata/dummy.metadata.json"); Review Comment: Maybe log at least on debug level? Issue Time Tracking --- Worklog Id: (was: 768866) Time Spent: 3.5h (was: 3h 20m) > Change Iceberg storage handler authz URI to metadata location > - > > Key: HIVE-26157 > URL: https://issues.apache.org/jira/browse/HIVE-26157 > Project: Hive > Issue Type: Improvement >Reporter: László Pintér >Assignee: László Pintér >Priority: Major > Labels: pull-request-available > Time Spent: 3.5h > Remaining Estimate: 0h > > In HIVE-25964, the authz URI has been changed to "iceberg://db.table". > It is possible to set the metadata pointers of table A to point to table B, > and therefore you could read table B's data via querying table A. > {code:sql} > alter table A set tblproperties > ('metadata_location'='/path/to/B/snapshot.json', > 'previous_metadata_location'='/path/to/B/prev_snapshot.json'); {code} -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Work logged] (HIVE-26157) Change Iceberg storage handler authz URI to metadata location
[ https://issues.apache.org/jira/browse/HIVE-26157?focusedWorklogId=766012=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-766012 ] ASF GitHub Bot logged work on HIVE-26157: - Author: ASF GitHub Bot Created on: 04/May/22 14:13 Start Date: 04/May/22 14:13 Worklog Time Spent: 10m Work Description: lcspinter commented on code in PR #3226: URL: https://github.com/apache/hive/pull/3226#discussion_r864883395 ## iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java: ## @@ -456,9 +458,19 @@ public boolean isValidMetadataTable(String metaTableName) { public URI getURIForAuth(org.apache.hadoop.hive.metastore.api.Table hmsTable) throws URISyntaxException { String dbName = hmsTable.getDbName(); String tableName = hmsTable.getTableName(); -return new URI(ICEBERG_URI_PREFIX + dbName + "/" + tableName); +Optional locationProperty = SessionStateUtil.getProperty(conf, hive_metastoreConstants.META_TABLE_LOCATION); +if (locationProperty.isPresent()) { + // this property is set during the create operation before the hive table was created + // we are returning a dummy iceberg metadata file + return new URI(ICEBERG_URI_PREFIX + locationProperty.get() + "/metadata/dummy.metadata.json"); Review Comment: sure, that can be done Issue Time Tracking --- Worklog Id: (was: 766012) Time Spent: 3h 10m (was: 3h) > Change Iceberg storage handler authz URI to metadata location > - > > Key: HIVE-26157 > URL: https://issues.apache.org/jira/browse/HIVE-26157 > Project: Hive > Issue Type: Improvement >Reporter: László Pintér >Assignee: László Pintér >Priority: Major > Labels: pull-request-available > Time Spent: 3h 10m > Remaining Estimate: 0h > > In HIVE-25964, the authz URI has been changed to "iceberg://db.table". > It is possible to set the metadata pointers of table A to point to table B, > and therefore you could read table B's data via querying table A. > {code:sql} > alter table A set tblproperties > ('metadata_location'='/path/to/B/snapshot.json', > 'previous_metadata_location'='/path/to/B/prev_snapshot.json'); {code} -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Work logged] (HIVE-26157) Change Iceberg storage handler authz URI to metadata location
[ https://issues.apache.org/jira/browse/HIVE-26157?focusedWorklogId=765993=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-765993 ] ASF GitHub Bot logged work on HIVE-26157: - Author: ASF GitHub Bot Created on: 04/May/22 13:40 Start Date: 04/May/22 13:40 Worklog Time Spent: 10m Work Description: pvary commented on code in PR #3226: URL: https://github.com/apache/hive/pull/3226#discussion_r864841162 ## iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java: ## @@ -456,9 +458,19 @@ public boolean isValidMetadataTable(String metaTableName) { public URI getURIForAuth(org.apache.hadoop.hive.metastore.api.Table hmsTable) throws URISyntaxException { String dbName = hmsTable.getDbName(); String tableName = hmsTable.getTableName(); -return new URI(ICEBERG_URI_PREFIX + dbName + "/" + tableName); +Optional locationProperty = SessionStateUtil.getProperty(conf, hive_metastoreConstants.META_TABLE_LOCATION); +if (locationProperty.isPresent()) { + // this property is set during the create operation before the hive table was created + // we are returning a dummy iceberg metadata file + return new URI(ICEBERG_URI_PREFIX + locationProperty.get() + "/metadata/dummy.metadata.json"); Review Comment: So a `Precondition` check could be ok here? Issue Time Tracking --- Worklog Id: (was: 765993) Time Spent: 3h (was: 2h 50m) > Change Iceberg storage handler authz URI to metadata location > - > > Key: HIVE-26157 > URL: https://issues.apache.org/jira/browse/HIVE-26157 > Project: Hive > Issue Type: Improvement >Reporter: László Pintér >Assignee: László Pintér >Priority: Major > Labels: pull-request-available > Time Spent: 3h > Remaining Estimate: 0h > > In HIVE-25964, the authz URI has been changed to "iceberg://db.table". > It is possible to set the metadata pointers of table A to point to table B, > and therefore you could read table B's data via querying table A. > {code:sql} > alter table A set tblproperties > ('metadata_location'='/path/to/B/snapshot.json', > 'previous_metadata_location'='/path/to/B/prev_snapshot.json'); {code} -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Work logged] (HIVE-26157) Change Iceberg storage handler authz URI to metadata location
[ https://issues.apache.org/jira/browse/HIVE-26157?focusedWorklogId=765991=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-765991 ] ASF GitHub Bot logged work on HIVE-26157: - Author: ASF GitHub Bot Created on: 04/May/22 13:39 Start Date: 04/May/22 13:39 Worklog Time Spent: 10m Work Description: pvary commented on code in PR #3226: URL: https://github.com/apache/hive/pull/3226#discussion_r864840532 ## iceberg/iceberg-handler/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergStorageHandlerNoScan.java: ## @@ -1411,15 +1411,32 @@ public void testCommandsWithPartitionClauseThrow() { @Test public void testAuthzURI() throws TException, InterruptedException, URISyntaxException { TableIdentifier target = TableIdentifier.of("default", "target"); -testTables.createTable(shell, target.name(), HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA, +Table table = testTables.createTable(shell, target.name(), HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA, PartitionSpec.unpartitioned(), FileFormat.PARQUET, ImmutableList.of()); org.apache.hadoop.hive.metastore.api.Table hmsTable = shell.metastore().getTable(target); HiveIcebergStorageHandler storageHandler = new HiveIcebergStorageHandler(); storageHandler.setConf(shell.getHiveConf()); URI uriForAuth = storageHandler.getURIForAuth(hmsTable); -Assert.assertEquals("iceberg://" + hmsTable.getDbName() + "/" + hmsTable.getTableName(), uriForAuth.toString()); +Assert.assertEquals("iceberg:/" + URI.create(((BaseTable) table).operations().current() +.metadataFileLocation()).getPath(), uriForAuth.toString()); + + } + + @Test + public void testAutzURIWithAuthEnabled() throws TException, InterruptedException, URISyntaxException { Review Comment: Can we check which URL is checked for security on table creation? Maybe with a Mock security handler? Issue Time Tracking --- Worklog Id: (was: 765991) Time Spent: 2h 50m (was: 2h 40m) > Change Iceberg storage handler authz URI to metadata location > - > > Key: HIVE-26157 > URL: https://issues.apache.org/jira/browse/HIVE-26157 > Project: Hive > Issue Type: Improvement >Reporter: László Pintér >Assignee: László Pintér >Priority: Major > Labels: pull-request-available > Time Spent: 2h 50m > Remaining Estimate: 0h > > In HIVE-25964, the authz URI has been changed to "iceberg://db.table". > It is possible to set the metadata pointers of table A to point to table B, > and therefore you could read table B's data via querying table A. > {code:sql} > alter table A set tblproperties > ('metadata_location'='/path/to/B/snapshot.json', > 'previous_metadata_location'='/path/to/B/prev_snapshot.json'); {code} -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Work logged] (HIVE-26157) Change Iceberg storage handler authz URI to metadata location
[ https://issues.apache.org/jira/browse/HIVE-26157?focusedWorklogId=765988=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-765988 ] ASF GitHub Bot logged work on HIVE-26157: - Author: ASF GitHub Bot Created on: 04/May/22 13:38 Start Date: 04/May/22 13:38 Worklog Time Spent: 10m Work Description: pvary commented on code in PR #3226: URL: https://github.com/apache/hive/pull/3226#discussion_r864838785 ## ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java: ## @@ -13688,6 +13688,22 @@ ASTNode analyzeCreateTable( // outputs is empty, which means this create table happens in the current // database. rootTasks.add(TaskFactory.get(new DDLWork(getInputs(), getOutputs(), crtTblDesc))); + String tblLocation = null; Review Comment: There is a `Translator` layer in HMS which can change some properties fo the table (not sure which ones), but I remember that this `Translator` was responsible to changing location for the external tables. Does this change able to handle this `Translator`? Issue Time Tracking --- Worklog Id: (was: 765988) Time Spent: 2h 40m (was: 2.5h) > Change Iceberg storage handler authz URI to metadata location > - > > Key: HIVE-26157 > URL: https://issues.apache.org/jira/browse/HIVE-26157 > Project: Hive > Issue Type: Improvement >Reporter: László Pintér >Assignee: László Pintér >Priority: Major > Labels: pull-request-available > Time Spent: 2h 40m > Remaining Estimate: 0h > > In HIVE-25964, the authz URI has been changed to "iceberg://db.table". > It is possible to set the metadata pointers of table A to point to table B, > and therefore you could read table B's data via querying table A. > {code:sql} > alter table A set tblproperties > ('metadata_location'='/path/to/B/snapshot.json', > 'previous_metadata_location'='/path/to/B/prev_snapshot.json'); {code} -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Work logged] (HIVE-26157) Change Iceberg storage handler authz URI to metadata location
[ https://issues.apache.org/jira/browse/HIVE-26157?focusedWorklogId=765986=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-765986 ] ASF GitHub Bot logged work on HIVE-26157: - Author: ASF GitHub Bot Created on: 04/May/22 13:36 Start Date: 04/May/22 13:36 Worklog Time Spent: 10m Work Description: lcspinter commented on code in PR #3226: URL: https://github.com/apache/hive/pull/3226#discussion_r864835997 ## ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java: ## @@ -13688,6 +13688,22 @@ ASTNode analyzeCreateTable( // outputs is empty, which means this create table happens in the current // database. rootTasks.add(TaskFactory.get(new DDLWork(getInputs(), getOutputs(), crtTblDesc))); + String tblLocation = null; Review Comment: I'm not sure I follow you. Could you please elaborate a bit? Issue Time Tracking --- Worklog Id: (was: 765986) Time Spent: 2.5h (was: 2h 20m) > Change Iceberg storage handler authz URI to metadata location > - > > Key: HIVE-26157 > URL: https://issues.apache.org/jira/browse/HIVE-26157 > Project: Hive > Issue Type: Improvement >Reporter: László Pintér >Assignee: László Pintér >Priority: Major > Labels: pull-request-available > Time Spent: 2.5h > Remaining Estimate: 0h > > In HIVE-25964, the authz URI has been changed to "iceberg://db.table". > It is possible to set the metadata pointers of table A to point to table B, > and therefore you could read table B's data via querying table A. > {code:sql} > alter table A set tblproperties > ('metadata_location'='/path/to/B/snapshot.json', > 'previous_metadata_location'='/path/to/B/prev_snapshot.json'); {code} -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Work logged] (HIVE-26157) Change Iceberg storage handler authz URI to metadata location
[ https://issues.apache.org/jira/browse/HIVE-26157?focusedWorklogId=765984=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-765984 ] ASF GitHub Bot logged work on HIVE-26157: - Author: ASF GitHub Bot Created on: 04/May/22 13:32 Start Date: 04/May/22 13:32 Worklog Time Spent: 10m Work Description: lcspinter commented on code in PR #3226: URL: https://github.com/apache/hive/pull/3226#discussion_r864832558 ## iceberg/iceberg-handler/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergStorageHandlerNoScan.java: ## @@ -1411,15 +1411,32 @@ public void testCommandsWithPartitionClauseThrow() { @Test public void testAuthzURI() throws TException, InterruptedException, URISyntaxException { TableIdentifier target = TableIdentifier.of("default", "target"); -testTables.createTable(shell, target.name(), HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA, +Table table = testTables.createTable(shell, target.name(), HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA, PartitionSpec.unpartitioned(), FileFormat.PARQUET, ImmutableList.of()); org.apache.hadoop.hive.metastore.api.Table hmsTable = shell.metastore().getTable(target); HiveIcebergStorageHandler storageHandler = new HiveIcebergStorageHandler(); storageHandler.setConf(shell.getHiveConf()); URI uriForAuth = storageHandler.getURIForAuth(hmsTable); -Assert.assertEquals("iceberg://" + hmsTable.getDbName() + "/" + hmsTable.getTableName(), uriForAuth.toString()); +Assert.assertEquals("iceberg:/" + URI.create(((BaseTable) table).operations().current() +.metadataFileLocation()).getPath(), uriForAuth.toString()); + + } + + @Test + public void testAutzURIWithAuthEnabled() throws TException, InterruptedException, URISyntaxException { Review Comment: I'm not sure how can we test this. We would need to halt the create table operation before the iceberg table is created. Do you have some suggestions? Issue Time Tracking --- Worklog Id: (was: 765984) Time Spent: 2h 20m (was: 2h 10m) > Change Iceberg storage handler authz URI to metadata location > - > > Key: HIVE-26157 > URL: https://issues.apache.org/jira/browse/HIVE-26157 > Project: Hive > Issue Type: Improvement >Reporter: László Pintér >Assignee: László Pintér >Priority: Major > Labels: pull-request-available > Time Spent: 2h 20m > Remaining Estimate: 0h > > In HIVE-25964, the authz URI has been changed to "iceberg://db.table". > It is possible to set the metadata pointers of table A to point to table B, > and therefore you could read table B's data via querying table A. > {code:sql} > alter table A set tblproperties > ('metadata_location'='/path/to/B/snapshot.json', > 'previous_metadata_location'='/path/to/B/prev_snapshot.json'); {code} -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Work logged] (HIVE-26157) Change Iceberg storage handler authz URI to metadata location
[ https://issues.apache.org/jira/browse/HIVE-26157?focusedWorklogId=765983=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-765983 ] ASF GitHub Bot logged work on HIVE-26157: - Author: ASF GitHub Bot Created on: 04/May/22 13:31 Start Date: 04/May/22 13:31 Worklog Time Spent: 10m Work Description: lcspinter commented on code in PR #3226: URL: https://github.com/apache/hive/pull/3226#discussion_r864831363 ## iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java: ## @@ -456,9 +458,19 @@ public boolean isValidMetadataTable(String metaTableName) { public URI getURIForAuth(org.apache.hadoop.hive.metastore.api.Table hmsTable) throws URISyntaxException { String dbName = hmsTable.getDbName(); String tableName = hmsTable.getTableName(); -return new URI(ICEBERG_URI_PREFIX + dbName + "/" + tableName); +Optional locationProperty = SessionStateUtil.getProperty(conf, hive_metastoreConstants.META_TABLE_LOCATION); +if (locationProperty.isPresent()) { + // this property is set during the create operation before the hive table was created + // we are returning a dummy iceberg metadata file + return new URI(ICEBERG_URI_PREFIX + locationProperty.get() + "/metadata/dummy.metadata.json"); Review Comment: yes, we are setting it. That is why I added that additional code change to the SemanticAnalyzer Issue Time Tracking --- Worklog Id: (was: 765983) Time Spent: 2h 10m (was: 2h) > Change Iceberg storage handler authz URI to metadata location > - > > Key: HIVE-26157 > URL: https://issues.apache.org/jira/browse/HIVE-26157 > Project: Hive > Issue Type: Improvement >Reporter: László Pintér >Assignee: László Pintér >Priority: Major > Labels: pull-request-available > Time Spent: 2h 10m > Remaining Estimate: 0h > > In HIVE-25964, the authz URI has been changed to "iceberg://db.table". > It is possible to set the metadata pointers of table A to point to table B, > and therefore you could read table B's data via querying table A. > {code:sql} > alter table A set tblproperties > ('metadata_location'='/path/to/B/snapshot.json', > 'previous_metadata_location'='/path/to/B/prev_snapshot.json'); {code} -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Work logged] (HIVE-26157) Change Iceberg storage handler authz URI to metadata location
[ https://issues.apache.org/jira/browse/HIVE-26157?focusedWorklogId=765982=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-765982 ] ASF GitHub Bot logged work on HIVE-26157: - Author: ASF GitHub Bot Created on: 04/May/22 13:30 Start Date: 04/May/22 13:30 Worklog Time Spent: 10m Work Description: pvary commented on code in PR #3226: URL: https://github.com/apache/hive/pull/3226#discussion_r864830199 ## ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java: ## @@ -13688,6 +13688,22 @@ ASTNode analyzeCreateTable( // outputs is empty, which means this create table happens in the current // database. rootTasks.add(TaskFactory.get(new DDLWork(getInputs(), getOutputs(), crtTblDesc))); + String tblLocation = null; Review Comment: What happens with the location if the Transaltion kicks in? Issue Time Tracking --- Worklog Id: (was: 765982) Time Spent: 2h (was: 1h 50m) > Change Iceberg storage handler authz URI to metadata location > - > > Key: HIVE-26157 > URL: https://issues.apache.org/jira/browse/HIVE-26157 > Project: Hive > Issue Type: Improvement >Reporter: László Pintér >Assignee: László Pintér >Priority: Major > Labels: pull-request-available > Time Spent: 2h > Remaining Estimate: 0h > > In HIVE-25964, the authz URI has been changed to "iceberg://db.table". > It is possible to set the metadata pointers of table A to point to table B, > and therefore you could read table B's data via querying table A. > {code:sql} > alter table A set tblproperties > ('metadata_location'='/path/to/B/snapshot.json', > 'previous_metadata_location'='/path/to/B/prev_snapshot.json'); {code} -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Work logged] (HIVE-26157) Change Iceberg storage handler authz URI to metadata location
[ https://issues.apache.org/jira/browse/HIVE-26157?focusedWorklogId=765981=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-765981 ] ASF GitHub Bot logged work on HIVE-26157: - Author: ASF GitHub Bot Created on: 04/May/22 13:29 Start Date: 04/May/22 13:29 Worklog Time Spent: 10m Work Description: pvary commented on code in PR #3226: URL: https://github.com/apache/hive/pull/3226#discussion_r864829140 ## iceberg/iceberg-handler/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergStorageHandlerNoScan.java: ## @@ -1411,15 +1411,32 @@ public void testCommandsWithPartitionClauseThrow() { @Test public void testAuthzURI() throws TException, InterruptedException, URISyntaxException { TableIdentifier target = TableIdentifier.of("default", "target"); -testTables.createTable(shell, target.name(), HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA, +Table table = testTables.createTable(shell, target.name(), HiveIcebergStorageHandlerTestUtils.CUSTOMER_SCHEMA, PartitionSpec.unpartitioned(), FileFormat.PARQUET, ImmutableList.of()); org.apache.hadoop.hive.metastore.api.Table hmsTable = shell.metastore().getTable(target); HiveIcebergStorageHandler storageHandler = new HiveIcebergStorageHandler(); storageHandler.setConf(shell.getHiveConf()); URI uriForAuth = storageHandler.getURIForAuth(hmsTable); -Assert.assertEquals("iceberg://" + hmsTable.getDbName() + "/" + hmsTable.getTableName(), uriForAuth.toString()); +Assert.assertEquals("iceberg:/" + URI.create(((BaseTable) table).operations().current() +.metadataFileLocation()).getPath(), uriForAuth.toString()); + + } + + @Test + public void testAutzURIWithAuthEnabled() throws TException, InterruptedException, URISyntaxException { Review Comment: Do we need a test for the dummy URI? Issue Time Tracking --- Worklog Id: (was: 765981) Time Spent: 1h 50m (was: 1h 40m) > Change Iceberg storage handler authz URI to metadata location > - > > Key: HIVE-26157 > URL: https://issues.apache.org/jira/browse/HIVE-26157 > Project: Hive > Issue Type: Improvement >Reporter: László Pintér >Assignee: László Pintér >Priority: Major > Labels: pull-request-available > Time Spent: 1h 50m > Remaining Estimate: 0h > > In HIVE-25964, the authz URI has been changed to "iceberg://db.table". > It is possible to set the metadata pointers of table A to point to table B, > and therefore you could read table B's data via querying table A. > {code:sql} > alter table A set tblproperties > ('metadata_location'='/path/to/B/snapshot.json', > 'previous_metadata_location'='/path/to/B/prev_snapshot.json'); {code} -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Work logged] (HIVE-26157) Change Iceberg storage handler authz URI to metadata location
[ https://issues.apache.org/jira/browse/HIVE-26157?focusedWorklogId=765980=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-765980 ] ASF GitHub Bot logged work on HIVE-26157: - Author: ASF GitHub Bot Created on: 04/May/22 13:27 Start Date: 04/May/22 13:27 Worklog Time Spent: 10m Work Description: pvary commented on code in PR #3226: URL: https://github.com/apache/hive/pull/3226#discussion_r864827013 ## iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java: ## @@ -456,9 +458,19 @@ public boolean isValidMetadataTable(String metaTableName) { public URI getURIForAuth(org.apache.hadoop.hive.metastore.api.Table hmsTable) throws URISyntaxException { String dbName = hmsTable.getDbName(); String tableName = hmsTable.getTableName(); -return new URI(ICEBERG_URI_PREFIX + dbName + "/" + tableName); +Optional locationProperty = SessionStateUtil.getProperty(conf, hive_metastoreConstants.META_TABLE_LOCATION); +if (locationProperty.isPresent()) { + // this property is set during the create operation before the hive table was created + // we are returning a dummy iceberg metadata file + return new URI(ICEBERG_URI_PREFIX + locationProperty.get() + "/metadata/dummy.metadata.json"); Review Comment: Do we always set the location? Would it be worthwhile to check this? Issue Time Tracking --- Worklog Id: (was: 765980) Time Spent: 1h 40m (was: 1.5h) > Change Iceberg storage handler authz URI to metadata location > - > > Key: HIVE-26157 > URL: https://issues.apache.org/jira/browse/HIVE-26157 > Project: Hive > Issue Type: Improvement >Reporter: László Pintér >Assignee: László Pintér >Priority: Major > Labels: pull-request-available > Time Spent: 1h 40m > Remaining Estimate: 0h > > In HIVE-25964, the authz URI has been changed to "iceberg://db.table". > It is possible to set the metadata pointers of table A to point to table B, > and therefore you could read table B's data via querying table A. > {code:sql} > alter table A set tblproperties > ('metadata_location'='/path/to/B/snapshot.json', > 'previous_metadata_location'='/path/to/B/prev_snapshot.json'); {code} -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Work logged] (HIVE-26157) Change Iceberg storage handler authz URI to metadata location
[ https://issues.apache.org/jira/browse/HIVE-26157?focusedWorklogId=762153=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-762153 ] ASF GitHub Bot logged work on HIVE-26157: - Author: ASF GitHub Bot Created on: 26/Apr/22 07:22 Start Date: 26/Apr/22 07:22 Worklog Time Spent: 10m Work Description: lcspinter commented on code in PR #3226: URL: https://github.com/apache/hive/pull/3226#discussion_r858366712 ## iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java: ## @@ -450,7 +452,9 @@ public boolean isValidMetadataTable(String metaTableName) { public URI getURIForAuth(org.apache.hadoop.hive.metastore.api.Table hmsTable) throws URISyntaxException { String dbName = hmsTable.getDbName(); String tableName = hmsTable.getTableName(); -return new URI(ICEBERG_URI_PREFIX + dbName + "/" + tableName); +Table table = IcebergTableUtil.getTable(conf, hmsTable); Review Comment: Good point, I will add a change to address this use case as well. Issue Time Tracking --- Worklog Id: (was: 762153) Time Spent: 1.5h (was: 1h 20m) > Change Iceberg storage handler authz URI to metadata location > - > > Key: HIVE-26157 > URL: https://issues.apache.org/jira/browse/HIVE-26157 > Project: Hive > Issue Type: Improvement >Reporter: László Pintér >Assignee: László Pintér >Priority: Major > Labels: pull-request-available > Time Spent: 1.5h > Remaining Estimate: 0h > > In HIVE-25964, the authz URI has been changed to "iceberg://db.table". > It is possible to set the metadata pointers of table A to point to table B, > and therefore you could read table B's data via querying table A. > {code:sql} > alter table A set tblproperties > ('metadata_location'='/path/to/B/snapshot.json', > 'previous_metadata_location'='/path/to/B/prev_snapshot.json'); {code} -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Work logged] (HIVE-26157) Change Iceberg storage handler authz URI to metadata location
[ https://issues.apache.org/jira/browse/HIVE-26157?focusedWorklogId=762147=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-762147 ] ASF GitHub Bot logged work on HIVE-26157: - Author: ASF GitHub Bot Created on: 26/Apr/22 07:15 Start Date: 26/Apr/22 07:15 Worklog Time Spent: 10m Work Description: pvary commented on code in PR #3226: URL: https://github.com/apache/hive/pull/3226#discussion_r858361076 ## iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java: ## @@ -450,7 +452,9 @@ public boolean isValidMetadataTable(String metaTableName) { public URI getURIForAuth(org.apache.hadoop.hive.metastore.api.Table hmsTable) throws URISyntaxException { String dbName = hmsTable.getDbName(); String tableName = hmsTable.getTableName(); -return new URI(ICEBERG_URI_PREFIX + dbName + "/" + tableName); +Table table = IcebergTableUtil.getTable(conf, hmsTable); Review Comment: What happens if we provide the `metadata_location` in the table properties when creating the table? Or do we plan to handle this in a different PR? Issue Time Tracking --- Worklog Id: (was: 762147) Time Spent: 1h 20m (was: 1h 10m) > Change Iceberg storage handler authz URI to metadata location > - > > Key: HIVE-26157 > URL: https://issues.apache.org/jira/browse/HIVE-26157 > Project: Hive > Issue Type: Improvement >Reporter: László Pintér >Assignee: László Pintér >Priority: Major > Labels: pull-request-available > Time Spent: 1h 20m > Remaining Estimate: 0h > > In HIVE-25964, the authz URI has been changed to "iceberg://db.table". > It is possible to set the metadata pointers of table A to point to table B, > and therefore you could read table B's data via querying table A. > {code:sql} > alter table A set tblproperties > ('metadata_location'='/path/to/B/snapshot.json', > 'previous_metadata_location'='/path/to/B/prev_snapshot.json'); {code} -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Work logged] (HIVE-26157) Change Iceberg storage handler authz URI to metadata location
[ https://issues.apache.org/jira/browse/HIVE-26157?focusedWorklogId=760720=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-760720 ] ASF GitHub Bot logged work on HIVE-26157: - Author: ASF GitHub Bot Created on: 22/Apr/22 10:04 Start Date: 22/Apr/22 10:04 Worklog Time Spent: 10m Work Description: lcspinter commented on PR #3226: URL: https://github.com/apache/hive/pull/3226#issuecomment-1106322792 PR #3234 will resolve the test failures. Issue Time Tracking --- Worklog Id: (was: 760720) Time Spent: 1h 10m (was: 1h) > Change Iceberg storage handler authz URI to metadata location > - > > Key: HIVE-26157 > URL: https://issues.apache.org/jira/browse/HIVE-26157 > Project: Hive > Issue Type: Improvement >Reporter: László Pintér >Assignee: László Pintér >Priority: Major > Labels: pull-request-available > Time Spent: 1h 10m > Remaining Estimate: 0h > > In HIVE-25964, the authz URI has been changed to "iceberg://db.table". > It is possible to set the metadata pointers of table A to point to table B, > and therefore you could read table B's data via querying table A. > {code:sql} > alter table A set tblproperties > ('metadata_location'='/path/to/B/snapshot.json', > 'previous_metadata_location'='/path/to/B/prev_snapshot.json'); {code} -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Work logged] (HIVE-26157) Change Iceberg storage handler authz URI to metadata location
[ https://issues.apache.org/jira/browse/HIVE-26157?focusedWorklogId=759306=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-759306 ] ASF GitHub Bot logged work on HIVE-26157: - Author: ASF GitHub Bot Created on: 20/Apr/22 15:18 Start Date: 20/Apr/22 15:18 Worklog Time Spent: 10m Work Description: marton-bod commented on code in PR #3226: URL: https://github.com/apache/hive/pull/3226#discussion_r854264528 ## iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java: ## @@ -104,7 +106,7 @@ public class HiveIcebergStorageHandler implements HiveStoragePredicateHandler, HiveStorageHandler { private static final Logger LOG = LoggerFactory.getLogger(HiveIcebergStorageHandler.class); - private static final String ICEBERG_URI_PREFIX = "iceberg://"; + private static final String ICEBERG_URI_PREFIX = "iceberg:/"; Review Comment: Was this change necessary? I think it's supposed to work like a 'protocol' that's why there is the double-slash. I think kafka and druid did the same thing (i.e. `druid://` and `kafka://`) Issue Time Tracking --- Worklog Id: (was: 759306) Time Spent: 1h (was: 50m) > Change Iceberg storage handler authz URI to metadata location > - > > Key: HIVE-26157 > URL: https://issues.apache.org/jira/browse/HIVE-26157 > Project: Hive > Issue Type: Improvement >Reporter: László Pintér >Assignee: László Pintér >Priority: Major > Labels: pull-request-available > Time Spent: 1h > Remaining Estimate: 0h > > In HIVE-25964, the authz URI has been changed to "iceberg://db.table". > It is possible to set the metadata pointers of table A to point to table B, > and therefore you could read table B's data via querying table A. > {code:sql} > alter table A set tblproperties > ('metadata_location'='/path/to/B/snapshot.json', > 'previous_metadata_location'='/path/to/B/prev_snapshot.json'); {code} -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Work logged] (HIVE-26157) Change Iceberg storage handler authz URI to metadata location
[ https://issues.apache.org/jira/browse/HIVE-26157?focusedWorklogId=759265=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-759265 ] ASF GitHub Bot logged work on HIVE-26157: - Author: ASF GitHub Bot Created on: 20/Apr/22 14:29 Start Date: 20/Apr/22 14:29 Worklog Time Spent: 10m Work Description: szlta commented on code in PR #3226: URL: https://github.com/apache/hive/pull/3226#discussion_r854208599 ## iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/IcebergTableUtil.java: ## @@ -43,6 +45,26 @@ private IcebergTableUtil() { } + /** + * Constructs the table properties needed for the Iceberg table loading by retrieving the information from the + * hmsTable. It then calls {@link IcebergTableUtil#getTable(Configuration, Properties)} with these properties. + * @param configuration a Hadoop configuration + * @param hmsTable the HMS table + * @return the Iceberg table + */ + static Table getTable(Configuration configuration, org.apache.hadoop.hive.metastore.api.Table hmsTable) { +Properties properties = new Properties(); +properties.setProperty(Catalogs.NAME, TableIdentifier.of(hmsTable.getDbName(), hmsTable.getTableName()).toString()); +if (hmsTable.getSd() != null) { + properties.setProperty(Catalogs.LOCATION, hmsTable.getSd().getLocation()); +} +if (hmsTable.getParameters().containsKey(InputFormatConfig.CATALOG_NAME)) { + properties.setProperty( + InputFormatConfig.CATALOG_NAME, hmsTable.getParameters().get(InputFormatConfig.CATALOG_NAME)); +} Review Comment: nit: could do with one look-up only by calling get() beforehand, and refactoring the if condition into null check. Issue Time Tracking --- Worklog Id: (was: 759265) Time Spent: 50m (was: 40m) > Change Iceberg storage handler authz URI to metadata location > - > > Key: HIVE-26157 > URL: https://issues.apache.org/jira/browse/HIVE-26157 > Project: Hive > Issue Type: Improvement >Reporter: László Pintér >Assignee: László Pintér >Priority: Major > Labels: pull-request-available > Time Spent: 50m > Remaining Estimate: 0h > > In HIVE-25964, the authz URI has been changed to "iceberg://db.table". > It is possible to set the metadata pointers of table A to point to table B, > and therefore you could read table B's data via querying table A. > {code:sql} > alter table A set tblproperties > ('metadata_location'='/path/to/B/snapshot.json', > 'previous_metadata_location'='/path/to/B/prev_snapshot.json'); {code} -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Work logged] (HIVE-26157) Change Iceberg storage handler authz URI to metadata location
[ https://issues.apache.org/jira/browse/HIVE-26157?focusedWorklogId=759255=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-759255 ] ASF GitHub Bot logged work on HIVE-26157: - Author: ASF GitHub Bot Created on: 20/Apr/22 14:18 Start Date: 20/Apr/22 14:18 Worklog Time Spent: 10m Work Description: szlta commented on code in PR #3226: URL: https://github.com/apache/hive/pull/3226#discussion_r854194435 ## iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java: ## @@ -450,7 +452,9 @@ public boolean isValidMetadataTable(String metaTableName) { public URI getURIForAuth(org.apache.hadoop.hive.metastore.api.Table hmsTable) throws URISyntaxException { String dbName = hmsTable.getDbName(); String tableName = hmsTable.getTableName(); -return new URI(ICEBERG_URI_PREFIX + dbName + "/" + tableName); +Table table = IcebergTableUtil.getTable(conf, hmsTable); Review Comment: Giving back `hmsTable.getSd().location() + "/metadata"` seems reasonable to me in such cases Issue Time Tracking --- Worklog Id: (was: 759255) Time Spent: 40m (was: 0.5h) > Change Iceberg storage handler authz URI to metadata location > - > > Key: HIVE-26157 > URL: https://issues.apache.org/jira/browse/HIVE-26157 > Project: Hive > Issue Type: Improvement >Reporter: László Pintér >Assignee: László Pintér >Priority: Major > Labels: pull-request-available > Time Spent: 40m > Remaining Estimate: 0h > > In HIVE-25964, the authz URI has been changed to "iceberg://db.table". > It is possible to set the metadata pointers of table A to point to table B, > and therefore you could read table B's data via querying table A. > {code:sql} > alter table A set tblproperties > ('metadata_location'='/path/to/B/snapshot.json', > 'previous_metadata_location'='/path/to/B/prev_snapshot.json'); {code} -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Work logged] (HIVE-26157) Change Iceberg storage handler authz URI to metadata location
[ https://issues.apache.org/jira/browse/HIVE-26157?focusedWorklogId=759247=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-759247 ] ASF GitHub Bot logged work on HIVE-26157: - Author: ASF GitHub Bot Created on: 20/Apr/22 13:50 Start Date: 20/Apr/22 13:50 Worklog Time Spent: 10m Work Description: marton-bod commented on code in PR #3226: URL: https://github.com/apache/hive/pull/3226#discussion_r854160373 ## iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java: ## @@ -450,7 +452,9 @@ public boolean isValidMetadataTable(String metaTableName) { public URI getURIForAuth(org.apache.hadoop.hive.metastore.api.Table hmsTable) throws URISyntaxException { String dbName = hmsTable.getDbName(); String tableName = hmsTable.getTableName(); -return new URI(ICEBERG_URI_PREFIX + dbName + "/" + tableName); +Table table = IcebergTableUtil.getTable(conf, hmsTable); Review Comment: We ran into problems with the approach of loading the Iceberg table here before. The problem is that this method can be called to authorize CREATE TABLE commands as well, at which point the iceberg table does not exist yet, so this will lead to NPE. In that case, when the table object is null, then maybe we fall back to using the `hmsTable.getSd().location() + "/metadata"`? I'm not sure though, just thinking out loud Issue Time Tracking --- Worklog Id: (was: 759247) Time Spent: 0.5h (was: 20m) > Change Iceberg storage handler authz URI to metadata location > - > > Key: HIVE-26157 > URL: https://issues.apache.org/jira/browse/HIVE-26157 > Project: Hive > Issue Type: Improvement >Reporter: László Pintér >Assignee: László Pintér >Priority: Major > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > In HIVE-25964, the authz URI has been changed to "iceberg://db.table". > It is possible to set the metadata pointers of table A to point to table B, > and therefore you could read table B's data via querying table A. > {code:sql} > alter table A set tblproperties > ('metadata_location'='/path/to/B/snapshot.json', > 'previous_metadata_location'='/path/to/B/prev_snapshot.json'); {code} -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Work logged] (HIVE-26157) Change Iceberg storage handler authz URI to metadata location
[ https://issues.apache.org/jira/browse/HIVE-26157?focusedWorklogId=759246=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-759246 ] ASF GitHub Bot logged work on HIVE-26157: - Author: ASF GitHub Bot Created on: 20/Apr/22 13:50 Start Date: 20/Apr/22 13:50 Worklog Time Spent: 10m Work Description: marton-bod commented on code in PR #3226: URL: https://github.com/apache/hive/pull/3226#discussion_r854160373 ## iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java: ## @@ -450,7 +452,9 @@ public boolean isValidMetadataTable(String metaTableName) { public URI getURIForAuth(org.apache.hadoop.hive.metastore.api.Table hmsTable) throws URISyntaxException { String dbName = hmsTable.getDbName(); String tableName = hmsTable.getTableName(); -return new URI(ICEBERG_URI_PREFIX + dbName + "/" + tableName); +Table table = IcebergTableUtil.getTable(conf, hmsTable); Review Comment: We ran into problems with the approach of loading the Iceberg table here before. The problem is that this method can be called to authorize CREATE TABLE commands as well, at which point the iceberg table does not exist yet, so this will lead to NPE. If the table object is null, then maybe we can use the hmsTable.getSd().location() + "/metadata"? I'm not sure though, just thinking out loud Issue Time Tracking --- Worklog Id: (was: 759246) Time Spent: 20m (was: 10m) > Change Iceberg storage handler authz URI to metadata location > - > > Key: HIVE-26157 > URL: https://issues.apache.org/jira/browse/HIVE-26157 > Project: Hive > Issue Type: Improvement >Reporter: László Pintér >Assignee: László Pintér >Priority: Major > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > In HIVE-25964, the authz URI has been changed to "iceberg://db.table". > It is possible to set the metadata pointers of table A to point to table B, > and therefore you could read table B's data via querying table A. > {code:sql} > alter table A set tblproperties > ('metadata_location'='/path/to/B/snapshot.json', > 'previous_metadata_location'='/path/to/B/prev_snapshot.json'); {code} -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Work logged] (HIVE-26157) Change Iceberg storage handler authz URI to metadata location
[ https://issues.apache.org/jira/browse/HIVE-26157?focusedWorklogId=759231=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-759231 ] ASF GitHub Bot logged work on HIVE-26157: - Author: ASF GitHub Bot Created on: 20/Apr/22 13:30 Start Date: 20/Apr/22 13:30 Worklog Time Spent: 10m Work Description: lcspinter opened a new pull request, #3226: URL: https://github.com/apache/hive/pull/3226 ### What changes were proposed in this pull request? Change Iceberg storage handler autz URI from `iceberg://dbName/tableName` format to `iceberg://metadataLocation` ### Why are the changes needed? It is possible to set the metadata pointers of table A to point to table B, and therefore you could read table B's data via querying table A. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Manual test, unit test Issue Time Tracking --- Worklog Id: (was: 759231) Remaining Estimate: 0h Time Spent: 10m > Change Iceberg storage handler authz URI to metadata location > - > > Key: HIVE-26157 > URL: https://issues.apache.org/jira/browse/HIVE-26157 > Project: Hive > Issue Type: Improvement >Reporter: László Pintér >Assignee: László Pintér >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > In HIVE-25964, the authz URI has been changed to "iceberg://db.table". > It is possible to set the metadata pointers of table A to point to table B, > and therefore you could read table B's data via querying table A. > {code:sql} > alter table A set tblproperties > ('metadata_location'='/path/to/B/snapshot.json', > 'previous_metadata_location'='/path/to/B/prev_snapshot.json'); {code} -- This message was sent by Atlassian Jira (v8.20.7#820007)