1fanwang opened a new pull request, #6454:
URL: https://github.com/apache/hive/pull/6454

   ### What changes were proposed in this pull request?
   
   Add `HiveCatalog.EXTERNAL_WAREHOUSE_LOCATION` ("external-warehouse") as a 
canonical catalog property and propagate it from `HiveCatalog.initialize(name, 
properties)` to `hive.metastore.warehouse.external.dir`, mirroring the existing 
handling for `CatalogProperties.WAREHOUSE_LOCATION` (including 
`LocationUtil.stripTrailingSlash`).
   
   The two in-tree callers are unified onto the new constant:
   - `HMSCatalogFactory#createCatalog` drops its redundant 
`configuration.set(...)` workaround that was needed because the property was 
not being read from the properties map.
   - `IcebergSummaryHandler#initialize` switches from the unhyphenated 
`"externalwarehouse"` key (which was effectively dead code — 
`HiveCatalog.initialize` never read it) to the canonical name.
   
   ### Why are the changes needed?
   
   `HiveCatalog.initialize(name, properties)` propagated 
`CatalogProperties.URI` and `CatalogProperties.WAREHOUSE_LOCATION` to the 
Hadoop configuration but silently dropped any external-warehouse value, so 
`getExternalWarehouseLocation()` and `convertToDatabase()` failed with:
   
   ```
   Warehouse location is not set: hive.metastore.warehouse.external.dir=null
   ```
   
   whenever a caller reached `HiveCatalog` through the standard Iceberg 
`Catalog.initialize(name, properties)` API without separately mutating the 
`Configuration` first.
   
   The two existing callers worked around the gap differently and used 
different property keys:
   
   - `HMSCatalogFactory.java` (line 86-91 before this change) put 
`"external-warehouse"` in the properties map AND re-set the value on the 
`Configuration` before `initialize()`, with a comment that explicitly 
acknowledged the API gap: `// HiveCatalog reads this property directly from 
Configuration, not from properties map`.
   - `IcebergSummaryHandler.java` (line 67 before this change) put 
`"externalwarehouse"` (no hyphen) in the properties map but did not perform the 
`Configuration.set` workaround — it only happened to work because the prior 
`setConf(configuration)` call carried the value through when 
`MetastoreConf.WAREHOUSE_EXTERNAL` was already set on the underlying Hadoop 
conf.
   
   External integrations using the standard Iceberg `Catalog.initialize(name, 
properties)` API (e.g. Polaris) had no documented way to pass the external 
warehouse location and tripped the NPE. After this change the property is 
documented on `HiveCatalog`, propagated via `initialize()`, and the two key 
spellings are unified.
   
   ### Does this PR introduce _any_ user-facing change?
   
   Yes — adds a new public catalog property `external-warehouse` 
(`HiveCatalog.EXTERNAL_WAREHOUSE_LOCATION`) that callers can pass into 
`HiveCatalog.initialize(name, properties)` to set the external warehouse 
location. Existing flows that set `hive.metastore.warehouse.external.dir` 
directly on the `Configuration` continue to work unchanged.
   
   ### How was this patch tested?
   
   Added `testInitializeCatalogWithExternalWarehouseProperty` in 
`TestHiveCatalog`, mirroring the existing 
`testInitializeCatalogWithProperties`. The test verifies that both 
`WAREHOUSE_LOCATION` and the new `EXTERNAL_WAREHOUSE_LOCATION` propagate to 
their respective Hadoop conf keys with trailing-slash stripping.
   
   Local Maven test runs were blocked on unrelated lock contention from 
concurrent builds in my environment (`maven-remote-resources-plugin: Could not 
acquire lock(s)`), so I'm relying on CI for full verification. Happy to address 
any failures.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to