1fanwang opened a new pull request, #6454:
URL: https://github.com/apache/hive/pull/6454
### What changes were proposed in this pull request?
Add `HiveCatalog.EXTERNAL_WAREHOUSE_LOCATION` ("external-warehouse") as a
canonical catalog property and propagate it from `HiveCatalog.initialize(name,
properties)` to `hive.metastore.warehouse.external.dir`, mirroring the existing
handling for `CatalogProperties.WAREHOUSE_LOCATION` (including
`LocationUtil.stripTrailingSlash`).
The two in-tree callers are unified onto the new constant:
- `HMSCatalogFactory#createCatalog` drops its redundant
`configuration.set(...)` workaround that was needed because the property was
not being read from the properties map.
- `IcebergSummaryHandler#initialize` switches from the unhyphenated
`"externalwarehouse"` key (which was effectively dead code —
`HiveCatalog.initialize` never read it) to the canonical name.
### Why are the changes needed?
`HiveCatalog.initialize(name, properties)` propagated
`CatalogProperties.URI` and `CatalogProperties.WAREHOUSE_LOCATION` to the
Hadoop configuration but silently dropped any external-warehouse value, so
`getExternalWarehouseLocation()` and `convertToDatabase()` failed with:
```
Warehouse location is not set: hive.metastore.warehouse.external.dir=null
```
whenever a caller reached `HiveCatalog` through the standard Iceberg
`Catalog.initialize(name, properties)` API without separately mutating the
`Configuration` first.
The two existing callers worked around the gap differently and used
different property keys:
- `HMSCatalogFactory.java` (line 86-91 before this change) put
`"external-warehouse"` in the properties map AND re-set the value on the
`Configuration` before `initialize()`, with a comment that explicitly
acknowledged the API gap: `// HiveCatalog reads this property directly from
Configuration, not from properties map`.
- `IcebergSummaryHandler.java` (line 67 before this change) put
`"externalwarehouse"` (no hyphen) in the properties map but did not perform the
`Configuration.set` workaround — it only happened to work because the prior
`setConf(configuration)` call carried the value through when
`MetastoreConf.WAREHOUSE_EXTERNAL` was already set on the underlying Hadoop
conf.
External integrations using the standard Iceberg `Catalog.initialize(name,
properties)` API (e.g. Polaris) had no documented way to pass the external
warehouse location and tripped the NPE. After this change the property is
documented on `HiveCatalog`, propagated via `initialize()`, and the two key
spellings are unified.
### Does this PR introduce _any_ user-facing change?
Yes — adds a new public catalog property `external-warehouse`
(`HiveCatalog.EXTERNAL_WAREHOUSE_LOCATION`) that callers can pass into
`HiveCatalog.initialize(name, properties)` to set the external warehouse
location. Existing flows that set `hive.metastore.warehouse.external.dir`
directly on the `Configuration` continue to work unchanged.
### How was this patch tested?
Added `testInitializeCatalogWithExternalWarehouseProperty` in
`TestHiveCatalog`, mirroring the existing
`testInitializeCatalogWithProperties`. The test verifies that both
`WAREHOUSE_LOCATION` and the new `EXTERNAL_WAREHOUSE_LOCATION` propagate to
their respective Hadoop conf keys with trailing-slash stripping.
Local Maven test runs were blocked on unrelated lock contention from
concurrent builds in my environment (`maven-remote-resources-plugin: Could not
acquire lock(s)`), so I'm relying on CI for full verification. Happy to address
any failures.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]