difin commented on code in PR #6474:
URL: https://github.com/apache/hive/pull/6474#discussion_r3455665616
##########
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveTableUtil.java:
##########
@@ -244,6 +244,7 @@ public static Table deserializeTable(Configuration config,
String name) {
table = readTableObjectFromFile(location, config);
}
checkAndSetIoConfig(config, table);
+ IcebergVendedCredentialUtil.applyFromJobConf(table, config);
Review Comment:
You're right that most Iceberg clients don't need to serialize and deserialize
credentials themselves. Hive does, because we serialize the Iceberg Table
(`SerializableTable`) into `JobConf` for Tez/LLAP, and vended credentials on
`FileIO` typically don't survive that round-trip. Executors rebuild the table
from the job conf and don't re-run REST `loadTable`, so we propagate credentials
separately (`VENDED_STORAGE_CREDENTIALS` + `S3A` bucket keys) and restore them
in `deserializeTable` via `applyFromJobConf`.
On "store as-is / restore as-is" vs the helper methods:
**`resolveCredentialsForApply()`**: At apply time, credentials can live in
different places depending on context:
- Tez/LLAP tasks: the deserialized Table has an empty `FileIO.credentials()`;
the source of truth is the serialized blob in the task conf (see
`applyFromJobConfRestoresCredentialsOnExecutor`).
- HS2 commit: the table often still has credentials from the earlier
`loadTable` in `QueryState`; we prefer those over re-reading the blob.
- Fallback: `extractCredentials(table)` if both are empty.
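To make the resolution order above concrete, here is a minimal sketch. It is not the code in the PR: the class `ResolveCredentialsSketch`, the method `resolve`, and its parameters are hypothetical names, and credentials are modeled as an opaque type `C` rather than Iceberg's real credential class.

```java
import java.util.List;
import java.util.function.Supplier;

// Hypothetical illustration of the resolution order; not the actual Hive helper.
class ResolveCredentialsSketch {

  /**
   * Returns the first non-empty source, in this order:
   * 1. credentials still present on the table's FileIO (typical for HS2 commit),
   * 2. credentials decoded from the serialized blob in the job/task conf (typical for Tez/LLAP),
   * 3. a last-resort extraction, evaluated lazily only when both of the above are empty.
   */
  static <C> List<C> resolve(List<C> onTable, List<C> fromConfBlob, Supplier<List<C>> lastResort) {
    if (!onTable.isEmpty()) {
      return onTable;
    }
    if (!fromConfBlob.isEmpty()) {
      return fromConfBlob;
    }
    return lastResort.get();
  }

  public static void main(String[] args) {
    // Executor-like case: nothing on the table, so the blob is used.
    List<String> resolved = resolve(List.of(), List.of("blob-credential"), List::of);
    System.out.println(resolved);
  }
}
```

Passing the last-resort source as a `Supplier` is only a design choice for the sketch: it keeps the fallback extraction from running when an earlier source already has credentials.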
**`credentialsFromFileIoProperties()`**: needed at plan/propagate time. Some
REST/FileIO implementations populate `s3.access-key-id` /
`s3.secret-access-key` in `FileIO.properties()` while `credentials()` is still
empty after `loadTable`. `extractCredentials` uses this fallback so that
`propagateToJob` doesn't skip propagating credentials when the table load
succeeded but the credential list is empty (see
`extractCredentialsFromFileIoPropertiesWhenCredentialListEmpty`). It also backs
the last-resort branch in `resolveCredentialsForApply`.
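The properties fallback described above can be sketched as follows. This is an illustration under assumptions, not the PR's implementation: `FileIoPropertiesFallbackSketch` and `extract` are hypothetical names, and a credential is modeled as a plain `Map<String, String>`; only the two property keys come from the comment itself.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of falling back to FileIO properties; not the actual Hive helper.
class FileIoPropertiesFallbackSketch {

  static final String ACCESS_KEY = "s3.access-key-id";
  static final String SECRET_KEY = "s3.secret-access-key";

  /**
   * Returns the explicit credential list when it is non-empty. Otherwise, if the
   * properties carry both S3 keys, synthesizes a single credential from them.
   * Returns an empty list when neither source has usable credentials.
   */
  static List<Map<String, String>> extract(
      List<Map<String, String>> credentials, Map<String, String> properties) {
    if (!credentials.isEmpty()) {
      return credentials;
    }
    String accessKey = properties.get(ACCESS_KEY);
    String secretKey = properties.get(SECRET_KEY);
    if (accessKey == null || secretKey == null) {
      return List.of();
    }
    Map<String, String> synthesized = new LinkedHashMap<>();
    synthesized.put(ACCESS_KEY, accessKey);
    synthesized.put(SECRET_KEY, secretKey);
    return List.of(synthesized);
  }

  public static void main(String[] args) {
    // Credential list is empty, but the properties carry both keys.
    Map<String, String> properties = Map.of(ACCESS_KEY, "example-key", SECRET_KEY, "example-secret");
    System.out.println(extract(List.of(), properties).size());
  }
}
```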
**`withConfigurationOverrides`**: this is the one helper that does modify what
was vended, but only connectivity settings. REST catalogs often vend
connectivity settings from their own network view (e.g. `http://minio:9000`
inside Docker), while the Hive session config sets a host-reachable endpoint
(`iceberg.catalog.ice01.s3.endpoint`=`http://host:9000`). We override only
non-secret fields (`s3.endpoint`, `s3.path-style-access`) so Iceberg `FileIO`
and S3A agree on connectivity; vended keys are preserved. It runs at both store
time (`propagateToJob`, so the blob on executors is self-contained) and restore
time (`applyFromJobConf`, e.g. when commit still has the catalog-internal
endpoint on `FileIO` from `loadTable`).
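
As a rough sketch of the override rule described above, assuming the vended settings and the session config are both plain string maps: `ConfigurationOverridesSketch`, `withOverrides`, and the `catalogName` parameter are hypothetical names for illustration, and the `iceberg.catalog.<name>.` prefix is taken from the example key in this comment.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of overriding only connectivity settings; not the actual Hive helper.
class ConfigurationOverridesSketch {

  // Only non-secret connectivity fields may be replaced by session config.
  static final List<String> OVERRIDABLE_KEYS = List.of("s3.endpoint", "s3.path-style-access");

  /**
   * Copies the vended settings and replaces the overridable keys with the values found
   * under "iceberg.catalog.<catalogName>." in the session config, when present.
   * Every other entry, including the vended keys, is left untouched.
   */
  static Map<String, String> withOverrides(
      Map<String, String> vended, Map<String, String> sessionConf, String catalogName) {
    String prefix = "iceberg.catalog." + catalogName + ".";
    Map<String, String> merged = new LinkedHashMap<>(vended);
    for (String key : OVERRIDABLE_KEYS) {
      String sessionValue = sessionConf.get(prefix + key);
      if (sessionValue != null) {
        merged.put(key, sessionValue);
      }
    }
    return merged;
  }

  public static void main(String[] args) {
    Map<String, String> vended = Map.of(
        "s3.endpoint", "http://minio:9000",
        "s3.access-key-id", "vended-key");
    Map<String, String> sessionConf = Map.of(
        "iceberg.catalog.ice01.s3.endpoint", "http://host:9000");
    System.out.println(withOverrides(vended, sessionConf, "ice01").get("s3.endpoint"));
  }
}
```

Because the rule is a pure function of its inputs, applying it twice with the same session config gives the same result, which is consistent with running it at both store time and restore time.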