difin commented on code in PR #6474:
URL: https://github.com/apache/hive/pull/6474#discussion_r3455665616


##########
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveTableUtil.java:
##########
@@ -244,6 +244,7 @@ public static Table deserializeTable(Configuration config, String name) {
       table = readTableObjectFromFile(location, config);
     }
     checkAndSetIoConfig(config, table);
+    IcebergVendedCredentialUtil.applyFromJobConf(table, config);

Review Comment:
   You're right that most Iceberg clients don't need to ser/de credentials 
themselves. Hive does, because we serialize the Iceberg Table 
(SerializableTable) into JobConf for Tez/LLAP, and vended credentials on FileIO 
typically don't survive that round-trip. Executors rebuild the table from job 
conf and don't re-run REST `loadTable`, so we propagate credentials separately 
(`VENDED_STORAGE_CREDENTIALS` + `S3A` bucket keys) and restore them in 
`deserializeTable` via `applyFromJobConf`.
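
   To illustrate the round-trip problem, here is a minimal sketch. It is not 
the PR's code: `FakeTable`, the `example.vended.credentials.` key prefix, and 
the plain `Map` standing in for JobConf are all hypothetical, and the loss of 
credentials is modeled with a `transient` field purely for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;

final class VendedCredentialRoundTripSketch {
  // Hypothetical job-conf key prefix; not the real VENDED_STORAGE_CREDENTIALS key.
  static final String PREFIX = "example.vended.credentials.";

  /** Hypothetical stand-in for the serialized table; credentials are transient. */
  static final class FakeTable implements Serializable {
    private static final long serialVersionUID = 1L;
    final String name;
    private transient Map<String, String> credentials;

    FakeTable(String name) {
      this.name = name;
    }

    Map<String, String> credentials() {
      if (credentials == null) {
        credentials = new HashMap<>(); // the field is null after deserialization
      }
      return credentials;
    }
  }

  /** Serializes and deserializes the table, as shipping it to an executor would. */
  static FakeTable roundTrip(FakeTable table) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
        out.writeObject(table);
      }
      try (ObjectInputStream in =
          new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
        return (FakeTable) in.readObject();
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    } catch (ClassNotFoundException e) {
      throw new IllegalStateException(e);
    }
  }

  /** Store side: copy credentials into the job conf next to the serialized table. */
  static void propagateToJob(FakeTable table, Map<String, String> jobConf) {
    table.credentials().forEach((k, v) -> jobConf.put(PREFIX + k, v));
  }

  /** Restore side: put the propagated credentials back onto the rebuilt table. */
  static void applyFromJobConf(FakeTable table, Map<String, String> jobConf) {
    jobConf.forEach((k, v) -> {
      if (k.startsWith(PREFIX)) {
        table.credentials().put(k.substring(PREFIX.length()), v);
      }
    });
  }
}
```

   In this sketch the table returned by `roundTrip` has no credentials until 
`applyFromJobConf` copies them back from the conf map, which mirrors why the 
restore call sits in `deserializeTable`.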
   
   On "store as-is / restore as-is" vs the helper methods:
   
   **`resolveCredentialsForApply()`**: At apply time credentials can live in 
different places depending on context:
   
   - Tez/LLAP tasks: deserialized Table has an empty `FileIO.credentials()`; 
the source of truth is the serialized blob in task conf (see 
`applyFromJobConfRestoresCredentialsOnExecutor`).
   - HS2 commit: the table often still has credentials from the earlier 
`loadTable` in `QueryState`; we prefer those over re-reading the blob.
   - Fallback: `extractCredentials(table)` if both are empty.
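
   The lookup order described by these three cases can be sketched as 
follows. This is one reading of the list above, not the actual method body: 
the class name, the generic `resolve` signature, and the use of `Supplier` for 
the two lazier sources are assumptions made for illustration.

```java
import java.util.List;
import java.util.function.Supplier;

final class CredentialResolutionSketch {
  /**
   * Picks the first non-empty source, in this order:
   * credentials already on the table's FileIO, then the serialized blob in
   * the job conf, then whatever can be extracted from the table itself.
   */
  static <T> List<T> resolve(
      List<T> onFileIo,
      Supplier<List<T>> fromJobConfBlob,
      Supplier<List<T>> extractedFromTable) {
    if (!onFileIo.isEmpty()) {
      return onFileIo; // HS2 commit: table still holds credentials from loadTable
    }
    List<T> fromBlob = fromJobConfBlob.get();
    if (!fromBlob.isEmpty()) {
      return fromBlob; // Tez/LLAP task: FileIO is empty, the blob is the source of truth
    }
    return extractedFromTable.get(); // last resort, may itself be empty
  }
}
```

   Passing the blob and the extraction step as suppliers means neither is 
evaluated when an earlier source already has credentials.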
   
   **`credentialsFromFileIoProperties()`**: needed at plan/propagate time. Some 
REST/FileIO implementations populate `s3.access-key-id` / 
`s3.secret-access-key` in `FileIO.properties()` while `FileIO.credentials()` is 
still empty after `loadTable`. `extractCredentials` uses this fallback so that 
`propagateToJob` doesn't skip vending even though the table load succeeded (see 
`extractCredentialsFromFileIoPropertiesWhenCredentialListEmpty`). It also backs 
the last-resort branch in `resolveCredentialsForApply`.
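
   A minimal sketch of that fallback, assuming credentials are modeled as 
plain string maps. The class name and method signature here are hypothetical; 
only the two property names come from the description above.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class FileIoPropertyFallbackSketch {
  static final String ACCESS_KEY = "s3.access-key-id";
  static final String SECRET_KEY = "s3.secret-access-key";

  /**
   * Returns the credential list when it is non-empty; otherwise tries to
   * build a single credential from the FileIO properties.
   */
  static List<Map<String, String>> extractCredentials(
      List<Map<String, String>> credentials, Map<String, String> fileIoProperties) {
    if (!credentials.isEmpty()) {
      return credentials;
    }
    String accessKey = fileIoProperties.get(ACCESS_KEY);
    String secretKey = fileIoProperties.get(SECRET_KEY);
    if (accessKey == null || secretKey == null) {
      return Collections.emptyList(); // nothing usable in the properties either
    }
    Map<String, String> fromProperties = new LinkedHashMap<>();
    fromProperties.put(ACCESS_KEY, accessKey);
    fromProperties.put(SECRET_KEY, secretKey);
    return Collections.singletonList(fromProperties);
  }
}
```

   With a fallback of this shape, an empty credential list alone no longer 
looks like "nothing was vended" to the propagate step.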
   
   **`withConfigurationOverrides`**: this is the part that does mutate vended 
credentials. REST catalogs often vend connectivity settings from their network 
view (e.g. `http://minio:9000` inside Docker), while Hive session config sets a 
host-reachable endpoint 
(`iceberg.catalog.ice01.s3.endpoint=http://host:9000`). We override only 
non-secret fields (`s3.endpoint`, `s3.path-style-access`) so Iceberg `FileIO` 
and S3A agree on connectivity; vended keys are preserved. It runs at both store 
time (`propagateToJob`, so the blob on executors is self-contained) and restore 
time (`applyFromJobConf`, e.g. when commit still has the catalog-internal 
endpoint on FileIO from `loadTable`).
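
   As a rough sketch of the override rule, assuming the vended configuration 
and the session config are both plain string maps: the class name, the 
`catalogPrefix` parameter, and the method signature are illustrative 
assumptions, not the PR's API.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class EndpointOverrideSketch {
  // Only non-secret connectivity settings may be overridden.
  static final List<String> OVERRIDABLE =
      Arrays.asList("s3.endpoint", "s3.path-style-access");

  /**
   * Returns a copy of the vended config in which connectivity settings are
   * replaced by session-level values when present. Vended keys are kept.
   */
  static Map<String, String> withConfigurationOverrides(
      Map<String, String> vended, Map<String, String> sessionConf, String catalogPrefix) {
    Map<String, String> result = new LinkedHashMap<>(vended);
    for (String key : OVERRIDABLE) {
      String override = sessionConf.get(catalogPrefix + key);
      if (override != null) {
        result.put(key, override);
      }
    }
    return result;
  }
}
```

   Because the method is a pure function of its inputs and only touches the 
two listed keys, applying it at store time and again at restore time yields 
the same result.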



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

