Dummk0pf opened a new pull request, #4783: URL: https://github.com/apache/datafusion-comet/pull/4783
## Which issue does this PR close?

Closes https://github.com/apache/datafusion-comet/issues/4747.

## Rationale for this change

Comet's native scan currently cannot authenticate when reading Parquet files from Azure storage. This PR fixes that.

## What changes are included in this PR?

This PR passes the required configuration to the `object_store` crate so that it can authenticate against Azure and read files. The Azure store builder:

1. Starts with `MicrosoftAzureBuilder::from_env()`, so any AKS-injected `AZURE_*` env vars (`AZURE_CLIENT_ID`, `AZURE_TENANT_ID`, `AZURE_FEDERATED_TOKEN_FILE`, `AZURE_AUTHORITY_HOST`, `AZURE_STORAGE_*`) are honoured out of the box. This is what makes Workload Identity work in a stock AKS pod with no extra config.
2. Layers `.with_url(url)` on top so that the account and container are picked up from the URL.
3. Translates the well-known Hadoop ABFS/WASB auth keys into the corresponding `object_store` `AzureConfigKey` options and applies them via `.with_config(...)`, overriding whatever `from_env()` produced.

   Supported mappings:

   | Hadoop key (account-scoped suffix omitted) | `AzureConfigKey` |
   | ------------------------------------------ | ---------------- |
   | `fs.azure.account.key` | `AccessKey` |
   | `fs.azure.account.oauth2.client.id` | `ClientId` |
   | `fs.azure.account.oauth2.client.secret` | `ClientSecret` |
   | `fs.azure.account.oauth2.client.endpoint` | `AuthorityId` (tenant extracted from the URL path) |
   | `fs.azure.account.oauth2.msi.tenant` | `AuthorityId` |
   | `fs.azure.account.oauth2.msi.endpoint` | `MsiEndpoint` |
   | `fs.azure.account.oauth2.msi.authority` | `AuthorityHost` |
   | `fs.azure.account.oauth2.token.file` | `FederatedTokenFile` |
   | `fs.azure.sas.<container>.<account>` | `SasKey` |

   Account-scoped variants (`<base>.<account>.dfs.core.windows.net`, `<base>.<account>.blob.core.windows.net`, `<base>.<account>`, etc.) are preferred over global keys, mirroring Hadoop's own `AbfsConfiguration` precedence.
4. `parquet_support.rs` now dispatches Azure URL schemes (`abfs`, `abfss`, `wasb`, `wasbs`, `az`, `azure`, `adl`) to `objectstore::azure::create_store` via a new `is_azure_scheme` helper, so all Azure requests share this credential path.

## How are these changes tested?

### Screenshot of error message when Comet 0.16.0 was used

<img width="1905" height="781" alt="Screenshot 2026-07-01 at 10 27 58 AM" src="https://github.com/user-attachments/assets/de904f2c-87f6-4f2d-b09c-ab595e9c56e4" />

### Screenshot of stage finishing successfully and query plan of the stage which indicates usage of native scan

<img width="1909" height="667" alt="Screenshot 2026-07-01 at 10 28 07 AM" src="https://github.com/user-attachments/assets/5b14321c-188c-47e2-8e87-61fe7ef5eb16" />
<img width="1893" height="822" alt="Screenshot 2026-07-01 at 10 28 19 AM" src="https://github.com/user-attachments/assets/d515f5fb-4f12-4d0f-af13-6771d7331f88" />
<img width="571" height="823" alt="Screenshot 2026-07-01 at 10 28 44 AM" src="https://github.com/user-attachments/assets/56cfce4d-f968-42bc-b0e3-f661af9f3027" />
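
For readers without the diff at hand, the scheme dispatch and key resolution described under "What changes are included in this PR?" can be sketched roughly as follows. This is a std-only illustration, not the code in the PR: `resolve` and `tenant_from_endpoint` are hypothetical names, the body of `is_azure_scheme` is a guess based on the scheme list in the description, the relative order among the account-scoped variants is an assumption, and the endpoint is assumed to have the usual `https://<host>/<tenant>/oauth2/token` shape. The `object_store` builder calls are left out so that the sketch compiles without external crates.

```rust
use std::collections::HashMap;

/// Returns true for the URL schemes the description lists as Azure.
/// Hypothetical re-implementation; the helper in the PR may differ.
fn is_azure_scheme(scheme: &str) -> bool {
    matches!(
        scheme,
        "abfs" | "abfss" | "wasb" | "wasbs" | "az" | "azure" | "adl"
    )
}

/// Resolves a Hadoop key for `account`, trying account-scoped variants
/// before falling back to the global key.
/// The order among the scoped variants is an assumption.
fn resolve<'a>(
    conf: &'a HashMap<String, String>,
    base: &str,
    account: &str,
) -> Option<&'a str> {
    let candidates = [
        format!("{base}.{account}.dfs.core.windows.net"),
        format!("{base}.{account}.blob.core.windows.net"),
        format!("{base}.{account}"),
        base.to_string(),
    ];
    candidates
        .iter()
        .find_map(|key| conf.get(key).map(String::as_str))
}

/// Extracts the tenant from an OAuth2 token endpoint by taking the first
/// path segment after the host (assumed endpoint shape, see lead-in).
fn tenant_from_endpoint(endpoint: &str) -> Option<&str> {
    let rest = endpoint.split_once("://")?.1;
    let mut parts = rest.split('/');
    parts.next()?; // skip the host
    parts.next().filter(|segment| !segment.is_empty())
}

fn main() {
    let mut conf = HashMap::new();
    conf.insert("fs.azure.account.key".to_string(), "global-key".to_string());
    conf.insert(
        "fs.azure.account.key.myacct.dfs.core.windows.net".to_string(),
        "scoped-key".to_string(),
    );

    // The account-scoped entry is found before the global one.
    println!("{:?}", resolve(&conf, "fs.azure.account.key", "myacct"));
    // An account without a scoped entry falls back to the global key.
    println!("{:?}", resolve(&conf, "fs.azure.account.key", "other"));

    let endpoint = "https://login.microsoftonline.com/my-tenant/oauth2/token";
    println!("{:?}", tenant_from_endpoint(endpoint));
    println!("{}", is_azure_scheme("abfss"));
}
```

In the PR itself, the resolved values would then be applied with `.with_config(...)` on top of `MicrosoftAzureBuilder::from_env()` and `.with_url(url)`, as listed in steps 1 to 3.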
