Dummk0pf opened a new pull request, #4783:
URL: https://github.com/apache/datafusion-comet/pull/4783

   ## Which issue does this PR close?
   
   Closes https://github.com/apache/datafusion-comet/issues/4747.
   
   ## Rationale for this change
   
   Comet's native scan currently cannot authenticate when reading Parquet files from Azure Storage. This PR adds the changes needed to fix that.
   
   ## What changes are included in this PR?
   
   This PR passes the required configuration to the `object_store` crate so that it can authenticate against Azure and read files from it.
   
   1. Starts with `MicrosoftAzureBuilder::from_env()`, so any AKS-injected `AZURE_*`
       env vars (`AZURE_CLIENT_ID`, `AZURE_TENANT_ID`, `AZURE_FEDERATED_TOKEN_FILE`,
       `AZURE_AUTHORITY_HOST`, `AZURE_STORAGE_*`) are honoured out of the box. This is
       what makes Workload Identity work in a stock AKS pod with no extra config.
    2. Layers `.with_url(url)` on top so account + container are picked up from the URL.
    3. Translates the well-known Hadoop ABFS/WASB auth keys into the corresponding
       `object_store` `AzureConfigKey` options and applies them via `.with_config(...)`,
       overriding whatever `from_env()` produced. Supported mappings:
    
       | Hadoop key (account-scoped suffix omitted) | `AzureConfigKey` |
       | --------------------------------------------- | -------------------- |
       | `fs.azure.account.key` | `AccessKey` |
       | `fs.azure.account.oauth2.client.id` | `ClientId` |
       | `fs.azure.account.oauth2.client.secret` | `ClientSecret` |
       | `fs.azure.account.oauth2.client.endpoint` | `AuthorityId` (tenant extracted from the URL path) |
       | `fs.azure.account.oauth2.msi.tenant` | `AuthorityId` |
       | `fs.azure.account.oauth2.msi.endpoint` | `MsiEndpoint` |
       | `fs.azure.account.oauth2.msi.authority` | `AuthorityHost` |
       | `fs.azure.account.oauth2.token.file` | `FederatedTokenFile` |
       | `fs.azure.sas.<container>.<account>` | `SasKey` |
    
       Account-scoped variants
       (`<base>.<account>.dfs.core.windows.net`, `<base>.<account>.blob.core.windows.net`,
       `<base>.<account>` etc.) are preferred over global keys, mirroring Hadoop's own
       `AbfsConfiguration` precedence.
    
    4. `parquet_support.rs` now dispatches Azure URL schemes
       (`abfs`, `abfss`, `wasb`, `wasbs`, `az`, `azure`, `adl`) to
       `objectstore::azure::create_store` via a new `is_azure_scheme` helper, so all Azure
       requests share this credential path.
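
To illustrate the key translation in step 3, here is a minimal, std-only sketch of how an account-scoped lookup and the tenant extraction could work. The helper names `resolve_scoped` and `tenant_from_endpoint` are hypothetical and not taken from this PR, the order in which the scoped suffixes are tried is an assumption, and the real code would feed the results into `AzureConfigKey` options rather than return plain strings.

```rust
use std::collections::HashMap;

/// Hypothetical helper: look up a Hadoop-style option for `account`,
/// trying account-scoped variants before falling back to the global key.
/// The candidate order among the scoped suffixes is an assumption.
fn resolve_scoped<'a>(
    conf: &'a HashMap<String, String>,
    base: &str,
    account: &str,
) -> Option<&'a str> {
    let candidates = [
        format!("{}.{}.dfs.core.windows.net", base, account),
        format!("{}.{}.blob.core.windows.net", base, account),
        format!("{}.{}", base, account),
        base.to_string(), // global key, lowest precedence
    ];
    candidates
        .iter()
        .find_map(|key| conf.get(key).map(String::as_str))
}

/// Hypothetical helper: take the tenant from an OAuth2 token endpoint such as
/// `https://login.microsoftonline.com/<tenant>/oauth2/token` by returning the
/// first non-empty path segment after the host.
fn tenant_from_endpoint(endpoint: &str) -> Option<&str> {
    // Drop the scheme if one is present.
    let rest = endpoint.split_once("://").map_or(endpoint, |(_, r)| r);
    let mut segments = rest.split('/');
    segments.next()?; // skip the host
    segments.find(|segment| !segment.is_empty())
}
```

With both `fs.azure.account.key` and `fs.azure.account.key.<account>.dfs.core.windows.net` set, this lookup returns the account-scoped value, and it falls back to the global key for any other account.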
    
   
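The scheme dispatch in step 4 can be sketched as follows. Only the name `is_azure_scheme` and the list of schemes come from this PR description; the function body, the `scheme_of` helper, and the case-insensitive comparison are assumptions made for illustration.

```rust
/// URL schemes that the PR description lists as routed to the Azure store.
const AZURE_SCHEMES: [&str; 7] = ["abfs", "abfss", "wasb", "wasbs", "az", "azure", "adl"];

/// Returns true if `scheme` is one of the Azure schemes listed above.
/// Comparing case-insensitively is an assumption of this sketch.
fn is_azure_scheme(scheme: &str) -> bool {
    AZURE_SCHEMES
        .iter()
        .any(|known| known.eq_ignore_ascii_case(scheme))
}

/// Hypothetical helper: extract the scheme from a URL string, if it has one.
fn scheme_of(url: &str) -> Option<&str> {
    url.split_once("://").map(|(scheme, _)| scheme)
}
```

A caller could combine the two, for example `scheme_of(url).map_or(false, is_azure_scheme)`, to decide whether a path should go through the Azure credential path.
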
   ## How are these changes tested?
   
   ### Screenshot of error message when comet 0.16.0 was used.
   <img width="1905" height="781" alt="Screenshot 2026-07-01 at 10 27 58 AM" src="https://github.com/user-attachments/assets/de904f2c-87f6-4f2d-b09c-ab595e9c56e4" />
   
   
   
   ### Screenshots of the stage finishing successfully and of its query plan, which shows that the native scan is used
   
   <img width="1909" height="667" alt="Screenshot 2026-07-01 at 10 28 07 AM" src="https://github.com/user-attachments/assets/5b14321c-188c-47e2-8e87-61fe7ef5eb16" />
   
   <img width="1893" height="822" alt="Screenshot 2026-07-01 at 10 28 19 AM" src="https://github.com/user-attachments/assets/d515f5fb-4f12-4d0f-af13-6771d7331f88" />
   
   <img width="571" height="823" alt="Screenshot 2026-07-01 at 10 28 44 AM" src="https://github.com/user-attachments/assets/56cfce4d-f968-42bc-b0e3-f661af9f3027" />
   



