djouallah opened a new issue, #1356:
URL: https://github.com/apache/datafusion-python/issues/1356

   ## Summary
   The `MicrosoftAzure` class in `datafusion.object_store` does not expose the 
`use_fabric_endpoint` parameter that exists in the underlying Rust 
`object_store` crate, making it impossible to connect to Microsoft Fabric 
OneLake storage.
   
   ## Environment
   - **datafusion-python version**: 51
   - **Operating System**: Linux
   
   ## Current Behavior
   When attempting to connect to OneLake using the `MicrosoftAzure` class with 
bearer token authentication, the object store defaults to using the Azure Blob 
Storage endpoint (`blob.core.windows.net`) instead of the required Data Lake 
Storage Gen2/Fabric endpoint (`dfs.fabric.microsoft.com`).
   
   This results in authentication errors when trying to access OneLake paths:
   
   ```python
   from datafusion.object_store import MicrosoftAzure
   from datafusion import SessionContext
   import os
   
   ctx = SessionContext()
   onelake_store = MicrosoftAzure(
       container_name="delta_rs",
       account='onelake',
       bearer_token=os.environ["AZURE_STORAGE_TOKEN"]
   )
   
   ctx.register_object_store(
       "abfss://delta_rs@onelake.dfs.fabric.microsoft.com/",
       onelake_store,
       None
   )
   
   ctx.sql("""
       CREATE EXTERNAL TABLE test
       STORED AS CSV
       LOCATION 'abfss://delta_rs@onelake.dfs.fabric.microsoft.com/test.Lakehouse/Files/csv'
   """)
   ```
   
   **Error:**
   ```
   DataFusion error: ObjectStore(Unauthenticated { 
       path: "test.Lakehouse/Files/csv", 
       source: RetryError(..., 
           uri: Some(https://onelake.blob.core.windows.net/delta_rs/test.Lakehouse/Files/csv), 
           ...
       )
   })
   ```
   
   Notice the URI shows `blob.core.windows.net` instead of 
`dfs.fabric.microsoft.com`.
   
   ## Expected Behavior
   The `MicrosoftAzure` class should expose a `use_fabric_endpoint` parameter 
(or similar) to allow connections to Microsoft Fabric OneLake, which uses the 
Data Lake Storage Gen2 endpoint format.
   
   The underlying Rust `object_store` crate already has this functionality via 
`MicrosoftAzureBuilder::with_use_fabric_endpoint()`:
   
https://docs.rs/object_store/latest/object_store/azure/struct.MicrosoftAzureBuilder.html#method.with_use_fabric_endpoint
   
   ## Proposed Solution
   Add a `use_fabric_endpoint` parameter to the Python `MicrosoftAzure` class 
constructor:
   
   ```python
   onelake_store = MicrosoftAzure(
       container_name="delta_rs",
       account='onelake',
       bearer_token=os.environ["AZURE_STORAGE_TOKEN"],
       use_fabric_endpoint=True  # <- New parameter
   )
   ```
   
   This should map to the Rust builder's `with_use_fabric_endpoint()` method.
   
   ## Additional Context
   Microsoft Fabric OneLake is becoming increasingly popular for data lakehouse 
scenarios. Adding this support would enable datafusion-python users to query 
OneLake data directly, similar to how they can query S3, GCS, and regular Azure 
Blob Storage.
   
   OneLake uses the `abfss://` scheme with the endpoint format:
   `abfss://<workspace>@<account>.dfs.fabric.microsoft.com/<item>/<path>`
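
As a rough illustration of that URL format, the sketch below builds and splits OneLake-style `abfss://` URLs using only the Python standard library. The helper names `onelake_url` and `split_onelake_url` are hypothetical and are not part of datafusion-python or `object_store`; the default account name `onelake` is taken from the example in this issue.

```python
from urllib.parse import urlsplit

# Host suffix taken from the endpoint format quoted in this issue.
FABRIC_DFS_SUFFIX = "dfs.fabric.microsoft.com"


def onelake_url(workspace: str, item: str, path: str = "", account: str = "onelake") -> str:
    """Hypothetical helper: build abfss://<workspace>@<account>.dfs.fabric.microsoft.com/<item>/<path>."""
    # Drop empty segments and stray slashes so the pieces join cleanly.
    tail = "/".join(part.strip("/") for part in (item, path) if part)
    return f"abfss://{workspace}@{account}.{FABRIC_DFS_SUFFIX}/{tail}"


def split_onelake_url(url: str) -> dict:
    """Hypothetical helper: split a OneLake abfss:// URL back into its components."""
    parts = urlsplit(url)
    if parts.scheme != "abfss":
        raise ValueError(f"expected an abfss:// URL, got scheme {parts.scheme!r}")
    host = parts.hostname or ""
    suffix = "." + FABRIC_DFS_SUFFIX
    if not host.endswith(suffix):
        raise ValueError(f"host {host!r} is not a {FABRIC_DFS_SUFFIX} endpoint")
    # The first path segment is the item; everything after it is the path.
    item, _, path = parts.path.lstrip("/").partition("/")
    return {
        "workspace": parts.username,
        "account": host[: -len(suffix)],
        "item": item,
        "path": path,
    }


url = onelake_url("delta_rs", "test.Lakehouse", "Files/csv")
print(url)
print(split_onelake_url(url))
```

A helper like this only validates the URL shape on the Python side; it does not change which endpoint the object store contacts, which is what the requested `use_fabric_endpoint` parameter would control.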
   
   ## Related Documentation
   - [OneLake Access API](https://learn.microsoft.com/en-us/fabric/onelake/onelake-access-api)
   - [Rust object_store MicrosoftAzureBuilder](https://docs.rs/object_store/latest/object_store/azure/struct.MicrosoftAzureBuilder.html)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

