djouallah opened a new issue, #1356:
URL: https://github.com/apache/datafusion-python/issues/1356
## Summary
The `MicrosoftAzure` class in `datafusion.object_store` does not expose the
`use_fabric_endpoint` parameter that exists in the underlying Rust
`object_store` crate, making it impossible to connect to Microsoft Fabric
OneLake storage.
## Environment
- **datafusion-python version**: 51
- **Operating System**: Linux
## Current Behavior
When attempting to connect to OneLake using the `MicrosoftAzure` class with
bearer token authentication, the object store defaults to using the Azure Blob
Storage endpoint (`blob.core.windows.net`) instead of the required Data Lake
Storage Gen2/Fabric endpoint (`dfs.fabric.microsoft.com`).
This results in authentication errors when trying to access OneLake paths:
```python
import os

from datafusion import SessionContext
from datafusion.object_store import MicrosoftAzure

ctx = SessionContext()

onelake_store = MicrosoftAzure(
    container_name="delta_rs",
    account='onelake',
    bearer_token=os.environ["AZURE_STORAGE_TOKEN"]
)

ctx.register_object_store(
    "abfss://[email protected]/",
    onelake_store,
    None
)

ctx.sql("""
    CREATE EXTERNAL TABLE test
    STORED AS CSV
    LOCATION
    'abfss://[email protected]/test.Lakehouse/Files/csv'
""")
```
**Error:**
```
DataFusion error: ObjectStore(Unauthenticated {
    path: "test.Lakehouse/Files/csv",
    source: RetryError(...,
        uri: Some(https://onelake.blob.core.windows.net/delta_rs/test.Lakehouse/Files/csv),
        ...
    )
})
```
Notice the URI shows `blob.core.windows.net` instead of
`dfs.fabric.microsoft.com`.
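
For anyone triaging similar reports, here is a minimal sketch of how the endpoint family could be read off the `uri` in such an error. The helper name `endpoint_kind` is hypothetical and not part of datafusion-python; it only inspects the host name and, as an assumption, treats any host under `fabric.microsoft.com` as a Fabric endpoint.

```python
from urllib.parse import urlsplit


def endpoint_kind(uri: str) -> str:
    """Classify a storage URI by its host suffix (hypothetical helper)."""
    host = urlsplit(uri).hostname or ""
    # Assumption: any host under fabric.microsoft.com is a Fabric/OneLake endpoint.
    if host.endswith(".fabric.microsoft.com"):
        return "fabric"
    # The default Azure Blob Storage endpoint seen in the error above.
    if host.endswith(".blob.core.windows.net"):
        return "blob"
    return "unknown"


# The URI reported in the error resolves to the Blob Storage endpoint.
print(endpoint_kind(
    "https://onelake.blob.core.windows.net/delta_rs/test.Lakehouse/Files/csv"
))
```

With the URI from the error above, the helper reports the Blob Storage endpoint, which matches the observed behavior.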
## Expected Behavior
The `MicrosoftAzure` class should expose a `use_fabric_endpoint` parameter
(or similar) to allow connections to Microsoft Fabric OneLake, which uses the
Data Lake Storage Gen2 endpoint format.
The underlying Rust `object_store` crate already has this functionality via
`MicrosoftAzureBuilder::with_use_fabric_endpoint()`:
https://docs.rs/object_store/latest/object_store/azure/struct.MicrosoftAzureBuilder.html#method.with_use_fabric_endpoint
## Proposed Solution
Add a `use_fabric_endpoint` parameter to the Python `MicrosoftAzure` class
constructor:
```python
onelake_store = MicrosoftAzure(
    container_name="delta_rs",
    account='onelake',
    bearer_token=os.environ["AZURE_STORAGE_TOKEN"],
    use_fabric_endpoint=True  # <- New parameter
)
```
This should map to the Rust builder's `with_use_fabric_endpoint()` method.
## Additional Context
Microsoft Fabric OneLake is becoming increasingly popular for data lakehouse
scenarios. Adding this support would enable datafusion-python users to query
OneLake data directly, similar to how they can query S3, GCS, and regular Azure
Blob Storage.
OneLake uses the `abfss://` scheme with the endpoint format:
`abfss://<workspace>@<account>.dfs.fabric.microsoft.com/<item>/<path>`
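
To make the URL layout concrete, the sketch below splits a OneLake-style URL into its parts using only the standard library. The function name `parse_onelake_url` and the example workspace and item names are hypothetical, chosen for illustration; this is not how datafusion-python or `object_store` parses URLs internally.

```python
from urllib.parse import urlsplit

FABRIC_SUFFIX = ".dfs.fabric.microsoft.com"


def parse_onelake_url(url: str) -> dict:
    """Split abfss://<workspace>@<account>.dfs.fabric.microsoft.com/<item>/<path>."""
    parts = urlsplit(url)
    if parts.scheme != "abfss":
        raise ValueError(f"expected abfss:// scheme, got {parts.scheme!r}")
    host = parts.hostname or ""
    if not parts.username or not host.endswith(FABRIC_SUFFIX):
        raise ValueError(f"not a OneLake-style URL: {url!r}")
    # The account is whatever precedes the Fabric DFS suffix in the host name.
    account = host[: -len(FABRIC_SUFFIX)]
    # The first path segment is the item; the remainder is the path inside it.
    item, _, path = parts.path.lstrip("/").partition("/")
    return {
        "workspace": parts.username,
        "account": account,
        "item": item,
        "path": path,
    }


# Hypothetical workspace and item names, for illustration only.
info = parse_onelake_url(
    "abfss://my_workspace@onelake.dfs.fabric.microsoft.com/my_item.Lakehouse/Files/csv"
)
print(info)
```

In the terms of the Python constructor, the workspace corresponds to `container_name` and the account to `account`.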
## Related Documentation
- [OneLake Access API](https://learn.microsoft.com/en-us/fabric/onelake/onelake-access-api)
- [Rust object_store MicrosoftAzureBuilder](https://docs.rs/object_store/latest/object_store/azure/struct.MicrosoftAzureBuilder.html)