GitHub user keen85 created a discussion: Azure Storage Gen2 (hierarchical 
namespaces) - use DFS endpoint to improve performance

Hi,
Azure Storage comes in different flavors. A "regular" Azure Storage account is 
a classic object store and comes with a Blob endpoint 
(`blob.core.windows.net`, `blob.fabric.microsoft.com`).

Azure Storage accounts with [hierarchical 
namespaces](https://learn.microsoft.com/en-us/azure/storage/blobs/data-lake-storage-namespace)
 enabled (a.k.a. Azure Data Lake Storage Gen2, Fabric OneLake) come with a _second_ "DFS" 
(Distributed Filesystem) endpoint, `dfs.core.windows.net` 
(`dfs.fabric.microsoft.com`), which has its own REST API: 
https://learn.microsoft.com/en-us/rest/api/storageservices/data-lake-storage-gen2
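For an account with hierarchical namespaces, the DFS host differs from the Blob host only in the `blob`/`dfs` label. A minimal std-only Rust sketch of that mapping, assuming the `<account>.blob.core.windows.net` and `onelake.blob.fabric.microsoft.com` host forms; `blob_host_to_dfs_host` is a hypothetical helper, not an object_store function, and custom domains or private endpoints are not handled.

```rust
/// Hypothetical helper (not part of object_store): derive the DFS endpoint
/// host from a Blob endpoint host. Only the two public host forms below are
/// recognised; anything else (custom domains, private endpoints) yields None.
fn blob_host_to_dfs_host(host: &str) -> Option<String> {
    const SUFFIXES: [(&str, &str); 2] = [
        // Regular ADLS Gen2 accounts: "<account>.blob.core.windows.net"
        (".blob.core.windows.net", ".dfs.core.windows.net"),
        // Fabric OneLake: "onelake.blob.fabric.microsoft.com"
        (".blob.fabric.microsoft.com", ".dfs.fabric.microsoft.com"),
    ];
    for (blob_suffix, dfs_suffix) in SUFFIXES {
        if let Some(prefix) = host.strip_suffix(blob_suffix) {
            // Require a non-empty account (or "onelake") label before the suffix.
            if !prefix.is_empty() {
                return Some(format!("{prefix}{dfs_suffix}"));
            }
        }
    }
    None
}

fn main() {
    for host in [
        "myaccount.blob.core.windows.net",
        "onelake.blob.fabric.microsoft.com",
        "myaccount.dfs.core.windows.net",
    ] {
        match blob_host_to_dfs_host(host) {
            Some(dfs) => println!("{host} -> {dfs}"),
            None => println!("{host} -> (not a recognised Blob host)"),
        }
    }
}
```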

This endpoint allows for potentially massive performance improvements in the 
following scenarios:
- recursive directory listings
- renaming or moving files or folders
- deleting directories with many files
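A rough sketch of what each scenario looks like against the DFS REST API, written as plain request descriptions so it stays std-only (no HTTP client). The operations (Path - List with `recursive=true`, Path - Create with the `x-ms-rename-source` header, Path - Delete with `recursive=true`) follow the REST documentation linked above; `DfsRequest` and the helper functions are hypothetical names, and the authentication and `x-ms-version` headers that a real request needs are omitted. Path - List is still paginated: a response may carry a continuation token for the next page.

```rust
/// Hypothetical, std-only description of one DFS REST call (not an
/// object_store type). Auth and `x-ms-version` headers are omitted.
#[derive(Debug, PartialEq)]
struct DfsRequest {
    method: &'static str,
    url: String,
    headers: Vec<(&'static str, String)>,
}

/// Minimal percent-encoding: keeps RFC 3986 unreserved characters and '/'.
fn encode_path(path: &str) -> String {
    let mut out = String::with_capacity(path.len());
    for b in path.bytes() {
        match b {
            b'A'..=b'Z' | b'a'..=b'z' | b'0'..=b'9' | b'-' | b'.' | b'_' | b'~' | b'/' => {
                out.push(b as char)
            }
            _ => out.push_str(&format!("%{b:02X}")),
        }
    }
    out
}

/// Path - List: recursive listing below `dir` (one request per result page;
/// follow-up pages would pass the returned continuation token).
fn list_recursive(endpoint: &str, fs: &str, dir: &str) -> DfsRequest {
    DfsRequest {
        method: "GET",
        url: format!(
            "{endpoint}/{fs}?resource=filesystem&recursive=true&directory={}",
            encode_path(dir)
        ),
        headers: Vec::new(),
    }
}

/// Path - Create with `x-ms-rename-source`: server-side rename of a file or directory.
fn rename(endpoint: &str, fs: &str, from: &str, to: &str) -> DfsRequest {
    DfsRequest {
        method: "PUT",
        url: format!("{endpoint}/{fs}/{}", encode_path(to)),
        headers: vec![("x-ms-rename-source", format!("/{fs}/{}", encode_path(from)))],
    }
}

/// Path - Delete with `recursive=true`: removes a directory and everything below it.
fn delete_dir(endpoint: &str, fs: &str, dir: &str) -> DfsRequest {
    DfsRequest {
        method: "DELETE",
        url: format!("{endpoint}/{fs}/{}?recursive=true", encode_path(dir)),
        headers: Vec::new(),
    }
}

fn main() {
    // Hypothetical account and filesystem (container) names.
    let endpoint = "https://myaccount.dfs.core.windows.net";
    println!("{:?}", list_recursive(endpoint, "data", "raw/2024"));
    println!("{:?}", rename(endpoint, "data", "raw/tmp", "raw/final"));
    println!("{:?}", delete_dir(endpoint, "data", "raw/old"));
}
```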

These are _single atomic operations_ via DFS. With the Blob endpoint they 
require many individual blob operations (for example, renaming a folder means 
a copy and a delete for every blob under it) and are therefore slower.
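A back-of-the-envelope model of the gap for the rename case, assuming the Blob endpoint needs one List Blobs call per 5,000 results plus one Copy Blob and one Delete Blob per object (copies are assumed to complete without status polling), and that the DFS rename returns no continuation token. The function names are illustrative only; request counts are not latencies, but they show why the difference grows with the number of files.

```rust
/// Illustrative only: requests needed to "rename a folder" through the Blob
/// endpoint, which has no rename operation. Assumes one List Blobs call per
/// page of results, then one Copy Blob and one Delete Blob per object, with
/// every copy completing synchronously (no status polling).
fn blob_rename_requests(n_blobs: u64, page_size: u64) -> u64 {
    assert!(page_size > 0, "page_size must be positive");
    // Ceiling division; at least one List Blobs call even for an empty prefix.
    let list_pages = ((n_blobs + page_size - 1) / page_size).max(1);
    list_pages + 2 * n_blobs
}

/// Illustrative only: a DFS rename is one Path - Create request carrying
/// `x-ms-rename-source`, assuming the service returns no continuation token.
fn dfs_rename_requests(_n_blobs: u64) -> u64 {
    1
}

fn main() {
    // 5,000 is the maximum number of results a single List Blobs call returns.
    const PAGE_SIZE: u64 = 5_000;
    for n in [10_u64, 10_000, 1_000_000] {
        println!(
            "{n} blobs: Blob endpoint = {} requests, DFS endpoint = {} request(s)",
            blob_rename_requests(n, PAGE_SIZE),
            dfs_rename_requests(n)
        );
    }
}
```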

As far as I understand, DFS URLs are already _accepted_, but in the end only the 
[blob endpoint is used by 
object_store](https://github.com/apache/arrow-rs-object-store/blob/main/src/azure/client.rs).

I propose **implementing the DFS endpoint in object_store** and introducing a 
feature flag that lets users specify whether the DFS endpoint should be used.
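One possible shape for the proposed switch, assuming it is a runtime configuration option rather than a Cargo feature. `AzureOptions`, `AzureEndpoint` and `use_dfs_endpoint` are hypothetical names, not existing object_store settings. Defaulting to off keeps the current Blob-only behaviour, which matters because the DFS operations listed above rely on hierarchical namespaces being enabled.

```rust
/// Which Azure endpoint directory-level operations are sent to.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum AzureEndpoint {
    Blob,
    Dfs,
}

/// Hypothetical options: `use_dfs_endpoint` is an illustrative name, not an
/// existing object_store configuration key.
#[derive(Debug, Default)]
struct AzureOptions {
    use_dfs_endpoint: bool,
}

impl AzureOptions {
    /// Set the hypothetical flag from a string value (e.g. read from an env var).
    fn with_use_dfs_endpoint(mut self, value: &str) -> Result<Self, String> {
        self.use_dfs_endpoint = match value.trim().to_ascii_lowercase().as_str() {
            "true" | "1" | "yes" => true,
            "false" | "0" | "no" => false,
            other => return Err(format!("invalid boolean value: {other}")),
        };
        Ok(self)
    }

    /// In this sketch only directory-level operations (recursive list, rename,
    /// recursive delete) switch endpoints; the default stays on Blob.
    fn endpoint_for_directory_ops(&self) -> AzureEndpoint {
        if self.use_dfs_endpoint {
            AzureEndpoint::Dfs
        } else {
            AzureEndpoint::Blob
        }
    }
}

fn main() {
    let default_opts = AzureOptions::default();
    println!("default: {:?}", default_opts.endpoint_for_directory_ops());

    let dfs_opts = AzureOptions::default()
        .with_use_dfs_endpoint("true")
        .expect("valid boolean");
    println!("with flag: {:?}", dfs_opts.endpoint_for_directory_ops());
}
```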

GitHub link: https://github.com/apache/arrow-rs-object-store/discussions/481
