GitHub user keen85 created a discussion: Azure Storage Gen2 (hierarchical namespaces) - use DFS endpoint to improve performance
Hi,

Azure Storage comes in different flavors. A "regular" Azure Storage account is a classic object store and comes with a blob endpoint (`blob.core.windows.net`, `blob.fabric.microsoft.com`). Azure Storage accounts with [hierarchical namespaces](https://learn.microsoft.com/en-us/azure/storage/blobs/data-lake-storage-namespace) enabled (a.k.a. Azure Data Lake Storage Gen2, Fabric OneLake) come with a _second_ "DFS" (Distributed File System) endpoint, `dfs.core.windows.net` (`dfs.fabric.microsoft.com`), that has its own REST API: https://learn.microsoft.com/en-us/rest/api/storageservices/data-lake-storage-gen2

This endpoint allows for potentially massive performance improvements in the following scenarios:

- recursive directory listings
- renaming or moving files or folders
- deleting directories with many files

These are _single atomic operations_ via DFS. With the blob endpoint they require many individual blob operations and are therefore slower.

To my understanding, DFS URLs are already _accepted_, but ultimately only the [blob endpoint is used by object_store](https://github.com/apache/arrow-rs-object-store/blob/main/src/azure/client.rs).

I propose **implementing the DFS endpoint in object_store** and introducing a feature flag that lets users specify whether the DFS endpoint should be used.

GitHub link: https://github.com/apache/arrow-rs-object-store/discussions/481
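To make the three scenarios concrete, here is a minimal sketch of the DFS REST requests each one maps to, based on the Data Lake Storage Gen2 REST API (Path - List, Path - Create with `x-ms-rename-source`, Path - Delete). The `DfsRequest` struct and the helper functions are hypothetical names made up for illustration and are not part of object_store. The sketch assumes the `dfs.core.windows.net` suffix and leaves out authentication, URL encoding, and continuation-token handling for large results.

```rust
/// Hypothetical description of one DFS REST call (not an object_store type).
#[derive(Debug, PartialEq)]
struct DfsRequest {
    method: &'static str,
    url: String,
    headers: Vec<(&'static str, String)>,
}

/// Base URL of a filesystem (container) on the DFS endpoint.
/// Assumes the public-cloud suffix `dfs.core.windows.net`.
fn dfs_base(account: &str, filesystem: &str) -> String {
    format!("https://{}.dfs.core.windows.net/{}", account, filesystem)
}

/// Recursive listing: one (paginated) request instead of walking prefixes.
fn list_recursive(account: &str, filesystem: &str, directory: &str) -> DfsRequest {
    DfsRequest {
        method: "GET",
        url: format!(
            "{}?resource=filesystem&recursive=true&directory={}",
            dfs_base(account, filesystem),
            directory
        ),
        headers: vec![],
    }
}

/// Rename or move: PUT on the destination path, with the source path
/// passed in the `x-ms-rename-source` header as `/{filesystem}/{path}`.
fn rename(account: &str, filesystem: &str, from: &str, to: &str) -> DfsRequest {
    DfsRequest {
        method: "PUT",
        url: format!("{}/{}", dfs_base(account, filesystem), to),
        headers: vec![(
            "x-ms-rename-source",
            format!("/{}/{}", filesystem, from),
        )],
    }
}

/// Directory delete: DELETE on the directory with `recursive=true`.
fn delete_recursive(account: &str, filesystem: &str, directory: &str) -> DfsRequest {
    DfsRequest {
        method: "DELETE",
        url: format!(
            "{}/{}?recursive=true",
            dfs_base(account, filesystem),
            directory
        ),
        headers: vec![],
    }
}

fn main() {
    // Placeholder account, filesystem, and paths, chosen only for the demo.
    println!("{:?}", list_recursive("myaccount", "data", "raw/2024"));
    println!("{:?}", rename("myaccount", "data", "raw/tmp", "raw/final"));
    println!("{:?}", delete_recursive("myaccount", "data", "raw/old"));
}
```

With the blob endpoint, the rename and delete cases would instead need one copy and/or delete request per blob under the prefix, which is where the expected performance difference comes from.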
