Tom-Newton opened a new issue, #40035: URL: https://github.com/apache/arrow/issues/40035
### Describe the enhancement requested

Optimisation to https://github.com/apache/arrow/issues/37511

Child of https://github.com/apache/arrow/issues/18014

When reading from Azure blob storage, the bandwidth we get per connection is very dependent on the latency to the filesystem. To achieve good bandwidth with high latency, far greater concurrency is needed. This is relevant, for example, when reading from blob storage in a different region to your compute.

As an example, let's consider reading a parquet file. There are 2 levels of parallelism that I'm aware of when using Arrow and the native `AzureFileSystem`:

1. Arrow will make concurrent calls to `ReadAt` for each column and row group combination. At most we can have one concurrent connection per column and row group combination, so for small parquet files this may be less than we would like.
2. Within `ReadAt` the `AzureFileSystem` calls `BlobClient::DownloadTo`, which implements some extra concurrency internally: https://github.com/Azure/azure-sdk-for-cpp/blob/ddd0f4bd075d6715ac3004136a690445c4cde5c2/sdk/storage/azure-storage-blobs/src/blob_client.cpp#L516

The purpose of this issue is to expose the [config options for this parallelism](https://github.com/Azure/azure-sdk-for-cpp/blob/ddd0f4bd075d6715ac3004136a690445c4cde5c2/sdk/storage/azure-storage-blobs/inc/azure/storage/blobs/blob_options.hpp#L691-L709) to the user.

### Component(s)

C++
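
To illustrate the second level of parallelism, here is a minimal standalone sketch of how a single `ReadAt` range might be split into chunks that can be fetched concurrently. It does not call the Azure SDK. `TransferOptionsSketch`, `ByteRange`, `PlanChunks` and `MaxInFlight` are hypothetical names, and the three fields are only loosely modelled on the initial chunk size, chunk size and concurrency settings in the linked options. It also assumes a simplified model in which the first chunk is requested on its own and the remaining chunks share a fixed number of workers.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the SDK transfer options (illustrative only).
struct TransferOptionsSketch {
  int64_t initial_chunk_size;  // size of the first request, in bytes
  int64_t chunk_size;          // size of each following request, in bytes
  int32_t concurrency;         // max parallel requests after the first one
};

struct ByteRange {
  int64_t offset;
  int64_t length;
};

// Split the range [offset, offset + length) into one initial chunk followed
// by fixed-size chunks; the final chunk may be shorter.
std::vector<ByteRange> PlanChunks(int64_t offset, int64_t length,
                                  const TransferOptionsSketch& opts) {
  std::vector<ByteRange> chunks;
  if (length <= 0 || opts.initial_chunk_size <= 0 || opts.chunk_size <= 0) {
    return chunks;  // nothing to plan for empty reads or invalid options
  }
  const int64_t end = offset + length;
  const int64_t first = std::min(length, opts.initial_chunk_size);
  chunks.push_back({offset, first});
  int64_t pos = offset + first;
  while (pos < end) {
    const int64_t len = std::min(opts.chunk_size, end - pos);
    chunks.push_back({pos, len});
    pos += len;
  }
  return chunks;
}

// Upper bound on simultaneous requests for one read under the assumed model:
// the first chunk is fetched alone, the rest are limited by `concurrency`.
int64_t MaxInFlight(const std::vector<ByteRange>& chunks,
                    const TransferOptionsSketch& opts) {
  if (chunks.empty()) return 0;
  const int64_t remaining = static_cast<int64_t>(chunks.size()) - 1;
  return std::max<int64_t>(
      1, std::min<int64_t>(remaining, opts.concurrency));
}
```

Under this model, a read smaller than the initial chunk size never uses more than one connection, so for high-latency links a user would likely want a smaller initial chunk size, a smaller chunk size, or a higher concurrency limit. That is the motivation for exposing these options.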
