Re: [PR] Parquet/async: Default to suffix requests on supporting readers/object stores [arrow-rs]

via GitHub Fri, 23 Aug 2024 09:37:28 -0700


Xuanwo commented on code in PR #6157:
URL: https://github.com/apache/arrow-rs/pull/6157#discussion_r1729235784



##########
parquet/src/arrow/async_reader/metadata.rs:
##########
@@ -52,7 +51,44 @@ impl<F: MetadataFetch> MetadataLoader<F> {
     /// Create a new [`MetadataLoader`] by reading the footer information
     ///
     /// See [`fetch_parquet_metadata`] for the meaning of the individual parameters
-    pub async fn load(mut fetch: F, file_size: usize, prefetch: Option<usize>) -> Result<Self> {
+    pub async fn load(mut fetch: F, prefetch: Option<usize>) -> Result<Self> {
+        let suffix = fetch.fetch(GetRange::Suffix(prefetch.unwrap_or(8))).await?;

Review Comment:
   > As described 
[here](https://github.com/apache/arrow-rs/pull/5222#issuecomment-1874131333), I 
believe the default implementation for Azure is to make two requests: one for 
the length and another for the suffix data. But then this is much more 
performant on all other platforms
   
   Hi, I disagree with this because, in most cases, we already have the file 
size from `ListObjects` or other metadata services. So, this change doesn't 
perform better on other platforms and has a negative effect on azblob.
   
   Would you reconsider the choice by adding a new function instead of changing 
the existing one? This would also allow us to include it in the next minor 
version, avoiding a breaking change.
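
   To make the suggestion concrete, here is a rough sketch of what an additive 
API could look like. It is synchronous and uses an in-memory store only to 
keep it self-contained; `ReadRange`, `Fetch`, `InMemory`, `load_with_size` and 
`load_with_suffix` are hypothetical names for illustration, not the actual 
arrow-rs types or signatures.

```rust
use std::ops::Range;

/// Hypothetical stand-in for a ranged read request (not the arrow-rs type).
#[derive(Debug, Clone, PartialEq)]
enum ReadRange {
    /// Absolute byte range; requires the caller to know the file size.
    Bounded(Range<usize>),
    /// The last `n` bytes of the object.
    Suffix(usize),
}

/// Hypothetical synchronous fetcher, used only to keep the sketch runnable.
trait Fetch {
    fn fetch(&mut self, range: ReadRange) -> Vec<u8>;
}

/// In-memory "object store" that records every request it receives.
struct InMemory {
    data: Vec<u8>,
    requests: Vec<ReadRange>,
}

impl Fetch for InMemory {
    fn fetch(&mut self, range: ReadRange) -> Vec<u8> {
        self.requests.push(range.clone());
        match range {
            ReadRange::Bounded(r) => self.data[r].to_vec(),
            ReadRange::Suffix(n) => {
                let start = self.data.len().saturating_sub(n);
                self.data[start..].to_vec()
            }
        }
    }
}

/// Size of the Parquet footer, matching the default of 8 in the diff above.
const FOOTER_SIZE: usize = 8;

/// Existing-style entry point: the caller already has the size (for example
/// from a listing call), so a single bounded request is enough.
fn load_with_size<F: Fetch>(fetch: &mut F, file_size: usize, prefetch: Option<usize>) -> Vec<u8> {
    let n = prefetch.unwrap_or(FOOTER_SIZE).min(file_size);
    fetch.fetch(ReadRange::Bounded((file_size - n)..file_size))
}

/// Additive entry point: no size needed, relies on suffix support in the store.
fn load_with_suffix<F: Fetch>(fetch: &mut F, prefetch: Option<usize>) -> Vec<u8> {
    fetch.fetch(ReadRange::Suffix(prefetch.unwrap_or(FOOTER_SIZE)))
}
```

   Both paths read the same trailing bytes in one request. Callers that already 
know the size keep the existing signature, and stores without native suffix 
support are never forced onto the suffix path.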




Re: [PR] Parquet/async: Default to suffix requests on supporting readers/object stores [arrow-rs]
