Xuanwo commented on code in PR #6157:
URL: https://github.com/apache/arrow-rs/pull/6157#discussion_r1729235784
##########
parquet/src/arrow/async_reader/metadata.rs:
##########
@@ -52,7 +51,44 @@ impl<F: MetadataFetch> MetadataLoader<F> {
/// Create a new [`MetadataLoader`] by reading the footer information
///
/// See [`fetch_parquet_metadata`] for the meaning of the individual parameters
- pub async fn load(mut fetch: F, file_size: usize, prefetch: Option<usize>) -> Result<Self> {
+ pub async fn load(mut fetch: F, prefetch: Option<usize>) -> Result<Self> {
+ let suffix = fetch.fetch(GetRange::Suffix(prefetch.unwrap_or(8))).await?;
Review Comment:
> As described
> [here](https://github.com/apache/arrow-rs/pull/5222#issuecomment-1874131333), I
> believe the default implementation for azure is to make two requests: one for
> the length and another for the suffix data. But then this is much more
> performant on all other platforms

Hi, I disagree with this because, in most cases, we already have the file
size from `ListObjects` or other metadata services. So, this change doesn't
perform better on other platforms and has a negative effect on azblob.
Would you reconsider the choice by adding a new function instead of changing
the existing one? This would also allow us to include it in the next minor
version, avoiding a breaking change.
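
To make the suggestion concrete, here is a minimal sketch of the additive approach. It is not the code in this PR: `Fetch`, `NoSuffixStore`, `load_footer`, and `load_footer_suffix` are hypothetical names, and the trait is a synchronous stand-in for `MetadataFetch`. The store assumes a backend without suffix-range support, so a suffix read costs one extra request to learn the object size, as described in the quoted comment. The only format fact it relies on is that a Parquet file ends with a 4-byte little-endian metadata length followed by the magic `PAR1`.

```rust
use std::ops::Range;

/// Parquet footer: 4-byte little-endian metadata length + 4-byte magic "PAR1".
const FOOTER_SIZE: usize = 8;

/// Hypothetical, synchronous stand-in for `MetadataFetch` (not the real trait).
trait Fetch {
    /// Read an absolute byte range; counts as one request.
    fn fetch_range(&mut self, range: Range<usize>) -> Vec<u8>;
    /// Read the last `len` bytes of the object.
    fn fetch_suffix(&mut self, len: usize) -> Vec<u8>;
}

/// In-memory store that counts requests. It models (as an assumption) a
/// backend with no native suffix ranges, so a suffix read needs the size first.
struct NoSuffixStore {
    data: Vec<u8>,
    requests: usize,
}

impl Fetch for NoSuffixStore {
    fn fetch_range(&mut self, range: Range<usize>) -> Vec<u8> {
        self.requests += 1;
        self.data[range].to_vec()
    }

    fn fetch_suffix(&mut self, len: usize) -> Vec<u8> {
        // Extra round trip to learn the object size, modelled as one request.
        self.requests += 1;
        let size = self.data.len();
        self.fetch_range(size.saturating_sub(len)..size)
    }
}

/// Decode the metadata length from the 8-byte footer, checking the magic.
fn decode_footer(footer: &[u8]) -> Option<u32> {
    if footer.len() != FOOTER_SIZE || &footer[4..] != b"PAR1" {
        return None;
    }
    Some(u32::from_le_bytes([footer[0], footer[1], footer[2], footer[3]]))
}

/// Existing-style entry point: the caller already knows the file size
/// (for example from a listing), so one ranged read is enough.
fn load_footer<F: Fetch>(fetch: &mut F, file_size: usize) -> Option<u32> {
    if file_size < FOOTER_SIZE {
        return None;
    }
    let footer = fetch.fetch_range(file_size - FOOTER_SIZE..file_size);
    decode_footer(&footer)
}

/// Additive entry point for callers that do not know the size.
fn load_footer_suffix<F: Fetch>(fetch: &mut F) -> Option<u32> {
    let footer = fetch.fetch_suffix(FOOTER_SIZE);
    decode_footer(&footer)
}
```

With both entry points available, callers that already hold the size keep the single-request path, and the suffix-based path is opt-in rather than a signature change.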