H-Plus-Time commented on PR #6157:
URL: https://github.com/apache/arrow-rs/pull/6157#issuecomment-2309140943

   > I'm pretty apprehensive about changing the default behaviour to use suffix 
requests given there are stores that are known to not support them.
   > 
   > I wonder what you think of instead performing the initial suffix read of 
the prefetch size manually, and then constructing an AsyncFileReader with these 
bytes cached? This initial fetch request would tell you the total size of the 
object and so no changes would be necessary to traits, allowing us to keep the 
current simple interface and avoid potential compatibility footguns, but also 
allow you to avoid making additional HEAD requests for your use case?
   
   I agree with defaulting to swapping back the default non-suffix requests 
(consensus seems to be reached on that).
   
   The manual approach isn't an option in my main use-case (browser side fetch 
as the underlying IO method) - usually resources are cross-origin, and 
Content-Range is not CORS-safelisted (the target origin must include it in 
Access-Control-Expose-Headers), so you can't determine the object's size from a 
ranged GET request alone without considerable intervention.
   
   Shifting the suffix request path to a secondary `load_without_size`, and 
probably re-emphasising the file_size parameter (that is, make it an explicit 
choice on the users' part to _not_ give the reader that information), I think 
that would be the least disruptive change.
   
   Perhaps the following would be sufficient:
   ```rust
   /// store.rs
   impl ParquetObjectReader {
     pub fn new(store: Arc<dyn ObjectStore>, meta: ObjectMeta) -> Self
     pub fn new_without_size(store: Arc<dyn ObjectStore>, location: Path) -> 
Self
   }
   
   /// metadata.rs
   
   impl<F: MetadataFetch> MetadataLoader<F> {
     pub async fn load(mut fetch: F, file_size: usize, prefetch: Option<usize>) 
-> Result<Self>
     pub async fn load_without_size(mut fetch: F, prefetch: Option<usize>) -> 
Result<Self>
   }
   
   // fetch_parquet_metadata elided for brevity - file_size: usize -> 
Option<usize>
   ```
   
   With the above, the remaining footguns (calling fetch_parquet_metadata with 
file_size=None, calling load_without_size, or supplying a suffix range directly 
to get_bytes) would require quite a lot of intent.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to