jecsand838 commented on code in PR #9462:
URL: https://github.com/apache/arrow-rs/pull/9462#discussion_r2880711286


##########
arrow-avro/src/reader/async_reader/builder.rs:
##########
@@ -109,8 +109,18 @@ impl<R> ReaderBuilder<R> {
     }
 }
 
-impl<R: AsyncFileReader> ReaderBuilder<R> {
-    async fn read_header(&mut self) -> Result<(Header, u64), AvroError> {
+impl<R> ReaderBuilder<R>

Review Comment:
   > I wonder, do we want to allow maybe a with_header function as well? that 
will accept a user's header directly?
   
   I’d be a bit hesitant about `with_header`. For OCF, the file header is the 
source of truth as it carries the required `avro.schema` metadata, and this 
reader also relies on the parsed header length to know where block decoding 
should begin. If callers can inject a header, we’d be operating on out-of-band 
metadata rather than the file’s actual header bytes, and we’d also need to 
define how header length / start offset are supplied and validated. That feels 
like a separate optimization for cached/ranged reads, rather than part of this 
PR’s goal of exposing the discovered writer schema imo.
   
   > Header is not currently public, but this could be just an oversight. Its 
interface looks public-ready.
   
   We had tightened the public API for the crate prior to initial release to 
make shipping potential fixes with minor releases easier. However `Header`, 
`HeaderDecoder`, and `read_header` being publicly exposed makes complete sense 
now that there's a good use-case for it imo.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to