jecsand838 commented on code in PR #9462:
URL: https://github.com/apache/arrow-rs/pull/9462#discussion_r2880711286
##########
arrow-avro/src/reader/async_reader/builder.rs:
##########
@@ -109,8 +109,18 @@ impl<R> ReaderBuilder<R> {
}
}
-impl<R: AsyncFileReader> ReaderBuilder<R> {
- async fn read_header(&mut self) -> Result<(Header, u64), AvroError> {
+impl<R> ReaderBuilder<R>
Review Comment:
> I wonder, do we want to allow maybe a with_header function as well? that
will accept a user's header directly?
I’d be a bit hesitant about `with_header`. For OCF, the file header is the
source of truth as it carries the required `avro.schema` metadata, and this
reader also relies on the parsed header length to know where block decoding
should begin. If callers can inject a header, we’d be operating on out-of-band
metadata rather than the file’s actual header bytes, and we’d also need to
define how header length / start offset are supplied and validated. That feels
like a separate optimization for cached/ranged reads, rather than part of this
PR’s goal of exposing the discovered writer schema imo.
> Header is not currently public, but this could be just an oversight. Its
interface looks public-ready.
We had tightened the crate's public API prior to initial release. That way
shipping potential fixes with minor releases would be simpler. However
`Header`, `HeaderDecoder`, and `read_header` being publicly exposed makes
complete sense now that there's a good use-case for it imo.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]