jecsand838 commented on code in PR #8274:
URL: https://github.com/apache/arrow-rs/pull/8274#discussion_r2322647719
##########
arrow-avro/src/writer/format.rs:
##########
@@ -44,24 +43,6 @@ pub trait AvroFormat: Debug + Default {
#[derive(Debug, Default)]
pub struct AvroOcfFormat {
sync_marker: [u8; 16],
- /// Optional encoder behavior hints to keep file header schema ordering
- /// consistent with value encoding (e.g. Impala null-second).
- encoder_options: EncoderOptions,
-}
-
-impl AvroOcfFormat {
- /// Optional helper to attach encoder options (i.e., Impala null-second) to the format.
- #[allow(dead_code)]
- pub fn with_encoder_options(mut self, opts: EncoderOptions) -> Self {
- self.encoder_options = opts;
- self
- }
-
- /// Access the options used by this format.
- #[allow(dead_code)]
- pub fn encoder_options(&self) -> &EncoderOptions {
- &self.encoder_options
- }
Review Comment:
That's correct. I realized that having encoder options created two sources of
truth for encoder behavior: the schema and the encoder options. The schema
should be the only source of truth; otherwise we risk writing records that
deviate from the associated writer schema, i.e. rows in an Avro file that
cannot be decoded using the writer schema provided in the file's header.
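
To make the point concrete, here is a minimal sketch of a single source of
truth. It is not the arrow-avro API: `NullOrder`, `NullableIntSchema`,
`header_json`, `branch_index`, and `encode_value` are hypothetical names used
only for illustration. Both the header text and the union branch index are
derived from the same schema value, so the two cannot disagree.

```rust
/// Hypothetical: branch order of a two-branch nullable union in the writer schema.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum NullOrder {
    /// `["null", "int"]`
    NullFirst,
    /// `["int", "null"]` (the Impala-style null-second layout)
    NullSecond,
}

/// Hypothetical writer schema for one nullable `int` column.
#[derive(Debug, Clone, Copy)]
struct NullableIntSchema {
    order: NullOrder,
}

impl NullableIntSchema {
    /// Schema JSON that would be stored in the file header.
    fn header_json(&self) -> &'static str {
        match self.order {
            NullOrder::NullFirst => r#"["null","int"]"#,
            NullOrder::NullSecond => r#"["int","null"]"#,
        }
    }

    /// Zero-based union branch index, derived from the schema alone.
    fn branch_index(&self, is_null: bool) -> i64 {
        match (self.order, is_null) {
            (NullOrder::NullFirst, true) | (NullOrder::NullSecond, false) => 0,
            _ => 1,
        }
    }
}

/// Avro encodes `int` and `long` as zigzag variable-length integers.
fn zigzag_varint(value: i64, out: &mut Vec<u8>) {
    let mut n = ((value << 1) ^ (value >> 63)) as u64;
    while n >= 0x80 {
        out.push(((n & 0x7F) as u8) | 0x80);
        n >>= 7;
    }
    out.push(n as u8);
}

/// Encode one value: the union branch index, then the value.
/// `null` itself is encoded as zero bytes.
fn encode_value(schema: &NullableIntSchema, value: Option<i32>, out: &mut Vec<u8>) {
    zigzag_varint(schema.branch_index(value.is_none()), out);
    if let Some(v) = value {
        zigzag_varint(i64::from(v), out);
    }
}
```

If a separate options struct also chose the null position, the encoder could
write branch index 1 for null while the header still declared
`["null","int"]`, and a reader following the header would misdecode the row.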