adriangb commented on code in PR #7015: URL: https://github.com/apache/arrow-rs/pull/7015#discussion_r1986134439
########## arrow-json/src/writer/encoder.rs: ########## @@ -25,126 +27,160 @@ use arrow_schema::{ArrowError, DataType, FieldRef}; use half::f16; use lexical_core::FormattedSize; use serde::Serializer; -use std::io::Write; +/// Configuration options for the JSON encoder. #[derive(Debug, Clone, Default)] pub struct EncoderOptions { + /// Whether to include nulls in the output or elide them. pub explicit_nulls: bool, + /// Whether to encode structs as JSON objects or JSON arrays of their values. pub struct_mode: StructMode, + /// An optional hook for customizing encoding behavior. + pub encoder_factory: Option<Arc<dyn EncoderFactory>>, +} + +/// A trait to create custom encoders for specific data types. +/// +/// This allows overriding the default encoders for specific data types, +/// or adding new encoders for custom data types. +pub trait EncoderFactory: std::fmt::Debug + Send + Sync { + /// Make an encoder that if returned runs before all of the default encoders. + /// This can be used to override how e.g. binary data is encoded so that it is an encoded string or an array of integers. + /// + /// Note that the type of the field may not match the type of the array: for dictionary arrays unless the top-level dictionary is handled this + /// will be called again for the keys and values of the dictionary, at which point the field type will still be the outer dictionary type but the + /// array will have a different type. + /// For example, `field`` might have the type `Dictionary(i32, Utf8)` but `array` will be `Utf8`. + fn make_default_encoder<'a>( + &self, + _field: &'a FieldRef, + _array: &'a dyn Array, + _options: &'a EncoderOptions, + ) -> Result<Option<Box<dyn Encoder + 'a>>, ArrowError> { + Ok(None) + } } /// A trait to format array values as JSON values /// /// Nullability is handled by the caller to allow encoding nulls implicitly, i.e. `{}` instead of `{"a": null}` pub trait Encoder { - /// Encode the non-null value at index `idx` to `out` + /// Encode the non-null value at index `idx` to `out`. /// - /// The behaviour is unspecified if `idx` corresponds to a null index + /// The behaviour is unspecified if `idx` corresponds to a null index. fn encode(&mut self, idx: usize, out: &mut Vec<u8>); + + /// Check if the value at `idx` is null. + fn is_null(&self, _idx: usize) -> bool; + + /// Short circuiting null checks if there are none. + fn has_nulls(&self) -> bool; Review Comment: Hard to remember now but I think the primary motivation was to encapsulate the (Encoder, NullBuffer) into the Encoder to make the whole API easier to work with. I've now collapsed these into a single method that returns an `Option<NullBuffer>` which can be hoisted out of loops. I think that should be equivalent to the old implementation but with a better API for implementers. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org