Re: [PR] Add hooks to json encoder to override default encoding or add support for unsupported types [arrow-rs]

via GitHub Sat, 08 Mar 2025 10:55:16 -0800


adriangb commented on code in PR #7015:
URL: https://github.com/apache/arrow-rs/pull/7015#discussion_r1986134439



##########
arrow-json/src/writer/encoder.rs:
##########
@@ -25,126 +27,160 @@ use arrow_schema::{ArrowError, DataType, FieldRef};
 use half::f16;
 use lexical_core::FormattedSize;
 use serde::Serializer;
-use std::io::Write;
 
+/// Configuration options for the JSON encoder.
 #[derive(Debug, Clone, Default)]
 pub struct EncoderOptions {
+    /// Whether to include nulls in the output or elide them.
     pub explicit_nulls: bool,
+    /// Whether to encode structs as JSON objects or JSON arrays of their 
values.
     pub struct_mode: StructMode,
+    /// An optional hook for customizing encoding behavior.
+    pub encoder_factory: Option<Arc<dyn EncoderFactory>>,
+}
+
+/// A trait to create custom encoders for specific data types.
+///
+/// This allows overriding the default encoders for specific data types,
+/// or adding new encoders for custom data types.
+pub trait EncoderFactory: std::fmt::Debug + Send + Sync {
+    /// Make an encoder that if returned runs before all of the default 
encoders.
+    /// This can be used to override how e.g. binary data is encoded so that 
it is an encoded string or an array of integers.
+    ///
+    /// Note that the type of the field may not match the type of the array: 
for dictionary arrays unless the top-level dictionary is handled this
+    /// will be called again for the keys and values of the dictionary, at 
which point the field type will still be the outer dictionary type but the
+    /// array will have a different type.
+    /// For example, `field`` might have the type `Dictionary(i32, Utf8)` but 
`array` will be `Utf8`.
+    fn make_default_encoder<'a>(
+        &self,
+        _field: &'a FieldRef,
+        _array: &'a dyn Array,
+        _options: &'a EncoderOptions,
+    ) -> Result<Option<Box<dyn Encoder + 'a>>, ArrowError> {
+        Ok(None)
+    }
 }
 
 /// A trait to format array values as JSON values
 ///
 /// Nullability is handled by the caller to allow encoding nulls implicitly, 
i.e. `{}` instead of `{"a": null}`
 pub trait Encoder {
-    /// Encode the non-null value at index `idx` to `out`
+    /// Encode the non-null value at index `idx` to `out`.
     ///
-    /// The behaviour is unspecified if `idx` corresponds to a null index
+    /// The behaviour is unspecified if `idx` corresponds to a null index.
     fn encode(&mut self, idx: usize, out: &mut Vec<u8>);
+
+    /// Check if the value at `idx` is null.
+    fn is_null(&self, _idx: usize) -> bool;
+
+    /// Short circuiting null checks if there are none.
+    fn has_nulls(&self) -> bool;

Review Comment:
   Hard to remember now but I think the primary motivation was to encapsulate 
the (Encoder, NullBuffer) into the Encoder to make the whole API easier to work 
with.
   
   I've now collapsed these into a single method that returns an 
`Option<NullBuffer>` which can be hoisted out of loops. I think that should be 
equivalent to the old implementation but with a better API for implementers.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Re: [PR] Add hooks to json encoder to override default encoding or add support for unsupported types [arrow-rs]

Reply via email to