Re: [PR] API for reading Variant data and metadata [arrow-rs]

via GitHub Wed, 21 May 2025 18:10:11 -0700


scovich commented on code in PR #7535:
URL: https://github.com/apache/arrow-rs/pull/7535#discussion_r2100806370



##########
parquet-variant/src/decoder.rs:
##########
@@ -0,0 +1,199 @@
+// NOTE: Largely based on the implementation of @PinkCrow007 in 
https://github.com/apache/arrow-rs/pull/7452
+// And the feedback there.

Review Comment:
   Maybe better as a github file comment instead of a source code comment?



##########
parquet-variant/src/decoder.rs:
##########
@@ -0,0 +1,199 @@
+// NOTE: Largely based on the implementation of @PinkCrow007 in 
https://github.com/apache/arrow-rs/pull/7452
+// And the feedback there.
+use crate::variant::VariantType;
+use arrow_schema::ArrowError;
+use std::{array::TryFromSliceError, str};
+
+#[derive(Debug, Clone, Copy)]
+pub enum VariantBasicType {
+    Primitive = 0,
+    ShortString = 1,
+    Object = 2,
+    Array = 3,
+}
+
+#[derive(Debug, Clone, Copy)]
+pub enum VariantPrimitiveType {
+    Null = 0,
+    BooleanTrue = 1,
+    BooleanFalse = 2,
+    Int8 = 3,
+    // TODO: Add 'legs' for the rest of primitives, once API is agreed upon
+    String = 16,
+}
+
+/// Extracts the basic type from a header byte
+pub(crate) fn get_basic_type(header: u8) -> Result<VariantBasicType, 
ArrowError> {
+    // See 
https://github.com/apache/parquet-format/blob/master/VariantEncoding.md#value-encoding
+    let basic_type = header & 0x03; // Basic type is encoded in the first 2 
bits
+    let basic_type = match basic_type {
+        0 => VariantBasicType::Primitive,
+        1 => VariantBasicType::ShortString,
+        2 => VariantBasicType::Object,
+        3 => VariantBasicType::Array,
+        _ => {
+            return Err(ArrowError::InvalidArgumentError(format!(
+                "unknown basic type: {}",
+                basic_type
+            )))
+        }

Review Comment:
   Isn't this case provably impossible to hit, given that we masked it to 2 
bits and there are match arms for all four possible values? The compiler just 
isn't smart enough to notice (requires "value range propagation" optimizations) 
and/or the language forbids it (because technically the `match` is on the u8 
type, not a particular u8 value).
   ```suggestion
           _ => unreachable!(),
   ```
   (The docs for [unreachable 
macro](https://doc.rust-lang.org/std/macro.unreachable.html#examples) 
specifically mention match arms as a use case)



##########
parquet-variant/src/decoder.rs:
##########
@@ -0,0 +1,199 @@
+// NOTE: Largely based on the implementation of @PinkCrow007 in 
https://github.com/apache/arrow-rs/pull/7452
+// And the feedback there.
+use crate::variant::VariantType;
+use arrow_schema::ArrowError;
+use std::{array::TryFromSliceError, str};
+
+#[derive(Debug, Clone, Copy)]
+pub enum VariantBasicType {
+    Primitive = 0,
+    ShortString = 1,
+    Object = 2,
+    Array = 3,
+}
+
+#[derive(Debug, Clone, Copy)]
+pub enum VariantPrimitiveType {
+    Null = 0,
+    BooleanTrue = 1,
+    BooleanFalse = 2,
+    Int8 = 3,
+    // TODO: Add 'legs' for the rest of primitives, once API is agreed upon
+    String = 16,
+}
+
+/// Extracts the basic type from a header byte
+pub(crate) fn get_basic_type(header: u8) -> Result<VariantBasicType, 
ArrowError> {
+    // See 
https://github.com/apache/parquet-format/blob/master/VariantEncoding.md#value-encoding
+    let basic_type = header & 0x03; // Basic type is encoded in the first 2 
bits
+    let basic_type = match basic_type {
+        0 => VariantBasicType::Primitive,
+        1 => VariantBasicType::ShortString,
+        2 => VariantBasicType::Object,
+        3 => VariantBasicType::Array,
+        _ => {
+            return Err(ArrowError::InvalidArgumentError(format!(
+                "unknown basic type: {}",
+                basic_type
+            )))
+        }
+    };
+    Ok(basic_type)
+}
+
+/// Extracts the primitive type from a header byte
+pub(crate) fn get_primitive_type(header: u8) -> Result<VariantPrimitiveType, 
ArrowError> {
+    // See 
https://github.com/apache/parquet-format/blob/master/VariantEncoding.md#value-encoding
+    //// Primitive type is encoded in the last 6 bits of the header byte
+    let primitive_type = (header >> 2) & 0x3F;
+    let primitive_type = match primitive_type {
+        0 => VariantPrimitiveType::Null,
+        1 => VariantPrimitiveType::BooleanTrue,
+        2 => VariantPrimitiveType::BooleanFalse,
+        3 => VariantPrimitiveType::Int8,
+        // TODO: Add 'legs' for the rest, once API is agreed upon
+        16 => VariantPrimitiveType::String,
+        _ => {
+            return Err(ArrowError::InvalidArgumentError(format!(
+                "unknown primitive type: {}",
+                primitive_type
+            )))
+        }
+    };
+    Ok(primitive_type)
+}
+
+/// Extracts the variant type from the value section of a variant. The variant
+/// type is defined as the set of all basic types and all primitive types.
+pub fn get_variant_type(value: &[u8]) -> Result<VariantType, ArrowError> {
+    if value.is_empty() {
+        return Err(ArrowError::InvalidArgumentError(
+            "Tried to get variant type from empty buffer array".to_string(),
+        ));
+    }
+    let header = value[0];
+    let variant_type = match get_basic_type(header)? {
+        VariantBasicType::Primitive => match get_primitive_type(header)? {
+            VariantPrimitiveType::Null => VariantType::Null,
+            VariantPrimitiveType::Int8 => VariantType::Int8,
+            VariantPrimitiveType::BooleanTrue => VariantType::BooleanTrue,
+            VariantPrimitiveType::BooleanFalse => VariantType::BooleanFalse,
+            // TODO: Add 'legs' for the rest, once API is agreed upon
+            VariantPrimitiveType::String => VariantType::String,
+        },
+        VariantBasicType::ShortString => VariantType::ShortString,
+        VariantBasicType::Object => VariantType::Object,
+        VariantBasicType::Array => VariantType::Array,
+    };
+    Ok(variant_type)
+}
+
+/// To be used in `map_err` when unpacking an integer from a slice of bytes.
+fn map_try_from_slice_error(e: TryFromSliceError) -> ArrowError {
+    ArrowError::InvalidArgumentError(e.to_string())
+}
+
+/// Constructs the error message for an invalid UTF-8 string.
+fn invalid_utf8_err() -> ArrowError {
+    ArrowError::InvalidArgumentError("invalid UTF-8 string".to_string())
+}
+
+/// Decodes an Int8 from the value section of a variant.
+pub(crate) fn decode_int8(value: &[u8]) -> Result<i8, ArrowError> {
+    if value.is_empty() {
+        return Err(ArrowError::InvalidArgumentError(
+            "Got empty value buffer so can't decode into int8.".to_string(),
+        ));
+    }
+    let value = i8::from_le_bytes([value[1]]);
+    Ok(value)
+}
+
+/// Decodes a long string from the value section of a variant.
+pub(crate) fn decode_long_string(value: &[u8]) -> Result<&str, ArrowError> {
+    if value.len() < 5 {
+        return Err(ArrowError::InvalidArgumentError(
+            "Tried to decode value buffer into long_string, but it's too short 
(len<5)."
+                .to_string(),
+        ));
+    }
+    let len =
+        
u32::from_le_bytes(value[1..=4].try_into().map_err(map_try_from_slice_error)?) 
as usize;
+    if value.len() < len + 5 {
+        let err_str = format!("The length of the buffer for the long_string is 
soo short, it is {} and it should be at least {} ({} < {} + 5)", value.len(), 
len + 5 , value.len(), len);
+        return Err(ArrowError::InvalidArgumentError(err_str));
+    }
+    let string_bytes = &value[5..5 + len];
+    let string = str::from_utf8(string_bytes).map_err(|_| invalid_utf8_err())?;
+    Ok(string)
+}
+
+/// Decodes a short string from the value section of a variant.
+pub(crate) fn decode_short_string(value: &[u8]) -> Result<&str, ArrowError> {
+    if value.is_empty() {
+        return Err(ArrowError::InvalidArgumentError(
+            "Tried to decode value buffer into short_string, but it's 
empty.".to_string(),
+        ));
+    }
+    let len = ((value[0] & 0b11111100) >> 2) as usize;

Review Comment:
   The mask shouldn't be necessary? The rightmost two bits anyway get shifted 
out.



##########
parquet-variant/src/decoder.rs:
##########
@@ -0,0 +1,199 @@
+// NOTE: Largely based on the implementation of @PinkCrow007 in 
https://github.com/apache/arrow-rs/pull/7452
+// And the feedback there.
+use crate::variant::VariantType;
+use arrow_schema::ArrowError;
+use std::{array::TryFromSliceError, str};
+
+#[derive(Debug, Clone, Copy)]
+pub enum VariantBasicType {
+    Primitive = 0,
+    ShortString = 1,
+    Object = 2,
+    Array = 3,
+}
+
+#[derive(Debug, Clone, Copy)]
+pub enum VariantPrimitiveType {
+    Null = 0,
+    BooleanTrue = 1,
+    BooleanFalse = 2,
+    Int8 = 3,
+    // TODO: Add 'legs' for the rest of primitives, once API is agreed upon
+    String = 16,
+}
+
+/// Extracts the basic type from a header byte
+pub(crate) fn get_basic_type(header: u8) -> Result<VariantBasicType, 
ArrowError> {
+    // See 
https://github.com/apache/parquet-format/blob/master/VariantEncoding.md#value-encoding
+    let basic_type = header & 0x03; // Basic type is encoded in the first 2 
bits
+    let basic_type = match basic_type {
+        0 => VariantBasicType::Primitive,
+        1 => VariantBasicType::ShortString,
+        2 => VariantBasicType::Object,
+        3 => VariantBasicType::Array,
+        _ => {
+            return Err(ArrowError::InvalidArgumentError(format!(
+                "unknown basic type: {}",
+                basic_type
+            )))
+        }
+    };
+    Ok(basic_type)
+}
+
+/// Extracts the primitive type from a header byte
+pub(crate) fn get_primitive_type(header: u8) -> Result<VariantPrimitiveType, 
ArrowError> {
+    // See 
https://github.com/apache/parquet-format/blob/master/VariantEncoding.md#value-encoding
+    //// Primitive type is encoded in the last 6 bits of the header byte
+    let primitive_type = (header >> 2) & 0x3F;
+    let primitive_type = match primitive_type {
+        0 => VariantPrimitiveType::Null,
+        1 => VariantPrimitiveType::BooleanTrue,
+        2 => VariantPrimitiveType::BooleanFalse,
+        3 => VariantPrimitiveType::Int8,
+        // TODO: Add 'legs' for the rest, once API is agreed upon
+        16 => VariantPrimitiveType::String,
+        _ => {
+            return Err(ArrowError::InvalidArgumentError(format!(
+                "unknown primitive type: {}",
+                primitive_type
+            )))
+        }
+    };
+    Ok(primitive_type)
+}
+
+/// Extracts the variant type from the value section of a variant. The variant
+/// type is defined as the set of all basic types and all primitive types.
+pub fn get_variant_type(value: &[u8]) -> Result<VariantType, ArrowError> {
+    if value.is_empty() {
+        return Err(ArrowError::InvalidArgumentError(
+            "Tried to get variant type from empty buffer array".to_string(),
+        ));
+    }
+    let header = value[0];
+    let variant_type = match get_basic_type(header)? {
+        VariantBasicType::Primitive => match get_primitive_type(header)? {
+            VariantPrimitiveType::Null => VariantType::Null,
+            VariantPrimitiveType::Int8 => VariantType::Int8,
+            VariantPrimitiveType::BooleanTrue => VariantType::BooleanTrue,
+            VariantPrimitiveType::BooleanFalse => VariantType::BooleanFalse,
+            // TODO: Add 'legs' for the rest, once API is agreed upon
+            VariantPrimitiveType::String => VariantType::String,
+        },
+        VariantBasicType::ShortString => VariantType::ShortString,
+        VariantBasicType::Object => VariantType::Object,
+        VariantBasicType::Array => VariantType::Array,
+    };
+    Ok(variant_type)
+}
+
+/// To be used in `map_err` when unpacking an integer from a slice of bytes.
+fn map_try_from_slice_error(e: TryFromSliceError) -> ArrowError {
+    ArrowError::InvalidArgumentError(e.to_string())
+}
+
+/// Constructs the error message for an invalid UTF-8 string.
+fn invalid_utf8_err() -> ArrowError {
+    ArrowError::InvalidArgumentError("invalid UTF-8 string".to_string())
+}
+
+/// Decodes an Int8 from the value section of a variant.
+pub(crate) fn decode_int8(value: &[u8]) -> Result<i8, ArrowError> {
+    if value.is_empty() {
+        return Err(ArrowError::InvalidArgumentError(
+            "Got empty value buffer so can't decode into int8.".to_string(),
+        ));
+    }
+    let value = i8::from_le_bytes([value[1]]);
+    Ok(value)
+}
+
+/// Decodes a long string from the value section of a variant.
+pub(crate) fn decode_long_string(value: &[u8]) -> Result<&str, ArrowError> {
+    if value.len() < 5 {
+        return Err(ArrowError::InvalidArgumentError(
+            "Tried to decode value buffer into long_string, but it's too short 
(len<5)."
+                .to_string(),
+        ));

Review Comment:
   Hmm, `value[1..=4]` panics if the slice is too short? I wonder if we need 
some helper functions to make our lives easier?
   ```rust
   fn slice_from_slice(bytes: &[u8], range: Range<usize>) -> Result<&[u8], 
ArrowError> {
       bytes.get(range).ok_or_else(|| ArrowError::InvalidArgumentError(
           "Tried to extract {} bytes at offset {} from {}-byte buffer",
           range.end - range.start, range.start, bytes.len(),
       ))
   }
   fn array_from_slice<const N: usize>(bytes: &[u8], offset: usize) -> 
Result<[u8; N], ArrowError> {
       let bytes = slice_from_slice(bytes, offset..offset+N)?;
       bytes.try_into().map_err(map_try_from_slice_error)
   }
   ```
   and then this code could just do:
   ```rust
   let len = u32::from_le_bytes(array_from_slice(value, 1)?) as usize;
   let string_bytes = slice_from_slice(value, 5..5 + len)?;
   ```
   and e.g. `decode_int8` above simplifies to:
   ```rust
   Ok(i8::from_le_bytes(array_from_slice(value, 1)?))
   ```



##########
parquet-variant/src/variant.rs:
##########
@@ -0,0 +1,321 @@
+use std::{borrow::Cow, ops::Index};
+
+use crate::decoder::{self, get_variant_type};
+use arrow_schema::ArrowError;
+use strum_macros::EnumDiscriminants;
+
+#[derive(Clone, Copy, Debug, PartialEq)]
+/// Encodes the Variant Metadata, see the Variant spec file for more 
information
+pub struct VariantMetadata<'m> {
+    bytes: &'m [u8],
+}
+
+impl<'m> VariantMetadata<'m> {
+    /// View the raw bytes (needed by very low-level decoders)
+    #[inline]
+    pub const fn as_bytes(&self) -> &'m [u8] {
+        self.bytes
+    }
+
+    /// Whether the dictionary keys are sorted and unique
+    pub fn is_sorted(&self) -> bool {
+        todo!()
+    }
+
+    /// Get the dict length
+    pub fn dict_len(&self) -> Result<usize, ArrowError> {
+        let end_location = self.offset_size()? as usize + 1;
+        if self.bytes.len() < end_location {
+            let err_str = format!(
+                "Invalid metadata bytes, must have at least length {} but has 
length {}",
+                &end_location,
+                self.bytes.len()
+            );
+            return Err(ArrowError::InvalidArgumentError(err_str));
+        }
+        let dict_len_bytes = &self.bytes[1..end_location];
+        let dict_len = 
usize::from_le_bytes(dict_len_bytes.try_into().map_err(|e| {
+            ArrowError::InvalidArgumentError(format!(
+                "Unable to convert dictionary_size bytes into usize: {}",
+                e,
+            ))
+        })?);
+        Ok(dict_len)
+    }
+    pub fn version(&self) -> usize {
+        todo!()
+    }
+
+    /// Get the offset by index
+    pub fn get_offset_by(&self, index: usize) -> Result<usize, ArrowError> {
+        todo!()
+    }
+
+    /// Get the header byte, which has the following form
+    ///              7     6  5   4  3             0
+    ///             +-------+---+---+---------------+
+    /// header      |       |   |   |    version    |
+    ///             +-------+---+---+---------------+
+    ///                 ^         ^
+    ///                 |         +-- sorted_strings
+    ///                 +-- offset_size_minus_one
+    /// The version is a 4-bit value that must always contain the value 1.
+    /// - sorted_strings is a 1-bit value indicating whether dictionary 
strings are sorted and unique.
+    /// - offset_size_minus_one is a 2-bit value providing the number of bytes 
per dictionary size and offset field.
+    /// - The actual number of bytes, offset_size, is offset_size_minus_one + 1
+    pub fn header(&self) -> Result<u8, ArrowError> {
+        if self.bytes.is_empty() {
+            return Err(ArrowError::InvalidArgumentError(
+                "Can't get header from empty buffer".to_string(),
+            ));
+        }
+        Ok(self.bytes[0])
+    }
+
+    /// Get the offset_minus_one value from the header
+    pub fn offset_size_minus_one(&self) -> Result<u8, ArrowError> {
+        if self.bytes.is_empty() {
+            Err(ArrowError::InvalidArgumentError(
+                "Tried to get offset_size_minus_one from header, but 
self.bytes buffer is emtpy."
+                    .to_string(),
+            ))
+        } else {
+            Ok(self.bytes[0] & (0b11 << 6)) // Grab the last 2 bits
+        }
+    }
+
+    /// Get the offset_size
+    pub fn offset_size(&self) -> Result<u8, ArrowError> {
+        Ok(self.offset_size_minus_one()? + 1)
+    }
+
+    /// Get the offsets as an iterator
+    // TODO: Do we want this kind of API?
+    // TODO: Test once API is agreed upon
+    pub fn offsets(&'m self) -> Result<impl Iterator<Item = (usize, usize)> + 
'm, ArrowError> {
+        struct OffsetIterators<'m> {
+            buffer: &'m [u8],
+            total: usize,
+            seen: usize,
+            offset_size: usize,
+        }
+        impl<'m> Iterator for OffsetIterators<'m> {
+            type Item = (usize, usize); // (start, end) positions of the bytes
+
+            // TODO: Check bounds here or ensure they're correct
+
+            fn next(&mut self) -> Option<Self::Item> {
+                // +1 to skip the first offset
+                if self.seen < self.total {
+                    let start = usize::from_le_bytes(
+                        self.buffer[(self.seen ) * self.offset_size + 1 // +1 
to skip header
+                            ..(self.seen ) * self.offset_size + 1]
+                            .try_into()
+                            .ok()?,
+                    );
+                    self.seen += 1;
+                    let end = usize::from_le_bytes(
+                        self.buffer[(self.seen ) * self.offset_size + 1 // +1 
to skip header
+                            ..(self.seen ) * self.offset_size + 1]
+                            .try_into()
+                            .ok()?,
+                    );
+
+                    Some((start, end))
+                } else {
+                    None
+                }
+            }
+        }
+        let iterator: OffsetIterators = OffsetIterators {
+            buffer: self.bytes,
+            total: self.dict_len()?,
+            seen: 0,
+            offset_size: self.offset_size()? as usize,
+        };
+        Ok(iterator)
+    }
+    /// Get the key-name-bytes by index
+    pub fn get_by(&self, index: usize) -> Result<&'m str, ArrowError> {
+        todo!()
+    }
+    /// Get all key-names as an Iterator of strings
+    // TODO: Result
+    pub fn fields(&self) -> impl Iterator<Item = &'m str> {
+        // Do the same as for offsets
+        todo!();
+        vec![].into_iter()
+    }
+}
+
+#[derive(Clone, Copy, Debug, PartialEq)]
+pub struct VariantObject<'m, 'v> {
+    pub metadata: VariantMetadata<'m>,
+    pub value: &'v [u8],
+}
+
+impl<'m, 'v> VariantObject<'m, 'v> {
+    pub fn fields(&self) -> Result<impl Iterator<Item = (&'m str, Variant<'m, 
'v>)>, ArrowError> {
+        todo!();
+        #[allow(unreachable_code)] // Just to infer the return type
+        Ok(vec![].into_iter())
+    }
+
+    pub fn field(&self, _name: &'m str) -> Result<Variant<'m, 'v>, ArrowError> 
{
+        todo!()
+    }
+}
+
+#[derive(Clone, Copy, Debug, PartialEq)]
+pub struct VariantArray<'m, 'v> {
+    pub metadata: VariantMetadata<'m>,
+    pub value: &'v [u8],
+}
+
+// TODO: Let's agree on the API here, also should we expose a way to get the 
values as a vec of
+// variants for those who want it? Would require allocations.
+impl<'m, 'v> VariantArray<'m, 'v> {
+    pub fn len(&self) -> usize {
+        todo!()
+    }
+
+    pub fn values(&self) -> Result<impl Iterator<Item = Variant<'m, 'v>>, 
ArrowError> {
+        todo!();
+        #[allow(unreachable_code)] // Just to infer the return type
+        Ok(vec![].into_iter())
+    }
+}
+
+impl<'m, 'v> Index<usize> for VariantArray<'m, 'v> {
+    type Output = Variant<'m, 'v>;
+
+    fn index(&self, _index: usize) -> &Self::Output {
+        todo!()
+    }
+}
+
+/// Variant value. May contain references to metadata and value
+// TODO: Add copy if no Cow on String and Shortstring?
+#[derive(Clone, Debug, PartialEq, EnumDiscriminants)]
+#[strum_discriminants(name(VariantType))]
+pub enum Variant<'m, 'v> {
+    // TODO: Add 'legs' for the rest of the primitive types, once API is 
agreed upon
+    Null,
+    Int8(i8),
+
+    BooleanTrue,
+    BooleanFalse,
+
+    // only need the *value* buffer
+    // TODO: Do we want Cow<'v, str> over &'v str? It enables From<String> - 
discuss on PR
+    String(Cow<'v, str>),
+    ShortString(Cow<'v, str>),
+
+    // need both metadata & value
+    Object(VariantObject<'m, 'v>),
+    Array(VariantArray<'m, 'v>),
+}
+
+impl<'m, 'v> Variant<'m, 'v> {
+    /// Parse the buffers and return the appropriate variant.
+    pub fn try_new(metadata: &'m [u8], value: &'v [u8]) -> Result<Self, 
ArrowError> {
+        Ok(match get_variant_type(value)? {
+            VariantType::Null => Variant::Null,
+            VariantType::BooleanTrue => Variant::BooleanTrue,
+            VariantType::BooleanFalse => Variant::BooleanFalse,
+
+            VariantType::Int8 => Variant::Int8(decoder::decode_int8(value)?),
+
+            // TODO: Add 'legs' for the rest of the primitive types, once API 
is agreed upon
+            VariantType::String => {
+                
Variant::String(Cow::Borrowed(decoder::decode_long_string(value)?))
+            }
+
+            VariantType::ShortString => {
+                
Variant::ShortString(Cow::Borrowed(decoder::decode_short_string(value)?))
+            }
+
+            VariantType::Object => Variant::Object(VariantObject {

Review Comment:
   Probably? `VariantObject::try_new` and `VariantArray::try_new` should 
probably do basic header and offset monotonicity validation, similar to what I 
suggested for `VariantMetadata::try_new` (see comment above)? 



##########
parquet-variant/src/variant.rs:
##########
@@ -0,0 +1,321 @@
+use std::{borrow::Cow, ops::Index};
+
+use crate::decoder::{self, get_variant_type};
+use arrow_schema::ArrowError;
+use strum_macros::EnumDiscriminants;
+
+#[derive(Clone, Copy, Debug, PartialEq)]
+/// Encodes the Variant Metadata, see the Variant spec file for more 
information
+pub struct VariantMetadata<'m> {
+    bytes: &'m [u8],
+}
+
+impl<'m> VariantMetadata<'m> {
+    /// View the raw bytes (needed by very low-level decoders)
+    #[inline]
+    pub const fn as_bytes(&self) -> &'m [u8] {
+        self.bytes
+    }
+
+    /// Whether the dictionary keys are sorted and unique
+    pub fn is_sorted(&self) -> bool {
+        todo!()
+    }
+
+    /// Get the dict length
+    pub fn dict_len(&self) -> Result<usize, ArrowError> {
+        let end_location = self.offset_size()? as usize + 1;
+        if self.bytes.len() < end_location {
+            let err_str = format!(
+                "Invalid metadata bytes, must have at least length {} but has 
length {}",
+                &end_location,
+                self.bytes.len()
+            );
+            return Err(ArrowError::InvalidArgumentError(err_str));
+        }
+        let dict_len_bytes = &self.bytes[1..end_location];
+        let dict_len = 
usize::from_le_bytes(dict_len_bytes.try_into().map_err(|e| {
+            ArrowError::InvalidArgumentError(format!(
+                "Unable to convert dictionary_size bytes into usize: {}",
+                e,
+            ))
+        })?);
+        Ok(dict_len)
+    }
+    pub fn version(&self) -> usize {
+        todo!()
+    }
+
+    /// Get the offset by index
+    pub fn get_offset_by(&self, index: usize) -> Result<usize, ArrowError> {
+        todo!()
+    }
+
+    /// Get the header byte, which has the following form
+    ///              7     6  5   4  3             0
+    ///             +-------+---+---+---------------+
+    /// header      |       |   |   |    version    |
+    ///             +-------+---+---+---------------+
+    ///                 ^         ^
+    ///                 |         +-- sorted_strings
+    ///                 +-- offset_size_minus_one
+    /// The version is a 4-bit value that must always contain the value 1.
+    /// - sorted_strings is a 1-bit value indicating whether dictionary 
strings are sorted and unique.
+    /// - offset_size_minus_one is a 2-bit value providing the number of bytes 
per dictionary size and offset field.
+    /// - The actual number of bytes, offset_size, is offset_size_minus_one + 1
+    pub fn header(&self) -> Result<u8, ArrowError> {
+        if self.bytes.is_empty() {
+            return Err(ArrowError::InvalidArgumentError(
+                "Can't get header from empty buffer".to_string(),
+            ));
+        }
+        Ok(self.bytes[0])
+    }
+
+    /// Get the offset_minus_one value from the header
+    pub fn offset_size_minus_one(&self) -> Result<u8, ArrowError> {
+        if self.bytes.is_empty() {
+            Err(ArrowError::InvalidArgumentError(
+                "Tried to get offset_size_minus_one from header, but 
self.bytes buffer is emtpy."
+                    .to_string(),
+            ))
+        } else {
+            Ok(self.bytes[0] & (0b11 << 6)) // Grab the last 2 bits
+        }
+    }
+
+    /// Get the offset_size
+    pub fn offset_size(&self) -> Result<u8, ArrowError> {
+        Ok(self.offset_size_minus_one()? + 1)
+    }
+
+    /// Get the offsets as an iterator
+    // TODO: Do we want this kind of API?
+    // TODO: Test once API is agreed upon
+    pub fn offsets(&'m self) -> Result<impl Iterator<Item = (usize, usize)> + 
'm, ArrowError> {
+        struct OffsetIterators<'m> {
+            buffer: &'m [u8],
+            total: usize,
+            seen: usize,
+            offset_size: usize,
+        }
+        impl<'m> Iterator for OffsetIterators<'m> {
+            type Item = (usize, usize); // (start, end) positions of the bytes
+
+            // TODO: Check bounds here or ensure they're correct
+
+            fn next(&mut self) -> Option<Self::Item> {
+                // +1 to skip the first offset
+                if self.seen < self.total {
+                    let start = usize::from_le_bytes(
+                        self.buffer[(self.seen ) * self.offset_size + 1 // +1 
to skip header
+                            ..(self.seen ) * self.offset_size + 1]
+                            .try_into()
+                            .ok()?,

Review Comment:
   I don't think the offsets are _ever_ valid usizes? They occupy 1, 2, 3, or 4 
bytes, depending on the [metadata 
header's](https://github.com/apache/parquet-format/blob/master/VariantEncoding.md#metadata-encoding)
 (2-bit) `offset_size_minus_one` field? We'll need to extract the correct 
number of bytes, pad it out to u32 (or usize) and then `try_into`+ 
`from_le_bytes`?
   
   Also, I think this code misses length checks needed for those subslice 
ranges to not panic?
   Or did I miss some earlier global check that covers the whole iteration?



##########
parquet-variant/src/variant.rs:
##########
@@ -0,0 +1,321 @@
+use std::{borrow::Cow, ops::Index};
+
+use crate::decoder::{self, get_variant_type};
+use arrow_schema::ArrowError;
+use strum_macros::EnumDiscriminants;
+
+#[derive(Clone, Copy, Debug, PartialEq)]
+/// Encodes the Variant Metadata, see the Variant spec file for more 
information
+pub struct VariantMetadata<'m> {
+    bytes: &'m [u8],
+}
+
+impl<'m> VariantMetadata<'m> {
+    /// View the raw bytes (needed by very low-level decoders)
+    #[inline]
+    pub const fn as_bytes(&self) -> &'m [u8] {
+        self.bytes
+    }
+
+    /// Whether the dictionary keys are sorted and unique
+    pub fn is_sorted(&self) -> bool {
+        todo!()
+    }
+
+    /// Get the dict length
+    pub fn dict_len(&self) -> Result<usize, ArrowError> {
+        let end_location = self.offset_size()? as usize + 1;
+        if self.bytes.len() < end_location {
+            let err_str = format!(
+                "Invalid metadata bytes, must have at least length {} but has 
length {}",
+                &end_location,
+                self.bytes.len()
+            );
+            return Err(ArrowError::InvalidArgumentError(err_str));
+        }
+        let dict_len_bytes = &self.bytes[1..end_location];
+        let dict_len = 
usize::from_le_bytes(dict_len_bytes.try_into().map_err(|e| {
+            ArrowError::InvalidArgumentError(format!(
+                "Unable to convert dictionary_size bytes into usize: {}",
+                e,
+            ))
+        })?);
+        Ok(dict_len)
+    }
+    pub fn version(&self) -> usize {
+        todo!()
+    }
+
+    /// Get the offset by index
+    pub fn get_offset_by(&self, index: usize) -> Result<usize, ArrowError> {
+        todo!()
+    }
+
+    /// Get the header byte, which has the following form
+    ///              7     6  5   4  3             0
+    ///             +-------+---+---+---------------+
+    /// header      |       |   |   |    version    |
+    ///             +-------+---+---+---------------+
+    ///                 ^         ^
+    ///                 |         +-- sorted_strings
+    ///                 +-- offset_size_minus_one
+    /// The version is a 4-bit value that must always contain the value 1.
+    /// - sorted_strings is a 1-bit value indicating whether dictionary 
strings are sorted and unique.
+    /// - offset_size_minus_one is a 2-bit value providing the number of bytes 
per dictionary size and offset field.
+    /// - The actual number of bytes, offset_size, is offset_size_minus_one + 1
+    pub fn header(&self) -> Result<u8, ArrowError> {
+        if self.bytes.is_empty() {
+            return Err(ArrowError::InvalidArgumentError(
+                "Can't get header from empty buffer".to_string(),
+            ));
+        }
+        Ok(self.bytes[0])
+    }
+
+    /// Get the offset_minus_one value from the header
+    pub fn offset_size_minus_one(&self) -> Result<u8, ArrowError> {
+        if self.bytes.is_empty() {
+            Err(ArrowError::InvalidArgumentError(
+                "Tried to get offset_size_minus_one from header, but 
self.bytes buffer is emtpy."
+                    .to_string(),
+            ))
+        } else {
+            Ok(self.bytes[0] & (0b11 << 6)) // Grab the last 2 bits
+        }
+    }
+
+    /// Get the offset_size
+    pub fn offset_size(&self) -> Result<u8, ArrowError> {
+        Ok(self.offset_size_minus_one()? + 1)
+    }
+
+    /// Get the offsets as an iterator
+    // TODO: Do we want this kind of API?
+    // TODO: Test once API is agreed upon
+    pub fn offsets(&'m self) -> Result<impl Iterator<Item = (usize, usize)> + 
'm, ArrowError> {
+        struct OffsetIterators<'m> {
+            buffer: &'m [u8],
+            total: usize,
+            seen: usize,
+            offset_size: usize,
+        }
+        impl<'m> Iterator for OffsetIterators<'m> {
+            type Item = (usize, usize); // (start, end) positions of the bytes
+
+            // TODO: Check bounds here or ensure they're correct
+
+            fn next(&mut self) -> Option<Self::Item> {
+                // +1 to skip the first offset
+                if self.seen < self.total {
+                    let start = usize::from_le_bytes(
+                        self.buffer[(self.seen ) * self.offset_size + 1 // +1 
to skip header
+                            ..(self.seen ) * self.offset_size + 1]
+                            .try_into()
+                            .ok()?,
+                    );
+                    self.seen += 1;
+                    let end = usize::from_le_bytes(
+                        self.buffer[(self.seen ) * self.offset_size + 1 // +1 
to skip header
+                            ..(self.seen ) * self.offset_size + 1]
+                            .try_into()
+                            .ok()?,
+                    );
+
+                    Some((start, end))
+                } else {
+                    None
+                }
+            }
+        }
+        let iterator: OffsetIterators = OffsetIterators {
+            buffer: self.bytes,
+            total: self.dict_len()?,
+            seen: 0,
+            offset_size: self.offset_size()? as usize,
+        };
+        Ok(iterator)
+    }
+    /// Get the key-name-bytes by index
+    pub fn get_by(&self, index: usize) -> Result<&'m str, ArrowError> {
+        todo!()
+    }
+    /// Get all key-names as an Iterator of strings
+    // TODO: Result
+    pub fn fields(&self) -> impl Iterator<Item = &'m str> {
+        // Do the same as for offsets
+        todo!();
+        vec![].into_iter()
+    }
+}
+
+#[derive(Clone, Copy, Debug, PartialEq)]
+pub struct VariantObject<'m, 'v> {
+    pub metadata: VariantMetadata<'m>,
+    pub value: &'v [u8],
+}
+
+impl<'m, 'v> VariantObject<'m, 'v> {
+    pub fn fields(&self) -> Result<impl Iterator<Item = (&'m str, Variant<'m, 
'v>)>, ArrowError> {
+        todo!();
+        #[allow(unreachable_code)] // Just to infer the return type
+        Ok(vec![].into_iter())
+    }
+
+    pub fn field(&self, _name: &'m str) -> Result<Variant<'m, 'v>, ArrowError> 
{
+        todo!()
+    }
+}
+
+#[derive(Clone, Copy, Debug, PartialEq)]
+pub struct VariantArray<'m, 'v> {
+    pub metadata: VariantMetadata<'m>,
+    pub value: &'v [u8],
+}
+
+// TODO: Let's agree on the API here, also should we expose a way to get the 
values as a vec of
+// variants for those who want it? Would require allocations.
+impl<'m, 'v> VariantArray<'m, 'v> {
+    pub fn len(&self) -> usize {
+        todo!()
+    }
+
+    pub fn values(&self) -> Result<impl Iterator<Item = Variant<'m, 'v>>, 
ArrowError> {
+        todo!();
+        #[allow(unreachable_code)] // Just to infer the return type
+        Ok(vec![].into_iter())
+    }
+}
+
+impl<'m, 'v> Index<usize> for VariantArray<'m, 'v> {
+    type Output = Variant<'m, 'v>;
+
+    fn index(&self, _index: usize) -> &Self::Output {
+        todo!()
+    }
+}
+
+/// Variant value. May contain references to metadata and value
+// TODO: Add copy if no Cow on String and Shortstring?
+#[derive(Clone, Debug, PartialEq, EnumDiscriminants)]
+#[strum_discriminants(name(VariantType))]
+pub enum Variant<'m, 'v> {
+    // TODO: Add 'legs' for the rest of the primitive types, once API is 
agreed upon
+    Null,
+    Int8(i8),
+
+    BooleanTrue,
+    BooleanFalse,
+
+    // only need the *value* buffer
+    // TODO: Do we want Cow<'v, str> over &'v str? It enables From<String> - 
discuss on PR

Review Comment:
   Seems like we should take `From<&'v str>` instead -- it doesn't take 
ownership and basically treats the string like a value byte array? But I don't 
know how useful that would be in practice, likely depends on how the builder 
API shapes up?



##########
parquet-variant/src/decoder.rs:
##########
@@ -0,0 +1,199 @@
+// NOTE: Largely based on the implementation of @PinkCrow007 in 
https://github.com/apache/arrow-rs/pull/7452
+// And the feedback there.
+use crate::variant::VariantType;
+use arrow_schema::ArrowError;
+use std::{array::TryFromSliceError, str};
+
+#[derive(Debug, Clone, Copy)]
+pub enum VariantBasicType {
+    Primitive = 0,
+    ShortString = 1,
+    Object = 2,
+    Array = 3,
+}
+
+#[derive(Debug, Clone, Copy)]
+pub enum VariantPrimitiveType {
+    Null = 0,
+    BooleanTrue = 1,
+    BooleanFalse = 2,
+    Int8 = 3,
+    // TODO: Add 'legs' for the rest of primitives, once API is agreed upon
+    String = 16,
+}
+
+/// Extracts the basic type from a header byte
+pub(crate) fn get_basic_type(header: u8) -> Result<VariantBasicType, 
ArrowError> {
+    // See 
https://github.com/apache/parquet-format/blob/master/VariantEncoding.md#value-encoding
+    let basic_type = header & 0x03; // Basic type is encoded in the first 2 
bits
+    let basic_type = match basic_type {
+        0 => VariantBasicType::Primitive,
+        1 => VariantBasicType::ShortString,
+        2 => VariantBasicType::Object,
+        3 => VariantBasicType::Array,
+        _ => {
+            return Err(ArrowError::InvalidArgumentError(format!(
+                "unknown basic type: {}",
+                basic_type
+            )))
+        }
+    };
+    Ok(basic_type)
+}
+
+/// Extracts the primitive type from a header byte
+pub(crate) fn get_primitive_type(header: u8) -> Result<VariantPrimitiveType, 
ArrowError> {
+    // See 
https://github.com/apache/parquet-format/blob/master/VariantEncoding.md#value-encoding
+    //// Primitive type is encoded in the last 6 bits of the header byte
+    let primitive_type = (header >> 2) & 0x3F;
+    let primitive_type = match primitive_type {
+        0 => VariantPrimitiveType::Null,
+        1 => VariantPrimitiveType::BooleanTrue,
+        2 => VariantPrimitiveType::BooleanFalse,
+        3 => VariantPrimitiveType::Int8,

Review Comment:
   aside: This is an annoying duplication of the enum values... but doing it 
"right" would be pretty icky:
   ```suggestion
       use VariantPrimitiveType::*;
       let primitive_type = match primitive_type {
           t if t == Null as _ => Null,
           t if t == BooleanTrue as _ => BooleanTrue,
           t if t == BooleanFalse as _ => BooleanFalse,
           t if t == Int8 as _ => Int8,
   ```



##########
parquet-variant/src/decoder.rs:
##########
@@ -0,0 +1,199 @@
+// NOTE: Largely based on the implementation of @PinkCrow007 in 
https://github.com/apache/arrow-rs/pull/7452
+// And the feedback there.
+use crate::variant::VariantType;
+use arrow_schema::ArrowError;
+use std::{array::TryFromSliceError, str};
+
+#[derive(Debug, Clone, Copy)]
+pub enum VariantBasicType {
+    Primitive = 0,
+    ShortString = 1,
+    Object = 2,
+    Array = 3,
+}
+
+#[derive(Debug, Clone, Copy)]
+pub enum VariantPrimitiveType {
+    Null = 0,
+    BooleanTrue = 1,
+    BooleanFalse = 2,

Review Comment:
   out of curiosity, why not just `True` and `False`?



##########
parquet-variant/src/decoder.rs:
##########
@@ -0,0 +1,199 @@
+// NOTE: Largely based on the implementation of @PinkCrow007 in 
https://github.com/apache/arrow-rs/pull/7452
+// And the feedback there.
+use crate::variant::VariantType;
+use arrow_schema::ArrowError;
+use std::{array::TryFromSliceError, str};
+
+#[derive(Debug, Clone, Copy)]
+pub enum VariantBasicType {
+    Primitive = 0,
+    ShortString = 1,
+    Object = 2,
+    Array = 3,
+}
+
+#[derive(Debug, Clone, Copy)]
+pub enum VariantPrimitiveType {
+    Null = 0,
+    BooleanTrue = 1,
+    BooleanFalse = 2,
+    Int8 = 3,
+    // TODO: Add 'legs' for the rest of primitives, once API is agreed upon

Review Comment:
   aside: Heh... talking about (rust) enum variants for (parquet) variant type 
can quickly become confusing...
   



##########
parquet-variant/src/decoder.rs:
##########
@@ -0,0 +1,199 @@
+// NOTE: Largely based on the implementation of @PinkCrow007 in 
https://github.com/apache/arrow-rs/pull/7452
+// And the feedback there.
+use crate::variant::VariantType;
+use arrow_schema::ArrowError;
+use std::{array::TryFromSliceError, str};
+
+#[derive(Debug, Clone, Copy)]
+pub enum VariantBasicType {
+    Primitive = 0,
+    ShortString = 1,
+    Object = 2,
+    Array = 3,
+}
+
+#[derive(Debug, Clone, Copy)]
+pub enum VariantPrimitiveType {
+    Null = 0,
+    BooleanTrue = 1,
+    BooleanFalse = 2,
+    Int8 = 3,
+    // TODO: Add 'legs' for the rest of primitives, once API is agreed upon
+    String = 16,
+}
+
+/// Extracts the basic type from a header byte
+pub(crate) fn get_basic_type(header: u8) -> Result<VariantBasicType, 
ArrowError> {
+    // See 
https://github.com/apache/parquet-format/blob/master/VariantEncoding.md#value-encoding
+    let basic_type = header & 0x03; // Basic type is encoded in the first 2 
bits
+    let basic_type = match basic_type {
+        0 => VariantBasicType::Primitive,
+        1 => VariantBasicType::ShortString,
+        2 => VariantBasicType::Object,
+        3 => VariantBasicType::Array,
+        _ => {
+            return Err(ArrowError::InvalidArgumentError(format!(
+                "unknown basic type: {}",
+                basic_type
+            )))
+        }
+    };
+    Ok(basic_type)
+}
+
+/// Extracts the primitive type from a header byte
+pub(crate) fn get_primitive_type(header: u8) -> Result<VariantPrimitiveType, 
ArrowError> {
+    // See 
https://github.com/apache/parquet-format/blob/master/VariantEncoding.md#value-encoding
+    //// Primitive type is encoded in the last 6 bits of the header byte
+    let primitive_type = (header >> 2) & 0x3F;
+    let primitive_type = match primitive_type {
+        0 => VariantPrimitiveType::Null,
+        1 => VariantPrimitiveType::BooleanTrue,
+        2 => VariantPrimitiveType::BooleanFalse,
+        3 => VariantPrimitiveType::Int8,
+        // TODO: Add 'legs' for the rest, once API is agreed upon
+        16 => VariantPrimitiveType::String,
+        _ => {
+            return Err(ArrowError::InvalidArgumentError(format!(
+                "unknown primitive type: {}",
+                primitive_type
+            )))
+        }
+    };
+    Ok(primitive_type)
+}
+
+/// Extracts the variant type from the value section of a variant. The variant
+/// type is defined as the set of all basic types and all primitive types.
+pub fn get_variant_type(value: &[u8]) -> Result<VariantType, ArrowError> {
+    if value.is_empty() {
+        return Err(ArrowError::InvalidArgumentError(
+            "Tried to get variant type from empty buffer array".to_string(),
+        ));
+    }
+    let header = value[0];
+    let variant_type = match get_basic_type(header)? {
+        VariantBasicType::Primitive => match get_primitive_type(header)? {
+            VariantPrimitiveType::Null => VariantType::Null,

Review Comment:
   Am I correct in understanding that `VariantPrimitiveType` and 
`VariantBasicType` map directly to the parquet spec, while `VariantType` 
"flattens" the two concepts together for everyone's convenience?



##########
parquet-variant/src/decoder.rs:
##########
@@ -0,0 +1,199 @@
+// NOTE: Largely based on the implementation of @PinkCrow007 in 
https://github.com/apache/arrow-rs/pull/7452
+// And the feedback there.
+use crate::variant::VariantType;
+use arrow_schema::ArrowError;
+use std::{array::TryFromSliceError, str};
+
+#[derive(Debug, Clone, Copy)]
+pub enum VariantBasicType {
+    Primitive = 0,
+    ShortString = 1,
+    Object = 2,
+    Array = 3,
+}
+
+#[derive(Debug, Clone, Copy)]
+pub enum VariantPrimitiveType {
+    Null = 0,
+    BooleanTrue = 1,
+    BooleanFalse = 2,
+    Int8 = 3,
+    // TODO: Add 'legs' for the rest of primitives, once API is agreed upon
+    String = 16,
+}
+
+/// Extracts the basic type from a header byte
+pub(crate) fn get_basic_type(header: u8) -> Result<VariantBasicType, 
ArrowError> {
+    // See 
https://github.com/apache/parquet-format/blob/master/VariantEncoding.md#value-encoding
+    let basic_type = header & 0x03; // Basic type is encoded in the first 2 
bits
+    let basic_type = match basic_type {
+        0 => VariantBasicType::Primitive,
+        1 => VariantBasicType::ShortString,
+        2 => VariantBasicType::Object,
+        3 => VariantBasicType::Array,
+        _ => {
+            return Err(ArrowError::InvalidArgumentError(format!(
+                "unknown basic type: {}",
+                basic_type
+            )))
+        }
+    };
+    Ok(basic_type)
+}
+
+/// Extracts the primitive type from a header byte
+pub(crate) fn get_primitive_type(header: u8) -> Result<VariantPrimitiveType, 
ArrowError> {
+    // See 
https://github.com/apache/parquet-format/blob/master/VariantEncoding.md#value-encoding
+    //// Primitive type is encoded in the last 6 bits of the header byte
+    let primitive_type = (header >> 2) & 0x3F;
+    let primitive_type = match primitive_type {
+        0 => VariantPrimitiveType::Null,
+        1 => VariantPrimitiveType::BooleanTrue,
+        2 => VariantPrimitiveType::BooleanFalse,
+        3 => VariantPrimitiveType::Int8,
+        // TODO: Add 'legs' for the rest, once API is agreed upon
+        16 => VariantPrimitiveType::String,
+        _ => {
+            return Err(ArrowError::InvalidArgumentError(format!(
+                "unknown primitive type: {}",
+                primitive_type
+            )))
+        }
+    };
+    Ok(primitive_type)
+}
+
+/// Extracts the variant type from the value section of a variant. The variant
+/// type is defined as the set of all basic types and all primitive types.
+pub fn get_variant_type(value: &[u8]) -> Result<VariantType, ArrowError> {
+    if value.is_empty() {
+        return Err(ArrowError::InvalidArgumentError(
+            "Tried to get variant type from empty buffer array".to_string(),
+        ));
+    }
+    let header = value[0];
+    let variant_type = match get_basic_type(header)? {
+        VariantBasicType::Primitive => match get_primitive_type(header)? {
+            VariantPrimitiveType::Null => VariantType::Null,
+            VariantPrimitiveType::Int8 => VariantType::Int8,
+            VariantPrimitiveType::BooleanTrue => VariantType::BooleanTrue,
+            VariantPrimitiveType::BooleanFalse => VariantType::BooleanFalse,
+            // TODO: Add 'legs' for the rest, once API is agreed upon
+            VariantPrimitiveType::String => VariantType::String,
+        },
+        VariantBasicType::ShortString => VariantType::ShortString,
+        VariantBasicType::Object => VariantType::Object,
+        VariantBasicType::Array => VariantType::Array,
+    };
+    Ok(variant_type)
+}
+
+/// To be used in `map_err` when unpacking an integer from a slice of bytes.
+fn map_try_from_slice_error(e: TryFromSliceError) -> ArrowError {
+    ArrowError::InvalidArgumentError(e.to_string())
+}
+
+/// Constructs the error message for an invalid UTF-8 string.
+fn invalid_utf8_err() -> ArrowError {
+    ArrowError::InvalidArgumentError("invalid UTF-8 string".to_string())
+}
+
+/// Decodes an Int8 from the value section of a variant.
+pub(crate) fn decode_int8(value: &[u8]) -> Result<i8, ArrowError> {
+    if value.is_empty() {
+        return Err(ArrowError::InvalidArgumentError(
+            "Got empty value buffer so can't decode into int8.".to_string(),
+        ));
+    }
+    let value = i8::from_le_bytes([value[1]]);
+    Ok(value)
+}
+
+/// Decodes a long string from the value section of a variant.
+pub(crate) fn decode_long_string(value: &[u8]) -> Result<&str, ArrowError> {
+    if value.len() < 5 {
+        return Err(ArrowError::InvalidArgumentError(
+            "Tried to decode value buffer into long_string, but it's too short 
(len<5)."
+                .to_string(),
+        ));
+    }
+    let len =
+        
u32::from_le_bytes(value[1..=4].try_into().map_err(map_try_from_slice_error)?) 
as usize;

Review Comment:
   Casting is dangerous, we should find a way to avoid it when possible.
   
   For example, there's no `From<u32> for usize` because `usize` is allowed to 
be only 16 bits (e.g. for some embedded architectures). It seems like we would 
want to define a helper trait for such conversions, that is conditioned on the 
sizes being compatible? Or does arrow-rs not pretend to support 16-bit 
architectures in the first place?
   



##########
parquet-variant/src/variant.rs:
##########
@@ -0,0 +1,321 @@
+use std::{borrow::Cow, ops::Index};
+
+use crate::decoder::{self, get_variant_type};
+use arrow_schema::ArrowError;
+use strum_macros::EnumDiscriminants;
+
+#[derive(Clone, Copy, Debug, PartialEq)]
+/// Encodes the Variant Metadata, see the Variant spec file for more 
information

Review Comment:
   Something like this:
   
   <details>
   
   ```rust
   struct VariantMetadataHeader {
       version: u8,
       sorted: bool,
       offset_size_bytes: OffsetSizeBytes,
   }
   
   impl<'m> VariantMetadata<'m> {
       pub fn try_new(bytes: &'m [u8]) -> Result<Self, ArrowError> {
           let Some(header) = bytes.get(0) else {
               return Err(...);
           }
           let header = VariantMetadataHeader::try_new(bytes)?;
           let dictionary_size = header.offset_size_bytes.unpack_usize(bytes, 
1)?;
           
           // verify the offsets are monotonically increasing
           let valid = (0..=dictionary_size)
               .map(|i| header.offset_size_bytes.unpack_usize(bytes, 1, i)?)
               .scan(0, |prev, cur| cur.is_ok_and(|i| prev <= i))
               .all(|valid| valid);
           if !valid {
               return Err(...);
           }
           Ok(Self { ... })
       }
       
       pub fn get(i: usize) -> Result<&'m str, ArrowError> {
           // TODO: Should we memoize the offsets? There could be thousands of 
them...
           let start = self.header.offset_size_bytes.unpack_usize(self.bytes, 
self.first_offset, i)?;
           let end = self.header.offset_size_bytes.unpack_usize(self.bytes, 
self.first_offset, i+1)?;
           let bytes = slice_from_slice(self.value_bytes, 
self.first_value+start..self.first_value+end)?;
           Ok(str::from_utf8(bytes).map_err(|| invalid_utf8_err())?)
       }
   }
   ```
   
   </details>
   
   The important thing, I suspect, is to verify that the offsets are 
monotonically increasing, so we don't risk panics due to a case where end < 
begin. Tho I suppose we could also check that in the individual accessors 
instead.



##########
parquet-variant/src/decoder.rs:
##########
@@ -0,0 +1,199 @@
+// NOTE: Largely based on the implementation of @PinkCrow007 in 
https://github.com/apache/arrow-rs/pull/7452
+// And the feedback there.
+use crate::variant::VariantType;
+use arrow_schema::ArrowError;
+use std::{array::TryFromSliceError, str};
+
+#[derive(Debug, Clone, Copy)]
+pub enum VariantBasicType {
+    Primitive = 0,
+    ShortString = 1,
+    Object = 2,
+    Array = 3,
+}
+
+#[derive(Debug, Clone, Copy)]
+pub enum VariantPrimitiveType {
+    Null = 0,
+    BooleanTrue = 1,
+    BooleanFalse = 2,
+    Int8 = 3,
+    // TODO: Add 'legs' for the rest of primitives, once API is agreed upon
+    String = 16,
+}
+
+/// Extracts the basic type from a header byte
+pub(crate) fn get_basic_type(header: u8) -> Result<VariantBasicType, 
ArrowError> {
+    // See 
https://github.com/apache/parquet-format/blob/master/VariantEncoding.md#value-encoding
+    let basic_type = header & 0x03; // Basic type is encoded in the first 2 
bits
+    let basic_type = match basic_type {
+        0 => VariantBasicType::Primitive,
+        1 => VariantBasicType::ShortString,
+        2 => VariantBasicType::Object,
+        3 => VariantBasicType::Array,
+        _ => {
+            return Err(ArrowError::InvalidArgumentError(format!(
+                "unknown basic type: {}",
+                basic_type
+            )))
+        }
+    };
+    Ok(basic_type)
+}
+
+/// Extracts the primitive type from a header byte
+pub(crate) fn get_primitive_type(header: u8) -> Result<VariantPrimitiveType, 
ArrowError> {
+    // See 
https://github.com/apache/parquet-format/blob/master/VariantEncoding.md#value-encoding
+    //// Primitive type is encoded in the last 6 bits of the header byte
+    let primitive_type = (header >> 2) & 0x3F;
+    let primitive_type = match primitive_type {
+        0 => VariantPrimitiveType::Null,
+        1 => VariantPrimitiveType::BooleanTrue,
+        2 => VariantPrimitiveType::BooleanFalse,
+        3 => VariantPrimitiveType::Int8,
+        // TODO: Add 'legs' for the rest, once API is agreed upon
+        16 => VariantPrimitiveType::String,
+        _ => {
+            return Err(ArrowError::InvalidArgumentError(format!(
+                "unknown primitive type: {}",
+                primitive_type
+            )))
+        }
+    };
+    Ok(primitive_type)
+}
+
+/// Extracts the variant type from the value section of a variant. The variant
+/// type is defined as the set of all basic types and all primitive types.
+pub fn get_variant_type(value: &[u8]) -> Result<VariantType, ArrowError> {
+    if value.is_empty() {
+        return Err(ArrowError::InvalidArgumentError(
+            "Tried to get variant type from empty buffer array".to_string(),
+        ));
+    }
+    let header = value[0];
+    let variant_type = match get_basic_type(header)? {
+        VariantBasicType::Primitive => match get_primitive_type(header)? {
+            VariantPrimitiveType::Null => VariantType::Null,
+            VariantPrimitiveType::Int8 => VariantType::Int8,
+            VariantPrimitiveType::BooleanTrue => VariantType::BooleanTrue,
+            VariantPrimitiveType::BooleanFalse => VariantType::BooleanFalse,
+            // TODO: Add 'legs' for the rest, once API is agreed upon
+            VariantPrimitiveType::String => VariantType::String,
+        },
+        VariantBasicType::ShortString => VariantType::ShortString,
+        VariantBasicType::Object => VariantType::Object,
+        VariantBasicType::Array => VariantType::Array,
+    };
+    Ok(variant_type)
+}
+
+/// To be used in `map_err` when unpacking an integer from a slice of bytes.
+fn map_try_from_slice_error(e: TryFromSliceError) -> ArrowError {
+    ArrowError::InvalidArgumentError(e.to_string())
+}
+
+/// Constructs the error message for an invalid UTF-8 string.
+fn invalid_utf8_err() -> ArrowError {
+    ArrowError::InvalidArgumentError("invalid UTF-8 string".to_string())
+}
+
+/// Decodes an Int8 from the value section of a variant.
+pub(crate) fn decode_int8(value: &[u8]) -> Result<i8, ArrowError> {
+    if value.is_empty() {
+        return Err(ArrowError::InvalidArgumentError(
+            "Got empty value buffer so can't decode into int8.".to_string(),
+        ));
+    }
+    let value = i8::from_le_bytes([value[1]]);
+    Ok(value)
+}
+
+/// Decodes a long string from the value section of a variant.
+pub(crate) fn decode_long_string(value: &[u8]) -> Result<&str, ArrowError> {
+    if value.len() < 5 {
+        return Err(ArrowError::InvalidArgumentError(
+            "Tried to decode value buffer into long_string, but it's too short 
(len<5)."
+                .to_string(),
+        ));
+    }
+    let len =
+        
u32::from_le_bytes(value[1..=4].try_into().map_err(map_try_from_slice_error)?) 
as usize;
+    if value.len() < len + 5 {
+        let err_str = format!("The length of the buffer for the long_string is 
soo short, it is {} and it should be at least {} ({} < {} + 5)", value.len(), 
len + 5 , value.len(), len);
+        return Err(ArrowError::InvalidArgumentError(err_str));
+    }
+    let string_bytes = &value[5..5 + len];

Review Comment:
   A general question: Do we expect that variant values can include 
unused/padding bytes? I guess there could be trailing bytes after the whole 
thing, and the object/array `field_offset` array doesn't forbid dead space 
between values. So we probably do need to expressly limit lengths of slices 
everywhere.



##########
parquet-variant/src/decoder.rs:
##########
@@ -0,0 +1,199 @@
+// NOTE: Largely based on the implementation of @PinkCrow007 in 
https://github.com/apache/arrow-rs/pull/7452
+// And the feedback there.
+use crate::variant::VariantType;
+use arrow_schema::ArrowError;
+use std::{array::TryFromSliceError, str};
+
+#[derive(Debug, Clone, Copy)]
+pub enum VariantBasicType {
+    Primitive = 0,
+    ShortString = 1,
+    Object = 2,
+    Array = 3,
+}
+
+#[derive(Debug, Clone, Copy)]
+pub enum VariantPrimitiveType {
+    Null = 0,
+    BooleanTrue = 1,
+    BooleanFalse = 2,
+    Int8 = 3,
+    // TODO: Add 'legs' for the rest of primitives, once API is agreed upon
+    String = 16,
+}
+
+/// Extracts the basic type from a header byte
+pub(crate) fn get_basic_type(header: u8) -> Result<VariantBasicType, 
ArrowError> {
+    // See 
https://github.com/apache/parquet-format/blob/master/VariantEncoding.md#value-encoding
+    let basic_type = header & 0x03; // Basic type is encoded in the first 2 
bits
+    let basic_type = match basic_type {
+        0 => VariantBasicType::Primitive,
+        1 => VariantBasicType::ShortString,
+        2 => VariantBasicType::Object,
+        3 => VariantBasicType::Array,
+        _ => {
+            return Err(ArrowError::InvalidArgumentError(format!(
+                "unknown basic type: {}",
+                basic_type
+            )))
+        }
+    };
+    Ok(basic_type)
+}
+
+/// Extracts the primitive type from a header byte
+pub(crate) fn get_primitive_type(header: u8) -> Result<VariantPrimitiveType, 
ArrowError> {
+    // See 
https://github.com/apache/parquet-format/blob/master/VariantEncoding.md#value-encoding
+    //// Primitive type is encoded in the last 6 bits of the header byte
+    let primitive_type = (header >> 2) & 0x3F;
+    let primitive_type = match primitive_type {
+        0 => VariantPrimitiveType::Null,
+        1 => VariantPrimitiveType::BooleanTrue,
+        2 => VariantPrimitiveType::BooleanFalse,
+        3 => VariantPrimitiveType::Int8,
+        // TODO: Add 'legs' for the rest, once API is agreed upon
+        16 => VariantPrimitiveType::String,
+        _ => {
+            return Err(ArrowError::InvalidArgumentError(format!(
+                "unknown primitive type: {}",
+                primitive_type
+            )))
+        }
+    };
+    Ok(primitive_type)
+}
+
+/// Extracts the variant type from the value section of a variant. The variant
+/// type is defined as the set of all basic types and all primitive types.
+pub fn get_variant_type(value: &[u8]) -> Result<VariantType, ArrowError> {
+    if value.is_empty() {
+        return Err(ArrowError::InvalidArgumentError(
+            "Tried to get variant type from empty buffer array".to_string(),
+        ));
+    }
+    let header = value[0];
+    let variant_type = match get_basic_type(header)? {
+        VariantBasicType::Primitive => match get_primitive_type(header)? {
+            VariantPrimitiveType::Null => VariantType::Null,
+            VariantPrimitiveType::Int8 => VariantType::Int8,
+            VariantPrimitiveType::BooleanTrue => VariantType::BooleanTrue,
+            VariantPrimitiveType::BooleanFalse => VariantType::BooleanFalse,
+            // TODO: Add 'legs' for the rest, once API is agreed upon
+            VariantPrimitiveType::String => VariantType::String,
+        },
+        VariantBasicType::ShortString => VariantType::ShortString,
+        VariantBasicType::Object => VariantType::Object,
+        VariantBasicType::Array => VariantType::Array,
+    };
+    Ok(variant_type)
+}
+
+/// To be used in `map_err` when unpacking an integer from a slice of bytes.
+fn map_try_from_slice_error(e: TryFromSliceError) -> ArrowError {
+    ArrowError::InvalidArgumentError(e.to_string())
+}

Review Comment:
   Why not `impl From` so `?` Just Works? Is it because trait definitions are 
global and we don't want this to implicitly become part of the public API?



##########
parquet-variant/src/variant.rs:
##########
@@ -0,0 +1,321 @@
+use std::{borrow::Cow, ops::Index};
+
+use crate::decoder::{self, get_variant_type};
+use arrow_schema::ArrowError;
+use strum_macros::EnumDiscriminants;
+
+#[derive(Clone, Copy, Debug, PartialEq)]
+/// Encodes the Variant Metadata, see the Variant spec file for more 
information
+pub struct VariantMetadata<'m> {
+    bytes: &'m [u8],
+}
+
+impl<'m> VariantMetadata<'m> {
+    /// View the raw bytes (needed by very low-level decoders)
+    #[inline]
+    pub const fn as_bytes(&self) -> &'m [u8] {
+        self.bytes
+    }
+
+    /// Whether the dictionary keys are sorted and unique
+    pub fn is_sorted(&self) -> bool {
+        todo!()
+    }
+
+    /// Get the dict length
+    pub fn dict_len(&self) -> Result<usize, ArrowError> {
+        let end_location = self.offset_size()? as usize + 1;
+        if self.bytes.len() < end_location {
+            let err_str = format!(
+                "Invalid metadata bytes, must have at least length {} but has 
length {}",
+                &end_location,
+                self.bytes.len()
+            );
+            return Err(ArrowError::InvalidArgumentError(err_str));
+        }
+        let dict_len_bytes = &self.bytes[1..end_location];
+        let dict_len = 
usize::from_le_bytes(dict_len_bytes.try_into().map_err(|e| {
+            ArrowError::InvalidArgumentError(format!(
+                "Unable to convert dictionary_size bytes into usize: {}",
+                e,
+            ))
+        })?);
+        Ok(dict_len)
+    }
+    pub fn version(&self) -> usize {
+        todo!()
+    }
+
+    /// Get the offset by index
+    pub fn get_offset_by(&self, index: usize) -> Result<usize, ArrowError> {
+        todo!()
+    }
+
+    /// Get the header byte, which has the following form
+    ///              7     6  5   4  3             0
+    ///             +-------+---+---+---------------+
+    /// header      |       |   |   |    version    |
+    ///             +-------+---+---+---------------+
+    ///                 ^         ^
+    ///                 |         +-- sorted_strings
+    ///                 +-- offset_size_minus_one
+    /// The version is a 4-bit value that must always contain the value 1.
+    /// - sorted_strings is a 1-bit value indicating whether dictionary 
strings are sorted and unique.
+    /// - offset_size_minus_one is a 2-bit value providing the number of bytes 
per dictionary size and offset field.
+    /// - The actual number of bytes, offset_size, is offset_size_minus_one + 1
+    pub fn header(&self) -> Result<u8, ArrowError> {
+        if self.bytes.is_empty() {
+            return Err(ArrowError::InvalidArgumentError(
+                "Can't get header from empty buffer".to_string(),
+            ));
+        }
+        Ok(self.bytes[0])
+    }
+
+    /// Get the offset_minus_one value from the header
+    pub fn offset_size_minus_one(&self) -> Result<u8, ArrowError> {
+        if self.bytes.is_empty() {
+            Err(ArrowError::InvalidArgumentError(
+                "Tried to get offset_size_minus_one from header, but 
self.bytes buffer is emtpy."
+                    .to_string(),
+            ))
+        } else {
+            Ok(self.bytes[0] & (0b11 << 6)) // Grab the last 2 bits
+        }
+    }
+
+    /// Get the offset_size
+    pub fn offset_size(&self) -> Result<u8, ArrowError> {
+        Ok(self.offset_size_minus_one()? + 1)
+    }
+
+    /// Get the offsets as an iterator
+    // TODO: Do we want this kind of API?
+    // TODO: Test once API is agreed upon
+    pub fn offsets(&'m self) -> Result<impl Iterator<Item = (usize, usize)> + 
'm, ArrowError> {
+        struct OffsetIterators<'m> {
+            buffer: &'m [u8],
+            total: usize,
+            seen: usize,
+            offset_size: usize,
+        }
+        impl<'m> Iterator for OffsetIterators<'m> {
+            type Item = (usize, usize); // (start, end) positions of the bytes

Review Comment:
   Why not just:
   ```suggestion
               type Item = Range<usize>; // (start, end) positions of the bytes
   ```



##########
parquet-variant/src/decoder.rs:
##########
@@ -0,0 +1,199 @@
+// NOTE: Largely based on the implementation of @PinkCrow007 in 
https://github.com/apache/arrow-rs/pull/7452
+// And the feedback there.
+use crate::variant::VariantType;
+use arrow_schema::ArrowError;
+use std::{array::TryFromSliceError, str};
+
+#[derive(Debug, Clone, Copy)]
+pub enum VariantBasicType {
+    Primitive = 0,
+    ShortString = 1,
+    Object = 2,
+    Array = 3,
+}
+
+#[derive(Debug, Clone, Copy)]
+pub enum VariantPrimitiveType {
+    Null = 0,
+    BooleanTrue = 1,
+    BooleanFalse = 2,
+    Int8 = 3,
+    // TODO: Add 'legs' for the rest of primitives, once API is agreed upon
+    String = 16,
+}
+
+/// Extracts the basic type from a header byte
+pub(crate) fn get_basic_type(header: u8) -> Result<VariantBasicType, 
ArrowError> {
+    // See 
https://github.com/apache/parquet-format/blob/master/VariantEncoding.md#value-encoding
+    let basic_type = header & 0x03; // Basic type is encoded in the first 2 
bits
+    let basic_type = match basic_type {
+        0 => VariantBasicType::Primitive,
+        1 => VariantBasicType::ShortString,
+        2 => VariantBasicType::Object,
+        3 => VariantBasicType::Array,
+        _ => {
+            return Err(ArrowError::InvalidArgumentError(format!(
+                "unknown basic type: {}",
+                basic_type
+            )))
+        }
+    };
+    Ok(basic_type)
+}
+
+/// Extracts the primitive type from a header byte
+pub(crate) fn get_primitive_type(header: u8) -> Result<VariantPrimitiveType, 
ArrowError> {
+    // See 
https://github.com/apache/parquet-format/blob/master/VariantEncoding.md#value-encoding
+    //// Primitive type is encoded in the last 6 bits of the header byte
+    let primitive_type = (header >> 2) & 0x3F;

Review Comment:
   Isn't the header a u8? I don't think we need the mask, because any u8 
shifted right by 2 bits will by definition contain only 6 bits (unsigned `>>` 
shifts in zeros from the left)



##########
parquet-variant/src/variant.rs:
##########
@@ -0,0 +1,321 @@
+use std::{borrow::Cow, ops::Index};
+
+use crate::decoder::{self, get_variant_type};
+use arrow_schema::ArrowError;
+use strum_macros::EnumDiscriminants;
+
+#[derive(Clone, Copy, Debug, PartialEq)]
+/// Encodes the Variant Metadata, see the Variant spec file for more 
information
+pub struct VariantMetadata<'m> {
+    bytes: &'m [u8],
+}
+
+impl<'m> VariantMetadata<'m> {
+    /// View the raw bytes (needed by very low-level decoders)
+    #[inline]
+    pub const fn as_bytes(&self) -> &'m [u8] {
+        self.bytes
+    }
+
+    /// Whether the dictionary keys are sorted and unique
+    pub fn is_sorted(&self) -> bool {
+        todo!()
+    }
+
+    /// Get the dict length
+    pub fn dict_len(&self) -> Result<usize, ArrowError> {
+        let end_location = self.offset_size()? as usize + 1;
+        if self.bytes.len() < end_location {
+            let err_str = format!(
+                "Invalid metadata bytes, must have at least length {} but has 
length {}",
+                &end_location,
+                self.bytes.len()
+            );
+            return Err(ArrowError::InvalidArgumentError(err_str));
+        }
+        let dict_len_bytes = &self.bytes[1..end_location];
+        let dict_len = 
usize::from_le_bytes(dict_len_bytes.try_into().map_err(|e| {
+            ArrowError::InvalidArgumentError(format!(
+                "Unable to convert dictionary_size bytes into usize: {}",
+                e,
+            ))
+        })?);
+        Ok(dict_len)
+    }
+    pub fn version(&self) -> usize {
+        todo!()
+    }
+
+    /// Get the offset by index
+    pub fn get_offset_by(&self, index: usize) -> Result<usize, ArrowError> {
+        todo!()
+    }
+
+    /// Get the header byte, which has the following form
+    ///              7     6  5   4  3             0
+    ///             +-------+---+---+---------------+
+    /// header      |       |   |   |    version    |
+    ///             +-------+---+---+---------------+
+    ///                 ^         ^
+    ///                 |         +-- sorted_strings
+    ///                 +-- offset_size_minus_one
+    /// The version is a 4-bit value that must always contain the value 1.
+    /// - sorted_strings is a 1-bit value indicating whether dictionary 
strings are sorted and unique.
+    /// - offset_size_minus_one is a 2-bit value providing the number of bytes 
per dictionary size and offset field.
+    /// - The actual number of bytes, offset_size, is offset_size_minus_one + 1
+    pub fn header(&self) -> Result<u8, ArrowError> {
+        if self.bytes.is_empty() {
+            return Err(ArrowError::InvalidArgumentError(
+                "Can't get header from empty buffer".to_string(),
+            ));
+        }
+        Ok(self.bytes[0])
+    }
+
+    /// Get the offset_minus_one value from the header
+    pub fn offset_size_minus_one(&self) -> Result<u8, ArrowError> {
+        if self.bytes.is_empty() {
+            Err(ArrowError::InvalidArgumentError(
+                "Tried to get offset_size_minus_one from header, but 
self.bytes buffer is emtpy."
+                    .to_string(),
+            ))
+        } else {
+            Ok(self.bytes[0] & (0b11 << 6)) // Grab the last 2 bits
+        }
+    }
+
+    /// Get the offset_size
+    pub fn offset_size(&self) -> Result<u8, ArrowError> {
+        Ok(self.offset_size_minus_one()? + 1)
+    }
+
+    /// Get the offsets as an iterator
+    // TODO: Do we want this kind of API?
+    // TODO: Test once API is agreed upon
+    pub fn offsets(&'m self) -> Result<impl Iterator<Item = (usize, usize)> + 
'm, ArrowError> {
+        struct OffsetIterators<'m> {
+            buffer: &'m [u8],
+            total: usize,
+            seen: usize,
+            offset_size: usize,
+        }
+        impl<'m> Iterator for OffsetIterators<'m> {
+            type Item = (usize, usize); // (start, end) positions of the bytes
+
+            // TODO: Check bounds here or ensure they're correct
+
+            fn next(&mut self) -> Option<Self::Item> {
+                // +1 to skip the first offset
+                if self.seen < self.total {
+                    let start = usize::from_le_bytes(
+                        self.buffer[(self.seen ) * self.offset_size + 1 // +1 
to skip header
+                            ..(self.seen ) * self.offset_size + 1]
+                            .try_into()
+                            .ok()?,
+                    );
+                    self.seen += 1;
+                    let end = usize::from_le_bytes(
+                        self.buffer[(self.seen ) * self.offset_size + 1 // +1 
to skip header
+                            ..(self.seen ) * self.offset_size + 1]
+                            .try_into()
+                            .ok()?,
+                    );
+
+                    Some((start, end))
+                } else {
+                    None
+                }
+            }
+        }
+        let iterator: OffsetIterators = OffsetIterators {
+            buffer: self.bytes,
+            total: self.dict_len()?,
+            seen: 0,
+            offset_size: self.offset_size()? as usize,
+        };
+        Ok(iterator)
+    }
+    /// Get the key-name-bytes by index
+    pub fn get_by(&self, index: usize) -> Result<&'m str, ArrowError> {
+        todo!()
+    }
+    /// Get all key-names as an Iterator of strings
+    // TODO: Result
+    pub fn fields(&self) -> impl Iterator<Item = &'m str> {
+        // Do the same as for offsets
+        todo!();
+        vec![].into_iter()
+    }
+}
+
+#[derive(Clone, Copy, Debug, PartialEq)]
+pub struct VariantObject<'m, 'v> {
+    pub metadata: VariantMetadata<'m>,
+    pub value: &'v [u8],
+}
+
+impl<'m, 'v> VariantObject<'m, 'v> {
+    pub fn fields(&self) -> Result<impl Iterator<Item = (&'m str, Variant<'m, 
'v>)>, ArrowError> {
+        todo!();
+        #[allow(unreachable_code)] // Just to infer the return type
+        Ok(vec![].into_iter())
+    }
+
+    pub fn field(&self, _name: &'m str) -> Result<Variant<'m, 'v>, ArrowError> 
{
+        todo!()
+    }
+}
+
+#[derive(Clone, Copy, Debug, PartialEq)]
+pub struct VariantArray<'m, 'v> {
+    pub metadata: VariantMetadata<'m>,
+    pub value: &'v [u8],
+}
+
+// TODO: Let's agree on the API here, also should we expose a way to get the 
values as a vec of
+// variants for those who want it? Would require allocations.
+impl<'m, 'v> VariantArray<'m, 'v> {
+    pub fn len(&self) -> usize {
+        todo!()
+    }
+
+    pub fn values(&self) -> Result<impl Iterator<Item = Variant<'m, 'v>>, 
ArrowError> {
+        todo!();
+        #[allow(unreachable_code)] // Just to infer the return type
+        Ok(vec![].into_iter())
+    }
+}
+
+impl<'m, 'v> Index<usize> for VariantArray<'m, 'v> {
+    type Output = Variant<'m, 'v>;
+
+    fn index(&self, _index: usize) -> &Self::Output {
+        todo!()
+    }
+}
+
+/// Variant value. May contain references to metadata and value
+// TODO: Add copy if no Cow on String and Shortstring?
+#[derive(Clone, Debug, PartialEq, EnumDiscriminants)]
+#[strum_discriminants(name(VariantType))]
+pub enum Variant<'m, 'v> {
+    // TODO: Add 'legs' for the rest of the primitive types, once API is 
agreed upon
+    Null,
+    Int8(i8),
+
+    BooleanTrue,
+    BooleanFalse,
+
+    // only need the *value* buffer
+    // TODO: Do we want Cow<'v, str> over &'v str? It enables From<String> - 
discuss on PR
+    String(Cow<'v, str>),
+    ShortString(Cow<'v, str>),
+
+    // need both metadata & value
+    Object(VariantObject<'m, 'v>),
+    Array(VariantArray<'m, 'v>),
+}
+
+impl<'m, 'v> Variant<'m, 'v> {
+    /// Parse the buffers and return the appropriate variant.
+    pub fn try_new(metadata: &'m [u8], value: &'v [u8]) -> Result<Self, 
ArrowError> {

Review Comment:
   IMO we should pass an already created and validated `metadata: &'m 
Metadata<'m>`. Create that once, and pass it around everywhere afterward.



##########
parquet-variant/src/variant.rs:
##########
@@ -0,0 +1,321 @@
+use std::{borrow::Cow, ops::Index};
+
+use crate::decoder::{self, get_variant_type};
+use arrow_schema::ArrowError;
+use strum_macros::EnumDiscriminants;
+
+#[derive(Clone, Copy, Debug, PartialEq)]
+/// Encodes the Variant Metadata, see the Variant spec file for more 
information
+pub struct VariantMetadata<'m> {
+    bytes: &'m [u8],
+}
+
+impl<'m> VariantMetadata<'m> {
+    /// View the raw bytes (needed by very low-level decoders)
+    #[inline]
+    pub const fn as_bytes(&self) -> &'m [u8] {
+        self.bytes
+    }
+
+    /// Whether the dictionary keys are sorted and unique
+    pub fn is_sorted(&self) -> bool {
+        todo!()
+    }
+
+    /// Get the dict length
+    pub fn dict_len(&self) -> Result<usize, ArrowError> {
+        let end_location = self.offset_size()? as usize + 1;
+        if self.bytes.len() < end_location {
+            let err_str = format!(
+                "Invalid metadata bytes, must have at least length {} but has 
length {}",
+                &end_location,
+                self.bytes.len()
+            );
+            return Err(ArrowError::InvalidArgumentError(err_str));
+        }
+        let dict_len_bytes = &self.bytes[1..end_location];
+        let dict_len = 
usize::from_le_bytes(dict_len_bytes.try_into().map_err(|e| {
+            ArrowError::InvalidArgumentError(format!(
+                "Unable to convert dictionary_size bytes into usize: {}",
+                e,
+            ))
+        })?);
+        Ok(dict_len)
+    }
+    pub fn version(&self) -> usize {
+        todo!()
+    }
+
+    /// Get the offset by index
+    pub fn get_offset_by(&self, index: usize) -> Result<usize, ArrowError> {
+        todo!()
+    }
+
+    /// Get the header byte, which has the following form
+    ///              7     6  5   4  3             0
+    ///             +-------+---+---+---------------+
+    /// header      |       |   |   |    version    |
+    ///             +-------+---+---+---------------+
+    ///                 ^         ^
+    ///                 |         +-- sorted_strings
+    ///                 +-- offset_size_minus_one
+    /// The version is a 4-bit value that must always contain the value 1.
+    /// - sorted_strings is a 1-bit value indicating whether dictionary 
strings are sorted and unique.
+    /// - offset_size_minus_one is a 2-bit value providing the number of bytes 
per dictionary size and offset field.
+    /// - The actual number of bytes, offset_size, is offset_size_minus_one + 1
+    pub fn header(&self) -> Result<u8, ArrowError> {
+        if self.bytes.is_empty() {
+            return Err(ArrowError::InvalidArgumentError(
+                "Can't get header from empty buffer".to_string(),
+            ));
+        }
+        Ok(self.bytes[0])
+    }
+
+    /// Get the offset_minus_one value from the header
+    pub fn offset_size_minus_one(&self) -> Result<u8, ArrowError> {
+        if self.bytes.is_empty() {
+            Err(ArrowError::InvalidArgumentError(
+                "Tried to get offset_size_minus_one from header, but 
self.bytes buffer is emtpy."
+                    .to_string(),
+            ))
+        } else {
+            Ok(self.bytes[0] & (0b11 << 6)) // Grab the last 2 bits
+        }
+    }
+
+    /// Get the offset_size
+    pub fn offset_size(&self) -> Result<u8, ArrowError> {
+        Ok(self.offset_size_minus_one()? + 1)
+    }
+
+    /// Get the offsets as an iterator
+    // TODO: Do we want this kind of API?
+    // TODO: Test once API is agreed upon
+    pub fn offsets(&'m self) -> Result<impl Iterator<Item = (usize, usize)> + 
'm, ArrowError> {
+        struct OffsetIterators<'m> {
+            buffer: &'m [u8],
+            total: usize,
+            seen: usize,
+            offset_size: usize,
+        }
+        impl<'m> Iterator for OffsetIterators<'m> {
+            type Item = (usize, usize); // (start, end) positions of the bytes
+
+            // TODO: Check bounds here or ensure they're correct
+
+            fn next(&mut self) -> Option<Self::Item> {
+                // +1 to skip the first offset
+                if self.seen < self.total {
+                    let start = usize::from_le_bytes(
+                        self.buffer[(self.seen ) * self.offset_size + 1 // +1 
to skip header
+                            ..(self.seen ) * self.offset_size + 1]
+                            .try_into()
+                            .ok()?,
+                    );
+                    self.seen += 1;
+                    let end = usize::from_le_bytes(
+                        self.buffer[(self.seen ) * self.offset_size + 1 // +1 
to skip header
+                            ..(self.seen ) * self.offset_size + 1]
+                            .try_into()
+                            .ok()?,
+                    );
+
+                    Some((start, end))
+                } else {
+                    None
+                }
+            }
+        }
+        let iterator: OffsetIterators = OffsetIterators {
+            buffer: self.bytes,
+            total: self.dict_len()?,
+            seen: 0,
+            offset_size: self.offset_size()? as usize,
+        };
+        Ok(iterator)
+    }
+    /// Get the key-name-bytes by index
+    pub fn get_by(&self, index: usize) -> Result<&'m str, ArrowError> {
+        todo!()
+    }
+    /// Get all key-names as an Iterator of strings
+    // TODO: Result
+    pub fn fields(&self) -> impl Iterator<Item = &'m str> {
+        // Do the same as for offsets
+        todo!();
+        vec![].into_iter()
+    }
+}
+
+#[derive(Clone, Copy, Debug, PartialEq)]
+pub struct VariantObject<'m, 'v> {
+    pub metadata: VariantMetadata<'m>,
+    pub value: &'v [u8],
+}
+
+impl<'m, 'v> VariantObject<'m, 'v> {
+    pub fn fields(&self) -> Result<impl Iterator<Item = (&'m str, Variant<'m, 
'v>)>, ArrowError> {
+        todo!();
+        #[allow(unreachable_code)] // Just to infer the return type
+        Ok(vec![].into_iter())
+    }
+
+    pub fn field(&self, _name: &'m str) -> Result<Variant<'m, 'v>, ArrowError> 
{
+        todo!()
+    }
+}
+
+#[derive(Clone, Copy, Debug, PartialEq)]
+pub struct VariantArray<'m, 'v> {
+    pub metadata: VariantMetadata<'m>,
+    pub value: &'v [u8],
+}
+
+// TODO: Let's agree on the API here, also should we expose a way to get the 
values as a vec of
+// variants for those who want it? Would require allocations.
+impl<'m, 'v> VariantArray<'m, 'v> {
+    pub fn len(&self) -> usize {
+        todo!()
+    }
+
+    pub fn values(&self) -> Result<impl Iterator<Item = Variant<'m, 'v>>, 
ArrowError> {
+        todo!();
+        #[allow(unreachable_code)] // Just to infer the return type
+        Ok(vec![].into_iter())
+    }
+}
+
+impl<'m, 'v> Index<usize> for VariantArray<'m, 'v> {
+    type Output = Variant<'m, 'v>;
+
+    fn index(&self, _index: usize) -> &Self::Output {
+        todo!()
+    }
+}
+
+/// Variant value. May contain references to metadata and value
+// TODO: Add copy if no Cow on String and Shortstring?
+#[derive(Clone, Debug, PartialEq, EnumDiscriminants)]
+#[strum_discriminants(name(VariantType))]
+pub enum Variant<'m, 'v> {
+    // TODO: Add 'legs' for the rest of the primitive types, once API is 
agreed upon
+    Null,
+    Int8(i8),
+
+    BooleanTrue,
+    BooleanFalse,
+
+    // only need the *value* buffer
+    // TODO: Do we want Cow<'v, str> over &'v str? It enables From<String> - 
discuss on PR
+    String(Cow<'v, str>),

Review Comment:
   I'm also a bit nervous about the inability to have an owned variant... 
especially once we get into shredding territory with a need to mix and match 
things. But maybe the variant builder will be enough for that.



##########
parquet-variant/src/variant.rs:
##########
@@ -0,0 +1,321 @@
+use std::{borrow::Cow, ops::Index};
+
+use crate::decoder::{self, get_variant_type};
+use arrow_schema::ArrowError;
+use strum_macros::EnumDiscriminants;
+
+#[derive(Clone, Copy, Debug, PartialEq)]
+/// Encodes the Variant Metadata, see the Variant spec file for more 
information

Review Comment:
   +1
   
   Also, pay the cost to construct it once and pass references to everyone else 
that needs it.



##########
parquet-variant/src/variant.rs:
##########
@@ -0,0 +1,321 @@
+use std::{borrow::Cow, ops::Index};
+
+use crate::decoder::{self, get_variant_type};
+use arrow_schema::ArrowError;
+use strum_macros::EnumDiscriminants;
+
+#[derive(Clone, Copy, Debug, PartialEq)]
+/// Encodes the Variant Metadata, see the Variant spec file for more 
information
+pub struct VariantMetadata<'m> {
+    bytes: &'m [u8],
+}
+
+impl<'m> VariantMetadata<'m> {
+    /// View the raw bytes (needed by very low-level decoders)
+    #[inline]
+    pub const fn as_bytes(&self) -> &'m [u8] {
+        self.bytes
+    }
+
+    /// Whether the dictionary keys are sorted and unique
+    pub fn is_sorted(&self) -> bool {
+        todo!()
+    }
+
+    /// Get the dict length
+    pub fn dict_len(&self) -> Result<usize, ArrowError> {
+        let end_location = self.offset_size()? as usize + 1;
+        if self.bytes.len() < end_location {
+            let err_str = format!(
+                "Invalid metadata bytes, must have at least length {} but has 
length {}",
+                &end_location,
+                self.bytes.len()
+            );
+            return Err(ArrowError::InvalidArgumentError(err_str));
+        }
+        let dict_len_bytes = &self.bytes[1..end_location];
+        let dict_len = 
usize::from_le_bytes(dict_len_bytes.try_into().map_err(|e| {
+            ArrowError::InvalidArgumentError(format!(
+                "Unable to convert dictionary_size bytes into usize: {}",
+                e,
+            ))
+        })?);
+        Ok(dict_len)
+    }
+    pub fn version(&self) -> usize {
+        todo!()
+    }
+
+    /// Get the offset by index
+    pub fn get_offset_by(&self, index: usize) -> Result<usize, ArrowError> {
+        todo!()
+    }
+
+    /// Get the header byte, which has the following form
+    ///              7     6  5   4  3             0
+    ///             +-------+---+---+---------------+
+    /// header      |       |   |   |    version    |
+    ///             +-------+---+---+---------------+
+    ///                 ^         ^
+    ///                 |         +-- sorted_strings
+    ///                 +-- offset_size_minus_one
+    /// The version is a 4-bit value that must always contain the value 1.
+    /// - sorted_strings is a 1-bit value indicating whether dictionary 
strings are sorted and unique.
+    /// - offset_size_minus_one is a 2-bit value providing the number of bytes 
per dictionary size and offset field.
+    /// - The actual number of bytes, offset_size, is offset_size_minus_one + 1
+    pub fn header(&self) -> Result<u8, ArrowError> {
+        if self.bytes.is_empty() {
+            return Err(ArrowError::InvalidArgumentError(
+                "Can't get header from empty buffer".to_string(),
+            ));
+        }
+        Ok(self.bytes[0])
+    }
+
+    /// Get the offset_minus_one value from the header
+    pub fn offset_size_minus_one(&self) -> Result<u8, ArrowError> {
+        if self.bytes.is_empty() {
+            Err(ArrowError::InvalidArgumentError(
+                "Tried to get offset_size_minus_one from header, but 
self.bytes buffer is emtpy."
+                    .to_string(),
+            ))
+        } else {
+            Ok(self.bytes[0] & (0b11 << 6)) // Grab the last 2 bits
+        }
+    }
+
+    /// Get the offset_size
+    pub fn offset_size(&self) -> Result<u8, ArrowError> {
+        Ok(self.offset_size_minus_one()? + 1)
+    }
+
+    /// Get the offsets as an iterator
+    // TODO: Do we want this kind of API?
+    // TODO: Test once API is agreed upon
+    pub fn offsets(&'m self) -> Result<impl Iterator<Item = (usize, usize)> + 
'm, ArrowError> {
+        struct OffsetIterators<'m> {
+            buffer: &'m [u8],
+            total: usize,
+            seen: usize,
+            offset_size: usize,
+        }
+        impl<'m> Iterator for OffsetIterators<'m> {
+            type Item = (usize, usize); // (start, end) positions of the bytes
+
+            // TODO: Check bounds here or ensure they're correct
+
+            fn next(&mut self) -> Option<Self::Item> {
+                // +1 to skip the first offset
+                if self.seen < self.total {
+                    let start = usize::from_le_bytes(
+                        self.buffer[(self.seen ) * self.offset_size + 1 // +1 
to skip header
+                            ..(self.seen ) * self.offset_size + 1]
+                            .try_into()
+                            .ok()?,
+                    );
+                    self.seen += 1;
+                    let end = usize::from_le_bytes(
+                        self.buffer[(self.seen ) * self.offset_size + 1 // +1 
to skip header
+                            ..(self.seen ) * self.offset_size + 1]
+                            .try_into()
+                            .ok()?,
+                    );
+
+                    Some((start, end))
+                } else {
+                    None
+                }
+            }
+        }
+        let iterator: OffsetIterators = OffsetIterators {
+            buffer: self.bytes,
+            total: self.dict_len()?,
+            seen: 0,
+            offset_size: self.offset_size()? as usize,
+        };
+        Ok(iterator)
+    }
+    /// Get the key-name-bytes by index
+    pub fn get_by(&self, index: usize) -> Result<&'m str, ArrowError> {
+        todo!()
+    }
+    /// Get all key-names as an Iterator of strings
+    // TODO: Result
+    pub fn fields(&self) -> impl Iterator<Item = &'m str> {
+        // Do the same as for offsets
+        todo!();
+        vec![].into_iter()
+    }
+}
+
+#[derive(Clone, Copy, Debug, PartialEq)]
+pub struct VariantObject<'m, 'v> {
+    pub metadata: VariantMetadata<'m>,
+    pub value: &'v [u8],
+}
+
+impl<'m, 'v> VariantObject<'m, 'v> {
+    pub fn fields(&self) -> Result<impl Iterator<Item = (&'m str, Variant<'m, 
'v>)>, ArrowError> {
+        todo!();
+        #[allow(unreachable_code)] // Just to infer the return type
+        Ok(vec![].into_iter())
+    }
+
+    pub fn field(&self, _name: &'m str) -> Result<Variant<'m, 'v>, ArrowError> 
{
+        todo!()
+    }
+}
+
+#[derive(Clone, Copy, Debug, PartialEq)]
+pub struct VariantArray<'m, 'v> {
+    pub metadata: VariantMetadata<'m>,
+    pub value: &'v [u8],
+}
+
+// TODO: Let's agree on the API here, also should we expose a way to get the 
values as a vec of
+// variants for those who want it? Would require allocations.
+impl<'m, 'v> VariantArray<'m, 'v> {
+    pub fn len(&self) -> usize {
+        todo!()
+    }
+
+    pub fn values(&self) -> Result<impl Iterator<Item = Variant<'m, 'v>>, 
ArrowError> {
+        todo!();
+        #[allow(unreachable_code)] // Just to infer the return type
+        Ok(vec![].into_iter())
+    }
+}
+
+impl<'m, 'v> Index<usize> for VariantArray<'m, 'v> {
+    type Output = Variant<'m, 'v>;
+
+    fn index(&self, _index: usize) -> &Self::Output {

Review Comment:
   I don't think this trait works, if it forces us to return a reference? 
   We need to return a `Variant` by value, which wraps the referenced bytes.



##########
parquet-variant/src/variant.rs:
##########
@@ -0,0 +1,321 @@
+use std::{borrow::Cow, ops::Index};
+
+use crate::decoder::{self, get_variant_type};
+use arrow_schema::ArrowError;
+use strum_macros::EnumDiscriminants;
+
+#[derive(Clone, Copy, Debug, PartialEq)]
+/// Encodes the Variant Metadata, see the Variant spec file for more 
information

Review Comment:
   Actually, there are _four_ different `XX_size_minus_one` situations in the 
variant spec (metadata dictionary offsets, array value offsets, object field 
ids, and object value offsets). We should probably factor out a helper class 
for working with them, since it's ~always the same:
   ```rust
   enum PackedUsizeArrayElementSize {
       One = 1,
       Two = 2,
       Three = 3,
       Four = 4,
   };
   
   impl PackedUsizeArrayElementSize {
       fn try_new(offset_size_minus_one: u8) -> Result<Self, ArrowError> { 
           use PackedUsizeArrayElementSize::*; 
           let result = match offset_size_minus_one { 
               0 => One, 
               1 => Two, 
               2 => Three, 
               3 => Four, 
               _ => return Err(...),
            }; 
            Ok(result)
        }
   }
   
   struct PackedUsizeArray<'a> {
       element_size: PackedArrayElementSize,
       elements: &'a [u8],
   }
   
   impl<'a> PackedUsizeArray<'a> {
       /// NOTE: Only looks at the lower 2 bits of the element size!
       fn try_new(
           bytes: &'a [u8],
           byte_offset: usize,
           element_size: PackedUsizeArrayElementSize,
           num_elements: usize,
       ) -> Result<Self, ArrowError> {
           let len = (element_size as usize)*num_elements;
           Ok(Self {
               elements: slice_from_slice(bytes, byte_offset..byte_offset+len)?
               element_size,
               num_elements,
           })
       }
   
       fn get(&self, i: usize) -> Result<usize, ArrowError> {
           self.element_size.unpack_usize(self. bytes, 0, i)
       }
   }
   ```
   



##########
parquet-variant/src/variant.rs:
##########
@@ -0,0 +1,321 @@
+use std::{borrow::Cow, ops::Index};
+
+use crate::decoder::{self, get_variant_type};
+use arrow_schema::ArrowError;
+use strum_macros::EnumDiscriminants;
+
+#[derive(Clone, Copy, Debug, PartialEq)]
+/// Encodes the Variant Metadata, see the Variant spec file for more 
information
+pub struct VariantMetadata<'m> {
+    bytes: &'m [u8],
+}
+
+impl<'m> VariantMetadata<'m> {
+    /// View the raw bytes (needed by very low-level decoders)
+    #[inline]
+    pub const fn as_bytes(&self) -> &'m [u8] {
+        self.bytes
+    }
+
+    /// Whether the dictionary keys are sorted and unique
+    pub fn is_sorted(&self) -> bool {
+        todo!()
+    }
+
+    /// Get the dict length
+    pub fn dict_len(&self) -> Result<usize, ArrowError> {
+        let end_location = self.offset_size()? as usize + 1;
+        if self.bytes.len() < end_location {
+            let err_str = format!(
+                "Invalid metadata bytes, must have at least length {} but has 
length {}",
+                &end_location,
+                self.bytes.len()
+            );
+            return Err(ArrowError::InvalidArgumentError(err_str));
+        }
+        let dict_len_bytes = &self.bytes[1..end_location];
+        let dict_len = 
usize::from_le_bytes(dict_len_bytes.try_into().map_err(|e| {
+            ArrowError::InvalidArgumentError(format!(
+                "Unable to convert dictionary_size bytes into usize: {}",
+                e,
+            ))
+        })?);
+        Ok(dict_len)
+    }
+    pub fn version(&self) -> usize {
+        todo!()
+    }
+
+    /// Get the offset by index
+    pub fn get_offset_by(&self, index: usize) -> Result<usize, ArrowError> {
+        todo!()
+    }
+
+    /// Get the header byte, which has the following form
+    ///              7     6  5   4  3             0
+    ///             +-------+---+---+---------------+
+    /// header      |       |   |   |    version    |
+    ///             +-------+---+---+---------------+
+    ///                 ^         ^
+    ///                 |         +-- sorted_strings
+    ///                 +-- offset_size_minus_one
+    /// The version is a 4-bit value that must always contain the value 1.
+    /// - sorted_strings is a 1-bit value indicating whether dictionary 
strings are sorted and unique.
+    /// - offset_size_minus_one is a 2-bit value providing the number of bytes 
per dictionary size and offset field.
+    /// - The actual number of bytes, offset_size, is offset_size_minus_one + 1
+    pub fn header(&self) -> Result<u8, ArrowError> {
+        if self.bytes.is_empty() {
+            return Err(ArrowError::InvalidArgumentError(
+                "Can't get header from empty buffer".to_string(),
+            ));
+        }
+        Ok(self.bytes[0])
+    }
+
+    /// Get the offset_minus_one value from the header
+    pub fn offset_size_minus_one(&self) -> Result<u8, ArrowError> {
+        if self.bytes.is_empty() {
+            Err(ArrowError::InvalidArgumentError(
+                "Tried to get offset_size_minus_one from header, but 
self.bytes buffer is emtpy."
+                    .to_string(),
+            ))
+        } else {
+            Ok(self.bytes[0] & (0b11 << 6)) // Grab the last 2 bits
+        }
+    }
+
+    /// Get the offset_size
+    pub fn offset_size(&self) -> Result<u8, ArrowError> {
+        Ok(self.offset_size_minus_one()? + 1)
+    }
+
+    /// Get the offsets as an iterator
+    // TODO: Do we want this kind of API?
+    // TODO: Test once API is agreed upon
+    pub fn offsets(&'m self) -> Result<impl Iterator<Item = (usize, usize)> + 
'm, ArrowError> {
+        struct OffsetIterators<'m> {
+            buffer: &'m [u8],
+            total: usize,
+            seen: usize,
+            offset_size: usize,
+        }
+        impl<'m> Iterator for OffsetIterators<'m> {
+            type Item = (usize, usize); // (start, end) positions of the bytes
+
+            // TODO: Check bounds here or ensure they're correct
+
+            fn next(&mut self) -> Option<Self::Item> {
+                // +1 to skip the first offset
+                if self.seen < self.total {
+                    let start = usize::from_le_bytes(
+                        self.buffer[(self.seen ) * self.offset_size + 1 // +1 
to skip header
+                            ..(self.seen ) * self.offset_size + 1]
+                            .try_into()
+                            .ok()?,

Review Comment:
   We'd want some helper like this:
   
   <details>
   
   ```rust
   enum OffsetSizeBytes {
       One = 1,
       Two = 2,
       Three = 3,
       Four = 4,
   }
   
   impl OffsetSizeBytes {
       fn try_new(offset_size_minus_one: u8) -> Result<Self, ArrowError> {
           use OffsetSizeBytes::*;
           let result = match offset_size_minus_one {
               0 => One,
               1 => Two,
               2 => Three,
               3 => Four,
               _ => return Err(...),
           };
           Ok(result)
       }
   
       fn unpack_usize(
           bytes: &[u8], 
           byte_offset: usize, // how many bytes to skip
           offset_index: usize, // which offset in an array of offsets
       ) -> Result<usize, ArrowError> {
           use OffsetSizeBytes::*;
           let offset = byte_offset + (self as usize)*offset_index;
           let result = match self {
               One => u8::from_le_bytes(array_from_slice(bytes, 
offset)?).into(),
               Two => u16::from_le_bytes(array_from_slice(bytes, 
offset)?).into(),
               Three => todo!(), // ugh, endianness
               Four => u32::from_le_bytes(array_from_slice(bytes, 
offset)?).try_into()?,
           };
           Ok(result)
       }
   }
   ```
   
   </details>
   
   For simplicity (and IIRC also speed), I believe this kind of enum-based 
dispatch is generally preferred in rust, instead of function pointers?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Re: [PR] API for reading Variant data and metadata [arrow-rs]

Reply via email to