scovich commented on code in PR #7911:
URL: https://github.com/apache/arrow-rs/pull/7911#discussion_r2202046994


##########
parquet-variant/src/builder.rs:
##########
@@ -315,29 +343,29 @@ impl MetadataBuilder {
         let string_start = offset_start + (nkeys + 1) * offset_size as usize;
         let metadata_size = string_start + total_dict_size;
 
-        let mut metadata = Vec::with_capacity(metadata_size);
+        metadata_buffer.reserve(metadata_size);

Review Comment:
   Shouldn't we reserve `metadata_size` _more_ bytes, rather than that many 
_total_ bytes?



##########
parquet-variant-compute/src/variant_array_builder.rs:
##########
@@ -168,7 +166,105 @@ impl VariantArrayBuilder {
         self.value_buffer.extend_from_slice(value);
     }
 
-    // TODO: Return a Variant builder that will write to the underlying 
buffers (TODO)
+    /// Return a DirectVariantBuilder that writes directly to the buffers of 
this builder.
+    pub fn variant_builder(&mut self) -> DirectVariantBuilder {
+        // append directly into the metadata and value buffers
+        let metadata_buffer = std::mem::take(&mut self.metadata_buffer);
+        let value_buffer = std::mem::take(&mut self.value_buffer);
+        let metadata_offset = metadata_buffer.len();
+        let value_offset = value_buffer.len();

Review Comment:
   nit: seems like these could just fold into the constructor call below



##########
parquet-variant-compute/src/variant_array_builder.rs:
##########
@@ -168,7 +166,105 @@ impl VariantArrayBuilder {
         self.value_buffer.extend_from_slice(value);
     }
 
-    // TODO: Return a Variant builder that will write to the underlying 
buffers (TODO)
+    /// Return a DirectVariantBuilder that writes directly to the buffers of 
this builder.
+    pub fn variant_builder(&mut self) -> DirectVariantBuilder {
+        // append directly into the metadata and value buffers
+        let metadata_buffer = std::mem::take(&mut self.metadata_buffer);
+        let value_buffer = std::mem::take(&mut self.value_buffer);
+        let metadata_offset = metadata_buffer.len();
+        let value_offset = value_buffer.len();
+        DirectVariantBuilder {
+            finished: false,
+            metadata_offset,
+            value_offset,
+            variant_builder: VariantBuilder::new_with_buffers(metadata_buffer, 
value_buffer),
+            array_builder: self,
+        }
+    }
+}
+
+/// A `VariantBuilder` that writes directly to the buffers of a 
`VariantArrayBuilder`.
+///
+/// See [`VariantArray::variant_builder`] for an example
+pub struct DirectVariantBuilder<'a> {
+    /// was finish called?
+    finished: bool,
+    /// starting metadata offset
+    metadata_offset: usize,
+    /// starting value offset
+    value_offset: usize,
+    array_builder: &'a mut VariantArrayBuilder,
+    variant_builder: VariantBuilder,
+}
+
+impl DirectVariantBuilder<'_> {
+    /// Return a reference to the underlying `VariantBuilder`
+    pub fn inner(&self) -> &VariantBuilder {
+        &self.variant_builder
+    }
+
+    /// Return a mutable reference to the underlying `VariantBuilder`
+    pub fn inner_mut(&mut self) -> &mut VariantBuilder {
+        &mut self.variant_builder
+    }
+
+    /// Called to finalize the variant and write it to the underlying buffers
+    ///
+    /// Note if you do not call finish, the struct will be reset and the 
buffers
+    /// will not be updated.
+    ///
+    pub fn finish(mut self) {
+        let metadata_offset = self.metadata_offset;
+        let value_offset = self.value_offset;
+
+        // get the buffers back
+        let (metadata_buffer, value_buffer) = std::mem::take(&mut 
self.variant_builder).finish();

Review Comment:
   Random tired Friday afternoon thoughts: 
   * Now we have four possible levels of forgetting to finish. Maybe the 
protections we came up for the other three could apply here as well so we don't 
need a fancy `impl Drop`?
   * I also wonder if the `BuilderExt` concept could be helpful, given that 
this is really just a fourth thing that acts like a variant builder and you can 
stuff values into?



##########
parquet-variant-compute/src/variant_array_builder.rs:
##########
@@ -168,7 +166,105 @@ impl VariantArrayBuilder {
         self.value_buffer.extend_from_slice(value);
     }
 
-    // TODO: Return a Variant builder that will write to the underlying 
buffers (TODO)
+    /// Return a DirectVariantBuilder that writes directly to the buffers of 
this builder.
+    pub fn variant_builder(&mut self) -> DirectVariantBuilder {
+        // append directly into the metadata and value buffers
+        let metadata_buffer = std::mem::take(&mut self.metadata_buffer);
+        let value_buffer = std::mem::take(&mut self.value_buffer);
+        let metadata_offset = metadata_buffer.len();
+        let value_offset = value_buffer.len();
+        DirectVariantBuilder {
+            finished: false,
+            metadata_offset,
+            value_offset,
+            variant_builder: VariantBuilder::new_with_buffers(metadata_buffer, 
value_buffer),
+            array_builder: self,
+        }
+    }
+}
+
+/// A `VariantBuilder` that writes directly to the buffers of a 
`VariantArrayBuilder`.
+///
+/// See [`VariantArray::variant_builder`] for an example
+pub struct DirectVariantBuilder<'a> {
+    /// was finish called?
+    finished: bool,
+    /// starting metadata offset
+    metadata_offset: usize,
+    /// starting value offset
+    value_offset: usize,
+    array_builder: &'a mut VariantArrayBuilder,
+    variant_builder: VariantBuilder,
+}
+
+impl DirectVariantBuilder<'_> {
+    /// Return a reference to the underlying `VariantBuilder`
+    pub fn inner(&self) -> &VariantBuilder {
+        &self.variant_builder
+    }
+
+    /// Return a mutable reference to the underlying `VariantBuilder`
+    pub fn inner_mut(&mut self) -> &mut VariantBuilder {
+        &mut self.variant_builder
+    }
+
+    /// Called to finalize the variant and write it to the underlying buffers
+    ///
+    /// Note if you do not call finish, the struct will be reset and the 
buffers
+    /// will not be updated.
+    ///
+    pub fn finish(mut self) {
+        let metadata_offset = self.metadata_offset;
+        let value_offset = self.value_offset;
+
+        // get the buffers back
+        let (metadata_buffer, value_buffer) = std::mem::take(&mut 
self.variant_builder).finish();
+        let metadata_len = metadata_buffer
+            .len()
+            .checked_sub(metadata_offset)
+            .expect("metadata length decreased unexpectedly");
+        let value_len = value_buffer
+            .len()
+            .checked_sub(value_offset)
+            .expect("value length decreased unexpectedly");
+
+        // put the buffers back
+        self.array_builder.metadata_buffer = metadata_buffer;
+        self.array_builder.value_buffer = value_buffer;
+
+        // Append offsets and lengths for new nulls into the array builder
+        self.array_builder
+            .metadata_locations
+            .push((metadata_offset, metadata_len));
+        self.array_builder
+            .value_locations
+            .push((value_offset, value_len));
+        self.array_builder.nulls.append_non_null();
+        self.finished = true;
+    }
+}
+
+impl<'a> Drop for DirectVariantBuilder<'a> {
+    fn drop(&mut self) {
+        if self.finished {
+            // if the object was finished, we do not need to do anything
+            return;
+        }
+        // if the object was not finished, we need to reset any partial state 
put the buffers back
+        println!("DirectVariantBuilder::drop");
+        let variant_builder = std::mem::take(&mut self.variant_builder);
+        let (mut metadata_buffer, mut value_buffer) = variant_builder.finish();
+        assert!(
+            metadata_buffer.len() >= self.metadata_offset,
+            "metadata got smaller"
+        );
+        assert!(value_buffer.len() >= self.value_offset, "value got smaller");
+        metadata_buffer.truncate(self.metadata_offset);
+        value_buffer.truncate(self.value_offset);

Review Comment:
   part of me observes that we don't need to truncate if we ensure the 
"garbage" bytes are a suffix of the previous entry rather than a prefix of the 
following entry. But it's probably better to not leave garbage lying around. 
Especially since garbage while attempting the first entry is necessarily a 
prefix.



##########
parquet-variant/src/builder.rs:
##########
@@ -626,23 +689,33 @@ impl ParentState<'_> {
 /// let (metadata, value) = builder.finish();
 /// let variant = Variant::try_new(&metadata, &value).unwrap();
 /// ```
-///
-#[derive(Default)]
+#[derive(Default, Debug)]
 pub struct VariantBuilder {
     buffer: ValueBuffer,
     metadata_builder: MetadataBuilder,
     validate_unique_fields: bool,
 }
 
 impl VariantBuilder {
+    /// Create a new VariantBuilder with new underlying buffer
     pub fn new() -> Self {
         Self {
-            buffer: ValueBuffer::default(),
+            buffer: ValueBuffer::new(),
             metadata_builder: MetadataBuilder::default(),
             validate_unique_fields: false,
         }
     }
 
+    /// Create a new VariantBuilder that will write the metadata and values to

Review Comment:
   ```suggestion
       /// Create a new VariantBuilder that will append the metadata and values 
to
   ```
   (good to be clear that this won't erase existing content)
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Reply via email to