scovich commented on code in PR #7865:
URL: https://github.com/apache/arrow-rs/pull/7865#discussion_r2186127640


##########
parquet-variant/src/builder.rs:
##########
@@ -668,43 +739,41 @@ impl<'a, 'b> ObjectBuilder<'a, 'b> {
         self
     }
 
-    /// Return a new [`ObjectBuilder`] to add a nested object with the 
specified
-    /// key to the object.
-    pub fn new_object(&mut self, key: &'b str) -> ObjectBuilder {
-        self.check_pending_field();
-
-        let field_start = self.buffer.offset();
-        let obj_builder = ObjectBuilder::new(&mut self.buffer, 
self.metadata_builder)
-            .with_validate_unique_fields(self.validate_unique_fields);
-        self.pending = Some((key, field_start));
-
-        obj_builder
+    // Returns validate_unique_fields because we can no longer reference self 
once this method returns.
+    fn parent_state<'b>(&'b mut self, key: &'b str) -> (ParentState<'b>, bool) 
{
+        let state = ParentState::Object {
+            buffer: &mut self.buffer,
+            metadata_builder: self.parent_state.metadata_builder(),
+            fields: &mut self.fields,
+            field_name: key,
+        };
+        (state, self.validate_unique_fields)
     }
 
-    /// Return a new [`ListBuilder`] to add a list with the specified key to 
the
-    /// object.
-    pub fn new_list(&mut self, key: &'b str) -> ListBuilder {
-        self.check_pending_field();
-
-        let field_start = self.buffer.offset();
-        let list_builder = ListBuilder::new(&mut self.buffer, 
self.metadata_builder)
-            .with_validate_unique_fields(self.validate_unique_fields);
-        self.pending = Some((key, field_start));
-
-        list_builder
+    /// Returns an object builder that can be used to append a new (nested) 
object to this list.
+    ///
+    /// WARNING: The builder will have no effect unless/until 
[`ObjectBuilder::finish`] is called.
+    pub fn new_object<'b>(&'b mut self, key: &'b str) -> ObjectBuilder<'b> {
+        let (parent_state, validate_unique_fields) = self.parent_state(key);
+        ObjectBuilder::new(parent_state, validate_unique_fields)
     }
 
-    /// Finalize object
+    /// Returns a list builder that can be used to append a new (nested) list 
to this list.
     ///
-    /// This consumes self and writes the object to the parent buffer.
-    pub fn finish(mut self) -> Result<(), ArrowError> {
-        self.check_pending_field();
+    /// WARNING: The builder will have no effect unless/until 
[`ListBuilder::finish`] is called.
+    pub fn new_list<'b>(&'b mut self, key: &'b str) -> ListBuilder<'b> {
+        let (parent_state, validate_unique_fields) = self.parent_state(key);
+        ListBuilder::new(parent_state, validate_unique_fields)
+    }
 
+    /// Finalizes this object and appends it to its parent, which otherwise 
remains unmodified.
+    pub fn finish(mut self) -> Result<(), ArrowError> {
+        let metadata_builder = self.parent_state.metadata_builder();

Review Comment:
   It's easy enough to add the nested insert(s), but I don't think I understand 
"the issue" you refer to, that the current code does not fix?
   
   The new code doesn't insert a new object field's name to the metadata 
dictionary until the builder's `finish`method is called. You are correct, that 
a scenario like the above could leave extra/unused field names in the metadata 
dictionary (because the field names were added when the field finished 
inserting, but then the object that contains those fields was never actually 
added to its parent). However, spurious names in the metadata dictionary are 
harmless -- the spec does not require that every metadata entry be referenced 
as a field id in the corresponding value. In fact, the variant shredding spec 
_requires_ the metadata dictionary to have "extra" field names, corresponding 
to fields that were shredded out. That way, a reader who wants to unshred the 
those fields back to normal variant values doesn't have to rewrite the metadata 
dictionary.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Reply via email to