alamb commented on code in PR #7833:
URL: https://github.com/apache/arrow-rs/pull/7833#discussion_r2187123305


##########
parquet-variant/src/builder.rs:
##########
@@ -480,6 +522,25 @@ impl VariantBuilder {
         self
     }
 
+    /// This method pre-populates the field name directory in the Variant metadata with
+    /// the specific field names, in order.
+    ///
+    /// You can use this to pre-populate a [`VariantBuilder`] with a sorted dictionary if you
+    /// know the field names beforehand. Sorted dictionaries can accelerate field access when
+    /// reading [`Variant`]s.
+    pub fn with_field_names<'a>(mut self, field_names: impl Iterator<Item = &'a str>) -> Self {
+        self.metadata_builder.extend(field_names);

Review Comment:
   this is quite neat
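
   For context on why a sorted dictionary can speed up field access: if the names are in lexicographic order, a reader can binary-search them instead of scanning. The sketch below is only an illustration under that assumption; `find_field_id` and its `is_sorted` parameter are hypothetical names, not part of `parquet-variant`.

```rust
/// Look up a field name in a dictionary where the ID is the position in `names`.
/// `is_sorted` is a hypothetical flag saying whether `names` is in lexicographic order.
fn find_field_id(names: &[&str], is_sorted: bool, target: &str) -> Option<usize> {
    if is_sorted {
        // O(log n): only valid when `names` really is sorted
        names.binary_search(&target).ok()
    } else {
        // O(n): fall back to a linear scan for unsorted dictionaries
        names.iter().position(|n| *n == target)
    }
}
```

   With a sorted input such as `["a", "b", "c"]`, looking up `"b"` takes the binary-search path; an unsorted input has to take the linear path to stay correct.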



##########
parquet-variant/src/builder.rs:
##########
@@ -237,18 +237,43 @@ impl ValueBuffer {
 struct MetadataBuilder {
     // Field names -- field_ids are assigned in insert order
     field_names: IndexSet<String>,
+
+    // flag that checks if field names by insertion order are also lexicographically sorted
+    is_sorted: bool,
 }
 
 impl MetadataBuilder {
     /// Upsert field name to dictionary, return its ID
     fn upsert_field_name(&mut self, field_name: &str) -> u32 {
-        let (id, _) = self.field_names.insert_full(field_name.to_string());
+        let (id, new_entry) = self.field_names.insert_full(field_name.to_string());
+
+        if new_entry {
+            let n = self.num_field_names();
+
+            self.is_sorted =
+                n == 1 || self.is_sorted & (self.field_names[n - 2] < self.field_names[n - 1]);
+
+            if n == 1 {

Review Comment:
   am I missing something? this check seems to be redundant with the 
`self.is_sorted` line above 🤔 
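
   A minimal sketch of why a single assignment may be enough, assuming the truncated `if n == 1` block only sets the flag: the `n == 1 ||` prefix already yields `true` for the first entry and short-circuits before `names[n - 2]` is evaluated. `SortedTracker` is a hypothetical stand-in that uses a `Vec` plus a `HashMap` instead of `IndexSet`, so it needs only the standard library.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the PR's `MetadataBuilder`, not the crate's type.
#[derive(Default)]
struct SortedTracker {
    names: Vec<String>,
    ids: HashMap<String, u32>,
    is_sorted: bool,
}

impl SortedTracker {
    /// Insert `name` if it is new and return its ID (its insertion index).
    fn upsert(&mut self, name: &str) -> u32 {
        if let Some(&id) = self.ids.get(name) {
            return id; // existing entry: order is unchanged, so the flag is untouched
        }
        let id = self.names.len() as u32;
        self.names.push(name.to_string());
        self.ids.insert(name.to_string(), id);

        let n = self.names.len();
        // `n == 1` short-circuits, so `names[n - 2]` is never read for the first entry
        self.is_sorted =
            n == 1 || (self.is_sorted && self.names[n - 2] < self.names[n - 1]);
        id
    }
}
```

   Inserting `"a"`, then `"b"` keeps the flag set; a later out-of-order name such as `"aa"` clears it, and it stays cleared for every subsequent insert.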



##########
parquet-variant/src/builder.rs:
##########
@@ -299,6 +324,23 @@ impl MetadataBuilder {
     }
 }
 
+impl<S: AsRef<str>> FromIterator<S> for MetadataBuilder {

Review Comment:
   This is pretty neat. 
   
   I think we should add some tests for this `impl`.
   
   I personally suggest adding to the doc tests for `MetadataBuilder`, something like
   ```rust
   let metadata = MetadataBuilder::from_iter(["foo", "bar", "baz"]);
   ```
   
   or add a specific test below
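
   As a rough sketch of what such a test could exercise, the block below implements `FromIterator` on a hypothetical stand-in type (`NameDictionary`, not the crate's `MetadataBuilder`) and relies only on the standard library. It assumes the real impl also upserts each name in iteration order and ignores duplicates.

```rust
/// Hypothetical stand-in for `MetadataBuilder`; IDs are insertion indices.
#[derive(Default)]
struct NameDictionary {
    names: Vec<String>,
}

impl NameDictionary {
    /// Return the ID of `name`, inserting it at the end if it is new.
    fn upsert(&mut self, name: &str) -> u32 {
        if let Some(pos) = self.names.iter().position(|n| n == name) {
            return pos as u32;
        }
        self.names.push(name.to_string());
        (self.names.len() - 1) as u32
    }
}

impl<S: AsRef<str>> FromIterator<S> for NameDictionary {
    fn from_iter<T: IntoIterator<Item = S>>(iter: T) -> Self {
        let mut dict = Self::default();
        for name in iter {
            // accepts `&str`, `String`, or anything else that is `AsRef<str>`
            dict.upsert(name.as_ref());
        }
        dict
    }
}
```

   A test along these lines would check that `NameDictionary::from_iter(["foo", "bar", "baz", "foo"])` keeps three names in insertion order, and that `.collect()` works for an iterator of `String`s as well.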



##########
parquet-variant/src/builder.rs:
##########
@@ -480,6 +522,25 @@ impl VariantBuilder {
         self
     }
 
+    /// This method pre-populates the field name directory in the Variant metadata with

Review Comment:
   Bonus points for adding a new example to `VariantBuilder` showing how to create a Variant with a sorted dictionary



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
