scovich commented on code in PR #7943:
URL: https://github.com/apache/arrow-rs/pull/7943#discussion_r2214222666


##########
parquet-variant/src/variant/metadata.rs:
##########
@@ -332,6 +334,30 @@ impl<'m> VariantMetadata<'m> {
     }
 }
 
+// According to the spec, metadata dictionaries are not required to be in a 
specific order,
+// to enable flexibility when constructing Variant values
+//
+// Instead of comparing the raw bytes of 2 variant metadata instances, this 
implementation
+// checks whether the dictionary entries are equal -- regardless of their 
sorting order
+impl<'m> PartialEq for VariantMetadata<'m> {
+    fn eq(&self, other: &Self) -> bool {

Review Comment:
   Given the cost of constructing hash maps etc, is it worth adding the 
following quick-check in case the dictionaries are both sorted?
   ```rust
   if self.is_ordered() && other.is_ordered() {
       if self.len() != other.len() {
           return false;
       }
       let self_value_bytes = ... all string value bytes ...;
       let other_value_bytes = ... all string value bytes ...;
       return self_value_bytes == other_value_bytes;
   }
   ```
   Before trusting a single long string compare tho, we would need to convince 
ourselves that there's no way two dictionaries with different offsets can have 
identical value bytes. Otherwise, we'd have to loop over the two sets of 
strings manually.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Reply via email to