friendlymatthew commented on PR #7961:
URL: https://github.com/apache/arrow-rs/pull/7961#issuecomment-3092196767

   > That way, two metadata dictionaries only compare equal if they contain the 
same strings and they assign the same field ids to those strings. Such a 
logical comparison makes it safe to swap the bytes of one metadata dictionary 
with the bytes of another that compares logically equal, e.g. to improve 
parquet dictionary encoding of the field. But I'm not sure that would happen 
often enough to be worth optimizing for? Especially because (for unordered 
metadata at least) one would likely want the ability to replace a metadata 
dictionary with a different one that provides a superset of field names (with 
matching field ids in the common part).
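   
   To make the quoted idea concrete, here is a minimal sketch of the two checks it describes: logical equality, and "superset with matching field ids in the common part". It assumes a decoded dictionary can be modeled as an ordered list of field names whose field id is its position in the list. `MetadataDict`, `logically_eq`, and `is_compatible_superset_of` are hypothetical names for illustration, not existing arrow-rs APIs.

```rust
/// Hypothetical stand-in for a decoded metadata dictionary.
/// Assumption: the field id of a name is its index in `names`.
#[derive(Debug)]
struct MetadataDict {
    names: Vec<String>,
}

impl MetadataDict {
    fn new(names: &[&str]) -> Self {
        Self {
            names: names.iter().map(|s| s.to_string()).collect(),
        }
    }

    /// Logical equality: the same strings with the same field ids,
    /// regardless of how the underlying bytes are laid out.
    fn logically_eq(&self, other: &Self) -> bool {
        self.names == other.names
    }

    /// True if `self` could replace `other`: every name in `other`
    /// keeps its field id in `self`. Under the index-as-id assumption
    /// this means `other` is a prefix of `self`.
    fn is_compatible_superset_of(&self, other: &Self) -> bool {
        self.names.starts_with(&other.names)
    }
}
```

   Under this model, `["a", "b"]` and `["b", "a"]` contain the same strings but are not logically equal, because the field ids differ.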
   
   I think @scovich's comment about metadata equality is super interesting. I 
will think more about this.
   
   I can imagine such a logical comparison being useful. @alamb and I were 
discussing encoding a single metadata dictionary per parquet file. This 
would only be possible if we know that every row in the metadata column has 
the "same" metadata dictionary.
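
   As a rough sketch of that per-file check, assuming each row's dictionary has already been decoded into an ordered list of field names (field id = position), something like the following would tell us whether one dictionary can stand in for the whole column. `shared_dictionary` is a hypothetical helper, not an existing arrow-rs function.

```rust
/// Returns the dictionary shared by every row, or `None` if the column is
/// empty or any row's dictionary differs logically from the first one.
/// Assumption: each inner Vec is one row's decoded dictionary, and a
/// name's field id is its index in that Vec.
fn shared_dictionary(rows: &[Vec<String>]) -> Option<&[String]> {
    let (first, rest) = rows.split_first()?;
    // Comparing the ordered name lists compares both strings and field ids.
    rest.iter().all(|dict| dict == first).then(|| first.as_slice())
}
```

   A real implementation would likely compare the raw bytes first and only fall back to a logical comparison when they differ.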


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
