friendlymatthew commented on PR #7961: URL: https://github.com/apache/arrow-rs/pull/7961#issuecomment-3092196767
> That way, two metadata dictionaries only compare equal if they contain the same strings and they assign the same field ids to those strings. Such a logical comparison makes it safe to swap the bytes of one metadata dictionary with the bytes of another that compares logically equal, e.g. to improve parquet dictionary encoding of the field. But I'm not sure that would happen often enough to be worth optimizing for? Especially because (for unordered metadata at least) one would likely want the ability to replace a metadata dictionary with a different one that provides a superset of field names (with matching field ids in the common part).

I think @scovich's comment about metadata equality is super interesting. I will think more about this; I could imagine such a logical comparison being useful.

@alamb and I were discussing encoding a single metadata dictionary per parquet file. This would only be possible if we know that every row in the metadata column has the "same" metadata dictionary.
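To make the idea concrete, here is a rough sketch of what such a logical comparison might look like. It is not the `parquet-variant` API: `MetadataDict`, `logically_eq`, `can_be_replaced_by`, and `all_rows_share_dictionary` are hypothetical names, and the sketch assumes a decoded dictionary is just a list of strings where the field id of a name is its index.

```rust
/// Hypothetical stand-in for a decoded metadata dictionary.
/// Assumption: the field id of a name is its index in `names`.
#[derive(Debug, Clone)]
struct MetadataDict {
    names: Vec<String>,
}

impl MetadataDict {
    fn new(names: &[&str]) -> Self {
        Self {
            names: names.iter().map(|s| s.to_string()).collect(),
        }
    }

    /// Logical equality: the same strings, each assigned the same field id.
    /// With index-based ids this is an ordered comparison of the names,
    /// independent of how the bytes happen to be laid out.
    fn logically_eq(&self, other: &Self) -> bool {
        self.names == other.names
    }

    /// True if `other` could replace `self`: it provides a superset of the
    /// field names and keeps every field id in the common part unchanged.
    /// With index-based ids, that means `self` is a prefix of `other`.
    fn can_be_replaced_by(&self, other: &Self) -> bool {
        other.names.starts_with(&self.names)
    }
}

/// True if every row's dictionary is logically equal to the first one,
/// which is the precondition for storing one dictionary for all rows.
fn all_rows_share_dictionary(rows: &[MetadataDict]) -> bool {
    match rows.split_first() {
        None => true,
        Some((first, rest)) => rest.iter().all(|d| first.logically_eq(d)),
    }
}
```

Under these assumptions, `["id", "name"]` and `["name", "id"]` are not logically equal even though they hold the same strings, while `["id", "name", "email"]` could replace `["id", "name"]` because the shared field ids are unchanged.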