Hi,

Recently, I revisited the code for array equality in Rust. While going
through it, I observed some assumptions about how we conclude that two
elements of an Arrow array are equal, and when two arrays are equal.

The notion of equality is also used throughout the specification. For
example, when we offer examples using "unspecified", we are implicitly
arguing that we should not care about that value when comparing arrays. It
is also implied by the wording "unique values" for dictionary-encoded arrays.

The notion of array equality is important when we want to verify
interoperability between languages, where we often need to compare arrays
(e.g. after a round-trip), because some implementations may change the data
in the "unspecified" slots or, for example, the offsets.
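
To make the distinction concrete, here is a minimal sketch in plain Rust.
It does not use the arrow crate: `Int32Array`, `logical_eq` and
`physical_eq` are hypothetical names, and the validity bitmap is modelled
as a `Vec<bool>` for simplicity. It only illustrates one possible
definition, in which null slots and the array offset do not take part in
the comparison.

```rust
/// Hypothetical, simplified stand-in for an Arrow Int32 array.
/// `values` is the data buffer, `validity[j] == false` marks slot `j` as
/// null, and `offset`/`len` select a logical window into both buffers.
#[derive(Debug, Clone)]
pub struct Int32Array {
    pub values: Vec<i32>,
    pub validity: Vec<bool>,
    pub offset: usize,
    pub len: usize,
}

impl Int32Array {
    /// Logical value of slot `i`: `None` when the slot is null, regardless
    /// of what the data buffer happens to hold at that position.
    pub fn get(&self, i: usize) -> Option<i32> {
        let j = self.offset + i;
        if self.validity[j] {
            Some(self.values[j])
        } else {
            None
        }
    }
}

/// Assumed definition of logical equality: same length, same null
/// positions, and same values in every non-null slot.
pub fn logical_eq(a: &Int32Array, b: &Int32Array) -> bool {
    a.len == b.len && (0..a.len).all(|i| a.get(i) == b.get(i))
}

/// Physical equality: compares the raw buffers, offset and length as-is.
pub fn physical_eq(a: &Int32Array, b: &Int32Array) -> bool {
    a.values == b.values
        && a.validity == b.validity
        && a.offset == b.offset
        && a.len == b.len
}
```

Under this sketch, an array with values `[1, 0, 3]` and a null in the
second slot is logically equal to a window at offset 1 over values
`[9, 1, 7, 3]` whose third slot is null, even though the two are not
physically equal.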

More fundamentally, IMO the specification offers a physical representation
(buffers, children, offsets, etc.) of a logical asset (lists, structs, int8,
int32), but currently does not say when two logical assets are considered
equal.

Would it make sense to systematize the notion of equality in the
specification, so that the different implementations agree on when they
should consider two arrays to be equal?

Best,
Jorge
