Hi Wes,

Thanks a lot. I agree. My question is whether we should make it explicit in
the specification. AFAIK, "if the data represented in the slot is equal"
depends on the datatype: for variable-sized arrays with offsets (e.g.
strings), the equality of slot i is something along the lines of:

# buffer[0] holds the offsets (of type T), buffer[1] holds the values
start = lhs.buffer[0][(lhs.offset + i) * size_of<T>] as T
end = lhs.buffer[0][(lhs.offset + i + 1) * size_of<T>] as T
lhs_value = lhs.buffer[1][start..end]
# same for rhs
lhs_value == rhs_value
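For illustration, here is a minimal Rust sketch of the pseudocode above, combined with the null rules from Wes's reply quoted below. It assumes a hypothetical, simplified VarArray struct (one bool per slot instead of a packed validity bitmap, i32 offsets, a slicing offset); it is not the arrow-rs API. The example in main builds two arrays that differ physically (one is sliced, its offsets do not start at 0, and it has junk bytes behind a null slot) but are logically equal.

```rust
/// Hypothetical, simplified variable-sized (e.g. UTF-8 string) array.
/// NOT the arrow-rs StringArray; it only mirrors the physical layout.
struct VarArray {
    offset: usize,               // logical offset into the buffers (slicing)
    len: usize,                  // number of logical slots
    validity: Option<Vec<bool>>, // None means "no nulls"; one bool per slot
    offsets: Vec<i32>,           // offset + len + 1 entries
    values: Vec<u8>,
}

impl VarArray {
    fn is_valid(&self, i: usize) -> bool {
        self.validity.as_ref().map_or(true, |v| v[self.offset + i])
    }

    /// Bytes represented by slot `i`, as in the pseudocode:
    /// start = offsets[offset + i], end = offsets[offset + i + 1].
    fn value(&self, i: usize) -> &[u8] {
        let start = self.offsets[self.offset + i] as usize;
        let end = self.offsets[self.offset + i + 1] as usize;
        &self.values[start..end]
    }
}

/// Both null -> equal; exactly one null -> not equal;
/// both valid -> equal iff the represented bytes are equal.
fn slot_eq(lhs: &VarArray, i: usize, rhs: &VarArray, j: usize) -> bool {
    match (lhs.is_valid(i), rhs.is_valid(j)) {
        (false, false) => true,
        (true, true) => lhs.value(i) == rhs.value(j),
        _ => false,
    }
}

/// Two arrays are logically equal if they have the same length
/// and every pair of corresponding slots is equal.
fn array_eq(lhs: &VarArray, rhs: &VarArray) -> bool {
    lhs.len == rhs.len && (0..lhs.len).all(|i| slot_eq(lhs, i, rhs, i))
}

fn main() {
    // ["a", null, "bc"], with an empty null slot.
    let a = VarArray {
        offset: 0,
        len: 3,
        validity: Some(vec![true, false, true]),
        offsets: vec![0, 1, 1, 3],
        values: b"abc".to_vec(),
    };
    // Same logical values: sliced (offset 1), offsets not starting at 0,
    // and the bytes "XYZ" sitting behind the null slot.
    let b = VarArray {
        offset: 1,
        len: 3,
        validity: Some(vec![true, true, false, true]),
        offsets: vec![0, 2, 3, 6, 8],
        values: b"zzaXYZbc".to_vec(),
    };
    assert!(array_eq(&a, &b));
    println!("equal: {}", array_eq(&a, &b));
}
```

The point of the example is that offsets, slicing and the bytes behind null slots are physical details that a round-trip may legitimately change, so they do not take part in the comparison.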

This logic is also tricky for any type with children, where we need to
recursively compare the corresponding slots of the children.
These rules are not implementation-specific, yet they are important when
implementations interoperate.
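A similar sketch for nested types, again in Rust and again hypothetical: the Array enum below is not the arrow-rs API and ignores slicing offsets. List and struct slots are compared by recursing into their children, and dictionary slots by comparing the values their keys reference, so two dictionary arrays with different dictionaries can still be equal, as Wes describes.

```rust
/// Hypothetical, simplified model of a few Arrow types, only to illustrate
/// recursive slot equality. Not the arrow-rs API; no slicing offsets.
enum Array {
    Int32 { validity: Option<Vec<bool>>, values: Vec<i32> },
    List { validity: Option<Vec<bool>>, offsets: Vec<i32>, child: Box<Array> },
    Struct { validity: Option<Vec<bool>>, children: Vec<Array> },
    Dictionary { validity: Option<Vec<bool>>, keys: Vec<i32>, values: Box<Array> },
}

impl Array {
    fn is_valid(&self, i: usize) -> bool {
        let validity = match self {
            Array::Int32 { validity, .. }
            | Array::List { validity, .. }
            | Array::Struct { validity, .. }
            | Array::Dictionary { validity, .. } => validity,
        };
        validity.as_ref().map_or(true, |v| v[i])
    }
}

fn int32(values: Vec<i32>) -> Array {
    Array::Int32 { validity: None, values }
}

/// Equality of slot `i` of `lhs` and slot `j` of `rhs`.
fn slot_eq(lhs: &Array, i: usize, rhs: &Array, j: usize) -> bool {
    // Null handling is the same for every type.
    match (lhs.is_valid(i), rhs.is_valid(j)) {
        (false, false) => return true,
        (true, true) => {}
        _ => return false,
    }
    match (lhs, rhs) {
        (Array::Int32 { values: a, .. }, Array::Int32 { values: b, .. }) => a[i] == b[j],
        // Lists: same length and pairwise-equal child slots (recursion).
        (
            Array::List { offsets: oa, child: ca, .. },
            Array::List { offsets: ob, child: cb, .. },
        ) => {
            let (sa, ea) = (oa[i] as usize, oa[i + 1] as usize);
            let (sb, eb) = (ob[j] as usize, ob[j + 1] as usize);
            ea - sa == eb - sb && (0..ea - sa).all(|k| slot_eq(ca, sa + k, cb, sb + k))
        }
        // Structs: every child field equal at this slot (recursion).
        (Array::Struct { children: ca, .. }, Array::Struct { children: cb, .. }) => {
            ca.len() == cb.len() && ca.iter().zip(cb).all(|(a, b)| slot_eq(a, i, b, j))
        }
        // Dictionaries: compare the values the keys point to.
        (
            Array::Dictionary { keys: ka, values: va, .. },
            Array::Dictionary { keys: kb, values: vb, .. },
        ) => slot_eq(va, ka[i] as usize, vb, kb[j] as usize),
        // Different types are never equal.
        _ => false,
    }
}

fn main() {
    // [[1, 2], [3]] vs [[3], [1, 2]]
    let l1 = Array::List { validity: None, offsets: vec![0, 2, 3], child: Box::new(int32(vec![1, 2, 3])) };
    let l2 = Array::List { validity: None, offsets: vec![0, 1, 3], child: Box::new(int32(vec![3, 1, 2])) };
    assert!(slot_eq(&l1, 0, &l2, 1)); // [1, 2] == [1, 2]
    assert!(!slot_eq(&l1, 0, &l2, 0)); // [1, 2] != [3]

    // [10, 20] encoded against two different dictionaries.
    let d1 = Array::Dictionary { validity: None, keys: vec![0, 1], values: Box::new(int32(vec![10, 20])) };
    let d2 = Array::Dictionary { validity: None, keys: vec![1, 0], values: Box::new(int32(vec![20, 10])) };
    assert!(slot_eq(&d1, 0, &d2, 0) && slot_eq(&d1, 1, &d2, 1));

    // Two null struct slots are equal whatever their children hold.
    let s1 = Array::Struct { validity: Some(vec![false]), children: vec![int32(vec![1])] };
    let s2 = Array::Struct { validity: Some(vec![false]), children: vec![int32(vec![2])] };
    assert!(slot_eq(&s1, 0, &s2, 0));
}
```

Checking top-level validity before recursing means a null struct or list slot never looks at its children, which matches the "both null, then equal" rule.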

Best,
Jorge

On Thu, Nov 5, 2020 at 3:44 PM Wes McKinney <wesmck...@gmail.com> wrote:

> hi Jorge,
>
> The intent when authoring the specification was as follows:
>
> * If two array slots being compared are both null, then they are equal
> * If one is null and the other is not, they are not equal
> * If they are both not null, then they are equal if the data
> represented in the slot is equal (and if dictionary indices reference
> the same dictionary value, even if the dictionaries are different,
> then they are equal because the data they represent is the same)
>
> - Wes
>
> On Thu, Nov 5, 2020 at 1:13 AM Jorge Cardoso Leitão
> <jorgecarlei...@gmail.com> wrote:
> >
> > Hi,
> >
> > Recently, I revisited the code for array equality in Rust. While going
> > through it, I observed some assumptions about how we conclude that two
> > elements of an Arrow array are equal, and when two arrays are equal.
> >
> > The notion of equality is also used throughout the document, e.g. when we
> > offer examples with "unspecified" values, we implicitly argue that we
> > should not care about those values when comparing arrays. It is also used
> > when we use the wording "unique values" for dictionary-encoded arrays.
> >
> > The notion of array equality is important when we want to verify
> > interoperability between languages, where we often need to compare arrays
> > (e.g. after a round-trip), as some implementations may change the data in
> > the "unspecified" slots or the offsets.
> >
> > More fundamentally, IMO the specification offers a physical representation
> > (buffers, children, offsets, etc.) of logical data (lists, structs, int8,
> > int32), but currently does not say when two such logical values are
> > considered equal.
> >
> > Would it make sense to systematize the notion of equality in the
> > specification, to align the different implementations on when they should
> > consider two arrays to be equal?
> >
> > Best,
> > Jorge
>
