Yimche commented on issue #43985:
URL: https://github.com/apache/arrow/issues/43985#issuecomment-2424838714
> I would expect the result to be the same whether the Python object is the
> same or if the object is a copy
I think I see where you're coming from, and I think you're correct. If my
understanding of the equality function is correct, this is what is happening
here:
```Python
def equals(self, Table other, bint check_metadata=False):
    self._assert_cpu()
    if other is None:
        return False

    cdef:
        CTable* this_table = self.table
        CTable* other_table = other.table
        c_bool result

    with nogil:
        result = this_table.Equals(deref(other_table), check_metadata)

    return result
```
At the C++ level, when a table is compared with itself both pointers refer to
the same table, and by extension the same "NaN" data, so the result is equal
(I think the comparison short-circuits on identity, though I haven't confirmed
that in the C++ source). The copy mentioned previously wasn't a copy of the
pointer but a copy by value, hence the failed comparison: the two NaNs are
compared by value, and under IEEE 754 a NaN never compares equal to another
NaN. If we instead make table_2 refer to the same object as table_1 (i.e. point
to the same memory), the comparison passes, but note this is just restating an
equals-to-self.
```Python
>>> import pyarrow as pa
>>> table_1 = pa.Table.from_pydict({"foo": [float("nan")]})
>>> table_2 = pa.Table.from_pydict({"foo": [float("nan")]})
>>> table_1 == table_2
False
>>> table_2 = table_1
>>> table_1 == table_2
True
```
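For what it's worth, plain Python shows the same split between identity and
value, which may be a useful mental model. This is only an analogy using
built-in lists (the variable names are mine), not a description of what
Arrow's C++ `Equals` does internally:
```python
nan_a = float("nan")
nan_b = float("nan")  # a second, distinct NaN object

# IEEE 754: NaN never compares equal by value, not even to itself.
value_equal = nan_a == nan_a

# List comparison checks identity (`is`) before `==` for each element,
# so two lists holding the very same NaN object compare equal.
same_object = [nan_a] == [nan_a]

# A distinct NaN object fails the identity check, then the value check.
distinct_objects = [nan_a] == [nan_b]

print(value_equal, same_object, distinct_objects)
```
So "equal to itself but not to a copy" is not unique to pyarrow; it falls out
whenever an identity shortcut sits in front of an IEEE 754 value comparison.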
I will note that when looking for similar situations I noticed some
inconsistencies in how other libraries deal with such an edge case with NaNs:
```Python
>>> import numpy as np
>>> a1 = np.array([float("nan")])
>>> a2 = np.array([float("nan")])
>>> np.array_equal(a1, a2)
False
>>> np.array_equal(a1, a1)
False
```
and
```Python
>>> import pandas as pd
>>> table1 = pd.DataFrame([[float("nan")]])
>>> table2 = pd.DataFrame([[float("nan")]])
>>> table1.equals(table2)
True
>>> table1.equals(table1)
True
```
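To make the two candidate semantics concrete, here is a minimal pure-Python
sketch. Both function names are hypothetical and operate on plain lists of
floats; neither is a pyarrow API. `strict_equal` follows IEEE 754 (the NumPy
default shown above), while `nan_aware_equal` treats NaNs in the same position
as equal (which is what the pandas result above amounts to, and what
`np.array_equal(..., equal_nan=True)` gives):
```python
import math


def strict_equal(xs, ys):
    """Element-wise equality under IEEE 754: NaN is never equal to NaN."""
    return len(xs) == len(ys) and all(x == y for x, y in zip(xs, ys))


def nan_aware_equal(xs, ys):
    """Element-wise equality where NaNs in the same position count as equal."""
    def same(x, y):
        return x == y or (math.isnan(x) and math.isnan(y))
    return len(xs) == len(ys) and all(same(x, y) for x, y in zip(xs, ys))


column = [1.0, float("nan")]
column_copy = [1.0, float("nan")]

# Neither function has an identity shortcut, so comparing a column with
# itself and comparing it with a copy always give the same answer.
strict_self = strict_equal(column, column)
strict_copy = strict_equal(column, column_copy)
aware_self = nan_aware_equal(column, column)
aware_copy = nan_aware_equal(column, column_copy)
```
Either rule is self-consistent; the surprising behaviour only appears when
self-comparison and copy-comparison follow different rules.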
So this feels like something a pyarrow maintainer or the greater community
needs to decide on as the "correct" behaviour for the equality comparison.