Hello,

Recent changes to PyArrow seem to have taken the stance that comparing
null values should return null.  The problem is that this breaks the
expectation that comparisons return booleans, and it percolates into
crazy behaviour in other places.  Here is an example of such
misbehaviour in the scalar refactor PR:

>>> import pyarrow as pa
>>> na = pa.scalar(None)
>>> na == na
<pyarrow.NullScalar: None>
>>> na == 5
<pyarrow.NullScalar: None>
>>> bool(na == 5)
True
>>> if na == 5: print("yo!")
yo!
>>> na in [5]
True

You can also see it with arrays containing null values:

>>> pa.array([1, None]) in [pa.scalar(42)]
True
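
The mechanism behind these results can be reproduced without PyArrow.
The sketch below uses a hypothetical class, NullLike, which is only a
stand-in and not a PyArrow type.  It assumes nothing beyond standard
Python rules: an object that defines neither __bool__ nor __len__ is
truthy, and both "if" and "in" truth-test whatever __eq__ returns.

```python
class NullLike:
    """Hypothetical stand-in for a null scalar (not a PyArrow class)."""

    def __eq__(self, other):
        # Null-propagating equality: the result is another null-like
        # object instead of a bool.
        return NullLike()


na = NullLike()
result = (na == 5)

# The comparison result is an ordinary object with no __bool__ or
# __len__, so Python treats it as truthy.
print(type(result).__name__)  # NullLike
print(bool(result))           # True

# Membership tests truth-test the result of ==, so they succeed too.
print(na in [5])              # True
```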

I think that Python equality operators should behave in a
Python-sensible way (return True or False).  Have people call another
method if they like the fancy (or noxious, depending on the POV)
semantics of returning null when comparing null with anything.
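
As a rough illustration of that split, here is a minimal sketch.  The
class Scalar and the method name null_eq are hypothetical and are not
PyArrow's API; treating null as equal to null in __eq__ is also just
one assumption among several possible choices.

```python
class Scalar:
    """Hypothetical scalar wrapper; value=None stands for null."""

    def __init__(self, value=None):
        self.value = value

    def __eq__(self, other):
        # Python-level equality always returns a plain bool.
        # Assumption: null compares equal to null, like None == None.
        if isinstance(other, Scalar):
            other = other.value
        return bool(self.value == other)

    def __hash__(self):
        # Defining __eq__ disables the default hash, so restore one.
        return hash(self.value)

    def null_eq(self, other):
        # Opt-in null-propagating comparison: returns None (null) if
        # either side is null, otherwise a bool.
        if isinstance(other, Scalar):
            other = other.value
        if self.value is None or other is None:
            return None
        return bool(self.value == other)


na = Scalar()
print(na == 5)                # False
print(na in [5])              # False
print(na.null_eq(5))          # None
print(Scalar(5).null_eq(5))   # True
```

With this split, "if", "in" and similar constructs see ordinary
booleans, while callers who want null propagation ask for it by name.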

(Note that NumPy doesn't have null scalars, so it can be less
conservative in its customization of equality methods.)

Regards

Antoine.
