On Mon, 2021-08-23 at 13:51 +0000, Thomas Grainger wrote:
> In all seriousness this is an actual problem with numpy/pandas arrays
> where:
> 

<snip>

>  line 1537, in __nonzero__
>     raise ValueError(
> ValueError: The truth value of a DataFrame is ambiguous. Use a.empty,
> a.bool(), a.item(), a.any() or a.all().
> ```
> 
> eg  
> https://pandas.pydata.org/pandas-docs/version/1.3.0/user_guide/gotchas.html#using-if-truth-statements-with-pandas
> > Should it be True because it’s not zero-length, or False because
> > there are False values? It is unclear, so instead, pandas raises a
> > ValueError:
> 
> I'm not sure I believe the author here - I think it's clear. It
> should be True because it's not zero-length.


It must be undefined because the operators are elementwise operators
and the concept of non-emptiness being True does not make sense for
elementwise containers.

    import numpy as np

    arr = np.arange(5)
    if arr < 0:  # raises ValueError: the truth value is ambiguous
        arr *= -1

is code that makes sense if `arr` were a number, but it is not
meaningful for arrays.
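
A minimal sketch of what happens instead, assuming a reasonably recent
NumPy. The names `mask`, `raised` and `signed` are only illustrative:

```python
import numpy as np

arr = np.arange(5)

# `arr < 0` is an elementwise comparison: one boolean per element.
mask = arr < 0

# Asking for the truth value of a multi-element array raises ValueError.
try:
    if mask:
        arr *= -1
    raised = False
except ValueError:
    raised = True

# The explicit spellings say which reduction is actually meant.
any_negative = bool(mask.any())
all_negative = bool(mask.all())

# The elementwise reading of the original intent: negate only the
# negative entries, using the boolean mask as an index.
signed = np.array([-2, -1, 0, 1, 2])
signed[signed < 0] *= -1
```

With a scalar the `if` picks one branch; with an array there is one
answer per element, so the choice has to be made with a mask or with an
explicit `any()`/`all()`.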

The second, distinct, problem is that `len` is non-obvious for many N-D
containers:

    arr = np.ones(shape=(5, 0))
    assert len(arr) == 5
    assert arr.size == 0  # it is empty.

NumPy breaks the contract that `len` is always the same as "size" [1].
(And we are stuck with it probably...)


So the length definition of truth only works out for Python containers
because `container == 0` is always obviously `False` already and you
never have the `len != size` problem.
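
As a small illustration with built-in containers only (the variable
names are just for this sketch), truthiness follows the length, and
comparing a container to `0` is never elementwise:

```python
containers = [[], [0, 0], (), (False,), "", "0"]

# Truthiness of built-in containers is defined by their length...
truth = [bool(c) for c in containers]
by_length = [len(c) > 0 for c in containers]

# ...and `container == 0` is a single comparison of the whole object,
# which is simply False, so there is no second reading to collide with.
equals_zero = [c == 0 for c in containers]
```

Here `truth` and `by_length` agree for every entry, and `equals_zero`
contains only `False`.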


One argument I would make is that the bigger problem for arrays may be
that we don't have a concept of an "elementwise container" (or maybe of
a higher-dimensional container, but I think elementwise is the
important distinction).

A "size" protocol would be useful to deal with NumPy's choice of `len`!
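
A rough sketch of what such a helper could look like. To be clear,
`size_of` is hypothetical and not an existing protocol or NumPy
function; it assumes that an integer `.size` attribute, where present,
means "total number of elements" (as it does for NumPy arrays):

```python
import numbers

import numpy as np


def size_of(obj):
    """Hypothetical helper: total number of elements in `obj`.

    Prefers an integer `.size` attribute (NumPy arrays have one) and
    falls back to `len()` for ordinary Python containers.
    """
    size = getattr(obj, "size", None)
    if isinstance(size, numbers.Integral):
        return int(size)
    return len(obj)


arr = np.ones(shape=(5, 0))
arr_len = len(arr)        # length of the first axis only
arr_size = size_of(arr)   # number of elements
list_size = size_of([1, 2, 3])
```

For the `(5, 0)` array, `len` reports 5 while `size_of` reports 0; for
a plain list the two agree.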

But a "has elementwise operations" protocol may be more generally
useful to code dealing with a mix of NumPy arrays and Python sequences –
and even to NumPy itself.
(E.g. it also tells you that `+` will not concatenate and it could tell
NumPy whether it should try coercing to an array or not.)
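
To make the idea concrete, here is a sketch under loud assumptions: no
such protocol exists today, and both `has_elementwise_ops` and the
`__elementwise__` marker attribute are made up for illustration. NumPy
arrays are special-cased as a stand-in for objects that would opt in:

```python
import numpy as np


def has_elementwise_ops(obj):
    """Hypothetical check for an "elementwise container".

    `__elementwise__` is an invented marker, not a real dunder.
    """
    return isinstance(obj, np.ndarray) or bool(
        getattr(obj, "__elementwise__", False)
    )


def is_empty(obj):
    """Emptiness that does not rely on `len` for elementwise containers."""
    if has_elementwise_ops(obj):
        return obj.size == 0
    return len(obj) == 0


# The same operator means different things for the two kinds of object.
list_sum = [1, 2] + [3, 4]                        # concatenation
array_sum = np.array([1, 2]) + np.array([3, 4])   # elementwise addition
```

Generic code could branch on such a check before using `+`, `==` or
truthiness, instead of guessing from the type.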

Cheers,

Sebastian


[1] I will not argue that this is the best way to define it. I don't
like the list-of-lists analogy, so I think that `len(arr) == arr.size`
and `arr.__iter__` iterating over all elements would be a better
definition. (Making the notion of "length" equivalent to "size".)

Or even refusing `len` unless 1-D!

That would make everyone who argues for using `len()` always correct,
or at least never incorrect.  But it is simply not what we got...


> _______________________________________________
> Python-ideas mailing list -- python-ideas@python.org
> To unsubscribe send an email to python-ideas-le...@python.org
> https://mail.python.org/mailman3/lists/python-ideas.python.org/
> Message archived at 
> https://mail.python.org/archives/list/python-ideas@python.org/message/P7FA42SYR3DKSVSBAVLIHGSJXO3AT33G/
> Code of Conduct: http://python.org/psf/codeofconduct/


Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/THLJ2ET5NB54LCQII5NOIE62UJBBWSHY/
