On Mon, 2021-08-23 at 13:51 +0000, Thomas Grainger wrote:
> In all seriousness this is an actual problem with numpy/pandas arrays
> where:
>
<snip>
> line 1537, in __nonzero__
>     raise ValueError(
> ValueError: The truth value of a DataFrame is ambiguous. Use a.empty,
> a.bool(), a.item(), a.any() or a.all().
> ```
>
> eg
> https://pandas.pydata.org/pandas-docs/version/1.3.0/user_guide/gotchas.html#using-if-truth-statements-with-pandas
> > Should it be True because it’s not zero-length, or False because
> > there are False values? It is unclear, so instead, pandas raises a
> > ValueError:
>
> I'm not sure I believe the author here - I think it's clear. It
> should be True because it's not zero-length.

It must be undefined because the operators are elementwise operators
and the concept of non-emptiness being True does not make sense for
elementwise containers.

    import numpy as np

    arr = np.arange(5)
    if arr < 0:  # raises ValueError: the truth value is ambiguous
        arr *= -1

is code that makes sense if `arr` was a number, but it is not
meaningful for arrays.

The second, distinct, problem is that `len` is non-obvious for many
N-D containers:

    arr = np.ones(shape=(5, 0))
    assert len(arr) == 5
    assert arr.size == 0  # it is empty.

NumPy breaks the contract that `len` is the same as "size" [1].
(And we are stuck with it probably...)

So the length definition of truth only works out for Python containers
because `container == 0` is always obviously `False` already and you
never have the `len != size` problem.

One argument I would make is that the bigger problem for arrays may be
that we don't have a concept for "elementwise container" (or maybe
higher dimensional container, but I think the elementwise is the
important distinction).

A "size" protocol would be useful to deal with NumPy's choice of `len`!
But a "has elementwise operations" protocol may be more generally
useful to code dealing with a mix of NumPy arrays and Python sequences
– and even NumPy itself. (E.g. it also tells you that `+` will not
concatenate and it could tell NumPy whether it should try coercing to
an array or not.)
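To make the "size" idea concrete, here is a minimal sketch of what such a protocol could look like. Nothing in it is an existing Python or NumPy API: `Grid`, `total_size` and `is_empty` are hypothetical names, and `Grid` is only a stand-in that mimics NumPy's convention of `len` returning the length of the first axis. The sketch assumes that a container advertises its element count through an integer `size` attribute, as NumPy arrays do, and falls back to `len()` otherwise.

```python
from math import prod


class Grid:
    """Hypothetical stand-in for an N-D array; it only mimics shape, len and size."""

    def __init__(self, shape):
        self.shape = tuple(shape)

    def __len__(self):
        # Same convention as NumPy: the length of the first axis only.
        if not self.shape:
            raise TypeError("len() of unsized object")
        return self.shape[0]

    @property
    def size(self):
        # Total number of elements across all axes.
        return prod(self.shape)


def total_size(obj):
    """Hypothetical "size" protocol: use an integer `size` attribute, else len()."""
    size = getattr(obj, "size", None)
    if isinstance(size, int) and not isinstance(size, bool):
        return size
    return len(obj)


def is_empty(obj):
    """Emptiness based on the total element count, not on len()."""
    return total_size(obj) == 0


grid = Grid((5, 0))
print(len(grid), grid.size, is_empty(grid))  # 5 0 True
```

With this, `is_empty([])` and `is_empty(Grid((5, 0)))` agree, while `bool(Grid((5, 0)))` would still follow `len` and report a non-empty container. Objects whose `size` is a method rather than an integer attribute fall through to `len()`, which is a simplification of this sketch.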
Cheers,

Sebastian


[1] I will not argue that this is the best way to define it; I don't
like the list-of-lists analogy, so I think that `len(arr) == arr.size`
and `arr.__iter__` iterating all elements would be a better definition.
(Making the notion of "length" equivalent to "size".)
Or even refusing `len` unless 1-D! That would make everyone who argues
to use `len()` always correct, or at least never incorrect. But it is
simply not what we got...

> _______________________________________________
> Python-ideas mailing list -- python-ideas@python.org
> To unsubscribe send an email to python-ideas-le...@python.org
> https://mail.python.org/mailman3/lists/python-ideas.python.org/
> Message archived at
> https://mail.python.org/archives/list/python-ideas@python.org/message/P7FA42SYR3DKSVSBAVLIHGSJXO3AT33G/
> Code of Conduct: http://python.org/psf/codeofconduct/
_______________________________________________
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/THLJ2ET5NB54LCQII5NOIE62UJBBWSHY/
Code of Conduct: http://python.org/psf/codeofconduct/