On Mon, 2020-03-23 at 11:45 -0700, Chris Barker wrote:
> I've always found the duality of zero-d arrays and scalars confusing,
> and I'm sure I'm not alone.
>
> Having both is just plain weird.

I guess so, it is a tricky situation, and I do not really have an
answer.

> But, backward compatibility aside, could we have ONLY Scalars?
>
> When we index into an array, the dimensionality is reduced by one, so
> indexing into a 1D array has to get us something: but the zero-d
> array is a really weird object -- do we really need it?

Well, it is hard to write functions that work on N-Dimensions (where N
can be 0), if the 0-D array does not exist. You can get away with
scalars in most cases, because they pretend to be arrays in most cases
(aside from mutability).

But I am pretty sure we have a bunch of cases that need
`res = np.asarray(res)` simply because `res` is N-D but could then be
silently converted to a scalar. E.g. see
https://github.com/numpy/numpy/issues/13105 for an issue about this
(although it does not actually list any specific problems).

- Sebastian

> There is certainly a need for more numpy-like scalars: more than the
> built in data types, and some handy attributes and methods, like
> dtype, .itemsize, etc. But could we make an enhanced scalar that had
> everything we actually need from a zero-d array?
>
> The key point would be mutability -- but do we really need mutable
> scalars? I can't think of any time I've needed that, when I couldn't
> have used a 1-d array of length 1.
>
> Is there a use case for zero-d arrays that could not be met with an
> enhanced scalar?
>
> -CHB
>
> On Mon, Feb 24, 2020 at 12:30 PM Allan Haldane <allanhald...@gmail.com>
> wrote:
>
> > I have some thoughts on scalars from playing with ndarray ducktypes
> > (__array_function__), eg a MaskedArray ndarray-ducktype, for which I
> > wanted an associated "MaskedScalar" type.
> >
> > In summary, the ways scalars currently work makes ducktyping
> > (duck-scalars) difficult:
> >
> >   * numpy scalar types are not subclassable, so my duck-scalars
> >     aren't subclasses of numpy scalars and aren't in the type
> >     hierarchy
> >   * even if scalars were subclassable, I would have to subclass each
> >     scalar datatype individually to make masked versions
> >   * lots of code checks `isinstance(var, np.float64)` which breaks
> >     for my duck-scalars
> >   * it was difficult to distinguish between a duck-scalar and a
> >     duck-0d array. The method I used in the end seems hacky.
> >
> > This has led to some daydreams about how scalars should work, and
> > also led me last to read through your NEPs 40/41 with specific focus
> > on what you said about scalars, and was about to post there until I
> > saw this discussion. I agree with what you said in the NEPs about
> > not making scalars be dtype instances.
> >
> > Here is what ducktypes led me to:
> >
> > If we are able to do something like define a `np.numpy_scalar` type
> > covering all numpy scalars, which has a `.dtype` attribute like you
> > describe in the NEPs, then that would seem to solve the ducktype
> > problems above. Ducktype implementors would need to make a
> > "duck-scalar" type in parallel to their "duck-ndarray" type, but I
> > found that to be pretty easy using an abstract class in my
> > MaskedArray ducktype, since the MaskedArray and MaskedScalar share a
> > lot of behavior.
> >
> > A numpy_scalar type would also help solve some object-array problems
> > if the object scalars are wrapped in the np_scalar type. A long time
> > ago I started to try to fix up various funny/strange behaviors of
> > object datatypes, but there are lots of special cases, and the main
> > problem was that the returned objects (eg from indexing) were not
> > numpy types and did not support numpy attributes or indexing.
> > Wrapping the returned object in `np.numpy_scalar` might add an extra
> > slight annoyance to people who want to unwrap the object, but I
> > think it would make object arrays less buggy and make code using
> > object arrays easier to reason about and debug.
> >
> > Finally, a few random votes/comments based on the other emails on
> > the list:
> >
> > I think scalars have a place in numpy (rather than just reusing 0d
> > arrays), since there is a clear use in having hashable, immutable
> > scalars. Structured scalars should probably be immutable.
> >
> > I agree with your suggestion that scalars should not be indexable.
> > Thus, my duck-scalars (and proposed numpy_scalar) would not be
> > indexable. However, I think they should encode their datatype
> > through a .dtype attribute like ndarrays, rather than by
> > inheritance.
> >
> > Also, something to think about is that currently numpy scalars
> > satisfy the property `isinstance(np.float64(1), float)`, i.e. they
> > are within the python numerical type hierarchy. 0d arrays do not
> > have this property. My proposal above would break this. I'm not sure
> > what to think about whether this is a good property to maintain or
> > not.
> >
> > Cheers,
> > Allan
> >
> > On 2/21/20 8:37 PM, Sebastian Berg wrote:
> > > Hi all,
> > >
> > > When we create new datatypes, we have the option to make new
> > > choices for the new datatypes [0] (not the existing ones).
> > >
> > > The question is: Should every NumPy datatype have a scalar
> > > associated and should operations like indexing return a scalar or
> > > a 0-D array?
> > >
> > > This is in my opinion a complex, almost philosophical, question,
> > > and we do not have to settle anything for a long time. But, if we
> > > do not decide a direction before we have many new datatypes the
> > > decision will make itself...
> > > So happy about any ideas, even if it's just a gut feeling :).
> > >
> > > There are various points. I would like to mostly ignore the
> > > technical ones, but I am listing them anyway here:
> > >
> > >   * Scalars are faster (although that can likely be optimized)
> > >
> > >   * Scalars have a lower memory footprint
> > >
> > >   * The current implementation incurs a technical debt in NumPy.
> > >     (I do not think that is a general issue, though. We could
> > >     automatically create scalars for each new datatype probably.)
> > >
> > > Advantages of having no scalars:
> > >
> > >   * No need to keep track of scalars to preserve them in ufuncs,
> > >     or libraries using `np.asarray`, do they need
> > >     `np.asarray_or_scalar`?
> > >     (or decide they return always arrays, although ufuncs may not)
> > >
> > >   * Seems simpler in many ways, you always know the output will be
> > >     an array if it has to do with NumPy.
> > >
> > > Advantages of having scalars:
> > >
> > >   * Scalars are immutable and we are used to them from Python.
> > >     A 0-D array cannot be used as a dictionary key consistently
> > >     [1].
> > >
> > >     I.e. without scalars as first class citizen `dict[arr1d[0]]`
> > >     cannot work, `dict[arr1d[0].item()]` may (if `.item()` is
> > >     defined), and e.g. `dict[arr1d[0].frozen()]` could make a copy
> > >     to work. [2]
> > >
> > >   * Object arrays as we have them now make sense, `arr1d[0]` can
> > >     reasonably return a Python object. I.e. arrays feel more like
> > >     containers if you can take elements out easily.
> > >
> > > Could go both ways:
> > >
> > >   * Scalar math `scalar = arr1d[0]; scalar += 1` modifies the
> > >     array without scalars. With scalars `arr1d[0, ...]` clarifies
> > >     the meaning. (In principle it is good to never use `arr2d[0]`
> > >     to get a 1D slice, probably more-so if scalars exist.)
> > >
> > > Note: array-scalars (the current NumPy scalars) are not useful in
> > > my opinion [3]. A scalar should not be indexed or have a shape.
> > > I do not believe in scalars pretending to be arrays.
> > >
> > > I personally tend towards liking scalars. If Python was a language
> > > where the array (array-programming) concept was ingrained into the
> > > language itself, I would lean the other way. But users are used to
> > > scalars, and they "put" scalars into arrays. Array objects are in
> > > some ways strange in Python, and I feel not having scalars
> > > detaches them further.
> > >
> > > Having scalars, however, also means we should preserve them. I
> > > feel in principle that is actually fairly straightforward. E.g.
> > > for ufuncs:
> > >
> > >   * np.add(scalar, scalar) -> scalar
> > >   * np.add.reduce(arr, axis=None) -> scalar
> > >   * np.add.reduce(arr, axis=1) -> array (even if arr is 1d)
> > >   * np.add.reduce(scalar, axis=()) -> array
> > >
> > > Of course libraries that do `np.asarray` would/could basically
> > > choose to not preserve scalars: Their signature is defined as
> > > taking strictly array input.
> > >
> > > Cheers,
> > >
> > > Sebastian
> > >
> > >
> > > [0] At best this can be a vision to decide which way they may
> > > evolve.
> > >
> > > [1] E.g. PyTorch uses `hash(tensor) == id(tensor)` which is
> > > arguably strange. E.g. Quantity defines hash correctly, but does
> > > not fully ensure immutability for 0-D Quantities. Ensuring
> > > immutability in a world where "views" are a central concept
> > > requires a read-only copy.
> > >
> > > [2] Arguably `.item()` would always return a scalar, but it would
> > > be a second class citizen. (Although if it returns a scalar, at
> > > least we already have a scalar implementation.)
> > >
> > > [3] They are necessary due to technical debt for NumPy datatypes
> > > though.
> > >
> > > _______________________________________________
> > > NumPy-Discussion mailing list
> > > NumPy-Discussion@python.org
> > > https://mail.python.org/mailman/listinfo/numpy-discussion
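
For readers who want to see the scalar / 0-D array distinction from this
thread in running code, here is a minimal sketch. It only uses current
NumPy behaviour (none of the proposed `np.numpy_scalar` or `.frozen()`
ideas, which do not exist), and the variable names are just for
illustration.

```python
import numpy as np

arr1d = np.array([1.0, 2.0, 3.0])

scalar = arr1d[0]        # plain integer indexing returns a NumPy scalar
zero_d = arr1d[0, ...]   # adding an Ellipsis returns a 0-D array view

print(type(scalar).__name__, type(zero_d).__name__, zero_d.shape)

# Scalars are immutable: `+=` rebinds the name and leaves arr1d untouched.
scalar += 1
unchanged = float(arr1d[0])

# The 0-D array is a view, so in-place math writes through to arr1d.
zero_d += 1
changed = float(arr1d[0])
print(unchanged, changed)

# Scalars are hashable and usable as dictionary keys.
lookup = {np.float64(2.0): "two"}
print(lookup[arr1d[0]])

# Scalars sit in Python's numeric type hierarchy, 0-D arrays do not.
print(isinstance(np.float64(1), float), isinstance(zero_d, float))

# A 0-D array, being mutable, cannot be hashed at all.
try:
    hash(zero_d)
    zero_d_hashable = True
except TypeError:
    zero_d_hashable = False
print(zero_d_hashable)
```

The `arr1d[0, ...]` spelling is the one Sebastian mentions for making
the "I want a view, not a scalar" intent explicit.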
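
The `res = np.asarray(res)` pattern mentioned in the reply above can be
sketched as follows. This is only an illustration under my own
assumptions: `clip_negative` is a made-up helper, not a NumPy function,
and it is not taken from the linked issue.

```python
import numpy as np


def clip_negative(values):
    """Zero out negative entries for input of any dimensionality (0-D too)."""
    # Arithmetic on a 0-D array silently returns a scalar, not a 0-D array.
    res = np.asarray(values) * 1.0
    # Convert back so the code below can rely on having an ndarray.
    res = np.asarray(res)
    # Ellipsis assignment works on arrays of every dimension, including 0-D;
    # a NumPy scalar does not support item assignment.
    res[...] = np.maximum(res, 0.0)
    return res


print(type(np.asarray(-3.0) * 1.0).__name__)  # scalar type, not ndarray
print(clip_negative(-3.0).shape)              # 0-D input stays an array
print(clip_negative([-1.0, 2.0]))

# Reductions over all axes likewise hand back a scalar.
total = np.add.reduce(np.ones(3), axis=None)
print(type(total).__name__)
```

Without the second `np.asarray` call the function would work for 1-D and
higher input but fail for 0-D input, which is the kind of special case
the thread is about.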