I personally have always found it weird and annoying to deal with 0-D arrays, so +1 for scalars!*

Juan

*: admittedly, I have almost no grasp of the underlying NumPy implementation complexities, but I will happily take Sebastian's word that scalars can be consistent with the library.

On Fri, 21 Feb 2020, at 7:37 PM, Sebastian Berg wrote:
> Hi all,
>
> When we create new datatypes, we have the option to make new choices
> for the new datatypes [0] (not the existing ones).
>
> The question is: Should every NumPy datatype have a scalar associated
> and should operations like indexing return a scalar or a 0-D array?
>
> This is in my opinion a complex, almost philosophical, question, and we
> do not have to settle anything for a long time. But, if we do not
> decide a direction before we have many new datatypes the decision will
> make itself...
> So happy about any ideas, even if it's just a gut feeling :).
>
> There are various points. I would like to mostly ignore the technical
> ones, but I am listing them anyway here:
>
>   * Scalars are faster (although that can likely be optimized)
>
>   * Scalars have a lower memory footprint
>
>   * The current implementation incurs a technical debt in NumPy.
>     (I do not think that is a general issue, though. We could
>     automatically create scalars for each new datatype probably.)
>
> Advantages of having no scalars:
>
>   * No need to keep track of scalars to preserve them in ufuncs, or
>     libraries using `np.asarray`, do they need `np.asarray_or_scalar`?
>     (or decide they return always arrays, although ufuncs may not)
>
>   * Seems simpler in many ways, you always know the output will be an
>     array if it has to do with NumPy.
>
> Advantages of having scalars:
>
>   * Scalars are immutable and we are used to them from Python.
>     A 0-D array cannot be used as a dictionary key consistently [1].
>
>     I.e. without scalars as first-class citizens `dict[arr1d[0]]`
>     cannot work, `dict[arr1d[0].item()]` may (if `.item()` is defined),
>     and e.g. `dict[arr1d[0].frozen()]` could make a copy to work. [2]
>
>   * Object arrays as we have them now make sense, `arr1d[0]` can
>     reasonably return a Python object. I.e. arrays feel more like
>     containers if you can take elements out easily.
>
> Could go both ways:
>
>   * Scalar math `scalar = arr1d[0]; scalar += 1` modifies the array
>     without scalars. With scalars `arr1d[0, ...]` clarifies the
>     meaning. (In principle it is good to never use `arr2d[0]` to
>     get a 1D slice, probably more so if scalars exist.)
>
> Note: array-scalars (the current NumPy scalars) are not useful in my
> opinion [3]. A scalar should not be indexed or have a shape. I do not
> believe in scalars pretending to be arrays.
>
> I personally tend towards liking scalars. If Python were a language
> where the array (array-programming) concept was ingrained into the
> language itself, I would lean the other way. But users are used to
> scalars, and they "put" scalars into arrays. Array objects are in some
> ways strange in Python, and I feel not having scalars detaches them
> further.
>
> Having scalars, however, also means we should preserve them. I feel in
> principle that is actually fairly straightforward. E.g. for ufuncs:
>
>   * np.add(scalar, scalar) -> scalar
>   * np.add.reduce(arr, axis=None) -> scalar
>   * np.add.reduce(arr, axis=1) -> array (even if arr is 1d)
>   * np.add.reduce(scalar, axis=()) -> array
>
> Of course libraries that do `np.asarray` would/could basically choose to
> not preserve scalars: Their signature is defined as taking strictly
> array input.
>
> Cheers,
>
> Sebastian
>
>
> [0] At best this can be a vision to decide which way they may evolve.
>
> [1] E.g. PyTorch uses `hash(tensor) == id(tensor)` which is arguably
> strange. E.g. Quantity defines hash correctly, but does not fully
> ensure immutability for 0-D Quantities. Ensuring immutability in a
> world where "views" are a central concept requires a read-only copy.
>
> [2] Arguably `.item()` would always return a scalar, but it would be a
> second-class citizen. (Although if it returns a scalar, at least we
> already have a scalar implementation.)
>
> [3] They are necessary due to technical debt for NumPy datatypes
> though.
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion@python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
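
For anyone who wants to poke at the hashing and in-place points from the quoted message, here is a minimal sketch. It only shows how the existing float64 dtype behaves in current NumPy, so it says nothing about what new datatypes would or should do, and the variable names are made up for illustration.

```python
import numpy as np

arr1d = np.arange(3.0)  # array([0., 1., 2.])

# Integer indexing returns a NumPy scalar, which is immutable and hashable,
# so it can be used as a dictionary key.
scalar = arr1d[0]
lookup = {scalar: "first element"}
scalar_is_hashable = lookup[arr1d[0]] == "first element"

# `arr1d[0, ...]` returns a 0-D array that is a view into `arr1d`.
zero_d = arr1d[0, ...]
try:
    hash(zero_d)
    zero_d_is_hashable = True
except TypeError:
    # ndarray does not define a hash, so a 0-D array cannot be a dict key.
    zero_d_is_hashable = False

# In-place arithmetic on the scalar only rebinds the name; the array is untouched.
scalar += 1
array_after_scalar_add = arr1d.copy()

# In-place arithmetic on the 0-D view writes through to the parent array.
zero_d += 1
array_after_view_add = arr1d.copy()

# A full reduction hands back a scalar rather than a 0-D array.
total = np.add.reduce(arr1d, axis=None)
```

With existing dtypes the `scalar += 1` line leaves `arr1d` alone while the `zero_d += 1` line changes `arr1d[0]`, which is the "could go both ways" point in the quoted message.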