sorry to have fallen off the numpy grid for a bit, but:

On Mon, Mar 23, 2020 at 1:37 PM Sebastian Berg <sebast...@sipsolutions.net> wrote:

> On Mon, 2020-03-23 at 11:45 -0700, Chris Barker wrote:
> > But, backward compatibility aside, could we have ONLY Scalars?
>
> Well, it is hard to write functions that work on N-Dimensions (where N
> can be 0), if the 0-D array does not exist. You can get away with
> scalars in most cases, because they pretend to be arrays in most cases
> (aside from mutability).
>
> But I am pretty sure we have a bunch of cases that need
> `res = np.asarray(res)` simply because `res` is N-D but could then be
> silently converted to a scalar. E.g. see
> https://github.com/numpy/numpy/issues/13105 for an issue about this
> (although it does not actually list any specific problems).
>

I'm not sure this is insolvable (again, backwards compatibility aside) --
after all, one of the key issues is that it's undetermined what the rank
should be of: array(a_scalar) -- 0-d is the only unambiguous answer, but
then it's not really an array in the usual sense anyway. So in theory, we
could disallow that conversion unless a rank is specified.

At the end of the day, there has to be some endpoint on how far you can
reduce the rank of an array and have it work -- why not have 1 be the lower
limit?

-CHB

> - Sebastian
>
> > There is certainly a need for more numpy-like scalars: more than the
> > built in data types, and some handy attributes and methods, like dtype,
> > .itemsize, etc. But could we make an enhanced scalar that had
> > everything we actually need from a zero-d array?
> >
> > The key point would be mutability -- but do we really need mutable
> > scalars? I can't think of any time I've needed that, when I couldn't
> > have used a 1-d array of length 1.
> >
> > Is there a use case for zero-d arrays that could not be met with an
> > enhanced scalar?
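
For reference, a small sketch of the silent conversion Sebastian mentions, as it behaves in current NumPy, next to the 1-d, length-1 alternative raised above. The helper `scale` is a made-up name for illustration only, not something from NumPy or from this thread.

```python
import numpy as np


def scale(x, factor=2.0):
    """Hypothetical helper: multiply any N-D input (N may be 0), return an ndarray."""
    res = np.asarray(x) * factor
    # For 0-D input the multiplication hands back a NumPy scalar,
    # so the result is converted back to keep the "always an array" promise.
    return np.asarray(res)


zero_d = np.array(3.0)
raw = zero_d * 2.0        # NumPy scalar, no longer an ndarray
fixed = scale(zero_d)     # 0-D ndarray again

print(type(raw).__name__, type(fixed).__name__, fixed.ndim)

# Mutability is the practical difference between the two.
fixed[...] = 10.0         # in-place assignment works on the 0-D array
one_d = np.array([3.0])   # the rank-1, length-1 alternative
one_d[0] = 10.0
```

Whether the extra `np.asarray` call is acceptable overhead is exactly the kind of question the rank-1 lower limit would change.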
> >
> > -CHB
> >
> > On Mon, Feb 24, 2020 at 12:30 PM Allan Haldane <allanhald...@gmail.com>
> > wrote:
> >
> > > I have some thoughts on scalars from playing with ndarray ducktypes
> > > (__array_function__), eg a MaskedArray ndarray-ducktype, for which I
> > > wanted an associated "MaskedScalar" type.
> > >
> > > In summary, the way scalars currently work makes ducktyping
> > > (duck-scalars) difficult:
> > >
> > >   * numpy scalar types are not subclassable, so my duck-scalars aren't
> > >     subclasses of numpy scalars and aren't in the type hierarchy
> > >   * even if scalars were subclassable, I would have to subclass each
> > >     scalar datatype individually to make masked versions
> > >   * lots of code checks `isinstance(var, np.float64)` which breaks
> > >     for my duck-scalars
> > >   * it was difficult to distinguish between a duck-scalar and a duck-0d
> > >     array. The method I used in the end seems hacky.
> > >
> > > This has led to some daydreams about how scalars should work, and also
> > > led me last to read through your NEPs 40/41 with specific focus on what
> > > you said about scalars, and was about to post there until I saw this
> > > discussion. I agree with what you said in the NEPs about not making
> > > scalars be dtype instances.
> > >
> > > Here is what ducktypes led me to:
> > >
> > > If we are able to do something like define a `np.numpy_scalar` type
> > > covering all numpy scalars, which has a `.dtype` attribute like you
> > > describe in the NEPs, then that would seem to solve the ducktype
> > > problems above. Ducktype implementors would need to make a "duck-scalar"
> > > type in parallel to their "duck-ndarray" type, but I found that to be
> > > pretty easy using an abstract class in my MaskedArray ducktype, since
> > > the MaskedArray and MaskedScalar share a lot of behavior.
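
A rough sketch of the "duck-scalar in parallel to the duck-ndarray" arrangement described above, with the shared behavior in an abstract base class. This is an assumption about the general shape only: the class names, the `filled` method, and the `_wrap_plain` hook are hypothetical, and this is not Allan's actual implementation nor any NumPy API.

```python
from abc import ABC, abstractmethod

import numpy as np


class MaskedBase(ABC):
    """Hypothetical base class holding what the two duck types share."""

    def __init__(self, data, mask):
        self._data = np.asarray(data)
        self._mask = np.asarray(mask, dtype=bool)

    @property
    def dtype(self):
        # The datatype is exposed as an attribute, not through inheritance.
        return self._data.dtype

    def filled(self, fill_value):
        """Return plain NumPy data with masked entries replaced."""
        return self._wrap_plain(np.where(self._mask, fill_value, self._data))

    @abstractmethod
    def _wrap_plain(self, result):
        """Turn a raw result into what this duck type should hand back."""


class MaskedScalar(MaskedBase):
    # No __getitem__ is defined: the duck-scalar is deliberately not indexable.
    def _wrap_plain(self, result):
        return result[()]  # unwrap the 0-D result into a NumPy scalar


class MaskedArray(MaskedBase):
    def _wrap_plain(self, result):
        return result  # stays an ndarray

    def __getitem__(self, index):
        data, mask = self._data[index], self._mask[index]
        if np.ndim(data) == 0:
            # Full indexing returns the duck-scalar, not a duck-0d array.
            return MaskedScalar(data, mask)
        return MaskedArray(data, mask)


arr = MaskedArray([1.0, 2.0, 3.0], [False, True, False])
elem = arr[1]
print(type(elem).__name__, elem.dtype, elem.filled(0))
print(type(arr[:2]).__name__, arr.filled(-1.0))
```

Code that checks `elem.dtype` keeps working for the duck-scalar, whereas code that checks `isinstance(elem, np.float64)` does not, which is the problem listed in the bullets above.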
> > >
> > > A numpy_scalar type would also help solve some object-array problems if
> > > the object scalars are wrapped in the numpy_scalar type. A long time ago I
> > > started to try to fix up various funny/strange behaviors of object
> > > datatypes, but there are lots of special cases, and the main problem was
> > > that the returned objects (eg from indexing) were not numpy types and
> > > did not support numpy attributes or indexing. Wrapping the returned
> > > object in `np.numpy_scalar` might add an extra slight annoyance to
> > > people who want to unwrap the object, but I think it would make object
> > > arrays less buggy and make code using object arrays easier to reason
> > > about and debug.
> > >
> > > Finally, a few random votes/comments based on the other emails on
> > > the list:
> > >
> > > I think scalars have a place in numpy (rather than just reusing 0d
> > > arrays), since there is a clear use in having hashable, immutable
> > > scalars. Structured scalars should probably be immutable.
> > >
> > > I agree with your suggestion that scalars should not be indexable. Thus,
> > > my duck-scalars (and proposed numpy_scalar) would not be indexable.
> > > However, I think they should encode their datatype through a .dtype
> > > attribute like ndarrays, rather than by inheritance.
> > >
> > > Also, something to think about is that currently numpy scalars satisfy
> > > the property `isinstance(np.float64(1), float)`, i.e. they are within the
> > > python numerical type hierarchy. 0d arrays do not have this property. My
> > > proposal above would break this. I'm not sure what to think about
> > > whether this is a good property to maintain or not.
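
The `isinstance` property mentioned above can be checked directly. A short sketch of current NumPy behavior, using only existing calls; note that it holds for `np.float64` specifically, since that is the scalar type that subclasses the Python `float`.

```python
import numpy as np

f64 = np.float64(1)
f32 = np.float32(1)
zero_d = np.array(1.0)

in_hierarchy_f64 = isinstance(f64, float)      # np.float64 subclasses Python float
in_hierarchy_f32 = isinstance(f32, float)      # other float widths do not
in_hierarchy_0d = isinstance(zero_d, float)    # a 0-D array is an ndarray

print(in_hierarchy_f64, in_hierarchy_f32, in_hierarchy_0d)

# Both kinds of object still carry a dtype and convert on request.
print(f64.dtype == zero_d.dtype, float(zero_d) == float(f64))
```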
> > >
> > > Cheers,
> > > Allan
> > >
> > > On 2/21/20 8:37 PM, Sebastian Berg wrote:
> > > > Hi all,
> > > >
> > > > When we create new datatypes, we have the option to make new choices
> > > > for the new datatypes [0] (not the existing ones).
> > > >
> > > > The question is: Should every NumPy datatype have a scalar associated
> > > > and should operations like indexing return a scalar or a 0-D array?
> > > >
> > > > This is in my opinion a complex, almost philosophical, question, and we
> > > > do not have to settle anything for a long time. But, if we do not
> > > > decide a direction before we have many new datatypes the decision will
> > > > make itself...
> > > > So happy about any ideas, even if it's just a gut feeling :).
> > > >
> > > > There are various points. I would like to mostly ignore the technical
> > > > ones, but I am listing them anyway here:
> > > >
> > > >   * Scalars are faster (although that can be optimized likely)
> > > >
> > > >   * Scalars have a lower memory footprint
> > > >
> > > >   * The current implementation incurs a technical debt in NumPy.
> > > >     (I do not think that is a general issue, though. We could
> > > >     automatically create scalars for each new datatype probably.)
> > > >
> > > > Advantages of having no scalars:
> > > >
> > > >   * No need to keep track of scalars to preserve them in ufuncs, or
> > > >     libraries using `np.asarray`, do they need
> > > >     `np.asarray_or_scalar`?
> > > >     (or decide they return always arrays, although ufuncs may not)
> > > >
> > > >   * Seems simpler in many ways, you always know the output will be an
> > > >     array if it has to do with NumPy.
> > > >
> > > > Advantages of having scalars:
> > > >
> > > >   * Scalars are immutable and we are used to them from Python.
> > > >     A 0-D array cannot be used as a dictionary key consistently [1].
> > > >
> > > >     I.e.
> > > >     without scalars as first-class citizens `dict[arr1d[0]]`
> > > >     cannot work, `dict[arr1d[0].item()]` may (if `.item()` is defined),
> > > >     and e.g. `dict[arr1d[0].frozen()]` could make a copy to work. [2]
> > > >
> > > >   * Object arrays as we have them now make sense, `arr1d[0]` can
> > > >     reasonably return a Python object. I.e. arrays feel more like
> > > >     containers if you can take elements out easily.
> > > >
> > > > Could go both ways:
> > > >
> > > >   * Scalar math `scalar = arr1d[0]; scalar += 1` modifies the array
> > > >     without scalars. With scalars `arr1d[0, ...]` clarifies the
> > > >     meaning. (In principle it is good to never use `arr2d[0]` to
> > > >     get a 1D slice, probably more-so if scalars exist.)
> > > >
> > > > Note: array-scalars (the current NumPy scalars) are not useful in my
> > > > opinion [3]. A scalar should not be indexed or have a shape. I do not
> > > > believe in scalars pretending to be arrays.
> > > >
> > > > I personally tend towards liking scalars. If Python was a language
> > > > where the array (array-programming) concept was ingrained into the
> > > > language itself, I would lean the other way. But users are used to
> > > > scalars, and they "put" scalars into arrays. Array objects are in some
> > > > ways strange in Python, and I feel not having scalars detaches them
> > > > further.
> > > >
> > > > Having scalars, however, also means we should preserve them. I feel in
> > > > principle that is actually fairly straightforward. E.g.
> > > > for ufuncs:
> > > >
> > > >   * np.add(scalar, scalar) -> scalar
> > > >   * np.add.reduce(arr, axis=None) -> scalar
> > > >   * np.add.reduce(arr, axis=1) -> array (even if arr is 1d)
> > > >   * np.add.reduce(scalar, axis=()) -> array
> > > >
> > > > Of course libraries that do `np.asarray` would/could basically choose
> > > > to not preserve scalars: Their signature is defined as taking strictly
> > > > array input.
> > > >
> > > > Cheers,
> > > >
> > > > Sebastian
> > > >
> > > >
> > > > [0] At best this can be a vision to decide which way they may evolve.
> > > >
> > > > [1] E.g. PyTorch uses `hash(tensor) == id(tensor)` which is arguably
> > > > strange. E.g. Quantity defines hash correctly, but does not fully
> > > > ensure immutability for 0-D Quantities. Ensuring immutability in a
> > > > world where "views" are a central concept requires a read-only copy.
> > > >
> > > > [2] Arguably `.item()` would always return a scalar, but it would be a
> > > > second class citizen. (Although if it returns a scalar, at least we
> > > > already have a scalar implementation.)
> > > >
> > > > [3] They are necessary due to technical debt for NumPy datatypes
> > > > though.
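
The dictionary-key and in-place points from the original message can be seen with NumPy as it behaves today. This is only an illustration of existing behavior; the `.frozen()` method and the scalar-preserving ufunc rules in the quoted text are proposals and are not shown.

```python
import numpy as np

arr1d = np.array([1.0, 2.0, 3.0])
lookup = {1.0: "one"}

scalar = arr1d[0]        # integer indexing returns a NumPy scalar
zero_d = arr1d[0, ...]   # a trailing Ellipsis returns a 0-D array (a view)

# Scalars are hashable and compare equal to Python floats, so they work as keys.
found_by_scalar = lookup[scalar]

# ndarray defines no hash, so the 0-D array cannot be used as a key directly.
try:
    lookup[zero_d]
    zero_d_hashable = True
except TypeError:
    zero_d_hashable = False

# .item() converts to a Python float, which is hashable again.
found_by_item = lookup[zero_d.item()]

# In-place math: the scalar name is rebound, the 0-D view writes into the array.
scalar += 1
unchanged_after_scalar = bool(arr1d[0] == 1.0)
zero_d += 1
changed_after_view = bool(arr1d[0] == 2.0)

print(found_by_scalar, zero_d_hashable, found_by_item)
print(unchanged_after_scalar, changed_after_view)
```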

-- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception chris.bar...@noaa.gov
_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@python.org https://mail.python.org/mailman/listinfo/numpy-discussion