Re: [Numpy-discussion] Proposal to accept NEP #44: Restructuring the NumPy Documentation

2020-02-21 Thread Stefan van der Walt
On Wed, Feb 19, 2020, at 03:58, Melissa Mendonça wrote:
> I am proposing the acceptance of NEP 44 - Restructuring the NumPy 
> Documentation. 
> 
> https://numpy.org/neps/nep-0044-restructuring-numpy-docs.html

Thanks, Melissa, for developing this NEP! The plan makes sense to me.

Best regards,
Stéfan
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] Scikit-learn in the news.

2020-02-21 Thread Charles R Harris
Hi All,

Just thought I'd mention a new paper where scikit-learn was used: "A Deep
Learning Approach to Antibiotic Discovery".
Congratulations to the scikit-learn team.

Chuck


[Numpy-discussion] New DTypes: Are scalars a central concept in NumPy or not?

2020-02-21 Thread Sebastian Berg
Hi all,

When we create new datatypes, we have the option to make new choices
for the new datatypes [0] (not the existing ones).

The question is: Should every NumPy datatype have an associated scalar,
and should operations like indexing return a scalar or a 0-D array?
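
For concreteness, a minimal sketch of what current NumPy does for the
existing datatypes (describing behavior, not proposing; outputs in
comments):

    import numpy as np

    arr1d = np.array([1.0, 2.0, 3.0])

    item = arr1d[0]            # integer indexing returns a scalar today
    print(type(item))          # <class 'numpy.float64'>

    view = arr1d[0, ...]       # ellipsis indexing returns a 0-D array (a view)
    print(type(view), view.shape)   # <class 'numpy.ndarray'> ()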

This is, in my opinion, a complex, almost philosophical, question, and
we do not have to settle anything for a long time. But if we do not
decide on a direction before we have many new datatypes, the decision
will make itself...
So I am happy about any ideas, even if it's just a gut feeling :).

There are various points to consider. I would like to mostly set the
technical ones aside, but I am listing them here anyway:

  * Scalars are faster (although that could likely be optimized)

  * Scalars have a lower memory footprint

  * The current implementation incurs technical debt in NumPy.
    (I do not think that is a general issue, though; we could
    probably create scalars automatically for each new datatype.)

Advantages of having no scalars:

  * No need to keep track of scalars in order to preserve them in
    ufuncs or in libraries that use `np.asarray`. Would those need an
    `np.asarray_or_scalar`? (Or decide to always return arrays, even
    though ufuncs may not.)

  * Seems simpler in many ways: you always know the output will be an
    array if it has to do with NumPy.

Advantages of having scalars:

  * Scalars are immutable, and we are used to them from Python.
    A 0-D array cannot be used as a dictionary key consistently [1].

    I.e. without scalars as first-class citizens, `dict[arr1d[0]]`
    cannot work; `dict[arr1d[0].item()]` may (if `.item()` is
    defined), and e.g. `dict[arr1d[0].frozen()]` could make a copy to
    work [2]. (See the sketch after this list.)

  * Object arrays as we have them now make sense: `arr1d[0]` can
    reasonably return a Python object. I.e. arrays feel more like
    containers if you can take elements out easily.
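
To illustrate the dictionary-key point with current NumPy (scalars
are hashable, 0-D arrays are not):

    import numpy as np

    d = {np.float64(1.0): "ok"}   # scalars are immutable and hashable
    print(d[1.0])                 # "ok": hashes consistently with Python floats

    try:
        d[np.array(1.0)] = "fails"
    except TypeError as err:
        print(err)                # unhashable type: 'numpy.ndarray'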

Could go both ways:

  * Scalar math: `scalar = arr1d[0]; scalar += 1` modifies the array
    if there are no scalars. With scalars, `arr1d[0, ...]` clarifies
    the meaning. (In principle it is good to never use `arr2d[0]` to
    get a 1-D slice, probably more so if scalars exist. Current
    behavior is sketched below.)
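
Current behavior, sketched (the scalar copies the element out, while
an explicit 0-D view writes through):

    import numpy as np

    arr1d = np.array([1.0, 2.0])

    scalar = arr1d[0]     # a scalar: a copy of the element
    scalar += 1           # arr1d is untouched
    print(arr1d[0])       # 1.0

    view = arr1d[0, ...]  # a 0-D view into arr1d
    view += 1             # in-place: modifies arr1d
    print(arr1d[0])       # 2.0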

Note: array-scalars (the current NumPy scalars) are not useful in my
opinion [3]. A scalar should not be indexed or have a shape. I do not
believe in scalars pretending to be arrays.

I personally tend towards liking scalars. If Python were a language
where the array (array-programming) concept was ingrained into the
language itself, I would lean the other way. But users are used to
scalars, and they "put" scalars into arrays. Array objects are in some
ways strange in Python, and I feel that not having scalars detaches
them further.

Having scalars, however, also means we should preserve them. I feel
that, in principle, this is fairly straightforward. E.g. for ufuncs:

   * np.add(scalar, scalar) -> scalar
   * np.add.reduce(arr, axis=None) -> scalar
   * np.add.reduce(arr, axis=0) -> array (even if arr is 1-D)
   * np.add.reduce(scalar, axis=()) -> array
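
To make the rule explicit, a hypothetical helper (`reduce_result_kind`
is made up for illustration, not a NumPy API, and the axis handling is
the proposal, not current behavior):

    def reduce_result_kind(axis):
        """What np.add.reduce would return under the proposed rule."""
        if axis is None:
            return "scalar"  # a full reduction collapses to a scalar
        return "array"       # an explicit axis (even ()) keeps an array

    assert reduce_result_kind(None) == "scalar"
    assert reduce_result_kind(0) == "array"
    assert reduce_result_kind(()) == "array"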

Of course, libraries that do `np.asarray` would/could basically
choose not to preserve scalars: their signature is defined as taking
strictly array input.
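
(Today `np.asarray` already does this; it converts scalars to 0-D
arrays:)

    import numpy as np

    out = np.asarray(np.float64(1.0))
    print(type(out), out.shape)   # <class 'numpy.ndarray'> ()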

Cheers,

Sebastian


[0] At best this can be a vision to decide which way they may evolve.

[1] E.g. PyTorch uses `hash(tensor) == id(tensor)`, which is arguably
strange. Quantity, e.g., defines hash correctly, but does not fully
ensure immutability for 0-D Quantities. Ensuring immutability in a
world where "views" are a central concept requires copy-on-write.

[2] Arguably, `.item()` would always return a scalar, but it would be
a second-class citizen. (Although if it returns a scalar, at least we
already have a scalar implementation.)
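
(For the existing datatypes this already works; `.item()` returns a
plain Python scalar:)

    import numpy as np

    arr1d = np.array([1.0])
    print(type(arr1d[0].item()))  # <class 'float'>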

[3] They are necessary due to technical debt in NumPy's datatypes,
though.




Re: [Numpy-discussion] New DTypes: Are scalars a central concept in NumPy or not?

2020-02-21 Thread Juan Nunez-Iglesias
I personally have always found it weird and annoying to deal with 0-D arrays, 
so +1 for scalars!*

Juan

*: admittedly, I have almost no grasp of the underlying NumPy
implementation complexities, but I will happily take Sebastian's word
that scalars can be made consistent with the library.

On Fri, 21 Feb 2020, at 7:37 PM, Sebastian Berg wrote:
> [full quote of Sebastian's message snipped]