[Numpy-discussion] Re: ENH: Efficient operations on already sorted arrays

2024-01-16 Thread Peter Schneider-Kamp via NumPy-Discussion
Dear all,

I second Yagiz's proposal. However, before merging these new functions we
need to ensure consistency in code style (and probably in other respects)
and, particularly importantly, adherence to the conventions for function and
method names.

Cheers,
Peter

From: Yağız Ölmez 
Date: Monday, 15 January 2024 at 14.16
To: numpy-discussion@python.org 
Subject: [Numpy-discussion] ENH: Efficient operations on already sorted arrays
Dear Numpy Community

It has come to my attention that there is no function in Numpy to merge two 
sorted arrays. There was a request for it in 2014, but it did not go anywhere:

https://github.com/numpy/numpy/issues/5000

I have come across this package by Frank Sauerburger, which implements this and 
many other operations on sorted arrays:

https://gitlab.sauerburger.com/frank/sortednp

This package is distributed under the MIT License, so it can be merged into NumPy.
Please let me know what you think!
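
For illustration, here is a minimal sketch of merging two sorted arrays using only existing NumPy primitives. The name `merge_sorted` is hypothetical and is not taken from sortednp or NumPy. It relies on `np.searchsorted`, so it costs O(m log n) rather than the linear-time merge a dedicated implementation could offer:

```python
import numpy as np

def merge_sorted(a, b):
    """Merge two already sorted 1-D arrays (hypothetical helper)."""
    a = np.asarray(a)
    b = np.asarray(b)
    out = np.empty(a.size + b.size, dtype=np.result_type(a, b))
    # Final position of each element of b: the number of elements of a
    # that are <= it, plus the number of elements of b before it.
    idx_b = np.searchsorted(a, b, side="right") + np.arange(b.size)
    mask = np.zeros(out.size, dtype=bool)
    mask[idx_b] = True
    out[mask] = b    # b fills its computed slots, in order
    out[~mask] = a   # a fills the remaining slots, in order
    return out

print(merge_sorted([1, 3, 5], [2, 3, 6]))  # [1 2 3 3 5 6]
```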

Best
Yagiz Olmez
___
NumPy-Discussion mailing list -- numpy-discussion@python.org
To unsubscribe send an email to numpy-discussion-le...@python.org
https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
Member address: arch...@mail-archive.com


[Numpy-discussion] Re: NEP 56: array API standard support in the main numpy namespace

2024-01-16 Thread Stephan Hoyer
On Sun, Jan 7, 2024 at 8:08 AM Ralf Gommers wrote:

> This NEP will supersede the following NEPs:
>
> - :ref:`NEP30` (never implemented)
> - :ref:`NEP31` (never implemented)
> - :ref:`NEP37` (never implemented; the ``__array_module__`` idea is
>   basically the same as ``__array_namespace__``)
> - :ref:`NEP47` (implemented with an experimental label in
>   ``numpy.array_api``, will be removed)
>

Thanks Ralf, Mateusz and Nathan for putting this together.

I just wanted to comment briefly to voice my strong support for this
proposal, and especially for marking these other NEPs as superseded. This
will go a long way towards clarifying NumPy's support for generic array
interfaces.


[Numpy-discussion] Feature request: Extension of the np.argsort Function - Returning Positional Information for Data

2024-01-16 Thread hao chen
When dealing with lists that contain duplicate data, np.argsort fails to
return index values that correspond to the actual sorting positions of the
data, as it does when handling arrays without duplicates.

Dear Author:

When I use the np.argsort function on an array without duplicate data, the
returned index values correspond to the sorting positions of the respective
data.😀

x = [1, 2, 5, 4]
rank = np.argsort(x)
print(rank)
# [0 1 3 2]

However, when there are duplicate values, the results from np.argsort
sometimes do not correspond to the sorting positions of the respective data.

x = [1, 4, 1, 1, 2, 4, 5]
rank = np.argsort(x)
print(rank)
# [0 2 3 4 1 5 6]

Assuming a person frequently uses np.argsort to obtain positions by sorting
data without duplicates, introducing duplicate values in the data may lead
to inconspicuous errors in positions that are difficult to detect.
Moreover, as the dataset grows, identifying such issues may become even
more challenging.

For users in this situation, the desired results might be achieved using
the following function:

import numpy as np

def my_sort(x):
    arg_x = np.sort(x)
    rank = [np.where(arg_x == i)[0][0] for i in x]
    return np.array(rank)

x = [1, 4, 1, 1, 2, 4, 5]
rank_arg = np.argsort(x)
rank_position = my_sort(x)
print("rank_arg",rank_arg)
print("rank_position",rank_position)
# rank_arg [0 2 3 4 1 5 6]
# rank_position [0 4 0 0 3 4 6]

This method produces results consistent with np.argsort when applied to
arrays without duplicate values.

x = [1, 2, 5, 4]
rank_arg = np.argsort(x)
rank_position = my_sort(x)
print("rank_arg",rank_arg)
print("rank_position",rank_position)
# rank_arg [0 1 3 2]
# rank_position [0 1 3 2]

Although there is no issue with the documentation of np.argsort itself, the
need highlighted here may be widespread. Therefore, it might be worth
considering the addition of a function, for example, np.position(data), to
enhance the functionality of numpy.

My device:

system: win10 64x
python version: 3.9.11
numpy version: 1.21.2

Sincerely, Looking forward to your response! 


[Numpy-discussion] Re: Feature request: Extension of the np.argsort Function - Returning Positional Information for Data

2024-01-16 Thread Robert Kern
On Tue, Jan 16, 2024 at 11:05 PM hao chen wrote:

> When dealing with lists that contain duplicate data, np.argsort fails to
> return index values that correspond to the actual sorting positions of the
> data, as it does when handling arrays without duplicates.
>
> Dear Author:
>
> When I use the np.argsort function on an array without duplicate data, the
> returned index values correspond to the sorting positions of the respective
> data.😀
>
> x = [1, 2, 5, 4]
> rank = np.argsort(x)
> print(rank)
> # [0 1 3 2]
>

That is not what `argsort` is intended or documented to do. It returns an
array of indices _into `x`_ such that if you took the values from `x` in
that order, you would get a sorted array. That is, if `x` were sorted into
the array `sorted_x`, then `x[rank[i]] == sorted_x[i]` for all `i in
range(len(x))`. The indices in `rank` are positions in `x`, not positions
in `sorted_x`. They happen to correspond in this case, but that's a
coincidence that's somewhat common in these small examples. But consider
`[20, 30, 10, 40]`:

>>> x = np.array([20, 30, 10, 40])
>>> ix = np.argsort(x)
>>> def position(x):
...     sorted_x = np.array(x)
...     sorted_x.sort()
...     return np.searchsorted(sorted_x, x)
...
>>> ip = position(x)
>>> ix
array([2, 0, 1, 3])
>>> ip
array([1, 2, 0, 3])

But also notice:

>>> np.argsort(np.argsort(x))
array([1, 2, 0, 3])

This double-argsort is what you seem to be looking for, though it depends
on what you want from the handling of duplicates (do you return the first
index into the sorted array with the same value, as in my `position()`
implementation, or the index that the particular item was actually sorted
to?).

Either way, we probably aren't going to add this as its own function. Both
options are straightforward combinations of existing primitives.
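
As a rough sketch of how the two options differ on the duplicate-containing example from the original message (the variable names are illustrative only, and `kind="stable"` is an assumption made here so that ties are resolved in original order):

```python
import numpy as np

x = np.array([1, 4, 1, 1, 2, 4, 5])

# Option 1: first index in the sorted array holding the same value,
# so equal values share a position.
first_pos = np.searchsorted(np.sort(x), x)

# Option 2: the index each item was actually sorted to. A stable sort
# keeps equal values in their original relative order.
order = np.argsort(x, kind="stable")
sorted_pos = np.argsort(order)

print(first_pos)   # [0 4 0 0 3 4 6]
print(sorted_pos)  # [0 4 1 2 3 5 6]
```

The two results agree whenever `x` has no duplicates.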

-- 
Robert Kern