Dear all,
There is a PR that adds a lookup table approach to `unique`, shown below. You
can get up to ~16x speedup for large integer arrays, at the cost of potentially
greater memory usage.
https://github.com/numpy/numpy/pull/21843
This is controlled by a new `kind` parameter, which is describ
On Tue, Jun 28, 2022 at 6:33 PM Miles Cranmer wrote:
> Dear all,
>
> There is a PR that adds a lookup table approach to `unique`, shown below.
> You can get up to ~16x speedup for large integer arrays, at the cost of
> potentially greater memory usage.
>
I've seen multiple requests for not sorti
On Tue, 28 Jun 2022, 6:50 pm Ralf Gommers wrote:
>
>
>> ```
>> kind : {None, 'sort', 'table'}, optional
>>
>
> Regarding the name, `'table'` is an implementation detail. The end user
> should not have to care which data structure is used. I suggest using
> something like "unsorte
Thanks for the comments Ralf!
> You cannot switch the default behavior, that will break backwards
> compatibility.
The default `kind=None` has no effect on the input/output behavior of the
function. The only changes a user will see are in terms of speed and memory
usage. `unique` will select this
Ah, I did not clarify this: `kind="table"` will *also* return a sorted array.
It simply does not use a sorting algorithm to get there. The table is
generated with `np.arange` (i.e., already sorted) and then masked, so the
values that survive the mask come out in ascending order.
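
For readers following along, the masking idea can be sketched roughly as
below. This is only an illustration, not the code in the PR:
`unique_via_table` is a hypothetical name, and the sketch assumes a
non-object integer array whose values fit in int64.

```python
import numpy as np


def unique_via_table(ar):
    """Hypothetical helper: sorted unique values of an integer array,
    found with a boolean lookup table instead of a sort."""
    ar = np.asarray(ar).ravel()
    if not np.issubdtype(ar.dtype, np.integer):
        raise TypeError("this sketch only handles integer arrays")
    if ar.size == 0:
        return ar.copy()

    lo = int(ar.min())
    hi = int(ar.max())

    # One boolean slot per possible value in [lo, hi]. Memory grows with
    # the value range (hi - lo + 1), not with the number of elements.
    seen = np.zeros(hi - lo + 1, dtype=bool)

    # Shift values so the minimum maps to slot 0. Working in int64 avoids
    # overflow for narrow dtypes such as int8 (assumes values fit in int64).
    seen[ar.astype(np.int64) - lo] = True

    # np.arange is already ascending, so masking it yields sorted output
    # without ever calling a sorting routine.
    values = np.arange(lo, hi + 1, dtype=np.int64)[seen]
    return values.astype(ar.dtype)
```

Both passes over the data are linear, which is where the O(n) scaling comes
from. The cost is the table itself: an array holding only the two values 0
and 10**9 would need roughly a gigabyte for `seen`, which is the memory
trade-off mentioned at the top of the thread.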
On Tue, Jun 28, 2022 at 7:21 PM Miles Cranmer wrote:
> Thanks for the comments Ralf!
>
> > You cannot switch the default behavior, that will break backwards
> compatibility.
>
> The default `kind=None` has no effect on the input/output behavior of the
> function. The only changes a user will see are
Regarding 2., did you have a particular approach in mind? This new lookup table
method is already O(n) scaling (similar to a counting sort), so I cannot fathom
a method that, as you suggest, would get significantly better performance for
integer arrays. The sorting here is "free" in some sense s