On Fri, 2022-06-17 at 08:40 -0700, Stephan Hoyer wrote:
> I think this is a great idea! I don't see any downsides here.
>
> As for the method name, I would lean towards calling it "kind" and
> using a default value of None for automatic selection, for consistency
> with np.sort.
:+1:, I agree with both points.

I am also not sure about the choice of "dictionary", since we don't use a dictionary (or even a hashtable)? (Although I may have missed discussion on the name.)

Cheers,

Sebastian

>
> On Thu, Jun 16, 2022 at 6:14 AM Sebastian Berg
> <sebast...@sipsolutions.net> wrote:
>
> > Hi all,
> >
> > there is a PR to add a faster path to `np.isin` that uses a look-up
> > table for all the elements that are included in the haystack
> > (`test_elements`):
> >
> > https://github.com/numpy/numpy/pull/12065/files
> >
> > Such a table means that the memory overhead can be very significant,
> > but so can the speedup, so there was the idea of adding an option to
> > pick which version is used.
> >
> > The current documentation for this new `method` keyword argument
> > would be as given below. So the main questions are:
> >
> > * Is there any concern about adding such a new kwarg?
> > * Is `method` the best name? Sorting uses `kind`, which may also be
> >   good.
> >
> > There is also the smaller question of what heuristic 'auto' would use,
> > but that can be tweaked at any time.
> >
> > ```
> > method : {'auto', 'sort', 'dictionary'}, optional
> >     The algorithm to use. This will not affect the final result,
> >     but will affect the speed. Default is 'auto'.
> >
> >     - If 'sort', will use a mergesort-based approach. This will have
> >       a memory usage of roughly 6 times the sum of the sizes of
> >       `ar1` and `ar2`, not accounting for size of dtypes.
> >     - If 'dictionary', will use a key-dictionary approach similar
> >       to a counting sort. This is only available for boolean and
> >       integer arrays. This will have a memory usage of the
> >       size of `ar1` plus the max-min value of `ar2`. This tends
> >       to be the faster method if the following formula is true:
> >       `log10(len(ar2)) > (log10(max(ar2)-min(ar2)) - 2.27) / 0.927`,
> >       but may use greater memory.
> >     - If 'auto', will automatically choose the method which is
> >       expected to perform the fastest, using the above
> >       formula. For larger sizes or smaller range,
> >       'dictionary' is chosen. For larger range or smaller
> >       sizes, 'sort' is chosen.
> > ```
> >
> > Cheers,
> >
> > Sebastian
> > _______________________________________________
> > NumPy-Discussion mailing list -- numpy-discussion@python.org
> > To unsubscribe send an email to numpy-discussion-le...@python.org
> > https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
> > Member address: sho...@gmail.com
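For illustration, the 'dictionary' approach in the draft docstring amounts to a boolean lookup table indexed by value, with the 'auto' choice driven by the quoted formula. Below is a minimal sketch in plain NumPy, assuming integer or boolean inputs whose values fit in int64. `isin_lookup_table` and `choose_isin_method` are hypothetical names, not the code in the PR, and returning 'sort' for an empty `ar2` is my own assumption, since the draft does not cover that case.

```python
import math

import numpy as np


def isin_lookup_table(ar1, ar2):
    """Hypothetical sketch of a lookup-table membership test (not the PR's code).

    Assumes integer or boolean inputs whose values fit in int64.
    """
    # Work in int64 so the subtractions below cannot overflow small dtypes
    # (bools become 0 and 1).
    ar1 = np.asarray(ar1).astype(np.int64)
    ar2 = np.asarray(ar2).astype(np.int64)
    if ar2.size == 0:
        # Nothing can be a member of an empty haystack.
        return np.zeros(ar1.shape, dtype=bool)

    lo = int(ar2.min())
    hi = int(ar2.max())
    # One boolean slot per value in [lo, hi]: memory grows with max - min.
    table = np.zeros(hi - lo + 1, dtype=bool)
    table[ar2 - lo] = True

    # Only elements of ar1 inside [lo, hi] can possibly be in ar2.
    in_range = (ar1 >= lo) & (ar1 <= hi)
    result = np.zeros(ar1.shape, dtype=bool)
    result[in_range] = table[ar1[in_range] - lo]
    return result


def choose_isin_method(ar2):
    """Hypothetical 'auto' heuristic using the formula from the draft docstring."""
    ar2 = np.asarray(ar2)
    # The lookup-table path is only available for boolean and integer arrays.
    if not (ar2.dtype == bool or np.issubdtype(ar2.dtype, np.integer)):
        return "sort"
    if ar2.size == 0:
        return "sort"  # assumption: the draft does not cover an empty ar2
    value_range = int(ar2.max()) - int(ar2.min())
    if value_range == 0:
        # log10(0) is -inf, so the right-hand side is -inf and the test holds.
        return "dictionary"
    lhs = math.log10(ar2.size)
    rhs = (math.log10(value_range) - 2.27) / 0.927
    return "dictionary" if lhs > rhs else "sort"
```

The table costs one byte per value in `[min(ar2), max(ar2)]`, independent of how many elements `ar2` has. That is why a wide range with few elements favours 'sort', and a dense, narrow range favours the table.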