On Thu, Sep 14, 2023 at 11:34 AM Devulapalli, Raghuveer < raghuveer.devulapa...@intel.com> wrote:
> What processor you are running this on? np.sort uses AVX-512 accelerated > sorting for np.int32, so just wondering if you that is the reason for this > difference. > > Raghuveer We also have radix sort for stable sorting of int8, int16, which should be quite fast. Hmm, I wonder if radix sort could be vectorized? When we dropped Python 2.7, there were some folks who ended up using a uint8 array subtype for storing data. All they needed to add was automatic translation to strings for certain accesses. This gave them a 4x advantage in storage space. > > > > -----Original Message----- > > From: sal...@caltech.edu <sal...@caltech.edu> > > Sent: Wednesday, September 13, 2023 6:14 PM > > To: numpy-discussion@python.org > > Subject: [Numpy-discussion] Curious performance different with np.unique > on > > arrays of characters > > > > Hello - > > > > In the course of some genomics simulations, I seem to have come across a > curious > > (to me at least) performance difference in np.unique that I wanted to > share. (If > > this is not the right forum for this, please let me know!) > > > > With a np.array of characters (U1), np.unique seems to be much faster > when > > doing np.view as int -> np.unique -> np.view as U1 for arrays of decent > size. I > > would not have expected this since np.unique knows what's coming in as > S1 and > > could handle the view-stuff internally. I've played with this a number > of ways (e.g. > > S1 vs U1; int32 vs int64; return_counts = True vs False; 100, 1000, or > 10k > > elements) and seem to notice the same pattern. A short illustration > below with > > U1, int32, return_counts = False, 10 vs 10k. > > > > I wonder if this is actually intended behavior, i.e. the view-stuff is > actually a good > > idea for the user to think about and implement if appropriate for their > usecase (as > > it is for me). > > > > Best regards, > > Shyam > > > > > > import numpy as np > > > > charlist_10 = np.array(list('ASDFGHJKLZ'), dtype='U1') charlist_10k = > > np.array(list('ASDFGHJKLZ' * 1000), dtype='U1') > > > > def unique_basic(x): > > return np.unique(x) > > > > def unique_view(x): > > return np.unique(x.view(np.int32)).view(x.dtype) > > > > In [27]: %timeit unique_basic(charlist_10) > > 2.17 µs ± 40.7 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops > each) > > > > In [28]: %timeit unique_view(charlist_10) > > 2.53 µs ± 38.4 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops > each) > > > > In [29]: %timeit unique_basic(charlist_10k) > > 204 µs ± 4.61 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each) > > > > In [30]: %timeit unique_view(charlist_10k) > > 66.7 µs ± 2.91 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops > each) > > > > In [31]: np.__version__ > > Out[31]: '1.25.2' > > > > > > > > -- > > Shyam Saladi > > https://shyam.saladi.org > > _______________________________________________ > > NumPy-Discussion mailing list -- numpy-discussion@python.org To > unsubscribe > > send an email to numpy-discussion-le...@python.org > > https://mail.python.org/mailman3/lists/numpy-discussion.python.org/ > > Member address: raghuveer.devulapa...@intel.com > _______________________________________________ > NumPy-Discussion mailing list -- numpy-discussion@python.org > To unsubscribe send an email to numpy-discussion-le...@python.org > https://mail.python.org/mailman3/lists/numpy-discussion.python.org/ > Member address: charlesr.har...@gmail.com >
_______________________________________________ NumPy-Discussion mailing list -- numpy-discussion@python.org To unsubscribe send an email to numpy-discussion-le...@python.org https://mail.python.org/mailman3/lists/numpy-discussion.python.org/ Member address: arch...@mail-archive.com