[issue45902] Bytes and bytesarrays can be sorted with a much faster count sort.

Mark Dickinson Fri, 26 Nov 2021 07:50:36 -0800

Mark Dickinson <dicki...@gmail.com> added the comment:

> If there are enough use cases for it.


Well, that's the question. :-) *I* can't think of any good use cases, but maybe 
others can. If we can't come up with any, though, this feels like a 
solution looking for a problem, and that makes it hard to justify both the 
short-term implementation effort and the longer-term maintenance cost of the 
added complexity.

FWIW, given a need to efficiently compute frequency tables for reasonably long 
byte data, I'd probably reach first for NumPy (and numpy.bincount in 
particular):

Python 3.10.0 (default, Nov 12 2021, 12:32:57) [Clang 12.0.5 (clang-1205.0.22.11)]
Type 'copyright', 'credits' or 'license' for more information
IPython 7.28.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]: import collections, numpy as np

In [2]: t = b'MDIAIHHPWIRRPFFPFHSPSRLFDQFFGEHLLESDLFSTATSLSPFYLRPPSFLRAPSWIDTGLSEMRLEKDRFSVNLDVKHFSPEELKVKVLGDVIEVHGKHEERQDEHGFISREFHRKYRIPADVDPLAITSSLSSDGVLTVNGPRKQVSGPERTIPITREEKPAVAAAPKK';  t *= 100

In [3]: %timeit np.bincount(np.frombuffer(t, np.uint8))
32.7 µs ± 3.15 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [4]: %timeit collections.Counter(t)
702 µs ± 25.8 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [5]: %timeit sorted(t)
896 µs ± 64.2 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
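For reference, the count sort this issue proposes can be sketched in pure Python: tally how often each of the 256 possible byte values occurs, then emit each value that many times in ascending order. The second function gets the same frequency table from numpy.bincount, as in the session above, and expands it back into sorted bytes with numpy.repeat. Both function names are hypothetical and chosen for illustration only; neither is an existing CPython or NumPy API, and neither has been timed here.

```python
import numpy as np


def count_sort_bytes(data: bytes) -> bytes:
    """Hypothetical helper: sort a bytes-like object with a counting sort."""
    counts = [0] * 256              # one slot per possible byte value
    for b in data:                  # iterating over bytes yields ints 0..255
        counts[b] += 1
    # Emit each byte value as many times as it occurred, smallest first.
    return b"".join(bytes([value]) * n for value, n in enumerate(counts) if n)


def count_sort_bytes_numpy(data: bytes) -> bytes:
    """Hypothetical helper: same idea, frequency table from numpy.bincount."""
    # minlength=256 guarantees one count per byte value, even if unused.
    counts = np.bincount(np.frombuffer(data, np.uint8), minlength=256)
    # Repeat each byte value 0..255 by its count, then convert back to bytes.
    return np.repeat(np.arange(256, dtype=np.uint8), counts).tobytes()
```

Both should give the same result as bytes(sorted(data)); note that sorted() on a bytes object returns a list of ints, not a bytes object.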

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue45902>
_______________________________________
_______________________________________________
Python-bugs-list mailing list