New submission from Michał Górny:
The setup.py file for Python states:
    if (not cross_compiling and
            os.uname().machine == "x86_64" and
            sys.maxsize > 2**32):
        # Every x86_64 machine has at least SSE2. Check for sys.maxsize
        # in case that kernel is 64-bit but userspace is 32-bit.
        blake2_macros.append(('BLAKE2_USE_SSE', '1'))
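To check whether a given machine would take this branch, the condition can be mirrored outside of setup.py. The following is only a rough sketch: would_force_sse2 is a hypothetical helper name, not something in the CPython tree, and platform.machine() is used as a stand-in for os.uname().machine (an assumption made so the snippet also runs where os.uname() is unavailable).

```python
import platform
import sys


def would_force_sse2(machine, maxsize, cross_compiling=False):
    """Mirror the setup.py condition quoted above (hypothetical helper)."""
    return (not cross_compiling
            and machine == "x86_64"
            and maxsize > 2**32)


if __name__ == "__main__":
    # platform.machine() stands in for os.uname().machine here (assumption);
    # on Unix-like systems the two generally report the same string.
    print(would_force_sse2(platform.machine(), sys.maxsize))
```

A 64-bit kernel with a 32-bit userspace reports "x86_64" but has sys.maxsize == 2**31 - 1, so the helper returns False for it, matching the comment in the quoted code.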
While the assertion about having SSE2 is true, that doesn't mean SSE2 is
worthwhile to use. I've tested the pure SSE2 variant (i.e. without SSSE3 and so
on) on three different machines, getting the following results:
Athlon64 X2 (SSE2 is the best supported variant), 540 MiB of data:
SSE2: [5.189988004000043, 5.070812243997352]
ref: [2.0161159170020255, 2.0475422790041193]
Core i3, same data file:
SSE2: [1.924425926999902, 1.92461746999993, 1.9298037500000191]
ref: [1.7940209749999667, 1.7900855569999976, 1.7835538760000418]
Xeon E5630 server, 230 MiB data file:
SSE2: [0.7671358410007088, 0.7797677099879365, 0.7648976119962754]
ref: [0.5784736709902063, 0.5717909929953748, 0.5717219939979259]
So in all the tested cases, the pure SSE2 implementation is *slower* than the
reference implementation. SSSE3 and other variants are faster and AFAIU they
are enabled automatically based on CFLAGS, so it doesn't matter for most
systems.
However, for old CPUs that do not support SSSE3, the choice of SSE2 makes the
algorithm prohibitively slow -- on the Athlon64 X2 above it's about 2.5 times
slower than the reference implementation!
----------
components: Extension Modules
messages: 304696
nosy: mgorny
priority: normal
severity: normal
status: open
title: BLAKE2: the (pure) SSE2 impl forced on x86_64 is slower than reference
type: performance
versions: Python 3.6
_______________________________________
Python tracker
<https://bugs.python.org/issue31834>
_______________________________________