You don't mention the SSE4.1 version in your benchmarks.
Have you tried that use case?
Thanks!
On 04/07/2022 at 19:23, Martijn van Beurden wrote:
On Mon 4 Jul 2022 at 15:06, olivier tristan <o.tris...@uvi.net> wrote:
While I can understand the rationale for removing the hand-written
assembly, since 32-bit x86 is dead, removing all optimizations,
including the intrinsics-based ones, seems a much bigger deal.
Yes, it does seem a big deal to remove all optimizations, but it
really isn't. See the pull request associated with that change for
more information: https://github.com/xiph/flac/pull/347 I did quite a
bit of testing before merging this change, on two different CPUs, each
with 3 different compilers, each with 4 variants of the
non-intrinsics-accelerated functions. It turns out that there is no
performance loss at all, and in many cases this change actually makes
flac faster, not slower as one would expect.
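As a rough illustration of why dropping intrinsics need not cost speed: a simple loop with no cross-iteration dependency, such as the residual of an order-2 fixed predictor (e[i] = x[i] - 2x[i-1] + x[i-2]), is the kind of code GCC and Clang can usually auto-vectorize by themselves, and on x86-64 SSE2 is always available to them. This is a minimal sketch, not the libFLAC implementation; the function name `fixed2_residual` is hypothetical, and it assumes samples fit in 24 bits so the 32-bit arithmetic cannot overflow.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not libFLAC's API): residual of the order-2 fixed
   predictor, e[i] = x[i] - 2*x[i-1] + x[i-2], for i = 2 .. n-1.
   `out` must have room for n - 2 values. Assumes samples fit in 24 bits,
   so the int32_t arithmetic below cannot overflow. `restrict` tells the
   compiler that x and out do not alias, which helps auto-vectorization. */
void fixed2_residual(const int32_t *restrict x, size_t n,
                     int32_t *restrict out)
{
    /* Each iteration is independent, so the compiler is free to process
       several samples per SIMD instruction without any intrinsics. */
    for (size_t i = 2; i < n; i++)
        out[i - 2] = x[i] - 2 * x[i - 1] + x[i - 2];
}
```

Compiling with `gcc -O3 -fopt-info-vec` (or `clang -O3 -Rpass=loop-vectorize`) reports whether the loop was vectorized; whether the result matches a hand-written SSE version still has to be measured, as the benchmarks in the pull request did.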
Maybe there should be an opt-in, if you don't want it included by
default, since some people, including me, don't want to see those
optimizations removed?
There would be no advantage of that over keeping the original code: it
still needs to be maintained and tested, even if it is hidden behind
some configuration option. The only case where this patch could be
problematic in terms of speed is when one compiles flac to be used on
CPUs that do not support SSE2.
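On the SSE4.1 question: one way to keep a path for newer instruction sets without maintaining hand-written intrinsics is to compile the same plain-C loop twice and pick one copy at runtime. This is a sketch under stated assumptions: it relies on GCC/Clang extensions on x86 (`__attribute__((target("sse4.1")))`, `__builtin_cpu_init`, `__builtin_cpu_supports`), the `abs_sum` kernel and its signature are hypothetical rather than libFLAC code, and whether the SSE4.1 copy is actually faster than the SSE2 baseline depends on the compiler and has to be benchmarked.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical kernel signature, for illustration only. */
typedef uint64_t (*abs_sum_fn)(const int32_t *x, size_t n);

/* Portable plain-C version: sum of |x[i]| over a residual block. */
static uint64_t abs_sum_generic(const int32_t *x, size_t n)
{
    uint64_t sum = 0;
    for (size_t i = 0; i < n; i++) {
        int64_t v = x[i]; /* widen first so |INT32_MIN| does not overflow */
        sum += (uint64_t)(v < 0 ? -v : v);
    }
    return sum;
}

#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
#define HAVE_X86_DISPATCH 1
/* Same C source; the target attribute only allows the compiler to emit
   SSE4.1 instructions for this copy. No intrinsics are written by hand. */
__attribute__((target("sse4.1")))
static uint64_t abs_sum_sse41(const int32_t *x, size_t n)
{
    uint64_t sum = 0;
    for (size_t i = 0; i < n; i++) {
        int64_t v = x[i];
        sum += (uint64_t)(v < 0 ? -v : v);
    }
    return sum;
}
#endif

/* Choose an implementation once, based on what the running CPU supports;
   any other CPU or compiler falls back to the generic version. */
static abs_sum_fn select_abs_sum(void)
{
#ifdef HAVE_X86_DISPATCH
    __builtin_cpu_init();
    if (__builtin_cpu_supports("sse4.1"))
        return abs_sum_sse41;
#endif
    return abs_sum_generic;
}
```

Both copies share one source, so only a single implementation has to be maintained and tested, which is the concern raised above; the cost is a second compiled copy plus a function-pointer call.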
--
Olivier Tristan
Research & Development
www.uvi.net
_______________________________________________
flac-dev mailing list
flac-dev@xiph.org
http://lists.xiph.org/mailman/listinfo/flac-dev