On 5/26/23 08:42, Stefan Kanthak wrote:
> I could have added PROPERLY, because that's where it CLEARLY fails, as
> shown by the generated unoptimised code.
From what I've seen so far, I find your arguments unconvincing.
In this thread alone, you've shown that you don't know how to properly
control gcc via its command-line flags, and that you don't know how to
properly generate assembly code for your own C example ("properly" here
meaning that it exhibits the behavior the ISO C standard requires). That
makes it hard for me to accept your claims at face value. (Your C example
is also logically incorrect, but that's not important to this discussion.)
That said, assuming that your "optimized assembly" examples (with the
exception of the first) are correct, all you've done is show that your
versions are slightly smaller in both instruction count and size, and
declare your examples "proper". The optimization flag -O3 (like most of
the -On flags) optimizes for speed over all else, and it has been proven
that the faster code isn't necessarily the code with fewer instructions
or the smallest size (see the RISC vs. CISC debate).
To accept that your suggestions are the proper way to generate code
using SSE4.1 instructions at -O3, I insist on data that clearly
demonstrates that your suggestions are at least as performant as what
GCC currently does.
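
As an illustration of the kind of data being asked for, a minimal timing
harness might look like the sketch below. Everything in it is an assumption
made for illustration: kernel_baseline and kernel_candidate are hypothetical
placeholders (simple summation loops), not the code from this thread, and
time_kernel is a made-up helper name. In practice the two placeholders would
be replaced by the sequence GCC emits at -O3 and by the proposed alternative.
The sketch uses ISO C clock(), which reports processor time at a coarse
resolution, so real measurements would need large inputs and many repeats.

```c
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Signature shared by the two code sequences under comparison (assumed). */
typedef uint32_t (*kernel_fn)(const uint32_t *data, size_t n);

/* Hypothetical placeholder for the code GCC currently generates. */
uint32_t kernel_baseline(const uint32_t *data, size_t n)
{
    uint32_t acc = 0;
    for (size_t i = 0; i < n; i++)
        acc += data[i];
    return acc;
}

/* Hypothetical placeholder for the proposed alternative sequence. */
uint32_t kernel_candidate(const uint32_t *data, size_t n)
{
    uint32_t acc = 0;
    for (size_t i = n; i-- > 0; )
        acc += data[i];
    return acc;
}

/*
 * Runs fn `repeats` times and returns the smallest observed processor
 * time in seconds, or -1.0 if repeats <= 0. The last return value of fn
 * is stored in *result_out (if non-NULL) so the two kernels can be
 * checked for identical results before their timings are compared.
 */
double time_kernel(kernel_fn fn, const uint32_t *data, size_t n,
                   int repeats, uint32_t *result_out)
{
    volatile uint32_t sink = 0;  /* keeps the call from being optimized out */
    double best = -1.0;

    for (int r = 0; r < repeats; r++) {
        clock_t start = clock();
        sink = fn(data, n);
        clock_t stop = clock();
        double elapsed = (double)(stop - start) / CLOCKS_PER_SEC;
        if (best < 0.0 || elapsed < best)
            best = elapsed;
    }
    if (result_out)
        *result_out = sink;
    return best;
}
```

Calling time_kernel once per kernel on the same buffer, confirming that both
results match, and only then comparing the two minimum times gives a first
data point. Taking the minimum over several runs is one common way to reduce
noise from the scheduler and caches; it is a design choice of this sketch,
not the only valid one.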