On 5/26/23 08:42, Stefan Kanthak wrote:

> I could have added PROPERLY, because that's where it CLEARLY fails, as
> shown by the generated unoptimised code.

From what I've seen so far, I find your arguments unconvincing.

In this thread alone, you've proven that you don't know how to properly control gcc via its command-line flags, and that you don't know how to properly generate assembly code for your own C example ("properly" in this case meaning that it exhibits the behavior the ISO C standard requires). That makes it hard for me to accept your claims at face value. (Your C example is also logically incorrect, but that's not important to this discussion.)

That said, assuming that your "optimized assembly" examples (with the exception of the first) are correct, all you've done is show that your versions are slightly smaller in both instruction count and size, and then declare your examples "proper". The optimization flag -O3 (like most of the -On flags) optimizes for speed over all else, and it has been shown that the faster code isn't necessarily the code with the fewest instructions or the smallest size (see the RISC vs. CISC debate).

To accept that your suggestions are the proper way to generate code using SSE4.1 instructions at -O3, I insist on data that clearly demonstrates that your suggestions are at least as performant as what GCC currently generates.
