https://gcc.gnu.org/g:0caa152ba34d2cf53a6555455fa10d6130fd7dc5

commit r17-917-g0caa152ba34d2cf53a6555455fa10d6130fd7dc5
Author: Roger Sayle <[email protected]>
Date:   Thu May 28 20:54:17 2026 +0100

    x86_64 SSE: Tweak/correct STV cost of 128-bit rotate by constant.
    
    This one-line change resolves the failure of gcc.target/i386/rotate-2.c
    when compiled with -march=cascadelake, a failure triggered by recent STV
    improvements.
    https://gcc.gnu.org/pipermail/gcc-patches/2026-May/716996.html
    
    The decision of whether to perform STV is finely balanced, and affected
    by the microarchitecture's timings/costs, but in this case the underlying
    issue appears to be the parameterized cost for performing a 128-bit
    rotation by a constant in SSE registers.  Depending upon the number
    of bits to rotate by, SSE requires either 1 or 2 shuffles, followed
    by a left shift, a right shift and an any_or_plus to combine the result.
    This is therefore 4 or 5 instructions, but compute_convert_gain currently
    uses COSTS_N_INSNS(1) instead of COSTS_N_INSNS(4) for the 4-instruction
    case [probably a typo].
    
    As an aside, it might be more useful for this gain to be based on latency:
    the shuffles can be performed in parallel with each other, as can the
    shifts, so a reasonable vcost may be COSTS_N_INSNS(3), but such fine
    tuning might require microbenchmarking.  I mention it here just in case
    using COSTS_N_INSNS(4) is bisected as a performance regression.
    
    2026-05-28  Roger Sayle  <[email protected]>
    
    gcc/ChangeLog
            * config/i386/i386-features.cc (compute_convert_gain): Tweak
            the cost of a 128-bit rotation to be 4 (or 5) instructions.

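Illustration (not part of the commit): the "1 or 2 shuffles, a left shift,
a right shift and an or" sequence described above can be modelled portably
with four 32-bit lanes.  This is only a sketch under assumptions: the names
V128, rotate_lanes and rotl128 are hypothetical, the lane-wise decomposition
is one plausible reading of the commit message rather than GCC's actual
expansion, and rotation counts that are multiples of 32 are deliberately
left out.

```cpp
#include <cstdint>

// Four 32-bit lanes, lane[0] least significant: a portable stand-in for an
// SSE register holding one 128-bit integer (hypothetical model type).
struct V128 { std::uint32_t lane[4]; };

// Models one shuffle: rotate whole lanes upward by k positions,
// i.e. a 128-bit rotate left by 32*k bits.
constexpr V128 rotate_lanes(V128 x, int k) {
  return V128{{x.lane[(0 + 4 - k) & 3], x.lane[(1 + 4 - k) & 3],
               x.lane[(2 + 4 - k) & 3], x.lane[(3 + 4 - k) & 3]}};
}

// Models a per-lane left shift (0 < s < 32).
constexpr V128 shl_lanes(V128 x, int s) {
  return V128{{static_cast<std::uint32_t>(x.lane[0] << s),
               static_cast<std::uint32_t>(x.lane[1] << s),
               static_cast<std::uint32_t>(x.lane[2] << s),
               static_cast<std::uint32_t>(x.lane[3] << s)}};
}

// Models a per-lane logical right shift (0 < s < 32).
constexpr V128 shr_lanes(V128 x, int s) {
  return V128{{x.lane[0] >> s, x.lane[1] >> s, x.lane[2] >> s, x.lane[3] >> s}};
}

// Models the final combine (any_or_plus in the commit message).
constexpr V128 or_lanes(V128 a, V128 b) {
  return V128{{a.lane[0] | b.lane[0], a.lane[1] | b.lane[1],
               a.lane[2] | b.lane[2], a.lane[3] | b.lane[3]}};
}

struct Rotated { V128 value; int insns; };

// 128-bit rotate left by n, for 0 < n < 128 and n not a multiple of 32.
// Returns the result together with the modelled instruction count.
constexpr Rotated rotl128(V128 x, int n) {
  const int q = n / 32;            // whole lanes to rotate by
  const int s = n % 32;            // remaining bit count inside a lane
  const int k_hi = q;              // source of the bits shifted left
  const int k_lo = (q + 1) & 3;    // source of the bits shifted right
  // A lane rotation by 0 is the input itself and needs no shuffle.
  const int shuffles = (k_hi != 0) + (k_lo != 0);
  const V128 hi = shl_lanes(rotate_lanes(x, k_hi), s);
  const V128 lo = shr_lanes(rotate_lanes(x, k_lo), 32 - s);
  return Rotated{or_lanes(hi, lo), shuffles + 3};  // + shl, shr, or
}

// Compile-time checks on the value 1 (only bit 0 set).
static_assert(rotl128(V128{{1u, 0u, 0u, 0u}}, 4).insns == 4, "one shuffle");
static_assert(rotl128(V128{{1u, 0u, 0u, 0u}}, 40).insns == 5, "two shuffles");
static_assert(rotl128(V128{{1u, 0u, 0u, 0u}}, 40).value.lane[1] == 0x100u,
              "1 rotated left by 40 sets bit 40");
```

In this model, two shuffles are needed exactly when the count lies strictly
between 32 and 96 (and is not 64), which is consistent with the
`op1val > 32 && op1val < 96` test in the hunk below; counts below 32 or
above 96 need only one shuffle and therefore four instructions.
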
Diff:
---
 gcc/config/i386/i386-features.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/i386/i386-features.cc b/gcc/config/i386/i386-features.cc
index 4f3f50a65248..0694811e9da9 100644
--- a/gcc/config/i386/i386-features.cc
+++ b/gcc/config/i386/i386-features.cc
@@ -1867,7 +1867,7 @@ timode_scalar_chain::compute_convert_gain ()
              else if (op1val > 32 && op1val < 96)
                vcost = COSTS_N_INSNS (5);
              else
-               vcost = COSTS_N_INSNS (1);
+               vcost = COSTS_N_INSNS (4);
            }
          igain = scost - vcost;
          break;

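Illustration (not part of the commit): a minimal sketch of how the changed
constant feeds into `igain = scost - vcost`.  GCC's rtl.h defines
COSTS_N_INSNS (N) as ((N) * 4), which is mirrored below as a function.
Everything else is an assumption made for the example: visible_vcost is a
hypothetical helper covering only the two branches visible in the hunk
(the earlier branches of the if/else chain are not shown in the diff and
are not modelled), the rotation count 5 is assumed to reach the final
else, and the scalar cost of three instructions is invented.

```cpp
// Mirrors GCC's COSTS_N_INSNS (N) macro, which expands to ((N) * 4).
constexpr int costs_n_insns(int n) { return n * 4; }

// Hypothetical helper: only the two branches visible in the hunk.
// `patched` selects between the old constant (1) and the new one (4).
constexpr int visible_vcost(int op1val, bool patched) {
  return (op1val > 32 && op1val < 96) ? costs_n_insns(5)
                                      : costs_n_insns(patched ? 4 : 1);
}

// The gain computation from the context lines of the hunk.
constexpr int igain(int scost, int vcost) { return scost - vcost; }

// Assumed scalar cost, purely for illustration.
constexpr int example_scost = costs_n_insns(3);

// Before the patch the vector form looks cheaper than the scalar one;
// after the patch it no longer does.
static_assert(igain(example_scost, visible_vcost(5, false)) == 8, "old gain");
static_assert(igain(example_scost, visible_vcost(5, true)) == -4, "new gain");
```

With these assumed numbers the sign of the gain flips once the vector cost
reflects four instructions, which is the kind of finely balanced decision
the commit message describes.  The real scost depends on the selected
-march/-mtune cost tables.
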