https://gcc.gnu.org/g:0caa152ba34d2cf53a6555455fa10d6130fd7dc5
commit r17-917-g0caa152ba34d2cf53a6555455fa10d6130fd7dc5
Author: Roger Sayle <[email protected]>
Date:   Thu May 28 20:54:17 2026 +0100

    x86_64 SSE: Tweak/correct STV cost of 128-bit rotate by constant.

    This one line change resolves the failure of gcc.target/i386/rotate-2.c
    when compiled with -march=cascadelake triggered by recent STV
    improvements.
    https://gcc.gnu.org/pipermail/gcc-patches/2026-May/716996.html

    The decision of whether to perform STV is finely balanced, and affected
    by the microarchitecture's timings/costs, but in this case the
    underlying issue appears to be the parameterized cost for performing a
    128-bit rotation by a constant in SSE registers.  Depending upon the
    number of bits to rotate by, SSE requires either 1 or 2 shuffles,
    followed by a left shift, a right shift and an any_or_plus to combine
    the result.  This is therefore 4 or 5 instructions, but currently
    returns COSTS_N_INSNS(1) instead of COSTS_N_INSNS(4) [probably a typo].

    As an aside, it might be more useful for this gain to be based on
    latency: both the shuffles and the shifts can each be performed in
    parallel, so a reasonable vcost may therefore be COSTS_N_INSNS(3), but
    such fine tuning might require microbenchmarking.  I mention it here
    just in case using COSTS_N_INSNS(4) is bisected as a performance
    regression.

    2026-05-28  Roger Sayle  <[email protected]>

    gcc/ChangeLog
            * config/i386/i386-features.cc (compute_convert_gain): Tweak the
            cost of a 128-bit rotation to be 4 (or 5) instructions.

Diff:
---
 gcc/config/i386/i386-features.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/i386/i386-features.cc b/gcc/config/i386/i386-features.cc
index 4f3f50a65248..0694811e9da9 100644
--- a/gcc/config/i386/i386-features.cc
+++ b/gcc/config/i386/i386-features.cc
@@ -1867,7 +1867,7 @@ timode_scalar_chain::compute_convert_gain ()
 	      else if (op1val > 32 && op1val < 96)
 		vcost = COSTS_N_INSNS (5);
 	      else
-		vcost = COSTS_N_INSNS (1);
+		vcost = COSTS_N_INSNS (4);
 	    }
 	  igain = scost - vcost;
 	  break;
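
The "1 or 2 shuffles, a left shift, a right shift and an or" count from the
commit message can be illustrated with a small C model. This is only a
sketch under stated assumptions, not GCC's actual expansion: it treats the
128-bit value as four 32-bit lanes, and the names rotl128_ref,
rotl128_lanes, model_insns and lanes_match_ref are hypothetical helpers
introduced here. Rotation counts that are a multiple of 32 are not
modelled.

```c
#include <stdint.h>

/* Reference semantics: rotate a 128-bit value left by n bits (0 < n < 128).
   unsigned __int128 is a GCC/Clang extension on 64-bit targets.  */
static unsigned __int128
rotl128_ref (unsigned __int128 x, unsigned n)
{
  return (x << n) | (x >> (128 - n));
}

/* Hypothetical lane model: v[0] is the least significant 32-bit lane.
   Only n that is not a multiple of 32 is modelled.  */
static void
rotl128_lanes (const uint32_t v[4], unsigned n, uint32_t out[4])
{
  unsigned k = n / 32;	/* whole lanes to move, done by shuffles */
  unsigned r = n % 32;	/* residual bit count, 1..31 */
  uint32_t hi[4], lo[4];

  for (unsigned i = 0; i < 4; i++)
    {
      hi[i] = v[(i + 4 - k) & 3];	/* lanes moved up by k (no-op if k == 0) */
      lo[i] = v[(i + 3 - k) & 3];	/* lanes moved up by k + 1 (no-op if k == 3) */
    }
  for (unsigned i = 0; i < 4; i++)
    out[i] = (hi[i] << r) | (lo[i] >> (32 - r));	/* shift, shift, or */
}

/* Instruction count of this model: shuffles + 2 shifts + 1 or.  */
static unsigned
model_insns (unsigned n)
{
  unsigned k = n / 32;
  unsigned shuffles = (k != 0) + (k != 3);
  return shuffles + 3;
}

/* Return 1 if the lane model agrees with the reference rotate.  */
static int
lanes_match_ref (unsigned __int128 x, unsigned n)
{
  uint32_t v[4], out[4];
  unsigned __int128 got = 0;

  for (unsigned i = 0; i < 4; i++)
    v[i] = (uint32_t) (x >> (32 * i));
  rotl128_lanes (v, n, out);
  for (unsigned i = 0; i < 4; i++)
    got |= (unsigned __int128) out[i] << (32 * i);
  return got == rotl128_ref (x, n);
}
```

In this model, model_insns returns 4 for counts below 32 or above 96 (one
of the two shuffles is the identity) and 5 for counts strictly between 32
and 96, which lines up with the COSTS_N_INSNS (4) and COSTS_N_INSNS (5)
values in the patched branch.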
