https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115749
--- Comment #3 from kim.walisch at gmail dot com ---
(In reply to Andrew Pinski from comment #2)
> This seems like a tuning issue. In that gcc thinks the shifts and stuff is
> faster than mulx.
>
> What happens if you do -march=native?
>
> Does it use mulx?

I tried g++-14 with both -march=native and -march=x86-64-v4 (on a 12th Gen
Intel Core i5-12600K, which supports BMI2 and AVX2), but GCC always produces
the same 11-instruction assembly sequence, without mulx, for the integer
modulo by a constant.
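
For readers without the original test case at hand, here is a minimal sketch of the kind of code under discussion. The divisor 10 and the function names are assumptions chosen for illustration, not the constant or code from this report. `mod10_builtin` leaves the lowering to the compiler, while `mod10_mulhi` spells out the multiply-high form, where the high half of a 64x64-bit product is what a `mul` or `mulx` instruction would provide.

```cpp
#include <cassert>
#include <cstdint>

// Plain modulo by a constant: the compiler chooses the instruction sequence.
// (Hypothetical example; the divisor 10 is an assumption for illustration.)
constexpr std::uint64_t mod10_builtin(std::uint64_t n) {
    return n % 10;
}

// High 64 bits of the 128-bit product a * b. On x86-64 this maps naturally
// to a single widening multiply (mul, or mulx when BMI2 is enabled).
// unsigned __int128 is a GCC/Clang extension.
constexpr std::uint64_t mulhi_u64(std::uint64_t a, std::uint64_t b) {
    return static_cast<std::uint64_t>(
        (static_cast<unsigned __int128>(a) * b) >> 64);
}

// n % 10 written out as multiply-high, shift, multiply, subtract.
// 0xCCCCCCCCCCCCCCCD is ceil(2^67 / 10), so (mulhi >> 3) equals n / 10
// for every 64-bit n.
constexpr std::uint64_t mod10_mulhi(std::uint64_t n) {
    const std::uint64_t q = mulhi_u64(n, 0xCCCCCCCCCCCCCCCDull) >> 3;
    return n - q * 10;
}

// Helper for spot checks: do both forms agree for this input?
constexpr bool agrees_with_builtin(std::uint64_t n) {
    return mod10_mulhi(n) == mod10_builtin(n);
}

// Compile-time sanity checks on a few values, including the extremes.
static_assert(agrees_with_builtin(0), "0");
static_assert(agrees_with_builtin(9), "9");
static_assert(agrees_with_builtin(12345), "12345");
static_assert(agrees_with_builtin(UINT64_MAX), "UINT64_MAX");
```

Compiling such a file with something like `g++ -O2 -march=native -S` and comparing the assembly emitted for the two functions is one way to see which sequence GCC selects. The exact output depends on the GCC version and tuning options.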