https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70046
--- Comment #2 from amker at gcc dot gnu.org ---
I checked "-Ofast -march=native/-march=haswell" for GCC@230647 on Xeon, and there is no regression. I also checked "-Ofast -mtune=core-avx2" for GCC@230689, and there is no regression either. I looked into the generated assembly, and the hot loops (the innermost two) are actually improved by the change with these options. So do I need some other options? Or is it a u-arch hazard that doesn't exist on Xeon? Thanks.