https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110202
--- Comment #8 from Hongtao.liu <crazylht at gmail dot com> ---
(In reply to Alexander Monakov from comment #7)
> Note that vpxor serves as a dependency-breaking instruction (see PR 110438).
> So in negate1 we do the right thing for the wrong reasons, and in negate2 we
> can cause a substantial stall if the previous computation of xmm0 has a
> non-trivial dependency chain.

For this one, we can load *a into %zmm0 to avoid false_dependence.

        vmovdqau ZMMWORD PTR [rdi], zmm0
        vpternlogq zmm0, zmm0, zmm0, 85
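
For readers unfamiliar with the immediate, here is a minimal scalar sketch of why 85 (0x55) turns vpternlogq into a bitwise NOT once the value is already in the register. It models a single 64-bit lane in portable C, following the truth-table encoding in which the three input bits form an index into imm8. The names ternlog64, negate_lane and check_negate_lane are hypothetical and only for illustration; this is not GCC's code, and it says nothing about the dependency or timing behaviour discussed above.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of one 64-bit lane of vpternlogq (illustrative only).
   For each bit position, the bits of a, b and c form a 3-bit index
   (a is the most significant bit), and the result bit is imm8[index]. */
uint64_t ternlog64(uint64_t a, uint64_t b, uint64_t c, uint8_t imm8)
{
    uint64_t r = 0;
    for (int bit = 0; bit < 64; bit++) {
        unsigned idx = (unsigned)((((a >> bit) & 1) << 2) |
                                  (((b >> bit) & 1) << 1) |
                                  ((c >> bit) & 1));
        r |= (uint64_t)((imm8 >> idx) & 1) << bit;
    }
    return r;
}

/* Hypothetical helper: all three operands are the same value, as in
   "vpternlogq zmm0, zmm0, zmm0, 85". Only indices 0 and 7 can occur;
   0x55 has bit 0 set and bit 7 clear, so every bit is inverted. */
uint64_t negate_lane(uint64_t x)
{
    return ternlog64(x, x, x, 0x55);
}

/* Small self-check of the claims above. */
void check_negate_lane(void)
{
    assert(negate_lane(0) == UINT64_MAX);
    assert(negate_lane(UINT64_MAX) == 0);
    /* With distinct operands, 0x55 depends only on the third source. */
    assert(ternlog64(0x1234, 0xABCD, 0x00FF, 0x55) == ~(uint64_t)0x00FF);
}
```

Because 0x55 selects only on the third source bit, the result does not logically depend on the first two operands; the instruction still names them as inputs, which is the false dependence the comment is about.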