https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110202

--- Comment #8 from Hongtao.liu <crazylht at gmail dot com> ---
(In reply to Alexander Monakov from comment #7)
> Note that vpxor serves as a dependency-breaking instruction (see PR 110438).
> So in negate1 we do the right thing for the wrong reasons, and in negate2 we
> can cause a substantial stall if the previous computation of xmm0 has a
> non-trivial dependency chain.

For this one, we can load *a into %zmm0 first to avoid the false dependency:

vmovdqa64       zmm0, ZMMWORD PTR [rdi]
vpternlogq      zmm0, zmm0, zmm0, 85
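
As a rough illustration of why the immediate 85 (0x55) works as a negate here, the following is a scalar sketch of the vpternlogq truth-table lookup on a single 64-bit lane. The helper names (ternlog64, negate_via_ternlog, ternlog_self_check) are made up for this example and are not part of GCC or any intrinsic header. With 0x55 the result depends only on the third operand, but the instruction still reads all three registers, which is where the false dependency comes from.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar model of vpternlogq on one 64-bit lane.
   For each bit position, the bits of a (destination), b (first source)
   and c (second source) form a 3-bit index, with a as the high bit,
   that selects one bit of imm8. */
uint64_t ternlog64(uint64_t a, uint64_t b, uint64_t c, uint8_t imm8)
{
    uint64_t result = 0;
    for (int i = 0; i < 64; i++) {
        unsigned idx = (unsigned)((((a >> i) & 1) << 2) |
                                  (((b >> i) & 1) << 1) |
                                  ((c >> i) & 1));
        result |= (uint64_t)((imm8 >> idx) & 1) << i;
    }
    return result;
}

/* Models "vpternlogq zmm0, zmm0, zmm0, 85" with all operands equal. */
uint64_t negate_via_ternlog(uint64_t x)
{
    return ternlog64(x, x, x, 85);
}

/* 0x55 has bits 0, 2, 4 and 6 set, i.e. exactly the indices where c == 0,
   so the result is ~c whatever a and b contain. */
void ternlog_self_check(void)
{
    assert(negate_via_ternlog(0) == UINT64_MAX);
    assert(ternlog64(0x1234, 0xABCD, 0x0F0F, 85) == ~(uint64_t)0x0F0F);
}
```

Since a and b do not affect the result for this immediate, loading *a into zmm0 before the vpternlogq means the only register the instruction waits on is one that was just written by the load.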
