https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110438

            Bug ID: 110438
           Summary: generating all-ones zmm needs dep-breaking pxor before
                    ternlog
           Product: gcc
           Version: 13.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: amonakov at gcc dot gnu.org
  Target Milestone: ---
            Target: x86_64-*-*

VPTERNLOG is never a dependency-breaking instruction on existing x86
implementations, so generating a vector of all-ones via a bare ternlog can stall
waiting on the destination register. GCC should emit a dependency-breaking PXOR
before the ternlog; otherwise this will be the false-dependency-on-popcnt-lzcnt
debacle all over again.

#include <immintrin.h>

__m512i g(void)
{
    return (__m512i){ 0 } - 1;
}

g:
        # waits until previous computation
        # of zmm0 has completed
        vpternlogd      zmm0, zmm0, zmm0, 0xFF
        ret
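
To illustrate why the dependency on zmm0 is false, here is a minimal scalar
sketch of how one 32-bit lane of VPTERNLOGD computes its result. It is a model
for illustration only, not GCC or hardware code; the names ternlog32 and
check_all_ones_ignores_inputs are hypothetical. Each result bit is looked up in
the 8-bit immediate, indexed by the corresponding bits of the three operands, so
with an immediate of 0xFF every lookup yields 1 whatever the operands hold.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar model of one 32-bit lane of VPTERNLOGD.
   a is the destination/first source, b and c are the other two sources.
   Bit i of the result is imm8[(a_i << 2) | (b_i << 1) | c_i]. */
uint32_t ternlog32(uint32_t a, uint32_t b, uint32_t c, uint8_t imm8)
{
    uint32_t r = 0;
    for (int i = 0; i < 32; i++) {
        unsigned idx = (((a >> i) & 1u) << 2)
                     | (((b >> i) & 1u) << 1)
                     |  ((c >> i) & 1u);
        r |= (uint32_t)((imm8 >> idx) & 1u) << i;
    }
    return r;
}

/* With imm8 == 0xFF every truth-table entry is 1, so the inputs
   (including the old destination value) cannot affect the result. */
void check_all_ones_ignores_inputs(void)
{
    assert(ternlog32(0u, 0u, 0u, 0xFF) == UINT32_MAX);
    assert(ternlog32(0xDEADBEEFu, 0x12345678u, 0x0BADF00Du, 0xFF) == UINT32_MAX);
}
```

The result is architecturally independent of the old zmm0 value, but since the
instruction still names zmm0 as a source, it is scheduled as if it needed it.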
