https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96789

            Bug ID: 96789
           Summary: x264: sub4x4_dct() improves when vectorization is
                    disabled
           Product: gcc
           Version: 11.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: linkw at gcc dot gnu.org
  Target Milestone: ---

One of my colleagues found that disabling vectorization for the function
sub4x4_dct of SPEC2017 525.x264_r (source file x264_src/common/dct.c), using the
explicit function attribute __attribute__((optimize("no-tree-vectorize"))),
gives a 4% speedup.

The options used are: -O3 -mcpu=power9 -fcommon -fno-strict-aliasing
-fgnu89-inline

I confirmed this finding and narrowed it down further to SLP vectorization,
using the attribute __attribute__((optimize("no-tree-slp-vectorize"))).
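
For anyone who wants to try the same kind of experiment, here is a minimal
sketch of how the per-function attribute is applied. The function demo_sub4x4
and the macro NO_SLP are hypothetical names made up for illustration; the body
is a simplified row butterfly on a 4x4 block of pixel differences, not the
actual x264 source. The attribute is guarded because it is GCC-specific.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical macro: disable only SLP vectorization for one function.
   GCC treats the string as if it had an -f prefix. */
#if defined(__GNUC__) && !defined(__clang__)
#define NO_SLP __attribute__((optimize("no-tree-slp-vectorize")))
#else
#define NO_SLP
#endif

/* Hypothetical stand-in kernel (NOT the x264 sub4x4_dct): computes the
   differences of two 4x4 byte blocks, then applies a butterfly per row. */
NO_SLP
void demo_sub4x4(int16_t out[16], const uint8_t *a, const uint8_t *b)
{
    int16_t d[16];

    assert(out && a && b);

    for (int i = 0; i < 16; i++)
        d[i] = (int16_t)(a[i] - b[i]);

    for (int r = 0; r < 4; r++) {
        int s03 = d[r * 4 + 0] + d[r * 4 + 3];
        int s12 = d[r * 4 + 1] + d[r * 4 + 2];
        int d03 = d[r * 4 + 0] - d[r * 4 + 3];
        int d12 = d[r * 4 + 1] - d[r * 4 + 2];

        out[r * 4 + 0] = (int16_t)(s03 + s12);
        out[r * 4 + 1] = (int16_t)(2 * d03 + d12);
        out[r * 4 + 2] = (int16_t)(s03 - s12);
        out[r * 4 + 3] = (int16_t)(d03 - 2 * d12);
    }
}
```

Building this once as is and once with NO_SLP defined to nothing, then
comparing the generated assembly or the timings, should show whether SLP
vectorization of such straight-line code helps or hurts on a given target.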

I also checked the r11-0 commit for this particular file: the performance is
unchanged with or without the vectorization attribute. So I think this is a
trunk regression, which probably exposes an SLP flaw or a cost modeling issue.
