https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92175

            Bug ID: 92175
           Summary: x86 backend claims V4SI multiplication support,
                    preventing more optimal pattern
           Product: gcc
           Version: 10.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: rguenth at gcc dot gnu.org
  Target Milestone: ---

Costing has

        /* Without sse4.1, we don't have PMULLD; it's emulated with 7
           insns, including two PMULUDQ.  */
        else if (mode == V4SImode && !(TARGET_SSE4_1 || TARGET_AVX))
          return ix86_vec_cost (mode, cost->mulss * 2 + cost->sse_op * 5);

but for a testcase doing just x * 2 that is excessive.  If the backend
did not claim V4SImode multiplication support, the vectorizer would
change that to x << 1 via vect_recog_mult_pattern (yeah, oddly not to
x + x ...).
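
To make the x * 2 case concrete, here is a minimal sketch in plain C of
the three equivalent forms on one four-lane vector.  It is not the
PR92173 testcase; the function names are made up for illustration, and
the lanes are unsigned so that wraparound is well defined.

```c
#include <assert.h>   /* for assert() in the caller */
#include <stdint.h>

#define LANES 4       /* one V4SI vector: four 32-bit lanes */

/* x * 2: the form that gets costed as a full V4SI multiply. */
void mul2(const uint32_t *x, uint32_t *out)
{
    for (int i = 0; i < LANES; i++)
        out[i] = x[i] * 2u;
}

/* x << 1: the form vect_recog_mult_pattern would produce. */
void shl1(const uint32_t *x, uint32_t *out)
{
    for (int i = 0; i < LANES; i++)
        out[i] = x[i] << 1;
}

/* x + x: the other cheap form, a single vector add. */
void add_self(const uint32_t *x, uint32_t *out)
{
    for (int i = 0; i < LANES; i++)
        out[i] = x[i] + x[i];
}

/* Returns 1 if all three forms agree on every lane, 0 otherwise. */
int forms_agree(const uint32_t *x)
{
    uint32_t a[LANES], b[LANES], c[LANES];

    mul2(x, a);
    shl1(x, b);
    add_self(x, c);
    for (int i = 0; i < LANES; i++)
        if (a[i] != b[i] || a[i] != c[i])
            return 0;
    return 1;
}
```

A caller can check any input vector with assert(forms_agree(v)).  Either
cheap form needs only one SSE2 operation per vector, versus the seven
charged for the emulated multiply.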

This easily causes SSE vectorization to be rejected, falling back to
MMX "emulation" mode, which doesn't claim V4SImode multiplication support,
producing essentially SSE code but with only half of the lanes doing useful
work.

I'm not sure whether pattern recog should try costing here.  Certainly the
vectorizer won't try the PMULUDQ variant if the backend claims not to
support V4SImode mult.
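
For reference, the data flow of the PMULUDQ variant can be modelled in
scalar C roughly as below.  This is only a sketch: the function names are
hypothetical, and it does not reproduce the exact instruction sequence
GCC emits.  It only shows why two 32x32->64 multiplies plus some shuffling
are enough for a four-lane 32-bit multiply.

```c
#include <assert.h>   /* for assert() in the caller */
#include <stdint.h>

/* Scalar model of PMULUDQ on a 128-bit register: multiplies the
   unsigned 32-bit lanes 0 and 2 and yields two 64-bit products.  */
void pmuludq_model(const uint32_t a[4], const uint32_t b[4],
                   uint64_t prod[2])
{
    prod[0] = (uint64_t)a[0] * b[0];
    prod[1] = (uint64_t)a[2] * b[2];
}

/* Four-lane 32-bit multiply built from two PMULUDQ-style products.  */
void mulv4si_emulated(const uint32_t a[4], const uint32_t b[4],
                      uint32_t out[4])
{
    uint64_t even[2], odd[2];

    /* Move the odd lanes 1 and 3 into the even positions, standing in
       for the shift of each operand.  */
    uint32_t a_odd[4] = { a[1], 0, a[3], 0 };
    uint32_t b_odd[4] = { b[1], 0, b[3], 0 };

    pmuludq_model(a, b, even);          /* lanes 0 and 2 */
    pmuludq_model(a_odd, b_odd, odd);   /* lanes 1 and 3 */

    /* Keep the low 32 bits of each product and interleave them back
       into lane order, standing in for the shuffle and unpack steps.  */
    out[0] = (uint32_t)even[0];
    out[1] = (uint32_t)odd[0];
    out[2] = (uint32_t)even[1];
    out[3] = (uint32_t)odd[1];
}
```

The low 32 bits of the product are the same for signed and unsigned
operands, so the unsigned model also covers signed V4SImode multiplication.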

Noticed for the testcase in PR92173.
