https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115866

            Bug ID: 115866
           Summary: missed optimization vectorizing switch statements.
           Product: gcc
           Version: 15.0
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: tnfchris at gcc dot gnu.org
            Blocks: 53947, 115130
  Target Milestone: ---

The following example:

short a[100];

int foo(int n, int counter)
{
   for (int i = 0; i < n; i++)
     {
        if (a[i] == 1 || a[i] == 2 || a[i] == 7 || a[i] == 4)
          return 1;
     }
    return 0;
}

fails to vectorize at -O3 -march=armv9-a because in GIMPLE the if is rewritten
into a switch statement:

  <bb 2> [local count: 114863530]:
  if (n_6(D) > 0)
    goto <bb 6>; [94.50%]
  else
    goto <bb 11>; [5.50%]

  <bb 6> [local count: 108546036]:

  <bb 3> [local count: 1014686025]:
  # i_3 = PHI <i_8(7), 0(6)>
  _1 = a[i_3];
  switch (_1) <default: <L9> [94.50%], case 1 ... 2: <L11> [5.50%], case 4:
<L11> [5.50%], case 7: <L11> [5.50%]>

  <bb 9> [local count: 55807731]:
<L11>:
  goto <bb 5>; [100.00%]

  <bb 4> [local count: 958878295]:
<L9>:
  i_8 = i_3 + 1;
  if (n_6(D) > i_8)
    goto <bb 7>; [94.50%]
  else
    goto <bb 12>; [5.50%]

  <bb 12> [local count: 52738306]:
  goto <bb 8>; [100.00%]

  <bb 7> [local count: 906139989]:
  goto <bb 3>; [100.00%]

  <bb 11> [local count: 6317494]:

  <bb 8> [local count: 59055800]:

  <bb 5> [local count: 114863531]:
  # _5 = PHI <1(9), 0(8)>
<L10>:
  return _5;

However such switch statements, where all the entries lead to the same
destination are easy to vectorize.  In SVE we have the MATCH instruction that
can be used here, and for others we can duplicate the constants and lower the
switch to a series of compare and ORRs.

This is similar as what's done for when the values don't fit inside a switch:

short a[100];

int foo(int n, int counter, short x, short b, short c)
{
   for (int i = 0; i < n; i++)
     {
        if (a[i] == x || a[i] == b || a[i] == 7 || a[i] == c)
          return 1;
     }
    return 0;
}

is vectorized as:

.L4:
        incb    x5
        incw    z30.s, all, mul #2
        cmp     w6, w1
        bcc     .L15
.L6:
        ld1h    z31.h, p7/z, [x5]
        cmpeq   p15.h, p7/z, z31.h, z27.h
        cmpeq   p11.h, p7/z, z31.h, z28.h
        cmpeq   p14.h, p7/z, z31.h, #7
        cmpeq   p12.h, p7/z, z31.h, z29.h
        orr     p15.b, p7/z, p15.b, p11.b
        orr     p14.b, p7/z, p14.b, p12.b
        inch    x1
        orr     p15.b, p7/z, p15.b, p14.b
        ptest   p13, p15.b
        b.none  .L4
        umov    w1, v30.s[0]
.L3:
        sxtw    x1, w1
        b       .L7

which is great! but should be an SVE MATCH instruction as well.

This kind of code shows up through the use of std::find_if as well:

#include <bits/stdc++.h>
using namespace std;

bool pred(int i) { return i == 1 || i == 2 || i == 7 || i == 4; }

int foo(vector<int> vec)
{
    vector<int>::iterator it;

    it = find_if(vec.begin(), vec.end(), pred);

    return *it;
}

and once the unrolled loop is gone we should be able to vectorize it.


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947
[Bug 53947] [meta-bug] vectorizer missed-optimizations
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115130
[Bug 115130] [meta-bug] early break vectorization

Reply via email to