https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115866
Bug ID: 115866 Summary: missed optimization vectorizing switch statements. Product: gcc Version: 15.0 Status: UNCONFIRMED Keywords: missed-optimization Severity: normal Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: tnfchris at gcc dot gnu.org Blocks: 53947, 115130 Target Milestone: --- The following example: short a[100]; int foo(int n, int counter) { for (int i = 0; i < n; i++) { if (a[i] == 1 || a[i] == 2 || a[i] == 7 || a[i] == 4) return 1; } return 0; } fails to vectorize at -O3 -march=armv9-a because in GIMPLE the if is rewritten into a switch statement: <bb 2> [local count: 114863530]: if (n_6(D) > 0) goto <bb 6>; [94.50%] else goto <bb 11>; [5.50%] <bb 6> [local count: 108546036]: <bb 3> [local count: 1014686025]: # i_3 = PHI <i_8(7), 0(6)> _1 = a[i_3]; switch (_1) <default: <L9> [94.50%], case 1 ... 2: <L11> [5.50%], case 4: <L11> [5.50%], case 7: <L11> [5.50%]> <bb 9> [local count: 55807731]: <L11>: goto <bb 5>; [100.00%] <bb 4> [local count: 958878295]: <L9>: i_8 = i_3 + 1; if (n_6(D) > i_8) goto <bb 7>; [94.50%] else goto <bb 12>; [5.50%] <bb 12> [local count: 52738306]: goto <bb 8>; [100.00%] <bb 7> [local count: 906139989]: goto <bb 3>; [100.00%] <bb 11> [local count: 6317494]: <bb 8> [local count: 59055800]: <bb 5> [local count: 114863531]: # _5 = PHI <1(9), 0(8)> <L10>: return _5; However such switch statements, where all the entries lead to the same destination are easy to vectorize. In SVE we have the MATCH instruction that can be used here, and for others we can duplicate the constants and lower the switch to a series of compare and ORRs. This is similar as what's done for when the values don't fit inside a switch: short a[100]; int foo(int n, int counter, short x, short b, short c) { for (int i = 0; i < n; i++) { if (a[i] == x || a[i] == b || a[i] == 7 || a[i] == c) return 1; } return 0; } is vectorized as: .L4: incb x5 incw z30.s, all, mul #2 cmp w6, w1 bcc .L15 .L6: ld1h z31.h, p7/z, [x5] cmpeq p15.h, p7/z, z31.h, z27.h cmpeq p11.h, p7/z, z31.h, z28.h cmpeq p14.h, p7/z, z31.h, #7 cmpeq p12.h, p7/z, z31.h, z29.h orr p15.b, p7/z, p15.b, p11.b orr p14.b, p7/z, p14.b, p12.b inch x1 orr p15.b, p7/z, p15.b, p14.b ptest p13, p15.b b.none .L4 umov w1, v30.s[0] .L3: sxtw x1, w1 b .L7 which is great! but should be an SVE MATCH instruction as well. This kind of code shows up through the use of std::find_if as well: #include <bits/stdc++.h> using namespace std; bool pred(int i) { return i == 1 || i == 2 || i == 7 || i == 4; } int foo(vector<int> vec) { vector<int>::iterator it; it = find_if(vec.begin(), vec.end(), pred); return *it; } and once the unrolled loop is gone we should be able to vectorize it. Referenced Bugs: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947 [Bug 53947] [meta-bug] vectorizer missed-optimizations https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115130 [Bug 115130] [meta-bug] early break vectorization