https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111770

            Bug ID: 111770
           Summary: predicated loads inactive lane values not modelled
           Product: gcc
           Version: 14.0
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: tnfchris at gcc dot gnu.org
  Target Milestone: ---

For this example:

int foo(int n, char *a, char *b) {
  int sum = 0;
  for (int i = 0; i < n; ++i) {
    sum += a[i] * b[i];
  }
  return sum;
}

with -O3 -march=armv8-a+sve we generate:

.L3:
        ld1b    z29.b, p7/z, [x1, x3]
        ld1b    z31.b, p7/z, [x2, x3]
        add     x3, x3, x4
        sel     z31.b, p7, z31.b, z28.b
        whilelo p7.b, w3, w0
        udot    z30.s, z29.b, z31.b
        b.any   .L3
        uaddv   d30, p6, z30.s
        fmov    w0, s30
        ret

This is pretty good, but the SEL in the loop body completely ruins it.

In gimple this is:

  vect__7.12_81 = .MASK_LOAD (_21, 8B, loop_mask_77);
  masked_op1_82 = .VCOND_MASK (loop_mask_77, vect__7.12_81, { 0, ... });
  vect_patt_33.13_83 = DOT_PROD_EXPR <vect__3.9_78, masked_op1_82,
                                      vect_sum_19.6_74>;

The missed optimization here is that we don't model what predicated
operations that zero their inactive lanes actually produce in those lanes.

i.e. in this case the .MASK_LOAD (an ld1b with p7/z) will already zero the
inactive lanes, so the .VCOND_MASK, which selects zero for exactly those same
lanes, is completely superfluous.
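
As a rough illustration of the redundancy, here is a scalar C model of the two
operations. It is only a sketch and not GCC code: LANES and the function names
mask_load_zeroing, vcond_mask and select_is_redundant are hypothetical, and it
assumes the load zeroes inactive lanes the way SVE /z predication does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define LANES 16 /* hypothetical fixed vector length, for illustration only */

/* Models a zeroing predicated load (as with SVE ld1b and p/z):
   active lanes read memory, inactive lanes become 0.  */
static void mask_load_zeroing(unsigned char dst[LANES],
                              const unsigned char *src,
                              const bool mask[LANES])
{
    assert(src != NULL);
    for (size_t i = 0; i < LANES; ++i)
        dst[i] = mask[i] ? src[i] : 0;
}

/* Models a per-lane select: dst = mask ? a : b.  */
static void vcond_mask(unsigned char dst[LANES],
                       const bool mask[LANES],
                       const unsigned char a[LANES],
                       const unsigned char b[LANES])
{
    for (size_t i = 0; i < LANES; ++i)
        dst[i] = mask[i] ? a[i] : b[i];
}

/* Returns 1 if selecting against zero under the same mask leaves the
   result of the zeroing load unchanged, i.e. the select is a no-op.  */
static int select_is_redundant(const unsigned char *src,
                               const bool mask[LANES])
{
    unsigned char loaded[LANES];
    unsigned char selected[LANES];
    const unsigned char zeros[LANES] = { 0 };

    mask_load_zeroing(loaded, src, mask);
    vcond_mask(selected, mask, loaded, zeros);
    return memcmp(loaded, selected, LANES) == 0;
}
```

Under that assumption select_is_redundant holds for any mask and any input,
since both operations write 0 to the same inactive lanes and leave the active
lanes alone.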

I'm not entirely sure how we should go about fixing this generally.
