https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114345

            Bug ID: 114345
           Summary: FRE missing knowledge of semantics of IFN loads
           Product: gcc
           Version: 14.0
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: tnfchris at gcc dot gnu.org
  Target Milestone: ---

The following testcase:

---
long tdiff = 10412095;

int main() {
  struct {
    long maximum;
    int nonprimary_delay;
  } delays[] = {{}, {}, {}, {9223372036854775807, 36 * 60 * 60}};

  for (unsigned i = 0; i < sizeof(delays) / sizeof(delays[0]); ++i)
    if (tdiff <= delays[i].maximum)
      return delays[i].nonprimary_delay;

  __builtin_abort();
}
---

compiled with -O2 -fno-vect-cost-model

generates on AArch64:

  vect_cst__45 = {tdiff.0_2, tdiff.0_2};
  vect_array.11 = .LOAD_LANES (MEM <long int[4]> [(long int *)&delays]);
  vect__1.12_40 = vect_array.11[0];
  vect_array.11 ={v} {CLOBBER};
  vect_array.14 = .LOAD_LANES (MEM <long int[4]> [(long int *)&delays + 32B]);
  vect__1.15_43 = vect_array.14[0];
  vect_array.14 ={v} {CLOBBER};
  mask_patt_15.17_46 = vect__1.12_40 >= vect_cst__45;
  mask_patt_15.17_47 = vect__1.15_43 >= vect_cst__45;
  vexit_reduc_51 = mask_patt_15.17_46 | mask_patt_15.17_47;

and on x86_64:

  vect_cst__53 = {tdiff.0_2, tdiff.0_2};
  _37 = { 0, 4294967295, 4294967294, 4294967293 };
  _32 = { 4, 5, 6, 7 };
  vect__1.11_42 = MEM <vector(2) long int> [(long int *)&delays];
  vectp_delays.9_43 = &delays + 16;
  vect__1.12_44 = MEM <vector(2) long int> [(long int *)vectp_delays.9_43];
  vect_perm_even_45 = VEC_PERM_EXPR <vect__1.11_42, vect__1.12_44, { 0, 2 }>;
  vectp_delays.9_47 = &delays + 32;
  vect__1.13_48 = MEM <vector(2) long int> [(long int *)vectp_delays.9_47];
  vectp_delays.9_49 = &delays + 48;
  vect__1.14_50 = MEM <vector(2) long int> [(long int *)vectp_delays.9_49];
  vect_perm_even_51 = VEC_PERM_EXPR <vect__1.13_48, vect__1.14_50, { 0, 2 }>;
  mask_patt_17.15_54 = vect_perm_even_45 >= vect_cst__53;
  mask_patt_17.15_55 = vect_perm_even_51 >= vect_cst__53;
  vexit_reduc_59 = mask_patt_17.15_54 | mask_patt_17.15_55;

The x86_64 version is eventually simplified by FRE into:

  vect_cst__53 = {tdiff.0_2, tdiff.0_2};
  mask_patt_17.15_54 = vect_cst__53 <= { 0, 0 };
  mask_patt_17.15_55 = vect_cst__53 <= { 0, 9223372036854775807 };
  vexit_reduc_59 = mask_patt_17.15_54 | mask_patt_17.15_55;

i.e. FRE realizes that the loads aren't needed.
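
For illustration, here is a minimal scalar C sketch of why that fold is
valid. It is not GCC code and does not model FRE itself; the names
struct delay, exit_mask_from_loads, exit_mask_folded and check_fold are
hypothetical, and delays is made a file-scope array here for simplicity
(in the testcase it is a local). The only facts used are the initializer
values from the testcase: the four maximum fields are 0, 0, 0 and
LONG_MAX (9223372036854775807 assumes a 64-bit long).

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Same field layout as the anonymous struct in the testcase. */
struct delay {
  long maximum;
  int nonprimary_delay;
};

/* Same initializer values as the testcase (file scope is an assumption). */
static struct delay delays[4] = {
  {0, 0}, {0, 0}, {0, 0}, {LONG_MAX, 36 * 60 * 60}
};

/* Early-exit test in its unfolded form: read every `maximum` field
   from memory, compare against tdiff, and OR the results together.  */
bool exit_mask_from_loads(long tdiff) {
  bool any = false;
  for (unsigned i = 0; i < 4; ++i)
    any |= (delays[i].maximum >= tdiff);
  return any;
}

/* The same test with the loads replaced by the known constants,
   grouped like the two folded vector compares: { 0, 0 } and
   { 0, LONG_MAX }.  */
bool exit_mask_folded(long tdiff) {
  bool first = (tdiff <= 0) | (tdiff <= 0);
  bool second = (tdiff <= 0) | (tdiff <= LONG_MAX);
  return first | second;
}

/* Hypothetical helper: aborts if the two forms disagree for tdiff. */
void check_fold(long tdiff) {
  assert(exit_mask_from_loads(tdiff) == exit_mask_folded(tdiff));
}
```

Since tdiff <= LONG_MAX holds for every long, both forms return true for
any tdiff, and no memory access to delays is required to compute that.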

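On AArch64 this simplification does not happen.  It looks like the reason
is that FRE doesn't understand the semantics of .LOAD_LANES,
.MASK_LOAD_LANES or the other load IFNs, so it cannot look through them
to the values stored in delays.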

On AArch64 we thus end up with a spill to the stack and a load of the
constants.
