https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114345
Bug ID: 114345
Summary: FRE missing knowledge of semantics of IFN loads
Product: gcc
Version: 14.0
Status: UNCONFIRMED
Keywords: missed-optimization
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: tnfchris at gcc dot gnu.org
Target Milestone: ---

The following testcase:

---
long tdiff = 10412095;

int main() {
  struct {
    long maximum;
    int nonprimary_delay;
  } delays[] = {{}, {}, {}, {9223372036854775807, 36 * 60 * 60}};

  for (unsigned i = 0; i < sizeof(delays) / sizeof(delays[0]); ++i)
    if (tdiff <= delays[i].maximum)
      return delays[i].nonprimary_delay;

  __builtin_abort();
}
---

compiled with -O2 -fno-vect-cost-model generates on AArch64:

  vect_cst__45 = {tdiff.0_2, tdiff.0_2};
  vect_array.11 = .LOAD_LANES (MEM <long int[4]> [(long int *)&delays]);
  vect__1.12_40 = vect_array.11[0];
  vect_array.11 ={v} {CLOBBER};
  vect_array.14 = .LOAD_LANES (MEM <long int[4]> [(long int *)&delays + 32B]);
  vect__1.15_43 = vect_array.14[0];
  vect_array.14 ={v} {CLOBBER};
  mask_patt_15.17_46 = vect__1.12_40 >= vect_cst__45;
  mask_patt_15.17_47 = vect__1.15_43 >= vect_cst__45;
  vexit_reduc_51 = mask_patt_15.17_46 | mask_patt_15.17_47;

and on x86_64:

  vect_cst__53 = {tdiff.0_2, tdiff.0_2};
  _37 = { 0, 4294967295, 4294967294, 4294967293 };
  _32 = { 4, 5, 6, 7 };
  vect__1.11_42 = MEM <vector(2) long int> [(long int *)&delays];
  vectp_delays.9_43 = &delays + 16;
  vect__1.12_44 = MEM <vector(2) long int> [(long int *)vectp_delays.9_43];
  vect_perm_even_45 = VEC_PERM_EXPR <vect__1.11_42, vect__1.12_44, { 0, 2 }>;
  vectp_delays.9_47 = &delays + 32;
  vect__1.13_48 = MEM <vector(2) long int> [(long int *)vectp_delays.9_47];
  vectp_delays.9_49 = &delays + 48;
  vect__1.14_50 = MEM <vector(2) long int> [(long int *)vectp_delays.9_49];
  vect_perm_even_51 = VEC_PERM_EXPR <vect__1.13_48, vect__1.14_50, { 0, 2 }>;
  mask_patt_17.15_54 = vect_perm_even_45 >= vect_cst__53;
  mask_patt_17.15_55 = vect_perm_even_51 >= vect_cst__53;
  vexit_reduc_59 = mask_patt_17.15_54 | mask_patt_17.15_55;

The x86_64 version is eventually simplified by FRE into:

  vect_cst__53 = {tdiff.0_2, tdiff.0_2};
  mask_patt_17.15_54 = vect_cst__53 <= { 0, 0 };
  mask_patt_17.15_55 = vect_cst__53 <= { 0, 9223372036854775807 };
  vexit_reduc_59 = mask_patt_17.15_54 | mask_patt_17.15_55;

so FRE realizes that the loads aren't needed.

The AArch64 version is not simplified in the same way. It looks like the
reason is that FRE doesn't understand LOAD_LANES and MASKED_LOAD_LANES or
the other load IFNs. With the IFN loads we thus end up with a spill to the
stack and a load of the constants.
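
For illustration only, here is a scalar C sketch of what the FRE-simplified x86_64 form computes. It is not GCC code and not part of the testcase: the names delay_table, any_match_loop, any_match_folded and lookup are hypothetical, LONG_MAX stands in for 9223372036854775807 (assuming a 64-bit long), and lookup returns -1 where the testcase would call __builtin_abort(). The point is that once the `maximum` fields are known to be {0, 0, 0, LONG_MAX}, the exit condition can be evaluated against constants without reading the array.

```c
#include <limits.h>

struct delay {
  long maximum;
  int nonprimary_delay;
};

/* Same contents as `delays` in the testcase; LONG_MAX replaces the
   literal 9223372036854775807 (assumption: 64-bit long). */
static const struct delay delay_table[4] = {
  {0, 0}, {0, 0}, {0, 0}, {LONG_MAX, 36 * 60 * 60}
};

/* Exit condition computed by reading the table, as the unsimplified
   vector code does through its loads. */
static int any_match_loop(long tdiff) {
  int any = 0;
  for (unsigned i = 0; i < sizeof(delay_table) / sizeof(delay_table[0]); ++i)
    any |= (tdiff <= delay_table[i].maximum);
  return any;
}

/* Exit condition with the loaded values folded to constants, mirroring
     mask_54 = {tdiff, tdiff} <= {0, 0}
     mask_55 = {tdiff, tdiff} <= {0, LONG_MAX}
     vexit_reduc = mask_54 | mask_55
   No memory access to the table is needed. */
static int any_match_folded(long tdiff) {
  int mask_54 = (tdiff <= 0) | (tdiff <= 0);
  int mask_55 = (tdiff <= 0) | (tdiff <= LONG_MAX);
  return mask_54 | mask_55;
}

/* Scalar version of the testcase loop; returns -1 instead of aborting. */
static int lookup(long tdiff) {
  for (unsigned i = 0; i < sizeof(delay_table) / sizeof(delay_table[0]); ++i)
    if (tdiff <= delay_table[i].maximum)
      return delay_table[i].nonprimary_delay;
  return -1;
}
```

Calling any_match_loop(10412095) and any_match_folded(10412095) should give the same answer; the request in this report is for FRE to be able to derive the folded form when the values come from .LOAD_LANES as well.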