From: Richard Sandiford <richard.sandif...@arm.com> Date: Tuesday, May 16, 2023 at 2:15 PM To: Tejas Belagod <tejas.bela...@arm.com> Cc: gcc-patches@gcc.gnu.org <gcc-patches@gcc.gnu.org> Subject: Re: [PATCH] [PR96339] AArch64: Optimise svlast[ab] Tejas Belagod <tejas.bela...@arm.com> writes: >> + { >> + int i; >> + int nelts = vector_cst_encoded_nelts (v); >> + int first_el = 0; >> + >> + for (i = first_el; i < nelts; i += step) >> + if (VECTOR_CST_ENCODED_ELT (v, i) != VECTOR_CST_ENCODED_ELT (v, > first_el)) > > I think this should use !operand_equal_p (..., ..., 0). > > > Oops! I wonder why I thought VECTOR_CST_ENCODED_ELT returned a constant! > Thanks > for spotting that. It does only return a constant. But there can be multiple trees with the same constant value, through things like TREE_OVERFLOW (not sure where things stand on expunging that from gimple) and the fact that gimple does not maintain a distinction between different types that have the same mode and signedness. (E.g. on ILP32 hosts, gimple does not maintain a distinction between int and long, even though int 0 and long 0 are different trees.) > Also, should the flags here be OEP_ONLY_CONST ? Nah, just 0 should be fine. >> + return false; >> + >> + return true; >> + } >> + >> + /* Fold a svlast{a/b} call with constant predicate to a BIT_FIELD_REF. >> + BIT_FIELD_REF lowers to a NEON element extract, so we have to make sure >> + the index of the element being accessed is in the range of a NEON > vector >> + width. */ > > s/NEON/Advanced SIMD/. Same in later comments > >> + gimple *fold (gimple_folder & f) const override >> + { >> + tree pred = gimple_call_arg (f.call, 0); >> + tree val = gimple_call_arg (f.call, 1); >> + >> + if (TREE_CODE (pred) == VECTOR_CST) >> + { >> + HOST_WIDE_INT pos; >> + unsigned int const_vg; >> + int i = 0; >> + int step = f.type_suffix (0).element_bytes; >> + int step_1 = gcd (step, VECTOR_CST_NPATTERNS (pred)); >> + int npats = VECTOR_CST_NPATTERNS (pred); >> + unsigned HOST_WIDE_INT nelts = vector_cst_encoded_nelts (pred); >> + tree b = NULL_TREE; >> + bool const_vl = aarch64_sve_vg.is_constant (&const_vg); > > I think this might be left over from previous versions, but: > const_vg isn't used and const_vl is only used once, so I think it > would be better to remove them. > >> + >> + /* We can optimize 2 cases common to variable and fixed-length cases >> + without a linear search of the predicate vector: >> + 1. LASTA if predicate is all true, return element 0. >> + 2. LASTA if predicate all false, return element 0. */ >> + if (is_lasta () && vect_all_same (pred, step_1)) >> + { >> + b = build3 (BIT_FIELD_REF, TREE_TYPE (f.lhs), val, >> + bitsize_int (step * BITS_PER_UNIT), bitsize_int (0)); >> + return gimple_build_assign (f.lhs, b); >> + } >> + >> + /* Handle the all-false case for LASTB where SVE VL == 128b - >> + return the highest numbered element. */ >> + if (is_lastb () && known_eq (BYTES_PER_SVE_VECTOR, 16) >> + && vect_all_same (pred, step_1) >> + && integer_zerop (VECTOR_CST_ENCODED_ELT (pred, 0))) > > Formatting nit: one condition per line once one line isn't enough. > >> + { >> + b = build3 (BIT_FIELD_REF, TREE_TYPE (f.lhs), val, >> + bitsize_int (step * BITS_PER_UNIT), >> + bitsize_int ((16 - step) * BITS_PER_UNIT)); >> + >> + return gimple_build_assign (f.lhs, b); >> + } >> + >> + /* If VECTOR_CST_NELTS_PER_PATTERN (pred) == 2 and every multiple of >> + 'step_1' in >> + [VECTOR_CST_NPATTERNS .. VECTOR_CST_ENCODED_NELTS - 1] >> + is zero, then we can treat the vector as VECTOR_CST_NPATTERNS >> + elements followed by all inactive elements. */ >> + if (!const_vl && VECTOR_CST_NELTS_PER_PATTERN (pred) == 2) > > Following on from the above, maybe use: > > !VECTOR_CST_NELTS (pred).is_constant () > > instead of !const_vl here. > > I have a horrible suspicion that I'm contradicting our earlier discussion > here, sorry, but: I think we have to return null if NELTS_PER_PATTERN != 2. > > > > IIUC, the NPATTERNS .. ENCODED_ELTS represent the repeated part of the encoded > constant. This means the repetition occurs if NELTS_PER_PATTERN == 2, IOW the > base1 repeats in the encoding. This loop is checking this condition and looks > for a 1 in the repeated part of the NELTS_PER_PATTERN == 2 in a VL vector. > Please correct me if I’m misunderstanding here. NELTS_PER_PATTERN == 1 is also a repeating pattern: it means that the entire sequence is repeated to fill a vector. So if an NELTS_PER_PATTERN == 1 constant has elements {0, 1, 0, 0}, the vector is: {0, 1, 0, 0, 0, 1, 0, 0, ...} Wouldn’t the vect_all_same(pred, step) cover this case for a given value of step? and the optimisation can't handle that. NELTS_PER_PATTERN == 3 isn't likely to occur for predicates, but in principle it has the same problem. OK, I had misunderstood the encoding to always make base1 the repeating value by adjusting the NPATTERNS accordingly – I didn’t know you could also have the base2 value and beyond encoding the repeat value. In this case could I just remove NELTS_PER_PATTERN == 2 condition and the enclosed loop would check for a repeating ‘1’ in the repeated part of the encoded pattern? Thanks, Tejas. Thanks, Richard
Re: [PATCH] [PR96339] AArch64: Optimise svlast[ab]
Tejas Belagod via Gcc-patches Tue, 16 May 2023 04:28:59 -0700
- Re: [PATCH] [PR96339] AArch64: Optimise ... Tejas Belagod via Gcc-patches
- Re: [PATCH] [PR96339] AArch64: Opti... Richard Sandiford via Gcc-patches
- Re: [PATCH] [PR96339] AArch64: ... Tejas Belagod via Gcc-patches
- Re: [PATCH] [PR96339] AArch... Richard Sandiford via Gcc-patches
- Re: [PATCH] [PR96339] A... Tejas Belagod via Gcc-patches
- Re: [PATCH] [PR963... Richard Sandiford via Gcc-patches
- Re: [PATCH] [P... Tejas Belagod via Gcc-patches