On Tue, Dec 19, 2023 at 2:40 AM Richard Sandiford <richard.sandif...@arm.com> wrote: > > Richard Biener <rguent...@suse.de> writes: > > On Tue, 19 Dec 2023, juzhe.zh...@rivai.ai wrote: > > > >> Hi, Richard. > >> > >> After investigating the codes: > >> /* Return true if EXPR is the integer constant zero or a complex constant > >> of zero, or a location wrapper for such a constant. */ > >> > >> bool > >> integer_zerop (const_tree expr) > >> { > >> STRIP_ANY_LOCATION_WRAPPER (expr); > >> > >> switch (TREE_CODE (expr)) > >> { > >> case INTEGER_CST: > >> return wi::to_wide (expr) == 0; > >> case COMPLEX_CST: > >> return (integer_zerop (TREE_REALPART (expr)) > >> && integer_zerop (TREE_IMAGPART (expr))); > >> case VECTOR_CST: > >> return (VECTOR_CST_NPATTERNS (expr) == 1 > >> && VECTOR_CST_DUPLICATE_P (expr) > >> && integer_zerop (VECTOR_CST_ENCODED_ELT (expr, 0))); > >> default: > >> return false; > >> } > >> } > >> > >> I wonder whether we can simplify the codes as follows :? > >> if (integer_zerop (arg1) || integer_zerop (arg2)) > >> step_ok_p = (code == BIT_AND_EXPR || code == BIT_IOR_EXPR > >> || code == BIT_XOR_EXPR); > > > > Possibly. I'll let Richard S. comment on the whole structure. > > The current code is handling cases that require elementwise arithmetic. > ISTM that what we're really doing here is identifying cases where > whole-vector arithmetic is possible instead. I think that should be > a separate pre-step, rather than integrated into the current code. > > Largely this would consist of writing out match.pd-style folds in > C++ code, so Andrew's fix in comment 7 seems neater to me.
I didn't like the change to match.pd (even with a comment on why) because it violates the whole idea behind canonicalization of constants being 2nd operand of commutative and comparison expressions. Maybe there are only a few limited match/simplify patterns which need to add the :c for constants not being the 2nd operand but there needs to be a comment on why :c is needed for this. > > But if this must happen in const_binop instead, then we could have > a function like: The reasoning of why it should be in const_binop rather than in match is because both operands are constants. Now for commutative expressions, we only need to check the first operand for zero/all_ones and try again swapping the operands. This will most likely solve the problem of writing so much code. We could even use lambdas to simplify things too. Thanks, Andrew Pinski > > /* OP is the INDEXth operand to CODE (counting from zero) and OTHER_OP > is the other operand. Try to use the value of OP to simplify the > operation in one step, without having to process individual elements. */ > tree > simplify_const_binop (tree_code code, rtx op, rtx other_op, int index) > { > ... > } > > Thanks, > Richard > > > > > Richard. > > > >> > >> > >> > >> juzhe.zh...@rivai.ai > >> > >> From: Richard Biener > >> Date: 2023-12-19 17:12 > >> To: juzhe.zh...@rivai.ai > >> CC: Robin Dapp; gcc-patches; pan2.li; richard.sandiford; Richard Biener; > >> pinskia > >> Subject: Re: Re: [PATCH] fold-const: Handle AND, IOR, XOR with stepped > >> vectors [PR112971]. > >> On Tue, 19 Dec 2023, juzhe.zh...@rivai.ai wrote: > >> > >> > Hi?Richard. Do you mean add the check as follows ? > >> > > >> > if (VECTOR_CST_NELTS_PER_PATTERN (arg1) == 1 > >> > && VECTOR_CST_NELTS_PER_PATTERN (arg2) == 3 > >> > >> Or <= 3 which would allow combining. As said, not sure what > >> == 2 would be and whether that would work. > >> > >> Btw, integer_allonesp should also allow to be optimized for > >> and/ior at least. Possibly IOR/AND with the sign bit for > >> signed elements as well. > >> > >> I wonder if there's a programmatic way to identify OK cases > >> rather than enumerating them. > >> > >> > && integer_zerop (VECTOR_CST_ELT (arg1, 0))) > >> > step_ok_p = (code == BIT_AND_EXPR || code == BIT_IOR_EXPR > >> > || code == BIT_XOR_EXPR); > >> > else if (VECTOR_CST_NELTS_PER_PATTERN (arg2) == 1 > >> > && VECTOR_CST_NELTS_PER_PATTERN (arg1) == 3 > >> > && integer_zerop (VECTOR_CST_ELT (arg2, 0))) > >> > step_ok_p = (code == BIT_AND_EXPR || code == BIT_IOR_EXPR > >> > || code == BIT_XOR_EXPR); > >> > > >> > > >> > > >> > juzhe.zh...@rivai.ai > >> > > >> > From: Richard Biener > >> > Date: 2023-12-19 16:15 > >> > To: ??? > >> > CC: rdapp.gcc; gcc-patches; pan2.li; richard.sandiford; > >> > richard.guenther; Andrew Pinski > >> > Subject: Re: [PATCH] fold-const: Handle AND, IOR, XOR with stepped > >> > vectors [PR112971]. > >> > On Tue, 19 Dec 2023, ??? wrote: > >> > > >> > > Thanks Robin send initial patch to fix this ICE bug. > >> > > > >> > > CC to Richard S, Richard B, and Andrew. > >> > > >> > Just one comment, it seems that VECTOR_CST_STEPPED_P should > >> > implicitly include VECTOR_CST_DUPLICATE_P since it would be > >> > a step of zero (but as implemented it doesn't catch this). > >> > Looking at the implementation it's odd that we can handle > >> > VECTOR_CST_NELTS_PER_PATTERN == 1 (duplicate) and > >> > == 3 (stepped) but not == 2 (not sure what that would be). > >> > > >> > Maybe the tests can be re-formulated in terms of > >> > VECTOR_CST_NELTS_PER_PATTERN? > >> > > >> > Richard. > >> > > >> > > Thanks. > >> > > > >> > > > >> > > > >> > > juzhe.zh...@rivai.ai > >> > > > >> > > From: Robin Dapp > >> > > Date: 2023-12-19 03:50 > >> > > To: gcc-patches > >> > > CC: rdapp.gcc; Li, Pan2; juzhe.zh...@rivai.ai > >> > > Subject: [PATCH] fold-const: Handle AND, IOR, XOR with stepped vectors > >> > > [PR112971]. > >> > > Hi, > >> > > > >> > > found in PR112971, this patch adds folding support for bitwise > >> > > operations > >> > > of const duplicate zero vectors and stepped vectors. > >> > > On riscv we have the situation that a folding would perpetually > >> > > continue > >> > > without simplifying because e.g. {0, 0, 0, ...} & {7, 6, 5, ...} would > >> > > not fold to {0, 0, 0, ...}. > >> > > > >> > > Bootstrapped and regtested on x86 and aarch64, regtested on riscv. > >> > > > >> > > I won't be available to respond quickly until next year. Pan or Juzhe, > >> > > as discussed, feel free to continue with possible revisions. > >> > > > >> > > Regards > >> > > Robin > >> > > > >> > > > >> > > gcc/ChangeLog: > >> > > > >> > > PR middle-end/112971 > >> > > > >> > > * fold-const.cc (const_binop): Handle > >> > > zerop@1 AND/IOR/XOR VECT_CST_STEPPED_P@2 > >> > > > >> > > gcc/testsuite/ChangeLog: > >> > > > >> > > * gcc.target/riscv/rvv/autovec/pr112971.c: New test. > >> > > --- > >> > > gcc/fold-const.cc | 14 +++++++++++++- > >> > > .../gcc.target/riscv/rvv/autovec/pr112971.c | 18 ++++++++++++++++++ > >> > > 2 files changed, 31 insertions(+), 1 deletion(-) > >> > > create mode 100644 > >> > > gcc/testsuite/gcc.target/riscv/rvv/autovec/pr112971.c > >> > > > >> > > diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc > >> > > index f5d68ac323a..43ed097bf5c 100644 > >> > > --- a/gcc/fold-const.cc > >> > > +++ b/gcc/fold-const.cc > >> > > @@ -1653,8 +1653,20 @@ const_binop (enum tree_code code, tree arg1, > >> > > tree arg2) > >> > > { > >> > > tree type = TREE_TYPE (arg1); > >> > > bool step_ok_p; > >> > > + > >> > > + /* AND, IOR as well as XOR with a zerop can be handled > >> > > directly. */ > >> > > if (VECTOR_CST_STEPPED_P (arg1) > >> > > - && VECTOR_CST_STEPPED_P (arg2)) > >> > > + && VECTOR_CST_DUPLICATE_P (arg2) > >> > > + && integer_zerop (VECTOR_CST_ELT (arg2, 0))) > >> > > + step_ok_p = code == BIT_AND_EXPR || code == BIT_IOR_EXPR > >> > > + || code == BIT_XOR_EXPR; > >> > > + else if (VECTOR_CST_STEPPED_P (arg2) > >> > > + && VECTOR_CST_DUPLICATE_P (arg1) > >> > > + && integer_zerop (VECTOR_CST_ELT (arg1, 0))) > >> > > + step_ok_p = code == BIT_AND_EXPR || code == BIT_IOR_EXPR > >> > > + || code == BIT_XOR_EXPR; > >> > > + else if (VECTOR_CST_STEPPED_P (arg1) > >> > > + && VECTOR_CST_STEPPED_P (arg2)) > >> > > /* We can operate directly on the encoding if: > >> > > a3 - a2 == a2 - a1 && b3 - b2 == b2 - b1 > >> > > diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr112971.c > >> > > b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr112971.c > >> > > new file mode 100644 > >> > > index 00000000000..816ebd3c493 > >> > > --- /dev/null > >> > > +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr112971.c > >> > > @@ -0,0 +1,18 @@ > >> > > +/* { dg-do compile } */ > >> > > +/* { dg-options "-march=rv64gcv_zvl256b -mabi=lp64d -O3 > >> > > -fno-vect-cost-model" } */ > >> > > + > >> > > +int a; > >> > > +short b[9]; > >> > > +char c, d; > >> > > +void e() { > >> > > + d = 0; > >> > > + for (;; d++) { > >> > > + if (b[d]) > >> > > + break; > >> > > + a = 8; > >> > > + for (; a >= 0; a--) { > >> > > + char *f = &c; > >> > > + *f &= d == (a & d); > >> > > + } > >> > > + } > >> > > +} > >> > > > >> > > >> > > >> > >>