On Tue, Aug 20, 2024 at 3:08 PM Tamar Christina <[email protected]> wrote:
>
> Hi All,
>
> IVOPTS normally uses affine trees to perform comparisons between different
> IVs,
> but these seem to have been missing in two key spots and instead normal tree
> equivalencies used.
>
> In some cases where we have a two-complements equivalence but not a strict
> signedness equivalencies we end up generating both a signed and unsigned IV
> for
> the same candidate.
>
> This patch implements a new OEP flag called OEP_STRUCTURAL_EQ. This flag will
> check if the operands would produce the same bit values after the computations
> even if the final sign is different.
I think the name is badly chosen - we already have OEP_LEXICOGRAPHIC and
OEP_BITWISE. I would suggest to use OEP_ASSUME_TWOS_COMPLEMENT
or OEP_ASSUME_WRAPV.
> This happens quite a lot with fortran but can also happen in C because this
> came
> code is unable to figure out when one expression is a multiple of another.
>
> As an example in the attached testcase we get:
>
> Initial set of candidates:
> cost: 24 (complexity 3)
> reg_cost: 9
> cand_cost: 15
> cand_group_cost: 0 (complexity 3)
> candidates: 1, 6, 8
> group:0 --> iv_cand:6, cost=(0,1)
> group:1 --> iv_cand:1, cost=(0,0)
> group:2 --> iv_cand:8, cost=(0,1)
> group:3 --> iv_cand:8, cost=(0,1)
> invariant variables: 6
> invariant expressions: 1, 2
>
> <Invariant Expressions>:
> inv_expr 1: stride.3_27 * 4
> inv_expr 2: (unsigned long) stride.3_27 * 4
>
> These end up being used in the same group:
>
> Group 1:
> cand cost compl. inv.expr. inv.vars
> 1 0 0 NIL; 6
> 2 0 0 NIL; 6
> 3 0 0 NIL; 6
>
> which ends up with IV opts picking the signed and unsigned IVs:
>
> Improved to:
> cost: 24 (complexity 3)
> reg_cost: 9
> cand_cost: 15
> cand_group_cost: 0 (complexity 3)
> candidates: 1, 6, 8
> group:0 --> iv_cand:6, cost=(0,1)
> group:1 --> iv_cand:1, cost=(0,0)
> group:2 --> iv_cand:8, cost=(0,1)
> group:3 --> iv_cand:8, cost=(0,1)
> invariant variables: 6
> invariant expressions: 1, 2
>
> and so generates the same IV as both signed and unsigned:
>
> ;; basic block 21, loop depth 3, count 214748368 (estimated locally, freq
> 58.2545), maybe hot
> ;; prev block 28, next block 31, flags: (NEW, REACHABLE, VISITED)
> ;; pred: 28 [always] count:23622320 (estimated locally, freq
> 6.4080) (FALLTHRU,EXECUTABLE)
> ;; 25 [always] count:191126046 (estimated locally, freq
> 51.8465) (FALLTHRU,DFS_BACK,EXECUTABLE)
> # .MEM_66 = PHI <.MEM_34(28), .MEM_22(25)>
> # ivtmp.22_41 = PHI <0(28), ivtmp.22_82(25)>
> # ivtmp.26_51 = PHI <ivtmp.26_55(28), ivtmp.26_72(25)>
> # ivtmp.28_90 = PHI <ivtmp.28_99(28), ivtmp.28_98(25)>
>
> ...
>
> ;; basic block 24, loop depth 3, count 214748366 (estimated locally, freq
> 58.2545), maybe hot
> ;; prev block 22, next block 25, flags: (NEW, REACHABLE, VISITED)'
> ;; pred: 22 [always] count:95443719 (estimated locally, freq
> 25.8909) (FALLTHRU)
> ;; 21 [33.3% (guessed)] count:71582790 (estimated locally,
> freq 19.4182) (TRUE_VALUE,EXECUTABLE)
> ;; 31 [33.3% (guessed)] count:47721860 (estimated locally,
> freq 12.9455) (TRUE_VALUE,EXECUTABLE)
> # .MEM_22 = PHI <.MEM_44(22), .MEM_31(21), .MEM_79(31)>
> ivtmp.22_82 = ivtmp.22_41 + 1;
> ivtmp.26_72 = ivtmp.26_51 + _80;
> ivtmp.28_98 = ivtmp.28_90 + _39;
>
> These two IVs are always used as unsigned, so IV ops generates:
>
> _73 = stride.3_27 * 4;
> _80 = (unsigned long) _73;
> _54 = (unsigned long) stride.3_27;
> _39 = _54 * 4;
>
> Which means that in e.g. exchange2 we generate a lot of duplicate code.
>
> This is because candidate 6 and 8 are equivalent under two's complement but
> have
> different signs.
>
> This patch changes it so that if you have two IVs that are affine equivalent
> to
> just pick one over the other. IV already has code for this, so the patch just
> uses affine trees instead of tree for the check.
>
> With it we get:
>
> <Invariant Expressions>:
> inv_expr 1: stride.3_27 * 4
>
> <Group-candidate Costs>:
> Group 0:
> cand cost compl. inv.expr. inv.vars
> 5 0 2 NIL; NIL;
> 6 0 3 NIL; NIL;
>
> Group 1:
> cand cost compl. inv.expr. inv.vars
> 1 0 0 NIL; 6
> 2 0 0 NIL; 6
> 3 0 0 NIL; 6
> 4 0 0 NIL; 6
>
> Initial set of candidates:
> cost: 16 (complexity 3)
> reg_cost: 6
> cand_cost: 10
> cand_group_cost: 0 (complexity 3)
> candidates: 1, 6
> group:0 --> iv_cand:6, cost=(0,3)
> group:1 --> iv_cand:1, cost=(0,0)
> invariant variables: 6
> invariant expressions: 1
>
> The two patches together results in a 10% performance increase in exchange2 in
> SPECCPU 2017 and a 4% reduction in binary size and a 5% improvement in compile
> time. There's also a 5% performance improvement in fotonik3d and similar
> reduction in binary size.
>
> Bootstrapped Regtested on aarch64-none-linux-gnu,
> arm-none-linux-gnueabihf, x86_64-pc-linux-gnu
> -m32, -m64 and no issues.
>
> Ok for master?
>
> Thanks,
> Tamar
>
> gcc/ChangeLog:
>
> PR tree-optimization/114932
> * fold-const.cc (operand_compare::operand_equal_p): Use it.
> (operand_compare::verify_hash_value): Likewise.
> (test_operand_equality::test): New.
> (fold_const_cc_tests): Use it.
> * tree-core.h (enum operand_equal_flag): Add OEP_STRUCTURAL_EQ.
> * tree-ssa-loop-ivopts.cc (record_group_use): Check for structural eq.
>
> gcc/testsuite/ChangeLog:
>
> PR tree-optimization/114932
> * gfortran.dg/addressing-modes_2.f90: New test.
>
> ---
> diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc
> index
> 71e82b1d76d4106c7c23c54af8b35905a1af9f1c..52a4465de0355220befa6d6b3301e21e038e3e41
> 100644
> --- a/gcc/fold-const.cc
> +++ b/gcc/fold-const.cc
> @@ -3208,7 +3208,19 @@ operand_compare::operand_equal_p (tree type0,
> const_tree arg0,
> return tree_int_cst_equal (arg0, arg1);
> }
>
> - if (!(flags & OEP_ADDRESS_OF))
> + if ((flags & OEP_STRUCTURAL_EQ)
> + && (CONVERT_EXPR_P (arg0) || CONVERT_EXPR_P (arg1)))
> + {
> + const_tree t_arg0 = arg0;
> + const_tree t_arg1 = arg1;
> + STRIP_NOPS (arg0);
> + STRIP_NOPS (arg1);
> + /* Only recurse if the conversion was one that was valid to strip. */
> + if (t_arg0 != arg0 || t_arg1 != arg1)
> + return operand_equal_p (type0, arg0, type1, arg1, flags);
(*)
> + }
> +
> + if (!(flags & (OEP_ADDRESS_OF | OEP_STRUCTURAL_EQ)))
> {
> /* If both types don't have the same signedness, then we can't consider
> them equal. We must check this before the STRIP_NOPS calls
> @@ -3439,6 +3451,22 @@ operand_compare::operand_equal_p (tree type0,
> const_tree arg0,
> if (flags & (OEP_ONLY_CONST | OEP_BITWISE))
> return false;
>
> + /* Check if we are checking an operation where the two's compliment bitwise
> + representation of the result is not the same between signed and
> + unsigned arithmetic. */
> + switch (TREE_CODE (arg0))
> + {
> + case PLUS_EXPR:
> + case MINUS_EXPR:
> + case MULT_EXPR:
> + break;
> +
> + default:
> + /* Clear the structural comparison bit. */
> + flags &= ~OEP_STRUCTURAL_EQ;
So this handling means that a / (int)((unsigned)b - (unsigned)c)) and
a / (b - c) are not equal with respect to OEP_STRUCTURAL_EQ because
we drop that flag when processing the division and thus the recursion on
the minus operands doesn't have it any more.
So shouldn't this and (*) be integrated with the TYPE_UNSIGNED compare in
the !OEP_ADDRESS_OF case? Aka not require same signedness for
PLUS/MINUS/MULT/CONVERT?
I think we have to document what the TYPE, ARG combination means
semantically. As far as I understand TYPE doesn't specify the signedness
of the operation ARG but instead represents a possibly stripped conversion
of the result?
--
There's
/* Don't handle more cases for OEP_BITWISE, since we can't guarantee that
two instances of undefined behavior will give identical results. */
if (flags & (OEP_ONLY_CONST | OEP_BITWISE))
return false;
which is true - so I think we need to document what using OEP_STRUCTURAL_EQ
means, in particular that it does not mean you can replace ARG0 with
ARG1 without
externally guaranteeing you are not introducing new UB. I do wonder about
say comparing (int)((unsigned)a + (unsigned)b) + (c + d) with
(a + b) + (int)((unsigned)c + (unsigned)d) for example - we can replace both
with a all-unsigned computation or replace one with the other when both would
have been always executed. I think this needs to be clearly documented.
> + break;
> + }
> +
> /* Define macros to test an operand from arg0 and arg1 for equality and a
> variant that allows null and views null as being different from any
> non-null value. In the latter case, if either is null, the both
> @@ -4218,7 +4246,7 @@ operand_compare::verify_hash_value (const_tree arg0,
> const_tree arg1,
> {
> if (operand_equal_p (arg0, arg1, flags | OEP_NO_HASH_CHECK))
> {
> - if (arg0 != arg1 && !(flags & OEP_DECL_NAME))
> + if (arg0 != arg1 && !(flags & (OEP_DECL_NAME | OEP_STRUCTURAL_EQ)))
> {
> inchash::hash hstate0 (0), hstate1 (0);
> hash_operand (arg0, hstate0, flags | OEP_HASH_CHECK);
> @@ -17357,6 +17385,95 @@ test_arithmetic_folding ()
> x);
> }
>
> +namespace test_operand_equality {
> +
> +/* Verify structural equality. */
> +
> +/* Execute fold_vec_perm_cst unit tests. */
> +
> +static void
> +test ()
> +{
> + tree stype = integer_type_node;
> + tree utype = unsigned_type_node;
> + tree x = create_tmp_var_raw (stype, "x");
> + tree y = create_tmp_var_raw (stype, "y");
> + tree z = create_tmp_var_raw (stype, "z");
> + tree four = build_int_cst (stype, 4);
> + tree lhs1 = fold_build2 (PLUS_EXPR, stype, x, y);
> + tree rhs1 = fold_convert (stype,
> + fold_build2 (PLUS_EXPR, utype,
> + fold_convert (utype, x),
> + fold_convert (utype, y)));
> +
> + /* (int)((unsigned x) + (unsigned y)) == x + y. */
> + ASSERT_TRUE (operand_equal_p (lhs1, rhs1, OEP_STRUCTURAL_EQ));
> + ASSERT_FALSE (operand_equal_p (lhs1, rhs1, 0));
> +
> + /* (int)(unsigned) x == x. */
> + tree lhs2 = build1 (NOP_EXPR, stype,
> + build1 (NOP_EXPR, utype, x));
> + tree rhs2 = x;
> + ASSERT_TRUE (operand_equal_p (lhs2, rhs2, OEP_STRUCTURAL_EQ));
> + ASSERT_TRUE (operand_equal_p (lhs2, rhs2, 0));
That's not too useful - we'd never see this due to folding.
> +
> + /* (unsigned x) + (unsigned y) == x + y. */
> + tree lhs3 = lhs1;
> + tree rhs3 = fold_build2 (PLUS_EXPR, utype,
> + fold_convert (utype, x),
> + fold_convert (utype, y));
> + ASSERT_TRUE (operand_equal_p (lhs3, rhs3, OEP_STRUCTURAL_EQ));
> + ASSERT_FALSE (operand_equal_p (lhs3, rhs3, 0));
> +
> + /* (unsigned x) / (unsigned y) == x / y. */
> + tree lhs4 = fold_build2 (TRUNC_DIV_EXPR, stype, x, y);;
> + tree rhs4 = fold_build2 (TRUNC_DIV_EXPR, utype,
> + fold_convert (utype, x),
> + fold_convert (utype, y));
> + ASSERT_FALSE (operand_equal_p (lhs4, rhs4, OEP_STRUCTURAL_EQ));
> + ASSERT_FALSE (operand_equal_p (lhs4, rhs4, 0));
> +
> + /* (unsigned x) / 4 == x / 4. */
> + tree lhs5 = fold_build2 (TRUNC_DIV_EXPR, stype, x, four);;
> + tree rhs5 = fold_build2 (TRUNC_DIV_EXPR, utype,
> + fold_convert (utype, x),
> + fold_convert (utype, four));
> + ASSERT_FALSE (operand_equal_p (lhs5, rhs5, OEP_STRUCTURAL_EQ));
> + ASSERT_FALSE (operand_equal_p (lhs5, rhs5, 0));
> +
> + /* (unsigned x) + 4 == x + 4. */
> + tree lhs6 = fold_build2 (PLUS_EXPR, stype, x, four);
> + tree rhs6 = fold_build2 (PLUS_EXPR, utype,
> + fold_convert (utype, x),
> + fold_convert (utype, four));
> + ASSERT_TRUE (operand_equal_p (lhs6, rhs6, OEP_STRUCTURAL_EQ));
> + ASSERT_FALSE (operand_equal_p (lhs6, rhs6, 0));
> +
> + /* (unsigned x) + 4 == 4 + x. */
> + tree lhs7 = fold_build2 (PLUS_EXPR, stype, four, x);
> + tree rhs7 = fold_build2 (PLUS_EXPR, utype,
> + fold_convert (utype, x),
> + fold_convert (utype, four));
> + ASSERT_TRUE (operand_equal_p (lhs7, rhs7, OEP_STRUCTURAL_EQ));
> + ASSERT_FALSE (operand_equal_p (lhs7, rhs7, 0));
> +
> + /* ((unsigned x) + 4) * (unsigned y)) + z == ((4 + x) * y) + z. */
> + tree lhs8 = fold_build2 (PLUS_EXPR, stype,
> + fold_build2 (MULT_EXPR, stype,
> + fold_build2 (PLUS_EXPR, stype, four,
> x),
> + y),
> + z);
> + tree rhs8 = fold_build2 (MULT_EXPR, utype,
> + fold_build2 (PLUS_EXPR, utype,
> + fold_convert (utype, x),
> + fold_convert (utype, four)),
> + fold_convert (utype, y));
> + rhs8 = fold_build2 (PLUS_EXPR, stype, fold_convert (stype, rhs8), z);
> + ASSERT_TRUE (operand_equal_p (lhs8, rhs8, OEP_STRUCTURAL_EQ));
> + ASSERT_FALSE (operand_equal_p (lhs8, rhs8, 0));
> +}
> +}
> +
> namespace test_fold_vec_perm_cst {
>
> /* Build a VECTOR_CST corresponding to VMODE, and has
> @@ -18150,6 +18267,7 @@ fold_const_cc_tests ()
> test_vector_folding ();
> test_vec_duplicate_folding ();
> test_fold_vec_perm_cst::test ();
> + test_operand_equality::test ();
> }
>
> } // namespace selftest
> diff --git a/gcc/testsuite/gfortran.dg/addressing-modes_2.f90
> b/gcc/testsuite/gfortran.dg/addressing-modes_2.f90
> new file mode 100644
> index
> 0000000000000000000000000000000000000000..8eee4be3dc4d69fecfacd4c2e24a4973c8539fae
> --- /dev/null
> +++ b/gcc/testsuite/gfortran.dg/addressing-modes_2.f90
> @@ -0,0 +1,20 @@
> +! { dg-do compile { target aarch64-*-* } }
> +! { dg-additional-options "-w -Ofast -fdump-tree-ivopts-all" }
> +
> +module a
> +integer, parameter :: b=3, c=b
> +contains
> +subroutine d(block)
> +integer f, col , block(:, :, :), e
> +do f = 1, c
> + do col = 1, c
> + block(:f, :, e()) = do
> + end do
> + end do
> + end
> + end
> +
> +! { dg-final { scan-tree-dump-not {Selected IV set for loop .+ niters, 3
> IVs:} ivopts } }
> +! { dg-final { scan-tree-dump-times {Selected IV set for loop .+ niters, 2
> IVs:} 1 ivopts } }
> +! { dg-final { scan-tree-dump-times {Selected IV set for loop .+ niters, 1
> IVs:} 1 ivopts } }
> +
> diff --git a/gcc/tree-core.h b/gcc/tree-core.h
> index
> 27c569c77022227643151fa4a909a95f3d45bf55..fe398d99bcaa9e8fcec45e5e07d133bc393aa4a7
> 100644
> --- a/gcc/tree-core.h
> +++ b/gcc/tree-core.h
> @@ -962,7 +962,9 @@ enum operand_equal_flag {
> /* In conjunction with OEP_LEXICOGRAPHIC considers names of declarations
> of the same kind. Used to compare VLA bounds involving parameters
> across redeclarations of the same function. */
> - OEP_DECL_NAME = 512
> + OEP_DECL_NAME = 512,
> + /* Check for structural equivalence and ignore NOP_CONVERT casts. */
This should better document the assumptions - it treats expressions producing
the same bit value even though they differ in sign the same(?), and it ignores
differences in UB(?).
OEP_BITWISE is a bit of an odd thing here, it seems to require type
compatibility
for constants in particular - something operand_equal_p usually doesn't. But
while ignoring sign-conversions there it doesn't elsewhere (it
presents itself as
stricter than the default). I'd have called the new mode OEP_BITWISE if that
were not already taken ...
> + OEP_STRUCTURAL_EQ = 1024
> };
>
> /* Enum and arrays used for tree allocation stats.
> diff --git a/gcc/tree-ssa-loop-ivopts.cc b/gcc/tree-ssa-loop-ivopts.cc
> index
> c3218a3e8eedbb8d0a7f14c01eeb069cb6024c29..b05e4ac1e63b5c110707a8a3ed5e614b7ccc702f
> 100644
> --- a/gcc/tree-ssa-loop-ivopts.cc
> +++ b/gcc/tree-ssa-loop-ivopts.cc
> @@ -1623,8 +1623,8 @@ record_group_use (struct ivopts_data *data, tree *use_p,
>
> /* Check if it has the same stripped base and step. */
> if (operand_equal_p (iv->base_object, use->iv->base_object, 0)
> - && operand_equal_p (iv->step, use->iv->step, 0)
> - && operand_equal_p (addr_base, use->addr_base, 0))
> + && operand_equal_p (iv->step, use->iv->step, OEP_STRUCTURAL_EQ)
> + && operand_equal_p (addr_base, use->addr_base,
> OEP_STRUCTURAL_EQ))
> break;
> }
> if (i == data->vgroups.length ())
>
>
>
>
> --