Re: [PATCH 0/3][POPCOUNT]
Hi Jeff, Thanks for looking into it. On 6 July 2018 at 08:03, Jeff Law wrote: > On 06/24/2018 08:41 PM, Kugan Vivekanandarajah wrote: >> Hi Jeff, >> >> Thanks for the comments. >> >> On 23 June 2018 at 02:06, Jeff Law wrote: >>> On 06/22/2018 03:11 AM, Kugan Vivekanandarajah wrote: When we set niter with maybe_zero, currently final_value_relacement will not happen due to expression_expensive_p not handling. Patch 1 adds this. With that we have the following optimized gimple. [local count: 118111601]: if (b_4(D) != 0) goto ; [89.00%] else goto ; [11.00%] [local count: 105119324]: _2 = (unsigned long) b_4(D); _9 = __builtin_popcountl (_2); c_3 = b_4(D) != 0 ? _9 : 1; [local count: 118111601]: # c_12 = PHI I assume that 1 in b_4(D) != 0 ? _9 : 1; is OK (?) because when the latch execute zero times for b_4 == 0 means that the body will execute ones. >>> ISTM that DOM ought to have simplified the conditional, unless there's >>> some other way to get to bb3. We know that b_4 is nonzero and thus c_3 >>> must have the value _9. >> As of now, dom is not optimizing it. With the attached hack, it can be made >> to. > What's strange is I'm not getting the c_3 = (b_4 != 0) ... in any of the > dumps I'm looking at. Instead it's c_3 = _9, which is what I would > expect since we know that b_4 != 0 > > > My tests have been on x86_64 and aarch64 linux targets. I've tried with > patch#1 installed as well as with patch #1 and patch #2 together. > > What target, what flags and what patches do I need to see this? You need the patch 1 (attaching) to get that. With Patch 2 in this series, it will be optimized. I haven't committed the patches yet as I am testing all the three patches. I will commit after testing on current trunk. Thanks, Kugan > > Jeff From 12263df77931aa55d205b9db470436848d762684 Mon Sep 17 00:00:00 2001 From: Kugan Vivekanandarajah Date: Fri, 22 Jun 2018 14:10:26 +1000 Subject: [PATCH 1/3] generate popcount when checked for zero Change-Id: I951e6d487268b757cbdaa8dcf671ab1377490db6 --- gcc/gimplify.c | 2 +- gcc/gimplify.h | 1 + gcc/testsuite/gcc.dg/tree-ssa/pr64183.c | 2 +- gcc/testsuite/gcc.target/i386/pr85073.c | 2 +- gcc/tree-scalar-evolution.c | 12 5 files changed, 16 insertions(+), 3 deletions(-) diff --git a/gcc/gimplify.c b/gcc/gimplify.c index 48ac92e..c86ad1a 100644 --- a/gcc/gimplify.c +++ b/gcc/gimplify.c @@ -3878,7 +3878,7 @@ gimplify_pure_cond_expr (tree *expr_p, gimple_seq *pre_p) EXPR is GENERIC, while tree_could_trap_p can be called only on GIMPLE. */ -static bool +bool generic_expr_could_trap_p (tree expr) { unsigned i, n; diff --git a/gcc/gimplify.h b/gcc/gimplify.h index dd0e4c0..62ca869 100644 --- a/gcc/gimplify.h +++ b/gcc/gimplify.h @@ -83,6 +83,7 @@ extern enum gimplify_status gimplify_arg (tree *, gimple_seq *, location_t, extern void gimplify_function_tree (tree); extern enum gimplify_status gimplify_va_arg_expr (tree *, gimple_seq *, gimple_seq *); +extern bool generic_expr_could_trap_p (tree expr); gimple *gimplify_assign (tree, tree, gimple_seq *); #endif /* GCC_GIMPLIFY_H */ diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr64183.c b/gcc/testsuite/gcc.dg/tree-ssa/pr64183.c index 7a854fc..50d0c5a 100644 --- a/gcc/testsuite/gcc.dg/tree-ssa/pr64183.c +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr64183.c @@ -1,5 +1,5 @@ /* { dg-do compile } */ -/* { dg-options "-O3 -fno-tree-vectorize -fdump-tree-cunroll-details" } */ +/* { dg-options "-O3 -fno-tree-vectorize -fdisable-tree-sccp -fdump-tree-cunroll-details" } */ int bits; unsigned int size; diff --git a/gcc/testsuite/gcc.target/i386/pr85073.c b/gcc/testsuite/gcc.target/i386/pr85073.c index 187102d..71a5d23 100644 --- a/gcc/testsuite/gcc.target/i386/pr85073.c +++ b/gcc/testsuite/gcc.target/i386/pr85073.c @@ -1,6 +1,6 @@ /* PR target/85073 */ /* { dg-do compile } */ -/* { dg-options "-O2 -mbmi" } */ +/* { dg-options "-O2 -mbmi -fdisable-tree-sccp" } */ int foo (unsigned x) diff --git a/gcc/tree-scalar-evolution.c b/gcc/tree-scalar-evolution.c index 4b0ec02..8e29005 100644 --- a/gcc/tree-scalar-evolution.c +++ b/gcc/tree-scalar-evolution.c @@ -3508,6 +3508,18 @@ expression_expensive_p (tree expr) return false; } + if (code == COND_EXPR) +return (expression_expensive_p (TREE_OPERAND (expr, 0)) + || (EXPR_P (TREE_OPERAND (expr, 1)) + && EXPR_P (TREE_OPERAND (expr, 2))) + /* If either branch has side effects or could trap. */ + || TREE_SIDE_EFFECTS (TREE_OPERAND (expr, 1)) + || generic_expr_could_trap_p (TREE_OPERAND (expr, 1)) + || TREE_SIDE_EFFECTS (TREE_OPERAND (expr, 0)) + || generic_expr_could_trap_p (TREE_OPERAND (expr, 0)) + || expression_expensive_p (TREE_OPERAND (expr, 1)) + || expression_expensive_p (TREE_OPERAND
Re: [PATCH 0/3][POPCOUNT]
On 07/05/2018 11:39 PM, Richard Biener wrote: > On July 6, 2018 12:03:11 AM GMT+02:00, Jeff Law wrote: >> On 06/24/2018 08:41 PM, Kugan Vivekanandarajah wrote: >>> Hi Jeff, >>> >>> Thanks for the comments. >>> >>> On 23 June 2018 at 02:06, Jeff Law wrote: On 06/22/2018 03:11 AM, Kugan Vivekanandarajah wrote: > When we set niter with maybe_zero, currently final_value_relacement > will not happen due to expression_expensive_p not handling. Patch 1 > adds this. > > With that we have the following optimized gimple. > >[local count: 118111601]: > if (b_4(D) != 0) > goto ; [89.00%] > else > goto ; [11.00%] > >[local count: 105119324]: > _2 = (unsigned long) b_4(D); > _9 = __builtin_popcountl (_2); > c_3 = b_4(D) != 0 ? _9 : 1; > >[local count: 118111601]: > # c_12 = PHI > > I assume that 1 in b_4(D) != 0 ? _9 : 1; is OK (?) because when >> the > latch execute zero times for b_4 == 0 means that the body will >> execute > ones. ISTM that DOM ought to have simplified the conditional, unless >> there's some other way to get to bb3. We know that b_4 is nonzero and thus >> c_3 must have the value _9. >>> As of now, dom is not optimizing it. With the attached hack, it can >> be made to. >> What's strange is I'm not getting the c_3 = (b_4 != 0) ... in any of >> the >> dumps I'm looking at. Instead it's c_3 = _9, which is what I would >> expect since we know that b_4 != 0 >> >> >> My tests have been on x86_64 and aarch64 linux targets. I've tried >> with >> patch#1 installed as well as with patch #1 and patch #2 together. >> >> What target, what flags and what patches do I need to see this? > > I believe it has been fixed in niters analysis to avoid the condition if it > is known to be true. Ah. Presumably the change from 6-16. I'll back that out and retry. jeff
Re: [PATCH 0/3][POPCOUNT]
On July 6, 2018 12:03:11 AM GMT+02:00, Jeff Law wrote: >On 06/24/2018 08:41 PM, Kugan Vivekanandarajah wrote: >> Hi Jeff, >> >> Thanks for the comments. >> >> On 23 June 2018 at 02:06, Jeff Law wrote: >>> On 06/22/2018 03:11 AM, Kugan Vivekanandarajah wrote: When we set niter with maybe_zero, currently final_value_relacement will not happen due to expression_expensive_p not handling. Patch 1 adds this. With that we have the following optimized gimple. [local count: 118111601]: if (b_4(D) != 0) goto ; [89.00%] else goto ; [11.00%] [local count: 105119324]: _2 = (unsigned long) b_4(D); _9 = __builtin_popcountl (_2); c_3 = b_4(D) != 0 ? _9 : 1; [local count: 118111601]: # c_12 = PHI I assume that 1 in b_4(D) != 0 ? _9 : 1; is OK (?) because when >the latch execute zero times for b_4 == 0 means that the body will >execute ones. >>> ISTM that DOM ought to have simplified the conditional, unless >there's >>> some other way to get to bb3. We know that b_4 is nonzero and thus >c_3 >>> must have the value _9. >> As of now, dom is not optimizing it. With the attached hack, it can >be made to. >What's strange is I'm not getting the c_3 = (b_4 != 0) ... in any of >the >dumps I'm looking at. Instead it's c_3 = _9, which is what I would >expect since we know that b_4 != 0 > > >My tests have been on x86_64 and aarch64 linux targets. I've tried >with >patch#1 installed as well as with patch #1 and patch #2 together. > >What target, what flags and what patches do I need to see this? I believe it has been fixed in niters analysis to avoid the condition if it is known to be true. Richard. >Jeff
Re: [PATCH 0/3][POPCOUNT]
On 06/24/2018 08:41 PM, Kugan Vivekanandarajah wrote: > Hi Jeff, > > Thanks for the comments. > > On 23 June 2018 at 02:06, Jeff Law wrote: >> On 06/22/2018 03:11 AM, Kugan Vivekanandarajah wrote: >>> When we set niter with maybe_zero, currently final_value_relacement >>> will not happen due to expression_expensive_p not handling. Patch 1 >>> adds this. >>> >>> With that we have the following optimized gimple. >>> >>>[local count: 118111601]: >>> if (b_4(D) != 0) >>> goto ; [89.00%] >>> else >>> goto ; [11.00%] >>> >>>[local count: 105119324]: >>> _2 = (unsigned long) b_4(D); >>> _9 = __builtin_popcountl (_2); >>> c_3 = b_4(D) != 0 ? _9 : 1; >>> >>>[local count: 118111601]: >>> # c_12 = PHI >>> >>> I assume that 1 in b_4(D) != 0 ? _9 : 1; is OK (?) because when the >>> latch execute zero times for b_4 == 0 means that the body will execute >>> ones. >> ISTM that DOM ought to have simplified the conditional, unless there's >> some other way to get to bb3. We know that b_4 is nonzero and thus c_3 >> must have the value _9. > As of now, dom is not optimizing it. With the attached hack, it can be made > to. What's strange is I'm not getting the c_3 = (b_4 != 0) ... in any of the dumps I'm looking at. Instead it's c_3 = _9, which is what I would expect since we know that b_4 != 0 My tests have been on x86_64 and aarch64 linux targets. I've tried with patch#1 installed as well as with patch #1 and patch #2 together. What target, what flags and what patches do I need to see this? Jeff
Re: [PATCH 0/3][POPCOUNT]
On 06/24/2018 08:41 PM, Kugan Vivekanandarajah wrote: > Hi Jeff, > > Thanks for the comments. > > On 23 June 2018 at 02:06, Jeff Law wrote: >> On 06/22/2018 03:11 AM, Kugan Vivekanandarajah wrote: >>> When we set niter with maybe_zero, currently final_value_relacement >>> will not happen due to expression_expensive_p not handling. Patch 1 >>> adds this. >>> >>> With that we have the following optimized gimple. >>> >>>[local count: 118111601]: >>> if (b_4(D) != 0) >>> goto ; [89.00%] >>> else >>> goto ; [11.00%] >>> >>>[local count: 105119324]: >>> _2 = (unsigned long) b_4(D); >>> _9 = __builtin_popcountl (_2); >>> c_3 = b_4(D) != 0 ? _9 : 1; >>> >>>[local count: 118111601]: >>> # c_12 = PHI >>> >>> I assume that 1 in b_4(D) != 0 ? _9 : 1; is OK (?) because when the >>> latch execute zero times for b_4 == 0 means that the body will execute >>> ones. >> ISTM that DOM ought to have simplified the conditional, unless there's >> some other way to get to bb3. We know that b_4 is nonzero and thus c_3 >> must have the value _9. > As of now, dom is not optimizing it. With the attached hack, it can be made > to. Thanks. It's not as much of a hack as you might think. Right now there's two ways to get at the data we're looking for. One is to query the VRP engine like you've done. The other is to query the expression table which should have something like 1 = (b4 != 0) in it as that expression holds on the 2->3 edge which dominates bb3. One of the things I want to do this summer is integrate the VRP engine queries better into DOM and stop relying on adding conditionals into the expression table (which isn't as powerful and doesn't scale well). So your patch isn't a terrible hack. It's just not the well integrated solution I want to pull together :-) I think it was just last year when I started wondering if we were handling COND_EXPRs on the RHS of an assignment correctly, but it got lost on my stack of TODO items. It's good to have a testcase which confirms my suspicion that we're not handling them as well as we should. I've opened bz86339 to track this issue. Jeff
Re: [PATCH 0/3][POPCOUNT]
On Mon, Jun 25, 2018 at 11:30 AM Bin.Cheng wrote: > > On Mon, Jun 25, 2018 at 1:50 PM, Kugan Vivekanandarajah > wrote: > > Hi Bin, > > > > On 25 June 2018 at 13:56, Bin.Cheng wrote: > >> On Mon, Jun 25, 2018 at 11:37 AM, Kugan Vivekanandarajah > >> wrote: > >>> Hi Bin, > >>> > >>> Thanks for your comments. > >>> > >>> On 25 June 2018 at 11:15, Bin.Cheng wrote: > On Fri, Jun 22, 2018 at 5:11 PM, Kugan Vivekanandarajah > wrote: > > When we set niter with maybe_zero, currently final_value_relacement > > will not happen due to expression_expensive_p not handling. Patch 1 > > adds this. > > > > With that we have the following optimized gimple. > > > >[local count: 118111601]: > > if (b_4(D) != 0) > > goto ; [89.00%] > > else > > goto ; [11.00%] > > > >[local count: 105119324]: > > _2 = (unsigned long) b_4(D); > > _9 = __builtin_popcountl (_2); > > c_3 = b_4(D) != 0 ? _9 : 1; > > > >[local count: 118111601]: > > # c_12 = PHI > > > > I assume that 1 in b_4(D) != 0 ? _9 : 1; is OK (?) because when the > No, it doesn't make much sense. when b_4(D) == 0, the popcount > computed should be 0. Point is you can never get b_4(D) == 0 with > guard condition in basic block 2. So the result should simply be: > >>> > >>> When we do calculate niter (for the copy header case), we set > >>> may_be_zero as (which I think is correct) > >>> niter->may_be_zero = fold_build2 (EQ_EXPR, boolean_type_node, src, > >>> build_zero_cst > >>> (TREE_TYPE (src))); > >>> > >>> Then in final_value_replacement_loop (struct loop *loop) > >>> > >>> for the PHI stmt for which we are going to do the final value replacement, > >>> we analyze_scalar_evolution_in_loop which is POLYNOMIAL_CHREC. > >>> > >>> then we do > >>> compute_overall_effect_of_inner_loop (struct loop *loop, tree > >>> evolution_fn) > >>> > >>> where when we do chrec_apply to the polynomial_chrec with niter from > >>> popcount which also has the may_be_zero, we end up with the 1. > >>> Looking at this, I am not sure if this is wrong. May be I am missing > >>> something. > >> I think it is wrong. How could you get popcount == 1 when b_4(D) == > >> 0? Though it never happens in this case. > > > > We dont set popcount = 1. When we set niter for popcount pattern with > > niter->may_be_zero = fold_build2 (EQ_EXPR, boolean_type_node, src, > > build_zero_cst (TREE_TYPE (src))); > Hmm, I think this is unnecessary and causing the weird cond_expr in > following optimization. What happens if you simply set it to false? It is surely not unnecessary because we set it to non-NULL only when the result is _not_ simply popcount() but popcount()-1. All the above suggests that SCEV does sth wrong. Dumps show (set_nb_iterations_in_loop = b_4(D) != 0 ? (unsigned long) __builtin_popcountl ((unsigned long) b_4(D)) + 18446744073709551615 : 0)) (chrec_apply (varying_loop = 1 ) (chrec = {1, +, 1}_1) (x = b_4(D) != 0 ? (int) ((unsigned int) __builtin_popcountl ((unsigned long) b_4(D)) + 4294967295) : 0) (res = b_4(D) != 0 ? __builtin_popcountl ((unsigned long) b_4(D)) : 1)) ) this is because the analysis works on a header-copied loop and thus the IV for C is {1, +, 1} so this is all correct and to be simply optimized by following passes factoring in the range of b_4(D) which was tested to be _not_zero before. Richard. > Thanks, > bin > > > > Because of which, we have an niter in the final_value_replacement, we have > > (gdb) p debug_tree (niter) > > > type > size > > unit-size > > align:64 warn_if_not_align:0 symtab:0 alias-set -1 > > canonical-type 0x7694d1f8 precision:64 min > 0x7694a120 0> max > 18446744073709551615>> > > > > arg:0 > type > size > > unit-size > > align:8 warn_if_not_align:0 symtab:0 alias-set -1 > > canonical-type 0x76945b28 precision:1 min > 0x7694a048 0> max > > > > > arg:0 > 0x76945738 long int> > > visited var > > def_stmt GIMPLE_NOPvolu > > version:4> > > arg:1 > > > arg:1 > > > > arg:0 > > > > arg:0 > 0x769455e8 int> > > > > fn > 0x76a55888> > > readonly constant arg:0 > 0x769ff600 __builtin_popcountl>> > > arg:0 > 0x7694d1f8> > > arg:0 >>> > > arg:1 > > > arg:2 > 0x7694d1f8> constant 0>> > > > > Then from there then we do compute_overall_effect_of_inner_loop for > > scalar evolution of PHI with niter we get the 1. > > > >>> > >>> In this testcase, before we enter the loop we have a check for (b_4(D) > 0). Thus, setting niter->may_be_zero is not strictly necessary but > >>> conservatively correct (?). > >> Yes, but not necessarily. Setting maybe_zero could con
Re: [PATCH 0/3][POPCOUNT]
On Mon, Jun 25, 2018 at 1:50 PM, Kugan Vivekanandarajah wrote: > Hi Bin, > > On 25 June 2018 at 13:56, Bin.Cheng wrote: >> On Mon, Jun 25, 2018 at 11:37 AM, Kugan Vivekanandarajah >> wrote: >>> Hi Bin, >>> >>> Thanks for your comments. >>> >>> On 25 June 2018 at 11:15, Bin.Cheng wrote: On Fri, Jun 22, 2018 at 5:11 PM, Kugan Vivekanandarajah wrote: > When we set niter with maybe_zero, currently final_value_relacement > will not happen due to expression_expensive_p not handling. Patch 1 > adds this. > > With that we have the following optimized gimple. > >[local count: 118111601]: > if (b_4(D) != 0) > goto ; [89.00%] > else > goto ; [11.00%] > >[local count: 105119324]: > _2 = (unsigned long) b_4(D); > _9 = __builtin_popcountl (_2); > c_3 = b_4(D) != 0 ? _9 : 1; > >[local count: 118111601]: > # c_12 = PHI > > I assume that 1 in b_4(D) != 0 ? _9 : 1; is OK (?) because when the No, it doesn't make much sense. when b_4(D) == 0, the popcount computed should be 0. Point is you can never get b_4(D) == 0 with guard condition in basic block 2. So the result should simply be: >>> >>> When we do calculate niter (for the copy header case), we set >>> may_be_zero as (which I think is correct) >>> niter->may_be_zero = fold_build2 (EQ_EXPR, boolean_type_node, src, >>> build_zero_cst >>> (TREE_TYPE (src))); >>> >>> Then in final_value_replacement_loop (struct loop *loop) >>> >>> for the PHI stmt for which we are going to do the final value replacement, >>> we analyze_scalar_evolution_in_loop which is POLYNOMIAL_CHREC. >>> >>> then we do >>> compute_overall_effect_of_inner_loop (struct loop *loop, tree evolution_fn) >>> >>> where when we do chrec_apply to the polynomial_chrec with niter from >>> popcount which also has the may_be_zero, we end up with the 1. >>> Looking at this, I am not sure if this is wrong. May be I am missing >>> something. >> I think it is wrong. How could you get popcount == 1 when b_4(D) == >> 0? Though it never happens in this case. > > We dont set popcount = 1. When we set niter for popcount pattern with > niter->may_be_zero = fold_build2 (EQ_EXPR, boolean_type_node, src, > build_zero_cst (TREE_TYPE (src))); Hmm, I think this is unnecessary and causing the weird cond_expr in following optimization. What happens if you simply set it to false? Thanks, bin > > Because of which, we have an niter in the final_value_replacement, we have > (gdb) p debug_tree (niter) > type size > unit-size > align:64 warn_if_not_align:0 symtab:0 alias-set -1 > canonical-type 0x7694d1f8 precision:64 min 0x7694a120 0> max 18446744073709551615>> > > arg:0 type size > unit-size > align:8 warn_if_not_align:0 symtab:0 alias-set -1 > canonical-type 0x76945b28 precision:1 min 0x7694a048 0> max > > > arg:0 0x76945738 long int> > visited var > def_stmt GIMPLE_NOPvolu > version:4> > arg:1 > > arg:1 > > arg:0 > > arg:0 0x769455e8 int> > > fn 0x76a55888> > readonly constant arg:0 0x769ff600 __builtin_popcountl>> > arg:0 0x7694d1f8> > arg:0 >>> > arg:1 > > arg:2 0x7694d1f8> constant 0>> > > Then from there then we do compute_overall_effect_of_inner_loop for > scalar evolution of PHI with niter we get the 1. > >>> >>> In this testcase, before we enter the loop we have a check for (b_4(D) 0). Thus, setting niter->may_be_zero is not strictly necessary but >>> conservatively correct (?). >> Yes, but not necessarily. Setting maybe_zero could confuse following >> optimizations and we should avoid doing that whenever possible. If >> any pass goes wrong because it's not set conservatively, it is that >> pass' responsibility and should be fixed accordingly. Here IMHO, we >> don't need to set it. > > My patch 2 is for not setting this when we know know a_4(D) is not > zero in this path. > > Thanks, > Kugan > > > > >> >> Thanks, >> bin >>> >>> Thanks, >>> Kugan >>> >[local count: 118111601]: > if (b_4(D) != 0) > goto ; [89.00%] > else > goto ; [11.00%] > >[local count: 105119324]: > _2 = (unsigned long) b_4(D); > c_3 = __builtin_popcountl (_2); > >[local count: 118111601]: > # c_12 = PHI I think this is the code generated if maybe_zero is not set? which it should not be set here. For the same reason, it can be further optimized into: >[local count: 118111601]: > _2 = (unsigned long) b_4(D); > c_12 = __builtin_popcountl (_2); > > latch execute zero times for b_4 == 0 means that the body will execute >>
Re: [PATCH 0/3][POPCOUNT]
Hi Bin, On 25 June 2018 at 13:56, Bin.Cheng wrote: > On Mon, Jun 25, 2018 at 11:37 AM, Kugan Vivekanandarajah > wrote: >> Hi Bin, >> >> Thanks for your comments. >> >> On 25 June 2018 at 11:15, Bin.Cheng wrote: >>> On Fri, Jun 22, 2018 at 5:11 PM, Kugan Vivekanandarajah >>> wrote: When we set niter with maybe_zero, currently final_value_relacement will not happen due to expression_expensive_p not handling. Patch 1 adds this. With that we have the following optimized gimple. [local count: 118111601]: if (b_4(D) != 0) goto ; [89.00%] else goto ; [11.00%] [local count: 105119324]: _2 = (unsigned long) b_4(D); _9 = __builtin_popcountl (_2); c_3 = b_4(D) != 0 ? _9 : 1; [local count: 118111601]: # c_12 = PHI I assume that 1 in b_4(D) != 0 ? _9 : 1; is OK (?) because when the >>> No, it doesn't make much sense. when b_4(D) == 0, the popcount >>> computed should be 0. Point is you can never get b_4(D) == 0 with >>> guard condition in basic block 2. So the result should simply be: >> >> When we do calculate niter (for the copy header case), we set >> may_be_zero as (which I think is correct) >> niter->may_be_zero = fold_build2 (EQ_EXPR, boolean_type_node, src, >> build_zero_cst >> (TREE_TYPE (src))); >> >> Then in final_value_replacement_loop (struct loop *loop) >> >> for the PHI stmt for which we are going to do the final value replacement, >> we analyze_scalar_evolution_in_loop which is POLYNOMIAL_CHREC. >> >> then we do >> compute_overall_effect_of_inner_loop (struct loop *loop, tree evolution_fn) >> >> where when we do chrec_apply to the polynomial_chrec with niter from >> popcount which also has the may_be_zero, we end up with the 1. >> Looking at this, I am not sure if this is wrong. May be I am missing >> something. > I think it is wrong. How could you get popcount == 1 when b_4(D) == > 0? Though it never happens in this case. We dont set popcount = 1. When we set niter for popcount pattern with niter->may_be_zero = fold_build2 (EQ_EXPR, boolean_type_node, src, build_zero_cst (TREE_TYPE (src))); Because of which, we have an niter in the final_value_replacement, we have (gdb) p debug_tree (niter) unit-size align:64 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type 0x7694d1f8 precision:64 min max > arg:0 unit-size align:8 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type 0x76945b28 precision:1 min max > arg:0 visited var def_stmt GIMPLE_NOPvolu version:4> arg:1 > arg:1 arg:0 arg:0 fn readonly constant arg:0 > arg:0 arg:0 >>> arg:1 > arg:2 constant 0>> Then from there then we do compute_overall_effect_of_inner_loop for scalar evolution of PHI with niter we get the 1. >> >> In this testcase, before we enter the loop we have a check for (b_4(D) >>> 0). Thus, setting niter->may_be_zero is not strictly necessary but >> conservatively correct (?). > Yes, but not necessarily. Setting maybe_zero could confuse following > optimizations and we should avoid doing that whenever possible. If > any pass goes wrong because it's not set conservatively, it is that > pass' responsibility and should be fixed accordingly. Here IMHO, we > don't need to set it. My patch 2 is for not setting this when we know know a_4(D) is not zero in this path. Thanks, Kugan > > Thanks, > bin >> >> Thanks, >> Kugan >> >>> [local count: 118111601]: if (b_4(D) != 0) goto ; [89.00%] else goto ; [11.00%] [local count: 105119324]: _2 = (unsigned long) b_4(D); c_3 = __builtin_popcountl (_2); [local count: 118111601]: # c_12 = PHI >>> >>> I think this is the code generated if maybe_zero is not set? which it >>> should not be set here. >>> For the same reason, it can be further optimized into: >>> [local count: 118111601]: _2 = (unsigned long) b_4(D); c_12 = __builtin_popcountl (_2); >>> latch execute zero times for b_4 == 0 means that the body will execute ones. >>> You never get zero times latch here with the aforementioned guard condition. >>> >>> BTW, I didn't look at following patches which could be wanted optimizations. >>> >>> Thanks, >>> bin The issue here is, since we are checking if (b_4(D) != 0) before entering the loop means we don't need to set maybe_zero. Patch 2 handles this. With that we have [local count: 118111601]: if (b_4(D) != 0) goto ; [89.00%] else goto ; [11.00%] [local count: 105119324]: _2 = (unsigned long) b_4(D); _9 = __builtin_popcountl (_2);
Re: [PATCH 0/3][POPCOUNT]
On Mon, Jun 25, 2018 at 11:37 AM, Kugan Vivekanandarajah wrote: > Hi Bin, > > Thanks for your comments. > > On 25 June 2018 at 11:15, Bin.Cheng wrote: >> On Fri, Jun 22, 2018 at 5:11 PM, Kugan Vivekanandarajah >> wrote: >>> When we set niter with maybe_zero, currently final_value_relacement >>> will not happen due to expression_expensive_p not handling. Patch 1 >>> adds this. >>> >>> With that we have the following optimized gimple. >>> >>>[local count: 118111601]: >>> if (b_4(D) != 0) >>> goto ; [89.00%] >>> else >>> goto ; [11.00%] >>> >>>[local count: 105119324]: >>> _2 = (unsigned long) b_4(D); >>> _9 = __builtin_popcountl (_2); >>> c_3 = b_4(D) != 0 ? _9 : 1; >>> >>>[local count: 118111601]: >>> # c_12 = PHI >>> >>> I assume that 1 in b_4(D) != 0 ? _9 : 1; is OK (?) because when the >> No, it doesn't make much sense. when b_4(D) == 0, the popcount >> computed should be 0. Point is you can never get b_4(D) == 0 with >> guard condition in basic block 2. So the result should simply be: > > When we do calculate niter (for the copy header case), we set > may_be_zero as (which I think is correct) > niter->may_be_zero = fold_build2 (EQ_EXPR, boolean_type_node, src, > build_zero_cst > (TREE_TYPE (src))); > > Then in final_value_replacement_loop (struct loop *loop) > > for the PHI stmt for which we are going to do the final value replacement, > we analyze_scalar_evolution_in_loop which is POLYNOMIAL_CHREC. > > then we do > compute_overall_effect_of_inner_loop (struct loop *loop, tree evolution_fn) > > where when we do chrec_apply to the polynomial_chrec with niter from > popcount which also has the may_be_zero, we end up with the 1. > Looking at this, I am not sure if this is wrong. May be I am missing > something. I think it is wrong. How could you get popcount == 1 when b_4(D) == 0? Though it never happens in this case. > > In this testcase, before we enter the loop we have a check for (b_4(D) >> 0). Thus, setting niter->may_be_zero is not strictly necessary but > conservatively correct (?). Yes, but not necessarily. Setting maybe_zero could confuse following optimizations and we should avoid doing that whenever possible. If any pass goes wrong because it's not set conservatively, it is that pass' responsibility and should be fixed accordingly. Here IMHO, we don't need to set it. Thanks, bin > > Thanks, > Kugan > >> >>>[local count: 118111601]: >>> if (b_4(D) != 0) >>> goto ; [89.00%] >>> else >>> goto ; [11.00%] >>> >>>[local count: 105119324]: >>> _2 = (unsigned long) b_4(D); >>> c_3 = __builtin_popcountl (_2); >>> >>>[local count: 118111601]: >>> # c_12 = PHI >> >> I think this is the code generated if maybe_zero is not set? which it >> should not be set here. >> For the same reason, it can be further optimized into: >> >>>[local count: 118111601]: >>> _2 = (unsigned long) b_4(D); >>> c_12 = __builtin_popcountl (_2); >>> >> >>> latch execute zero times for b_4 == 0 means that the body will execute >>> ones. >> You never get zero times latch here with the aforementioned guard condition. >> >> BTW, I didn't look at following patches which could be wanted optimizations. >> >> Thanks, >> bin >>> >>> The issue here is, since we are checking if (b_4(D) != 0) before >>> entering the loop means we don't need to set maybe_zero. Patch 2 >>> handles this. >>> >>> With that we have >>>[local count: 118111601]: >>> if (b_4(D) != 0) >>> goto ; [89.00%] >>> else >>> goto ; [11.00%] >>> >>>[local count: 105119324]: >>> _2 = (unsigned long) b_4(D); >>> _9 = __builtin_popcountl (_2); >>> >>>[local count: 118111601]: >>> # c_12 = PHI <0(2), _9(3)> >>> >>> As advised earlier, patch 3 adds phiopt support to remove this. >>> >>> Bootstrap and regression testing are ongoing. >>> >>> Is this OK for trunk. >>> >>> Thanks, >>> Kugan
Re: [PATCH 0/3][POPCOUNT]
Hi Bin, Thanks for your comments. On 25 June 2018 at 11:15, Bin.Cheng wrote: > On Fri, Jun 22, 2018 at 5:11 PM, Kugan Vivekanandarajah > wrote: >> When we set niter with maybe_zero, currently final_value_relacement >> will not happen due to expression_expensive_p not handling. Patch 1 >> adds this. >> >> With that we have the following optimized gimple. >> >>[local count: 118111601]: >> if (b_4(D) != 0) >> goto ; [89.00%] >> else >> goto ; [11.00%] >> >>[local count: 105119324]: >> _2 = (unsigned long) b_4(D); >> _9 = __builtin_popcountl (_2); >> c_3 = b_4(D) != 0 ? _9 : 1; >> >>[local count: 118111601]: >> # c_12 = PHI >> >> I assume that 1 in b_4(D) != 0 ? _9 : 1; is OK (?) because when the > No, it doesn't make much sense. when b_4(D) == 0, the popcount > computed should be 0. Point is you can never get b_4(D) == 0 with > guard condition in basic block 2. So the result should simply be: When we do calculate niter (for the copy header case), we set may_be_zero as (which I think is correct) niter->may_be_zero = fold_build2 (EQ_EXPR, boolean_type_node, src, build_zero_cst (TREE_TYPE (src))); Then in final_value_replacement_loop (struct loop *loop) for the PHI stmt for which we are going to do the final value replacement, we analyze_scalar_evolution_in_loop which is POLYNOMIAL_CHREC. then we do compute_overall_effect_of_inner_loop (struct loop *loop, tree evolution_fn) where when we do chrec_apply to the polynomial_chrec with niter from popcount which also has the may_be_zero, we end up with the 1. Looking at this, I am not sure if this is wrong. May be I am missing something. In this testcase, before we enter the loop we have a check for (b_4(D) > 0). Thus, setting niter->may_be_zero is not strictly necessary but conservatively correct (?). Thanks, Kugan > >>[local count: 118111601]: >> if (b_4(D) != 0) >> goto ; [89.00%] >> else >> goto ; [11.00%] >> >>[local count: 105119324]: >> _2 = (unsigned long) b_4(D); >> c_3 = __builtin_popcountl (_2); >> >>[local count: 118111601]: >> # c_12 = PHI > > I think this is the code generated if maybe_zero is not set? which it > should not be set here. > For the same reason, it can be further optimized into: > >>[local count: 118111601]: >> _2 = (unsigned long) b_4(D); >> c_12 = __builtin_popcountl (_2); >> > >> latch execute zero times for b_4 == 0 means that the body will execute >> ones. > You never get zero times latch here with the aforementioned guard condition. > > BTW, I didn't look at following patches which could be wanted optimizations. > > Thanks, > bin >> >> The issue here is, since we are checking if (b_4(D) != 0) before >> entering the loop means we don't need to set maybe_zero. Patch 2 >> handles this. >> >> With that we have >>[local count: 118111601]: >> if (b_4(D) != 0) >> goto ; [89.00%] >> else >> goto ; [11.00%] >> >>[local count: 105119324]: >> _2 = (unsigned long) b_4(D); >> _9 = __builtin_popcountl (_2); >> >>[local count: 118111601]: >> # c_12 = PHI <0(2), _9(3)> >> >> As advised earlier, patch 3 adds phiopt support to remove this. >> >> Bootstrap and regression testing are ongoing. >> >> Is this OK for trunk. >> >> Thanks, >> Kugan
Re: [PATCH 0/3][POPCOUNT]
Hi Jeff, Thanks for the comments. On 23 June 2018 at 02:06, Jeff Law wrote: > On 06/22/2018 03:11 AM, Kugan Vivekanandarajah wrote: >> When we set niter with maybe_zero, currently final_value_relacement >> will not happen due to expression_expensive_p not handling. Patch 1 >> adds this. >> >> With that we have the following optimized gimple. >> >>[local count: 118111601]: >> if (b_4(D) != 0) >> goto ; [89.00%] >> else >> goto ; [11.00%] >> >>[local count: 105119324]: >> _2 = (unsigned long) b_4(D); >> _9 = __builtin_popcountl (_2); >> c_3 = b_4(D) != 0 ? _9 : 1; >> >>[local count: 118111601]: >> # c_12 = PHI >> >> I assume that 1 in b_4(D) != 0 ? _9 : 1; is OK (?) because when the >> latch execute zero times for b_4 == 0 means that the body will execute >> ones. > ISTM that DOM ought to have simplified the conditional, unless there's > some other way to get to bb3. We know that b_4 is nonzero and thus c_3 > must have the value _9. As of now, dom is not optimizing it. With the attached hack, it can be made to. > > >> >> The issue here is, since we are checking if (b_4(D) != 0) before >> entering the loop means we don't need to set maybe_zero. Patch 2 >> handles this. >> >> With that we have >>[local count: 118111601]: >> if (b_4(D) != 0) >> goto ; [89.00%] >> else >> goto ; [11.00%] >> >>[local count: 105119324]: >> _2 = (unsigned long) b_4(D); >> _9 = __builtin_popcountl (_2); >> >>[local count: 118111601]: >> # c_12 = PHI <0(2), _9(3)> >> >> As advised earlier, patch 3 adds phiopt support to remove this. > So if DOM or some other pass fixed up the assignment to c_3 I'd hope we > wouldn't set maybe_zero. > > So I'd like to start by first determining if we should have already > simplified the COND_EXPR in bb3. Do you have a testcase for that? Sorry, It is hidden in patch 3 (attaching now). You will need patch 1 as well. Thanks, Kugan > > > jeff diff --git a/gcc/tree-ssa-dom.c b/gcc/tree-ssa-dom.c index a6f176c..77ae7d1b 100644 --- a/gcc/tree-ssa-dom.c +++ b/gcc/tree-ssa-dom.c @@ -1991,6 +1991,25 @@ dom_opt_dom_walker::optimize_stmt (basic_block bb, gimple_stmt_iterator si) } } + if (gimple_code (stmt) == GIMPLE_ASSIGN + && gimple_assign_rhs_code (stmt) == COND_EXPR) + { + /* If this is a conditional stmt, see if we can optimize the +condition. */ + x_vr_values = evrp_range_analyzer.get_vr_values (); + tree exp = gimple_assign_rhs1 (stmt); + tree lhs = TREE_OPERAND (exp, 0); + tree rhs = TREE_OPERAND (exp, 1); + tree val = x_vr_values->vrp_evaluate_conditional (TREE_CODE (exp), + lhs, rhs, stmt); + if (is_gimple_min_invariant (val)) + { + gimple_assign_set_rhs1 (stmt, val); + update_stmt (stmt); + } + x_vr_values = NULL; + } + if (gimple_code (stmt) == GIMPLE_COND) { tree lhs = gimple_cond_lhs (stmt); int PopCount (long b) { int c = 0; while (b) { b &= b - 1; c++; } return c; }
Re: [PATCH 0/3][POPCOUNT]
On Fri, Jun 22, 2018 at 5:11 PM, Kugan Vivekanandarajah wrote: > When we set niter with maybe_zero, currently final_value_relacement > will not happen due to expression_expensive_p not handling. Patch 1 > adds this. > > With that we have the following optimized gimple. > >[local count: 118111601]: > if (b_4(D) != 0) > goto ; [89.00%] > else > goto ; [11.00%] > >[local count: 105119324]: > _2 = (unsigned long) b_4(D); > _9 = __builtin_popcountl (_2); > c_3 = b_4(D) != 0 ? _9 : 1; > >[local count: 118111601]: > # c_12 = PHI > > I assume that 1 in b_4(D) != 0 ? _9 : 1; is OK (?) because when the No, it doesn't make much sense. when b_4(D) == 0, the popcount computed should be 0. Point is you can never get b_4(D) == 0 with guard condition in basic block 2. So the result should simply be: >[local count: 118111601]: > if (b_4(D) != 0) > goto ; [89.00%] > else > goto ; [11.00%] > >[local count: 105119324]: > _2 = (unsigned long) b_4(D); > c_3 = __builtin_popcountl (_2); > >[local count: 118111601]: > # c_12 = PHI I think this is the code generated if maybe_zero is not set? which it should not be set here. For the same reason, it can be further optimized into: >[local count: 118111601]: > _2 = (unsigned long) b_4(D); > c_12 = __builtin_popcountl (_2); > > latch execute zero times for b_4 == 0 means that the body will execute > ones. You never get zero times latch here with the aforementioned guard condition. BTW, I didn't look at following patches which could be wanted optimizations. Thanks, bin > > The issue here is, since we are checking if (b_4(D) != 0) before > entering the loop means we don't need to set maybe_zero. Patch 2 > handles this. > > With that we have >[local count: 118111601]: > if (b_4(D) != 0) > goto ; [89.00%] > else > goto ; [11.00%] > >[local count: 105119324]: > _2 = (unsigned long) b_4(D); > _9 = __builtin_popcountl (_2); > >[local count: 118111601]: > # c_12 = PHI <0(2), _9(3)> > > As advised earlier, patch 3 adds phiopt support to remove this. > > Bootstrap and regression testing are ongoing. > > Is this OK for trunk. > > Thanks, > Kugan
Re: [PATCH 0/3][POPCOUNT]
On 06/22/2018 03:11 AM, Kugan Vivekanandarajah wrote: > When we set niter with maybe_zero, currently final_value_relacement > will not happen due to expression_expensive_p not handling. Patch 1 > adds this. > > With that we have the following optimized gimple. > >[local count: 118111601]: > if (b_4(D) != 0) > goto ; [89.00%] > else > goto ; [11.00%] > >[local count: 105119324]: > _2 = (unsigned long) b_4(D); > _9 = __builtin_popcountl (_2); > c_3 = b_4(D) != 0 ? _9 : 1; > >[local count: 118111601]: > # c_12 = PHI > > I assume that 1 in b_4(D) != 0 ? _9 : 1; is OK (?) because when the > latch execute zero times for b_4 == 0 means that the body will execute > ones. ISTM that DOM ought to have simplified the conditional, unless there's some other way to get to bb3. We know that b_4 is nonzero and thus c_3 must have the value _9. > > The issue here is, since we are checking if (b_4(D) != 0) before > entering the loop means we don't need to set maybe_zero. Patch 2 > handles this. > > With that we have >[local count: 118111601]: > if (b_4(D) != 0) > goto ; [89.00%] > else > goto ; [11.00%] > >[local count: 105119324]: > _2 = (unsigned long) b_4(D); > _9 = __builtin_popcountl (_2); > >[local count: 118111601]: > # c_12 = PHI <0(2), _9(3)> > > As advised earlier, patch 3 adds phiopt support to remove this. So if DOM or some other pass fixed up the assignment to c_3 I'd hope we wouldn't set maybe_zero. So I'd like to start by first determining if we should have already simplified the COND_EXPR in bb3. Do you have a testcase for that? jeff
[PATCH 0/3][POPCOUNT]
When we set niter with maybe_zero, currently final_value_relacement will not happen due to expression_expensive_p not handling. Patch 1 adds this. With that we have the following optimized gimple. [local count: 118111601]: if (b_4(D) != 0) goto ; [89.00%] else goto ; [11.00%] [local count: 105119324]: _2 = (unsigned long) b_4(D); _9 = __builtin_popcountl (_2); c_3 = b_4(D) != 0 ? _9 : 1; [local count: 118111601]: # c_12 = PHI I assume that 1 in b_4(D) != 0 ? _9 : 1; is OK (?) because when the latch execute zero times for b_4 == 0 means that the body will execute ones. The issue here is, since we are checking if (b_4(D) != 0) before entering the loop means we don't need to set maybe_zero. Patch 2 handles this. With that we have [local count: 118111601]: if (b_4(D) != 0) goto ; [89.00%] else goto ; [11.00%] [local count: 105119324]: _2 = (unsigned long) b_4(D); _9 = __builtin_popcountl (_2); [local count: 118111601]: # c_12 = PHI <0(2), _9(3)> As advised earlier, patch 3 adds phiopt support to remove this. Bootstrap and regression testing are ongoing. Is this OK for trunk. Thanks, Kugan