Re: [PATCH] PR tree-optimization/100081 - Limit depth of logical expression windback.
On 4/19/2021 1:40 AM, Jakub Jelinek via Gcc-patches wrote: On Sun, Apr 18, 2021 at 08:11:21PM -0400, Andrew MacLeod via Gcc-patches wrote: --- a/gcc/gimple-range-gori.cc +++ b/gcc/gimple-range-gori.cc @@ -29,6 +29,36 @@ along with GCC; see the file COPYING3. If not see #include "gimple-pretty-print.h" #include "gimple-range.h" +// Limit the nested depth thru logical expressions which GORI will build +// def chains. +#define LOGICAL_LIMIT 6 Such limits should be in params.def so that users can override them. +// Return TRUE if GS is a logical && or || expression. + +static inline bool +is_gimple_logical_p (const gimple *gs) +{ + // Look for boolean and/or condition. + if (gimple_code (gs) == GIMPLE_ASSIGN) if (is_gimple_assign (gs)) is the normal spelling of this check. But more importantly, isn't 6 too low for logicals, and wouldn't it be better to put a cap not on the number of seen logicals, but on how many SSA_NAMEs are handled in a query? Because it is easy to construct testcases where the logical limit would not trigger but the operands of comparisons used in logicals would be thousands of arithmetic/bitwise statements etc. And it isn't just artificial, we have various examples of generated tests in bugzilla that typically can't be compiled in reasonable time at -O2 and need to use -O1 and have huge basic blocks with very deep chains. FWIW, the DOM code which tries to do similar things has a 4 level recursion limit which seemed to catch the vast majority of cases. That translates into ~8 operands most of the time. So Andrew's check seems to be in the right ballpark (it's doing something slightly different, but I think it's close enough to be comparable). jeff
Re: [Patch] PR tree-optimization/100081 - Limit depth of logical expression windback.
On 4/19/21 4:18 AM, Richard Biener wrote: On Sun, Apr 18, 2021 at 4:40 AM Andrew MacLeod via Gcc-patches wrote: I also disable the logical cache since it isn't really doing anything any more. OK for trunk? Please make +#define LOGICAL_LIMIT 6 a --param (params.opt, documented in invoke.texi) OK with that change. Thanks, Richard. I presume this is still OK? This is based off the second variant of the patch which simply doesn't build the dependency chains thru too many logicals, along with: a) the removal of the logical cache like in the first patch, b) the suggested addition of the --param option for the constant, and c) Jakubs suggestion to use is_gimple-assign (). I'll check it in once it finishes the build/regression tests. Andrew commit 09b8f25e6e32534a904ce7bfe94ae97bf93d77c3 Author: Andrew MacLeod Date: Fri Apr 16 17:08:51 2021 -0400 Limit depth of logical expression windback. Limit how many logical expressions GORI will look back through when evaluating outgoing edge range. PR tree-optimization/100081 * gimple_range_cache.h (ranger_cache): Inherit from gori_compute rather than gori_compute_cache. * gimple_range_gori.cc (is_gimple_logical_p): Move to top of file. (range_def_chain::m_logical_depth): New member. (range_def_chain::range_def_chain): Initialize m_logical_depth. (range_def_chain::get_def_chain): Don't build defchains through more than LOGICAL_LIMIT logical expressions. * params.opt (param_ranger_logical_depth): New. diff --git a/gcc/gimple-range-cache.h b/gcc/gimple-range-cache.h index c98e9871734..2b36a02654b 100644 --- a/gcc/gimple-range-cache.h +++ b/gcc/gimple-range-cache.h @@ -87,7 +87,7 @@ private: // them available for gori-computes to query so outgoing edges can be // properly calculated. -class ranger_cache : public gori_compute_cache +class ranger_cache : public gori_compute { public: ranger_cache (class gimple_ranger ); diff --git a/gcc/gimple-range-gori.cc b/gcc/gimple-range-gori.cc index 7f7f3dc0d69..420282deb2d 100644 --- a/gcc/gimple-range-gori.cc +++ b/gcc/gimple-range-gori.cc @@ -29,6 +29,32 @@ along with GCC; see the file COPYING3. If not see #include "gimple-pretty-print.h" #include "gimple-range.h" +// Return TRUE if GS is a logical && or || expression. + +static inline bool +is_gimple_logical_p (const gimple *gs) +{ + // Look for boolean and/or condition. + if (is_gimple_assign (gs)) +switch (gimple_expr_code (gs)) + { + case TRUTH_AND_EXPR: + case TRUTH_OR_EXPR: + return true; + + case BIT_AND_EXPR: + case BIT_IOR_EXPR: + // Bitwise operations on single bits are logical too. + if (types_compatible_p (TREE_TYPE (gimple_assign_rhs1 (gs)), + boolean_type_node)) + return true; + break; + + default: + break; + } + return false; +} /* RANGE_DEF_CHAIN is used to determine what SSA names in a block can have range information calculated for them, and what the @@ -76,6 +102,7 @@ public: private: vec m_def_chain; // SSA_NAME : def chain components. void build_def_chain (tree name, bitmap result, basic_block bb); + int m_logical_depth; }; @@ -85,6 +112,7 @@ range_def_chain::range_def_chain () { m_def_chain.create (0); m_def_chain.safe_grow_cleared (num_ssa_names); + m_logical_depth = 0; } // Destruct a range_def_chain. @@ -157,6 +185,7 @@ range_def_chain::get_def_chain (tree name) { tree ssa1, ssa2, ssa3; unsigned v = SSA_NAME_VERSION (name); + bool is_logical = false; // If it has already been processed, just return the cached value. if (has_def_chain (name)) @@ -169,6 +198,15 @@ range_def_chain::get_def_chain (tree name) gimple *stmt = SSA_NAME_DEF_STMT (name); if (gimple_range_handler (stmt)) { + is_logical = is_gimple_logical_p (stmt); + // Terminate the def chains if we see too many cascading logical stmts. + if (is_logical) + { + if (m_logical_depth == param_ranger_logical_depth) + return NULL; + m_logical_depth++; + } + ssa1 = gimple_range_ssa_p (gimple_range_operand1 (stmt)); ssa2 = gimple_range_ssa_p (gimple_range_operand2 (stmt)); ssa3 = NULL_TREE; @@ -195,6 +233,9 @@ range_def_chain::get_def_chain (tree name) if (ssa3) build_def_chain (ssa3, m_def_chain[v], bb); + if (is_logical) +m_logical_depth--; + // If we run into pathological cases where the defintion chains are // huge (ie huge basic block fully unrolled) we might be able to limit // this by deciding here that if some criteria is satisfied, we change the @@ -562,32 +603,6 @@ gori_compute::compute_operand_range_switch (irange , gswitch *s, return false; } -// Return TRUE if GS is a logical && or || expression. - -static inline bool -is_gimple_logical_p (const gimple *gs) -{ - // Look for boolean and/or condition. - if (gimple_code (gs) == GIMPLE_ASSIGN) -switch
Re: [PATCH] PR tree-optimization/100081 - Limit depth of logical expression windback.
On 4/19/21 3:40 AM, Jakub Jelinek wrote: On Sun, Apr 18, 2021 at 08:11:21PM -0400, Andrew MacLeod via Gcc-patches wrote: --- a/gcc/gimple-range-gori.cc +++ b/gcc/gimple-range-gori.cc @@ -29,6 +29,36 @@ along with GCC; see the file COPYING3. If not see #include "gimple-pretty-print.h" #include "gimple-range.h" +// Limit the nested depth thru logical expressions which GORI will build +// def chains. +#define LOGICAL_LIMIT 6 Such limits should be in params.def so that users can override them. +// Return TRUE if GS is a logical && or || expression. + +static inline bool +is_gimple_logical_p (const gimple *gs) +{ + // Look for boolean and/or condition. + if (gimple_code (gs) == GIMPLE_ASSIGN) if (is_gimple_assign (gs)) is the normal spelling of this check. But more importantly, isn't 6 too low for logicals, and wouldn't it be better to put a cap not on the number of seen logicals, but on how many SSA_NAMEs are handled in a query? Because it is easy to construct testcases where the logical limit would not trigger but the operands of comparisons used in logicals would be thousands of arithmetic/bitwise statements etc. And it isn't just artificial, we have various examples of generated tests in bugzilla that typically can't be compiled in reasonable time at -O2 and need to use -O1 and have huge basic blocks with very deep chains. The chains aren't typically the issue, its a linear calculation.. I gave a small example in the PR of why the logical expressions cause an exponential growth. Most values are only ever calculated once and are cached. the only time we do this backward calculation is when a range is changed by the condition at the bottom of a block, and used outside the block and therefore the edge will have a different value than the range in the block. I dont think 6 logicals is too low... in fact 4 is probably plenty 99.% of the time. This is not during a forward walk... we will walk thru every logical forward and calculate ranges as appropriate.. This is only when trying to determine a refined range on an edge. so for example [local count: 1073741823]: _1 = x_20(D) == 10; _2 = y_21(D) == 10; _3 = _1 & _2; _4 = x_20(D) == 20; _5 = y_21(D) == 20; _6 = _4 & _5; _7 = x_20(D) == 30; _8 = y_21(D) == 30; _9 = _7 & _8; _10 = x_20(D) == 40; _11 = y_21(D) == 40; _12 = _10 & _11; _13 = x_20(D) == 50; _14 = y_21(D) == 50; _15 = _13 & _14; _16 = x_20(D) == 60; _17 = y_21(D) == 60; _18 = _16 & _17; _19 = _3 | _6 // depth 3 _20 = _9 | _12 // depth 3 _21 = _15 | _18 // depth 2 _22 = _19 | _20 // depth 2 _23 = _21 | _22 // depth 1 if (_23 != 0) goto ; [50.00%] else goto ; [50.00%] We can still look back thru these conditions and could find a range for x_20 or y_21 of [10,10][20,20][30,30][40,40],[50,50][60,60] because the logical nesting depth never exceeds 3 when walking from back from the branch to the use of x_20 or y_21. we still evaluate everything walking forward.. ie we get global ranges for everything in the block and evaluate the final condition based on the block values. If we added a whole bunch more logicals, we'd still do that forward processing in EVRP. This change would then say, you know, its not worth trying to calculate back thru that many logical expressions, so we'll just say x_20 and y_21 are whatever their range on entry to the block was rather than trying to refine it on an outgoing edge. so we would get less rpecise ranges for x_20 and y_21 on these edges then. Sure, a similar argument can be made that if we go through a long list of ssa_names in a straight calculation, the odds of getting a refined range decrease... but for example: a_2 = a_1 + 1 a_3 = a_2 + 1 <...> a_44 = a_43 + 1 if (a_44 > 100) goto if (a_1 > 50) asking for the range of a_1 in bb_7 causes a lookup to say "a_1 is an export and can be calculated on the edge 2->7", what is it? We will walk back from a_44 > 100, evaluating those names based on the edge value of a_44 [101, +INF] until we get to a_2 = a_1 + 1... a that point, a_2 will have an edge range of [58, +INF - 42] or something to that effect, and it will return the evaluated a_2 range on the edge as [57, +INF - 43] , and we can fold the stmt. We only do that once. bb_7 will have the value of a_1 cached since it was explicitly requested. These really long back calculations don't really happen that often, so I havent seen the need to throttle them. We also don't cache the intervening values (a_3 thru a_44) ... Odds are they don't get asked for outside the block, so it would just be a memory waste.. we only ever cache whats actually asked for. So a lot depends on whats actually requested.. most of the time, these really long blocks do not have many, if any, uses outside the block.. in which case we will never even do a wind back range request. IF we find such cases, then maybe a limit is
Re: [Patch] PR tree-optimization/100081 - Limit depth of logical expression windback.
On Sun, Apr 18, 2021 at 4:40 AM Andrew MacLeod via Gcc-patches wrote: > > The code presented at the time wrestrict was invoking ranger has a basic > block with about 7200 statements, all of them calculations which fed > into 2614 logical ORs and 1480 logical ANDs. > > the GORI component which calculates outgoing ranges starts at the exit > of the block and works it way back thru the basic block to the ssa_name > requested, solving the equations along the way. > > Logical expressions require a TRUE and FALSE results to be calculated > for the ssa_name for each of two condition register operands on the > logical expression. Sometimes we can shortcut it, but frequently it its > requires all 4 to properly evaluate. > > When logical operations feed each other, this can become exponential.. > and that was the situation here. > > We had introduced a logical cache to attempt to reduce the number of > calculations, but it was insufficient. We had previously discussed > among ourselves that limiting the depth was probably suifficient, since > the odds of being able to extract a meaningful range thru a large series > of logical operations becomes highly unlikely. > > THis patch implements the depth limiter for logical statements and sets > it at 6. so more than 6 logical statements feeding each other, and we > decide its not worth looking back any further and say this edge does not > compute a range. > > This passes bootstrap and regression testing on x86_64-pc-linux-gnu. I > also ran it against the unlimited depth version on a full set of GCC > source files, and there was 0 difference between this and the original. > > I also disable the logical cache since it isn't really doing anything > any more. > > OK for trunk? Please make +#define LOGICAL_LIMIT 6 a --param (params.opt, documented in invoke.texi) OK with that change. Thanks, Richard. > > Andrew > > > PS. first mail from a new mailer system... hopefully it formats ok.. > >
Re: [PATCH] PR tree-optimization/100081 - Limit depth of logical expression windback.
On Sun, Apr 18, 2021 at 08:11:21PM -0400, Andrew MacLeod via Gcc-patches wrote: > --- a/gcc/gimple-range-gori.cc > +++ b/gcc/gimple-range-gori.cc > @@ -29,6 +29,36 @@ along with GCC; see the file COPYING3. If not see > #include "gimple-pretty-print.h" > #include "gimple-range.h" > > +// Limit the nested depth thru logical expressions which GORI will build > +// def chains. > +#define LOGICAL_LIMIT6 Such limits should be in params.def so that users can override them. > +// Return TRUE if GS is a logical && or || expression. > + > +static inline bool > +is_gimple_logical_p (const gimple *gs) > +{ > + // Look for boolean and/or condition. > + if (gimple_code (gs) == GIMPLE_ASSIGN) if (is_gimple_assign (gs)) is the normal spelling of this check. But more importantly, isn't 6 too low for logicals, and wouldn't it be better to put a cap not on the number of seen logicals, but on how many SSA_NAMEs are handled in a query? Because it is easy to construct testcases where the logical limit would not trigger but the operands of comparisons used in logicals would be thousands of arithmetic/bitwise statements etc. And it isn't just artificial, we have various examples of generated tests in bugzilla that typically can't be compiled in reasonable time at -O2 and need to use -O1 and have huge basic blocks with very deep chains. One thing is evrp/vrp passes where we want to compute ranges of all SSA_NAMEs and if caching is done, compute range of everything at most once (or with some bounded iteration counts for loops or whatever ranger uses). And another thing is where passes ask for ranges, perhaps just those should use the limit or should use lower limit. Jakub
[PATCH] PR tree-optimization/100081 - Limit depth of logical expression windback.
The code presented at the time wrestrict was invoking ranger has a basic block with about 7200 statements, all of them calculations which fed into 2614 logical ORs and 1480 logical ANDs. the GORI component which calculates outgoing ranges starts at the exit of the block and works it way back thru the basic block to the ssa_name requested, solving the equations along the way. Logical expressions require a TRUE and FALSE results to be calculated for the ssa_name for each of two condition register operands on the logical expression. Sometimes we can shortcut it, but frequently it its requires all 4 to properly evaluate. When logical operations feed each other, this can become exponential.. and that was the situation here. We had introduced a logical cache to attempt to reduce the number of calculations, but it was insufficient in this case. We had previously discussed among ourselves that limiting the depth was probably sufficient, since the odds of being able to extract a meaningful range thru a large series of logical operations becomes highly unlikely. This patch implements the depth limiter for logical statements and sets it at 6. If there are more than 6 logical statements feeding each other we decide its not worth looking back any further... and say this edge does not compute a range for that SSA_NAME. This is different than the original patch I proposed in the PR, but accomplishes the same thing. The original patch did the depth limit check at the time of evaluating an outgoing range. This patch moves the check earlier when building the def chains, so the names never even appear as possible exports. The previous patch was still building these def chains a few thousand SSA names deep, but never using them due to the cutoff. This patch will also therefore reduce the size of those dependency bitmaps which are not going to be used in either case. This passes bootstrap and regression testing on x86_64-pc-linux-gnu. I also ran it against the unlimited depth version on a full set of GCC source files, and there was 0 difference between this and the original in the cases ranger gets. OK for trunk? Andrew PS. Second try with a new mailer system... hopefully this time... commit a36a130a1305bc2c103e47955354c0f0edda47f4 Author: Andrew MacLeod Date: Fri Apr 16 17:08:51 2021 -0400 Limit depth of logical expression windback. Limit how many logical expressions GORI will look back through when evaluating outgoing edge range. PR tree-optimization/100081 * gimple_range_gori.cc (LOGICAL_LIMIT): New local definition. (is_gimple_logical_p): Move to top of file. (range_def_chain::m_logical_depth): New member. (range_def_chain::range_def_chain): Initialize m_logical_depth. (range_def_chain::get_def_chain): Don't build defchains through more than LOGICAL_LIMIT logical expressions. diff --git a/gcc/gimple-range-gori.cc b/gcc/gimple-range-gori.cc index 7f7f3dc0d69..4be49369bbd 100644 --- a/gcc/gimple-range-gori.cc +++ b/gcc/gimple-range-gori.cc @@ -29,6 +29,36 @@ along with GCC; see the file COPYING3. If not see #include "gimple-pretty-print.h" #include "gimple-range.h" +// Limit the nested depth thru logical expressions which GORI will build +// def chains. +#define LOGICAL_LIMIT 6 + +// Return TRUE if GS is a logical && or || expression. + +static inline bool +is_gimple_logical_p (const gimple *gs) +{ + // Look for boolean and/or condition. + if (gimple_code (gs) == GIMPLE_ASSIGN) +switch (gimple_expr_code (gs)) + { + case TRUTH_AND_EXPR: + case TRUTH_OR_EXPR: + return true; + + case BIT_AND_EXPR: + case BIT_IOR_EXPR: + // Bitwise operations on single bits are logical too. + if (types_compatible_p (TREE_TYPE (gimple_assign_rhs1 (gs)), + boolean_type_node)) + return true; + break; + + default: + break; + } + return false; +} /* RANGE_DEF_CHAIN is used to determine what SSA names in a block can have range information calculated for them, and what the @@ -76,6 +106,7 @@ public: private: vec m_def_chain; // SSA_NAME : def chain components. void build_def_chain (tree name, bitmap result, basic_block bb); + int m_logical_depth; }; @@ -85,6 +116,7 @@ range_def_chain::range_def_chain () { m_def_chain.create (0); m_def_chain.safe_grow_cleared (num_ssa_names); + m_logical_depth = 0; } // Destruct a range_def_chain. @@ -157,6 +189,7 @@ range_def_chain::get_def_chain (tree name) { tree ssa1, ssa2, ssa3; unsigned v = SSA_NAME_VERSION (name); + bool is_logical = false; // If it has already been processed, just return the cached value. if (has_def_chain (name)) @@ -169,6 +202,15 @@ range_def_chain::get_def_chain (tree name) gimple *stmt = SSA_NAME_DEF_STMT (name); if (gimple_range_handler (stmt)) { + is_logical = is_gimple_logical_p (stmt); + // Terminate the def
[Patch] PR tree-optimization/100081 - Limit depth of logical expression windback.
The code presented at the time wrestrict was invoking ranger has a basic block with about 7200 statements, all of them calculations which fed into 2614 logical ORs and 1480 logical ANDs. the GORI component which calculates outgoing ranges starts at the exit of the block and works it way back thru the basic block to the ssa_name requested, solving the equations along the way. Logical expressions require a TRUE and FALSE results to be calculated for the ssa_name for each of two condition register operands on the logical expression. Sometimes we can shortcut it, but frequently it its requires all 4 to properly evaluate. When logical operations feed each other, this can become exponential.. and that was the situation here. We had introduced a logical cache to attempt to reduce the number of calculations, but it was insufficient. We had previously discussed among ourselves that limiting the depth was probably suifficient, since the odds of being able to extract a meaningful range thru a large series of logical operations becomes highly unlikely. THis patch implements the depth limiter for logical statements and sets it at 6. so more than 6 logical statements feeding each other, and we decide its not worth looking back any further and say this edge does not compute a range. This passes bootstrap and regression testing on x86_64-pc-linux-gnu. I also ran it against the unlimited depth version on a full set of GCC source files, and there was 0 difference between this and the original. I also disable the logical cache since it isn't really doing anything any more. OK for trunk? Andrew PS. first mail from a new mailer system... hopefully it formats ok.. commit 9d04ed1ca8c2acddd89a398d0dd96e3924dd15cb Author: Andrew MacLeod Date: Fri Apr 16 17:08:51 2021 -0400 Limit depth of logical expression windback. Limit how many exponentially expanded logical expressions GORI will look back thru when evaluating outgoing edge range. PR tree-optimization/100081 * gimple-range-cache.h (ranger_cache): Inherit from gori_compute not gori_compute_cache. * gimple_range_gori.cc (gori_compute::gori_compute): Initialize m_logical_depth. (gori_compute::compute_logical_operands): Limit depth to 6. * gimple_range_gori.h (gori_compute): Add m_logical_depth member. diff --git a/gcc/gimple-range-cache.h b/gcc/gimple-range-cache.h index c98e9871734..2b36a02654b 100644 --- a/gcc/gimple-range-cache.h +++ b/gcc/gimple-range-cache.h @@ -87,7 +87,7 @@ private: // them available for gori-computes to query so outgoing edges can be // properly calculated. -class ranger_cache : public gori_compute_cache +class ranger_cache : public gori_compute { public: ranger_cache (class gimple_ranger ); diff --git a/gcc/gimple-range-gori.cc b/gcc/gimple-range-gori.cc index 7f7f3dc0d69..6e0bd0fe6a5 100644 --- a/gcc/gimple-range-gori.cc +++ b/gcc/gimple-range-gori.cc @@ -464,6 +464,7 @@ gori_compute::gori_compute () if (bb) m_gori_map->exports (bb); } + m_logical_depth = 0; } // Destruct a gori_compute_object. @@ -851,6 +852,9 @@ gori_compute::compute_logical_operands_in_chain (tf_range , expr_range_in_bb (range.false_range, name, bb); } +// Set the maximum depth we will analyze logical combinations. +#define LOGICAL_LIMIT 6 + // Given a logical STMT, calculate true and false for each potential // path using NAME, and resolve the outcome based on the logical // operator. @@ -875,11 +879,17 @@ gori_compute::compute_logical_operands (irange , gimple *stmt, if (!op1_in_chain && !op2_in_chain) return false; + if (m_logical_depth > LOGICAL_LIMIT) +return false; + m_logical_depth++; + tf_range op1_range, op2_range; compute_logical_operands_in_chain (op1_range, stmt, lhs, name, op1, op1_in_chain); compute_logical_operands_in_chain (op2_range, stmt, lhs, name, op2, op2_in_chain); + m_logical_depth--; + return logical_combine (r, gimple_expr_code (stmt), lhs, op1_range, op2_range); } diff --git a/gcc/gimple-range-gori.h b/gcc/gimple-range-gori.h index 48c746d1f37..513c0acd11f 100644 --- a/gcc/gimple-range-gori.h +++ b/gcc/gimple-range-gori.h @@ -94,6 +94,7 @@ protected: const class tf_range _range); int_range<2> m_bool_zero; // Boolean false cached. int_range<2> m_bool_one;// Boolean true cached. + int m_logical_depth; private: bool compute_operand_range_switch (irange , gswitch *stmt,