https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114151
--- Comment #19 from Richard Biener <rguenth at gcc dot gnu.org> ---
So what remains here is differences like

-  (chrec = {(long unsigned int) (col_stride_10 * _105), +, (long unsigned int) col_stride_10}_2)
+  (chrec = (long unsigned int) (int) {(unsigned int) col_stride_10 * (unsigned int) _105, +, (unsigned int) col_stride_10}_2)

where we can't pull the sign-extension inside the CHREC because it might
overflow.  And

 (set_scalar_evolution
   instantiated_below = 22
   (scalar = _59)
-  (scalar_evolution = {(long unsigned int) (col_stride_10 * _105) * 2, +, (long unsigned int) col_stride_10 * 2}_2))
+  (scalar_evolution = _59))
+)

which is failure to analyze at all.  This one looks like

  <bb 4> [local count: 118111600]:
  # col_stride_10 = PHI <size_15(D)(11), 1(2)>
  if (size_15(D) > 0)
    goto <bb 21>; [89.00%]
  else
    goto <bb 5>; [11.00%]

  <bb 5> [local count: 118111600]:
  return;

  ...

  <bb 15> [local count: 343854870]:
  # RANGE [irange] int [0, 2147483646]
  # j_73 = PHI <_105(22), _68(19)>
  ...
  col_i_61 = col_stride_10 * j_73;
  # RANGE [irange] long unsigned int [0, 2147483647][18446744071562067968, +INF]
  _60 = (long unsigned int) col_i_61;
  # RANGE [irange] long unsigned int [0, 4294967294][18446744069414584320, 18446744073709551614] MASK 0xfffffffffffffffe VALUE 0x0
  _59 = _60 * 2;

j_73 is {_105, +, 1}_2
col_i_61 is (int) {(unsigned int) col_stride_10 * (unsigned int) _105, +, (unsigned int) col_stride_10}_2
_60 is (long unsigned int) (int) {(unsigned int) col_stride_10 * (unsigned int) _105, +, (unsigned int) col_stride_10}_2

and on the _60 * 2 multiply we fail.

Applying Andrew's proposed patch doesn't help here, since the range of
col_stride_10 can only conditionally be adjusted to positive.

SCEV caches a scalar evolution keyed on the SSA_NAME and the 'instantiated
below' block, which is "block_before_loop": a loop's preheader, or the
function ENTRY block for analyses of scalars in the loop tree root.
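As an aside on the first CHREC difference quoted above, here is a minimal sketch of why the sign-extension cannot simply be moved inside the CHREC. It evaluates both forms at a given iteration with plain integer arithmetic. The function names are hypothetical and this is not GCC code. The two forms agree until the 32-bit value wraps, which is exactly the case the analysis cannot rule out.

```cpp
#include <cstdint>

// Reinterpret a 32-bit pattern as signed and widen it (sign-extension).
constexpr std::int64_t sign_extend32(std::uint32_t v) {
  return v < 0x80000000u ? static_cast<std::int64_t>(v)
                         : static_cast<std::int64_t>(v) - 4294967296LL;
}

// (long unsigned int) (int) {(unsigned) stride * (unsigned) start, +, (unsigned) stride}
// at iteration `iter`: wrap in 32 bits first, then sign-extend to 64 bits.
constexpr std::uint64_t extend_outside(std::int32_t stride, std::int32_t start,
                                       std::uint32_t iter) {
  std::uint32_t s = static_cast<std::uint32_t>(stride);
  std::uint32_t narrow = s * static_cast<std::uint32_t>(start) + iter * s;
  return static_cast<std::uint64_t>(sign_extend32(narrow));
}

// {(long unsigned int) (stride * start), +, (long unsigned int) stride}
// at iteration `iter`: widen base and step first, then iterate in 64 bits.
constexpr std::uint64_t extend_inside(std::int32_t stride, std::int32_t start,
                                      std::uint32_t iter) {
  std::uint64_t base =
      static_cast<std::uint64_t>(static_cast<std::int64_t>(stride) * start);
  std::uint64_t step = static_cast<std::uint64_t>(static_cast<std::int64_t>(stride));
  return base + iter * step;
}

// Illustrative values only: stride 2^16, start 0.
static_assert(extend_outside(65536, 0, 1) == extend_inside(65536, 0, 1),
              "forms agree while the 32-bit value does not wrap");
static_assert(extend_outside(65536, 0, 32768) != extend_inside(65536, 0, 32768),
              "forms differ once the 32-bit value wraps");
```

With stride 65536 and start 0, iteration 32768 gives 0xffffffff80000000 for the wrapped-then-extended form and 0x80000000 for the widened form.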

A conservative context for analysis of the SCEV might be

 1) the definition stmt of the SSA name
 2) the instantiated-below block (on-exit ranges of it)

Doing 2) by feeding the last stmt of the block as context (when the block
is empty that won't work :/), the testcase is optimized again when I discard
the SCEV cache at the start of IVOPTs and wrap IVOPTs in a ranger instance.

While ranger has a range_on_exit API, as far as I can see this doesn't work
on GENERIC expressions but only on SSA names.  I guess that could be "fixed",
given range_on_exit also looks at the last stmt and eventually defers to
range_of_expr (or range_on_entry), but possibly get_tree_range needs variants
for on_entry/on_exit (it doesn't seem to use its 'stmt' context very
consistently, notably not for SSA_NAMEs ...).

Interestingly enough we somehow still need the

diff --git a/gcc/gimple-range.cc b/gcc/gimple-range.cc
index c16b776c1e3..c0eda5fc51d 100644
--- a/gcc/gimple-range.cc
+++ b/gcc/gimple-range.cc
@@ -102,7 +102,15 @@ gimple_ranger::range_of_expr (vrange &r, tree expr, gimple *stmt)
   if (!stmt)
     {
       Value_Range tmp (TREE_TYPE (expr));
-      m_cache.get_global_range (r, expr);
+      // If there is no global range for EXPR yet, try to evaluate it.
+      // THis call does set R to a global range regardless.
+      if (!m_cache.get_global_range (r, expr))
+        {
+          gimple *s = SSA_NAME_DEF_STMT (expr);
+          // Calculate a range for S if it is safe to do so.
+          if (s && gimple_bb (s) && gimple_get_lhs (s) == expr)
+            return range_of_stmt (r, s);
+        }
       // Pick up implied context information from the on-entry cache
       // if current_bb is set.  Do not attempt any new calculations.
       if (current_bb && m_cache.block_range (tmp, current_bb, expr, false))

hunk of Andrew's patch to do it :/

There's one other detail - the problematic multiply folding is

  col_stride_10 * {_105, +, 1}_2

I'm thinking that, similar to CHREC_LEFT == 0, we can handle CHREC_RIGHT == 1
without unsigned promotion.
In the second iteration we are replacing (_105 + 1) * col_stride_10 with
_105 * col_stride_10 + col_stride_10, but we already know that
_105 * col_stride_10 doesn't overflow since we computed it in the first
iteration.  And 1 * X never overflows.  The third iteration is problematic:
we don't know whether 2 * col_stride_10 overflows.  If _105 was zero the
original code computes that product itself, but if it was not it might have
been -1, which means the second iteration computed 0 * col_stride_10
originally.  Hmm, so _105 == -1 is problematic, so no - I don't think we can
handle CHREC_RIGHT == 1 specially.
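
To make the _105 == -1 case concrete, here is a small sketch. It is not GCC code: the function names are hypothetical, and it assumes the folded CHREC is evaluated in closed form as base + i * step in the signed 32-bit type. It checks, per iteration, whether the original multiply and the folded form stay within int range.

```cpp
#include <cstdint>

constexpr bool fits_int32(std::int64_t v) {
  return v >= INT32_MIN && v <= INT32_MAX;
}

// Original statement at iteration i (0-based): stride * (start + i).
constexpr bool original_ok(std::int32_t stride, std::int32_t start, std::int32_t i) {
  std::int64_t j = static_cast<std::int64_t>(start) + i;
  return fits_int32(j) && fits_int32(stride * j);
}

// Folded form {stride * start, +, stride} evaluated as
// stride * start + i * stride (assumption: closed-form evaluation in int).
// Every intermediate value has to fit in 32 bits.
constexpr bool folded_ok(std::int32_t stride, std::int32_t start, std::int32_t i) {
  std::int64_t base = static_cast<std::int64_t>(stride) * start;
  std::int64_t scaled = static_cast<std::int64_t>(i) * stride;
  return fits_int32(base) && fits_int32(scaled) && fits_int32(base + scaled);
}

// Illustrative value only: stride 2^30.
constexpr std::int32_t kStride = 1 << 30;

// Start at -1: the first two iterations are fine in both forms.
static_assert(original_ok(kStride, -1, 0) && folded_ok(kStride, -1, 0), "iteration 1");
static_assert(original_ok(kStride, -1, 1) && folded_ok(kStride, -1, 1), "iteration 2");
// Third iteration: the original computes 1 * stride, the folded form needs 2 * stride.
static_assert(original_ok(kStride, -1, 2), "original does not overflow");
static_assert(!folded_ok(kStride, -1, 2), "folded form overflows");
```

With a start of 0 the same stride makes the original multiply itself overflow in the third iteration, so only a non-zero start such as -1 exposes the difference.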