[Bug tree-optimization/114151] [14 Regression] weird and inefficient codegen and addressing modes since r14-9193

rguenth at gcc dot gnu.org via Gcc-bugs Tue, 12 Mar 2024 02:59:44 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114151


--- Comment #19 from Richard Biener <rguenth at gcc dot gnu.org> ---
So what remains here are differences like

-  (chrec = {(long unsigned int) (col_stride_10 * _105), +, (long unsigned int) col_stride_10}_2)
+  (chrec = (long unsigned int) (int) {(unsigned int) col_stride_10 * (unsigned int) _105, +, (unsigned int) col_stride_10}_2)

where we can't pull the sign-extension inside the CHREC because the CHREC,
now computed in unsigned int, might wrap.
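
Why the conversion has to stay outside can be seen with plain numbers. The sketch below is only an illustration, assuming 32-bit int and 64-bit long unsigned int (as the dumps suggest); the helper names are made up. It evaluates {base, +, step} at iteration k both ways: wrapped in 32 bits and then sign-extended (the "+" form above), and with the operands sign-extended first (the "-" form). The two agree while the 32-bit evolution does not wrap and differ once it does.

```cpp
#include <cstdint>

// Hypothetical helpers; assumes 32-bit int and 64-bit long unsigned int.

// {base, +, step} at iteration k computed in unsigned int, i.e. modulo 2^32,
// which is how the folded CHREC is evaluated.
constexpr std::uint32_t chrec_u32_at(std::int32_t base, std::int32_t step,
                                     std::uint32_t k) {
  return static_cast<std::uint32_t>(base) + k * static_cast<std::uint32_t>(step);
}

// (long unsigned int) (int) {base, +, step}: wrap in 32 bits, reinterpret
// as int, then sign-extend to 64 bits.
constexpr std::uint64_t extend_outside(std::int32_t base, std::int32_t step,
                                       std::uint32_t k) {
  return static_cast<std::uint64_t>(static_cast<std::int64_t>(
      static_cast<std::int32_t>(chrec_u32_at(base, step, k))));
}

// {(long unsigned int) base, +, (long unsigned int) step}: sign-extend the
// operands first, then do the arithmetic modulo 2^64.
constexpr std::uint64_t extend_inside(std::int32_t base, std::int32_t step,
                                      std::uint32_t k) {
  return static_cast<std::uint64_t>(static_cast<std::int64_t>(base)) +
         static_cast<std::uint64_t>(k) *
             static_cast<std::uint64_t>(static_cast<std::int64_t>(step));
}
```

For example, with base = INT_MAX, step = 1 and k = 1 the outside form yields 18446744071562067968 (INT_MIN sign-extended) while the inside form yields 2147483648, so the rewrite is only valid when the inner evolution is known not to wrap.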

And

 (set_scalar_evolution 
   instantiated_below = 22 
   (scalar = _59)
-  (scalar_evolution = {(long unsigned int) (col_stride_10 * _105) * 2, +, (long unsigned int) col_stride_10 * 2}_2))
+  (scalar_evolution = _59))
+)

which is a failure to analyze _59 at all.  This one looks like

  <bb 4> [local count: 118111600]:
  # col_stride_10 = PHI <size_15(D)(11), 1(2)>
  if (size_15(D) > 0)
    goto <bb 21>; [89.00%]
  else
    goto <bb 5>; [11.00%]

  <bb 5> [local count: 118111600]:
  return;
...
  <bb 15> [local count: 343854870]:
  # RANGE [irange] int [0, 2147483646]
  # j_73 = PHI <_105(22), _68(19)>
...
  col_i_61 = col_stride_10 * j_73;
  # RANGE [irange] long unsigned int [0, 2147483647][18446744071562067968, +INF]
  _60 = (long unsigned int) col_i_61;
  # RANGE [irange] long unsigned int [0, 4294967294][18446744069414584320, 18446744073709551614] MASK 0xfffffffffffffffe VALUE 0x0
  _59 = _60 * 2;

j_73 is {_105, +, 1}_2
col_i_61 is (int) {(unsigned int) col_stride_10 * (unsigned int) _105, +, (unsigned int) col_stride_10}_2
_60 is (long unsigned int) (int) {(unsigned int) col_stride_10 * (unsigned int) _105, +, (unsigned int) col_stride_10}_2

and on the _60 * 2 multiply we fail.  Applying Andrew's proposed patch
doesn't help here, since the range of col_stride_10 can only conditionally
be adjusted to positive.
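
The ranges printed for _60 and _59 in the dump can be reproduced by hand: sign-extending any int in [INT_MIN, INT_MAX] gives the two subranges of _60, and doubling modulo 2^64 maps their endpoints onto the two subranges of _59, with bit 0 always clear (MASK 0xfffffffffffffffe VALUE 0x0). The sketch below checks those endpoints, assuming 32-bit int and 64-bit long unsigned int; the helper names are hypothetical. Nothing in these ranges excludes a negative col_i_61, which is consistent with col_stride_10 being only conditionally known to be positive.

```cpp
#include <cstdint>

// Hypothetical helpers; assumes 32-bit int and 64-bit long unsigned int.

// _60 = (long unsigned int) col_i_61: sign-extend a 32-bit int to 64 bits
// and reinterpret the result as unsigned.
constexpr std::uint64_t sext_to_u64(std::int32_t v) {
  return static_cast<std::uint64_t>(static_cast<std::int64_t>(v));
}

// _59 = _60 * 2 in long unsigned int, i.e. modulo 2^64.
constexpr std::uint64_t times_two(std::uint64_t v) { return v * 2u; }
```

INT_MAX and INT_MIN land on the inner endpoints 2147483647 and 18446744071562067968 of _60, and their doubled values are 4294967294 and 18446744069414584320, matching the range recorded for _59.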

SCEV caches a scalar evolution keyed on the SSA_NAME and the 'instantiated
below' block, which is "block_before_loop": a loop's preheader, or the
function's ENTRY block for analyses of scalars in the loop tree root.
A conservative context for analysis of the SCEV might be
 1) the definition stmt of the SSA name
 2) the instantiated-below block (on-exit ranges of it)
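
The reason the context has to be that conservative follows from the cache key: the statement that triggered the analysis is not part of it, so a cached result is reused for every later query with the same (SSA name, block) pair. The sketch below is a hypothetical stand-in, not GCC's data structure: SSA names are represented by their version (_59 as 59), blocks by their index (bb 22 as 22), and chrecs by strings.

```cpp
#include <map>
#include <optional>
#include <string>
#include <utility>

// Hypothetical stand-ins for GCC's SSA name and basic block handles.
using ssa_version = unsigned;
using block_index = int;

// Minimal sketch of a cache keyed on (SSA name, instantiated-below block).
// Because the querying statement is not part of the key, any range
// information used to compute an entry must hold at the definition stmt or
// on exit from the instantiated-below block.
class scev_cache_sketch {
  std::map<std::pair<ssa_version, block_index>, std::string> entries_;

 public:
  void set(ssa_version name, block_index below, std::string chrec) {
    entries_[{name, below}] = std::move(chrec);
  }

  std::optional<std::string> get(ssa_version name, block_index below) const {
    auto it = entries_.find({name, below});
    if (it == entries_.end()) return std::nullopt;
    return it->second;
  }

  // Analogous to discarding the SCEV cache at the start of IVOPTs.
  void reset() { entries_.clear(); }
};
```

Storing {_105, +, 1}_2 for version 73 under block 22 makes it visible to any query for (73, 22) until reset() is called, whichever statement asks.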

Doing 2) by feeding the last stmt of the block as context (which won't work
when the block is empty :/), the testcase is optimized again when I discard
the SCEV cache at the start of IVOPTs and wrap IVOPTs in a ranger instance.

While ranger has a range_on_exit API, as far as I can see it works only on
SSA names, not on GENERIC expressions.  I guess that could be "fixed", given
that range_on_exit also looks at the last stmt and eventually defers to
range_of_expr (or range_on_entry), but get_tree_range possibly needs
on_entry/on_exit variants (it doesn't seem to use its 'stmt' context
very consistently, notably not for SSA_NAMEs ...).

Interestingly enough we somehow still need the

diff --git a/gcc/gimple-range.cc b/gcc/gimple-range.cc
index c16b776c1e3..c0eda5fc51d 100644
--- a/gcc/gimple-range.cc
+++ b/gcc/gimple-range.cc
@@ -102,7 +102,15 @@ gimple_ranger::range_of_expr (vrange &r, tree expr, gimple *stmt)
   if (!stmt)
     {
       Value_Range tmp (TREE_TYPE (expr));
-      m_cache.get_global_range (r, expr);
+      // If there is no global range for EXPR yet, try to evaluate it.
+      // This call does set R to a global range regardless.
+      if (!m_cache.get_global_range (r, expr))
+       {
+         gimple *s = SSA_NAME_DEF_STMT (expr);
+         // Calculate a range for S if it is safe to do so.
+         if (s && gimple_bb (s) && gimple_get_lhs (s) == expr)
+           return range_of_stmt (r, s);
+       }
       // Pick up implied context information from the on-entry cache
       // if current_bb is set.  Do not attempt any new calculations.
       if (current_bb && m_cache.block_range (tmp, current_bb, expr, false))

hunk of Andrew's patch to do it :/

There's one other detail: the problematic multiply folding is
col_stride_10 * {_105, +, 1}_2
I'm thinking that, similar to CHREC_LEFT == 0, we can handle CHREC_RIGHT == 1
without unsigned promotion.  In the second iteration we are replacing
(_105 + 1) * col_stride_10 with _105 * col_stride_10 + col_stride_10,
but we already know that _105 * col_stride_10 doesn't overflow, as we
computed that in the first iteration.  And 1 * X never overflows.
The third iteration is problematic: we don't know whether 2 * col_stride_10
overflows.  If _105 was zero we would, but _105 might have been -1, in which
case the second iteration originally computed 0 * col_stride_10.  Hmm,
so _105 == -1 is problematic, so no - I don't think we can handle
CHREC_RIGHT == 1 specially.
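
The _105 == -1 case can be checked with concrete numbers. Take col_stride_10 == INT_MAX: the original products (-1 + k) * INT_MAX for k = 0, 1, 2 are -INT_MAX, 0 and INT_MAX, all representable, yet 2 * INT_MAX is not. So a distributed form that materializes k * col_stride_10 can overflow in general where the original code did not. The sketch below uses hypothetical helpers, assumes 32-bit int, and evaluates everything in 64 bits so the check itself cannot overflow.

```cpp
#include <cstdint>
#include <limits>

// Assumes 32-bit int; checks are done in 64-bit arithmetic, and k is kept
// small (uint16_t) so the 64-bit products below cannot overflow either.
constexpr std::int64_t kIntMin = std::numeric_limits<std::int32_t>::min();
constexpr std::int64_t kIntMax = std::numeric_limits<std::int32_t>::max();

// True if v is representable as a 32-bit int.
constexpr bool fits_int(std::int64_t v) { return v >= kIntMin && v <= kIntMax; }

// Original computation at iteration k: (_105 + k) * col_stride_10.
// True if it stays in int range, i.e. the original code has no signed overflow.
constexpr bool original_ok(std::int32_t start, std::int32_t stride,
                           std::uint16_t k) {
  return fits_int((static_cast<std::int64_t>(start) + k) * stride);
}

// Distributed form: _105 * col_stride_10 + k * col_stride_10.  Besides the
// final sum (equal to the original value) it needs both partial products
// to fit, in particular k * col_stride_10.
constexpr bool distributed_ok(std::int32_t start, std::int32_t stride,
                              std::uint16_t k) {
  return original_ok(start, stride, k) &&
         fits_int(static_cast<std::int64_t>(start) * stride) &&
         fits_int(static_cast<std::int64_t>(k) * stride);
}
```

With start = -1 and stride = INT_MAX, original_ok holds for k = 0, 1, 2 and distributed_ok holds for k = 1 but not for k = 2. With start = 0 the original third iteration computes 2 * col_stride_10 itself, so original_ok(0, INT_MAX, 2) is false: a valid program reaching that iteration already guarantees 2 * col_stride_10 fits.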
