[Bug tree-optimization/114932] IVopts inefficient handling of signed IV used for addressing.

2024-07-03 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114932

--- Comment #17 from GCC Commits  ---
The master branch has been updated by Tamar Christina :

https://gcc.gnu.org/g:735edbf1e2479fa2323a2b4a9714fae1a0925f74

commit r15-1809-g735edbf1e2479fa2323a2b4a9714fae1a0925f74
Author: Tamar Christina 
Date:   Wed Jul 3 09:31:09 2024 +0100

ivopts: replace constant_multiple_of with
aff_combination_constant_multiple_p [PR114932]

The current implementation of constant_multiple_of is doing a more limited
version of aff_combination_constant_multiple_p.

The only non-debug usage of constant_multiple_of will proceed with the
values
as affine trees.  There is scope for further optimization here, namely I
believe
that if constant_multiple_of returns the aff_tree after the conversion then
get_computation_aff_1 can use it instead of manually creating the aff_tree.

However I think it makes sense to first commit this smaller change and then
incrementally change things.

gcc/ChangeLog:

PR tree-optimization/114932
* tree-ssa-loop-ivopts.cc (constant_multiple_of): Use
aff_combination_constant_multiple_p instead.

[Bug tree-optimization/114932] IVopts inefficient handling of signed IV used for addressing.

2024-07-03 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114932

--- Comment #16 from GCC Commits  ---
The master branch has been updated by Tamar Christina :

https://gcc.gnu.org/g:25127123100f04c2d5d70c6933a5f5aedcd69c40

commit r15-1808-g25127123100f04c2d5d70c6933a5f5aedcd69c40
Author: Tamar Christina 
Date:   Wed Jul 3 09:30:28 2024 +0100

ivopts: fix wide_int_constant_multiple_p when VAL and DIV are 0. 
[PR114932]

wide_int_constant_multiple_p tries to check if for two tree expressions a
and b
that there is a multiplier which makes a == b * c.

This code however seems to think that there's no c where a=0 and b=0 are
equal
which is of course wrong.

This fixes it and also fixes the comment.

gcc/ChangeLog:

PR tree-optimization/114932
* tree-affine.cc (wide_int_constant_multiple_p): Support 0 and 0
being
multiples.

[Bug tree-optimization/114932] IVopts inefficient handling of signed IV used for addressing.

2024-06-06 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114932

--- Comment #15 from Tamar Christina  ---
(In reply to rguent...@suse.de from comment #14)
> On Thu, 6 Jun 2024, tnfchris at gcc dot gnu.org wrote:
> 
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114932
> > 
> > --- Comment #13 from Tamar Christina  ---
> > (In reply to rguent...@suse.de from comment #12) 
> > > > since we don't care about overflow here, it looks like the stripping 
> > > > should
> > > > be recursive as long as it's a NOP expression between two integral 
> > > > types.
> > > > 
> > > > That would get them to hash to the same IV expression.  Trying now..
> > > 
> > > Note tree-affine is a tool that's used for this kind of "weak" equalities.
> > > Convert both to affine, subtract them and if that's zero they are equal.
> > 
> > Hmm that's useful, though in this case this becomes the actual expression 
> > that
> > IVOpts uses.
> > 
> > For instance this is done in alloc_iv and add_iv_candidate_for_use when
> > determining the uses for the IV.
> > 
> > It looks like it's trying to force a canonical representation with as 
> > minimum
> > casting as possible.
> > 
> > would the "affine"'ed tree be safe to use for this context?
> 
> No, I'd use that just for the comparison.
> 
> > What I've done currently is make a STRIP_ALL_NOPS that recursively strips 
> > NOPs
> > for PLUS/MULT/MINUS.
> 
> But the stripped expression isn't necessarily valid to use either because
> of possible undefined overflow.  It's probably safe to pick any of the
> existing expressions (all are evaluated at each iteration), but if you
> strip all NOPs from all of them you might end up with new undefined
> behavior.
> 
> At least if that stripped expression is inserted somewhere or other
> new expressions are built upon it.

does overflow matter for addressing mode though? if you have undefined behavior
in your address space then your program would have crashed anyway no?

In this case IVOpts would have already stripped away the outer NOPs so building
upon this one could also cause undefined overflow can it not? i.e. if the IV
was
((signed) unsigned_calculation.)

[Bug tree-optimization/114932] IVopts inefficient handling of signed IV used for addressing.

2024-06-06 Thread rguenther at suse dot de via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114932

--- Comment #14 from rguenther at suse dot de  ---
On Thu, 6 Jun 2024, tnfchris at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114932
> 
> --- Comment #13 from Tamar Christina  ---
> (In reply to rguent...@suse.de from comment #12) 
> > > since we don't care about overflow here, it looks like the stripping 
> > > should
> > > be recursive as long as it's a NOP expression between two integral types.
> > > 
> > > That would get them to hash to the same IV expression.  Trying now..
> > 
> > Note tree-affine is a tool that's used for this kind of "weak" equalities.
> > Convert both to affine, subtract them and if that's zero they are equal.
> 
> Hmm that's useful, though in this case this becomes the actual expression that
> IVOpts uses.
> 
> For instance this is done in alloc_iv and add_iv_candidate_for_use when
> determining the uses for the IV.
> 
> It looks like it's trying to force a canonical representation with as minimum
> casting as possible.
> 
> would the "affine"'ed tree be safe to use for this context?

No, I'd use that just for the comparison.

> What I've done currently is make a STRIP_ALL_NOPS that recursively strips NOPs
> for PLUS/MULT/MINUS.

But the stripped expression isn't necessarily valid to use either because
of possible undefined overflow.  It's probably safe to pick any of the
existing expressions (all are evaluated at each iteration), but if you
strip all NOPs from all of them you might end up with new undefined
behavior.

At least if that stripped expression is inserted somewhere or other
new expressions are built upon it.

[Bug tree-optimization/114932] IVopts inefficient handling of signed IV used for addressing.

2024-06-06 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114932

--- Comment #13 from Tamar Christina  ---
(In reply to rguent...@suse.de from comment #12) 
> > since we don't care about overflow here, it looks like the stripping should
> > be recursive as long as it's a NOP expression between two integral types.
> > 
> > That would get them to hash to the same IV expression.  Trying now..
> 
> Note tree-affine is a tool that's used for this kind of "weak" equalities.
> Convert both to affine, subtract them and if that's zero they are equal.

Hmm that's useful, though in this case this becomes the actual expression that
IVOpts uses.

For instance this is done in alloc_iv and add_iv_candidate_for_use when
determining the uses for the IV.

It looks like it's trying to force a canonical representation with as minimum
casting as possible.

would the "affine"'ed tree be safe to use for this context?

What I've done currently is make a STRIP_ALL_NOPS that recursively strips NOPs
for PLUS/MULT/MINUS.

[Bug tree-optimization/114932] IVopts inefficient handling of signed IV used for addressing.

2024-06-06 Thread rguenther at suse dot de via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114932

--- Comment #12 from rguenther at suse dot de  ---
On Wed, 5 Jun 2024, tnfchris at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114932
> 
> --- Comment #11 from Tamar Christina  ---
> (In reply to Richard Biener from comment #10)
> > I think the question is why IVOPTs ends up using both the signed and
> > unsigned variant of the same IV instead of expressing all uses of both with
> > one IV?
> > 
> > That's where I'd look into.
> 
> It looks like this is because of a subtle difference in the expressions.
> 
> In get_loop_invariant_expr IVOPTs first tries to strip away all casts with
> STRIP_NOPS.
> 
> The first expression is (unsigned long) (stride.3_27 * 4) and the second
> expression is ((unsigned long) stride.3_27) * 4 (The pretty printing here is
> pretty bad...)
> 
> So the first one becomes:
>   (unsigned long) (stride.3_27 * 4) -> stride.3_27 * 4
> 
> and second one:
>   ((unsigned long) stride.3_27) * 4 -> ((unsigned long) stride.3_27) * 4
> 
> since we don't care about overflow here, it looks like the stripping should
> be recursive as long as it's a NOP expression between two integral types.
> 
> That would get them to hash to the same IV expression.  Trying now..

Note tree-affine is a tool that's used for this kind of "weak" equalities.
Convert both to affine, subtract them and if that's zero they are equal.

[Bug tree-optimization/114932] IVopts inefficient handling of signed IV used for addressing.

2024-06-05 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114932

--- Comment #11 from Tamar Christina  ---
(In reply to Richard Biener from comment #10)
> I think the question is why IVOPTs ends up using both the signed and
> unsigned variant of the same IV instead of expressing all uses of both with
> one IV?
> 
> That's where I'd look into.

It looks like this is because of a subtle difference in the expressions.

In get_loop_invariant_expr IVOPTs first tries to strip away all casts with
STRIP_NOPS.

The first expression is (unsigned long) (stride.3_27 * 4) and the second
expression is ((unsigned long) stride.3_27) * 4 (The pretty printing here is
pretty bad...)

So the first one becomes:
  (unsigned long) (stride.3_27 * 4) -> stride.3_27 * 4

and second one:
  ((unsigned long) stride.3_27) * 4 -> ((unsigned long) stride.3_27) * 4

since we don't care about overflow here, it looks like the stripping should
be recursive as long as it's a NOP expression between two integral types.

That would get them to hash to the same IV expression.  Trying now..

[Bug tree-optimization/114932] IVopts inefficient handling of signed IV used for addressing.

2024-06-05 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114932

--- Comment #10 from Richard Biener  ---
I think the question is why IVOPTs ends up using both the signed and unsigned
variant of the same IV instead of expressing all uses of both with one IV?

That's where I'd look into.

[Bug tree-optimization/114932] IVopts inefficient handling of signed IV used for addressing.

2024-06-05 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114932

--- Comment #9 from Tamar Christina  ---
It's taken me a bit of time to track down all the reasons for the speedup with
the earlier patch.

This comes from two parts:

1. Signed IVs don't get simplified.  Due to possible UB with signed overflows
gimple expressions don't get simplified when the type is signed.

However for addressing modes it doesn't matter as simplifying the constants any
potential overflow can still happen.  Secondly most architectures say you can
never reach the full address space range anyway.  Those that due (like those
that offer baremetal variants like Arm and AArch64) explicitly specify that
overflow is defined as wrapping around.  That means that IVs for their use in
IV opts should be save to simplify as if they were unsigned.

I have a patch that during the creation of IV candidates folds them to unsigned
and then folds them back to their original signed types.  This maintains all
the original overflow analysis and the correct typing in gimple.

2. The second problem is that due to Fortran not having unsigned types, the
front-end generates a signed IV.  Some optimizations as they work can convert
these to unsigned due to folding, e.g. extract_muldiv is one place where this
is done.

This can make us end up having the same IV as both signed and unsigned, as is
the case here:

:   
   
   
 inv_expr 1: stride.3_27 * 4   
   
   
  inv_expr 2:
(unsigned long) stride.3_27 * 4   

These end up being used in the same group:

Group 1:   
   
   
   cand  costcompl.  inv.expr.   inv.vars  
   
   
1 0   0
  NIL;6
   
   
 2 0   0   NIL;6   
   
   
  3 0   0   NIL;6  
   
   
   4 0 
 0   NIL;6 

which ends up with IV opts picking the signed and unsigned IVs:

Improved to:
  cost: 24 (complexity 3)
  reg_cost: 9
  cand_cost: 15
  cand_group_cost: 0 (complexity 3)
  candidates: 1, 6, 8
   group:0 --> iv_cand:6, cost=(0,1)
   group:1 --> iv_cand:1, cost=(0,0)
   group:2 --> iv_cand:8, cost=(0,1)
   group:3 --> iv_cand:8, cost=(0,1)
  invariant variables: 6
  invariant expressions: 1, 2

and so generates the same IV as both signed and unsigned:

;;   basic block 21, loop depth 3, count 214748368 (estimated locally, freq
58.2545), maybe hot
   
 ;;prev block 28, next block 31, flags:
(NEW, REACHABLE, VISITED)
;;pred:   28 [always]  count:23622320 (estimated locally, freq 6.4080)
(FALLTHRU,EXECUTABLE)  
   
  ;;25 [always]  count:191126046
(estimated locally, freq 51.8465) (FALLTHRU,DFS_BACK,EXECUTABLE)
  # .MEM_66 = PHI <.MEM_34(28), .MEM_22(25)>
  # ivtmp.22_41 = PHI <0(28), ivtmp.22_82(25)>
  # ivtmp.26_51 = PHI 
  # ivtmp.28_90 = PHI 

...

;;   basic block 24, loop depth 3, count 214748366 (estimated locally, freq
58.2545), maybe hot
   
 ;;prev block 22,