[Bug tree-optimization/109695] [14 Regression] crash in gimple_ranger::range_of_expr since r14-377-gc92b8be9b52b7e

2023-05-24 Thread amacleod at redhat dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109695

Andrew Macleod  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

--- Comment #42 from Andrew Macleod  ---
I think we can close this now; everything we planned to do has been done.

2023-05-24 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109695

--- Comment #41 from CVS Commits  ---
The master branch has been updated by Andrew Macleod :

https://gcc.gnu.org/g:257c2be7ff8dfdc610202a1e1f5a8a668b939bdb

commit r14-1165-g257c2be7ff8dfdc610202a1e1f5a8a668b939bdb
Author: Andrew MacLeod 
Date:   Tue May 23 15:41:03 2023 -0400

Only update global value if it changes.

Do not update and propagate a global value if it hasn't changed.

PR tree-optimization/109695
* gimple-range-cache.cc (ranger_cache::get_global_range): Add
changed param.
* gimple-range-cache.h (ranger_cache::get_global_range): Ditto.
* gimple-range.cc (gimple_ranger::range_of_stmt): Pass changed
flag to set_global_range.
(gimple_ranger::prefill_stmt_dependencies): Ditto.

2023-05-24 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109695

--- Comment #40 from CVS Commits  ---
The master branch has been updated by Andrew Macleod :

https://gcc.gnu.org/g:cfd6569e9c41181231a8427235d0c0a7ad9262e4

commit r14-1164-gcfd6569e9c41181231a8427235d0c0a7ad9262e4
Author: Andrew MacLeod 
Date:   Tue May 23 15:20:56 2023 -0400

Use negative values to reflect always_current in the temporal cache.

Instead of using 0, use negative timestamps to reflect always_current
state.
If the value doesn't change, keep the timestamp rather than creating a new
one and invalidating any dependencies.

PR tree-optimization/109695
* gimple-range-cache.cc (temporal_cache::temporal_value): Return
a positive int.
(temporal_cache::current_p): Check always_current method.
(temporal_cache::set_always_current): Add param and set value
appropriately.
(temporal_cache::always_current_p): New.
(ranger_cache::get_global_range): Adjust.
(ranger_cache::set_global_range): Set always_current first.

2023-05-24 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109695

--- Comment #39 from CVS Commits  ---
The master branch has been updated by Andrew Macleod :

https://gcc.gnu.org/g:d8b058d3ca4ebbef5575105164417f125696f5ce

commit r14-1163-gd8b058d3ca4ebbef5575105164417f125696f5ce
Author: Andrew MacLeod 
Date:   Tue May 23 15:11:44 2023 -0400

Choose better initial values for ranger.

Instead of defaulting to VARYING, fold the stmt using just global ranges.

PR tree-optimization/109695
* gimple-range-cache.cc (ranger_cache::get_global_range): Call
fold_range with global query to choose an initial value.

2023-05-23 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109695

--- Comment #38 from Aldy Hernandez  ---
(In reply to Andrew Macleod from comment #37)
> (In reply to CVS Commits from comment #36)
> 
> > For the curious, a particular hot spot for IPA in this area was:
> > 
> > ipcp_vr_lattice::meet_with_1 (const value_range *other_vr)
> > {
> > ...
> > ...
> >   value_range save (m_vr);
> >   m_vr.union_ (*other_vr);
> >   return m_vr != save;
> > }
> > 
> > The problem isn't the resizing (since we do that at most once) but the
> > fact that for some functions with lots of callers we end up with a huge
> > range that gets copied and compared for every meet operation.  Maybe
> > the IPA algorithm could be adjusted somehow?
> 
> isn't that the same thing as 
> 
>   return m_vr.union_ (*other_vr);
> 
> which should be much faster without all the copying... ?

Indeed.  That is in the committed version of the patch.  I just forgot to
update the message.

The reason I didn't originally do as you suggested was that there was a bug in
the union code that returned TRUE for a union that hadn't changed anything. 
That has also been fixed in trunk.

2023-05-23 Thread amacleod at redhat dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109695

--- Comment #37 from Andrew Macleod  ---
(In reply to CVS Commits from comment #36)

> For the curious, a particular hot spot for IPA in this area was:
> 
> ipcp_vr_lattice::meet_with_1 (const value_range *other_vr)
> {
> ...
> ...
>   value_range save (m_vr);
>   m_vr.union_ (*other_vr);
>   return m_vr != save;
> }
> 
> The problem isn't the resizing (since we do that at most once) but the
> fact that for some functions with lots of callers we end up with a huge
> range that gets copied and compared for every meet operation.  Maybe
> the IPA algorithm could be adjusted somehow?

isn't that the same thing as 

  return m_vr.union_ (*other_vr);

which should be much faster without all the copying... ?

2023-05-15 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109695

--- Comment #36 from CVS Commits  ---
The master branch has been updated by Aldy Hernandez :

https://gcc.gnu.org/g:76e11280e79c5dd5089c17d5726cae9a5a21bc2e

commit r14-862-g76e11280e79c5dd5089c17d5726cae9a5a21bc2e
Author: Aldy Hernandez 
Date:   Mon May 15 12:25:58 2023 +0200

Add auto-resizing capability to irange's [PR109695]


We can now have int_range<N, RESIZABLE=false> for automatically
resizable ranges.  int_range_max is now int_range<3, true>
for a 69X reduction in size from current trunk, and 6.9X reduction from
GCC12.  This incurs a 5% performance penalty for VRP that is more than
covered by our > 13% improvements recently.


int_range_max is the temporary range object we use in the ranger for
integers.  With the conversion to wide_int, this structure bloated up
significantly because wide_ints are huge (80 bytes a piece) and are
about 10 times as big as a plain tree.  Since the temporary object
requires 255 sub-ranges, that's 255 * 80 * 2, plus the control word.
This means the structure grew from 4112 bytes to 40912 bytes.

This patch adds the ability to resize ranges as needed, defaulting to
no resizing, while int_range_max now defaults to 3 sub-ranges (instead
of 255) and grows to 255 when the range being calculated does not fit.

For example:

int_range<1> foo;   // 1 sub-range with no resizing.
int_range<5> foo;   // 5 sub-ranges with no resizing.
int_range<5, true> foo; // 5 sub-ranges with resizing.

I ran some tests and found that 3 sub-ranges cover 99% of cases, so
I've set the int_range_max default to that:

typedef int_range<3, /*RESIZABLE=*/true> int_range_max;

We don't bother growing incrementally, since the default covers most
cases and we have a 255 hard-limit.  This hard limit could be reduced
to 128, since my tests never saw a range needing more than 124, but we
could do that as a follow-up if needed.

With 3 sub-ranges, int_range_max is now 592 bytes versus 40912 for
trunk, and versus 4112 bytes for GCC12!  The penalty is 5.04% for VRP
and 3.02% for threading, with no noticeable change in overall
compilation (0.27%).  This is more than covered by our 13.26%
improvements for the legacy removal + wide_int conversion.

I think this approach is a good alternative, while providing us with
flexibility going forward.  For example, we could try defaulting to
8 sub-ranges for a noticeable improvement in VRP.  We could also use
large sub-ranges for switch analysis to avoid resizing.

Another approach I tried was always resizing.  With this, we could
drop the whole int_range nonsense, and have irange just hold a
resizable range.  This simplified things, but incurred a 7% penalty on
ipa_cp.  This was hard to pinpoint, and I'm not entirely convinced
this wasn't some artifact of valgrind.  However, until we're sure,
let's avoid massive changes, especially since IPA changes are coming
up.

For the curious, a particular hot spot for IPA in this area was:

ipcp_vr_lattice::meet_with_1 (const value_range *other_vr)
{
...
...
  value_range save (m_vr);
  m_vr.union_ (*other_vr);
  return m_vr != save;
}

The problem isn't the resizing (since we do that at most once) but the
fact that for some functions with lots of callers we end up with a huge
range that gets copied and compared for every meet operation.  Maybe
the IPA algorithm could be adjusted somehow?

Anywhooo... for now there is nothing to worry about, since value_range
still has 2 subranges and is not resizable.  But we should probably
think about what, if anything, we want to do here, as I envision IPA using
infinite ranges here (well, int_range_max) and handling frange's, etc.

gcc/ChangeLog:

PR tree-optimization/109695
* value-range.cc (irange::operator=): Resize range.
(irange::union_): Same.
(irange::intersect): Same.
(irange::invert): Same.
(int_range_max): Default to 3 sub-ranges and resize as needed.
* value-range.h (irange::maybe_resize): New.
(~int_range): New.
(int_range::int_range): Adjust for resizing.
(int_range::operator=): Same.

2023-05-10 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109695

--- Comment #35 from Aldy Hernandez  ---
We could also tweak the number of sub-ranges.  8 (?) also sounds good for a
few percent smaller performance drop, if we care.

p.s. I did try the auto_vec thing for a 25% loss in VRP performance, even when
using address(), reserve(), etc.  I may have gotten something wrong, but it
didn't look promising.  I could post my attempt and someone could take it from
there, but I think the one-irange approach with sensible defaults that
automatically grow to MAX is better.

2023-05-10 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109695

--- Comment #34 from Aldy Hernandez  ---
Excellent ideas!

For that matter, we may get away with defaulting to 3 sub-ranges and always
resizing as needed (up to MAX).  Needing more than 3 sub-ranges is so rare
(less than 0.5% of the time), that the penalty will be small.

Furthermore, these defaults are sensible enough that we could nuke int_range
altogether and have irange itself hold this small [3*2] array.  After all, most uses
of int_range now are int_range_max, since we never know the size of the
range (except in rare cases such as boolean_type_node, etc).  This would
simplify the code and get rid of the annoying templates which I hate.  No need
for int_range_max, or int_range, etc.  Just plain irange.

This would give us an irange of 592 bytes compared to 40912 for int_range_max
currently.  Plus, it's not that far away from int_range<2> which currently is
432 bytes, and as I mentioned, barely happens as we mostly use int_range_max.

I think this is a nice trade off.  Cleaner more flexible code, without
templates.

Oh... preliminary tests show it's a 5% penalty for VRP, which is more than
covered by our 13.22% improvement (plus Andrew's cache improvements) to VRP.

2023-05-10 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109695

--- Comment #33 from Jakub Jelinek  ---
That would indeed simplify things: the auto_vec member would be unnecessary, as
would any of the length/allocated members etc.  All we'd need is just a pointer
and a small size buffer (and the is_small check would be pointer == _size_buffer).

2023-05-10 Thread rguenther at suse dot de via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109695

--- Comment #32 from rguenther at suse dot de  ---
On Tue, 9 May 2023, jakub at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109695
> 
> --- Comment #29 from Jakub Jelinek  ---
> Comment on attachment 55031
>   --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55031
> WIP patch for a dynamic int_range<>
> 
> What I meant is that by using an auto_vec we could avoid reimplementing larger
> chunks of vec.h/vec.cc for this.
> Resizing by adding 10 more ranges can have a higher compile-time cost than what
> vec.cc (calculate_allocation_1) does - it doubles the size each time for smaller
> sizes and multiplies the previous by 3 / 2 for larger ones.

If we still have a hard limit on the MAX number of ranges (previously 255)
then I'd probably go straight from stack -> that MAX and never re-allocate
further?

2023-05-09 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109695

--- Comment #31 from Jakub Jelinek  ---
Well, you don't need to use {,vec_}{safe,quick}_push etc. all the time; just
have auto_vec in there and use .address () on it to give you a pointer to the
elements, and then .length () / .allocated () and .reserve () to ask for more
elements.

2023-05-09 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109695

--- Comment #30 from Aldy Hernandez  ---
(In reply to Jakub Jelinek from comment #29)
> Comment on attachment 55031
> WIP patch for a dynamic int_range<>
> 
> What I meant is that by using an auto_vec we could avoid reimplementing larger
> chunks of vec.h/vec.cc for this.
> Resizing by adding 10 more ranges can have a higher compile-time cost than
> what vec.cc (calculate_allocation_1) does - it doubles the size each time for
> smaller sizes and multiplies the previous by 3 / 2 for larger ones.

Hmmm, that may require a lot more work reshuffling how irange is implemented
internally.  I'll take a peek, but I can't afford to spend another week on this
;-).  Also, adding 10 or multiplying by 2, or even adding 5 IIRC, didn't make
much of a difference in our benchmarks.

2023-05-09 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109695

--- Comment #29 from Jakub Jelinek  ---
Comment on attachment 55031
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55031
WIP patch for a dynamic int_range<>

What I meant is that by using an auto_vec we could avoid reimplementing larger
chunks of vec.h/vec.cc for this.
Resizing by adding 10 more ranges can have a higher compile-time cost than what
vec.cc (calculate_allocation_1) does - it doubles the size each time for smaller
sizes and multiplies the previous by 3 / 2 for larger ones.

2023-05-09 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109695

--- Comment #28 from Aldy Hernandez  ---
Created attachment 55031
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55031=edit
WIP patch for a dynamic int_range<>

Here's my WIP.

2023-05-09 Thread rguenther at suse dot de via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109695

--- Comment #27 from rguenther at suse dot de  ---
On Tue, 9 May 2023, jakub at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109695
> 
> --- Comment #26 from Jakub Jelinek  ---
> new/delete rather than auto_vec member inside of the type using new/delete?

new/delete for growing auto_int_range<> from stack to heap storage
(I didn't see/look at any patches)

2023-05-09 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109695

--- Comment #26 from Jakub Jelinek  ---
new/delete rather than auto_vec member inside of the type using new/delete?

2023-05-09 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109695

--- Comment #25 from Richard Biener  ---
(In reply to Aldy Hernandez from comment #24)
> FYI.  I originally tried new/delete for allocation, which was a tad slower
> than ggc_alloc / ggc_free.  Not too much, but measurable.
> 
> Another idea would be to have a global obstack which auto_int_range<> uses,
> which gets freed at the end of every pass.  This saves us various
> ggc_free()s throughout, especially at destruction.

I think ggc_{alloc,free} is a no-go.  For now I'd go with new/delete, the
choice of N should be so that we do not get allocation in >90% of the
cases anyway.

2023-05-09 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109695

--- Comment #24 from Aldy Hernandez  ---
FYI.  I originally tried new/delete for allocation, which was a tad slower than
ggc_alloc / ggc_free.  Not too much, but measurable.

Another idea would be to have a global obstack which auto_int_range<> uses,
which gets freed at the end of every pass.  This saves us various ggc_free()s
throughout, especially at destruction.

2023-05-09 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109695

--- Comment #23 from Aldy Hernandez  ---
An update on the int_range_max memory bloat work.

As Andrew mentioned, having int_range<25> solves the problem, but is just
kicking the can down the road.  I ran some stats on what we actually need on a
bootstrap, and 99.7% of ranges fit in a 3 sub-range range, but we need more to
represent switches, etc.

There's no clear winner for choosing N, as the distribution for anything past
<3> is rather random.  What I did see was that at no point do we need more than
125 sub-ranges (on a set of .ii files from a bootstrap).

I've implemented various alternatives using a dynamic approach similar to what
we do for auto_vec.  I played with allocating 2x as much as needed, and
allocating 10 or 20 more than needed, as well as going from N to 255 in one go.
All of it required some shuffling to make sure the penalty isn't much wrt
virtuals, etc, but I think the dynamic approach is the way to go.

The question is how much of a performance hit are we willing take in order to
reduce the memory footprint.  Memory to speed is a linear relationship here, so
we just have to pick a number we're happy with.

Here are some numbers for various sub-ranges (the sub-ranges grow automatically
in union, intersect, invert, and assignment, which are the methods that can
increase the number of sub-ranges).

trunk (wide_ints <255>) =>  40912 bytes  
GCC 12 (trees <255>)=>   4112 bytes
auto_int_range<2>   =>432 bytes  (5.14% penalty for VRP)
auto_int_range<3>   =>592 bytes  (4.01% penalty)
auto_int_range<8>   =>   1392 bytes  (2.68% penalty)
auto_int_range<10>  =>   1712 bytes  (2.14% penalty)

As you can see, even at N=10, we're still 24X smaller than trunk and 2.4X
smaller than GCC12 for a 2.14% performance drop.

I'm tempted to just pick a number and tweak this later as we have ultimate
flexibility here.  Plus, we can also revert to a very small N, and have passes
that care about switches use their own temporaries (auto_int_range<20> or
such).

Note that we got a 13.22% improvement for the wide_int+legacy work, so even the
absolute worst case of a 5.14% penalty would have us sitting on a net 8.76%
improvement over GCC12.

Bike shedding welcome ;-)

2023-05-04 Thread amacleod at redhat dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109695

--- Comment #22 from Andrew Macleod  ---
OK, I've finished with my analysis.  There are multiple vectors of attack here,
and we should do them all.  Some were already on my radar for this release
anyway, but this gives a nice tidy place to track them all.

First, the increased size of int_range_max needs to be addressed.  Short term
we could resolve this simply by making it int_range<25> instead of int_range<255>.
 That of course papers over the real problem, but still needs addressing.  Aldy
and I are discussing.

Next, prefill_stmt_dependencies exists to short-cut long chains of calls to
range_of_stmt by managing it on a stack. It uses a *lot* fewer stack frames, but
it is still subject to re-evaluating statements whose dependencies are updated.

ssa-names that have not been calculated have no entry in ranger's global cache.
As soon as they get an entry, they are also given a monotonically increasing
timestamp.  This allows us to quickly see if a dependence has been updated
since a value was last calculated. 

prefill_stmt_dependencies pushes un-evaluated names from statements onto a
stack as they are seen, evaluates those first, and then finally evaluates the
stmt.

_1012 = PHI <_1947(2), _1011(1016)>

we push _1012, then _1947 and _1011 on the stack to evaluate.  _1011 and _1947
will be evaluated before _1012, thus allowing for a decent calculation of
_1012.

We avoid cycles by providing an initial value for a name when it is first
registered. When we provide this initial value, we also set the timestamp to
"always current" so we don't end up with flip flopping dependencies in cycles
causing each other to be recalculated. When we store the newly calculated
value, we set a proper timestamp.

So, on to the issues that need to be addressed.

1) We unconditionally write the new value calculated to the global cache once
the dependencies are resolved.  This gives it a new timestamp, and thus makes
any other values which used it out of date when they really aren't.   This
causes a lot of extra churn.

TODO: This should be changed to only update it when it actually changes.
Client code shouldn't have to do this; it should be handled right in the
cache's set_global_value ().

2) The initial value we choose is simply VARYING.  This is why 1) alone won't
solve this problem.  When we push _1947 on the stack, we set it to VARYING,
then proceed down a chain of other dependencies driven by _1011 which are
resolved first.  When we finally get back to _1947, we see:
  _1947 = 77;
which evaluates to [77, 77].  This is different than VARYING, and thus would
cause a new timestamp to be created even if (1) were implemented.

TODO: When setting the initial value in the cache, rather than being lazy and
using VARYING, we should invoke fold_stmt using get_global_range_query ().
This will fold the stmt and produce a result which resolves any ssa-names
just using known global values.  This should not be expensive, and gives us a
reasonable first approximation.  And for cases like _1947, the final result as
well.

3) When we first set the initial value for _1947 and give it the ALWAYS_CURRENT
timestamp, we lose the context of when the initial value was set.  So even with
1) & 2) implemented, we *still* need to set a timestamp for it when it is
finally calculated, even though it is the same as the original.  This will
cause any names already evaluated using its range to become stale, because we
can't leave it as ALWAYS_CURRENT.  (There are other places where we do need
to be able to re-evaluate; there are 2 testsuite failures caused by this if we
just leave it as always_current.)

TODO: Alter the implementation of ALWAYS_CURRENT such that a name is also given
a timestamp at the time of setting the initial value.   Then set_global_range()
will clear the ALWAYS_CURRENT tag unconditionally, but leave the original
timestamp if the value hasn't changed.  This will then provide an accurate
timestamp for the initial value.

4) Finally, there are multiple ssa-names that are dependent on _1947. Given the
stack oriented mechanism, the first such case will push the value on the stack
to be resolved, and all the other names that depend on it will use the initial
value.  When we finally get back to evaluating it, if it DID come out
different, then all those names would again be stale, and potentially get
recalculated down the road.  

TODO: This impact could be reduced by increasing the priority of its
evaluation.  Instead of waiting for its evaluation all the way back at the
bottom of the stack for _1947, when a new stmt is dependent on it, we could
instead move it to the top of the stack for consideration.  We'll still be
resolving dependencies, but it will be properly evaluated sooner rather than
later.  Cycles will have to be avoided by not re-prioritizing any dependencies
on statements which have been "bumped" like this.  You have to put a stake in
the ground at some point and 

2023-05-04 Thread dcb314 at hotmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109695

--- Comment #21 from David Binderman  ---

Two cases, depending on the text of the warning:

$ fgrep warning: mk.out | fgrep Wstack | fgrep -v "might be unbounded" | fgrep
"usage is"  | sort -rnk 6
../../trunk.year/gcc/gimple-range-infer.cc:360:1: warning: stack usage is
410496 bytes [-Wstack-usage=]
../../trunk.year/gcc/gimple-range-cache.cc:1682:1: warning: stack usage is
410496 bytes [-Wstack-usage=]
../../trunk.year/gcc/range-op.cc:2822:1: warning: stack usage is 204992 bytes
[-Wstack-usage=]
../../trunk.year/gcc/range-op.cc:2556:1: warning: stack usage is 204960 bytes
[-Wstack-usage=]
../../trunk.year/gcc/value-range.cc:2169:1: warning: stack usage is 164688
bytes [-Wstack-usage=]
../../trunk.year/gcc/range-op.cc:5085:1: warning: stack usage is 164576 bytes
[-Wstack-usage=]
../../trunk.year/gcc/gimple-range-edge.cc:117:1: warning: stack usage is 163936
bytes [-Wstack-usage=]
../../trunk.year/gcc/tree-ssa-loop-niter.cc:343:1: warning: stack usage is
123808 bytes [-Wstack-usage=]
../../trunk.year/gcc/gimple-range-fold.cc:533:1: warning: stack usage is 123424
bytes [-Wstack-usage=]
../../trunk.year/gcc/gimple-range-cache.cc:1293:1: warning: stack usage is
123312 bytes [-Wstack-usage=]
../../trunk.year/gcc/gimple-range-fold.cc:734:1: warning: stack usage is 123232
bytes [-Wstack-usage=]
../../trunk.year/gcc/gimple-range-cache.cc:1154:1: warning: stack usage is
123200 bytes [-Wstack-usage=]
../../trunk.year/gcc/range-op.cc:4851:1: warning: stack usage is 123056 bytes
[-Wstack-usage=]
$ 

and

$ fgrep warning: mk.out | fgrep Wstack | fgrep -v "might be unbounded" | fgrep
-v "usage is"  | sort -rnk 7
../../trunk.year/gcc/gimple-range-gori.cc:1465:1: warning: stack usage might be
205360 bytes [-Wstack-usage=]
../../trunk.year/gcc/range-op.cc:5135:1: warning: stack usage might be 165008
bytes [-Wstack-usage=]
../../trunk.year/gcc/gimple-range-gori.cc:604:1: warning: stack usage might be
164352 bytes [-Wstack-usage=]
../../trunk.year/gcc/gimple-range-gori.cc:743:1: warning: stack usage might be
164144 bytes [-Wstack-usage=]
../../trunk.year/gcc/gimple-range-gori.cc:1187:1: warning: stack usage might be
123280 bytes [-Wstack-usage=]
../../trunk.year/gcc/gimple-range-gori.cc:1077:1: warning: stack usage might be
123280 bytes [-Wstack-usage=]
../../trunk.year/gcc/gimple-range-fold.cc:897:1: warning: stack usage might be
123216 bytes [-Wstack-usage=]
$ 

It appears that there are some large stack sizes in various parts of gcc.

2023-05-04 Thread dcb314 at hotmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109695

--- Comment #20 from David Binderman  ---
(In reply to Jakub Jelinek from comment #17)
> Or just try to check for functions with largest stack usages in cc1plus?
> Doing that on the trunk gives:
> objdump -dr cc1plus | grep
> '>:\|sub.*0x.*[0-9a-f][0-9a-f][0-9a-f][0-9a-f],.*rsp' | awk
> '/>:/{I=$2}/sub.*rsp/{J=gensub(/.*(0x[0-9a-f]+),%rsp/,"\\1","1",$0);
> K=strtonum(J);printf "%d %s %s\n", K, I, $0}' | grep -v 0x |
> sort -nr
> 410440 <_ZN19infer_range_manager17register_all_usesEP9tree_node>:  275120d:
> 48 81 ec 48 43 06 00  sub$0x64348,%rsp

That's 410K on the stack. Eek !

It looks to me like the data needs to go into the heap. RAII will help
with tidyup.

> Usual Linux stack is around 8MB, but e.g. on Windows I think just 2MB.

The Linux kernel has a script, checkstack.pl, which might help.

I will try an experimental build of gcc with -Wstack-usage=10 and report
back.

2023-05-04 Thread rguenther at suse dot de via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109695

--- Comment #19 from rguenther at suse dot de  ---
On Thu, 4 May 2023, aldyh at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109695
> 
> Aldy Hernandez  changed:
> 
>What|Removed |Added
> 
>  CC||rguenth at gcc dot gnu.org
> 
> --- Comment #16 from Aldy Hernandez  ---
> Is there a way to measure peak memory usage in the compiler?

/usr/bin/time will output maxresident, so that's useful for this.

>  I'd like to
> gather some stats.  Also do we have a handful of programs we typically use to
> measure memory usage?  ISTR that Richard (??) had a set of memory hog 
> programs.

I gathered testcases from compile-time-hog/memory-hog bugzillas and run
them on an automated tester, nothing more.

2023-05-04 Thread sjames at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109695

Sam James  changed:

   What|Removed |Added

 CC||sjames at gcc dot gnu.org

--- Comment #18 from Sam James  ---
Note that on musl, the default stack size is much smaller:
https://wiki.musl-libc.org/functional-differences-from-glibc.html#Thread-stack-size.

2023-05-04 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109695

--- Comment #17 from Jakub Jelinek  ---
For peak memory consumption there is -fmem-report, which prints stats about GC
memory.
For this PR I think you instead want to measure maximum stack usage.
Perhaps GCC could be built with -fstack-usage and something could be derived
from callgraph and the stack usage info.  Or use systemtap on cc1/cc1plus and
try to determine something using it?
Or build gcc with -fsplit-stack and hook into morestack?
Or just try to check for functions with largest stack usages in cc1plus?
Doing that on the trunk gives:
objdump -dr cc1plus \
  | grep '>:\|sub.*0x.*[0-9a-f][0-9a-f][0-9a-f][0-9a-f],.*rsp' \
  | awk '/>:/{I=$2}/sub.*rsp/{J=gensub(/.*(0x[0-9a-f]+),%rsp/,"\\1","1",$0);K=strtonum(J);printf "%d %s %s\n", K, I, $0}' \
  | grep -v 0x | sort -nr
410440 <_ZN19infer_range_manager17register_all_usesEP9tree_node>:  275120d:  48 81 ec 48 43 06 00  sub  $0x64348,%rsp
410440 <_ZN12ranger_cache21apply_inferred_rangesEP6gimple>:  274075d:  48 81 ec 48 43 06 00  sub  $0x64348,%rsp
287320 <_ZN12gori_compute15logical_combineER6vrange9tree_codeRK6irangeRKS0_S7_S7_S7_>:  2748d0c:  48 81 ec 58 62 04 00  sub  $0x46258,%rsp
205320 <_ZN13gimple_ranger25prefill_stmt_dependenciesEP9tree_node>:  27398ed:  48 81 ec 08 22 03 00  sub  $0x32208,%rsp
205288 <_ZN12gori_compute15condexpr_adjustER6vrangeS1_P6gimpleP9tree_nodeS5_S5_R10fur_source>:  274c2c3:  48 81 ec e8 21 03 00  sub  $0x321e8,%rsp
204840 <_ZNK15operator_rshift9op1_rangeER6irangeP9tree_nodeRKS0_S5_13relation_trio>:  16481ba:  48 81 ec 28 20 03 00  sub  $0x32028,%rsp
164952 <_ZL21determine_value_rangeP4loopP9tree_nodeS2_P12__mpz_structS4_S4_>:  189afc6:  48 81 ec 58 84 02 00  sub  $0x28458,%rsp
164512 <_ZN8selftestL26range_op_bitwise_and_testsEv>:  164fef4:  48 81 ec a0 82 02 00  sub  $0x282a0,%rsp
164360 <_ZN8selftestL25range_tests_int_range_maxEv>:  1a26584:  48 81 ec 08 82 02 00  sub  $0x28208,%rsp
164312 <_ZN18remove_unreachable25remove_and_update_globalsEb.part.0>:  19e1780:  48 81 ec d8 81 02 00  sub  $0x281d8,%rsp
164296 <_ZN12ranger_cache16fill_block_cacheEP9tree_nodeP15basic_block_defS3_>:  27418ed:  48 81 ec c8 81 02 00  sub  $0x281c8,%rsp
164280 <_ZN16fold_using_range17range_of_range_opER6vrangeR23gimple_range_op_handlerR10fur_source>:  274680a:  48 81 ec b8 81 02 00  sub  $0x281b8,%rsp
164280 <_ZN12gori_compute21compute_operand_rangeER6vrangeP6gimpleRKS0_P9tree_nodeR10fur_sourceP14value_relation.part.0>:  274b259:  48 81 ec b8 81 02 00  sub  $0x281b8,%rsp
164248 <_ZN11range_query14get_tree_rangeER6vrangeP9tree_nodeP6gimple>:  1a17e26:  48 81 ec 98 81 02 00  sub  $0x28198,%rsp
164232 <_ZN12gori_compute21refine_using_relationEP9tree_nodeR6vrangeS1_S3_R10fur_source15relation_kind_t>:  274a50a:  48 81 ec 88 81 02 00  sub  $0x28188,%rsp
164104 <_ZN8selftestL21range_op_rshift_testsEv>:  1648826:  48 81 ec 08 81 02 00  sub  $0x28108,%rsp
164008 <_ZNK13operator_cast9op1_rangeER6irangeP9tree_nodeRKS0_S5_13relation_trio.part.0>:  164e5f9:  48 81 ec a8 80 02 00  sub  $0x280a8,%rsp
163864 <_ZN21gimple_outgoing_range18calc_switch_rangesEP7gswitch>:  274335d:  48 81 ec 18 80 02 00  sub  $0x28018,%rsp
123240 <_ZN12gori_compute22compute_operand1_rangeER6vrangeR23gimple_range_op_handlerRKS0_P9tree_nodeR10fur_sourceP14value_relation>:  274d1de:  48 81 ec 68 e1 01 00  sub  $0x1e168,%rsp
123224 <_ZN12gori_compute22compute_operand2_rangeER6vrangeR23gimple_range_op_handlerRKS0_P9tree_nodeR10fur_sourceP14value_relation>:  274dc8e:  48 81 ec 58 e1 01 00  sub  $0x1e158,%rsp
123176 <_ZN16path_range_query23compute_ranges_in_blockEP15basic_block_def>:  18c708d:  48 81 ec 28 e1 01 00  sub  $0x1e128,%rsp
123176 <_ZN16fold_using_range12range_of_phiER6vrangeP4gphiR10fur_source>:  2745820:  48 81 ec 28 e1 01 00  sub  $0x1e128,%rsp
123176 <_ZN13gimple_ranger7dump_bbEP8_IO_FILEP15basic_block_def>:  273cd4b:  48 81 ec 28 e1 01 00  sub  $0x1e128,%rsp
123144 <_ZN16fold_using_range18range_of_cond_exprER6vrangeP7gassignR10fur_source>:  27450a3:  48 81 ec 08 e1 01 00  sub  $0x1e108,%rsp
123128 <_ZN12ranger_cache15propagate_cacheEP9tree_node>:  2740d9d:  48 81 ec f8 e0 01 00  sub  $0x1e0f8,%rsp
83240 <_ZN8selftestL21range_op_lshift_testsEv>:  16461da:  48 81 ec 28 45 01 00  sub  $0x14528,%rsp
83048 <_ZL23adjust_op1_for_overflowR6irangeRKS_15relation_kind_tb.part.0>:  1639c99:  48 81 ec 68 44 01 00  sub  $0x14468,%rsp
82856 <_ZL15simplify_rotateP20gimple_stmt_iterator>:  185282e:  48 81 ec a8 43 01 00  sub  $0x143a8,%rsp
82560 <_ZN21simplify_using_ranges18fold_cond_with_opsE9tree_codeP9tree_nodeS2_P6gimple>:  1a6bfca:  48 81 ec 80 42 01 00  sub  $0x14280,%rsp
82248 <_ZL21record_nonwrapping_ivP4loopP9tree_nodeS2_P6gimpleS2_S2_bb.constprop.0>:  189571d:  48 81 ec 48 41 01 00  sub  $0x14148,%rsp
82168

[Bug tree-optimization/109695] [14 Regression] crash in gimple_ranger::range_of_expr since r14-377-gc92b8be9b52b7e

2023-05-03 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109695

Aldy Hernandez  changed:

   What|Removed |Added

 CC||rguenth at gcc dot gnu.org

--- Comment #16 from Aldy Hernandez  ---
Is there a way to measure peak memory usage in the compiler?  I'd like to
gather some stats.  Also, do we have a handful of programs we typically use to
measure memory usage?  ISTR that Richard (??) had a set of memory-hog programs.

[Bug tree-optimization/109695] [14 Regression] crash in gimple_ranger::range_of_expr since r14-377-gc92b8be9b52b7e

2023-05-03 Thread amacleod at redhat dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109695

--- Comment #15 from Andrew Macleod  ---
(In reply to Mikael Morin from comment #8)
> (In reply to Andrew Macleod from comment #7)
> > 
> > diff --git a/gcc/gimple-range.cc b/gcc/gimple-range.cc
> > index 49e9d6b4de6..74afaaf2989 100644
> > --- a/gcc/gimple-range.cc
> > +++ b/gcc/gimple-range.cc
> > @@ -394,7 +394,9 @@ gimple_ranger::prefill_stmt_dependencies (tree ssa)
> >   Value_Range tmp (TREE_TYPE (name));
> >   m_cache.get_global_range (tmp, name);
> >   r.intersect (tmp);
> > - m_cache.set_global_range (name, r);
> > + // Only update the global range if it changes.
> > + if (r != tmp)
> > +   m_cache.set_global_range (name, r);
> > }
> >   continue;
> > }
> 
> Maybe the result of intersect could be used to check for changes?
> 
> if (tmp.intersect(r))
>   ...

Yes, my next iteration of the patch did just that.

However, it's still not quite right; there's a ripple effect with the way we
handle timestamps.  The first time a name is processed, we set an "always
current" timestamp until we resolve all the dependencies and get back to it to
calculate a real value.  If the initial value and the calculated value end up
being the same, the current fix won't clear the always-current timestamp.

Still working on it.

[Bug tree-optimization/109695] [14 Regression] crash in gimple_ranger::range_of_expr since r14-377-gc92b8be9b52b7e

2023-05-03 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109695

--- Comment #14 from Jakub Jelinek  ---
I think even 4080 bytes for int_range_max was quite a lot.  I know it perhaps
simplifies things a little, and hopefully copy construction/assignment only
copies the actual ranges and not the whole thing, but still.

[Bug tree-optimization/109695] [14 Regression] crash in gimple_ranger::range_of_expr since r14-377-gc92b8be9b52b7e

2023-05-03 Thread aldyh at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109695

--- Comment #13 from Aldy Hernandez  ---
(In reply to Jakub Jelinek from comment #12)
> Perhaps change int_range to have the wide_ints as auto_vec with reserved
> space for a few (perhaps 3 (times 2))?

Our original implementation was exactly that, but we switched to the current
one years ago (maybe it coincided with our switch back to trees; I can't
remember).  But yes, we're discussing options.

[Bug tree-optimization/109695] [14 Regression] crash in gimple_ranger::range_of_expr since r14-377-gc92b8be9b52b7e

2023-05-03 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109695

--- Comment #12 from Jakub Jelinek  ---
Perhaps change int_range to have the wide_ints as auto_vec with reserved space
for a few (perhaps 3 (times 2))?

[Bug tree-optimization/109695] [14 Regression] crash in gimple_ranger::range_of_expr since r14-377-gc92b8be9b52b7e

2023-05-03 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109695

--- Comment #11 from Jakub Jelinek  ---
(In reply to Andrew Macleod from comment #7)
> The problem was exposed by the newly increased size of int_range_max.  (We
> may want to address that elsewhere)   It has grown an order of magnitude in
> size with wide-int.

Yes, 10 times.

[Bug tree-optimization/109695] [14 Regression] crash in gimple_ranger::range_of_expr since r14-377-gc92b8be9b52b7e

2023-05-03 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109695

Jakub Jelinek  changed:

   What|Removed |Added

 CC||jakub at gcc dot gnu.org

--- Comment #10 from Jakub Jelinek  ---
Can we either stop using int_range_max in functions which can be called
recursively, or invent some better representation for int_range_max (have the
first 8 or 16 ranges with wide_ints inline and the rest dynamically allocated
when really needed)?
Because wide_int with wide_int_storage is not meant to have a compact memory
footprint (e.g. for the x86 target it is 80 bytes long), and having
int_range_max hold 510 of these (40800 bytes) is simply too large on the stack
for one variable even if no recursion is involved.