[Bug middle-end/116899] ICE: in quick_push, at vec.h:1041 with -O at _BitInt()
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116899 --- Comment #10 from Andrew Macleod --- (In reply to Jakub Jelinek from comment #9)
> Created attachment 59238 [details]
> gcc15-pr116899.patch
>
> Untested fix in patch form.

Also pre-approved. Thanks.
[Bug middle-end/116898] ICE: in block_range, at gimple-range-cache.cc:1293 with -O -finstrument-functions -fnon-call-exceptions and _BitInt()
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116898 --- Comment #5 from Andrew Macleod --- (In reply to Jakub Jelinek from comment #4)
> Created attachment 59236 [details]
> gcc15-pr116898.patch
>
> So like this?  Will test tonight.

Perfect. Consider it pre-approved.
[Bug middle-end/116899] ICE: in quick_push, at vec.h:1041 with -O at _BitInt()
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116899 --- Comment #7 from Andrew Macleod --- (In reply to Andrew Macleod from comment #6)
> (In reply to Jakub Jelinek from comment #5)
> > Even if create (n) didn't work (which would be good to debug), the
> > x.safe_grow_cleared (n); x.truncate (0); can be just done more efficiently
> > with x.reserve (n);
> >
> > Anyway, the more important question is if range-cache can work even if the
> > number of basic blocks grows in the middle and whether say just using
> > safe_push instead of quick_push would do the trick.
>
> It works fine with growing CFG.

OK, let me give you a more measured answer instead of "yes" :-P  The m_workback vector is used for a walk up the CFG to mark blocks which need to be revisited because an incoming edge changes a specific SSA_NAME's range. These are mostly range-on-entry blocks that need updating, and only those blocks get put on the stack. As a result, even a walk from the bottom of the CFG to the top should require far fewer than the full number of blocks, which is why the cache uses quick_push. I suppose it is possible, however unlikely, that one could add so many basic blocks that this situation exceeds the original size of the vector. The completely safe approach would be to change ranger_cache::fill_block_cache () to either a) assert there is space before calling quick_push, or b) change it to a safe_push.
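The two options above can be sketched outside of GCC. This is a toy stand-in (the `toy_vec` type and its methods are hypothetical, not GCC's actual vec.h implementation): quick_push asserts that reserved space remains, while safe_push grows on demand, so a CFG that gained blocks mid-pass cannot overflow the vector.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Simplified model of the two push styles discussed above.
template <typename T>
struct toy_vec
{
  std::vector<T> data;
  size_t cap = 0;

  void reserve (size_t n) { cap = n; data.reserve (n); }

  // Option (a): assert there is space before pushing.
  void quick_push (const T &v)
  {
    assert (data.size () < cap && "quick_push: no reserved space left");
    data.push_back (v);
  }

  // Option (b): grow the capacity as needed, so pushing more elements
  // than originally reserved is safe.
  void safe_push (const T &v)
  {
    if (data.size () >= cap)
      cap = cap ? cap * 2 : 1;
    data.push_back (v);
  }
};
```

Under a quick_push-only discipline, the assert fires as soon as the pre-sized estimate is exceeded, which is exactly the failure mode this PR hit.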
[Bug middle-end/116899] ICE: in quick_push, at vec.h:1041 with -O at _BitInt()
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116899 --- Comment #6 from Andrew Macleod --- (In reply to Jakub Jelinek from comment #5)
> Even if create (n) didn't work (which would be good to debug), the
> x.safe_grow_cleared (n); x.truncate (0); can be just done more efficiently
> with x.reserve (n);
>
> Anyway, the more important question is if range-cache can work even if the
> number of basic blocks grows in the middle and whether say just using
> safe_push instead of quick_push would do the trick.

It works fine with a growing CFG. Whatever problem I had once upon a time is probably long gone; it was years ago. I also wasn't/am not familiar with the nuances of vec initializations or that some versions are more efficient than others. Good to know.
[Bug middle-end/116899] ICE: in quick_push, at vec.h:1041 with -O at _BitInt()
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116899 --- Comment #4 from Andrew Macleod --- (In reply to Jakub Jelinek from comment #3)
> Similarly to the other BB, enable_ranger -> gimple_ranger -> ranger_cache
> does
> 1000    m_workback.create (0);
> 1001    m_workback.safe_grow_cleared (last_basic_block_for_fn (cfun));
> 1002    m_workback.truncate (0);
> (which seems just weird, I'd expect m_workback.create
> (last_basic_block_for_fn (cfun));
> Creating a large vector, filling it with zeros and then immediately
> truncating is just a waste of time) and then just uses
> m_workback.quick_push.  That works only if no new basic blocks are created,
> which the gimple-lower-bitint creates a lot (split_block in tons of places,
> even manual create_basic_block).

With the truncate, clearly no need for the safe growing. I think my use of the create (0); safe_grow (size); idiom dates back to a problem I once had (years ago) where create (size) caused a failure down the road that was hard to debug, and it was fixed by the create (0); safe_grow (size); sequence, so I ended up just always doing it that way :-P  I especially see no need for that safe_grow_cleared call in this case.
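Jakub's point about reserve can be illustrated with std::vector standing in for GCC's vec (the function names below are made up for the sketch): the create (0) / safe_grow_cleared (n) / truncate (0) idiom allocates and zero-fills n elements only to immediately discard them, while reserve (n) obtains the same capacity with no wasted fill.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Two ways to pre-size a vector to hold up to N elements while starting
// logically empty.  std::vector sketch, not the GCC vec implementation.
std::vector<int> presize_slow (size_t n)
{
  std::vector<int> v;
  v.resize (n, 0);   // like safe_grow_cleared (n): allocate AND zero-fill
  v.resize (0);      // like truncate (0): throw the zeros away again
  return v;
}

std::vector<int> presize_fast (size_t n)
{
  std::vector<int> v;
  v.reserve (n);     // like reserve (n): allocate capacity only
  return v;
}
```

Both leave a vector of size 0 with capacity at least n; only the second avoids the pointless zero-fill.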
[Bug middle-end/116898] ICE: in block_range, at gimple-range-cache.cc:1293 with -O -finstrument-functions -fnon-call-exceptions and _BitInt()
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116898 --- Comment #3 from Andrew Macleod --- (In reply to Jakub Jelinek from comment #2)
> The bitint lowering pass emits some gimple statements on edges
> (gsi_insert_on_edge) and the edge insertions are (intentionally) committed
> to edges only at the end of pass (gsi_commit_edge_inserts ()).
> So, what the gimple cache sees here is an SSA_NAME set in a stmt queued on
> edge.
>
> Either the range cache can be adjusted not to fail assertion in cases like
> that but just pretend it doesn't know the range of something not yet in the
> IL (perhaps under some flag), or I'll need to revamp the pass to repeat the
> handle_operand_addr and .{ADD,SUB}_OVERFLOW discovery to find out where
> range_to_prec -> range_of_expr would be actually called, remember it in some
> hash table or whatever data structure and then during actual pass don't use
> range_of_expr anymore and just look up what has been remembered.  If the
> former would be possible, I'd certainly prefer that.

It should be perfectly safe to return false instead of trapping on that line. The routine already returns false if it can't produce a range on entry. We've added similar functionality at other places to handle stmts that are not yet in the IL; we just haven't tripped over this one before.
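A minimal sketch of the proposed behavior, with a hypothetical stmt record standing in for GCC's gimple statement: a range lookup on a statement not yet placed in the IL simply reports "no known range" rather than asserting.

```cpp
#include <cassert>

// Hypothetical stand-in for a gimple statement: in_il is false while the
// statement is still sitting on an edge-insert queue (no basic block yet).
struct stmt { bool in_il; };

// Instead of trapping on a statement outside the IL, report that no
// range is known by returning false, as the routine already does when it
// cannot produce a range on entry.
bool block_range (const stmt &def, int &range_out)
{
  if (!def.in_il)
    return false;            // not in the IL yet: pretend we know nothing
  range_out = 42;            // placeholder for the real range computation
  return true;
}
```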
[Bug middle-end/114855] ICE: Segfault when compiling large autogenerated C source file
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114855 --- Comment #57 from Andrew Macleod --- FWIW, on my last run, enabling early VRP sped up my -O1 compilation by a fair amount. Total compilation dropped from 480 seconds to 380 seconds. It took 2.5 seconds to run, and I'm going to guess it might have cleaned the code up somewhat. Might be worth looking at a Fast VRP pass at -O1? I'm not sure how/where to add NEXT_PASS (pass_fast_vrp); to run it at -O1, but it can be easily experimented with via --enable-tree-evrp --param=vrp-block-limit=1 which will always run Fast VRP as the EVRP pass.
[Bug middle-end/114855] ICE: Segfault when compiling large autogenerated C source file
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114855 --- Comment #55 from Andrew Macleod --- (In reply to Richard Biener from comment #50)
> (In reply to Richard Biener from comment #4)
> > Trunk at -O1:
> >
> > dominator optimization : 495.14 ( 82%) 0.20 ( 5%) 495.44 ( 81%) 113M ( 5%)
>
> Compared to that we're now at the following state with -O1 (everything >= 4%):
>
> callgraph ipa passes     : 17.23 ( 10%)
> df live regs             :  6.76 (  4%)
> dominator optimization   : 89.76 ( 50%)
> backwards jump threading :  7.94 (  4%)
> TOTAL                    : 180.77
>
> So it's still DOM aka forward threading eating most of the time.
> -fno-thread-jumps improves compile-time to 77s, DOM then still takes 25s (33%)
> (top offenders are then dom_oracle::register_transitives, bitmap_set_bit
> and wide_int_storage copying).  I noticed the unbound dominator traversal
> in register_transitives already.

There's already a creation flag to disable registering transitives if we decide we don't want them due to their expense. Fast VRP always disables them, but regular VRP never does. We switch to fast VRP when the size of the CFG goes beyond a threshold. The threader uses a normal ranger, so it isn't getting the benefit of this flag. It would be simple enough to disable transitives in ranger if the CFG is beyond a specified size, and we could use the same --param VRP uses to switch algorithms. Something like this ought to remove that hotspot:

diff --git a/gcc/gimple-range-cache.cc b/gcc/gimple-range-cache.cc
index 4b94e0db7aa..f665925f630 100644
--- a/gcc/gimple-range-cache.cc
+++ b/gcc/gimple-range-cache.cc
@@ -1003,7 +1003,8 @@ ranger_cache::ranger_cache (int not_executable_flag, bool use_imm_uses)
   m_temporal = new temporal_cache;
   // If DOM info is available, spawn an oracle as well.
-  create_relation_oracle ();
+  bool transitives_p = (last_basic_block_for_fn (cfun) < param_vrp_block_limit);
+  create_relation_oracle (transitives_p);
   create_infer_oracle (use_imm_uses);
   create_gori (not_executable_flag, param_vrp_switch_limit);
[Bug tree-optimization/116588] wrong code with -O2 -fno-vect-cost-model -fno-tree-dominator-opts -fno-tree-fre --param=vrp-block-limit=0
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116588 --- Comment #3 from Andrew Macleod --- It's happening in tree-ssa-propagate.cc after the simplification:

  /* If this is a control statement the propagator left edges
     unexecuted on force the condition in a way consistent with
     that.  See PR66945 for cases where the propagator can end up
     with a different idea of a taken edge than folding
     (once undefined behavior is involved).  */
  if (gimple_code (stmt) == GIMPLE_COND)
    {
      if ((EDGE_SUCC (bb, 0)->flags & EDGE_EXECUTABLE)
          ^ (EDGE_SUCC (bb, 1)->flags & EDGE_EXECUTABLE))
        {
          if (((EDGE_SUCC (bb, 0)->flags & EDGE_TRUE_VALUE) != 0)
              == ((EDGE_SUCC (bb, 0)->flags & EDGE_EXECUTABLE) != 0))
            gimple_cond_make_true (as_a <gcond *> (stmt));
          else
            gimple_cond_make_false (as_a <gcond *> (stmt));
          gimple_set_modified (stmt, true);

It appears that EDGE_EXECUTABLE is incorrect. That's as far as I have gotten so far.
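Extracting just the edge-flag decision into a standalone function (with simplified flag values, not GCC's real edge flags) makes the folding rule easier to check: when exactly one successor edge is marked executable, the condition folds to true exactly when that executable edge is the true edge.

```cpp
#include <cassert>

// Simplified stand-ins for the edge flag bits used above (values are
// illustrative, not GCC's).
enum { EDGE_TRUE_VALUE = 1, EDGE_EXECUTABLE = 2 };

// Returns +1 to fold the condition to true, -1 to fold it to false, and
// 0 to leave it alone (both or neither successor executable).
int fold_direction (unsigned e0_flags, unsigned e1_flags)
{
  // Same xor test as the quoted code: act only when exactly one of the
  // two successor edges is executable.
  if (!((e0_flags & EDGE_EXECUTABLE) ^ (e1_flags & EDGE_EXECUTABLE)))
    return 0;
  bool e0_true = (e0_flags & EDGE_TRUE_VALUE) != 0;
  bool e0_exec = (e0_flags & EDGE_EXECUTABLE) != 0;
  // If edge 0 is the true edge and is the executable one (or is the
  // false edge and is the non-executable one), the condition is true.
  return e0_true == e0_exec ? 1 : -1;
}
```

This also shows why a stale EDGE_EXECUTABLE bit is fatal: the fold direction is driven entirely by those flags, not by the condition itself.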
[Bug middle-end/114855] ICE: Segfault when compiling large autogenerated C source file
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114855 --- Comment #34 from Andrew Macleod --- Btw, if we simply remove the call to that function, with -O1 -fenable-tree-thread1 I see:

backwards jump threading : 14.36 ( 3%) 0.39 ( 5%) 14.68 ( 3%) 238M ( 9%)

which is a notable improvement from 1370 seconds :-P  So clearly the culprit. DOM appears to still be somewhat of an issue:

dominator optimization : 208.04 ( 43%) 0.95 ( 12%) 209.77 ( 43%) 342M ( 13%)

I've done some experimenting in this area, but nothing I'm willing to commit to yet. Do you have anything outstanding for this, Aldy?
[Bug middle-end/114855] ICE: Segfault when compiling large autogenerated C source file
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114855 --- Comment #29 from Andrew Macleod --- Huh. Do we calculate *all* paths ahead of time? I tried running valgrind, which died, but up to that point it showed 77% of the time spent in the body of back_jt_path_registry::adjust_paths_after_duplication (). That function looks kind of quadratic or worse in nature, and when I run it under GDB and force a stop at a random point, I see:

(gdb) p m_paths->m_vec
$2 = (vec*, va_heap, vl_embed> *) 0x12f50750
(gdb) p *(m_paths->m_vec)
$3 = {m_vecpfx = {m_alloc = 745039, m_using_auto_storage = 0, m_num = 572595}}
(gdb) p cand_path_num
$4 = 309148
(gdb) p curr_path_num
$5 = 0

Am I reading that right? 572,595 paths, of which we are looking at candidate #309,148?? It *appears* that this is called for path 0, then path 0 is removed and it's all done over again with one less path in the vector. A little later, in back_jt_path_registry::update_cfg, I see:

p m_num_threaded_edges
$19 = 18669

and the size of the m_paths vector is down to 558,637:

(gdb) p *(m_paths.m_vec)
$23 = {m_vecpfx = {m_alloc = 745039, m_using_auto_storage = 0, m_num = 558637}}

back_jt_path_registry::update_cfg () makes a lot of calls to back_jt_path_registry::duplicate_thread_path, which then invokes adjust_paths_after_duplication. All this time is spent here copying and moving things. Something seems horribly wrong about that.
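The shape described here, handle path 0, compare it against every surviving candidate, remove it, then repeat, is quadratic in the number of paths. A toy model (purely illustrative; the real adjust_paths_after_duplication does far more work per comparison) makes the cost countable:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Process path 0 by comparing it against every remaining candidate,
// erase it from the front (shifting the rest down), and repeat until
// the worklist is empty.  The counter exposes the n*(n-1)/2 growth.
size_t process_front_quadratic (std::vector<int> paths)
{
  size_t comparisons = 0;
  while (!paths.empty ())
    {
      for (size_t cand = 1; cand < paths.size (); ++cand)
        ++comparisons;                  // "adjust" candidate vs path 0
      paths.erase (paths.begin ());     // drop path 0, shift the rest
    }
  return comparisons;
}
```

With 4 paths this costs 3 + 2 + 1 = 6 comparisons; at the ~570,000 paths seen in the GDB session above, n*(n-1)/2 is on the order of 10^11, which matches the observed "copying and moving things" hotspot.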
[Bug middle-end/114855] ICE: Segfault when compiling large autogenerated C source file
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114855 --- Comment #27 from Andrew Macleod --- (In reply to Aldy Hernandez from comment #26) > > With -O1 -fenable-tree-thread1 (no threadfull): > dominator optimization : 127.76 ( 7%) 0.57 ( 7%) 128.58 ( > 7%) 236M ( 9%) > backwards jump threading :1375.45 ( 79%) 1.81 ( 22%)1382.33 ( > 79%) 519M ( 20%) > TOTAL :1733.30 8.16 1747.35 > 2636M > > Richi mentioned that the handcuffs he added weren't being triggered in the > backwards threader because ranger was going outside the path, but > -fenable-tree-thread1 adds a light-weight threading pass that shouldn't use > anything but gori (*). So, either we're doing something stupid in path > discovery, or we're looking outside the path, even with a non-resolving > threader. Something definitely smells bad. > > (*) Well, technically we use the global ranger to look at relations. > Andrew, would looking at relations cause us to look outside of the given > path? Sure. If a relation is not resolved within the path, and there might be a relation determined before the path, then it queries the global oracle starting at the root of the path. The relation oracle does not cache anything on lookups. It was anticipated as a light weight query, and caching results would end up costing more in memory than it would save in subsequent queries. So if there are frequent queries that escape the path, and those potential relations are way up in the CFG, it could do a lot of traversals. Also, the oracle does not KNOW if there is a relation, it only knows when both ssa-names HAVE relations, so it has to go look to see if they are related to each other. That said, I tried disabling the root oracle, and there was basically no difference. So that is probably a red herring. I also tried disabling relations.. ie, hacked the path oracle to always return VREL_VARYING and not to actually register anything. 
The threader only dropped from 79% to 77% with no relations, so I do not think the relation oracle is the issue. It's sure spending a lot of time somewhere. Still looking... It does appear to do a lot of the same work over and over.
[Bug middle-end/116003] [15 Regression] ICE: in register_initial_def, at value-relation.cc:610 with -O2 -fnon-call-exceptions -fprofile-arcs -finstrument-functions -fno-tree-copy-prop and _BitInt()
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116003 Andrew Macleod changed:

           What       |Removed     |Added
           Status     |UNCONFIRMED |RESOLVED
           Resolution |---         |FIXED

--- Comment #3 from Andrew Macleod --- Fixed.
[Bug middle-end/114855] ICE: Segfault when compiling large autogenerated C source file
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114855 --- Comment #20 from Andrew Macleod --- I did an -O2 run after those patches. Highlights:

tree SSA incremental     :   117.74 (  1%)  0.63 (  3%)   120.37 (  1%) 1049M ( 24%)
dominator optimization   :   680.49 (  5%)  0.65 (  4%)   686.40 (  5%)  170M (  4%)
backwards jump threading : 11253.23 ( 90%)  9.15 ( 49%) 11332.34 ( 90%) 1332M ( 30%)
TOTAL                    : 12520.81        18.54        12622.54        4459M

I suspect most of DOM's time is the over-and-over edge recalculations, but I don't actually know that. I do know that most of the 200 seconds at -O1 is, so it seems a reasonable guess. I can't comment on the backwards threader, but the time would be rational if that were fixed.
[Bug middle-end/114855] ICE: Segfault when compiling large autogenerated C source file
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114855 --- Comment #19 from Andrew Macleod --- (In reply to GCC Commits from comment #17)
> The master branch has been updated by Andrew Macleod :
>
> https://gcc.gnu.org/g:9e4da946c4263a4c89d5fc365b3c97ae244c5018
>
> commit r15-2858-g9e4da946c4263a4c89d5fc365b3c97ae244c5018
> Author: Andrew MacLeod
> Date: Thu Aug 8 16:37:28 2024 -0400
>
>     Adjust rangers recomputation depth based on the number of BBs.
>
>     As the number of block increase, recomputations can become more
>     expensive.  Adjust the depth limit to avoid excessive compile time.
>
>     PR tree-optimization/114855
>     * gimple-range-gori.cc (gori_compute::gori_compute): Adjust
>     ranger_recompute_depth limit based on the number of BBs.
>     (gori_compute::may_recompute_p): Use previosuly calculated value.
>     * gimple-range-gori.h (gori_compute::m_recompute_depth): New.

With the second change, on my machine at -O1 we're down to:

callgraph ipa passes     :  47.83 ( 11%)  1.64 ( 25%)  49.65 ( 12%)  338M ( 17%)
df live regs             :  12.19 (  3%)  0.19 (  3%)  12.41 (  3%)     0 (  0%)
dominator optimization   : 214.20 ( 51%)  0.58 (  9%) 215.56 ( 51%)  113M (  6%)
backwards jump threading :  27.58 (  7%)  0.20 (  3%)  27.95 (  7%)   68M (  3%)
tree CCP                 :   5.46 (  1%)  0.15 (  2%)   5.72 (  1%)   14M (  1%)
tree FRE                 :   5.48 (  1%)  0.20 (  3%)   5.69 (  1%) 2063k (  0%)
dominance computation    :   8.54 (  2%)  0.18 (  3%)   8.79 (  2%)     0 (  0%)
integrated RA            :  28.48 (  7%)  0.03 (  0%)  28.61 (  7%)   29M (  1%)
LRA non-specific         :  12.43 (  3%)  0.01 (  0%)  12.48 (  3%) 8977k (  0%)
TOTAL                    : 416.00         6.52        424.24         2041M

So better. Still have some DOM work to do. That is mostly edge range recalculations in the path ranger and GORI. -O2 is a different beast :-P
[Bug tree-optimization/116166] [13/14/15 Regression] risc-v (last) insn-emit-nn.c build takes hours
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116166 --- Comment #23 from Andrew Macleod --- (In reply to Richard Biener from comment #20)
> collected sofar.  I've noticed with the backwards threader that path
> ranger isn't very good in the ability to preserve a cache when adding
> blocks.  But in the case of forward threading resetting the cache shouldn't

I know only a little about the path ranger. I believe it is primarily a path-finding class which uses ranger to provide ranges on entry to the path, and then goes and walks the various path combinations utilizing GORI to calculate any outgoing ranges of ssa-names it thinks are interesting, to determine if any conditions can be folded on some path(s).

> be necessary - we might get additional "interesting" names to analyze but
> already analyzed names do not need re-analyzing (as opposed to the backward
> threader where we add blocks at the start of the path and thus would
> possibly get refined ranges).  We only need to clear the cache when
> popping blocks - and even then only for names defined in the popped block.
>
> A more pragmatic fix would be to limit the number of EDGE_NO_COPY_SRC_BLOCK
> blocks we add to the path aka the depth of the thread_around_empty_blocks
> recursion.  Limiting that to 10 for example makes the testcase compile in 1s.
> Implementing search space limiting and re-using --param max-jump-thread-paths
> works as well, resulting in 3s.

If there are no limits on its depth, well, that could be a problem :-)  Aldy understands this much better; we should engage him upon his return.
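The depth limit suggested in the quoted comment can be sketched as a bounded recursion (hypothetical names and structure, not the GCC implementation):

```cpp
#include <cassert>

// Walk a chain of empty blocks, adding them to the path, but stop once a
// fixed recursion limit is reached.  Returns how many blocks were added.
int thread_through_empty_blocks (int chain_len, int depth_limit)
{
  if (chain_len == 0 || depth_limit == 0)
    return 0;                 // end of chain, or depth limit reached
  return 1 + thread_through_empty_blocks (chain_len - 1, depth_limit - 1);
}
```

With an unbounded limit the walk grows with the chain length, which is the search-space explosion observed here; capping it (the quoted experiment used 10) bounds each path's cost.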
[Bug tree-optimization/116166] risc-v (last) insn-emit-nn.c build takes hours
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116166 --- Comment #7 from Andrew Macleod --- I think fixing 114855 will probably resolve this one too. It's a more "manageable" test case. I'm trying to have a look, but I am off next week so it isn't imminent. Meanwhile the "workaround" might be to use '-fno-tree-dominator-opts' and maybe '-fno-thread-jumps -fno-tree-reassoc'. I don't know if the no-tree-reassoc is still necessary, but when I was looking at 114855 for VRP it died in reassociation if I turned off thread-jumps.
[Bug middle-end/116003] [15 Regression] ICE: in register_initial_def, at value-relation.cc:610 with -O2 -fnon-call-exceptions -fprofile-arcs -finstrument-functions -fno-tree-copy-prop and _BitInt()
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116003 --- Comment #1 from Andrew Macleod --- When registering an equivalence, the oracle creates an initial equivalence for an SSA_NAME with itself in the def block. This prevents dominator queries from searching past the definition block. In this case the definition stmt has no gimple_bb set yet, so it is ICEing. If the definition stmt for the SSA_NAME has not been placed in the CFG yet, we should defer creating this record until it has. The next time an equivalence is registered, if the SSA_NAME is in the IL then this record will be created. Patch in testing.
[Bug bootstrap/115951] [15 Regression] pgo+lto enabled bootstrap fails building gnat (ICE in fold_stmt, at gimple-range-fold.cc:701)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115951 --- Comment #3 from Andrew Macleod --- I think I have a handle on it. The issue is statements like _80 = _79 == 0 where in Ada a boolean type is 8 bits and GCC's default boolean type is 1 bit. All through ranger we use the type of the LHS, but when the pointer operations were split out for GCC 15 into their own type, we accidentally ended up with the relational operators using the default boolean type instead of generating a result with the appropriate type. Thus for the above statement we were getting a 1-bit boolean for the RHS and an 8-bit boolean for the LHS, resulting in incompatible ranges (which require identical precision). I'll try making those adjustments and see if it bootstraps for me.
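The incompatibility described above reduces to a precision check. A toy model (hypothetical toy_range struct, not GCC's irange) of why an 8-bit Ada boolean range and a 1-bit default-boolean range cannot be combined:

```cpp
#include <cassert>

// Minimal stand-in for a range: the precision of the underlying type
// plus its bounds.  Range operations require identical precision on
// both operands, which is the invariant the ICE tripped over.
struct toy_range { unsigned precision; long lo, hi; };

bool compatible_p (const toy_range &a, const toy_range &b)
{
  return a.precision == b.precision;
}
```

An 8-bit Ada boolean LHS paired with a 1-bit default-boolean RHS fails this check, which is why the fix is to derive the relational result's type (and hence precision) from the LHS rather than from the default boolean type.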
[Bug other/115241] header-tools scripts not compatible to python3
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115241 --- Comment #2 from Andrew Macleod --- (In reply to sadineniharish8446 from comment #0) > The scripts in contrib/header-tools/ are breaks with python3. > These scripts were for python2 and we did update it for python3 and sent the > patches to gcc upstream. > https://www.mail-archive.com/gcc-patches@gcc.gnu.org/msg337175.html > > We did not get any response over there about the patch. > > Can you let us know that our patches are ok OR gcc has any plan to upgrade > these scripts for python3 compatibility? Patches are OK by me. Jeff was planning to run them once a release or so, but clearly that has not happened :-)
[Bug tree-optimization/115208] [15 Regression] Memory consumption get extremely high after r15-807-gfae5e6a4dfcf92
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115208 --- Comment #6 from Andrew Macleod --- Created attachment 58287 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=58287&action=edit proposed patch I'm testing this patch; does it resolve your problem? I forgot to free the gori_nmap object when the companion gori object was freed. Doh.
[Bug ipa/114985] [15 regression] internal compiler error: in discriminator_fail during stage2
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114985 --- Comment #21 from Andrew Macleod --- (In reply to Martin Jambor from comment #20)
> (I also cannot prevent myself from ranting a little that it would
> really help if all the ranger (helper) classes and functions were
> better documented.)

I am working on that this year. It's one of my major tasks: internals, APIs, reusable components... an entire encyclopedia. The first bits are coming shortly.
[Bug middle-end/114855] ICE: Segfault
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114855 --- Comment #8 from Andrew Macleod --- (In reply to Andrew Macleod from comment #7)
> Looks like the primary culprits now are:
>
> dominator optimization   :  666.73 (  7%)  0.77 (  2%)  671.76 (  7%)  170M (  4%)
> backwards jump threading : 7848.77 ( 85%) 21.04 ( 65%) 7920.05 ( 85%) 1332M ( 29%)
> TOTAL                    : 9250.99       32.58        9341.40        4619M

If I turn off threading, then VRP pops up with 400, so I took a look at VRP. The biggest problem is that this testcase has on the order of 400,000 basic blocks, with a pattern of a block of code followed by a lot of CFG diamonds using a number of different ssa-names from within the block over and over. When we are calculating/storing imports and exports for every block, then utilizing that info to try to find outgoing ranges that maybe we can use, it simply adds up. For VRP, we currently utilize different cache models depending on the number of blocks. I'm wondering if this might not be a good testcase for actually using a different VRP when the number of blocks is excessive. I wrote the fast VRP pass last year, which currently isn't being used. I'm going to experiment with it to see if, for CFGs above a threshold (100,000 BBs?), we can enable the lower-overhead fast VRP instead for all VRP passes. The threading issue probably needs to have some knobs added or tweaked for such very large CFGs. There would be a LOT of threading opportunities in the code I saw, so I can see why it would be so busy. I saw a lot of branches to branches using the same SSA_NAME.
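The threshold idea floated here can be sketched as a one-line policy function (the names and the 100,000-block figure are placeholders from the comment above; the real knob later surfaced as --param=vrp-block-limit):

```cpp
#include <cassert>
#include <string>

// Pick the VRP algorithm from the CFG size: past the block limit, use
// the lower-overhead fast VRP instead of the full ranger-based VRP.
std::string choose_vrp (int n_basic_blocks, int block_limit)
{
  return n_basic_blocks > block_limit ? "fast-vrp" : "full-vrp";
}
```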
[Bug middle-end/111009] [12/13 regression] -fno-strict-overflow erroneously elides null pointer checks and causes SIGSEGV on perf from linux-6.4.10
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111009 --- Comment #14 from Andrew Macleod --- Created attachment 58134 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=58134&action=edit adjusted patch (In reply to Richard Biener from comment #13)
> Andrew - this doesn't pick to gcc-13 because of the following but we should
> backport the fix somehow.  Can you please see to that in time for GCC 13.3
> (next week)?

Sure. It's just missing the definition of contains_zero_p () from value-range.h that must have been added in an earlier patch. I'm running the following tweaked patch through testing for GCC 13.
[Bug middle-end/114855] ICE: Segfault
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114855 --- Comment #7 from Andrew Macleod --- Looks like the primary culprits now are:

dominator optimization   :  666.73 (  7%)  0.77 (  2%)  671.76 (  7%)  170M (  4%)
backwards jump threading : 7848.77 ( 85%) 21.04 ( 65%) 7920.05 ( 85%) 1332M ( 29%)
TOTAL                    : 9250.99       32.58        9341.40        4619M
[Bug tree-optimization/114331] Missed optimization: indicate knownbits from dominating condition switch(trunc(a))
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114331 --- Comment #11 from Andrew Macleod --- (In reply to Jakub Jelinek from comment #10)
> I really don't know how GORI etc. works.
> But, if when the switch handling determines that _1 (the switch controlling
> expression) has [irange] [111, 111] MASK 0x0 VALUE 0x6f (does it actually?
> i.e. for a singleton range all the bits here are known and equal to the
> value), then when trying to derive a range for related num_9(D) which is int
> rather than _1's short and
> _1 = (short int) num_5(D);
> for the MASK/VALUE we should just use the same VALUE and or in 0x
> into MASK because we then don't know anything about the upper bits.
> Though, looking at the evrp detailed dump there is
> 2->3 _1 : [irange] short int [111, 111]
> 2->3 _2 : [irange] int [111, 111]
> 2->3 num_5(D) : [irange] int [-INF, -65537][-65425, -65425][111, 111][65536, +INF]
> and so no MASK/VALUE for any of those ranges.

Right. No mask needed for _1 and _2, as the range fully represents the known bits, and operator_cast::op1_range hasn't been taught to add a bitmask when calculating num_5 yet. It could have a mask and value added to it because it's implied by the result of the cast being short int [111, 111]. The routine Aldy provided should create that mask when asked, I think.

> Now, from comments it seems that irange_bitmask is only computed on demand
> to speed things up, unless it has been explicitly set.
> Now, say for _1 or _2 above, we don't have anything recorded but we can
> always compute it on demand from the value range. But when adding the

Which I think means we are both on the same page so far.

> num_5(D) range based on the related _1 range, the on-demand irange_bitmask
> is no longer as precise as it would be if we when deriving that [-INF,
> -65537][-65425, -65425][111, 111][65536, +INF] range

We aren't deriving it from that range, though.
We are solving for num_5 the equation [111,111] = (short int) num_5. The range you list is the best we can currently produce with *just* ranges. If we also add that bitmask along with the range (generated from the range on the LHS), we can adjust that to what you specify.

> from the [111, 111] range also derived from the in that case on-demand asked
> MASK 0x0 VALUE 0x6f to MASK 0x VALUE 0x6f.

Right, so the LHS 16 bits produce MASK 0x VALUE 0x6f, which means those bits should apply to the RHS as well. Since we're extending that to 32 bits, we'd have to make the upper ones unknown, so we should be able to create num_5 : [irange] int [-INF, -65537][-65425, -65425][111, 111][65536, +INF] MASK 0x VALUE 0x6f. And then in bb 3, when we see _8 = num_5(D) & 65534; instead of producing _8 : [irange] int [0, 65534] MASK 0xfffe VALUE 0x0, operator_bitwise_and::fold_range ought to be able to combine the known bits and come up with _8 : [irange] int [0, 65534] MASK 0x VALUE 0x6e, which, if Aldy's bitmask code is working right (:-), should turn into _8 : [irange] int [110, 110]. It should be fairly straightforward if operator_cast::op1_range creates the mask for the range it produces.
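The MASK/VALUE arithmetic being discussed can be checked concretely. Below is a simplified stand-in for irange_bitmask (not GCC's implementation) using the same convention as the dumps: a set bit in `mask` means that bit is unknown. Feeding it num_5's hoped-for bitmask (upper 16 bits unknown, lower 16 bits known to equal 111) and the constant 65534 yields a fully known value of 110, matching the expected [110, 110].

```cpp
#include <cassert>
#include <cstdint>

// Toy known-bits pair: mask bit 1 = unknown; known bits carry `value`.
struct bitmask
{
  uint32_t mask;    // 1 = unknown bit
  uint32_t value;   // payload of the known bits
};

// Combine known bits across a bitwise AND: a result bit is known one
// only if known one on both sides, and known zero if known zero on
// either side; everything else stays unknown.
bitmask bit_and (bitmask a, bitmask b)
{
  uint32_t known_one  = (a.value & ~a.mask) & (b.value & ~b.mask);
  uint32_t known_zero = (~a.value & ~a.mask) | (~b.value & ~b.mask);
  return bitmask { ~(known_one | known_zero), known_one };
}
```

With num_5 as {mask 0xffff0000, value 0x6f} and the constant 65534 as {mask 0, value 0xfffe}, every result bit ends up known: the AND's zero bits kill the unknown upper half, leaving value 0x6e (110).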
[Bug tree-optimization/114331] Missed optimization: indicate knownbits from dominating condition switch(trunc(a))
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114331 --- Comment #8 from Andrew Macleod --- (In reply to Jakub Jelinek from comment #7)
> (In reply to Aldy Hernandez from comment #6)
> > You may want to look at:
> >
> > // Return the bitmask inherent in the range.
> >
> > irange_bitmask
> > irange::get_bitmask_from_range () const
> > {
> > }
> >
> > IIRC, this code was originally authored by Jakub and was embedded in the
> > tree-ssanames.cc code setting nonzero bits every time we set a global range.
> > I just abstracted it and adapted it to work within our framework.
>
> I'm afraid on this testcase trying to derive bitmask from range is not going
> to work if the range is that
> [irange] int [-INF, -65537][-65425, -65425][111, 111][65536, +INF]
> We really want to record that because of the bb we know that the low 16 bits
> are equal to 111 and the upper 16 bits are unknown.
> Because the range could help there only if we had a union of 65536 subranges
> here.

Isn't that exactly what the known bits are supposed to do? op1_range already knows [111,111] is the lower 16 bits, but it doesn't add the bitmask. We should also be able to set the known mask for the lower 16 bits as well.

> tree-ssanames.cc before contained just a single mask, so could only track
> known zero bits vs. unknown bits rather than known zero bits vs. known one
> bits vs. unknown bits.
> While MASK 0x VALUE 0x6f can represent what supposedly the
> BIT_AND_EXPR optimization needs (whether we actually use it then to optimize
> it away is another question).

Then when we see _8 = num_5(D) & 65534; operator_bitwise_and::fold_range should pick up the known bits from the num_5 range, recognize it's the only relevant part of the mask, and get it right, shouldn't it? It can't do it without the mask, for sure. It's possible that operator_bitwise_and::fold_range needs some tweaking to recognize the known bits in op1 and map them to the mask in op2, but maybe we already do that.
I haven't looked at that code either, but it seems like something it should be doing. So instead of _8 : [irange] int [0, 65534] MASK 0xfffe VALUE 0x0, we'd get [110, 110] for _8.
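For reference, the range-to-bitmask direction mentioned above (the get_bitmask_from_range routine Aldy points at) can be approximated in a few lines: bits above the highest bit where min and max differ are shared by every value in [min, max], so they are known; everything below is unknown. This is a simplified unsigned 32-bit sketch, not the GCC source:

```cpp
#include <cassert>
#include <cstdint>

// Derive known bits from a range's endpoints.  `mask` gets 1 for each
// unknown bit; `value` carries the bits common to min and max.
void bitmask_from_range (uint32_t min, uint32_t max,
                         uint32_t &mask, uint32_t &value)
{
  uint32_t xorv = min ^ max;
  if (xorv)
    {
      // Smear the highest differing bit downward: once min and max
      // diverge at bit `top`, nothing below it is known.
      int top = 31 - __builtin_clz (xorv);
      xorv = (top == 31) ? ~0u : (1u << (top + 1)) - 1;
    }
  mask = xorv;
  value = min & ~xorv;
}
```

A singleton range such as [111, 111] yields a fully known bitmask, which is the easy case Jakub raises; the multi-subrange num_5 case is exactly where this range-derived mask is too coarse and an explicitly recorded mask wins.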
[Bug tree-optimization/114331] Missed optimization: indicate knownbits from dominating condition switch(trunc(a))
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114331 --- Comment #5 from Andrew Macleod --- (In reply to Jakub Jelinek from comment #4)
> Actually, looking at value-range.h, irange_bitmask doesn't have just the
> mask but also value, so I wonder why it isn't able to figure this out.
> I'd think m_mask should be ~0x and value 111.

_1 = (short int) num_5(D);
_2 = (int) _1;
switch (_1)

globally we know
_2 : [irange] int [-32768, 32767]
and coming into bb 3:
_2 : [irange] int [-32768, 32767]
2->3 _1 : [irange] short int [111, 111]
2->3 _2 : [irange] int [111, 111]
2->3 num_5(D) : [irange] int [-INF, -65537][-65425, -65425][111, 111][65536, +INF]

I guess what you are looking for is to add known bits to the range produced for num_5 on the outgoing edge. This would have to be done in operator_cast::op1_range, where we are reducing it to the known range of the switch. I do not believe it makes any attempt to set bitmasks based on casts like that currently. So, GORI working backwards has _1 = [irange] short int [111, 111], so it would be satisfying the expression: short int [111, 111] = (short int) num_5(D). When evaluating num_5 in operator_cast::op1_range, in the truncating_cast_p section, we could also see if there is a bitmask pattern for the LHS that could be applied to the range. I think that's basically what you are asking for. Simple enough for a constant, I suppose, for starters. Or maybe Aldy has a routine that picks bitmasks and values from ranges somewhere?
[Bug tree-optimization/114331] Missed optimization: indicate knownbits from dominating condition switch(trunc(a))
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114331 --- Comment #3 from Andrew Macleod --- (In reply to Jakub Jelinek from comment #2)
> I thought we don't, if the testcase starts with
> void dummy (void);
> short int foo (void);
> int src(void) {
> short int num = foo ();
> switch(num){
> instead then evrp1 optimizes away the BIT_AND_EXPR.
> Though, if the ranger just has a known zero mask in addition to normal
> ranges, maybe it can't handle it, because the range of num with (short)num
> == 111 is hard to express
> (it would be a union of 65536 ranges).
> Maybe bit-ccp should be able to handle that though, this case is exactly that
> upper 16 bits of the value are unknown and lower 16 bits of the value are all
> known,
> some of them 0, some of them 1.
> But maybe bit-ccp doesn't handle switches.

We certainly do fully handle switches. The problem is that the case is on a cast, and then we use num at its full size in the expression. So it's all the other bits:

num_5(D) : [irange] int [-INF, -65537][-65425, -65425][111, 111][65536, +INF]
_8 = num_5(D) & 65534;
goto ; [INV]
_8 : [irange] int [0, 65534] MASK 0xfffe VALUE 0x0

Given it starts with:

_1 = (short int) num_5(D);
_2 = (int) _1;
switch (_1) [INV], case 111: [INV], case 204: [INV], case 263: [INV], case 267: [INV]>

It's difficult to recognize that the known bits of 'num_5' match each switch case.
[Bug tree-optimization/114151] [14 Regression] weird and inefficient codegen and addressing modes since r14-9193
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114151 --- Comment #23 from Andrew Macleod --- Created attachment 57686 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57686&action=edit another patch (In reply to Richard Biener from comment #22)
> (In reply to Andrew Macleod from comment #21)
> >
> > And have that all work with general tree expressions.. That would solve
> > much of this for you?
>
> Yes, I wouldn't mind if range_on_{entry,exit} handle general tree
> expressions,
> there's enough APIs to be confused with already ;)
> >

I promoted range_on_exit and range_on_entry to be part of the API in this patch. This brings value_query in line with ranger's basic five-routine API. It also tweaks ranger's versions to handle tree expressions. It bootstraps and shows no regressions, with the caveat that I haven't actually tested the usage of range_on_entry and range_on_exit with arbitrary trees. As you can see, I didn't change much... so it should work OK.

> > > > > > > > > > Interestingly enough we somehow still need the
> > > > > > > > > > > hunk of Andrews patch to do it :/
> > > > > > >
> > That probably means there is another call somewhere in the chain with no
> > context. However, I will say that functionality is more important than it
> > seems. Should have been there from the start :-P.
>
> Possibly yes. It might be we fill rangers cache with VARYING and when
> we re-do the query as a dependent one but with context we don't recompute
> it? I also only patched up a single place in SCEV with the context so
> I possibly missed some others that end up with a range query, for example
> through niter analysis that might be triggered.

My guess is the latter. Without a context and with that change, ranger evaluates the definition with the context at the location of the def, then simply uses that value. If anything it depends on is changed later, the temporal cache should indicate it's out of date and trigger a new fold using current values.
[Bug tree-optimization/114151] [14 Regression] weird and inefficient codegen and addressing modes since r14-9193
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114151 --- Comment #21 from Andrew Macleod --- (In reply to Richard Biener from comment #19)
>
> While ranger has a range_on_exit API this doesn't work on GENERIC expressions
> as far as I can see but only SSA names but I guess that could be "fixed"
> given range_on_exit also looks at the last stmt and eventually defers to
> range_of_expr (or range_on_entry), but possibly get_tree_range needs
> variants for on_entry/on_exit (it doesn't seem to use it's 'stmt' context
> very consistently, notably not for SSA_NAMEs ...).

That would appear to be an oversight. That API has not been used very much for arbitrary GENERIC trees. I think the original reason support for tree expressions was added was a "try this" for some other PR. It was simple to do, so we left it in, but it never got any real traction. At least as far as I can recall :-)

Currently, I think most, if not all, uses of get_tree_range() are either !gimple_ssa_range_p() (commonly constants or unsupported types) or ssa_names on abnormal edges. For abnormal edges, we ought to be getting the global range directly these days instead of calling that routine. Then in get_tree_range(), we ought to be calling range_of_expr for SSA_NAMEs with the provided context. I'll poke at that too. The support for general tree expressions changed the original intent of the function, and it should be adjusted.

As for the on-exit/on-entry bits... we haven't had a need for entry/exit outside of ranger in the past. I had toyed with exporting those routines and making them a part of the official API for value-query, but hadn't run across the need as yet. Let me think about that for a minute. It can certainly be done. I guess we really only need an on-entry and an on-exit version of range_of_expr to do everything.
So if we end up with something like:

  range_of_expr (r, expr, stmt)
  range_of_expr_on_entry (r, expr, bb)
  range_of_expr_on_exit (r, expr, bb)

and have all of that work with general tree expressions... would that solve much of this for you?

> > Interestingly enough we somehow still need the
> > > hunk of Andrews patch to do it :/
>

That probably means there is another call somewhere in the chain with no context. However, I will say that functionality is more important than it seems. Should have been there from the start :-P.
[Bug middle-end/96564] [11/12/13/14 Regression] New maybe use of uninitialized variable warning since r11-959
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96564 --- Comment #15 from Andrew Macleod --- (In reply to Richard Biener from comment #13) > (In reply to Jeffrey A. Law from comment #12) > > So I think we could solve this with a bit of help from the alias oracle. We > > have the routine ptrs_compare_unequal, but points-to-null is going to get > > in the way. > > > > I think VRP and DOM have enough information to rule out NULL for both > > objects in question. So if we could query the points-to information, > > ignoring NULL then we could likely solve this particular bug. > > > > Essentially VRP or DOM would prove NULL isn't in the set of possible values > > at the comparison point. Then we query the alias information ignoring NULL. > > Voila we compute a static result for the comparison of the two pointers and > > the problematical block becomes unreachable and the bogus warning goes away. > > > > Richi, any thoughts in viability of such an API? > > We now treat pt.null conservatively and track non-null-ness derived from > range-info in it. That means when VRP/DOM can prove a pointer is always > not NULL they can do set_ptr_nonnull (p) on it. > > This means the > > /* ??? We'd like to handle ptr1 != NULL and ptr1 != ptr2 > but those require pt.null to be conservatively correct. */ > > is no longer true and we could finally implement it, like with > > diff --git a/gcc/tree-ssa-alias.cc b/gcc/tree-ssa-alias.cc > index e7c1c1aa624..5b6d9e0aa6a 100644 > --- a/gcc/tree-ssa-alias.cc > +++ b/gcc/tree-ssa-alias.cc > @@ -479,9 +479,25 @@ ptrs_compare_unequal (tree ptr1, tree ptr2) > } >return !pt_solution_includes (&pi->pt, obj1); > } > - > - /* ??? We'd like to handle ptr1 != NULL and ptr1 != ptr2 > - but those require pt.null to be conservatively correct. 
*/
> + else if (TREE_CODE (ptr1) == SSA_NAME)
> +{
> + struct ptr_info_def *pi1 = SSA_NAME_PTR_INFO (ptr1);
> + if (!pi1
> + || pi1->pt.vars_contains_restrict
> + || pi1->pt.vars_contains_interposable)
> + return false;
> + if (integer_zerop (ptr2) && !pi1->pt.null)
> + return true;
> + if (TREE_CODE (ptr2) == SSA_NAME)
> + {
> + struct ptr_info_def *pi2 = SSA_NAME_PTR_INFO (ptr2);
> + if (!pi2
> + || pi2->pt.vars_contains_restrict
> + || pi2->pt.vars_contains_interposable)
> + return false;
> + if (!pi1->pt.null || !pi2->pt.null)
> + return !pt_solutions_intersect (&pi1->pt, &pi2->pt);
> + }
> +}
> >return false;
> }
>
> but the testcase shows the non-null-ness is only conditional which means
> we'd have to use a range query above which necessarily falls back to
> the global ranges given we don't have any context available here. The
> old EVRP adjusted global ranges during the walk but this is no longer done.

You mean it lied? Because x_1 is not NULL until after _8 = *x_1(D); executes. It can still be NULL on that stmt, can it not? Did it reset the global value afterwards?

Contextually, ranger knows both are non-null at EVRP time:
a.0_27 : [irange] int[0:D.] * [1, +INF]
2->3 (T) x_1(D) : [irange] int * [1, +INF]
2->3 (T) a.0_27 : [irange] int[0:D.] * [1, +INF]
2->4 (F) x_1(D) : [irange] int * [1, +INF]
2->4 (F) a.0_27 : [irange] int[0:D.] * [1, +INF]

So we know x_1 is non-NULL after the dereference for the rest of the block (and function). It also sets a.0_27 globally to be [1, +INF].

> Note it's enough that one pointer is nonnull, so for your idea the
> API could be extended with a bool one_ptr_nonnull parameter.

ranger currently sets a.0 globally to be non-null in EVRP.
[Bug tree-optimization/114151] [14 Regression] weird and inefficient codegen and addressing modes since r14-9193
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114151 --- Comment #15 from Andrew Macleod --- (In reply to Richard Biener from comment #14)
> (In reply to Andrew Macleod from comment #13)
> >
> > We would have tripped over this earlier if SCEV or one of those other places
> > using range_of_expr without context had instead invoked range_of_stmt. That
> > would have been perfectly reasonable, and would have resulted in these same
> > issues. We never tripped over it because range_of_stmt is not used much
> > outside of ranger. That is the primary reason I wanted to track this down.
> > There were alternative paths to the same end result that would have
> > triggered these issues.
>
> It sounds like this part is a bugfix?

Technically yes. When I commit it to trunk, I would split this into 3 patches:

1 - The change to get_global_range to not process PHIs. This is only a bug that shows up if range_of_stmt is called from within SCEV.

2 - Make cache propagation re-entrant. This is also only currently needed if SCEV invokes range_of_stmt.

3 - Call range_of_stmt from range_of_expr when context-less. This patch causes SCEV to call range_of_stmt indirectly through range_of_expr.

I don't think the buglet would show up in the current release, simply because SCEV never calls ranger with a context, and thus range_of_stmt never gets invoked by it. It might be worthwhile to put the patchset (or at least the first 2?) in the first point release after they've been committed to trunk, in case something else which trips over it wants to be backported.
[Bug tree-optimization/114151] [14 Regression] weird and inefficient codegen and addressing modes since r14-9193
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114151 --- Comment #13 from Andrew Macleod --- Created attachment 57638 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57638&action=edit patch

OK, there were 2 issues with simply invoking range_of_stmt, which this new patch resolves. If we aren't looking to fix this in GCC 14 right now anyway, this is the way to go.

1) The cache has always tried to provide a global range by pre-folding a stmt for an estimate using global values. This is a bad idea for PHIs when SCEV is invoked AND SCEV is calling ranger. This changes it to not pre-evaluate PHIs, which also saves time when functions have a lot of edges. It's mostly pointless for PHIs anyway, since we're about to do a real evaluation.

2) The cache's entry range propagator was not re-entrant. We didn't previously need this, but with SCEV (and possibly other places) invoking range_of_expr without context and having range_of_stmt being called, we can occasionally get layered calls for cache filling (of different ssa-names).

With those 2 changes, we can now safely invoke range_of_stmt from a contextless range_of_expr call. We would have tripped over this earlier if SCEV or one of those other places using range_of_expr without context had instead invoked range_of_stmt. That would have been perfectly reasonable, and would have resulted in these same issues. We never tripped over it because range_of_stmt is not used much outside of ranger. That is the primary reason I wanted to track this down. There were alternative paths to the same end result that would have triggered these issues.

Give this patch a try. It also bootstraps with no regressions. I will queue it up for stage 1 instead, assuming all is good.
[Bug tree-optimization/110199] [12/13/14 Regression] Missing VRP transformation with MIN_EXPR and known relation
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110199 --- Comment #5 from Andrew Macleod --- (In reply to Jakub Jelinek from comment #4)
> Just looking at the generated code of #c0 with -O2 on x86_64, this regressed
> with
> r13-3596-ge7310e24b1c0ca67b1bb507c1330b2bf39e59e32
> Andrew, are you going to address this for GCC 14, or defer to GCC 15?

I'd prefer to defer it, I think, although we can run it through testing if anyone really wants it. Maybe in GCC 15 someone can add relations in general to simplifications.
[Bug tree-optimization/114151] [14 Regression] weird and inefficient codegen and addressing modes since r14-9193
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114151 --- Comment #12 from Andrew Macleod --- (In reply to Richard Biener from comment #11)
> (In reply to Richard Biener from comment #10)
> > (In reply to Andrew Macleod from comment #9)
> > > Created attachment 57620 [details]
> > > proposed patch
> > >
> > > Does this solve your problem if there is an active ranger? it bootstraps
> > > with no regressions
> >
> > I'll check what it does.
>
> So the important part is that it got the fact that _12 is positive. As
> analyzed in earlier comments I think that's all we can do, we don't know
> anything about the other variable involved and thus can't avoid the
> unsigned punning during SCEV analysis.

Yeah, I wouldn't want to invoke any dynamic lookup changes at this point; that would be too hard to predict or contain. I will continue poking at what is triggering the loop issues, because I think a good longer-term solution is to have range_of_expr with no context invoke range_of_stmt if the DEF is in the IL and has not been processed.

> I think it's a good change, let's keep it queued for stage1 at this point
> unless we really know a case it helps to avoid a regression with
> r14-9193-ga0b1798042d033
>
> For testing, what's the "easiest" pass/thing to do to recompute global
> ranges now? In the past I'd schedule EVRP but is there now a ranger
> API to do this? Just to see if full global range compute before IVOPTs
> would help.

All VRP passes are the same now, so just schedule EVRP. In theory, you could schedule the fast VRP pass I added, but it's not heavily tested... you could try it, though. It doesn't do any back edges or switches (IIRC), but it does basic calculations in DOM order and exports/updates globals. NEXT_PASS (pass_fast_vrp)
[Bug tree-optimization/114151] [14 Regression] weird and inefficient codegen and addressing modes since r14-9193
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114151 --- Comment #9 from Andrew Macleod --- Created attachment 57620 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57620&action=edit proposed patch

Does this solve your problem if there is an active ranger? It bootstraps with no regressions.

It's pretty minimal; basically it invokes the cache's version of range_of_expr if there is no context. I tweaked it such that if there is no context and the def has not been calculated yet, it calls range_of_def and combines the result with any SSA_NAME_RANGE_INFO that may have pre-existed, all without invoking any new lookups. This seems relatively harmless and does not spawn new dynamic lookups. As long as it resolves your issue...

If it still does not work, and we require the def to actually be evaluated, I will look into that; we probably should do that anyway. There appears to be a cycle when this is invoked from the loop analysis, probably because folding of PHIs uses loop info... and back and forth we go. I'd probably need to add a flag to the ranger instantiation to tell it to avoid using loop info. Are we looking to fix this in this release?
[Bug tree-optimization/113632] Range info for a^CSTP2-1 could be improved in some cases
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113632 Andrew Macleod changed: What|Removed |Added CC||amacleod at redhat dot com --- Comment #1 from Andrew Macleod --- (In reply to Andrew Pinski from comment #0)
> Take:
> ```
> void dummy();
> _Bool f(unsigned long a)
> {
> _Bool cmp = a > 8192;
> if (cmp) goto then; else goto e;
> then:
> unsigned long t = __builtin_clzl(a); // [0,50]
> t^=63; // [13,63]
> return t >= 13;
> e:
> dummy();
> return 0;
> }
> ```
>
> Currently after the t^=63; we get:
> ```
> # RANGE [irange] int [1, 63] MASK 0x3f VALUE 0x0
> _7 = _1 ^ 63;
> ```
>
> But this could/should be improved to [13,63].
>
> If we change to using minus instead:
> ```
> t = 63 - t;
> ```
>
> We get the better range and the comparison (t >= 13) is optimized away.
> ```
> Folding statement: t_10 = 63 - t_9;
> Global Exported: t_10 = [irange] long unsigned int [13, 63] MASK 0x3f VALUE
> 0x0
> Not folded
> ```
>
> Yes, this shows up in real code; see the LLVM issue for more information on
> that.

I think the current implementation of operator_bitwise_xor::wi_fold() in range-op.cc was simply ported from the original version we used in the old VRP code, so it is neither multi-range aware nor otherwise enhanced. If you put a break point there, you'll see it's getting:

(gdb) p lh_lb.dump() [0], precision = 32 $1 = void
(gdb) p lh_ub.dump() [0x32], precision = 32 $2 = void
(gdb) p rh_ub.dump() [0x3f], precision = 32 $3 = void
(gdb) p rh_lb.dump() [0x3f], precision = 32 $4 = void

One could conceivably do something much better than the general masking stuff that goes on if rh_lb == rh_ub. I suspect we could probably do a better job in general, but have never looked at it. It also looks like we make some minor attempts with signed values in wi_optimize_signed_bitwise_op(), but again, I do not think anyone has tried to make this code do anything new yet.
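For the rh_lb == rh_ub case, the better result is easy to justify: XOR with an all-ones constant over the operand's possible bits flips those bits, which is an order-reversing bijection (x ^ 63 == 63 - x for x in [0, 63]), so the bounds simply map end-for-end. A small exhaustive sketch (not range-op.cc code) confirms the [13, 63] result:

```cpp
// x ^ 63 for x in [0, 63] just flips the low six bits, i.e. computes
// 63 - x.  These helpers exhaustively compute the resulting range of
// [lo, hi] ^ 63; they are an illustration, not GCC's wi_fold.
constexpr unsigned xor63_min (unsigned lo, unsigned hi)
{
  unsigned m = ~0u;
  for (unsigned x = lo; x <= hi; x++)
    m = (x ^ 63) < m ? (x ^ 63) : m;
  return m;
}

constexpr unsigned xor63_max (unsigned lo, unsigned hi)
{
  unsigned m = 0;
  for (unsigned x = lo; x <= hi; x++)
    m = (x ^ 63) > m ? (x ^ 63) : m;
  return m;
}

// clz result t in [0, 50]: t ^ 63 is exactly [13, 63], not [1, 63].
static_assert (xor63_min (0, 50) == 13, "");
static_assert (xor63_max (0, 50) == 63, "");
```

So when rh_lb == rh_ub and the constant covers all possibly-set bits of the lhs range, wi_fold could return the reflected bounds instead of the masked approximation.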
[Bug tree-optimization/114151] [14 Regression] weird and inefficient codegen and addressing modes since r14-9193
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114151 --- Comment #7 from Andrew Macleod --- (In reply to Richard Biener from comment #6)
> (In reply to Andrew Macleod from comment #5)
> > (In reply to rguent...@suse.de from comment #4)
> > >
> > > What was definitely missing is consideration of POLY_INT_CSTs (and
> > > variable polys, as I think there's no range info for those).
> > >
> > Ranger doesn't do anything with POLY_INTs, mostly because I didn't
> > understand them.
> >
> > > We do eventually want to improve how ranger behaves here. I'm not sure
> > > why when we do not provide a context 'stmt' it can't see to compute
> > > a range valid at the SSA names point of definition? (so basically
> > > compute the global range)
> >
> > The call looks like it doesn't provide the stmt. Without the stmt, all
> > ranger will ever provide is global ranges.
> >
> > I think you are asking why, If there is no global range, it doesn't try to
> > compute one from the ssa_name_def_stmt? Ranger does when it is active.
>
> I tried with an active ranger but that doesn't make a difference. Basically
> I added enable_ranger () / disable_ranger () around the pass and thought
> that would "activate" it. But looking at range_for_expr I don't see how
> that would make a difference without a provided stmt.
>

It wouldn't. Why isn't a context stmt being provided? range_of_expr with no context stmt makes no attempt to calculate anything. This is because one can get into a lot of trouble, as it doesn't know whether the expression you are calculating is even in the IL or just some detached tree expression.

If you have an SSA_NAME and want to actually calculate the value, you can use range_of_stmt (range, SSA_NAME_DEF_STMT (name)) instead of range_of_expr (). If you pass in a stmt as context, and the SSA_NAME you are asking about is the LHS of the stmt, then range_of_expr will call range_of_stmt itself... but again, it needs a context stmt in order to know it's safe to do so.
In general, range_of_expr with no context will not calculate anything... When a stmt is provided for location context, then it's free to go and do whatever calculations are required.
[Bug tree-optimization/114151] [14 Regression] weird and inefficient codegen and addressing modes since g:a0b1798042d033fd2cc2c806afbb77875dd2909b
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114151 --- Comment #5 from Andrew Macleod --- (In reply to rguent...@suse.de from comment #4)
>
> What was definitely missing is consideration of POLY_INT_CSTs (and
> variable polys, as I think there's no range info for those).
>

Ranger doesn't do anything with POLY_INTs, mostly because I didn't understand them.

> We do eventually want to improve how ranger behaves here. I'm not sure
> why when we do not provide a context 'stmt' it can't see to compute
> a range valid at the SSA names point of definition? (so basically
> compute the global range)

The call looks like it doesn't provide the stmt. Without the stmt, all ranger will ever provide is global ranges.

I think you are asking why, if there is no global range, it doesn't try to compute one from the SSA_NAME_DEF_STMT? Ranger does when it is active. Otherwise it simply picks up the global information from SSA_NAME_RANGE_INFO(). Of course, if it's a poly int, we don't actually support those... so all you will ever see is VARYING. Again, because I don't understand them.

> Maybe there's another API to do that. But I thought it was convenient
> to use range_of_expr as that also handles GENERIC expression trees
> to some extent and those are common within SCEV.

A range can always be calculated by simply calling fold_range from gimple-range-fold.h:

// Fold stmt S into range R using range query Q.
bool fold_range (vrange &r, gimple *s, range_query *q = NULL);

If a range_query is not provided, it'll simply use the current one. If you want to ensure it uses only global ranges, you call it with fold_range (r, SSA_NAME_DEF_STMT (name), get_global_range_query ());
[Bug tree-optimization/114086] Boolean switches could have a lot better codegen, possibly utilizing bit-vectors
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114086 --- Comment #9 from Andrew Macleod --- (In reply to Jakub Jelinek from comment #8)
> Unfortunately doing the ((682 >> x) & 1) to x & 1 optimization in match.pd
> isn't possible, we can only use global ranges there and we need path
> specific range here.
> Can it be done in VRP pass? Though, I'm afraid I'm quite lost where it
> actually has
> the statement optimizations (rather than mere computing of ranges),
> Aldy/Andrew, any hints? I mean like what old tree-vrp.c was doing in
> simplify_stmt_using_ranges.

I don't think much has changed there... We still call into all the code in vr-values.cc to do simplifications. I think Aldy changed it all to be contained in class 'simplify_using_ranges', but those routines are all still in vr-values.cc. tree-vrp calls into the top-level simplify() routine:

bool fold_stmt (gimple_stmt_iterator *gsi) override { bool ret = m_simplifier.simplify (gsi); if (!ret) ret = ::fold_stmt (gsi, follow_single_use_edges); return ret; }

If that fails, then ranger's fold_stmt() is invoked. That is merely a contextual wrapper around a call to gimple-fold::fold_stmt to see if normal folding can find anything. Under the covers I believe that invokes match.pd, which, if it was using the current range_query, would get contextual info.

> Guess we could duplicate that in match.pd for the case which can use global
> range or
> doesn't need any range at all.
> I mean
> unsigned int
> foo (int x)
> {
> return (0xU >> x) & 1;
> }
>
> unsigned int
> bar (int x)
> {
> return (0xU >> x) & 1;
> }
>
> unsigned int
> baz (int x)
> {
> if (x >= 22) __builtin_unreachable ();
> return (0x5aU >> x) & 1;
> }
> can be optimized even with global ranges (or the first one with no ranges).
> foo and baz equivalent is x & 1, while bar is (~x) & 1 or (x & 1) ^ 1, dunno
> what is more canonical.
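A quick check of the transformation under discussion shows why the path-specific range matters (682 is the constant from Jakub's comment; the helper is just an illustration):

```cpp
// 682 is 0b1010101010: bit x of the constant equals the parity of x for
// every x in [0, 10], so (682 >> x) & 1 folds to x & 1 -- but only while
// x is bounded, which is exactly the path-specific information match.pd
// cannot see with global ranges alone.
constexpr bool shift_matches_parity (unsigned c, unsigned lo, unsigned hi)
{
  for (unsigned x = lo; x <= hi; x++)
    if (((c >> x) & 1) != (x & 1))
      return false;
  return true;
}

static_assert (shift_matches_parity (682, 0, 10), "holds on the bounded path");
static_assert (!shift_matches_parity (682, 0, 11), "breaks once x can be 11");
```

So a global-range version of the rule is sound only for constants whose bit pattern keeps matching out to the type width (or out to a __builtin_unreachable-derived bound, as in baz).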
[Bug middle-end/113907] [12/13/14 regression] ICU miscompiled since on x86 since r14-5109-ga291237b628f41
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113907 --- Comment #47 from Andrew Macleod --- (In reply to Andrew Macleod from comment #46) > (In reply to Jan Hubicka from comment #43) > > > // See discussion here: > > > // https://gcc.gnu.org/pipermail/gcc-patches/2021-June/571709.html > > Discussion says: > > > > "Once legacy evrp is removed, this won't be an issue, as ranges in the IL > > will tell the truth. However, this will mean that we will no longer > > remove the first __builtin_unreachable combo. But ISTM, that would be > > correct behavior ??." > > > > So perhaps, we could remove that special case for default def and phi? > > It is an odd thing and we clearly lose info here. > > > > legacy VRP has been removed now. So in theory we are free to do as we > want.. but I don't remember the specific details. > > So do you just want to always use get_range_global() ? and not do the check? > > I can try changing it to just get_global and see what happens. FWIW, diff --git a/gcc/value-query.cc b/gcc/value-query.cc index 040c843c566..0ab10bc5a46 100644 --- a/gcc/value-query.cc +++ b/gcc/value-query.cc @@ -353,16 +353,9 @@ get_range_global (vrange &r, tree name, struct function *fun = cfun) void gimple_range_global (vrange &r, tree name, struct function *fun) { - tree type = TREE_TYPE (name); gcc_checking_assert (TREE_CODE (name) == SSA_NAME); - if (SSA_NAME_IS_DEFAULT_DEF (name) || (fun && fun->after_inlining) - || is_a (SSA_NAME_DEF_STMT (name))) -{ - get_range_global (r, name, fun); - return; -} - r.set_varying (type); + get_range_global (r, name, fun); } bootstraps and runs the testsuites clean on x86-64 now...
[Bug middle-end/113907] [12/13/14 regression] ICU miscompiled since on x86 since r14-5109-ga291237b628f41
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113907 --- Comment #46 from Andrew Macleod --- (In reply to Jan Hubicka from comment #43)
> > // See discussion here:
> > // https://gcc.gnu.org/pipermail/gcc-patches/2021-June/571709.html
> Discussion says:
>
> "Once legacy evrp is removed, this won't be an issue, as ranges in the IL
> will tell the truth. However, this will mean that we will no longer
> remove the first __builtin_unreachable combo. But ISTM, that would be
> correct behavior ??."
>
> So perhaps, we could remove that special case for default def and phi?
> It is an odd thing and we clearly lose info here.
>

Legacy VRP has been removed now, so in theory we are free to do as we want, but I don't remember the specific details. So do you just want to always use get_range_global() and not do the check? I can try changing it to just get_global and see what happens.
[Bug tree-optimization/113879] missed optimization - not exploiting known range of integers
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113879 --- Comment #2 from Andrew Macleod --- Not so much a cycle issue as a backward propagation issue.

goto ; [INV]
_1 = (long unsigned int) j_10;
<..>
if (j_10 >= -1) goto ; [INV] else goto ; [INV]
__builtin_trap ();
j_18 = j_10 + 1;
# j_10 = PHI
_9 = i_12(D) + 3;
if (_9 >= j_10) goto ; [INV] else goto ; [INV]
return;

'i' is only ever referenced twice: the very first thing on the edge from 2->6 as the initial value for j_10, and then in the calculation of _9, which is then used in the branch against j_10. We have no way of knowing that the initial value of i_12 can't be INT_MAX. It's only later, during the calculation _9 = i_12 + 3, that we can infer that i_12 must be at most INT_MAX - 3. We certainly know after the branch that

6->3 (T) i_12(D) : [irange] int [-INF, 2147483644]

Going back to examine the initial value use on the edge 2->6 is not something that we would normally expect to have to do. It's similar to the __builtin_unreachable() problem in that we can infer the global range of i_12 based on that addition statement, but detecting that this is the case in general circumstances is not trivial, as we have to go look at all earlier uses to make sure they are post-dominated by the statement.

There is also the additional issue that I do not believe we currently register an inferred range on the statement _9 = i_12(D) + 3; for i_12. Adding an inferred range for every arithmetic statement would come with a cost... not sure exactly what it would be, but we weren't expecting inferred ranges from the majority of statements. We don't normally need that because, as you can see, we get the ranges right after the branches anyway. I could do a dry run and see what the time differential is. We could consider adding those inferred ranges at -O3.
We could also consider an enhancement that works like the __builtin_unreachable() removal pass but also looks at all inferred ranges in the function to see whether they apply in a global context, adjusting the global value if appropriate. They would in a case like this.
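The inference described above can be stated as a one-line computation; this sketch assumes 32-bit int (which matches the 2147483644 bound in the dump):

```cpp
#include <climits>

// If _9 = i_12 + 3 executes, signed overflow would be undefined behavior,
// so at (and after) that statement i_12 can be inferred to be at most
// INT_MAX - 3.  Illustrative helper, not ranger's inferred-range code.
constexpr long long inferred_max (long long addend)
{
  return (long long) INT_MAX - addend;   // meaningful for addend > 0
}

// Matches the range ranger shows on the 6->3 edge: [-INF, 2147483644].
static_assert (inferred_max (3) == 2147483644LL, "assumes 32-bit int");
```

Registering this as an inferred range at the addition would let the global-adjustment pass described above tighten i_12 everywhere the statement post-dominates.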
[Bug middle-end/102580] Failure to optimize signed division to unsigned division when dividend can't be negative
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102580 --- Comment #6 from Andrew Macleod --- (In reply to Jakub Jelinek from comment #5)
> To be precise, expand_expr_divmod uses get_range_pos_neg for that during
> expansion (unless we'd e.g. somehow noted it in some very late pass before
> expansion that the division ought to be expanded both ways and cost
> compared), and get_range_pos_neg uses the global range of SSA_NAME only. In
> order to optimize #c0 we'd need to query range with the use stmt and
> enabling ranger in the pass (that is possible in some pass before expansion,
> but doing it during expansion (which would be useful to other spots too, say
> .*_OVERFLOW expansion) would need to deal with some basic blocks already
> converted into RTL and others still at GIMPLE).

I'm working on a logging mechanism for ranges for GCC 15. It's possible that a side effect of this work could make some selective contextual ranges available from the global table after .. in which case we could get things like this as if ranger was running.

My other question: so is the issue that unsigned divides are cheaper than signed divides? The global range of _2 is set:

_2 : [irange] int [0, 715827882]

Given the statements in .optimized are:

[local count: 1073741824]: _2 = x_1(D) / 3; return _2;

could we not actually conclude that the divide can be unsigned, based on the result being positive and the divisor being positive? We have "simple" versions of fold_range() which would calculate _2 on the statement from the global value of x_1, but there aren't any simple versions of the operand calculators. It would be fairly trivial to provide one such that, given the global value for _2, you could ask op1_range (stmt) and get back a value of [0, +INF] for x_1 at that statement. If that would help... They are going to be added for GCC 15 anyway...
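The underlying equivalence being relied on can be sanity-checked directly; a minimal sketch (assuming 32-bit int) of why non-negativity of the dividend licenses the unsigned expansion:

```cpp
// When the dividend is known non-negative, truncating signed division by
// a positive constant agrees with unsigned division, so the cheaper
// unsigned sequence is safe to emit.
constexpr bool div_agrees (int x)
{
  return x / 3 == (int) ((unsigned) x / 3u);
}

static_assert (div_agrees (0) && div_agrees (7) && div_agrees (2147483645),
	       "signed and unsigned division agree for x >= 0");
// A negative dividend is exactly where the two expansions differ:
static_assert (!div_agrees (-7), "trunc toward zero vs. wrapped unsigned");
```

This is why the [0, 715827882] range on _2 (or a [0, +INF] op1_range for x_1) is enough for expand_expr_divmod to pick the unsigned form.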
[Bug tree-optimization/113735] ICE: in operator[], at vec.h:910 with _BitInt() at -O and above
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113735 --- Comment #3 from Andrew Macleod --- Seems reasonable to me; adding BBs should be fine throughout. Just don't re-order them :-)
[Bug tree-optimization/113716] Missed optimization: (2/a)*b*(!b) => 0
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113716 --- Comment #4 from Andrew Macleod --- (In reply to Jakub Jelinek from comment #3)
> Maybe ranger could figure out that at least one of the multiplication
> operands is zero in this case, because the second one is non-zero only if
> the first one is zero?

_2 = b_6(D) == 0;
_3 = (int) _2;
_4 = _3 * b_6(D);

Well, it's kind of, sort of, predicate-based. I mean, the information is calculable if one thought it was worth it. i.e. at the multiply, we know _3 is [0, 1] and we know b_6 is VARYING. Because there are only 2 ranges in _3, you could calculate using GORI "if _3 is [0,0], what is b_6?" (~[0, 0]) and "if _3 is [1,1], what's b_6?" ([0,0]). Do the 2 multiplies and union the results.. you'd get [0,0].

We might need a slight API tweak to be able to ask that. We have fold_stmt routines for the folding, but I don't think I have added any "what if" hooks to GORI yet.
[Bug tree-optimization/113475] [14 Regression] phi_analyzer::m_tab contents leak
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113475 --- Comment #4 from Andrew Macleod --- yoinks. Not sure how I missed that. thanks.
[Bug tree-optimization/113440] Missed optimization for redundancy computation elimination because of missed judgment for unsigned overflow
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113440 --- Comment #2 from Andrew Macleod --- (In reply to Richard Biener from comment #1)
> Yeah, looks like
>
>   if (a+a < a)
>
> for unsigned doesn't register the appropriate range on the false edge.

_1 = a_5(D) * 2;
if (_1 < a_5(D))
  goto ; [INV]
else
  goto ; [INV]

:
Relational : (_1 >= a_5(D))
if (a_5(D) == 0)
  goto ; [INV]
else
  goto ; [INV]

_1 [irange] unsigned int [0, 0][2, +INF] MASK 0xfffe VALUE 0x0
a_5(D) [irange] unsigned int [1, +INF]

:
_3 = _1 / a_5(D);
n = _3;

I think the ranges as such are fine, but the missing bit is that we KNOW _1 >= a_5, yet we do not use that information.

1) Ranger doesn't fully leverage relations everywhere yet; in particular, multiply makes no attempt to use them, or the recomputation of _1 would be [2, +INF] in bb5 (since a_5 is now [1, +INF]).

2) Ranger could then utilize that in the divide to set the range of _3 to [1, +INF] (which we don't do currently).

But neither of those is the real issue.

3) It requires a simplification type operation to look at _1 and understand that the divide is a_5 * 2 / a_5, with the known relation that a_5 * 2 >= a_5. This would be fairly trivial with the relation oracle if we can wire it into one of the simplification routines. For example, if I look in simplify_using_ranges::simplify_div_or_mod_using_ranges when we process the _3 = _1 / a_5(D) statement:

p query->query_relation (stmt, op0, op1)
VREL_GE

so we know... but we do not look at _1 to see that it is defined as op1 * 2. I don't know if we can push that sort of thing into the more general match.pd code... it seems like it would be useful there too. It is something on the list to eventually look into.
[Bug other/110205] Some new warnings from clang for the range code
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110205 Andrew Macleod changed: What|Removed |Added Resolution|--- |FIXED Status|UNCONFIRMED |RESOLVED --- Comment #6 from Andrew Macleod --- fixed.
[Bug tree-optimization/110251] [13/14 Regression] Hang at -O3 on x86_64-linux-gnu
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110251 Andrew Macleod changed: What|Removed |Added Resolution|FIXED |--- Status|RESOLVED|REOPENED --- Comment #14 from Andrew Macleod --- (In reply to Jakub Jelinek from comment #13)
> r14-2097-g4dfeb1cd8dfca234186216d891ec8f46235c3a14
> was a trunk commit, was that backported to 13 branch too (or was it fixed
> there some other way)?
> In any case, guess we should include the testcase into the testsuite to make
> sure it doesn't reappear.

hmm. No, I missed that this was gcc 13 too.
[Bug tree-optimization/110251] [13/14 Regression] Hang at -O3 on x86_64-linux-gnu
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110251 Andrew Macleod changed: What|Removed |Added Status|NEW |RESOLVED Resolution|--- |FIXED --- Comment #12 from Andrew Macleod --- fixed via commit 4dfeb1cd8dfca234186216d891ec8f46235c3a14
[Bug tree-optimization/110251] [13/14 Regression] Hang at -O3 on x86_64-linux-gnu
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110251 --- Comment #11 from Andrew Macleod --- (In reply to Andrew Pinski from comment #10) > (In reply to Anonymous from comment #9) > > (In reply to Andrew Pinski from comment #1) > > > dom3 : > > > ``` > > > > Could you please explain on how you to record this trace? Is there any > > specific compilation option to enable this information? Thanks. > > I used gdb to figure it out. Just stopped it a few times during the > compiling and saw the backtrace was similar/stuck. this is resolved is it not?
[Bug tree-optimization/113301] [12/13/14 Regression] Missed optimization: (1/(x+1))/2 => 0 since gcc-12
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113301 --- Comment #6 from Andrew Macleod --- (In reply to Andrew Macleod from comment #5)
> (In reply to Jakub Jelinek from comment #4)
> > Even then, I wonder why ranger doesn't figure this out.
> > (x+1u) <= 2 ? x : 0
> > must have a range [-1, 1] and [-1, 1] / [2, 2] range should be [0, 0], no?
>
> It's because there is no branch, which is what drives ranger. At this point,
> we aren't quite smart enough to completely evaluate the 2 operands of a
> conditional as if they were actual branches. i.e.
>
> _1 = x_4(D) + 1;
> _10 = (unsigned int) x_4(D);
> _6 = _10 + 2;
> _7 = _6 <= 2;
> _2 = _7 ? _1 : 0;
>
> if that were:
> if (_6 <= 2)
>   _2 = _1
> we'd recalculate _1 with the condition being (_6 <= 2) and we come up with
> [-1, 1] for _1 instead of varying.
>
> I'll have to look at what's involved in enhancing the fold code to invoke
> GORI to reevaluate _1 if _7 is [1,1]. In theory it's not too difficult... :-)

Ah, it's more complicated than that. We normally do this evaluation, but the cond_expr is using _1.. if you trace back from _6 in the condition, _1 is not in the dependency chain anywhere, so GORI cannot compute anything for it. It can compute that x_4 is [-2, 0], but it doesn't see any connection between _6 in the condition and _1.

The remaining question is whether this can be cheaply identified as a recomputation.. in which case we could recompute _1 using the [-2, 0] for x_4 and come up with [-1, 1].

I'll have a look at whether we can easily invoke the recomputation code the edge evaluations use or not.
[Bug tree-optimization/113301] [12/13/14 Regression] Missed optimization: (1/(x+1))/2 => 0 since gcc-12
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113301 Andrew Macleod changed: What|Removed |Added CC||amacleod at redhat dot com --- Comment #5 from Andrew Macleod --- (In reply to Jakub Jelinek from comment #4)
> Even then, I wonder why ranger doesn't figure this out.
> (x+1u) <= 2 ? x : 0
> must have a range [-1, 1] and [-1, 1] / [2, 2] range should be [0, 0], no?

It's because there is no branch, which is what drives ranger. At this point, we aren't quite smart enough to completely evaluate the 2 operands of a conditional as if they were actual branches. i.e.

_1 = x_4(D) + 1;
_10 = (unsigned int) x_4(D);
_6 = _10 + 2;
_7 = _6 <= 2;
_2 = _7 ? _1 : 0;

if that were:

if (_6 <= 2)
  _2 = _1

we'd recalculate _1 with the condition being (_6 <= 2) and we come up with [-1, 1] for _1 instead of varying.

I'll have to look at what's involved in enhancing the fold code to invoke GORI to reevaluate _1 if _7 is [1,1]. In theory it's not too difficult... :-)
[Bug tree-optimization/110199] [12/13/14 Regression] Missing VRP transformation with MIN_EXPR and known relation
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110199 --- Comment #3 from Andrew Macleod --- (In reply to Andrew Pinski from comment #2)
> I Kinda see how to implement this by creating
> operator_min::fold_range/operator_max::fold_range but I am still new on
> using these interfaces so I am not 100% sure how to use them.

Actually, in ranger, we'd be able to make the range choice of the range of a_2 or b_3, but it can't rewrite the IL... and since the range of both is varying, fold_range would still return varying, unless we indicate there are relations. fold_range itself only takes what it is given, so we have to query the relations first.

In theory all that is missing is to teach simplification about relation queries. For instance, in simplify_using_ranges::fold_cond_with_ops, we are invoking the range-op handler without any relations.. we query the ranges, but not the relation. If we add something like this (and make sure both operands are symbolic):

diff --git a/gcc/vr-values.cc b/gcc/vr-values.cc
index ecb294131b0..ad2c2d6c090 100644
--- a/gcc/vr-values.cc
+++ b/gcc/vr-values.cc
@@ -315,10 +315,17 @@ simplify_using_ranges::fold_cond_with_ops (enum tree_code code,
       || !query->range_of_expr (r1, op1, s))
     return NULL_TREE;

+  relation_kind rel = VREL_VARYING;
+  if (gimple_range_ssa_p (op0) && gimple_range_ssa_p (op1))
+    rel = query->query_relation (s, op0, op1);
+  // Create a trio with the relation set between op0 and op1 for folding.
+  // TRIOs are lhs-op0, lhs-op1, op0-op1 relations.
+  relation_trio trio (VREL_VARYING, VREL_VARYING, rel);
+
   tree type = TREE_TYPE (op0);
   int_range<1> res;
   range_op_handler handler (code);
-  if (handler && handler.fold_range (res, type, r0, r1))
+  if (handler && handler.fold_range (res, type, r0, r1, trio))
     {
       if (res == range_true (type))
         return boolean_true_node;

This should do what you want, I think... fold_range should use the relation passed in to determine that the condition is always true or false.
I have not fully tested this patch, fwiw.
[Bug tree-optimization/112843] during GIMPLE pass: bitintlower ICE: SIGSEGV in ranger_cache::range_of_expr (gimple-range-cache.cc:1204) with _BitInt() at -O1
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112843 --- Comment #9 from Andrew Macleod --- Created attachment 56790 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=56790&action=edit auxiliary patch to avoid the trap

Refining a range is fine... the only issue we are really running into here that is causing the trap is that the "update" on the original LHS DEF is a new stmt which hasn't been added to the IL yet, so ranger is trapping when it asks for the basic block of the def stmt.. It had one before, and it expects it to still have one when it looks at the context of the stmt to update the range.

Now that said, the update issue is primarily within the cache. If we are updating and the BB isn't set, we could simply just pick up the global value. Patch is attached, and pre-approved if you want to use it.
[Bug tree-optimization/112843] during GIMPLE pass: bitintlower ICE: SIGSEGV in ranger_cache::range_of_expr (gimple-range-cache.cc:1204) with _BitInt() at -O1
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112843 --- Comment #7 from Andrew Macleod --- (In reply to Jakub Jelinek from comment #6)
> (In reply to Andrew Macleod from comment #5)
> > what do you mean? when a statement is changed, it may generate a different
> > range than it did before,
>
> No, that would be a bug. If some IL changes need to extend existing range,
> then
> reset_flow_sensitive_info needs to be called for the SSA_NAME.

Not from ranger's point of view. You may change
_1 = _2 + 5
to
_1 = _6 + 5
and it finds it has a better understanding of the range of _6, so it can further refine the range for _1 based on that. It merely checks whether the changes to the statement mean it can understand it better. In particular:

const char *lossage = _result ? "constant string" : 0;
if (__builtin_expect (lossage != ((void *)0) , 0))
  {
    unsigned __message_length = __builtin_strlen (lossage);
    if (! __builtin_constant_p (__message_length))
      dead ();
  }

This provides an IL update which happens later in compilation, when builtin_expect gives us a better result. That gives us a much better range, and we can then fold away the call to dead().

> > so we re-evaluate the statement to see if the
> > result is different. If it is, then we update it in the cache.
> >
> > All its doing is re-calculating and updating the cache values.
> >
> > It looks to me like the problem is the stmt is being added in a way that
> > leaves the IL in an illegal state,
>
> But unfortunately that is pretty much unavoidable in changes like this.
> So, I have
> lhs1 = some_stmt;
> and want to change it to
> lhs2 = some_stmt;
> lhs1 = (lhs_type) lhs2;
> (where lhs2 has a different type from lhs).
> The options to do that are either what the code does right now, i.e.
> first create lhs1 = (lhs_type) lhs2; stmt, add it after lhs1 = some_stmt;
> (including update_stmt), then gimple_set_lhs on the first stmt to lhs2, then
> update_stmt on the first stmt, but this is temporarily invalid IL, because
> two different stmts in the IL have the same lhs, or as changed in the patch
> gimple_set_lhs on the first stmt to lhs2 (but no update_stmt), create the
> second stmt, add it including update_stmt, then update_stmt on the first
> one; this is also invalid IL, the effects of update_stmt haven't been done
> until the second update_stmt; or gimple_set_lhs and update_stmt on the first
> one (but then lhs1 has no definition and we insert a stmt into the IL
> without the definition, so again temporarily invalid IL).

I'm still confused, though. What I think you are doing is taking
lhs1 = some_stmt
and rewriting it to a different type, and it looks like you're looping through the operands so that some_stmt is consuming the right types too.. so the type of some_stmt is changing from lhs1_type to lhs2_type, and then we create a new stmt for lhs1.

To avoid lying, why don't we just create a new stmt 'g'
lhs2 = some_stmt
before the original stmt, with the operands converted, and then, when that is done, simply update the original stmt to
lhs1 = (lhs1_type) lhs2
Then the IL is never in an invalid state?

Otherwise, if we are going to have invalid IL states, we have to turn off any automatic updating... You could possibly just save all the updates/sets until the end.. that might work. The biggest issue is that lhs1 was in the IL, so ranger expects the block to be set for its def stmt, yet it is being updated via a new stmt which is not in the IL yet when the update happens... it expects gimple_bb to be set for a DEF that it has seen.

Isn't the "normal" way to do this to leave lhs1 = some_stmt, then add lhs2 = (lhs2_type) lhs1 and follow immediate uses, changing all the occurrences of lhs1 to lhs2?
[Bug tree-optimization/112843] during GIMPLE pass: bitintlower ICE: SIGSEGV in ranger_cache::range_of_expr (gimple-range-cache.cc:1204) with _BitInt() at -O1
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112843 --- Comment #5 from Andrew Macleod --- (In reply to Richard Biener from comment #3)
> (In reply to Richard Biener from comment #2)
> > what?! Ick. It definitely shouldn't re-fold anything but only scrap caches
> > _at most_.
>
> So it does
>
>   // Only update if it already had a value.
>   if (m_cache.get_global_range (r, lhs))
>     {
>       // Re-calculate a new value using just cache values.
>       Value_Range tmp (TREE_TYPE (lhs));
>       fold_using_range f;
>       fur_stmt src (s, &m_cache);
>       f.fold_stmt (tmp, s, src, lhs);
>
>       // Combine the new value with the old value to check for a change.
>       if (r.intersect (tmp))
>         {
>           if (dump_file && (dump_flags & TDF_DETAILS))
>             {
>               print_generic_expr (dump_file, lhs, TDF_SLIM);
>               fprintf (dump_file, " : global value re-evaluated to ");
>               r.dump (dump_file);
>               fputc ('\n', dump_file);
>             }
>           m_cache.set_global_range (lhs, r);
>
> WTF? If a pass invalidates a range it needs to properly do this itself.
> But update_stmt itself _never_ should alter range info.

What do you mean? When a statement is changed, it may generate a different range than it did before, so we re-evaluate the statement to see if the result is different. If it is, then we update it in the cache. All it's doing is re-calculating and updating the cache values.
It looks to me like the problem is the stmt is being added in a way that leaves the IL in an illegal state:

tree lhs2 = make_ssa_name (type);
gimple *g = gimple_build_assign (lhs, NOP_EXPR, lhs2);
if (stmt_ends_bb_p (stmt))
  {
    edge e = find_fallthru_edge (gsi_bb (gsi)->succs);
    gsi_insert_on_edge_immediate (e, g);
  }
else
  gsi_insert_after (&gsi, g, GSI_SAME_STMT);
gimple_set_lhs (stmt, lhs2);

(gdb) p print_generic_expr (stderr, lhs, 0)
_1$3 = void
(gdb) p print_generic_expr (stderr, lhs2, 0)
_12$4 = void
(gdb) p print_gimple_stmt (stderr, stmt, 0, 0)
_1 = x_4(D) * 5;
$5 = void
(gdb) p print_gimple_stmt (stderr, g, 0, 0)
_1 = (_BitInt(128)) _12;
$6 = void

So we have
_1 = x_4(D) * 5;
then we create
_1 = (_BitInt(128)) _12;
and add it to the IL... and finally change the original stmt to
_12 = x_4(D) * 5
How is that right? _1 is now a different type? Regardless, we have 2 statements with _1 as a LHS for a brief time. And ranger's real complaint is that we have a range for _1, but it's being updated by a stmt which is not actually in the IL yet during this update.. so it finds no basic block info for an SSA name which it thinks it knows something about already, because it WAS in the IL.

It also does not seem correct to me that you can take an existing SSA_NAME and change its type on the fly while it's still being used in the IL? Once we create an ssa name, I thought it was immutable from a type point of view until deleted and reused?

(gdb) p print_generic_expr (stderr, lhs->typed.type, 0)
_BitInt(128)$7 = void
(gdb) p print_generic_expr (stderr, lhs2->typed.type, 0)
int128_t$8 = void
[Bug tree-optimization/112788] [14 regression] ICEs in fold_range, at range-op.cc:206 after r14-5972-gea19de921b01a6
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112788 --- Comment #2 from Andrew Macleod --- (In reply to Kewen Lin from comment #1)
>
> ranger makes use of type precision directly instead of something like
> types_compatible_p. I wonder if we can introduce a target hook (or hookpod)
> to make ranger unrestrict this check a bit, the justification is that for
> float type its precision information is encoded in its underlying
> real_format, if two float types underlying modes are the same, the precision
> are actually the same.
>
> btw, the operand_check_ps seems able to call range_compatible_p?

It could, but just a precision check seemed enough at the time. The patch also went through many iterations, and it was only in the final version that operand_check_p() ended up with types as the parameter rather than ranges.

You bring up a good point, though. I just switched those routines to call range_compatible_p() and checked it in. Now it is all centralized in the one routine going forward.

It does seem wrong that the float precisions don't match, and weird that it's hard to fix :-) It should now be possible to have range_compatible_p check the underlying mode for floats rather than the precision... if there's a good argument for it, and you want to give that a shot...
[Bug ipa/111922] [11/12/13/14 Regression] ICE in cp with -O2 -fno-tree-fre
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111922 --- Comment #13 from Andrew Macleod --- Created attachment 56735 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=56735&action=edit patch (In reply to Sam James from comment #12) > Is the plan to backport to 11/12/13 or to leave it? hmm. I don't think I would apply the same patch (it wouldn't work as is anyway), but if we wanted to fix this in earlier releases we could simply have the custom fold_ranges return false when the precision doesn't match... it would at least avoid most of these traps in earlier releases...? The attached patch for instance would probably apply to GCC 13, 12 and 11.. I can test these if we want to do that...
[Bug ipa/111922] [11/12/13/14 Regression] ICE in cp with -O2 -fno-tree-fre
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111922 Andrew Macleod changed: What|Removed |Added Resolution|--- |FIXED Status|NEW |RESOLVED --- Comment #11 from Andrew Macleod --- fixed.
[Bug tree-optimization/111293] [14 Regression] Missed Dead Code Elimination since r14-3414-g0cfc9c953d0
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111293 --- Comment #2 from Andrew Macleod --- It seems like we set 'e' to 3 immediately at the start:

 [local count: 1073741824]:
 e = 3;
 goto ; [100.00%]

and it is never changed again. However, when we load from 'e' later in the IL:

 [local count: 9485263241]:
 e.1_6 = e;

we simply get varying. Is some pass supposed to propagate this? This reminds me of a few other regression PRs where we no longer propagate known values from loads from memory into ssa-names. If we knew that e.1_6 was '3', then the call to foo would be folded away as never executable.
[Bug ipa/111922] [11/12/13/14 Regression] ICE in cp with -O2 -fno-tree-fre
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111922 --- Comment #5 from Andrew Macleod --- (In reply to Jakub Jelinek from comment #4)
>
> I think
> Value_Range vr (operand_type);
> if (TREE_CODE_CLASS (operation) == tcc_unary)
> ipa_vr_operation_and_type_effects (vr,
> src_lats->m_value_range.m_vr,
> operation, param_type,
> operand_type);
> should be avoided if param_type is not a compatible type to operand_type,
> unless operation is some cast operation (NOP_EXPR, CONVERT_EXPR, dunno if
> the float to integral or vice versa ops as well but vrp probably doesn't
> handle that yet).
> In the above case, param_type is struct A *, i.e. pointer, while
> operand_type is int.

The root of the issue is that the precisions are different, and we're invoking an operation which expects the precisions to be the same (minus in this case). We can't deal with this in dispatch because some operations allow the LH and RH to be different precisions or even types. It also seems like overkill to have every operation check the incoming precision, but perhaps not... we could limit it to the wi_fold() subsets.. let me have a look. If we get incompatible types, perhaps returning VARYING should be OK?
[Bug ipa/111922] [11/12/13/14 Regression] ICE in cp with -O2 -fno-tree-fre
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111922 --- Comment #9 from Andrew Macleod --- (In reply to Jakub Jelinek from comment #8) > Well, in this case the user explicitly told compiler not to do that by not > using a prototype and syntax which doesn't provide one from the definition. > It is like using > int f1 (struct C *x, struct A *y) > { > ... > } > definition in one TU, and > int f1 (int, int); > prototype and > f1 (0, ~x) > call in another one + using LTO. What I meant is how to do decide if the > param_type vs. operand_type mismatch is ok or not. I vote we do nothing extra for those clowns! Just return VARYING for a range :-) it seems like the safest thing to do?
[Bug ipa/111922] [11/12/13/14 Regression] ICE in cp with -O2 -fno-tree-fre
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111922 --- Comment #7 from Andrew Macleod --- Explicit casts would be no problem, as they go through the proper machinery. The IL for that case has an explicit cast in it:

_1 = (int) x_2(D);
foo (_1);

It's when that cast is not present, and we try to, say, subtract two values, that we have a problem. We expect the compiler to promote things to be compatible when they are supposed to be. This would apply to dual operand arithmetic like +, -, /, *, bitwise ops, etc.

The testcase in particular is a bitwise not... but it has a return type that is 64 bits and an operand type that is 32. It was expected that the compiler would promote the operand to 64 bits if it expects a 64 bit result. At least for those tree codes which expect compatible types.. I don't think we want to get into overruling decisions at the range-ops level.. So we decide whether to trap (which would be the same result as we see now :-P), or handle it some other way. Returning VARYING was my thought.. because it means something is amuck, so say we don't know anything.

Alternatively, if IPA could figure out when things need promoting.. GCC must already do it, although I suppose that's in the front ends :-P
[Bug tree-optimization/112509] [14 Regression] internal compiler error: in verify_range, at value-range.cc:1132
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112509 Andrew Macleod changed: What|Removed |Added Resolution|--- |FIXED Status|NEW |RESOLVED --- Comment #5 from Andrew Macleod --- fixed.
[Bug tree-optimization/112509] GCC: 14: internal compiler error: in verify_range, at value-range.cc:1132
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112509 --- Comment #2 from Andrew Macleod --- The original switch is an unsigned value with a precision of 3 bits, so 0..7 just fits. It gets translated to an int index during gimplification, but the case labels are still precision 3 values. find_case_label_range in tree-vrp.cc does the following:

      tree label = gimple_switch_label (switch_stmt, i);
      tree case_high
        = CASE_HIGH (label) ? CASE_HIGH (label) : CASE_LOW (label);
      wide_int wlow = wi::to_wide (CASE_LOW (label));
      wide_int whigh = wi::to_wide (case_high);
      int_range_max label_range (type,
                                 wide_int::from (wlow, prec, sign),
                                 wide_int::from (whigh, prec, sign));

CASE_HIGH is indeed 6, but when we do the to_wide call, it is sign extended to produce:

(gdb) p whigh.dump ()
[0xfffe], precision = 3

and then when we do wide_int::from() in the signed int precision 32 of the new switch type, we get -2 for the high value.

What's supposed to happen here? It seems odd to me that the case labels are a different precision than the switch itself.. but then we would also lose the fact that there is only 3 bits of precision in the switch. Should the low and high values be cast from the original case precision/type to the new one, perhaps? Something like this seems to work:

 tree find_case_label_range (gswitch *switch_stmt, const irange *range_of_op)
@@ -900,9 +901,9 @@ find_case_label_range (gswitch *switch_stmt, const irange *range_of_op)
         = CASE_HIGH (label) ? CASE_HIGH (label) : CASE_LOW (label);
       wide_int wlow = wi::to_wide (CASE_LOW (label));
       wide_int whigh = wi::to_wide (case_high);
-      int_range_max label_range (type,
-                                 wide_int::from (wlow, prec, sign),
-                                 wide_int::from (whigh, prec, sign));
+      int_range_max label_range (TREE_TYPE (case_high), wlow, whigh);
+      range_cast (label_range, type);
+      label_range.dump(stderr);
[Bug tree-optimization/106511] [13/14 Regression] New -Werror=maybe-uninitialized since r13-1268-g8c99e307b20c502e
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106511 --- Comment #8 from Andrew Macleod --- (In reply to Richard Biener from comment #2)
> VRP could use max_stmt_executions here (note it doesn't properly handle loop
> nests so the name is somewhat misleading)

Do you have to ask for max_stmt_executions on each loop in the nest for the stmt and multiply them all together to get the real total? Or can you only count on loop_outer()?
[Bug tree-optimization/110540] [14 Regression] Dead Code Elimination Regression since r14-1163-gd8b058d3ca4
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110540 --- Comment #3 from Andrew Macleod --- EVRP figures out the j value can only be 0 or -9, removes a few stmts as a result, and simplifies the PHI, producing:

 [local count: 114863530]:
 # j_12 = PHI <0(2), -9(6)>

 [local count: 1073741824]:
 # j_11 = PHI
 if (j_11 != -9)
   goto ; [94.50%]
 else
   goto ; [5.50%]

which, when we get to cunrolli, looks like a loop, and it does nothing.

When EVRP doesn't reduce the range from [-9, 0], we are instead left with:

 :
 # j_16 = PHI <0(2), _35(7)>
 if (j_16 != -9)
   goto ; [INV]
 else
   goto ; [INV]

After DSE, we have:

 :
 *_8 = 0;

 :
 # j_16 = PHI <0(2), -9(6), -9(5)>
 if (j_16 != -9)
   goto ; [INV]
 else
   goto ; [INV]

and then CDDCE comes along and turns that into:

 :
 *_8 = 0;

 :
 # j_17 = PHI <0(2), -9(6)>

 :
 # j_16 = PHI
 if (j_16 != -9)
   goto ; [INV]
 else
   goto ; [INV]

which is what eventually causes our problem: this ends up looking like another loop to cunrolli, and it does no analysis of it other than to mark it as a 3rd loop. And we never actually get rid of this. In the original version, cunroll analyzes the loop and turns it into something useful, without the extra phi.

If we don't do cddce, then the same transformation shows up after switchconv in profile_estimate.. so I'm guessing this is a CFG "cleanup". So EVRP cleans up the code, but it appears that it confuses cunrolli into thinking there is nothing to do?
[Bug tree-optimization/106511] [13/14 Regression] New -Werror=maybe-uninitialized since r13-1268-g8c99e307b20c502e
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106511 --- Comment #7 from Andrew Macleod --- (In reply to Richard Biener from comment #2)
> VRP could use max_stmt_executions here (note it doesn't properly handle loop
> nests so the name is somewhat misleading)

Given some arbitrary statement S, how do I find the relevant loop pointer to pass to max_stmt_executions? I don't suppose there is a generic version which will take care of that lookup?

This seems like something the phi analyzer could easily use to calculate the number of times the modifier statement triggers when we have loop PHIs and some other PHI in parallel.. if I can easily ask how many times stmt S is executed...
[Bug tree-optimization/111472] [11/12/13/14 Regression] Wrong code at -Os on x86_64-linux-gnu since r11-4563-gd0d8b5d836
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111472 --- Comment #4 from Andrew Macleod --- (In reply to Andrew Macleod from comment #3) > I'm not sure that this didn't uncover something elsewhere, possibly ivopts > since that pass seems to be generating something different and I think there > were some fixes put in there on trunk?. > > Regardless, this currently works on trunk. Can we run a regression on trunk > and see what patch fixed it? > > Andrew And still fails on GCC13, so it appears to have been a trunk patch that fixed it.
[Bug tree-optimization/111472] [11/12/13/14 Regression] Wrong code at -Os on x86_64-linux-gnu since r11-4563-gd0d8b5d836
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111472 --- Comment #3 from Andrew Macleod --- I'm not sure that this didn't uncover something elsewhere, possibly ivopts, since that pass seems to be generating something different and I think there were some fixes put in there on trunk?

Regardless, this currently works on trunk. Can we run a regression on trunk and see what patch fixed it?

Andrew
[Bug tree-optimization/111766] Missed optimization with __builtin_unreachable and ands
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111766 Andrew Macleod changed: What|Removed |Added Resolution|--- |FIXED Status|UNCONFIRMED |RESOLVED --- Comment #4 from Andrew Macleod --- oops. fixed
[Bug tree-optimization/111766] Missed optimization with __builtin_unreachable and ands
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111766 --- Comment #3 from Andrew Macleod --- fixed
[Bug tree-optimization/111622] [13 Regression] EVRP compile-time hog compiling risc-v insn-opinit.cc
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111622 Andrew Macleod changed: What|Removed |Added Resolution|--- |FIXED Status|UNCONFIRMED |RESOLVED --- Comment #9 from Andrew Macleod --- Fixed.
[Bug tree-optimization/111622] [13 Regression] EVRP compile-time hog compiling risc-v insn-opinit.cc
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111622 --- Comment #6 from Andrew Macleod --- Interesting. The "fix" turns out to be: commit 9ea74d235c7e7816b996a17c61288f02ef767985 Author: Richard Biener Date: Thu Sep 14 09:31:23 2023 +0200 tree-optimization/111294 - better DCE after forwprop The following adds more aggressive DCE to forwprop to clean up dead stmts when folding a stmt leaves some operands unused. The patch uses simple_dce_from_worklist for this purpose, queueing original operands before substitution and folding, but only if we folded the stmt. This removes one dead stmt biasing threading costs in a later pass but it doesn't resolve the optimization issue in the PR yet. Which implies something pathological was triggering in VRP, so I dug a little deeper... It seems to be a massive number of partial equivalencies generated by sequences like: _5 = (unsigned int) _1; _10 = (unsigned int) _1; _15 = (unsigned int) _1; _20 = (unsigned int) _1; _25 = (unsigned int) _1; <...> for a couple of hundred statements. these are all then members of a partial equivalence set, and we end up doing obscene amounts of pointless looping and recomputing of ranges of things in the set when say _1 may change. The intent of partial equivalence is to allow us to reflect known subranges thru casts, but not to build up large groups like in an equivalence. There should be a limit to the size. We start to lose most of the usefulness when the grouping gets too large.
[Bug tree-optimization/111694] [13/14 Regression] Wrong behavior for signbit of negative zero when optimizing
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111694 Andrew Macleod changed: What|Removed |Added Status|REOPENED|RESOLVED Resolution|--- |FIXED --- Comment #11 from Andrew Macleod --- now fixed on gcc 13 too.
[Bug tree-optimization/108697] constructing a path-range-query is expensive
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108697 Andrew Macleod changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED Resolution|--- |FIXED --- Comment #8 from Andrew Macleod --- fixed.
[Bug tree-optimization/111766] Missed optimization with __builtin_unreachable and ands
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111766 --- Comment #1 from Andrew Macleod --- Imports: bb_3(D) Exports: _2 bb_3(D) _2 : bb_3(D)(I) bb_3(D) [irange] int [0, 3] MASK 0x3 VALUE 0x0 : _2 = bb_3(D) & 1; if (_2 == 0) goto ; [INV] else goto ; [INV] _2 : [irange] int [0, 1] MASK 0x1 VALUE 0x0 4->5 (T) _2 : [irange] int [0, 0] MASK 0x1 VALUE 0x0 4->5 (T) bb_3(D) : [irange] int [0, 0][2, 2] MASK 0x2 VALUE 0x0 4->6 (F) _2 : [irange] int [1, 1] MASK 0x1 VALUE 0x0 4->6 (F) bb_3(D) : [irange] int [1, 3] MASK 0x2 VALUE 0x1 Looks like it's just a lack of completeness in operator_bitwise_and::op1_range(). On the edge from 4->6, ranger knows _2 is [1,1], but when op1_range solves for _2 = bb_3 & 1 we have [1,1] = int [0, 3] MASK 0x3 VALUE 0x0 & 1, and it produces int [1, 3] rather than [1,1][3,3].
[Bug tree-optimization/111694] [13/14 Regression] Wrong behavior for signbit of negative zero when optimizing
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111694 --- Comment #9 from Andrew Macleod --- (In reply to Andrew Macleod from comment #8) > (In reply to Alexander Monakov from comment #7) > > No backport for gcc-13 planned? > > mmm, didn't realize we were propagating floating point equivalences around > in 13. A similar patch should work there. Testing the same patch on gcc13. Will let it settle on trunk for a day or two first, then check it in if nothing shows up. Which it shouldn't :-)
[Bug tree-optimization/111694] [13/14 Regression] Wrong behavior for signbit of negative zero when optimizing
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111694 Andrew Macleod changed: What|Removed |Added Resolution|FIXED |--- Status|RESOLVED|REOPENED --- Comment #8 from Andrew Macleod --- (In reply to Alexander Monakov from comment #7) > No backport for gcc-13 planned? mmm, didn't realize we were propagating floating point equivalences around in 13. A similar patch should work there.
[Bug tree-optimization/111694] [13/14 Regression] Wrong behavior for signbit of negative zero when optimizing
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111694 Andrew Macleod changed: What|Removed |Added Status|NEW |RESOLVED Resolution|--- |FIXED --- Comment #6 from Andrew Macleod --- fixed
[Bug tree-optimization/111694] [13/14 Regression] Wrong behavior for signbit of negative zero when optimizing
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111694 --- Comment #4 from Andrew Macleod --- (In reply to Richard Biener from comment #3) > Looks like some frange / relation mistake then. l_3(D) [frange] double [-Inf, +Inf] Equivalence set : [l_3(D), r_4(D)] : _1 = __builtin_signbit (l_3(D)); if (_1 != 0) goto ; [INV] else goto ; [INV] 3->6 (T) _1 : [irange] int [-INF, -1][1, +INF] 3->6 (T) l_3(D) : [frange] double [-Inf, -0.0 (-0x0.0p+0)] 3->4 (F) _1 : [irange] int [0, 0] 3->4 (F) l_3(D) : [frange] double [0.0 (0x0.0p+0), +Inf] : _2 = __builtin_signbit (r_4(D)); Yeah, we know l_3 and r_4 are equivalent, and we also know that on the edge 3->4 l_3 has the range double [0.0 (0x0.0p+0), +Inf] What we miss is that with an equivalence, we also have to put -0.0 back into the range. We currently don't, so we think we can fold the second signbit call. If I fix that, we then see r_4(D) [frange] double [-0.0 (-0x0.0p+0), +Inf] which prevents the folding. I need to audit to see if there are other places where we may have to "adjust" equivalence range, or how best to deal with this in the general case.
[Bug tree-optimization/111622] [13 Regression] EVRP compile-time hog compiling risc-v insn-opinit.cc
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111622 --- Comment #3 from Andrew Macleod --- (In reply to Richard Biener from comment #1) > Created attachment 56006 [details] > preprocessed riscv insn-opinit.cc I get i.ii:2203:11: fatal error: definition of ‘class std::initializer_list<_E>’ does not match ‘#include ’ 2203 | class initializer_list | ^~~~ compilation terminated. with the .ii file... do I need something else?
[Bug tree-optimization/111599] [14 Regression] ICE: Segmentation fault in VRP
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111599 Andrew Macleod changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED Resolution|--- |FIXED --- Comment #5 from Andrew Macleod --- fixed.
[Bug tree-optimization/110315] [13 Regression] g++ crashes with a segmentation fault while compiling a large const std::vector of std::string since r13-6566-ge0324e2629e25a90
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110315 Andrew Macleod changed: What|Removed |Added Resolution|--- |FIXED Status|NEW |RESOLVED --- Comment #13 from Andrew Macleod --- fixed.
[Bug tree-optimization/111599] [14 Regression] ICE: Segmentation fault in VRP
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111599 --- Comment #3 from Andrew Macleod --- patch in testing
[Bug tree-optimization/110315] [13 Regression] g++ crashes with a segmentation fault while compiling a large const std::vector of std::string since r13-6566-ge0324e2629e25a90
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110315 --- Comment #11 from Andrew Macleod --- (In reply to Richard Biener from comment #10) > (In reply to Andrew Macleod from comment #7) > > Created attachment 55591 [details] > > potental patch > > > > I've attached Aldy's patch for PR109695 which he had backported to GCC13 > > back in May. > > This does resolve the issue.. Assuming we want to apply it to GCC 13. The > > original variant has been in trunk for a while now. I'm running this through a > > bootstrap/regression run now. > > > > I don't know if there is anything further you want to try it with? > > Can we consider this please? Sure. OK for gcc 13 after another round of regressions?
[Bug tree-optimization/93917] VRP forgets range of value read from memory
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93917 --- Comment #4 from Andrew Macleod --- Note a slight change in expectation as a result of commit r14-4141-gbf6b107e2a342319b3787ec960fc8014ef3aff91 for PR 110080. Due to a memory load in the second case, we do not remove the unreachable call now, as there may be a commoning opportunity later (via inlining in this case) in which this unreachable call may provide new information. The test case has been adjusted to current expectations, where we leave this unreachable call and then remove it and set the global range in VRP2 instead.
[Bug tree-optimization/110249] __builtin_unreachable helps optimisation at -O1 but not at -O2
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110249 Andrew Macleod changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED CC||amacleod at redhat dot com Resolution|--- |FIXED --- Comment #6 from Andrew Macleod --- Fixed.
[Bug tree-optimization/110080] [13/14 Regression] Missed Dead Code Elimination at -Os when using __builtin_unreachable since r13-6945-g429a7a88438
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110080 Andrew Macleod changed: What|Removed |Added Status|NEW |RESOLVED Resolution|--- |FIXED --- Comment #6 from Andrew Macleod --- fixed.
[Bug tree-optimization/110875] [14 Regression] Dead Code Elimination Regression since r14-2501-g285c9d042e9
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110875 Andrew Macleod changed: What|Removed |Added Resolution|--- |FIXED Status|NEW |RESOLVED --- Comment #4 from Andrew Macleod --- When range_of_stmt invokes prefill_name to evaluate unvisited dependencies, it should not mark visited names as always_current. When ranger_cache::get_global_range() is invoked with the optional "current_p" flag, it triggers additional functionality. This call is meant to be from within ranger, and it is understood that if the current value is not current, set_global_range will always be called later with a value. Thus it sets the always_current flag in the temporal cache to avoid computation cycles. The prefill_stmt_dependencies () mechanism within ranger is intended to emulate the behaviour of range_of_stmt on an arbitrarily long series of unresolved dependencies without triggering the overhead of huge call chains from the range_of_expr/range_on_entry/range_on_exit routines. Rather, it creates a stack of unvisited names and invokes range_of_stmt on them directly in order to get initial cache values for each ssa-name. The issue in this PR was that the routine was incorrectly invoking the get_global_cache to determine whether there was a global value. If there was, it would move on to the next dependency without invoking set_global_range to clear the always_current flag. What it should have been doing was simply checking whether there was a global value, and if there was not, add the name for processing and THEN invoke get_global_value to do all the special processing. Fixed.
[Bug tree-optimization/110918] [14 Regression] Dead Code Elimination Regression at -O3 since r14-2331-g018e7f16408
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110918 Andrew Macleod changed: What|Removed |Added Resolution|--- |FIXED Status|NEW |RESOLVED --- Comment #3 from Andrew Macleod --- fixed.
[Bug middle-end/111009] [12/13/14 regression] -fno-strict-overflow erroneously elides null pointer checks and causes SIGSEGV on perf from linux-6.4.10
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111009 --- Comment #10 from Andrew Macleod --- > At the hazard of stating the obvious: it's a wrong-code when you execute it > (not a gcc ICE). > doh. of course. test is in progress. Richi was correct. Although the code in range-ops for fold_range is correct, op1_range cannot make the same assumptions that fold_range does because it has no concept of the symbolic values on the RHS. I am making op1_range more restrictive.
[Bug middle-end/111009] [12/13/14 regression] -fno-strict-overflow erroneously elides null pointer checks and causes SIGSEGV on perf from linux-6.4.10
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111009 --- Comment #8 from Andrew Macleod --- Do I need some special target or something? On trunk, just "-fno-strict-overflow -O3" doesn't fail for me on x86_64-pc-linux-gnu... ./cc1 -fno-strict-overflow -O3 009.c -quiet
[Bug tree-optimization/110942] [14 Regression] Dead Code Elimination Regression at -O3 since r14-1165-g257c2be7ff8
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110942 --- Comment #3 from Andrew Macleod --- The original revision listed, I narrowed down to a single instance where the new code did something that makes a difference. In stmt _8 = (int) i_10; which originally had a range of int [0, 0][8, 8] MASK 0x8 VALUE 0x0, the new code allows it to determine that the result is actually int [0, 0], which allows _8 to be propagated as "0" now, and thus be dead. It was also dead before, but stayed around for a few passes because it wasn't a constant, so wasn't put in the "let VRP kill this stmt because it was fully propagated as a constant" list. The only difference in the IL as a result is the block which contained that stmt has a phi: >: > # k_14 = PHI > _8 = (int) i_10; and with the current trunk, we remove that block, which causes the fallthru block to have an extra incoming edge: < # i_10 = PHI <0(4), 8(25), 8(10), 0(3)> < # k_15 = PHI <0(4), k_13(25), k_15(10), 0(3)> --- > # i_10 = PHI <0(4), 8(26), 0(3)> > # k_15 = PHI <0(4), k_14(26), 0(3)> That is literally the only difference coming out of EVRP, so perhaps the extra edge coming into the block is causing some threading issues?
[Bug tree-optimization/110942] [14 Regression] Dead Code Elimination Regression at -O3 since r14-1165-g257c2be7ff8
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110942 --- Comment #1 from Andrew Macleod --- Created attachment 55743 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55743&action=edit patch to revert the change Although the bisection stopped at this change, it does not appear to be the underlying culprit. The attached patch will revert that change (it's pretty simple) and the problem still exists, so something other than that presumably caused it. I also presume this is the first change going backwards that shows the issue; is it possible to find the *first* change since gcc 13 was branched that caused the issue?
[Bug middle-end/111009] [12/13/14 regression] -fno-strict-overflow erroneously elides null pointer checks and causes SIGSEGV on perf from linux-6.4.10
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111009 --- Comment #4 from Andrew Macleod --- (In reply to Richard Biener from comment #3) > bool > operator_addr_expr::fold_range (irange &r, tree type, > const irange &lh, > const irange &rh, > relation_trio) const > { > if (empty_range_varying (r, type, lh, rh)) > return true; > > // Return a non-null pointer of the LHS type (passed in op2). > if (lh.zero_p ()) > r = range_zero (type); > > not sure how this is called, but we can only derive this if the offset > is zero as well, definitely if targetm.addr_space.zero_address_valid, > but I think this is true in general. > > else if (!contains_zero_p (lh)) > r = range_nonzero (type); > > and this is only true for TYPE_OVERFLOW_UNDEFINED (type), with > -fwrapv-pointer we could wrap to zero. > > That is, it's _not_ GIMPLE undefined behavior to compute &0->bar. > It looks like without -fwrapv-pointer we elide the if (!a) check, > dereferencing it when dso && dso != curr. I suppose that looks reasonable > with a = &dso->maj, when dso != 0 then a != 0 unless ->maj wraps. Range-ops won't see anything like &dso->maj.. it sees ranges and nothing else. It just gets the result of that expression determined by someone else. So if it sees [0,0] for the range, that means &dso->maj has been determined to be 0. When folding, addressof has some funky mechanics, and it symbolically processes the trees in gimple-range-fold.cc in fold_using_range::range_of_address. I think it takes care of all the funky things you mention. I also notice in the earlier comment where we set _13 to 0... the code you quoted where _13 was recomputed by ranger. It ends with GORI TRUE : (797) recomputation (_13) [irange] _Bool [1, 1] The result was [1,1] as far as ranger was concerned on the edge from 3->16, so that probably isn't how gimple fold determined it was zero. Is there still an issue here?
[Bug tree-optimization/110131] [12/13/14 Regression] Missed Dead Code Elimination when using __builtin_unreachable since r12-6924-gc2b610e7c6c
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110131 --- Comment #6 from Andrew Macleod --- (In reply to Andrew Pinski from comment #5) > (In reply to Andrew Pinski from comment #4) > > So I have a VRP patch which gets us to: > > /* If the value range is defined by more than one pair, > try to optimize to a singularity if either > the first or last pair is a singleton. */ > > That is if we have: > a range like: [0, 0][3, 32768][4294934529, +INF] > > and we the comparison like `_37 <= 2` > Since 2 is a value between the first 2 pairs, we can just say this should be > the same as `_37 == 0` because that is the only value that is valid here. > > The same idea happens for the last 2 pairs (and the last pair is a > singleton). > > Also if we only 2 pairs prefer the one which is 0 (since 0 is always > simplier). Hmm, you should be able to trim out the "untrue" parts of the range. Let's see. Do you have the comparison stmt, or is it just an expression? Maybe it doesn't matter. If you know the expression is "_37 <= 2", you can ask range-ops what the range of _37 is and intersect it with your range, and ask if the result is a singleton. Then this will work for all kinds of things. The format is op1_range (result_range, result_type, lhs_range, op2_range), so it looks like: range_op_handler handler (LE_EXPR); if (handler) { Value_Range vr (TREE_TYPE (_37)); if (handler->op1_range (vr, TREE_TYPE (_37), bool_one_range, [2,2])) _37_range.intersect (vr); } if (_37_range.singleton_p ()) blah (); You have a _37_range of [0, 0][3, 32768][4294934529, +INF]. op1_range for [1, 1] = OP1 LE_EXPR [2,2] will return [0,2], and the intersection result will be [0,0]. This would work for any condition/type and both false and true LHS if you provide [0,0] or [1,1] for the lhs. In particular this may help with things like x != 15, where the pair that is relevant to produce a singleton may not be at one end or the other.
Poking around for a minute, it looks like simplify_using_ranges from vr_values calls a routine called test_for_singularity which has never been converted to multi-ranges. I wonder if reworking it might resolve this class of issues as well? Eventually we planned to get to simplify_using_ranges and friends, but it hasn't happened yet.