[Bug middle-end/116899] ICE: in quick_push, at vec.h:1041 with -O at _BitInt()
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116899 --- Comment #10 from Andrew Macleod --- (In reply to Jakub Jelinek from comment #9)
> Created attachment 59238 [details]
> gcc15-pr116899.patch
>
> Untested fix in patch form.

Also pre-approved. Thanks.
[Bug middle-end/116898] ICE: in block_range, at gimple-range-cache.cc:1293 with -O -finstrument-functions -fnon-call-exceptions and _BitInt()
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116898 --- Comment #5 from Andrew Macleod --- (In reply to Jakub Jelinek from comment #4)
> Created attachment 59236 [details]
> gcc15-pr116898.patch
>
> So like this?  Will test tonight.

Perfect. Consider it pre-approved.
[Bug middle-end/116899] ICE: in quick_push, at vec.h:1041 with -O at _BitInt()
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116899 --- Comment #7 from Andrew Macleod --- (In reply to Andrew Macleod from comment #6)
> (In reply to Jakub Jelinek from comment #5)
> > Even if create (n) didn't work (which would be good to debug), the
> > x.safe_grow_cleared (n); x.truncate (0); can be just done more efficiently
> > with x.reserve (n);
> >
> > Anyway, the more important question is if range-cache can work even if the
> > number of basic blocks grows in the middle and whether say just using
> > safe_push instead of quick_push would do the trick.
>
> It works fine with growing CFG.

OK, let me give you a more measured answer instead of "yes" :-P  The m_workback vector is used for a walk up the CFG to mark blocks which need to be revisited because an incoming edge changes a specific SSA_NAME's range. These are mostly range-on-entry blocks that need updating, and only those blocks get put on the stack. As a result, even a walk from the bottom of the CFG to the top should require far fewer than the full number of blocks, which is why the cache uses quick_push. I suppose it is possible, however unlikely, that one could add so many basic blocks that this situation exceeds the original size of the vector. The completely safe approach would be to change ranger_cache::fill_block_cache () to either a) assert there is space before calling quick_push, or b) change it to a safe_push.
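The two options above can be sketched outside of GCC. This is a toy stand-in (the `toy_vec` type and its methods are hypothetical, not GCC's actual vec.h implementation): quick_push asserts that reserved space remains, while safe_push grows on demand, so a CFG that gained blocks mid-pass cannot overflow the vector.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Simplified model of the two push styles discussed above.
template <typename T>
struct toy_vec
{
  std::vector<T> data;
  size_t cap = 0;

  void reserve (size_t n) { cap = n; data.reserve (n); }

  // Option (a): assert there is space before pushing.
  void quick_push (const T &v)
  {
    assert (data.size () < cap && "quick_push: no reserved space left");
    data.push_back (v);
  }

  // Option (b): grow the capacity as needed, so pushing more elements
  // than originally reserved is safe.
  void safe_push (const T &v)
  {
    if (data.size () >= cap)
      cap = cap ? cap * 2 : 1;
    data.push_back (v);
  }
};
```

Under a quick_push-only discipline, the assert fires as soon as the pre-sized estimate is exceeded, which is exactly the failure mode this PR hit.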
[Bug middle-end/116899] ICE: in quick_push, at vec.h:1041 with -O at _BitInt()
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116899 --- Comment #6 from Andrew Macleod --- (In reply to Jakub Jelinek from comment #5)
> Even if create (n) didn't work (which would be good to debug), the
> x.safe_grow_cleared (n); x.truncate (0); can be just done more efficiently
> with x.reserve (n);
>
> Anyway, the more important question is if range-cache can work even if the
> number of basic blocks grows in the middle and whether say just using
> safe_push instead of quick_push would do the trick.

It works fine with a growing CFG. Whatever problem I had once upon a time is probably long gone; it was years ago. I also wasn't/am not familiar with the nuances of vec initializations or that some versions are more efficient than others. Good to know.
[Bug middle-end/116899] ICE: in quick_push, at vec.h:1041 with -O at _BitInt()
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116899 --- Comment #4 from Andrew Macleod --- (In reply to Jakub Jelinek from comment #3)
> Similarly to the other BB, enable_ranger -> gimple_ranger -> ranger_cache
> does
> 1000    m_workback.create (0);
> 1001    m_workback.safe_grow_cleared (last_basic_block_for_fn (cfun));
> 1002    m_workback.truncate (0);
> (which seems just weird, I'd expect m_workback.create
> (last_basic_block_for_fn (cfun));
> Creating a large vector, filling it with zeros and then immediately
> truncating is just a waste of time) and then just uses
> m_workback.quick_push.  That works only if no new basic blocks are created,
> which the gimple-lower-bitint creates a lot (split_block in tons of places,
> even manual create_basic_block).

With the truncate, clearly no need for the safe growing. I think my use of the create (0); safe_grow (size); idiom dates back to a problem I once had (years ago) where create (size) caused a failure down the road that was hard to debug, and it was fixed by the create (0); safe_grow (size); sequence, so I ended up just always doing it that way :-P  I especially see no need for that safe_grow_cleared call in this case.
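Jakub's point about reserve can be illustrated with std::vector standing in for GCC's vec (the function names below are made up for the sketch): the create (0) / safe_grow_cleared (n) / truncate (0) idiom allocates and zero-fills n elements only to immediately discard them, while reserve (n) obtains the same capacity with no wasted fill.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Two ways to pre-size a vector to hold up to N elements while starting
// logically empty.  std::vector sketch, not the GCC vec implementation.
std::vector<int> presize_slow (size_t n)
{
  std::vector<int> v;
  v.resize (n, 0);   // like safe_grow_cleared (n): allocate AND zero-fill
  v.resize (0);      // like truncate (0): throw the zeros away again
  return v;
}

std::vector<int> presize_fast (size_t n)
{
  std::vector<int> v;
  v.reserve (n);     // like reserve (n): allocate capacity only
  return v;
}
```

Both leave a vector of size 0 with capacity at least n; only the second avoids the pointless zero-fill.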
[Bug middle-end/116898] ICE: in block_range, at gimple-range-cache.cc:1293 with -O -finstrument-functions -fnon-call-exceptions and _BitInt()
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116898 --- Comment #3 from Andrew Macleod --- (In reply to Jakub Jelinek from comment #2)
> The bitint lowering pass emits some gimple statements on edges
> (gsi_insert_on_edge) and the edge insertions are (intentionally) committed
> to edges only at the end of pass (gsi_commit_edge_inserts ()).
> So, what the gimple cache sees here is an SSA_NAME set in a stmt queued on
> edge.
>
> Either the range cache can be adjusted not to fail assertion in cases like
> that but just pretend it doesn't know the range of something not yet in the
> IL (perhaps under some flag), or I'll need to revamp the pass to repeat the
> handle_operand_addr and .{ADD,SUB}_OVERFLOW discovery to find out where
> range_to_prec -> range_of_expr would be actually called, remember it in some
> hash table or whatever data structure and then during actual pass don't use
> range_of_expr anymore and just look up what has been remembered.  If the
> former would be possible, I'd certainly prefer that.

It should be perfectly safe to return false instead of trapping on that line. The routine already returns false if it can't produce a range on entry. We've added similar functionality at other places to handle stmts that are not yet in the IL; we just haven't tripped over this one before.
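A minimal sketch of the proposed behavior, with a hypothetical stmt record standing in for GCC's gimple statement: a range lookup on a statement not yet placed in the IL simply reports "no known range" rather than asserting.

```cpp
#include <cassert>

// Hypothetical stand-in for a gimple statement: in_il is false while the
// statement is still sitting on an edge-insert queue (no basic block yet).
struct stmt { bool in_il; };

// Instead of trapping on a statement outside the IL, report that no
// range is known by returning false, as the routine already does when it
// cannot produce a range on entry.
bool block_range (const stmt &def, int &range_out)
{
  if (!def.in_il)
    return false;            // not in the IL yet: pretend we know nothing
  range_out = 42;            // placeholder for the real range computation
  return true;
}
```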
[Bug middle-end/114855] ICE: Segfault when compiling large autogenerated C source file
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114855 --- Comment #57 from Andrew Macleod --- FWIW, on my last run, enabling early VRP sped up my -O1 compilation by a fair amount. Total compilation dropped from 480 seconds to 380 seconds. It took 2.5 seconds to run, and I'm going to guess it might have cleaned the code up somewhat. Might be worth looking at a Fast VRP pass at -O1? I'm not sure how/where to add NEXT_PASS (pass_fast_vrp); to run it at -O1, but it can be easily experimented with via --enable-tree-evrp --param=vrp-block-limit=1 which will always run Fast VRP as the EVRP pass.
[Bug middle-end/114855] ICE: Segfault when compiling large autogenerated C source file
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114855 --- Comment #55 from Andrew Macleod --- (In reply to Richard Biener from comment #50)
> (In reply to Richard Biener from comment #4)
> > Trunk at -O1:
> >
> > dominator optimization : 495.14 ( 82%) 0.20 ( 5%) 495.44 ( 81%) 113M ( 5%)
>
> Compared to that we're now at the following state with -O1 (everything >= 4%):
>
> callgraph ipa passes     : 17.23 ( 10%)
> df live regs             :  6.76 (  4%)
> dominator optimization   : 89.76 ( 50%)
> backwards jump threading :  7.94 (  4%)
> TOTAL                    : 180.77
>
> So it's still DOM aka forward threading eating most of the time.
> -fno-thread-jumps improves compile-time to 77s, DOM then still takes 25s (33%)
> (top offenders are then dom_oracle::register_transitives, bitmap_set_bit
> and wide_int_storage copying).  I noticed the unbound dominator traversal
> in register_transitives already.

There's already a creation flag to disable registering transitives if we decide we don't want them due to their expense. Fast VRP always disables them, but regular VRP never does. We switch to fast VRP when the size of the CFG goes beyond a threshold. The threader uses a normal ranger, so it isn't getting the benefit of this flag. It would be simple enough to disable transitives in ranger if the CFG is beyond a specified size, and we could use the same --param VRP uses to switch algorithms. Something like this ought to remove that hotspot:

diff --git a/gcc/gimple-range-cache.cc b/gcc/gimple-range-cache.cc
index 4b94e0db7aa..f665925f630 100644
--- a/gcc/gimple-range-cache.cc
+++ b/gcc/gimple-range-cache.cc
@@ -1003,7 +1003,8 @@ ranger_cache::ranger_cache (int not_executable_flag, bool use_imm_uses)
   m_temporal = new temporal_cache;
   // If DOM info is available, spawn an oracle as well.
-  create_relation_oracle ();
+  bool transitives_p = (last_basic_block_for_fn (cfun) < param_vrp_block_limit);
+  create_relation_oracle (transitives_p);
   create_infer_oracle (use_imm_uses);
   create_gori (not_executable_flag, param_vrp_switch_limit);
[Bug tree-optimization/116588] wrong code with -O2 -fno-vect-cost-model -fno-tree-dominator-opts -fno-tree-fre --param=vrp-block-limit=0
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116588 --- Comment #3 from Andrew Macleod --- It's happening in tree-ssa-propagate.cc after the simplification:

  /* If this is a control statement the propagator left edges
     unexecuted on force the condition in a way consistent with
     that.  See PR66945 for cases where the propagator can end up
     with a different idea of a taken edge than folding
     (once undefined behavior is involved).  */
  if (gimple_code (stmt) == GIMPLE_COND)
    {
      if ((EDGE_SUCC (bb, 0)->flags & EDGE_EXECUTABLE)
          ^ (EDGE_SUCC (bb, 1)->flags & EDGE_EXECUTABLE))
        {
          if (((EDGE_SUCC (bb, 0)->flags & EDGE_TRUE_VALUE) != 0)
              == ((EDGE_SUCC (bb, 0)->flags & EDGE_EXECUTABLE) != 0))
            gimple_cond_make_true (as_a <gcond *> (stmt));
          else
            gimple_cond_make_false (as_a <gcond *> (stmt));
          gimple_set_modified (stmt, true);

It appears that EDGE_EXECUTABLE is incorrect. That's as far as I have gotten so far.
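Extracting just the edge-flag decision into a standalone function (with simplified flag values, not GCC's real edge flags) makes the folding rule easier to check: when exactly one successor edge is marked executable, the condition folds to true exactly when that executable edge is the true edge.

```cpp
#include <cassert>

// Simplified stand-ins for the edge flag bits used above (values are
// illustrative, not GCC's).
enum { EDGE_TRUE_VALUE = 1, EDGE_EXECUTABLE = 2 };

// Returns +1 to fold the condition to true, -1 to fold it to false, and
// 0 to leave it alone (both or neither successor executable).
int fold_direction (unsigned e0_flags, unsigned e1_flags)
{
  // Same xor test as the quoted code: act only when exactly one of the
  // two successor edges is executable.
  if (!((e0_flags & EDGE_EXECUTABLE) ^ (e1_flags & EDGE_EXECUTABLE)))
    return 0;
  bool e0_true = (e0_flags & EDGE_TRUE_VALUE) != 0;
  bool e0_exec = (e0_flags & EDGE_EXECUTABLE) != 0;
  // If edge 0 is the true edge and is the executable one (or is the
  // false edge and is the non-executable one), the condition is true.
  return e0_true == e0_exec ? 1 : -1;
}
```

This also shows why a stale EDGE_EXECUTABLE bit is fatal: the fold direction is driven entirely by those flags, not by the condition itself.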
[Bug middle-end/114855] ICE: Segfault when compiling large autogenerated C source file
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114855 --- Comment #34 from Andrew Macleod --- Btw, if we simply remove the call to that function, with -O1 -fenable-tree-thread1 I see:

backwards jump threading : 14.36 ( 3%) 0.39 ( 5%) 14.68 ( 3%) 238M ( 9%)

which is a notable improvement from 1370 seconds :-P  So clearly the culprit. DOM appears to still be somewhat of an issue:

dominator optimization : 208.04 ( 43%) 0.95 ( 12%) 209.77 ( 43%) 342M ( 13%)

I've done some experimenting in this area, but nothing I'm willing to commit to yet. Do you have anything outstanding for this, Aldy?
[Bug middle-end/114855] ICE: Segfault when compiling large autogenerated C source file
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114855 --- Comment #29 from Andrew Macleod --- Huh. Do we calculate *all* paths ahead of time? I tried running valgrind, which died, but up to that point it showed 77% of the time spent in the body of back_jt_path_registry::adjust_paths_after_duplication (). That function looks kind of quadratic or worse in nature, and when I run it under GDB and force a stop at a random point, I see:

(gdb) p m_paths->m_vec
$2 = (vec*, va_heap, vl_embed> *) 0x12f50750
(gdb) p *(m_paths->m_vec)
$3 = {m_vecpfx = {m_alloc = 745039, m_using_auto_storage = 0, m_num = 572595}}
(gdb) p cand_path_num
$4 = 309148
(gdb) p curr_path_num
$5 = 0

Am I reading that right? 572,595 paths, of which we are looking at candidate #309,148?? It *appears* that this is called for path 0, then path 0 is removed and it's all done over again with one less path in the vector. A little later, in back_jt_path_registry::update_cfg, I see:

p m_num_threaded_edges
$19 = 18669

and the size of the m_paths vector is down to 558,637:

(gdb) p *(m_paths.m_vec)
$23 = {m_vecpfx = {m_alloc = 745039, m_using_auto_storage = 0, m_num = 558637}}

back_jt_path_registry::update_cfg () makes a lot of calls to back_jt_path_registry::duplicate_thread_path, which then invokes adjust_paths_after_duplication. All this time is spent here copying and moving things. Something seems horribly wrong about that.
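The shape described here, handle path 0, compare it against every surviving candidate, remove it, then repeat, is quadratic in the number of paths. A toy model (purely illustrative; the real adjust_paths_after_duplication does far more work per comparison) makes the cost countable:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Process path 0 by comparing it against every remaining candidate,
// erase it from the front (shifting the rest down), and repeat until
// the worklist is empty.  The counter exposes the n*(n-1)/2 growth.
size_t process_front_quadratic (std::vector<int> paths)
{
  size_t comparisons = 0;
  while (!paths.empty ())
    {
      for (size_t cand = 1; cand < paths.size (); ++cand)
        ++comparisons;                  // "adjust" candidate vs path 0
      paths.erase (paths.begin ());     // drop path 0, shift the rest
    }
  return comparisons;
}
```

With 4 paths this costs 3 + 2 + 1 = 6 comparisons; at the ~570,000 paths seen in the GDB session above, n*(n-1)/2 is on the order of 10^11, which matches the observed "copying and moving things" hotspot.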
[Bug middle-end/114855] ICE: Segfault when compiling large autogenerated C source file
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114855 --- Comment #27 from Andrew Macleod --- (In reply to Aldy Hernandez from comment #26) > > With -O1 -fenable-tree-thread1 (no threadfull): > dominator optimization : 127.76 ( 7%) 0.57 ( 7%) 128.58 ( > 7%) 236M ( 9%) > backwards jump threading :1375.45 ( 79%) 1.81 ( 22%)1382.33 ( > 79%) 519M ( 20%) > TOTAL :1733.30 8.16 1747.35 > 2636M > > Richi mentioned that the handcuffs he added weren't being triggered in the > backwards threader because ranger was going outside the path, but > -fenable-tree-thread1 adds a light-weight threading pass that shouldn't use > anything but gori (*). So, either we're doing something stupid in path > discovery, or we're looking outside the path, even with a non-resolving > threader. Something definitely smells bad. > > (*) Well, technically we use the global ranger to look at relations. > Andrew, would looking at relations cause us to look outside of the given > path? Sure. If a relation is not resolved within the path, and there might be a relation determined before the path, then it queries the global oracle starting at the root of the path. The relation oracle does not cache anything on lookups. It was anticipated as a light weight query, and caching results would end up costing more in memory than it would save in subsequent queries. So if there are frequent queries that escape the path, and those potential relations are way up in the CFG, it could do a lot of traversals. Also, the oracle does not KNOW if there is a relation, it only knows when both ssa-names HAVE relations, so it has to go look to see if they are related to each other. That said, I tried disabling the root oracle, and there was basically no difference. So that is probably a red herring. I also tried disabling relations.. ie, hacked the path oracle to always return VREL_VARYING and not to actually register anything. 
The threader only dropped from 79% to 77% with no relations, so I do not think the relation oracle is the issue. It's sure spending a lot of time somewhere. Still looking... It does appear to do a lot of the same work over and over.
[Bug middle-end/116003] [15 Regression] ICE: in register_initial_def, at value-relation.cc:610 with -O2 -fnon-call-exceptions -fprofile-arcs -finstrument-functions -fno-tree-copy-prop and _BitInt()
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116003 Andrew Macleod changed:

           What       |Removed     |Added
           Status     |UNCONFIRMED |RESOLVED
           Resolution |---         |FIXED

--- Comment #3 from Andrew Macleod --- Fixed.
[Bug middle-end/114855] ICE: Segfault when compiling large autogenerated C source file
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114855 --- Comment #20 from Andrew Macleod --- I did an -O2 run after those patches. Highlights:

tree SSA incremental     :   117.74 (  1%)  0.63 (  3%)   120.37 (  1%) 1049M ( 24%)
dominator optimization   :   680.49 (  5%)  0.65 (  4%)   686.40 (  5%)  170M (  4%)
backwards jump threading : 11253.23 ( 90%)  9.15 ( 49%) 11332.34 ( 90%) 1332M ( 30%)
TOTAL                    : 12520.81        18.54        12622.54        4459M

I suspect most of DOM's time is the over-and-over edge recalculations, but I don't actually know that. I do know that most of the 200 seconds at -O1 is, so it seems a reasonable guess. I can't comment on the backwards threader, but the time would be rational if that were fixed.
[Bug middle-end/114855] ICE: Segfault when compiling large autogenerated C source file
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114855 --- Comment #19 from Andrew Macleod --- (In reply to GCC Commits from comment #17)
> The master branch has been updated by Andrew Macleod :
>
> https://gcc.gnu.org/g:9e4da946c4263a4c89d5fc365b3c97ae244c5018
>
> commit r15-2858-g9e4da946c4263a4c89d5fc365b3c97ae244c5018
> Author: Andrew MacLeod
> Date: Thu Aug 8 16:37:28 2024 -0400
>
>     Adjust rangers recomputation depth based on the number of BBs.
>
>     As the number of block increase, recomputations can become more
>     expensive.  Adjust the depth limit to avoid excessive compile time.
>
>     PR tree-optimization/114855
>     * gimple-range-gori.cc (gori_compute::gori_compute): Adjust
>     ranger_recompute_depth limit based on the number of BBs.
>     (gori_compute::may_recompute_p): Use previosuly calculated value.
>     * gimple-range-gori.h (gori_compute::m_recompute_depth): New.

With the second change, on my machine at -O1 we're down to:

callgraph ipa passes     :  47.83 ( 11%)  1.64 ( 25%)  49.65 ( 12%)  338M ( 17%)
df live regs             :  12.19 (  3%)  0.19 (  3%)  12.41 (  3%)     0 (  0%)
dominator optimization   : 214.20 ( 51%)  0.58 (  9%) 215.56 ( 51%)  113M (  6%)
backwards jump threading :  27.58 (  7%)  0.20 (  3%)  27.95 (  7%)   68M (  3%)
tree CCP                 :   5.46 (  1%)  0.15 (  2%)   5.72 (  1%)   14M (  1%)
tree FRE                 :   5.48 (  1%)  0.20 (  3%)   5.69 (  1%) 2063k (  0%)
dominance computation    :   8.54 (  2%)  0.18 (  3%)   8.79 (  2%)     0 (  0%)
integrated RA            :  28.48 (  7%)  0.03 (  0%)  28.61 (  7%)   29M (  1%)
LRA non-specific         :  12.43 (  3%)  0.01 (  0%)  12.48 (  3%) 8977k (  0%)
TOTAL                    : 416.00         6.52        424.24         2041M

So better. Still have some DOM work to do. That is mostly edge range recalculations in the path ranger and GORI. -O2 is a different beast :-P
[Bug tree-optimization/116166] [13/14/15 Regression] risc-v (last) insn-emit-nn.c build takes hours
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116166 --- Comment #23 from Andrew Macleod --- (In reply to Richard Biener from comment #20)
> collected sofar.  I've noticed with the backwards threader that path
> ranger isn't very good in the ability to preserve a cache when adding
> blocks.  But in the case of forward threading resetting the cache shouldn't

I know only a little about the path ranger. I believe it is primarily a path-finding class which uses ranger to provide ranges on entry to the path, and then goes and walks the various path combinations utilizing GORI to calculate any outgoing ranges of ssa-names it thinks are interesting, to determine if any conditions can be folded on some path(s).

> be necessary - we might get additional "interesting" names to analyze but
> already analyzed names do not need re-analyzing (as opposed to the backward
> threader where we add blocks at the start of the path and thus would
> possibly get refined ranges).  We only need to clear the cache when
> popping blocks - and even then only for names defined in the popped block.
>
> A more pragmatic fix would be to limit the number of EDGE_NO_COPY_SRC_BLOCK
> blocks we add to the path aka the depth of the thread_around_empty_blocks
> recursion.  Limiting that to 10 for example makes the testcase compile in 1s.
> Implementing search space limiting and re-using --param max-jump-thread-paths
> works as well, resulting in 3s.

If there are no limits on its depth, well, that could be a problem :-)  Aldy understands this much better; we should engage him upon his return.
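The depth limit suggested in the quoted comment can be sketched as a bounded recursion (hypothetical names and structure, not the GCC implementation):

```cpp
#include <cassert>

// Walk a chain of empty blocks, adding them to the path, but stop once a
// fixed recursion limit is reached.  Returns how many blocks were added.
int thread_through_empty_blocks (int chain_len, int depth_limit)
{
  if (chain_len == 0 || depth_limit == 0)
    return 0;                 // end of chain, or depth limit reached
  return 1 + thread_through_empty_blocks (chain_len - 1, depth_limit - 1);
}
```

With an unbounded limit the walk grows with the chain length, which is the search-space explosion observed here; capping it (the quoted experiment used 10) bounds each path's cost.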
[Bug tree-optimization/116166] risc-v (last) insn-emit-nn.c build takes hours
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116166 --- Comment #7 from Andrew Macleod --- I think fixing 114855 will probably resolve this one too. It's a more "manageable" test case. I'm trying to have a look, but I am off next week so it isn't imminent. Meanwhile the "workaround" might be to use '-fno-tree-dominator-opts' and maybe '-fno-thread-jumps -fno-tree-reassoc'. I don't know if the no-tree-reassoc is still necessary, but when I was looking at 114855 for VRP it died in reassociation if I turned off thread-jumps.
[Bug middle-end/116003] [15 Regression] ICE: in register_initial_def, at value-relation.cc:610 with -O2 -fnon-call-exceptions -fprofile-arcs -finstrument-functions -fno-tree-copy-prop and _BitInt()
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116003 --- Comment #1 from Andrew Macleod --- When registering an equivalence, the oracle creates an initial equivalence for an SSA_NAME with itself in the def block. This prevents dominator queries from searching past the definition block. In this case the definition stmt has no gimple_bb set yet, so it is ICEing. If the definition stmt for the SSA_NAME has not been placed in the CFG yet, we should defer creating this record until it has. The next time an equivalence is registered, if the SSA_NAME is in the IL then this record will be created. Patch in testing.
[Bug bootstrap/115951] [15 Regression] pgo+lto enabled bootstrap fails building gnat (ICE in fold_stmt, at gimple-range-fold.cc:701)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115951 --- Comment #3 from Andrew Macleod --- I think I have a handle on it. The issue is statements like _80 = _79 == 0 where in Ada a boolean type is 8 bits and GCC's default boolean type is 1 bit. All through ranger we use the type of the LHS, but when the pointer operations were split out for GCC 15 into their own type, we accidentally ended up with the relational operators using the default boolean type instead of generating a result with the appropriate type. Thus for the above statement we were getting a 1-bit boolean for the RHS and an 8-bit boolean for the LHS, resulting in incompatible ranges (which require identical precision). I'll try making those adjustments and see if it bootstraps for me.
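The incompatibility described above reduces to a precision check. A toy model (hypothetical toy_range struct, not GCC's irange) of why an 8-bit Ada boolean range and a 1-bit default-boolean range cannot be combined:

```cpp
#include <cassert>

// Minimal stand-in for a range: the precision of the underlying type
// plus its bounds.  Range operations require identical precision on
// both operands, which is the invariant the ICE tripped over.
struct toy_range { unsigned precision; long lo, hi; };

bool compatible_p (const toy_range &a, const toy_range &b)
{
  return a.precision == b.precision;
}
```

An 8-bit Ada boolean LHS paired with a 1-bit default-boolean RHS fails this check, which is why the fix is to derive the relational result's type (and hence precision) from the LHS rather than from the default boolean type.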
[Bug other/115241] header-tools scripts not compatible to python3
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115241 --- Comment #2 from Andrew Macleod --- (In reply to sadineniharish8446 from comment #0) > The scripts in contrib/header-tools/ are breaks with python3. > These scripts were for python2 and we did update it for python3 and sent the > patches to gcc upstream. > https://www.mail-archive.com/gcc-patches@gcc.gnu.org/msg337175.html > > We did not get any response over there about the patch. > > Can you let us know that our patches are ok OR gcc has any plan to upgrade > these scripts for python3 compatibility? Patches are OK by me. Jeff was planning to run them once a release or so, but clearly that has not happened :-)
[Bug tree-optimization/115208] [15 Regression] Memory consumption get extremely high after r15-807-gfae5e6a4dfcf92
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115208 --- Comment #6 from Andrew Macleod --- Created attachment 58287 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=58287&action=edit proposed patch I'm testing this patch; does it resolve your problem? I forgot to free the gori_nmap object when the companion gori object was freed. Doh.
[Bug ipa/114985] [15 regression] internal compiler error: in discriminator_fail during stage2
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114985 --- Comment #21 from Andrew Macleod --- (In reply to Martin Jambor from comment #20)
> (I also cannot prevent myself from ranting a little that it would
> really help if all the ranger (helper) classes and functions were
> better documented.)

I am working on that this year. It's one of my major tasks: internals, APIs, reusable components... an entire encyclopedia. The first bits are coming shortly.
[Bug middle-end/114855] ICE: Segfault
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114855 --- Comment #8 from Andrew Macleod --- (In reply to Andrew Macleod from comment #7)
> Looks like the primary culprits now are:
>
> dominator optimization   :  666.73 (  7%)  0.77 (  2%)  671.76 (  7%)  170M (  4%)
> backwards jump threading : 7848.77 ( 85%) 21.04 ( 65%) 7920.05 ( 85%) 1332M ( 29%)
> TOTAL                    : 9250.99       32.58        9341.40        4619M

If I turn off threading, then VRP pops up with 400, so I took a look at VRP. The biggest problem is that this testcase has on the order of 400,000 basic blocks, with a pattern of a block of code followed by a lot of CFG diamonds using a number of different ssa-names from within the block over and over. When we are calculating/storing imports and exports for every block, then utilizing that info to try to find outgoing ranges that maybe we can use, it simply adds up. For VRP, we currently utilize different cache models depending on the number of blocks. I'm wondering if this might not be a good testcase for actually using a different VRP when the number of blocks is excessive. I wrote the fast VRP pass last year, which currently isn't being used. I'm going to experiment with it to see if, for CFGs above a threshold (100,000 BBs?), we can enable the lower-overhead fast VRP instead for all VRP passes. The threading issue probably needs to have some knobs added or tweaked for such very large CFGs. There would be a LOT of threading opportunities in the code I saw, so I can see why it would be so busy. I saw a lot of branches to branches using the same SSA_NAME.
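The threshold idea floated here can be sketched as a one-line policy function (the names and the 100,000-block figure are placeholders from the comment above; the real knob later surfaced as --param=vrp-block-limit):

```cpp
#include <cassert>
#include <string>

// Pick the VRP algorithm from the CFG size: past the block limit, use
// the lower-overhead fast VRP instead of the full ranger-based VRP.
std::string choose_vrp (int n_basic_blocks, int block_limit)
{
  return n_basic_blocks > block_limit ? "fast-vrp" : "full-vrp";
}
```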
[Bug middle-end/111009] [12/13 regression] -fno-strict-overflow erroneously elides null pointer checks and causes SIGSEGV on perf from linux-6.4.10
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111009 --- Comment #14 from Andrew Macleod --- Created attachment 58134 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=58134&action=edit adjusted patch (In reply to Richard Biener from comment #13)
> Andrew - this doesn't pick to gcc-13 because of the following but we should
> backport the fix somehow.  Can you please see to that in time for GCC 13.3
> (next week)?

Sure. It's just missing the definition of contains_zero_p () from value-range.h that must have been added in an earlier patch. I'm running the following tweaked patch through testing for GCC 13.
[Bug middle-end/114855] ICE: Segfault
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114855 --- Comment #7 from Andrew Macleod --- Looks like the primary culprits now are:

dominator optimization   :  666.73 (  7%)  0.77 (  2%)  671.76 (  7%)  170M (  4%)
backwards jump threading : 7848.77 ( 85%) 21.04 ( 65%) 7920.05 ( 85%) 1332M ( 29%)
TOTAL                    : 9250.99       32.58        9341.40        4619M
[Bug tree-optimization/114331] Missed optimization: indicate knownbits from dominating condition switch(trunc(a))
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114331 --- Comment #11 from Andrew Macleod --- (In reply to Jakub Jelinek from comment #10)
> I really don't know how GORI etc. works.
> But, if when the switch handling determines that _1 (the switch controlling
> expression) has [irange] [111, 111] MASK 0x0 VALUE 0x6f (does it actually?
> i.e. for a singleton range all the bits here are known and equal to the
> value), then when trying to derive a range for related num_9(D) which is int
> rather than _1's short and
> _1 = (short int) num_5(D);
> for the MASK/VALUE we should just use the same VALUE and or in 0x
> into MASK because we then don't know anything about the upper bits.
> Though, looking at the evrp detailed dump there is
> 2->3 _1 : [irange] short int [111, 111]
> 2->3 _2 : [irange] int [111, 111]
> 2->3 num_5(D) : [irange] int [-INF, -65537][-65425, -65425][111, 111][65536, +INF]
> and so no MASK/VALUE for any of those ranges.

Right. No mask needed for _1 and _2, as the range fully represents the known bits, and operator_cast::op1_range hasn't been taught to add a bitmask when calculating num_5 yet. It could have a mask and value added to it because it's implied by the result of the cast being short int [111, 111]. The routine Aldy provided should create that mask when asked, I think.

> Now, from comments it seems that irange_bitmask is only computed on demand
> to speed things up, unless it has been explicitly set.
> Now, say for _1 or _2 above, we don't have anything recorded but we can
> always compute it on demand from the value range. But when adding the

Which I think means we are both on the same page so far.

> num_5(D) range based on the related _1 range, the on-demand irange_bitmask
> is no longer as precise as it would be if we when deriving that [-INF,
> -65537][-65425, -65425][111, 111][65536, +INF] range

We aren't deriving it from that range, though.
We are solving for num_5 the equation [111,111] = (short int) num_5. The range you list is the best we can currently produce with *just* ranges. If we also add that bitmask along with the range (generated from the range on the LHS), we can adjust that to what you specify.

> from the [111, 111] range also derived from the in that case on-demand asked
> MASK 0x0 VALUE 0x6f to MASK 0x VALUE 0x6f.

Right, so the LHS 16 bits produce MASK 0x VALUE 0x6f, which means those bits should apply to the RHS as well. Since we're extending that to 32 bits, we'd have to make the upper ones unknown, so we should be able to create num_5 : [irange] int [-INF, -65537][-65425, -65425][111, 111][65536, +INF] MASK 0x VALUE 0x6f. And then in bb 3, when we see _8 = num_5(D) & 65534; instead of producing _8 : [irange] int [0, 65534] MASK 0xfffe VALUE 0x0, operator_bitwise_and::fold_range ought to be able to combine the known bits and come up with _8 : [irange] int [0, 65534] MASK 0x VALUE 0x6e, which, if Aldy's bitmask code is working right (:-), should turn into _8 : [irange] int [110, 110]. It should be fairly straightforward if operator_cast::op1_range creates the mask for the range it produces.
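The MASK/VALUE arithmetic being discussed can be checked concretely. Below is a simplified stand-in for irange_bitmask (not GCC's implementation) using the same convention as the dumps: a set bit in `mask` means that bit is unknown. Feeding it num_5's hoped-for bitmask (upper 16 bits unknown, lower 16 bits known to equal 111) and the constant 65534 yields a fully known value of 110, matching the expected [110, 110].

```cpp
#include <cassert>
#include <cstdint>

// Toy known-bits pair: mask bit 1 = unknown; known bits carry `value`.
struct bitmask
{
  uint32_t mask;    // 1 = unknown bit
  uint32_t value;   // payload of the known bits
};

// Combine known bits across a bitwise AND: a result bit is known one
// only if known one on both sides, and known zero if known zero on
// either side; everything else stays unknown.
bitmask bit_and (bitmask a, bitmask b)
{
  uint32_t known_one  = (a.value & ~a.mask) & (b.value & ~b.mask);
  uint32_t known_zero = (~a.value & ~a.mask) | (~b.value & ~b.mask);
  return bitmask { ~(known_one | known_zero), known_one };
}
```

With num_5 as {mask 0xffff0000, value 0x6f} and the constant 65534 as {mask 0, value 0xfffe}, every result bit ends up known: the AND's zero bits kill the unknown upper half, leaving value 0x6e (110).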
[Bug tree-optimization/114331] Missed optimization: indicate knownbits from dominating condition switch(trunc(a))
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114331 --- Comment #8 from Andrew Macleod --- (In reply to Jakub Jelinek from comment #7)
> (In reply to Aldy Hernandez from comment #6)
> > You may want to look at:
> >
> > // Return the bitmask inherent in the range.
> >
> > irange_bitmask
> > irange::get_bitmask_from_range () const
> > {
> > }
> >
> > IIRC, this code was originally authored by Jakub and was embedded in the
> > tree-ssanames.cc code setting nonzero bits every time we set a global range.
> > I just abstracted it and adapted it to work within our framework.
>
> I'm afraid on this testcase trying to derive bitmask from range is not going
> to work if the range is that
> [irange] int [-INF, -65537][-65425, -65425][111, 111][65536, +INF]
> We really want to record that because of the bb we know that the low 16 bits
> are equal to 111 and the upper 16 bits are unknown.
> Because the range could help there only if we had a union of 65536 subranges
> here.

Isn't that exactly what the known bits are supposed to do? op1_range already knows [111,111] is the lower 16 bits, but it doesn't add the bitmask. We should also be able to set the known mask for the lower 16 bits as well.

> tree-ssanames.cc before contained just a single mask, so could only track
> known zero bits vs. unknown bits rather than known zero bits vs. known one
> bits vs. unknown bits.
> While MASK 0x VALUE 0x6f can represent what supposedly the
> BIT_AND_EXPR optimization needs (whether we actually use it then to optimize
> it away is another question).

Then when we see _8 = num_5(D) & 65534; operator_bitwise_and::fold_range should pick up the known bits from the num_5 range, recognize it's the only relevant part of the mask, and get it right, shouldn't it? It can't do it without the mask, for sure. It's possible that operator_bitwise_and::fold_range needs some tweaking to recognize the known bits in op1 and map them to the mask in op2, but maybe we already do that.
I haven't looked at that code either, but it seems like something it should be doing. So instead of _8 : [irange] int [0, 65534] MASK 0xfffe VALUE 0x0, we'd get [110, 110] for _8.
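For reference, the range-to-bitmask direction mentioned above (the get_bitmask_from_range routine Aldy points at) can be approximated in a few lines: bits above the highest bit where min and max differ are shared by every value in [min, max], so they are known; everything below is unknown. This is a simplified unsigned 32-bit sketch, not the GCC source:

```cpp
#include <cassert>
#include <cstdint>

// Derive known bits from a range's endpoints.  `mask` gets 1 for each
// unknown bit; `value` carries the bits common to min and max.
void bitmask_from_range (uint32_t min, uint32_t max,
                         uint32_t &mask, uint32_t &value)
{
  uint32_t xorv = min ^ max;
  if (xorv)
    {
      // Smear the highest differing bit downward: once min and max
      // diverge at bit `top`, nothing below it is known.
      int top = 31 - __builtin_clz (xorv);
      xorv = (top == 31) ? ~0u : (1u << (top + 1)) - 1;
    }
  mask = xorv;
  value = min & ~xorv;
}
```

A singleton range such as [111, 111] yields a fully known bitmask, which is the easy case Jakub raises; the multi-subrange num_5 case is exactly where this range-derived mask is too coarse and an explicitly recorded mask wins.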
[Bug tree-optimization/114331] Missed optimization: indicate knownbits from dominating condition switch(trunc(a))
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114331 --- Comment #5 from Andrew Macleod --- (In reply to Jakub Jelinek from comment #4)
> Actually, looking at value-range.h, irange_bitmask doesn't have just the
> mask but also value, so I wonder why it isn't able to figure this out.
> I'd think m_mask should be ~0x and value 111.

_1 = (short int) num_5(D);
_2 = (int) _1;
switch (_1)

globally we know
_2 : [irange] int [-32768, 32767]
and coming into bb 3:
_2 : [irange] int [-32768, 32767]
2->3 _1 : [irange] short int [111, 111]
2->3 _2 : [irange] int [111, 111]
2->3 num_5(D) : [irange] int [-INF, -65537][-65425, -65425][111, 111][65536, +INF]

I guess what you are looking for is to add known bits to the range produced for num_5 on the outgoing edge. This would have to be done in operator_cast::op1_range, where we are reducing it to the known range of the switch. I do not believe it makes any attempt to set bitmasks based on casts like that currently. So, GORI working backwards has _1 = [irange] short int [111, 111], so it would be satisfying the expression: short int [111, 111] = (short int) num_5(D). When evaluating num_5 in operator_cast::op1_range, in the truncating_cast_p section, we could also see if there is a bitmask pattern for the LHS that could be applied to the range. I think that's basically what you are asking for. Simple enough for a constant, I suppose, for starters. Or maybe Aldy has a routine that picks bitmasks and values from ranges somewhere?
[Bug tree-optimization/114331] Missed optimization: indicate knownbits from dominating condition switch(trunc(a))
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114331 --- Comment #3 from Andrew Macleod --- (In reply to Jakub Jelinek from comment #2)
> I thought we don't, if the testcase starts with
> void dummy (void);
> short int foo (void);
> int src(void) {
> short int num = foo ();
> switch(num){
> instead then evrp1 optimizes away the BIT_AND_EXPR.
> Though, if the ranger just has a known zero mask in addition to normal
> ranges, maybe it can't handle it, because the range of num with (short)num
> == 111 is hard to express
> (it would be a union of 65536 ranges).
> Maybe bit-ccp should be able to handle that though, this case is exactly that
> upper 16 bits of the value are unknown and lower 16 bits of the value are all
> known,
> some of them 0, some of them 1.
> But maybe bit-ccp doesn't handle switches.

We certainly do fully handle switches. The problem is that the case is on a cast, and then we use num at its full size in the expression. So it's all the other bits:

num_5(D) : [irange] int [-INF, -65537][-65425, -65425][111, 111][65536, +INF]
_8 = num_5(D) & 65534;
goto ; [INV]
_8 : [irange] int [0, 65534] MASK 0xfffe VALUE 0x0

Given it starts with:

_1 = (short int) num_5(D);
_2 = (int) _1;
switch (_1) [INV], case 111: [INV], case 204: [INV], case 263: [INV], case 267: [INV]>

It's difficult to recognize that the known bits of 'num_5' match each switch case.
[Bug tree-optimization/114151] [14 Regression] weird and inefficient codegen and addressing modes since r14-9193
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114151 --- Comment #23 from Andrew Macleod --- Created attachment 57686 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57686&action=edit another patch (In reply to Richard Biener from comment #22)
> (In reply to Andrew Macleod from comment #21)
> >
> > And have that all work with general tree expressions.. That would solve
> > much of this for you?
>
> Yes, I wouldn't mind if range_on_{entry,exit} handle general tree
> expressions,
> there's enough APIs to be confused with already ;)
> >

I promoted range_on_exit and range_on_entry to be part of the API in this patch. This brings value_query in line with ranger's basic five-routine API. It also tweaks ranger's versions to handle tree expressions. It bootstraps and shows no regressions, with the caveat that I haven't actually tested the usage of range_on_entry and range_on_exit with arbitrary trees. As you can see, I didn't change much... so it should work OK.

> > > > > > > > > > Interestingly enough we somehow still need the
> > > > > > > > > > > hunk of Andrews patch to do it :/
> > > > > > >
> > That probably means there is another call somewhere in the chain with no
> > context. However, I will say that functionality is more important than it
> > seems. Should have been there from the start :-P.
>
> Possibly yes. It might be we fill rangers cache with VARYING and when
> we re-do the query as a dependent one but with context we don't recompute
> it? I also only patched up a single place in SCEV with the context so
> I possibly missed some others that end up with a range query, for example
> through niter analysis that might be triggered.

My guess is the latter. Without a context and with that change, ranger evaluates the definition with the context at the location of the def, then simply uses that value. If anything it depends on is changed later, the temporal cache should indicate it's out of date and trigger a new fold using current values.
[Bug tree-optimization/114151] [14 Regression] weird and inefficient codegen and addressing modes since r14-9193
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114151 --- Comment #21 from Andrew Macleod --- (In reply to Richard Biener from comment #19)
>
> While ranger has a range_on_exit API this doesn't work on GENERIC expressions
> as far as I can see but only SSA names but I guess that could be "fixed"
> given range_on_exit also looks at the last stmt and eventually defers to
> range_of_expr (or range_on_entry), but possibly get_tree_range needs
> variants for on_entry/on_exit (it doesn't seem to use it's 'stmt' context
> very consistently, notably not for SSA_NAMEs ...).

That would appear to be an oversight. That API has not been used very much for arbitrary GENERIC trees. I think the original reason support for tree expressions was added was a "try this" for some other PR. It was simple to do, so we left it in, but it never got any real traction. At least as far as I can recall :-)

Currently, I think most, if not all, uses of get_tree_range() are either !gimple_ssa_range_p() (commonly constants or unsupported types) or ssa_names on abnormal edges. For abnormal edges, we ought to be getting the global range directly these days instead of calling that routine. Then in get_tree_range(), we ought to be calling range_of_expr for SSA_NAMEs with the provided context. I'll poke at that too. The support for general tree expressions changed the original intent of the function, and it should be adjusted.

As for the on-exit/on-entry bits... we haven't had a need for entry/exit outside of ranger in the past. I had toyed with exporting those routines and making them a part of the official API for value-query, but hadn't run across the need as yet. Let me think about that for a minute. It can certainly be done. I guess we really only need an on-entry and an on-exit version of range_of_expr to do everything.
So if we end up with something like:

  range_of_expr (r, expr, stmt)
  range_of_expr_on_entry (r, expr, bb)
  range_of_expr_on_exit (r, expr, bb)

and have all of that work with general tree expressions... would that solve much of this for you?

> > Interestingly enough we somehow still need the
> > > hunk of Andrews patch to do it :/
>

That probably means there is another call somewhere in the chain with no context. However, I will say that functionality is more important than it seems. Should have been there from the start :-P.
[Bug middle-end/96564] [11/12/13/14 Regression] New maybe use of uninitialized variable warning since r11-959
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96564 --- Comment #15 from Andrew Macleod --- (In reply to Richard Biener from comment #13) > (In reply to Jeffrey A. Law from comment #12) > > So I think we could solve this with a bit of help from the alias oracle. We > > have the routine ptrs_compare_unequal, but points-to-null is going to get > > in the way. > > > > I think VRP and DOM have enough information to rule out NULL for both > > objects in question. So if we could query the points-to information, > > ignoring NULL then we could likely solve this particular bug. > > > > Essentially VRP or DOM would prove NULL isn't in the set of possible values > > at the comparison point. Then we query the alias information ignoring NULL. > > Voila we compute a static result for the comparison of the two pointers and > > the problematical block becomes unreachable and the bogus warning goes away. > > > > Richi, any thoughts in viability of such an API? > > We now treat pt.null conservatively and track non-null-ness derived from > range-info in it. That means when VRP/DOM can prove a pointer is always > not NULL they can do set_ptr_nonnull (p) on it. > > This means the > > /* ??? We'd like to handle ptr1 != NULL and ptr1 != ptr2 > but those require pt.null to be conservatively correct. */ > > is no longer true and we could finally implement it, like with > > diff --git a/gcc/tree-ssa-alias.cc b/gcc/tree-ssa-alias.cc > index e7c1c1aa624..5b6d9e0aa6a 100644 > --- a/gcc/tree-ssa-alias.cc > +++ b/gcc/tree-ssa-alias.cc > @@ -479,9 +479,25 @@ ptrs_compare_unequal (tree ptr1, tree ptr2) > } >return !pt_solution_includes (&pi->pt, obj1); > } > - > - /* ??? We'd like to handle ptr1 != NULL and ptr1 != ptr2 > - but those require pt.null to be conservatively correct. 
*/
> + else if (TREE_CODE (ptr1) == SSA_NAME)
> +{
> + struct ptr_info_def *pi1 = SSA_NAME_PTR_INFO (ptr1);
> + if (!pi1
> + || pi1->pt.vars_contains_restrict
> + || pi1->pt.vars_contains_interposable)
> + return false;
> + if (integer_zerop (ptr2) && !pi1->pt.null)
> + return true;
> + if (TREE_CODE (ptr2) == SSA_NAME)
> + {
> + struct ptr_info_def *pi2 = SSA_NAME_PTR_INFO (ptr2);
> + if (!pi2
> + || pi2->pt.vars_contains_restrict
> + || pi2->pt.vars_contains_interposable)
> + return false;
> + if (!pi1->pt.null || !pi2->pt.null)
> + return !pt_solutions_intersect (&pi1->pt, &pi2->pt);
> + }
> +}
> >return false;
> }
>
> but the testcase shows the non-null-ness is only conditional which means
> we'd have to use a range query above which necessarily falls back to
> the global ranges given we don't have any context available here. The
> old EVRP adjusted global ranges during the walk but this is no longer done.

You mean it lied? Because x_1 is not NULL until after _8 = *x_1(D); executes. It can still be NULL on that stmt, can it not? Did it reset the global value afterwards?

Contextually, ranger knows both are non-null at EVRP time:
a.0_27 : [irange] int[0:D.] * [1, +INF]
2->3 (T) x_1(D) : [irange] int * [1, +INF]
2->3 (T) a.0_27 : [irange] int[0:D.] * [1, +INF]
2->4 (F) x_1(D) : [irange] int * [1, +INF]
2->4 (F) a.0_27 : [irange] int[0:D.] * [1, +INF]

So we know x_1 is non-NULL after the dereference for the rest of the block (and function). It also sets a.0_27 globally to be [1, +INF].

> Note it's enough that one pointer is nonnull, so for your idea the
> API could be extended with a bool one_ptr_nonnull parameter.

ranger currently sets a.0 globally to be non-null in EVRP.
[Bug tree-optimization/114151] [14 Regression] weird and inefficient codegen and addressing modes since r14-9193
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114151 --- Comment #15 from Andrew Macleod --- (In reply to Richard Biener from comment #14)
> (In reply to Andrew Macleod from comment #13)
> >
> > We would have tripped over this earlier if SCEV or one of those other places
> > using range_of_expr without context had instead invoked range_of_stmt. That
> > would have been perfectly reasonable, and would have resulted in these same
> > issues. We never tripped over it because range_of_stmt is not used much
> > outside of ranger. That is the primary reason I wanted to track this down.
> > There were alternative paths to the same end result that would have
> > triggered these issues.
>
> It sounds like this part is a bugfix?

Technically yes. When I commit it to trunk, I would split this into 3 patches:

1 - The change to get_global_range to not process PHIs. This is only a bug that shows up if range_of_stmt is called from within SCEV.

2 - Make cache propagation re-entrant. This is also only currently needed if SCEV invokes range_of_stmt.

3 - Call range_of_stmt from range_of_expr when context-less. This patch causes SCEV to call range_of_stmt indirectly through range_of_expr.

I don't think the buglet would show up in the current release, simply because SCEV never calls ranger with a context, and thus range_of_stmt never gets invoked by it. It might be worthwhile to put the patchset (or at least the first 2?) in the first point release after they've been committed to trunk, in case something else which trips over it wants to be backported.
[Bug tree-optimization/114151] [14 Regression] weird and inefficient codegen and addressing modes since r14-9193
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114151 --- Comment #13 from Andrew Macleod --- Created attachment 57638 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57638&action=edit patch

OK, there were 2 issues with simply invoking range_of_stmt, which this new patch resolves. If we aren't looking to fix this in GCC 14 right now anyway, this is the way to go.

1) The cache has always tried to provide a global range by pre-folding a stmt for an estimate using global values. This is a bad idea for PHIs when SCEV is invoked AND SCEV is calling ranger. This changes it to not pre-evaluate PHIs, which also saves time when functions have a lot of edges. It's mostly pointless for PHIs anyway, since we're about to do a real evaluation.

2) The cache's entry range propagator was not re-entrant. We didn't previously need this, but with SCEV (and possibly other places) invoking range_of_expr without context and having range_of_stmt being called, we can occasionally get layered calls for cache filling (of different ssa-names).

With those 2 changes, we can now safely invoke range_of_stmt from a contextless range_of_expr call. We would have tripped over this earlier if SCEV or one of those other places using range_of_expr without context had instead invoked range_of_stmt. That would have been perfectly reasonable, and would have resulted in these same issues. We never tripped over it because range_of_stmt is not used much outside of ranger. That is the primary reason I wanted to track this down. There were alternative paths to the same end result that would have triggered these issues.

Give this patch a try. It also bootstraps with no regressions. I will queue it up for stage 1 instead, assuming all is good.
[Bug tree-optimization/110199] [12/13/14 Regression] Missing VRP transformation with MIN_EXPR and known relation
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110199 --- Comment #5 from Andrew Macleod --- (In reply to Jakub Jelinek from comment #4)
> Just looking at the generated code of #c0 with -O2 on x86_64, this regressed
> with
> r13-3596-ge7310e24b1c0ca67b1bb507c1330b2bf39e59e32
> Andrew, are you going to address this for GCC 14, or defer to GCC 15?

I'd prefer to defer it, I think, although we can run it through testing if anyone really wants it. Maybe in GCC 15 someone can add relations in general to simplifications.
[Bug tree-optimization/114151] [14 Regression] weird and inefficient codegen and addressing modes since r14-9193
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114151 --- Comment #12 from Andrew Macleod --- (In reply to Richard Biener from comment #11)
> (In reply to Richard Biener from comment #10)
> > (In reply to Andrew Macleod from comment #9)
> > > Created attachment 57620 [details]
> > > proposed patch
> > >
> > > Does this solve your problem if there is an active ranger? it bootstraps
> > > with no regressions
> >
> > I'll check what it does.
>
> So the important part is that it got the fact that _12 is positive. As
> analyzed in earlier comments I think that's all we can do, we don't know
> anything about the other variable involved and thus can't avoid the
> unsigned punning during SCEV analysis.

Yeah, I wouldn't want to invoke any dynamic lookup changes at this point; that would be too hard to predict or contain. I will continue poking at what is triggering the loop issues, because I think a good longer-term solution is to have range_of_expr with no context invoke range_of_stmt if the DEF is in the IL and has not been processed.

> I think it's a good change, let's keep it queued for stage1 at this point
> unless we really know a case it helps to avoid a regression with
> r14-9193-ga0b1798042d033
>
> For testing, what's the "easiest" pass/thing to do to recompute global
> ranges now? In the past I'd schedule EVRP but is there now a ranger
> API to do this? Just to see if full global range compute before IVOPTs
> would help.

All VRP passes are the same now, so just schedule EVRP. In theory, you could schedule the fast VRP pass I added, but it's not heavily tested... you could try it, though. It doesn't do any back edges or switches (IIRC), but it does basic calculations in DOM order and exports/updates globals. NEXT_PASS (pass_fast_vrp)
[Bug tree-optimization/114151] [14 Regression] weird and inefficient codegen and addressing modes since r14-9193
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114151 --- Comment #9 from Andrew Macleod --- Created attachment 57620 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57620&action=edit proposed patch

Does this solve your problem if there is an active ranger? It bootstraps with no regressions.

It's pretty minimal; basically it invokes the cache's version of range_of_expr if there is no context. I tweaked it such that if there is no context and the def has not been calculated yet, it calls range_of_def and combines the result with any SSA_NAME_RANGE_INFO that may have pre-existed, all without invoking any new lookups. This seems relatively harmless and does not spawn new dynamic lookups. As long as it resolves your issue...

If it still does not work, and we require the def to actually be evaluated, I will look into that; we probably should do that anyway. There appears to be a cycle when this is invoked from the loop analysis, probably because folding of PHIs uses loop info... and back and forth we go. I'd probably need to add a flag to the ranger instantiation to tell it to avoid using loop info. Are we looking to fix this in this release?
[Bug tree-optimization/113632] Range info for a^CSTP2-1 could be improved in some cases
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113632 Andrew Macleod changed: What|Removed |Added CC||amacleod at redhat dot com --- Comment #1 from Andrew Macleod --- (In reply to Andrew Pinski from comment #0)
> Take:
> ```
> void dummy();
> _Bool f(unsigned long a)
> {
> _Bool cmp = a > 8192;
> if (cmp) goto then; else goto e;
> then:
> unsigned long t = __builtin_clzl(a); // [0,50]
> t^=63; // [13,63]
> return t >= 13;
> e:
> dummy();
> return 0;
> }
> ```
>
> Currently after the t^=63; we get:
> ```
> # RANGE [irange] int [1, 63] MASK 0x3f VALUE 0x0
> _7 = _1 ^ 63;
> ```
>
> But this could/should be improved to [13,63].
>
> If we change to using minus instead:
> ```
> t = 63 - t;
> ```
>
> We get the better range and the comparison (t >= 13) is optimized away.
> ```
> Folding statement: t_10 = 63 - t_9;
> Global Exported: t_10 = [irange] long unsigned int [13, 63] MASK 0x3f VALUE
> 0x0
> Not folded
> ```
>
> Yes, this shows up in real code; see the LLVM issue for more information on
> that.

I think the current implementation of operator_bitwise_xor::wi_fold() in range-op.cc was simply ported from the original version we used in the old VRP code, so it is neither multi-range aware nor otherwise enhanced. If you put a break point there, you'll see it's getting:

(gdb) p lh_lb.dump() [0], precision = 32 $1 = void
(gdb) p lh_ub.dump() [0x32], precision = 32 $2 = void
(gdb) p rh_ub.dump() [0x3f], precision = 32 $3 = void
(gdb) p rh_lb.dump() [0x3f], precision = 32 $4 = void

One could conceivably do something much better than the general masking stuff that goes on if rh_lb == rh_ub. I suspect we could probably do a better job in general, but have never looked at it. It also looks like we make some minor attempts with signed values in wi_optimize_signed_bitwise_op(), but again, I do not think anyone has tried to make this code do anything new yet.
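For the rh_lb == rh_ub case, the better result is easy to justify: XOR with an all-ones constant over the operand's possible bits flips those bits, which is an order-reversing bijection (x ^ 63 == 63 - x for x in [0, 63]), so the bounds simply map end-for-end. A small exhaustive sketch (not range-op.cc code) confirms the [13, 63] result:

```cpp
// x ^ 63 for x in [0, 63] just flips the low six bits, i.e. computes
// 63 - x.  These helpers exhaustively compute the resulting range of
// [lo, hi] ^ 63; they are an illustration, not GCC's wi_fold.
constexpr unsigned xor63_min (unsigned lo, unsigned hi)
{
  unsigned m = ~0u;
  for (unsigned x = lo; x <= hi; x++)
    m = (x ^ 63) < m ? (x ^ 63) : m;
  return m;
}

constexpr unsigned xor63_max (unsigned lo, unsigned hi)
{
  unsigned m = 0;
  for (unsigned x = lo; x <= hi; x++)
    m = (x ^ 63) > m ? (x ^ 63) : m;
  return m;
}

// clz result t in [0, 50]: t ^ 63 is exactly [13, 63], not [1, 63].
static_assert (xor63_min (0, 50) == 13, "");
static_assert (xor63_max (0, 50) == 63, "");
```

So when rh_lb == rh_ub and the constant covers all possibly-set bits of the lhs range, wi_fold could return the reflected bounds instead of the masked approximation.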
[Bug tree-optimization/114151] [14 Regression] weird and inefficient codegen and addressing modes since r14-9193
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114151 --- Comment #7 from Andrew Macleod --- (In reply to Richard Biener from comment #6)
> (In reply to Andrew Macleod from comment #5)
> > (In reply to rguent...@suse.de from comment #4)
> > >
> > > What was definitely missing is consideration of POLY_INT_CSTs (and
> > > variable polys, as I think there's no range info for those).
> > >
> > Ranger doesn't do anything with POLY_INTs, mostly because I didn't
> > understand them.
> >
> > > We do eventually want to improve how ranger behaves here. I'm not sure
> > > why when we do not provide a context 'stmt' it can't see to compute
> > > a range valid at the SSA names point of definition? (so basically
> > > compute the global range)
> >
> > The call looks like it doesn't provide the stmt. Without the stmt, all
> > ranger will ever provide is global ranges.
> >
> > I think you are asking why, If there is no global range, it doesn't try to
> > compute one from the ssa_name_def_stmt? Ranger does when it is active.
>
> I tried with an active ranger but that doesn't make a difference. Basically
> I added enable_ranger () / disable_ranger () around the pass and thought
> that would "activate" it. But looking at range_for_expr I don't see how
> that would make a difference without a provided stmt.
>

It wouldn't. Why isn't a context stmt being provided? range_of_expr with no context stmt makes no attempt to calculate anything. This is because one can get into a lot of trouble, as it doesn't know whether the expression you are calculating is even in the IL or just some detached tree expression.

If you have an SSA_NAME and want to actually calculate the value, you can use range_of_stmt (range, SSA_NAME_DEF_STMT (name)) instead of range_of_expr (). If you pass in a stmt as context, and the SSA_NAME you are asking about is the LHS of the stmt, then range_of_expr will call range_of_stmt itself... but again, it needs a context stmt in order to know it's safe to do so.
In general, range_of_expr with no context will not calculate anything... When a stmt is provided for location context, then it's free to go and do whatever calculations are required.
[Bug tree-optimization/114151] [14 Regression] weird and inefficient codegen and addressing modes since g:a0b1798042d033fd2cc2c806afbb77875dd2909b
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114151 --- Comment #5 from Andrew Macleod --- (In reply to rguent...@suse.de from comment #4)
>
> What was definitely missing is consideration of POLY_INT_CSTs (and
> variable polys, as I think there's no range info for those).
>

Ranger doesn't do anything with POLY_INTs, mostly because I didn't understand them.

> We do eventually want to improve how ranger behaves here. I'm not sure
> why when we do not provide a context 'stmt' it can't see to compute
> a range valid at the SSA names point of definition? (so basically
> compute the global range)

The call looks like it doesn't provide the stmt. Without the stmt, all ranger will ever provide is global ranges.

I think you are asking why, if there is no global range, it doesn't try to compute one from the SSA_NAME_DEF_STMT? Ranger does when it is active. Otherwise it simply picks up the global information from SSA_NAME_RANGE_INFO(). Of course, if it's a poly int, we don't actually support those... so all you will ever see is VARYING. Again, because I don't understand them.

> Maybe there's another API to do that. But I thought it was convenient
> to use range_of_expr as that also handles GENERIC expression trees
> to some extent and those are common within SCEV.

A range can always be calculated by simply calling fold_range from gimple-range-fold.h:

// Fold stmt S into range R using range query Q.
bool fold_range (vrange &r, gimple *s, range_query *q = NULL);

If a range_query is not provided, it'll simply use the current one. If you want to ensure it uses only global ranges, you call it with fold_range (r, SSA_NAME_DEF_STMT (name), get_global_range_query ());
[Bug tree-optimization/114086] Boolean switches could have a lot better codegen, possibly utilizing bit-vectors
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114086 --- Comment #9 from Andrew Macleod --- (In reply to Jakub Jelinek from comment #8)
> Unfortunately doing the ((682 >> x) & 1) to x & 1 optimization in match.pd
> isn't possible, we can only use global ranges there and we need path
> specific range here.
> Can it be done in VRP pass? Though, I'm afraid I'm quite lost where it
> actually has
> the statement optimizations (rather than mere computing of ranges),
> Aldy/Andrew, any hints? I mean like what old tree-vrp.c was doing in
> simplify_stmt_using_ranges.

I don't think much has changed there... We still call into all the code in vr-values.cc to do simplifications. I think Aldy changed it all to be contained in class 'simplify_using_ranges', but those routines are all still in vr-values.cc. tree-vrp calls into the top-level simplify() routine:

bool fold_stmt (gimple_stmt_iterator *gsi) override { bool ret = m_simplifier.simplify (gsi); if (!ret) ret = ::fold_stmt (gsi, follow_single_use_edges); return ret; }

If that fails, then ranger's fold_stmt() is invoked. That is merely a contextual wrapper around a call to gimple-fold::fold_stmt to see if normal folding can find anything. Under the covers I believe that invokes match.pd, which, if it was using the current range_query, would get contextual info.

> Guess we could duplicate that in match.pd for the case which can use global
> range or
> doesn't need any range at all.
> I mean
> unsigned int
> foo (int x)
> {
> return (0xU >> x) & 1;
> }
>
> unsigned int
> bar (int x)
> {
> return (0xU >> x) & 1;
> }
>
> unsigned int
> baz (int x)
> {
> if (x >= 22) __builtin_unreachable ();
> return (0x5aU >> x) & 1;
> }
> can be optimized even with global ranges (or the first one with no ranges).
> foo and baz equivalent is x & 1, while bar is (~x) & 1 or (x & 1) ^ 1, dunno
> what is more canonical.
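A quick check of the transformation under discussion shows why the path-specific range matters (682 is the constant from Jakub's comment; the helper is just an illustration):

```cpp
// 682 is 0b1010101010: bit x of the constant equals the parity of x for
// every x in [0, 10], so (682 >> x) & 1 folds to x & 1 -- but only while
// x is bounded, which is exactly the path-specific information match.pd
// cannot see with global ranges alone.
constexpr bool shift_matches_parity (unsigned c, unsigned lo, unsigned hi)
{
  for (unsigned x = lo; x <= hi; x++)
    if (((c >> x) & 1) != (x & 1))
      return false;
  return true;
}

static_assert (shift_matches_parity (682, 0, 10), "holds on the bounded path");
static_assert (!shift_matches_parity (682, 0, 11), "breaks once x can be 11");
```

So a global-range version of the rule is sound only for constants whose bit pattern keeps matching out to the type width (or out to a __builtin_unreachable-derived bound, as in baz).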
[Bug middle-end/113907] [12/13/14 regression] ICU miscompiled since on x86 since r14-5109-ga291237b628f41
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113907 --- Comment #47 from Andrew Macleod --- (In reply to Andrew Macleod from comment #46) > (In reply to Jan Hubicka from comment #43) > > > // See discussion here: > > > // https://gcc.gnu.org/pipermail/gcc-patches/2021-June/571709.html > > Discussion says: > > > > "Once legacy evrp is removed, this won't be an issue, as ranges in the IL > > will tell the truth. However, this will mean that we will no longer > > remove the first __builtin_unreachable combo. But ISTM, that would be > > correct behavior ??." > > > > So perhaps, we could remove that special case for default def and phi? > > It is an odd thing and we clearly lose info here. > > > > legacy VRP has been removed now. So in theory we are free to do as we > want.. but I don't remember the specific details. > > So do you just want to always use get_range_global() ? and not do the check? > > I can try changing it to just get_global and see what happens. FWIW, diff --git a/gcc/value-query.cc b/gcc/value-query.cc index 040c843c566..0ab10bc5a46 100644 --- a/gcc/value-query.cc +++ b/gcc/value-query.cc @@ -353,16 +353,9 @@ get_range_global (vrange &r, tree name, struct function *fun = cfun) void gimple_range_global (vrange &r, tree name, struct function *fun) { - tree type = TREE_TYPE (name); gcc_checking_assert (TREE_CODE (name) == SSA_NAME); - if (SSA_NAME_IS_DEFAULT_DEF (name) || (fun && fun->after_inlining) - || is_a (SSA_NAME_DEF_STMT (name))) -{ - get_range_global (r, name, fun); - return; -} - r.set_varying (type); + get_range_global (r, name, fun); } bootstraps and runs the testsuites clean on x86-64 now...
[Bug middle-end/113907] [12/13/14 regression] ICU miscompiled since on x86 since r14-5109-ga291237b628f41
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113907 --- Comment #46 from Andrew Macleod --- (In reply to Jan Hubicka from comment #43)
> > // See discussion here:
> > // https://gcc.gnu.org/pipermail/gcc-patches/2021-June/571709.html
> Discussion says:
>
> "Once legacy evrp is removed, this won't be an issue, as ranges in the IL
> will tell the truth. However, this will mean that we will no longer
> remove the first __builtin_unreachable combo. But ISTM, that would be
> correct behavior ??."
>
> So perhaps, we could remove that special case for default def and phi?
> It is an odd thing and we clearly lose info here.
>

Legacy VRP has been removed now, so in theory we are free to do as we want, but I don't remember the specific details. So do you just want to always use get_range_global() and not do the check? I can try changing it to just get_global and see what happens.
[Bug tree-optimization/113879] missed optimization - not exploiting known range of integers
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113879 --- Comment #2 from Andrew Macleod --- Not so much a cycle issue as a backward propagation issue.

goto ; [INV]
_1 = (long unsigned int) j_10;
<..>
if (j_10 >= -1) goto ; [INV] else goto ; [INV]
__builtin_trap ();
j_18 = j_10 + 1;
# j_10 = PHI
_9 = i_12(D) + 3;
if (_9 >= j_10) goto ; [INV] else goto ; [INV]
return;

'i' is only ever referenced twice: the very first thing on the edge from 2->6 as the initial value for j_10, and then in the calculation of _9, which is then used in the branch against j_10. We have no way of knowing that the initial value of i_12 can't be INT_MAX. It's only later, during the calculation _9 = i_12 + 3, that we can infer that i_12 must be at most INT_MAX - 3. We certainly know after the branch that

6->3 (T) i_12(D) : [irange] int [-INF, 2147483644]

Going back to examine the initial value use on the edge 2->6 is not something that we would normally expect to have to do. It's similar to the __builtin_unreachable() problem in that we can infer the global range of i_12 based on that addition statement, but detecting that this is the case in general circumstances is not trivial, as we have to go look at all earlier uses to make sure they are post-dominated by the statement.

There is also the additional issue that I do not believe we currently register an inferred range on the statement _9 = i_12(D) + 3; for i_12. Adding an inferred range for every arithmetic statement would come with a cost... not sure exactly what it would be, but we weren't expecting inferred ranges from the majority of statements. We don't normally need that because, as you can see, we get the ranges right after the branches anyway. I could do a dry run and see what the time differential is. We could consider adding those inferred ranges at -O3.
We could also consider an enhancement that works like the __builtin_unreachable() removal pass but also looks at all inferred ranges in the function to see whether they apply in a global context, adjusting the global value if appropriate. They would in a case like this.
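The inference described above can be stated as a one-line computation; this sketch assumes 32-bit int (which matches the 2147483644 bound in the dump):

```cpp
#include <climits>

// If _9 = i_12 + 3 executes, signed overflow would be undefined behavior,
// so at (and after) that statement i_12 can be inferred to be at most
// INT_MAX - 3.  Illustrative helper, not ranger's inferred-range code.
constexpr long long inferred_max (long long addend)
{
  return (long long) INT_MAX - addend;   // meaningful for addend > 0
}

// Matches the range ranger shows on the 6->3 edge: [-INF, 2147483644].
static_assert (inferred_max (3) == 2147483644LL, "assumes 32-bit int");
```

Registering this as an inferred range at the addition would let the global-adjustment pass described above tighten i_12 everywhere the statement post-dominates.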
[Bug middle-end/102580] Failure to optimize signed division to unsigned division when dividend can't be negative
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102580 --- Comment #6 from Andrew Macleod --- (In reply to Jakub Jelinek from comment #5)
> To be precise, expand_expr_divmod uses get_range_pos_neg for that during
> expansion (unless we'd e.g. somehow noted it in some very late pass before
> expansion that the division ought to be expanded both ways and cost
> compared), and get_range_pos_neg uses the global range of SSA_NAME only. In
> order to optimize #c0 we'd need to query range with the use stmt and
> enabling ranger in the pass (that is possible in some pass before expansion,
> but doing it during expansion (which would be useful to other spots too, say
> .*_OVERFLOW expansion) would need to deal with some basic blocks already
> converted into RTL and others still at GIMPLE).

I'm working on a logging mechanism for ranges for GCC 15. It's possible that a side effect of this work could make some selective contextual ranges available from the global table after .. in which case we could get things like this as if ranger was running.

My other question: so is the issue that unsigned divides are cheaper than signed divides? The global range of _2 is set:

_2 : [irange] int [0, 715827882]

Given the statements in .optimized are:

[local count: 1073741824]: _2 = x_1(D) / 3; return _2;

could we not actually conclude that the divide can be unsigned, based on the result being positive and the divisor being positive? We have "simple" versions of fold_range() which would calculate _2 on the statement from the global value of x_1, but there aren't any simple versions of the operand calculators. It would be fairly trivial to provide one such that, given the global value for _2, you could ask op1_range (stmt) and get back a value of [0, +INF] for x_1 at that statement. If that would help... They are going to be added for GCC 15 anyway...
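The underlying equivalence being relied on can be sanity-checked directly; a minimal sketch (assuming 32-bit int) of why non-negativity of the dividend licenses the unsigned expansion:

```cpp
// When the dividend is known non-negative, truncating signed division by
// a positive constant agrees with unsigned division, so the cheaper
// unsigned sequence is safe to emit.
constexpr bool div_agrees (int x)
{
  return x / 3 == (int) ((unsigned) x / 3u);
}

static_assert (div_agrees (0) && div_agrees (7) && div_agrees (2147483645),
	       "signed and unsigned division agree for x >= 0");
// A negative dividend is exactly where the two expansions differ:
static_assert (!div_agrees (-7), "trunc toward zero vs. wrapped unsigned");
```

This is why the [0, 715827882] range on _2 (or a [0, +INF] op1_range for x_1) is enough for expand_expr_divmod to pick the unsigned form.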
[Bug tree-optimization/113735] ICE: in operator[], at vec.h:910 with _BitInt() at -O and above
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113735 --- Comment #3 from Andrew Macleod --- Seems reasonable to me; adding BBs should be fine throughout. Just don't re-order them :-)
[Bug tree-optimization/113716] Missed optimization: (2/a)*b*(!b) => 0
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113716 --- Comment #4 from Andrew Macleod --- (In reply to Jakub Jelinek from comment #3)
> Maybe ranger could figure out that at least one of the multiplication
> operands is zero in this case, because the second one is non-zero only if
> the first one is zero?

_2 = b_6(D) == 0;
_3 = (int) _2;
_4 = _3 * b_6(D);

Well, it's kind of, sort of, predicate-based. I mean, the information is calculable if one thought it was worth it. i.e. at the multiply, we know _3 is [0, 1] and we know b_6 is VARYING. Because there are only 2 ranges in _3, you could calculate using GORI "if _3 is [0,0], what is b_6?" (~[0, 0]) and "if _3 is [1,1], what's b_6?" ([0,0]). Do the 2 multiplies and union the results.. you'd get [0,0].

We might need a slight API tweak to be able to ask that. We have fold_stmt routines for the folding, but I don't think I have added any "what if" hooks to GORI yet.
[Bug tree-optimization/113475] [14 Regression] phi_analyzer::m_tab contents leak
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113475 --- Comment #4 from Andrew Macleod --- yoinks. Not sure how I missed that. thanks.
[Bug tree-optimization/113440] Missed optimization for redundancy computation elimination because of missed judgment for unsigned overflow
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113440 --- Comment #2 from Andrew Macleod --- (In reply to Richard Biener from comment #1)
> Yeah, looks like
>
>   if (a+a < a)
>
> for unsigned doesn't register the appropriate range on the false edge.

_1 = a_5(D) * 2;
if (_1 < a_5(D))
  goto ; [INV]
else
  goto ; [INV]

:
Relational : (_1 >= a_5(D))
if (a_5(D) == 0)
  goto ; [INV]
else
  goto ; [INV]

_1 [irange] unsigned int [0, 0][2, +INF] MASK 0xfffe VALUE 0x0
a_5(D) [irange] unsigned int [1, +INF]

:
_3 = _1 / a_5(D);
n = _3;

I think the ranges as such are fine, but the missing bit is that we KNOW _1 >= a_5, yet we do not use that information.

1) Ranger doesn't fully leverage relations everywhere yet; in particular, multiply makes no attempt to use them, or the recomputation of _1 would be [2, +INF] in bb5 (since a_5 is now [1, +INF]).

2) Ranger could then utilize that in the divide to set the range of _3 to [1, +INF] (which we don't do currently).

But neither of those is the real issue.

3) It requires a simplification type operation to look at _1 and understand that the divide is a_5 * 2 / a_5, with the known relation that a_5 * 2 >= a_5. This would be fairly trivial with the relation oracle if we can wire it into one of the simplification routines. For example, if I look in simplify_using_ranges::simplify_div_or_mod_using_ranges when we process the _3 = _1 / a_5(D) statement:

p query->query_relation (stmt, op0, op1)
VREL_GE

so we know... but we do not look at _1 to see that it is defined as op1 * 2. I don't know if we can push that sort of thing into the more general match.pd code... it seems like it would be useful there too. It is something on the list to eventually look into.
[Bug other/110205] Some new warnings from clang for the range code
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110205 Andrew Macleod changed: What|Removed |Added Resolution|--- |FIXED Status|UNCONFIRMED |RESOLVED --- Comment #6 from Andrew Macleod --- fixed.
[Bug tree-optimization/110251] [13/14 Regression] Hang at -O3 on x86_64-linux-gnu
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110251 Andrew Macleod changed: What|Removed |Added Resolution|FIXED |--- Status|RESOLVED|REOPENED --- Comment #14 from Andrew Macleod --- (In reply to Jakub Jelinek from comment #13)
> r14-2097-g4dfeb1cd8dfca234186216d891ec8f46235c3a14
> was a trunk commit, was that backported to 13 branch too (or was it fixed
> there some other way)?
> In any case, guess we should include the testcase into the testsuite to make
> sure it doesn't reappear.

hmm. No, I missed that this was gcc 13 too.
[Bug tree-optimization/110251] [13/14 Regression] Hang at -O3 on x86_64-linux-gnu
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110251 Andrew Macleod changed: What|Removed |Added Status|NEW |RESOLVED Resolution|--- |FIXED --- Comment #12 from Andrew Macleod --- fixed via commit 4dfeb1cd8dfca234186216d891ec8f46235c3a14
[Bug tree-optimization/110251] [13/14 Regression] Hang at -O3 on x86_64-linux-gnu
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110251 --- Comment #11 from Andrew Macleod --- (In reply to Andrew Pinski from comment #10) > (In reply to Anonymous from comment #9) > > (In reply to Andrew Pinski from comment #1) > > > dom3 : > > > ``` > > > > Could you please explain on how you to record this trace? Is there any > > specific compilation option to enable this information? Thanks. > > I used gdb to figure it out. Just stopped it a few times during the > compiling and saw the backtrace was similar/stuck. this is resolved is it not?
[Bug tree-optimization/113301] [12/13/14 Regression] Missed optimization: (1/(x+1))/2 => 0 since gcc-12
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113301 --- Comment #6 from Andrew Macleod --- (In reply to Andrew Macleod from comment #5)
> (In reply to Jakub Jelinek from comment #4)
> > Even then, I wonder why ranger doesn't figure this out.
> > (x+1u) <= 2 ? x : 0
> > must have a range [-1, 1] and [-1, 1] / [2, 2] range should be [0, 0], no?
>
> It's because there is no branch, which is what drives ranger. At this point,
> we aren't quite smart enough to completely evaluate the 2 operands of a
> conditional as if they were actual branches. i.e.
>
> _1 = x_4(D) + 1;
> _10 = (unsigned int) x_4(D);
> _6 = _10 + 2;
> _7 = _6 <= 2;
> _2 = _7 ? _1 : 0;
>
> if that were:
> if (_6 <= 2)
>   _2 = _1
> we'd recalculate _1 with the condition being (_6 <= 2) and we come up with
> [-1, 1] for _1 instead of varying.
>
> I'll have to look at what's involved in enhancing the fold code to invoke
> GORI to reevaluate _1 if _7 is [1,1]. In theory it's not too difficult... :-)

Ah, it's more complicated than that. We normally do this evaluation, but the cond_expr is using _1.. if you trace back from _6 in the condition, _1 is not in the dependency chain anywhere, so GORI cannot compute anything for it. It can compute that x_4 is [-2, 0], but it doesn't see any connection between _6 in the condition and _1.

The remaining question is whether this can be cheaply identified as a recomputation.. in which case we could recompute _1 using the [-2, 0] for x_4 and come up with [-1, 1].

I'll have a look at whether we can easily invoke the recomputation code the edge evaluations use or not.
[Bug tree-optimization/113301] [12/13/14 Regression] Missed optimization: (1/(x+1))/2 => 0 since gcc-12
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113301 Andrew Macleod changed: What|Removed |Added CC||amacleod at redhat dot com --- Comment #5 from Andrew Macleod --- (In reply to Jakub Jelinek from comment #4)
> Even then, I wonder why ranger doesn't figure this out.
> (x+1u) <= 2 ? x : 0
> must have a range [-1, 1] and [-1, 1] / [2, 2] range should be [0, 0], no?

It's because there is no branch, which is what drives ranger. At this point, we aren't quite smart enough to completely evaluate the 2 operands of a conditional as if they were actual branches. i.e.

_1 = x_4(D) + 1;
_10 = (unsigned int) x_4(D);
_6 = _10 + 2;
_7 = _6 <= 2;
_2 = _7 ? _1 : 0;

if that were:

if (_6 <= 2)
  _2 = _1

we'd recalculate _1 with the condition being (_6 <= 2) and we come up with [-1, 1] for _1 instead of varying.

I'll have to look at what's involved in enhancing the fold code to invoke GORI to reevaluate _1 if _7 is [1,1]. In theory it's not too difficult... :-)
[Bug tree-optimization/110199] [12/13/14 Regression] Missing VRP transformation with MIN_EXPR and known relation
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110199 --- Comment #3 from Andrew Macleod --- (In reply to Andrew Pinski from comment #2)
> I Kinda see how to implement this by creating
> operator_min::fold_range/operator_max::fold_range but I am still new on
> using these interfaces so I am not 100% sure how to use them.

Actually, in ranger, we'd be able to make the range choice of the range of a_2 or b_3, but it can't rewrite the IL... and since the range of both is varying, fold_range would still return varying, unless we indicate there are relations. fold_range itself only takes what it is given, so we have to query the relations first.

In theory all that is missing is to teach simplification about relation queries. For instance, in simplify_using_ranges::fold_cond_with_ops, we are invoking the range-op handler without any relations.. we query the ranges, but not the relation. If we add something like this (and make sure both operands are symbolic):

diff --git a/gcc/vr-values.cc b/gcc/vr-values.cc
index ecb294131b0..ad2c2d6c090 100644
--- a/gcc/vr-values.cc
+++ b/gcc/vr-values.cc
@@ -315,10 +315,17 @@ simplify_using_ranges::fold_cond_with_ops (enum tree_code code,
       || !query->range_of_expr (r1, op1, s))
     return NULL_TREE;

+  relation_kind rel = VREL_VARYING;
+  if (gimple_range_ssa_p (op0) && gimple_range_ssa_p (op1))
+    rel = query->query_relation (s, op0, op1);
+  // Create a trio with the relation set between op0 and op1 for folding.
+  // TRIOs are lhs-op0, lhs-op1, op0-op1 relations.
+  relation_trio trio (VREL_VARYING, VREL_VARYING, rel);
+
   tree type = TREE_TYPE (op0);
   int_range<1> res;
   range_op_handler handler (code);
-  if (handler && handler.fold_range (res, type, r0, r1))
+  if (handler && handler.fold_range (res, type, r0, r1, trio))
     {
       if (res == range_true (type))
         return boolean_true_node;

This should do what you want, I think... fold_range should use the relation passed in to determine that the condition is always true or false.
I have not fully tested this patch, fwiw.
[Bug tree-optimization/112843] during GIMPLE pass: bitintlower ICE: SIGSEGV in ranger_cache::range_of_expr (gimple-range-cache.cc:1204) with _BitInt() at -O1
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112843 --- Comment #9 from Andrew Macleod --- Created attachment 56790 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=56790&action=edit auxiliary patch to avoid the trap

Refining a range is fine... the only issue we are really running into here that is causing the trap is that the "update" on the original LHS DEF is a new stmt which hasn't been added to the IL yet, so ranger is trapping when it asks for the basic block of the def stmt.. It had one before, and it expects it to still have one when it looks at the context of the stmt to update the range.

Now that said, the update issue is primarily within the cache. If we are updating and the BB isn't set, we could simply just pick up the global value. Patch is attached, and pre-approved if you want to use it.
[Bug tree-optimization/112843] during GIMPLE pass: bitintlower ICE: SIGSEGV in ranger_cache::range_of_expr (gimple-range-cache.cc:1204) with _BitInt() at -O1
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112843 --- Comment #7 from Andrew Macleod --- (In reply to Jakub Jelinek from comment #6)
> (In reply to Andrew Macleod from comment #5)
> > what do you mean? when a statement is changed, it may generate a different
> > range than it did before,
>
> No, that would be a bug. If some IL changes need to extend existing range,
> then
> reset_flow_sensitive_info needs to be called for the SSA_NAME.

Not from ranger's point of view. You may change
_1 = _2 + 5
to
_1 = _6 + 5
and it finds it has a better understanding of the range of _6, so it can further refine the range for _1 based on that. It merely checks whether the changes to the statement mean it can understand it better. In particular:

const char *lossage = _result ? "constant string" : 0;
if (__builtin_expect (lossage != ((void *)0) , 0))
  {
    unsigned __message_length = __builtin_strlen (lossage);
    if (! __builtin_constant_p (__message_length))
      dead ();
  }

This provides an IL update which happens later in compilation, when builtin_expect gives us a better result. That gives us a much better range, and we can then fold away the call to dead().

> > so we re-evaluate the statement to see if the
> > result is different. If it is, then we update it in the cache.
> >
> > All its doing is re-calculating and updating the cache values.
> >
> > It looks to me like the problem is the stmt is being added in a way that
> > leaves the IL in an illegal state,
>
> But unfortunately that is pretty much unavoidable in changes like this.
> So, I have
> lhs1 = some_stmt;
> and want to change it to
> lhs2 = some_stmt;
> lhs1 = (lhs_type) lhs2;
> (where lhs2 has a different type from lhs).
> The options to do that are either what the code does right now, i.e.
> first create lhs1 = (lhs_type) lhs2; stmt, add it after lhs1 = some_stmt;
> (including update_stmt), then gimple_set_lhs on the first stmt to lhs2, then
> update_stmt on the first stmt, but this is temporarily invalid IL, because
> two different stmts in the IL have the same lhs, or as changed in the patch
> gimple_set_lhs on the first stmt to lhs2 (but no update_stmt), create the
> second stmt, add it including update_stmt, then update_stmt on the first
> one; this is also invalid IL, the effects of update_stmt haven't been done
> until the second update_stmt; or gimple_set_lhs and update_stmt on the first
> one (but then lhs1 has no definition and we insert a stmt into the IL
> without the definition, so again temporarily invalid IL).

I'm still confused, though. What I think you are doing is taking
lhs1 = some_stmt
and rewriting it to a different type, and it looks like you're looping through the operands so that some_stmt is consuming the right types too.. so the type of some_stmt is changing from lhs1_type to lhs2_type, and then we create a new stmt for lhs1.

To avoid lying, why don't we just create a new stmt 'g'
lhs2 = some_stmt
before the original stmt, with the operands converted, and then, when that is done, simply update the original stmt to
lhs1 = (lhs1_type) lhs2
Then the IL is never in an invalid state?

Otherwise, if we are going to have invalid IL states, we have to turn off any automatic updating... You could possibly just save all the updates/sets until the end.. that might work. The biggest issue is that lhs1 was in the IL, so ranger expects the block to be set for its def stmt, yet it is being updated via a new stmt which is not in the IL yet when the update happens... it expects gimple_bb to be set for a DEF that it has seen.

Isn't the "normal" way to do this to leave lhs1 = some_stmt, then add lhs2 = (lhs2_type) lhs1 and follow immediate uses, changing all the occurrences of lhs1 to lhs2?
[Bug tree-optimization/112843] during GIMPLE pass: bitintlower ICE: SIGSEGV in ranger_cache::range_of_expr (gimple-range-cache.cc:1204) with _BitInt() at -O1
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112843 --- Comment #5 from Andrew Macleod --- (In reply to Richard Biener from comment #3)
> (In reply to Richard Biener from comment #2)
> > what?! Ick. It definitely shouldn't re-fold anything but only scrap caches
> > _at most_.
>
> So it does
>
>   // Only update if it already had a value.
>   if (m_cache.get_global_range (r, lhs))
>     {
>       // Re-calculate a new value using just cache values.
>       Value_Range tmp (TREE_TYPE (lhs));
>       fold_using_range f;
>       fur_stmt src (s, &m_cache);
>       f.fold_stmt (tmp, s, src, lhs);
>
>       // Combine the new value with the old value to check for a change.
>       if (r.intersect (tmp))
>         {
>           if (dump_file && (dump_flags & TDF_DETAILS))
>             {
>               print_generic_expr (dump_file, lhs, TDF_SLIM);
>               fprintf (dump_file, " : global value re-evaluated to ");
>               r.dump (dump_file);
>               fputc ('\n', dump_file);
>             }
>           m_cache.set_global_range (lhs, r);
>
> WTF? If a pass invalidates a range it needs to properly do this itself.
> But update_stmt itself _never_ should alter range info.

What do you mean? When a statement is changed, it may generate a different range than it did before, so we re-evaluate the statement to see if the result is different. If it is, then we update it in the cache. All it's doing is re-calculating and updating the cache values.
It looks to me like the problem is the stmt is being added in a way that leaves the IL in an illegal state:

tree lhs2 = make_ssa_name (type);
gimple *g = gimple_build_assign (lhs, NOP_EXPR, lhs2);
if (stmt_ends_bb_p (stmt))
  {
    edge e = find_fallthru_edge (gsi_bb (gsi)->succs);
    gsi_insert_on_edge_immediate (e, g);
  }
else
  gsi_insert_after (&gsi, g, GSI_SAME_STMT);
gimple_set_lhs (stmt, lhs2);

(gdb) p print_generic_expr (stderr, lhs, 0)
_1$3 = void
(gdb) p print_generic_expr (stderr, lhs2, 0)
_12$4 = void
(gdb) p print_gimple_stmt (stderr, stmt, 0, 0)
_1 = x_4(D) * 5;
$5 = void
(gdb) p print_gimple_stmt (stderr, g, 0, 0)
_1 = (_BitInt(128)) _12;
$6 = void

So we have
_1 = x_4(D) * 5;
then we create
_1 = (_BitInt(128)) _12;
and add it to the IL... and finally change the original stmt to
_12 = x_4(D) * 5
How is that right? _1 is now a different type? Regardless, we have 2 statements with _1 as a LHS for a brief time. And ranger's real complaint is that we have a range for _1, but it's being updated by a stmt which is not actually in the IL yet during this update.. so it finds no basic block info for an SSA name which it thinks it knows something about already, because it WAS in the IL.

It also does not seem correct to me that you can take an existing SSA_NAME and change its type on the fly while it's still being used in the IL? Once we create an ssa name, I thought it was immutable from a type point of view until deleted and reused?

(gdb) p print_generic_expr (stderr, lhs->typed.type, 0)
_BitInt(128)$7 = void
(gdb) p print_generic_expr (stderr, lhs2->typed.type, 0)
int128_t$8 = void
[Bug tree-optimization/112788] [14 regression] ICEs in fold_range, at range-op.cc:206 after r14-5972-gea19de921b01a6
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112788 --- Comment #2 from Andrew Macleod --- (In reply to Kewen Lin from comment #1)
>
> ranger makes use of type precision directly instead of something like
> types_compatible_p. I wonder if we can introduce a target hook (or hookpod)
> to make ranger unrestrict this check a bit, the justification is that for
> float type its precision information is encoded in its underlying
> real_format, if two float types underlying modes are the same, the precision
> are actually the same.
>
> btw, the operand_check_ps seems able to call range_compatible_p?

It could, but just a precision check seemed enough at the time. The patch also went through many iterations, and it was only in the final version that operand_check_p() ended up with types as the parameter rather than ranges.

You bring up a good point, though. I just switched those routines to call range_compatible_p() and checked it in. Now it is all centralized in the one routine going forward.

It does seem wrong that the float precisions don't match, and weird that it's hard to fix :-) It should now be possible to have range_compatible_p check the underlying mode for floats rather than the precision... if there's a good argument for it, and you want to give that a shot...
[Bug ipa/111922] [11/12/13/14 Regression] ICE in cp with -O2 -fno-tree-fre
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111922 --- Comment #13 from Andrew Macleod --- Created attachment 56735 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=56735&action=edit patch (In reply to Sam James from comment #12) > Is the plan to backport to 11/12/13 or to leave it? hmm. I don't think I would apply the same patch (it wouldn't work as is anyway), but if we wanted to fix this in earlier releases we could simply have the custom fold_ranges return false when the precision doesn't match... it would at least avoid most of these traps in earlier releases...? The attached patch for instance would probably apply to GCC 13, 12 and 11.. I can test these if we want to do that...
[Bug ipa/111922] [11/12/13/14 Regression] ICE in cp with -O2 -fno-tree-fre
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111922 Andrew Macleod changed: What|Removed |Added Resolution|--- |FIXED Status|NEW |RESOLVED --- Comment #11 from Andrew Macleod --- fixed.
[Bug tree-optimization/111293] [14 Regression] Missed Dead Code Elimination since r14-3414-g0cfc9c953d0
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111293 --- Comment #2 from Andrew Macleod --- It seems like we set 'e' to 3 immediately at the start:

 [local count: 1073741824]:
 e = 3;
 goto ; [100.00%]

and it is never changed again. However, when we load from 'e' later in the IL:

 [local count: 9485263241]:
 e.1_6 = e;

we simply get varying. Is some pass supposed to propagate this? This reminds me of a few other regression PRs where we no longer propagate known values from loads from memory into ssa-names. If we knew that e.1_6 was '3', then the call to foo would be folded away as never executable.
[Bug ipa/111922] [11/12/13/14 Regression] ICE in cp with -O2 -fno-tree-fre
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111922 --- Comment #5 from Andrew Macleod --- (In reply to Jakub Jelinek from comment #4)
>
> I think
> Value_Range vr (operand_type);
> if (TREE_CODE_CLASS (operation) == tcc_unary)
> ipa_vr_operation_and_type_effects (vr,
> src_lats->m_value_range.m_vr,
> operation, param_type,
> operand_type);
> should be avoided if param_type is not a compatible type to operand_type,
> unless operation is some cast operation (NOP_EXPR, CONVERT_EXPR, dunno if
> the float to integral or vice versa ops as well but vrp probably doesn't
> handle that yet).
> In the above case, param_type is struct A *, i.e. pointer, while
> operand_type is int.

The root of the issue is that the precisions are different, and we're invoking an operation which expects the precisions to be the same (minus in this case). We can't deal with this in dispatch because some operations allow the LH and RH to be different precisions or even types. It also seems like overkill to have every operation check the incoming precision, but perhaps not... we could limit it to the wi_fold() subsets.. let me have a look. If we get incompatible types, perhaps returning VARYING should be OK?
[Bug ipa/111922] [11/12/13/14 Regression] ICE in cp with -O2 -fno-tree-fre
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111922 --- Comment #9 from Andrew Macleod --- (In reply to Jakub Jelinek from comment #8) > Well, in this case the user explicitly told compiler not to do that by not > using a prototype and syntax which doesn't provide one from the definition. > It is like using > int f1 (struct C *x, struct A *y) > { > ... > } > definition in one TU, and > int f1 (int, int); > prototype and > f1 (0, ~x) > call in another one + using LTO. What I meant is how to do decide if the > param_type vs. operand_type mismatch is ok or not. I vote we do nothing extra for those clowns! Just return VARYING for a range :-) it seems like the safest thing to do?
[Bug ipa/111922] [11/12/13/14 Regression] ICE in cp with -O2 -fno-tree-fre
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111922 --- Comment #7 from Andrew Macleod --- Explicit casts would be no problem, as they go through the proper machinery. The IL for that case has an explicit cast in it:

_1 = (int) x_2(D);
foo (_1);

It's when that cast is not present, and we try to, say, subtract two values, that we have a problem. We expect the compiler to promote things to be compatible when they are supposed to be. This would apply to dual operand arithmetic like +, -, /, *, bitwise ops, etc.

The testcase in particular is a bitwise not... but it has a return type that is 64 bits and an operand type that is 32. It was expected that the compiler would promote the operand to 64 bits if it expects a 64 bit result. At least for those tree codes which expect compatible types.. I don't think we want to get into overruling decisions at the range-ops level.. So we decide whether to trap (which would be the same result as we see now :-P), or handle it some other way. Returning VARYING was my thought.. because it means something is amuck, so say we don't know anything.

Alternatively, if IPA could figure out when things need promoting.. GCC must already do it, although I suppose that's in the front ends :-P
[Bug tree-optimization/112509] [14 Regression] internal compiler error: in verify_range, at value-range.cc:1132
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112509 Andrew Macleod changed: What|Removed |Added Resolution|--- |FIXED Status|NEW |RESOLVED --- Comment #5 from Andrew Macleod --- fixed.
[Bug tree-optimization/112509] GCC: 14: internal compiler error: in verify_range, at value-range.cc:1132
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112509 --- Comment #2 from Andrew Macleod --- The original switch is an unsigned value with a precision of 3 bits, so 0..7 just fits. It gets translated to an int index during gimplification, but the case labels are still precision 3 values. find_case_label_range in tree-vrp.cc does the following:

      tree label = gimple_switch_label (switch_stmt, i);
      tree case_high
        = CASE_HIGH (label) ? CASE_HIGH (label) : CASE_LOW (label);
      wide_int wlow = wi::to_wide (CASE_LOW (label));
      wide_int whigh = wi::to_wide (case_high);
      int_range_max label_range (type,
                                 wide_int::from (wlow, prec, sign),
                                 wide_int::from (whigh, prec, sign));

CASE_HIGH is indeed 6, but when we do the to_wide call, it is sign extended to produce:

(gdb) p whigh.dump ()
[0xfffe], precision = 3

and then when we do wide_int::from() in the signed int precision 32 of the new switch type, we get -2 for the high value.

What's supposed to happen here? It seems odd to me that the case labels are a different precision than the switch itself.. but then we would also lose the fact that there is only 3 bits of precision in the switch. Should the low and high values be cast from the original case precision/type to the new one, perhaps? Something like this seems to work:

 tree find_case_label_range (gswitch *switch_stmt, const irange *range_of_op)
@@ -900,9 +901,9 @@ find_case_label_range (gswitch *switch_stmt, const irange *range_of_op)
         = CASE_HIGH (label) ? CASE_HIGH (label) : CASE_LOW (label);
       wide_int wlow = wi::to_wide (CASE_LOW (label));
       wide_int whigh = wi::to_wide (case_high);
-      int_range_max label_range (type,
-                                 wide_int::from (wlow, prec, sign),
-                                 wide_int::from (whigh, prec, sign));
+      int_range_max label_range (TREE_TYPE (case_high), wlow, whigh);
+      range_cast (label_range, type);
+      label_range.dump(stderr);
[Bug tree-optimization/106511] [13/14 Regression] New -Werror=maybe-uninitialized since r13-1268-g8c99e307b20c502e
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106511 --- Comment #8 from Andrew Macleod --- (In reply to Richard Biener from comment #2)
> VRP could use max_stmt_executions here (note it doesn't properly handle loop
> nests so the name is somewhat misleading)

Do you have to ask for max_stmt_executions on each loop in the nest for the stmt and multiply them all together to get the real total? Or can you only count on loop_outer()?
[Bug tree-optimization/110540] [14 Regression] Dead Code Elimination Regression since r14-1163-gd8b058d3ca4
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110540 --- Comment #3 from Andrew Macleod --- EVRP figures out the j value can only be 0 or -9, removes a few stmts as a result, and simplifies the PHI, producing:

 [local count: 114863530]:
 # j_12 = PHI <0(2), -9(6)>

 [local count: 1073741824]:
 # j_11 = PHI
 if (j_11 != -9)
   goto ; [94.50%]
 else
   goto ; [5.50%]

which, when we get to cunrolli, looks like a loop, and it does nothing.

When EVRP doesn't reduce the range from [-9, 0], we are instead left with:

 :
 # j_16 = PHI <0(2), _35(7)>
 if (j_16 != -9)
   goto ; [INV]
 else
   goto ; [INV]

After DSE, we have:

 :
 *_8 = 0;

 :
 # j_16 = PHI <0(2), -9(6), -9(5)>
 if (j_16 != -9)
   goto ; [INV]
 else
   goto ; [INV]

and then CDDCE comes along and turns that into:

 :
 *_8 = 0;

 :
 # j_17 = PHI <0(2), -9(6)>

 :
 # j_16 = PHI
 if (j_16 != -9)
   goto ; [INV]
 else
   goto ; [INV]

which is what eventually causes our problem: this ends up looking like another loop to cunrolli, and it does no analysis of it other than to mark it as a 3rd loop. And we never actually get rid of this. In the original version, cunroll analyzes the loop and turns it into something useful, without the extra phi.

If we don't do cddce, then the same transformation shows up after switchconv in profile_estimate.. so I'm guessing this is a CFG "cleanup". So EVRP cleans up the code, but it appears that it confuses cunrolli into thinking there is nothing to do?
[Bug tree-optimization/106511] [13/14 Regression] New -Werror=maybe-uninitialized since r13-1268-g8c99e307b20c502e
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106511 --- Comment #7 from Andrew Macleod --- (In reply to Richard Biener from comment #2)
> VRP could use max_stmt_executions here (note it doesn't properly handle loop
> nests so the name is somewhat misleading)

Given some arbitrary statement S, how do I find the relevant loop pointer to pass to max_stmt_executions? I don't suppose there is a generic version which will take care of that lookup?

This seems like something the phi analyzer could easily use to calculate the number of times the modifier statement triggers when we have loop PHIs and some other PHI in parallel.. if I can easily ask how many times stmt S is executed...
[Bug tree-optimization/111472] [11/12/13/14 Regression] Wrong code at -Os on x86_64-linux-gnu since r11-4563-gd0d8b5d836
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111472 --- Comment #4 from Andrew Macleod --- (In reply to Andrew Macleod from comment #3) > I'm not sure that this didn't uncover something elsewhere, possibly ivopts > since that pass seems to be generating something different and I think there > were some fixes put in there on trunk?. > > Regardless, this currently works on trunk. Can we run a regression on trunk > and see what patch fixed it? > > Andrew And still fails on GCC13, so it appears to have been a trunk patch that fixed it.
[Bug tree-optimization/111472] [11/12/13/14 Regression] Wrong code at -Os on x86_64-linux-gnu since r11-4563-gd0d8b5d836
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111472 --- Comment #3 from Andrew Macleod --- I'm not sure that this didn't uncover something elsewhere, possibly ivopts, since that pass seems to be generating something different and I think there were some fixes put in there on trunk?

Regardless, this currently works on trunk. Can we run a regression on trunk and see what patch fixed it?

Andrew
[Bug tree-optimization/111766] Missed optimization with __builtin_unreachable and ands
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111766 Andrew Macleod changed: What|Removed |Added Resolution|--- |FIXED Status|UNCONFIRMED |RESOLVED --- Comment #4 from Andrew Macleod --- oops. fixed
[Bug tree-optimization/111766] Missed optimization with __builtin_unreachable and ands
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111766 --- Comment #3 from Andrew Macleod --- fixed
[Bug tree-optimization/111622] [13 Regression] EVRP compile-time hog compiling risc-v insn-opinit.cc
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111622 Andrew Macleod changed: What|Removed |Added Resolution|--- |FIXED Status|UNCONFIRMED |RESOLVED --- Comment #9 from Andrew Macleod --- Fixed.
[Bug tree-optimization/111622] [13 Regression] EVRP compile-time hog compiling risc-v insn-opinit.cc
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111622 --- Comment #6 from Andrew Macleod --- Interesting. The "fix" turns out to be: commit 9ea74d235c7e7816b996a17c61288f02ef767985 Author: Richard Biener Date: Thu Sep 14 09:31:23 2023 +0200 tree-optimization/111294 - better DCE after forwprop The following adds more aggressive DCE to forwprop to clean up dead stmts when folding a stmt leaves some operands unused. The patch uses simple_dce_from_worklist for this purpose, queueing original operands before substitution and folding, but only if we folded the stmt. This removes one dead stmt biasing threading costs in a later pass but it doesn't resolve the optimization issue in the PR yet. Which implies something pathological was triggering in VRP, so I dug a little deeper... It seems to be a massive number of partial equivalencies generated by sequences like: _5 = (unsigned int) _1; _10 = (unsigned int) _1; _15 = (unsigned int) _1; _20 = (unsigned int) _1; _25 = (unsigned int) _1; <...> for a couple of hundred statements. these are all then members of a partial equivalence set, and we end up doing obscene amounts of pointless looping and recomputing of ranges of things in the set when say _1 may change. The intent of partial equivalence is to allow us to reflect known subranges thru casts, but not to build up large groups like in an equivalence. There should be a limit to the size. We start to lose most of the usefulness when the grouping gets too large.
[Bug tree-optimization/111694] [13/14 Regression] Wrong behavior for signbit of negative zero when optimizing
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111694 Andrew Macleod changed: What|Removed |Added Status|REOPENED|RESOLVED Resolution|--- |FIXED --- Comment #11 from Andrew Macleod --- now fixed on gcc 13 too.
[Bug tree-optimization/108697] constructing a path-range-query is expensive
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108697 Andrew Macleod changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED Resolution|--- |FIXED --- Comment #8 from Andrew Macleod --- fixed.
[Bug tree-optimization/111766] Missed optimization with __builtin_unreachable and ands
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111766 --- Comment #1 from Andrew Macleod --- Imports: bb_3(D) Exports: _2 bb_3(D) _2 : bb_3(D)(I) bb_3(D) [irange] int [0, 3] MASK 0x3 VALUE 0x0 : _2 = bb_3(D) & 1; if (_2 == 0) goto ; [INV] else goto ; [INV] _2 : [irange] int [0, 1] MASK 0x1 VALUE 0x0 4->5 (T) _2 : [irange] int [0, 0] MASK 0x1 VALUE 0x0 4->5 (T) bb_3(D) : [irange] int [0, 0][2, 2] MASK 0x2 VALUE 0x0 4->6 (F) _2 : [irange] int [1, 1] MASK 0x1 VALUE 0x0 4->6 (F) bb_3(D) : [irange] int [1, 3] MASK 0x2 VALUE 0x1 Looks like it's just a lack of completeness in operator_bitwise_and::op1_range(). On the edge from 4->6, ranger knows _2 is [1,1], but when op1_range solves for _2 = bb_3 & 1 we have [1,1] = int [0, 3] MASK 0x3 VALUE 0x0 & 1, and it produces int [1, 3] rather than [1,1][3,3].
[Bug tree-optimization/111694] [13/14 Regression] Wrong behavior for signbit of negative zero when optimizing
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111694 --- Comment #9 from Andrew Macleod --- (In reply to Andrew Macleod from comment #8) > (In reply to Alexander Monakov from comment #7) > > No backport for gcc-13 planned? > > mmm, didn't realize we were propagating floating point equivalences around > in 13. A similar patch should work there. Testing the same patch on gcc13. Will let it settle on trunk for a day or two first, then check it in if nothing shows up. Which it shouldn't :-)
[Bug tree-optimization/111694] [13/14 Regression] Wrong behavior for signbit of negative zero when optimizing
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111694 Andrew Macleod changed: What|Removed |Added Resolution|FIXED |--- Status|RESOLVED|REOPENED --- Comment #8 from Andrew Macleod --- (In reply to Alexander Monakov from comment #7) > No backport for gcc-13 planned? mmm, didn't realize we were propagating floating point equivalences around in 13. A similar patch should work there.
[Bug tree-optimization/111694] [13/14 Regression] Wrong behavior for signbit of negative zero when optimizing
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111694 Andrew Macleod changed: What|Removed |Added Status|NEW |RESOLVED Resolution|--- |FIXED --- Comment #6 from Andrew Macleod --- fixed
[Bug tree-optimization/111694] [13/14 Regression] Wrong behavior for signbit of negative zero when optimizing
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111694 --- Comment #4 from Andrew Macleod --- (In reply to Richard Biener from comment #3) > Looks like some frange / relation mistake then. l_3(D) [frange] double [-Inf, +Inf] Equivalence set : [l_3(D), r_4(D)] : _1 = __builtin_signbit (l_3(D)); if (_1 != 0) goto ; [INV] else goto ; [INV] 3->6 (T) _1 : [irange] int [-INF, -1][1, +INF] 3->6 (T) l_3(D) : [frange] double [-Inf, -0.0 (-0x0.0p+0)] 3->4 (F) _1 : [irange] int [0, 0] 3->4 (F) l_3(D) : [frange] double [0.0 (0x0.0p+0), +Inf] : _2 = __builtin_signbit (r_4(D)); Yeah, we know l_3 and r_4 are equivalent, and we also know that on the edge 3->4 l_3 has the range double [0.0 (0x0.0p+0), +Inf] What we miss is that with an equivalence, we also have to put -0.0 back into the range. We currently don't, so we think we can fold the second signbit call. If I fix that, we then see r_4(D) [frange] double [-0.0 (-0x0.0p+0), +Inf] which prevents the folding. I need to audit to see if there are other places where we may have to "adjust" equivalence range, or how best to deal with this in the general case.
[Bug tree-optimization/111622] [13 Regression] EVRP compile-time hog compiling risc-v insn-opinit.cc
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111622 --- Comment #3 from Andrew Macleod --- (In reply to Richard Biener from comment #1) > Created attachment 56006 [details] > preprocessed riscv insn-opinit.cc I get i.ii:2203:11: fatal error: definition of ‘class std::initializer_list<_E>’ does not match ‘#include ’ 2203 | class initializer_list | ^~~~ compilation terminated. with the .ii file... do I need something else?
[Bug tree-optimization/111599] [14 Regression] ICE: Segmentation fault in VRP
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111599 Andrew Macleod changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED Resolution|--- |FIXED --- Comment #5 from Andrew Macleod --- fixed.
[Bug tree-optimization/110315] [13 Regression] g++ crashes with a segmentation fault while compiling a large const std::vector of std::string since r13-6566-ge0324e2629e25a90
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110315 Andrew Macleod changed: What|Removed |Added Resolution|--- |FIXED Status|NEW |RESOLVED --- Comment #13 from Andrew Macleod --- fixed.
[Bug tree-optimization/111599] [14 Regression] ICE: Segmentation fault in VRP
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111599 --- Comment #3 from Andrew Macleod --- patch in testing
[Bug tree-optimization/110315] [13 Regression] g++ crashes with a segmentation fault while compiling a large const std::vector of std::string since r13-6566-ge0324e2629e25a90
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110315 --- Comment #11 from Andrew Macleod --- (In reply to Richard Biener from comment #10) > (In reply to Andrew Macleod from comment #7) > > Created attachment 55591 [details] > > potental patch > > > > I've attached Aldy's patch for PR109695 which he had backported to GCC13 > > back in May. > > This does resolve the issue.. Assuming we want to apply it to GCC 13. The > > original variant has been in trunk for a while now. I'm running this through a > > bootstrap/regression run now. > > > > I don't know if there is anything further you want to try it with? > > Can we consider this please? Sure. OK for gcc 13 after another round of regressions?
[Bug tree-optimization/93917] VRP forgets range of value read from memory
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93917 --- Comment #4 from Andrew Macleod --- Note a slight change in expectation as a result of commit r14-4141-gbf6b107e2a342319b3787ec960fc8014ef3aff91 for PR 110080. Due to a memory load in the second case, we do not remove the unreachable call now, as there may be a commoning opportunity later (via inlining in this case) in which this unreachable call may provide new information. The test case has been adjusted to current expectations, where we leave this unreachable call and then remove it and set the global range in VRP2 instead.
[Bug tree-optimization/110249] __builtin_unreachable helps optimisation at -O1 but not at -O2
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110249 Andrew Macleod changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED CC||amacleod at redhat dot com Resolution|--- |FIXED --- Comment #6 from Andrew Macleod --- Fixed.
[Bug tree-optimization/110080] [13/14 Regression] Missed Dead Code Elimination at -Os when using __builtin_unreachable since r13-6945-g429a7a88438
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110080 Andrew Macleod changed: What|Removed |Added Status|NEW |RESOLVED Resolution|--- |FIXED --- Comment #6 from Andrew Macleod --- fixed.
[Bug tree-optimization/110875] [14 Regression] Dead Code Elimination Regression since r14-2501-g285c9d042e9
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110875 Andrew Macleod changed: What|Removed |Added Resolution|--- |FIXED Status|NEW |RESOLVED --- Comment #4 from Andrew Macleod --- When range_of_stmt invokes prefill_name to evaluate unvisited dependencies, it should not mark visited names as always_current. When ranger_cache::get_global_range() is invoked with the optional "current_p" flag, it triggers additional functionality. This call is meant to be from within ranger, and it is understood that if the current value is not current, set_global_range will always be called later with a value. Thus it sets the always_current flag in the temporal cache to avoid computation cycles. The prefill_stmt_dependencies () mechanism within ranger is intended to emulate the behaviour of range_of_stmt on an arbitrarily long series of unresolved dependencies without triggering the overhead of huge call chains from the range_of_expr/range_on_entry/range_on_exit routines. Rather, it creates a stack of unvisited names and invokes range_of_stmt on them directly in order to get initial cache values for each ssa-name. The issue in this PR was that the routine was incorrectly invoking the get_global_cache to determine whether there was a global value. If there was, it would move on to the next dependency without invoking set_global_range to clear the always_current flag. What it should have been doing was simply checking whether there was a global value, and if there was not, add the name for processing and THEN invoke get_global_value to do all the special processing. Fixed.
[Bug tree-optimization/110918] [14 Regression] Dead Code Elimination Regression at -O3 since r14-2331-g018e7f16408
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110918 Andrew Macleod changed: What|Removed |Added Resolution|--- |FIXED Status|NEW |RESOLVED --- Comment #3 from Andrew Macleod --- fixed.
[Bug middle-end/111009] [12/13/14 regression] -fno-strict-overflow erroneously elides null pointer checks and causes SIGSEGV on perf from linux-6.4.10
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111009 --- Comment #10 from Andrew Macleod --- > At the hazard of stating the obvious: it's a wrong-code when you execute it > (not a gcc ICE). > doh. of course. test is in progress. Richi was correct. Although the code in range-ops for fold_range is correct, op1_range cannot make the same assumptions that fold_range does because it has no concept of the symbolic values on the RHS. I am making op1_range more restrictive.
[Bug middle-end/111009] [12/13/14 regression] -fno-strict-overflow erroneously elides null pointer checks and causes SIGSEGV on perf from linux-6.4.10
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111009 --- Comment #8 from Andrew Macleod --- Do I need some special target or something? On trunk, just "-fno-strict-overflow -O3" doesn't fail for me on x86_64-pc-linux-gnu... ./cc1 -fno-strict-overflow -O3 009.c -quiet
[Bug tree-optimization/110942] [14 Regression] Dead Code Elimination Regression at -O3 since r14-1165-g257c2be7ff8
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110942 --- Comment #3 from Andrew Macleod --- The original revision listed, I narrowed down to a single instance where the new code did something that makes a difference. In stmt _8 = (int) i_10; which originally had a range of int [0, 0][8, 8] MASK 0x8 VALUE 0x0, the new code allows it to determine that the result is actually int [0, 0], which allows _8 to be propagated as "0" now, and thus be dead. It was also dead before, but stayed around for a few passes because it wasn't a constant, so wasn't put in the "let VRP kill this stmt because it was fully propagated as a constant" list. The only difference in the IL as a result is the block which contained that stmt has a phi: >: > # k_14 = PHI > _8 = (int) i_10; and with the current trunk, we remove that block, which causes the fallthru block to have an extra incoming edge: < # i_10 = PHI <0(4), 8(25), 8(10), 0(3)> < # k_15 = PHI <0(4), k_13(25), k_15(10), 0(3)> --- > # i_10 = PHI <0(4), 8(26), 0(3)> > # k_15 = PHI <0(4), k_14(26), 0(3)> That is literally the only difference coming out of EVRP, so perhaps the extra edge coming into the block is causing some threading issues?
[Bug tree-optimization/110942] [14 Regression] Dead Code Elimination Regression at -O3 since r14-1165-g257c2be7ff8
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110942 --- Comment #1 from Andrew Macleod --- Created attachment 55743 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55743&action=edit patch to revert the change Although the bisection stopped at this change, it does not appear to be the underlying culprit. The attached patch will revert that change (it's pretty simple) and the problem still exists, so something other than that presumably caused it. I also presume this is the first change going backwards that shows the issue; is it possible to find the *first* change since gcc 13 was branched that caused the issue?
[Bug middle-end/111009] [12/13/14 regression] -fno-strict-overflow erroneously elides null pointer checks and causes SIGSEGV on perf from linux-6.4.10
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111009 --- Comment #4 from Andrew Macleod --- (In reply to Richard Biener from comment #3) > bool > operator_addr_expr::fold_range (irange &r, tree type, > const irange &lh, > const irange &rh, > relation_trio) const > { > if (empty_range_varying (r, type, lh, rh)) > return true; > > // Return a non-null pointer of the LHS type (passed in op2). > if (lh.zero_p ()) > r = range_zero (type); > > not sure how this is called, but we can only derive this if the offset > is zero as well, definitely if targetm.addr_space.zero_address_valid, > but I think this is true in general. > > else if (!contains_zero_p (lh)) > r = range_nonzero (type); > > and this is only true for TYPE_OVERFLOW_UNDEFINED (type), with > -fwrapv-pointer we could wrap to zero. > > That is, it's _not_ GIMPLE undefined behavior to compute &0->bar. > It looks like without -fwrapv-pointer we elide the if (!a) check, > dereferencing it when dso && dso != curr. I suppose that looks reasonable > with a = &dso->maj, when dso != 0 then a != 0 unless ->maj wraps. Range-ops won't see anything like &dso->maj.. it sees ranges and nothing else. It just gets the result of that expression determined by someone else. So if it sees [0,0] for the range, that means &dso->maj has been determined to be 0. When folding, addressof has some funky mechanics, and it symbolically processes the trees in gimple-range-fold.cc in fold_using_range::range_of_address. I think it takes care of all the funky things you mention. I also notice in the earlier comment where we set _13 to 0... the code you quoted where _13 was recomputed by ranger. It ends with GORI TRUE : (797) recomputation (_13) [irange] _Bool [1, 1] The result was [1,1] as far as ranger was concerned on the edge from 3->16, so that probably isn't how gimple fold determined it was zero. Is there still an issue here?
[Bug tree-optimization/110131] [12/13/14 Regression] Missed Dead Code Elimination when using __builtin_unreachable since r12-6924-gc2b610e7c6c
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110131 --- Comment #6 from Andrew Macleod --- (In reply to Andrew Pinski from comment #5) > (In reply to Andrew Pinski from comment #4) > > So I have a VRP patch which gets us to: > > /* If the value range is defined by more than one pair, > try to optimize to a singularity if either > the first or last pair is a singleton. */ > > That is if we have: > a range like: [0, 0][3, 32768][4294934529, +INF] > > and we the comparison like `_37 <= 2` > Since 2 is a value between the first 2 pairs, we can just say this should be > the same as `_37 == 0` because that is the only value that is valid here. > > The same idea happens for the last 2 pairs (and the last pair is a > singleton). > > Also if we only 2 pairs prefer the one which is 0 (since 0 is always > simplier). Hmm, you should be able to trim out the "untrue" parts of the range. Let's see. Do you have the comparison stmt, or is it just an expression? Maybe it doesn't matter. If you know the expression is "_37 <= 2", you can ask range-ops what the range of _37 is and intersect it with your range, and ask if the result is a singleton. Then this will work for all kinds of things. The format is op1_range (result_range, result_type, lhs_range, op2_range), so it looks like: range_op_handler handler (LE_EXPR); if (handler) { Value_Range vr (TREE_TYPE (_37)); if (handler->op1_range (vr, TREE_TYPE (_37), bool_one_range, [2,2])) _37_range.intersect (vr); } if (_37_range.singleton_p ()) blah (); You have a _37_range of [0, 0][3, 32768][4294934529, +INF]. op1_range for [1, 1] = OP1 LE_EXPR [2,2] will return [0,2], and the intersection result will be [0,0]. This would work for any condition/type and both false and true LHS if you provide [0,0] or [1,1] for the lhs. In particular this may help with things like x != 15, where the pair that is relevant to produce a singleton may not be at one end or the other.
Poking around for a minute, it looks like simplify_using_ranges from vr_values calls a routine called test_for_singularity which has never been converted to multi-ranges. I wonder if reworking it might resolve this class of issues as well? Eventually we planned to get to simplify_using_ranges and friends, but it hasn't happened yet.