Re: [PATCH] [RFC] New early __builtin_unreachable processing.
On 9/19/23 08:56, Richard Biener wrote: On Mon, Sep 18, 2023 at 3:48 PM Andrew MacLeod wrote: OK. I dont see anything in the early VRP processing now that would allow a later pass to remove the unreachable unless it does its own analysis like DOM might do. Isn't it as simple as if (i_2 > 5) __builtin_unreachable (); registering a global range of [6, INF] for i_2 and then the next time we fold if (i_2 > 5) using range info will eliminate it? Yes, historically that required VRP or DOM since nothing else looked at ranges, not sure how it behaves now given more match.pd patterns do look at (global) ranges. if we set the range yes. What I meant was in the cases where we decide it can't be removed, we do NOT set the range globally in vrp1 now. This means unless some other pass determines the range is [6, +INF] the unreachcable call will remain in the IL and any ranger aware pass will still get the contextual range info resulting from the unreachable. We were sometimes removing the unreachable without being able to update every affected global/future optimization opportunity, which this fixes. Hopefully :-) Its certainly much better at least. In theory, if inlining were aware of global ranges and propagated them, we could also now remove these some of these unreachables in EVRP rather than VRP1... as I think we're now sure there is no benefit to keeping the unreachable call when we remove it. In any case, thanks for the explanation and OK for the patch. Will check it in shortly. Andrew
Re: [PATCH] [RFC] New early __builtin_unreachable processing.
On Mon, Sep 18, 2023 at 3:48 PM Andrew MacLeod wrote: > > > On 9/18/23 02:53, Richard Biener wrote: > > On Fri, Sep 15, 2023 at 4:45 PM Andrew MacLeod wrote: > >> Ive been looking at __builtin_unreachable () regressions. The > >> fundamental problem seems to be a lack of consistent expectation for > >> when we remove it earlier than the final pass of VRP.After looking > >> through them, I think this provides a sensible approach. > >> > >> Ranger is pretty good at providing ranges in blocks dominated by the > >> __builtin_unreachable branch, so removing it isn't quite a critical as > >> it once was. Its also pretty good at identifying what in the block can > >> be affected by the branch. > >> > >> This patch provide an alternate removal algorithm for earlier passes. > >> it looks at *all* the exports from the block, and if the branch > >> dominates every use of all the exports, AND none of those values access > >> memory, VRP will remove the unreachable call, rewrite the branch, update > >> all the values globally, and finally perform the simple DCE on the > >> branch's ssa-name. This is kind of what it did before, but it wasn't > >> as stringent on the requirements. > >> > >> The memory access check is required because there are a couple of test > >> cases for PRE in which there is a series of instruction leading to an > >> unreachable call, and none of those ssa names are ever used in the IL > >> again. The whole chunk is dead, and we update globals, however > >> pointlessly. However, one of ssa_names loads from memory, and a later > >> passes commons this value with a later load, and then the unreachable > >> call provides additional information about the later load.This is > >> evident in tree-ssa/ssa-pre-34.c. The only way I see to avoid this > >> situation is to not remove the unreachable if there is a load feeding it. > >> > >> What this does is a more sophisticated version of what DOM does in > >> all_uses_feed_or_dominated_by_stmt. THe feeding instructions dont have > >> to be single use, but they do have to be dominated by the branch or be > >> single use within the branches block.. > >> > >> If there are multiple uses in the same block as the branch, this does > >> not remove the unreachable call. If we could be sure there are no > >> intervening calls or side effects, it would be allowable, but this a > >> more expensive checking operation. Ranger gets the ranges right anyway, > >> so with more passes using ranger, Im not sure we'd see much benefit from > >> the additional analysis. It could always be added later. > >> > >> This fixes at least 110249 and 110080 (and probably others). The only > >> regression is 93917 for which I changed the testcase to adjust > >> expectations: > >> > >> // PR 93917 > >> void f1(int n) > >> { > >> if(n<0) > >> __builtin_unreachable(); > >> f3(n); > >> } > >> > >> void f2(int*n) > >> { > >> if(*n<0) > >> __builtin_unreachable(); > >> f3 (*n); > >> } > >> > >> We were removing both unreachable calls in VRP1, but only updating the > >> global values in the first case, meaning we lose information. With the > >> change in semantics, we only update the global in the first case, but we > >> leave the unreachable call in the second case now (due to the load from > >> memory). Ranger still calculates the contextual range correctly as [0, > >> +INF] in the second case, it just doesn't set the global value until > >> VRP2 when it is removed. > >> > >> Does this seem reasonable? > > I wonder how this addresses the fundamental issue we always faced > > in that when we apply the range this range info in itself allows the > > branch to the __builtin_unreachable () to be statically determined, > > so when the first VRP pass sets the range the next pass evaluating > > the condition will remove it (and the guarded __builtin_unreachable ()). > > > > In principle there's nothing wrong with that if we don't lose the range > > info during optimizations, but that unfortunately happens more often > > than wanted and with the __builtin_unreachable () gone we've lost > > the ability to re-compute them. > > > > I think it's good to explicitly remove the branch at the point we want > > rather than relying on the "next" visitor to pick up the global range. > > > > As I read the patch we now remove __builtin_unreachable () explicitly > > as soon as possible but don't really address the fundamental issue > > in any way? > > > I think it pretty much addresses the issue completely. No globals are > updated by the unreachable branch unless it is removed. We remove the > unreachable early ONLY if every use of all the exports is dominated by > the branch...with the exception of a single use in the block used to > define a different export. and those have to all have no other uses > which are not dominated. ie > >[local count: 1073741824]: >y_2 = x_1(D) >> 1; >t_3 = y_2 + 1; >if (t_3 > 100) > goto ; [0.00%] >el
Re: [PATCH] [RFC] New early __builtin_unreachable processing.
On 9/18/23 02:53, Richard Biener wrote: On Fri, Sep 15, 2023 at 4:45 PM Andrew MacLeod wrote: Ive been looking at __builtin_unreachable () regressions. The fundamental problem seems to be a lack of consistent expectation for when we remove it earlier than the final pass of VRP.After looking through them, I think this provides a sensible approach. Ranger is pretty good at providing ranges in blocks dominated by the __builtin_unreachable branch, so removing it isn't quite a critical as it once was. Its also pretty good at identifying what in the block can be affected by the branch. This patch provide an alternate removal algorithm for earlier passes. it looks at *all* the exports from the block, and if the branch dominates every use of all the exports, AND none of those values access memory, VRP will remove the unreachable call, rewrite the branch, update all the values globally, and finally perform the simple DCE on the branch's ssa-name. This is kind of what it did before, but it wasn't as stringent on the requirements. The memory access check is required because there are a couple of test cases for PRE in which there is a series of instruction leading to an unreachable call, and none of those ssa names are ever used in the IL again. The whole chunk is dead, and we update globals, however pointlessly. However, one of ssa_names loads from memory, and a later passes commons this value with a later load, and then the unreachable call provides additional information about the later load.This is evident in tree-ssa/ssa-pre-34.c. The only way I see to avoid this situation is to not remove the unreachable if there is a load feeding it. What this does is a more sophisticated version of what DOM does in all_uses_feed_or_dominated_by_stmt. THe feeding instructions dont have to be single use, but they do have to be dominated by the branch or be single use within the branches block.. If there are multiple uses in the same block as the branch, this does not remove the unreachable call. If we could be sure there are no intervening calls or side effects, it would be allowable, but this a more expensive checking operation. Ranger gets the ranges right anyway, so with more passes using ranger, Im not sure we'd see much benefit from the additional analysis. It could always be added later. This fixes at least 110249 and 110080 (and probably others). The only regression is 93917 for which I changed the testcase to adjust expectations: // PR 93917 void f1(int n) { if(n<0) __builtin_unreachable(); f3(n); } void f2(int*n) { if(*n<0) __builtin_unreachable(); f3 (*n); } We were removing both unreachable calls in VRP1, but only updating the global values in the first case, meaning we lose information. With the change in semantics, we only update the global in the first case, but we leave the unreachable call in the second case now (due to the load from memory). Ranger still calculates the contextual range correctly as [0, +INF] in the second case, it just doesn't set the global value until VRP2 when it is removed. Does this seem reasonable? I wonder how this addresses the fundamental issue we always faced in that when we apply the range this range info in itself allows the branch to the __builtin_unreachable () to be statically determined, so when the first VRP pass sets the range the next pass evaluating the condition will remove it (and the guarded __builtin_unreachable ()). In principle there's nothing wrong with that if we don't lose the range info during optimizations, but that unfortunately happens more often than wanted and with the __builtin_unreachable () gone we've lost the ability to re-compute them. I think it's good to explicitly remove the branch at the point we want rather than relying on the "next" visitor to pick up the global range. As I read the patch we now remove __builtin_unreachable () explicitly as soon as possible but don't really address the fundamental issue in any way? I think it pretty much addresses the issue completely. No globals are updated by the unreachable branch unless it is removed. We remove the unreachable early ONLY if every use of all the exports is dominated by the branch... with the exception of a single use in the block used to define a different export. and those have to all have no other uses which are not dominated. ie [local count: 1073741824]: y_2 = x_1(D) >> 1; t_3 = y_2 + 1; if (t_3 > 100) goto ; [0.00%] else goto ; [100.00%] [count: 0]: __builtin_unreachable (); [local count: 1073741824]: func (x_1(D), y_2, t_3); In this case we will remove the unreachable call because we can provide an accurate global range for all values used in the definition chain for the program. Global Exported (via early unreachable): x_1(D) = [irange] unsigned int [0, 199] MASK 0xff VALUE 0x0 Global Exported (via early unreachable): y_2 = [irange] unsigned int [0, 99] MASK 0x7ff
Re: [PATCH] [RFC] New early __builtin_unreachable processing.
On Fri, Sep 15, 2023 at 4:45 PM Andrew MacLeod wrote: > > Ive been looking at __builtin_unreachable () regressions. The > fundamental problem seems to be a lack of consistent expectation for > when we remove it earlier than the final pass of VRP.After looking > through them, I think this provides a sensible approach. > > Ranger is pretty good at providing ranges in blocks dominated by the > __builtin_unreachable branch, so removing it isn't quite a critical as > it once was. Its also pretty good at identifying what in the block can > be affected by the branch. > > This patch provide an alternate removal algorithm for earlier passes. > it looks at *all* the exports from the block, and if the branch > dominates every use of all the exports, AND none of those values access > memory, VRP will remove the unreachable call, rewrite the branch, update > all the values globally, and finally perform the simple DCE on the > branch's ssa-name. This is kind of what it did before, but it wasn't > as stringent on the requirements. > > The memory access check is required because there are a couple of test > cases for PRE in which there is a series of instruction leading to an > unreachable call, and none of those ssa names are ever used in the IL > again. The whole chunk is dead, and we update globals, however > pointlessly. However, one of ssa_names loads from memory, and a later > passes commons this value with a later load, and then the unreachable > call provides additional information about the later load.This is > evident in tree-ssa/ssa-pre-34.c. The only way I see to avoid this > situation is to not remove the unreachable if there is a load feeding it. > > What this does is a more sophisticated version of what DOM does in > all_uses_feed_or_dominated_by_stmt. THe feeding instructions dont have > to be single use, but they do have to be dominated by the branch or be > single use within the branches block.. > > If there are multiple uses in the same block as the branch, this does > not remove the unreachable call. If we could be sure there are no > intervening calls or side effects, it would be allowable, but this a > more expensive checking operation. Ranger gets the ranges right anyway, > so with more passes using ranger, Im not sure we'd see much benefit from > the additional analysis. It could always be added later. > > This fixes at least 110249 and 110080 (and probably others). The only > regression is 93917 for which I changed the testcase to adjust > expectations: > > // PR 93917 > void f1(int n) > { >if(n<0) > __builtin_unreachable(); >f3(n); > } > > void f2(int*n) > { >if(*n<0) > __builtin_unreachable(); >f3 (*n); > } > > We were removing both unreachable calls in VRP1, but only updating the > global values in the first case, meaning we lose information. With the > change in semantics, we only update the global in the first case, but we > leave the unreachable call in the second case now (due to the load from > memory). Ranger still calculates the contextual range correctly as [0, > +INF] in the second case, it just doesn't set the global value until > VRP2 when it is removed. > > Does this seem reasonable? I wonder how this addresses the fundamental issue we always faced in that when we apply the range this range info in itself allows the branch to the __builtin_unreachable () to be statically determined, so when the first VRP pass sets the range the next pass evaluating the condition will remove it (and the guarded __builtin_unreachable ()). In principle there's nothing wrong with that if we don't lose the range info during optimizations, but that unfortunately happens more often than wanted and with the __builtin_unreachable () gone we've lost the ability to re-compute them. I think it's good to explicitly remove the branch at the point we want rather than relying on the "next" visitor to pick up the global range. As I read the patch we now remove __builtin_unreachable () explicitly as soon as possible but don't really address the fundamental issue in any way? > Bootstraps on x86_64-pc-linux-gnu with no regressions. OK? > > Andrew > >
[PATCH] [RFC] New early __builtin_unreachable processing.
Ive been looking at __builtin_unreachable () regressions. The fundamental problem seems to be a lack of consistent expectation for when we remove it earlier than the final pass of VRP. After looking through them, I think this provides a sensible approach. Ranger is pretty good at providing ranges in blocks dominated by the __builtin_unreachable branch, so removing it isn't quite a critical as it once was. Its also pretty good at identifying what in the block can be affected by the branch. This patch provide an alternate removal algorithm for earlier passes. it looks at *all* the exports from the block, and if the branch dominates every use of all the exports, AND none of those values access memory, VRP will remove the unreachable call, rewrite the branch, update all the values globally, and finally perform the simple DCE on the branch's ssa-name. This is kind of what it did before, but it wasn't as stringent on the requirements. The memory access check is required because there are a couple of test cases for PRE in which there is a series of instruction leading to an unreachable call, and none of those ssa names are ever used in the IL again. The whole chunk is dead, and we update globals, however pointlessly. However, one of ssa_names loads from memory, and a later passes commons this value with a later load, and then the unreachable call provides additional information about the later load. This is evident in tree-ssa/ssa-pre-34.c. The only way I see to avoid this situation is to not remove the unreachable if there is a load feeding it. What this does is a more sophisticated version of what DOM does in all_uses_feed_or_dominated_by_stmt. THe feeding instructions dont have to be single use, but they do have to be dominated by the branch or be single use within the branches block.. If there are multiple uses in the same block as the branch, this does not remove the unreachable call. If we could be sure there are no intervening calls or side effects, it would be allowable, but this a more expensive checking operation. Ranger gets the ranges right anyway, so with more passes using ranger, Im not sure we'd see much benefit from the additional analysis. It could always be added later. This fixes at least 110249 and 110080 (and probably others). The only regression is 93917 for which I changed the testcase to adjust expectations: // PR 93917 void f1(int n) { if(n<0) __builtin_unreachable(); f3(n); } void f2(int*n) { if(*n<0) __builtin_unreachable(); f3 (*n); } We were removing both unreachable calls in VRP1, but only updating the global values in the first case, meaning we lose information. With the change in semantics, we only update the global in the first case, but we leave the unreachable call in the second case now (due to the load from memory). Ranger still calculates the contextual range correctly as [0, +INF] in the second case, it just doesn't set the global value until VRP2 when it is removed. Does this seem reasonable? Bootstraps on x86_64-pc-linux-gnu with no regressions. OK? Andrew From 87072ebfcd4f51276fc6ed1fb0557257d51ec446 Mon Sep 17 00:00:00 2001 From: Andrew MacLeod Date: Wed, 13 Sep 2023 11:52:15 -0400 Subject: [PATCH 3/3] New early __builtin_unreachable processing. in VRP passes before __builtin_unreachable MUST be removed, only remove it if all exports affected by the unreachable can have global values updated, and do not involve loads from memory. PR tree-optimization/110080 PR tree-optimization/110249 gcc/ * tree-vrp.cc (remove_unreachable::final_p): New. (remove_unreachable::maybe_register): Rename from maybe_register_block and call early or final routine. (fully_replaceable): New. (remove_unreachable::handle_early): New. (remove_unreachable::remove_and_update_globals): Remove non-final processing. (rvrp_folder::rvrp_folder): Add final flag to constructor. (rvrp_folder::post_fold_bb): Remove unreachable registration. (rvrp_folder::pre_fold_stmt): Move unreachable processing to here. (execute_ranger_vrp): Adjust some call parameters. gcc/testsuite/ * g++.dg/pr110249.C: New. * gcc.dg/pr110080.c: New. * gcc.dg/pr93917.c: Adjust. Tweak vuse case Adjusted testcase 93917 --- gcc/testsuite/g++.dg/pr110249.C | 16 +++ gcc/testsuite/gcc.dg/pr110080.c | 27 + gcc/testsuite/gcc.dg/pr93917.c | 7 +- gcc/tree-vrp.cc | 203 ++-- 4 files changed, 214 insertions(+), 39 deletions(-) create mode 100644 gcc/testsuite/g++.dg/pr110249.C create mode 100644 gcc/testsuite/gcc.dg/pr110080.c diff --git a/gcc/testsuite/g++.dg/pr110249.C b/gcc/testsuite/g++.dg/pr110249.C new file mode 100644 index 000..2b737618bdb --- /dev/null +++ b/gcc/testsuite/g++.dg/pr110249.C @@ -0,0 +1,16 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -fdump-tree-vrp1-alias" } */ + +#include +#include + +uint64_t read64r(const uint64_t &x) { +if