Re: [PATCH 2/2] PR tree-optimization/103231 - Directly resolve range_of_stmt dependencies.
On 11/24/21 04:17, Richard Biener wrote:
> On Tue, Nov 23, 2021 at 6:04 PM Andrew MacLeod via Gcc-patches wrote:
>>
>> This is the second patch in the series.
>>
>> Ranger uses its own API to recursively satisfy dependencies.  When
>> range_of_stmt is called on _1482 = _1154 + _1177; it picks up the
>> ranges of _1154 and _1177 from its cache.  If those statements have
>> not been seen yet, it recursively calls range_of_stmt on each one to
>> resolve the answer.  Each main API call can trigger up to 5 other
>> calls to get to the next API point:
>>
>>   gimple_ranger::fold_range_internal (...)
>>   gimple_ranger::range_of_stmt (_1154,...)
>>   gimple_ranger::range_of_expr (_1154,)
>>   fold_using_range::range_of_range_op (..)
>>   fold_using_range::fold_stmt (...)
>>   gimple_ranger::fold_range_internal (...)
>>   gimple_ranger::range_of_stmt (_1482,...)
>>
>> For a normal forward walk, values tend to already be in the cache,
>> but when we try to answer a range_on_edge question on a back edge, it
>> can trigger a very long series of queries.  I spent some time
>> analyzing these patterns, and found that regardless of which API
>> entry point was used, ultimately range_of_stmt is invoked in a
>> predictable order to initialize the cache values.
>>
>> This patch implements a dependency resolver which range_of_stmt uses
>> when it is called on something which does not have a cache entry yet
>> (hence the disambiguation between a temporal failure and the lack of
>> a cache entry in the previous patch).  It looks at each operand, and
>> if that operand does not have a cache entry, pushes it on a stack.
>> Names are popped from the stack and fold_using_range() is invoked
>> once all the operands have been resolved.  When we do get to call
>> fold_using_range::fold_stmt(), we are sure the operands are cached
>> and the value will simply be calculated.  This is ultimately the
>> exact series of events that would have happened had the main API been
>> used... except we don't involve the call stack anymore for each one.
>>
>> Well, mostly :-).  For this fix, we only do this with operands of
>> stmts which have a range-ops handler, meaning we do not use the main
>> API for anything range-ops understands.  We will still use the main
>> API for resolving PHIs and other statements as they are encountered.
>> We could do this for PHIs as well, but for the most part it was the
>> chains of stmts within a block that were causing the vast majority of
>> the issue.  If we later discover large chains of PHIs are causing
>> issues as well, then I can easily add them too.  I avoided them this
>> time because there is extra overhead involved in traversing all the
>> PHI arguments extra times.  Sticking with range-ops limits us to 2
>> operands to check, and the overhead is very minimal.
>>
>> I have tested this with PHIs as well and we could just include them
>> upfront.  The overhead is more than doubled, but the increased
>> compile time of a VRP pass is still under 1%.
>>
>> Bootstrapped on x86_64-pc-linux-gnu with no regressions.  OK?
>>
>> Andrew
>
> OK.
>
> Richard.

committed.
[PATCH 2/2] PR tree-optimization/103231 - Directly resolve range_of_stmt dependencies.
[Body quoted in full in the reply above.]

Andrew

From 28d1fea6e6c0c0368dbc04e895aaa0a6b47c19da Mon Sep 17 00:00:00 2001
From: Andrew MacLeod
Date: Mon, 22 Nov 2021 14:39:41 -0500
Subject: [PATCH 3/3] Directly resolve range_of_stmt dependencies.

All ranger API entries eventually call range_of_stmt to ensure there is
an initial global value to work with.  This can cause very deep call
chains when satisfied via the normal API.  Instead, push any
dependencies onto a stack and evaluate them in a depth-first manner,
mirroring what would have happened via the normal API calls.

	PR tree-optimization/103231
	gcc/
	* gimple-range.cc (gimple_ranger::gimple_ranger): Create stmt stack.
	(gimple_ranger::~gimple_ranger): Delete stmt stack.
	(gimple_ranger::range_of_stmt): Process dependencies if they have
	no global cache entry.
	(gimple_ranger::prefill_name): New.
	(gimple_ranger::prefill_stmt_dependencies): New.
	* gimple-range.h (class gimple_ranger): Add prototypes.
---
 gcc/gimple-range.cc | 107 +++-
 gcc/gimple-range.h  |   4 ++
 2 files changed, 109 insertions(+), 2 deletions(-)

diff --git a/gcc/gimple-range.cc b/gcc/gimple-range.cc
index e3ab3a8bb48..178a470a419 100644
--- a/gcc/gimple-range.cc
+++ b/gcc/gimple-range.cc
@@ -46,6 +46,9 @@ gimple_ranger::gimple_ranger () :
   m_oracle = m_cache.oracle ();
   if (dump_file && (param_ranger_debug & RANGER_DEBUG_TRACE))
     tracer.enable_trace ();
+  m_stmt_list.create (0);
+  m_stmt_list.safe_grow (num_ssa_names);
+  m_stmt_list.truncate (0);

   // Ensure the not_executable flag is clear everywhere.
   if (flag_checking)
@@ -61,6 +64,11 @@ gimple_ranger::gimple_ranger () :
     }
 }

+gimple_ranger::~gimple_ranger ()
+{
+  m_stmt_list.release ();
+}
+
 bool
 gimple_ranger::range_of_expr (irange &r, tree expr, gimple *stmt)
 {
@@ -284,9 +292,10 @@ gimple_ranger::range_of_stmt (irange &r, gimple *s, tree name)
   else
     {
       bool current;
-      // Check if the stmt has already been processed, and is not stale.
+      // Check if the stmt has already been processed.
       if (m_cache.get_global_range (r, name, current))
 	{
+	  // If it isn't stale,