Re: [PATCH 2/2] PR tree-optimization/103231 - Directly resolve range_of_stmt dependencies.

2021-11-24 Thread Andrew MacLeod via Gcc-patches

On 11/24/21 04:17, Richard Biener wrote:

On Tue, Nov 23, 2021 at 6:04 PM Andrew MacLeod via Gcc-patches
 wrote:

This is the second patch in the series.

Ranger uses its own API to recursively satisfy dependencies. When
range_of_stmt is called on _1482 = _1154 + _1177;  it picks up the
ranges of _1154 and _1177 from it's cache. If those statements have not
been seen yet, it recursively calls range_of_stmt on each one to resolve
the answer.  Each main API call can trigger up to 5 other calls to get
to the next API point:

 gimple_ranger::fold_range_internal (...)
 gimple_ranger::range_of_stmt (_1154,...)
 gimple_ranger::range_of_expr (_1154,)
 fold_using_range::range_of_range_op (..)
 fold_using_range::fold_stmt (...)
 gimple_ranger::fold_range_internal (...)
 gimple_ranger::range_of_stmt (_1482,...)

For a normal forward walk, values tend to already be in the cache, but
when we try to answer a range_on_edge question on a back edge, it can
trigger a very long series of queries.  I spent some time analyzing
these patterns, and found that regardless of which API entry point was
used, ultimately range_of_stmt is invoked in a predictable order to
initiate the cache values.

This patch implements a dependency resolver which when range_of_stmt
uses when it is called on something which does not have a cache entry
yet (thus the disambiguation of the temporal failure vs lack of cache
entry in the previous patch)

This looks at each operand, and if that operand does not have a cache
entry, pushes it on a stack.   Names are popped from the stack and
fold_using_range() is invoked once all the operands have been
resolved.   When we do get to call fold_using_range::fold_stmt(), we are
sure the operands are cached and the value will simply be calculated.
This is ultimately the exact series of events that would have happened
had the main API been used... except we don't involve the call stack
anymore for each one.

Well, mostly :-).  For this fix, we only do this with operands of stmts
which have a range-ops handler.. meaning we do not use the API for
anything range-ops understands.  We will still use the main API for
resolving PHIS and other statements as they are encountered.We could
do this for PHIS as well, but for the most part it was the chains of
stmts within a block that were causing the vast majority of the issue.
If we later discover large chains of PHIs are causing issues as well,
then I can easily add them to this as well.  I avoided them this time
because there is extra overhead involved in traversing all the PHI
arguments extra times.  Sticking with range-ops limits us to 2 operands
to check, and the overhead is very minimal.

I have tested this with PHIs as well and we could just include them
upfront. The overhead is more than doubled, but the increased compile
time of a VRP pass is still under 1%.

Bootstrapped on x86_64-pc-linux-gnu with no regressions.  OK?

OK.

Richard.


Andrew




committed.



Re: [PATCH 2/2] PR tree-optimization/103231 - Directly resolve range_of_stmt dependencies.

2021-11-24 Thread Richard Biener via Gcc-patches
On Tue, Nov 23, 2021 at 6:04 PM Andrew MacLeod via Gcc-patches
 wrote:
>
> This is the second patch in the series.
>
> Ranger uses its own API to recursively satisfy dependencies. When
> range_of_stmt is called on _1482 = _1154 + _1177;  it picks up the
> ranges of _1154 and _1177 from it's cache. If those statements have not
> been seen yet, it recursively calls range_of_stmt on each one to resolve
> the answer.  Each main API call can trigger up to 5 other calls to get
> to the next API point:
>
> gimple_ranger::fold_range_internal (...)
> gimple_ranger::range_of_stmt (_1154,...)
> gimple_ranger::range_of_expr (_1154,)
> fold_using_range::range_of_range_op (..)
> fold_using_range::fold_stmt (...)
> gimple_ranger::fold_range_internal (...)
> gimple_ranger::range_of_stmt (_1482,...)
>
> For a normal forward walk, values tend to already be in the cache, but
> when we try to answer a range_on_edge question on a back edge, it can
> trigger a very long series of queries.  I spent some time analyzing
> these patterns, and found that regardless of which API entry point was
> used, ultimately range_of_stmt is invoked in a predictable order to
> initiate the cache values.
>
> This patch implements a dependency resolver which when range_of_stmt
> uses when it is called on something which does not have a cache entry
> yet (thus the disambiguation of the temporal failure vs lack of cache
> entry in the previous patch)
>
> This looks at each operand, and if that operand does not have a cache
> entry, pushes it on a stack.   Names are popped from the stack and
> fold_using_range() is invoked once all the operands have been
> resolved.   When we do get to call fold_using_range::fold_stmt(), we are
> sure the operands are cached and the value will simply be calculated.
> This is ultimately the exact series of events that would have happened
> had the main API been used... except we don't involve the call stack
> anymore for each one.
>
> Well, mostly :-).  For this fix, we only do this with operands of stmts
> which have a range-ops handler.. meaning we do not use the API for
> anything range-ops understands.  We will still use the main API for
> resolving PHIS and other statements as they are encountered.We could
> do this for PHIS as well, but for the most part it was the chains of
> stmts within a block that were causing the vast majority of the issue.
> If we later discover large chains of PHIs are causing issues as well,
> then I can easily add them to this as well.  I avoided them this time
> because there is extra overhead involved in traversing all the PHI
> arguments extra times.  Sticking with range-ops limits us to 2 operands
> to check, and the overhead is very minimal.
>
> I have tested this with PHIs as well and we could just include them
> upfront. The overhead is more than doubled, but the increased compile
> time of a VRP pass is still under 1%.
>
> Bootstrapped on x86_64-pc-linux-gnu with no regressions.  OK?

OK.

Richard.

> Andrew
>
>
>


[PATCH 2/2] PR tree-optimization/103231 - Directly resolve range_of_stmt dependencies.

2021-11-23 Thread Andrew MacLeod via Gcc-patches

This is the second patch in the series.

Ranger uses its own API to recursively satisfy dependencies. When 
range_of_stmt is called on _1482 = _1154 + _1177;  it picks up the 
ranges of _1154 and _1177 from it's cache. If those statements have not 
been seen yet, it recursively calls range_of_stmt on each one to resolve 
the answer.  Each main API call can trigger up to 5 other calls to get 
to the next API point:


   gimple_ranger::fold_range_internal (...)
   gimple_ranger::range_of_stmt (_1154,...)
   gimple_ranger::range_of_expr (_1154,)
   fold_using_range::range_of_range_op (..)
   fold_using_range::fold_stmt (...)
   gimple_ranger::fold_range_internal (...)
   gimple_ranger::range_of_stmt (_1482,...)

For a normal forward walk, values tend to already be in the cache, but 
when we try to answer a range_on_edge question on a back edge, it can 
trigger a very long series of queries.  I spent some time analyzing 
these patterns, and found that regardless of which API entry point was 
used, ultimately range_of_stmt is invoked in a predictable order to 
initiate the cache values.


This patch implements a dependency resolver which when range_of_stmt 
uses when it is called on something which does not have a cache entry 
yet (thus the disambiguation of the temporal failure vs lack of cache 
entry in the previous patch)


This looks at each operand, and if that operand does not have a cache 
entry, pushes it on a stack.   Names are popped from the stack and 
fold_using_range() is invoked once all the operands have been 
resolved.   When we do get to call fold_using_range::fold_stmt(), we are 
sure the operands are cached and the value will simply be calculated.  
This is ultimately the exact series of events that would have happened 
had the main API been used... except we don't involve the call stack 
anymore for each one.


Well, mostly :-).  For this fix, we only do this with operands of stmts 
which have a range-ops handler.. meaning we do not use the API for 
anything range-ops understands.  We will still use the main API for 
resolving PHIS and other statements as they are encountered.    We could 
do this for PHIS as well, but for the most part it was the chains of 
stmts within a block that were causing the vast majority of the issue.  
If we later discover large chains of PHIs are causing issues as well, 
then I can easily add them to this as well.  I avoided them this time 
because there is extra overhead involved in traversing all the PHI 
arguments extra times.  Sticking with range-ops limits us to 2 operands 
to check, and the overhead is very minimal.


I have tested this with PHIs as well and we could just include them 
upfront. The overhead is more than doubled, but the increased compile 
time of a VRP pass is still under 1%.


Bootstrapped on x86_64-pc-linux-gnu with no regressions.  OK?

Andrew



From 28d1fea6e6c0c0368dbc04e895aaa0a6b47c19da Mon Sep 17 00:00:00 2001
From: Andrew MacLeod 
Date: Mon, 22 Nov 2021 14:39:41 -0500
Subject: [PATCH 3/3] Directly resolve range_of_stmt dependencies.

All ranger API entries eventually call range_of_stmt to ensure there is an
initial global value to work with.  This can cause very deep call chains when
satisfied via the normal API.  Instead, push any dependencies onto a stack
and evaluate them in a depth first manner, mirroring what would have happened
via the normal API calls.

	PR tree-optimization/103231
	gcc/
	* gimple-range.cc (gimple_ranger::gimple_ranger): Create stmt stack.
	(gimple_ranger::gimple_ranger): Delete stmt stack.
	(gimple_ranger::range_of_stmt): Process depenedencies if they have no
	global cache entry.
	(gimple_ranger::prefill_name): New.
	(gimple_ranger::prefill_stmt_dependencies): New.
	* gimple-range.h (class gimple_ranger): Add prototypes.
---
 gcc/gimple-range.cc | 107 +++-
 gcc/gimple-range.h  |   4 ++
 2 files changed, 109 insertions(+), 2 deletions(-)

diff --git a/gcc/gimple-range.cc b/gcc/gimple-range.cc
index e3ab3a8bb48..178a470a419 100644
--- a/gcc/gimple-range.cc
+++ b/gcc/gimple-range.cc
@@ -46,6 +46,9 @@ gimple_ranger::gimple_ranger () :
   m_oracle = m_cache.oracle ();
   if (dump_file && (param_ranger_debug & RANGER_DEBUG_TRACE))
 tracer.enable_trace ();
+  m_stmt_list.create (0);
+  m_stmt_list.safe_grow (num_ssa_names);
+  m_stmt_list.truncate (0);
 
   // Ensure the not_executable flag is clear everywhere.
   if (flag_checking)
@@ -61,6 +64,11 @@ gimple_ranger::gimple_ranger () :
 }
 }
 
+gimple_ranger::~gimple_ranger ()
+{
+  m_stmt_list.release ();
+}
+
 bool
 gimple_ranger::range_of_expr (irange , tree expr, gimple *stmt)
 {
@@ -284,9 +292,10 @@ gimple_ranger::range_of_stmt (irange , gimple *s, tree name)
   else
 {
   bool current;
-  // Check if the stmt has already been processed, and is not stale.
+  // Check if the stmt has already been processed.
   if (m_cache.get_global_range (r, name, current))
 	{
+	  // If it isn't stale,