Re: Building libgccjit with -fno-semantic-interposition? ( was Re: 1.76% performance loss in VRP due to inlining)

2024-06-10 Thread Sam James via Gcc
Andrea Corallo via Gcc  writes:

>> FWIW I've no idea if any libgccjit users are using semantic
>> interposition; I suspect the answer is "no one is using it".
>> 
>> Antoyo, Andrea [also CCed]: are either of you using semantic
>> interposition of symbols within libgccjit?
>
> Hi David,
>
> AFAIU in Emacs we are not relying on interposition of symbols.

FWIW, I've built GCC (inc. libgccjit for Emacs) with
-fno-semantic-interposition for a few years now and had no issues.

It's worth considering, I think, given the above.
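For reference, a hedged sketch of how such a build might be driven (prefix, paths, and language list are placeholders, not my actual configuration; `--enable-host-shared` is what a libgccjit build normally wants):

```shell
# Hypothetical bootstrap recipe; adjust paths and options to taste.
../gcc/configure --prefix=/opt/gcc \
                 --enable-languages=c,c++,jit \
                 --enable-host-shared
# Inject the flag into the stage2/stage3 compilers built during bootstrap.
make BOOT_CFLAGS='-O2 -fno-semantic-interposition'
```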

>
> Thanks
>
>   Andrea


Re: 1.76% performance loss in VRP due to inlining

2024-05-03 Thread Aldy Hernandez via Gcc
After some very painful analysis, I was able to reduce the degradation
we are experiencing in VRP to a handful of lines in the new
implementation of prange.

What happens is that any series of small changes to a new prange class
causes changes in the inlining of wide_int_storage elsewhere.  With
the attached patch, one difference lies in irange::singleton_p(tree
*).  Note that this is in irange, which is completely unrelated to the
new (unused) code.

Using trunk as the stage1 compiler, we can see the assembly for
irange::singleton_p(tree *) in value-range.cc is different with and
without my patch.

The number of calls into wide_int within irange::singleton_p(tree *) changes:

awk '/^_ZNK6irange11singleton_pEPP9tree_node/,/endproc/' value-range.s | grep 'call.*wide_int'

With mainline sources:

call    _ZN16wide_int_storageC2ERKS_
call    _Z16wide_int_to_treeP9tree_nodeRK8poly_intILj1E16generic_wide_intI20wide_int_ref_storageILb0ELb1

With the attached patch:

call    _ZN16wide_int_storageC2ERKS_
call    _ZN16wide_int_storageC2ERKS_
call    _Z16wide_int_to_treeP9tree_nodeRK8poly_intILj1E16generic_wide_intI20wide_int_ref_storageILb0ELb1
call    _ZN16wide_int_storageC2ERKS_

The additional calls correspond to the wide_int_storage copy constructor:

$ c++filt _ZN16wide_int_storageC2ERKS_
wide_int_storage::wide_int_storage(wide_int_storage const&)

Using -fno-semantic-interposition makes no difference.

Here are the relevant bits in the difference from -Winline with and
without my patch:

> inlined from ‘virtual bool irange::singleton_p(tree_node**) const’ at 
> /home/aldyh/src/gcc/gcc/value-range.cc:1254:40:
> /home/aldyh/src/gcc/gcc/wide-int.h:1196:8: warning: inlining failed in call 
> to ‘wide_int_storage::wide_int_storage(const wide_int_storage&)’: --param 
> inline-unit-growth limit reached [-Winline]
>  1196 | inline wide_int_storage::wide_int_storage (const wide_int_storage &x)
>   |^~~~
> /home/aldyh/src/gcc/gcc/wide-int.h:775:7: note: called from here
>   775 | class GTY(()) generic_wide_int : public storage
>   |   ^~~~
> /home/aldyh/src/gcc/gcc/wide-int.h:1196:8: warning: inlining failed in call 
> to ‘wide_int_storage::wide_int_storage(const wide_int_storage&)’: --param 
> inline-unit-growth limit reached [-Winline]
>  1196 | inline wide_int_storage::wide_int_storage (const wide_int_storage &x)
>   |^~~~
> /home/aldyh/src/gcc/gcc/wide-int.h:775:7: note: called from here
>   775 | class GTY(()) generic_wide_int : public storage
>   |   ^~~~
> In copy constructor 
> ‘generic_wide_int::generic_wide_int(const 
> generic_wide_int&)’,
> inlined from ‘wide_int irange::lower_bound(unsigned int) const’ at 
> /home/aldyh/src/gcc/gcc/value-range.h:1122:25,

Note that this is just one example.  There are also inlining
differences to irange::get_bitmask(), irange::union_bitmask(),
irange::operator=, among others.  Most of the inlining failures seem
to be related to wide_int_storage.  I am attaching the difference in
-Winline for the curious.

Tracking this down is tricky because the slightest change in the patch
causes different inlining in irange.  Even using a slightly different
stage1 compiler produces different changes.  For example, using GCC 13
as the stage1 compiler, VRP exhibits a slowdown of 2% with the full
prange class.  Although this is virtually identical to the slowdown
for using trunk as the stage1 compiler, the inlining failures are a
tad different.

I am tempted to commit the attached patch to mainline.  It slows down
VRP by only 0.3%, but that is measurable enough to analyze, and it
would give us a base commit point from which to do the analysis.  My
wife is about to give birth any day now, so I'm afraid that if I drop
off for a few months, we'll lose the analysis and the point in time
from which to do it.

One final thing.  The full prange class patch, even when disabled,
slows VRP by 2%.  I tried to implement the class in small increments,
and every small change caused a further slowdown.  I don't know if
this 2% is final, or if further tweaks in this space will slow us down
more.

On a positive note, with the entirety of prange implemented (not just
the base class, but with range-ops implemented and prange enabled),
there is no overall change to VRP, and IPA-cp speeds up by 7%.  This is because
holding pointers in prange is a net win that overcomes the 2% handicap
the inliner is hitting us with.

I would love to hear your thoughts, and whether y'all agree that
committing a small skeleton now can help us track this down in the future.

Aldy

On Tue, Apr 30, 2024 at 11:37 PM Jason Merrill  wrote:
>
> On 4/30/24 12:22, Jakub Jelinek wrote:
> > On Tue, Apr 30, 2024 at 03:09:51PM -0400, Jason Merrill via Gcc wrote:
> >> On Fri, Apr 26, 2024 at 5:44 AM Aldy Hernandez via Gcc  
> >> wrote:
> >>>
> >>> In implementing prange (pointer ranges), I have found a 1.74% slowdown
> >>> in VRP,

Re: Building libgccjit with -fno-semantic-interposition? ( was Re: 1.76% performance loss in VRP due to inlining)

2024-05-02 Thread Andrea Corallo via Gcc
> FWIW I've no idea if any libgccjit users are using semantic
> interposition; I suspect the answer is "no one is using it".
> 
> Antoyo, Andrea [also CCed]: are either of you using semantic
> interposition of symbols within libgccjit?

Hi David,

AFAIU in Emacs we are not relying on interposition of symbols.

Thanks

  Andrea


Building libgccjit with -fno-semantic-interposition? ( was Re: 1.76% performance loss in VRP due to inlining)

2024-04-30 Thread David Malcolm via Gcc
On Tue, 2024-04-30 at 21:15 +0200, Richard Biener via Gcc wrote:
> 
> 
> > Am 30.04.2024 um 21:11 schrieb Jason Merrill via Gcc
> > :
> > 
> > On Fri, Apr 26, 2024 at 5:44 AM Aldy Hernandez via Gcc
> >  wrote:
> > > 
> > > In implementing prange (pointer ranges), I have found a 1.74%
> > > slowdown
> > > in VRP, even without any code path actually using the code.  I
> > > have
> > > tracked this down to irange::get_bitmask() being compiled
> > > differently
> > > with and without the bare bones patch.  With the patch,
> > > irange::get_bitmask() has a lot of code inlined into it,
> > > particularly
> > > get_bitmask_from_range() and consequently the wide_int_storage
> > > code.
> > ...
> > > +static irange_bitmask
> > > +get_bitmask_from_range (tree type,
> > > + const wide_int &min, const wide_int &max)
> > ...
> > > -irange_bitmask
> > > -irange::get_bitmask_from_range () const
> > 
> > My guess is that this is the relevant change: the old function has
> > external linkage, and is therefore interposable, which inhibits
> > inlining.  The new function has internal linkage, which allows
> > inlining.
> > 
> > Relatedly, I wonder if we want to build GCC with -fno-semantic-
> > interposition?
> 
> I guess that’s a good idea, though it’s already implied when doing
> LTO bootstrap and building cc1 and friends?  (But not for libgccjit?)

[CCing jit mailing list]

FWIW I've no idea if any libgccjit users are using semantic
interposition; I suspect the answer is "no one is using it".

Antoyo, Andrea [also CCed]: are either of you using semantic
interposition of symbols within libgccjit?

If not, we *might* get a slightly faster libgccjit by building it with
-fno-semantic-interposition.  Or maybe not...


Dave
 
> 
> Richard 
> 
> > 
> > Jason
> > 
> 



Re: 1.76% performance loss in VRP due to inlining

2024-04-30 Thread Jason Merrill via Gcc

On 4/30/24 12:22, Jakub Jelinek wrote:

On Tue, Apr 30, 2024 at 03:09:51PM -0400, Jason Merrill via Gcc wrote:

On Fri, Apr 26, 2024 at 5:44 AM Aldy Hernandez via Gcc  wrote:


In implementing prange (pointer ranges), I have found a 1.74% slowdown
in VRP, even without any code path actually using the code.  I have
tracked this down to irange::get_bitmask() being compiled differently
with and without the bare bones patch.  With the patch,
irange::get_bitmask() has a lot of code inlined into it, particularly
get_bitmask_from_range() and consequently the wide_int_storage code.

...

+static irange_bitmask
+get_bitmask_from_range (tree type,
+ const wide_int &min, const wide_int &max)

...

-irange_bitmask
-irange::get_bitmask_from_range () const


My guess is that this is the relevant change: the old function has
external linkage, and is therefore interposable, which inhibits
inlining.  The new function has internal linkage, which allows
inlining.


Even when a function is exported, when not compiled with -fpic/-fPIC
if we know the function is defined in current TU, it can't be interposed,


Ah, I was misremembering the effect of the change.  Rather, it's that if 
we see that a function with internal linkage has only a single caller, 
we try harder to inline it.


Jason



Re: 1.76% performance loss in VRP due to inlining

2024-04-30 Thread Jakub Jelinek via Gcc
On Tue, Apr 30, 2024 at 03:09:51PM -0400, Jason Merrill via Gcc wrote:
> On Fri, Apr 26, 2024 at 5:44 AM Aldy Hernandez via Gcc  
> wrote:
> >
> > In implementing prange (pointer ranges), I have found a 1.74% slowdown
> > in VRP, even without any code path actually using the code.  I have
> > tracked this down to irange::get_bitmask() being compiled differently
> > with and without the bare bones patch.  With the patch,
> > irange::get_bitmask() has a lot of code inlined into it, particularly
> > get_bitmask_from_range() and consequently the wide_int_storage code.
> ...
> > +static irange_bitmask
> > +get_bitmask_from_range (tree type,
> > + const wide_int &min, const wide_int &max)
> ...
> > -irange_bitmask
> > -irange::get_bitmask_from_range () const
> 
> My guess is that this is the relevant change: the old function has
> external linkage, and is therefore interposable, which inhibits
> inlining.  The new function has internal linkage, which allows
> inlining.

Even when a function is exported, when not compiled with -fpic/-fPIC
if we know the function is defined in current TU, it can't be interposed,
Try
int
foo (int x)
{
  return x + 1;
}

int
bar (int x, int y)
{
  return foo (x) + foo (y);
}
with -O2 -fpic -fno-semantic-interposition vs. -O2 -fpic vs. -O2 -fpie vs.
-O2.

> Relatedly, I wonder if we want to build GCC with -fno-semantic-interposition?

It could be useful just for libgccjit.  And I'm not sure that
libgccjit users won't want to interpose something.

Jakub



Re: 1.76% performance loss in VRP due to inlining

2024-04-30 Thread Richard Biener via Gcc



> Am 30.04.2024 um 21:11 schrieb Jason Merrill via Gcc :
> 
> On Fri, Apr 26, 2024 at 5:44 AM Aldy Hernandez via Gcc  
> wrote:
>> 
>> In implementing prange (pointer ranges), I have found a 1.74% slowdown
>> in VRP, even without any code path actually using the code.  I have
>> tracked this down to irange::get_bitmask() being compiled differently
>> with and without the bare bones patch.  With the patch,
>> irange::get_bitmask() has a lot of code inlined into it, particularly
>> get_bitmask_from_range() and consequently the wide_int_storage code.
> ...
>> +static irange_bitmask
>> +get_bitmask_from_range (tree type,
>> + const wide_int &min, const wide_int &max)
> ...
>> -irange_bitmask
>> -irange::get_bitmask_from_range () const
> 
> My guess is that this is the relevant change: the old function has
> external linkage, and is therefore interposable, which inhibits
> inlining.  The new function has internal linkage, which allows
> inlining.
> 
> Relatedly, I wonder if we want to build GCC with -fno-semantic-interposition?

I guess that’s a good idea, though it’s already implied when doing LTO 
bootstrap and building cc1 and friends?  (But not for libgccjit?)

Richard 

> 
> Jason
> 


Re: 1.76% performance loss in VRP due to inlining

2024-04-30 Thread Jason Merrill via Gcc
On Fri, Apr 26, 2024 at 5:44 AM Aldy Hernandez via Gcc  wrote:
>
> In implementing prange (pointer ranges), I have found a 1.74% slowdown
> in VRP, even without any code path actually using the code.  I have
> tracked this down to irange::get_bitmask() being compiled differently
> with and without the bare bones patch.  With the patch,
> irange::get_bitmask() has a lot of code inlined into it, particularly
> get_bitmask_from_range() and consequently the wide_int_storage code.
...
> +static irange_bitmask
> +get_bitmask_from_range (tree type,
> + const wide_int &min, const wide_int &max)
...
> -irange_bitmask
> -irange::get_bitmask_from_range () const

My guess is that this is the relevant change: the old function has
external linkage, and is therefore interposable, which inhibits
inlining.  The new function has internal linkage, which allows
inlining.

Relatedly, I wonder if we want to build GCC with -fno-semantic-interposition?

Jason



Re: 1.76% performance loss in VRP due to inlining

2024-04-30 Thread Martin Jambor
Hi,

On Fri, Apr 26 2024, Aldy Hernandez via Gcc wrote:
> Hi folks!
>
> In implementing prange (pointer ranges), I have found a 1.74% slowdown
> in VRP, even without any code path actually using the code.  I have
> tracked this down to irange::get_bitmask() being compiled differently
> with and without the bare bones patch.  With the patch,
> irange::get_bitmask() has a lot of code inlined into it, particularly
> get_bitmask_from_range() and consequently the wide_int_storage code.
>
> I don't know whether this is expected behavior, and if it is, how to
> mitigate it.  I have tried declaring get_bitmask_from_range() inline,
> but that didn't help.  OTOH, using __attribute__((always_inline))
> helps a bit, but not entirely.  What does help is inlining
> irange::get_bitmask() entirely, but that seems like a big hammer.
>
> The overall slowdown in compilation is 0.26%, because VRP is a
> relatively fast pass, but a measurable pass slowdown is something we'd
> like to avoid.
>
> What's the recommended approach here?

I'm afraid that the right approach (not sure if that also means the
recommended approach) is to figure out why inlining
irange::get_bitmask() helps, i.e. what unnecessary computations or
memory accesses it avoids or which other subsequent optimizations it
enables, etc.  Then we can have a look whether IPA could facilitate this
without inlining (or if eventually code shrinks to a reasonable size,
how to teach the inliner to predict this).

Martin


>
> For the curious, I am attaching before and after copies of
> value-range.s.  I am also attaching the two patches needed to
> reproduce the problem on mainline.  The first patch is merely setup.
> It is the second patch that exhibits the problem.  Notice there are no
> uses of prange yet.
>
> Thanks.
> Aldy
> From ee63833c5f56064ef47c2bb9debd485f77d00171 Mon Sep 17 00:00:00 2001
> From: Aldy Hernandez 
> Date: Tue, 19 Mar 2024 18:04:55 +0100
> Subject: [PATCH] Move get_bitmask_from_range out of irange class.
>
> ---
>  gcc/value-range.cc | 52 +++---
>  gcc/value-range.h  |  1 -
>  2 files changed, 26 insertions(+), 27 deletions(-)
>
> diff --git a/gcc/value-range.cc b/gcc/value-range.cc
> index 70375f7abf9..0f81ce32615 100644
> --- a/gcc/value-range.cc
> +++ b/gcc/value-range.cc
> @@ -31,6 +31,30 @@ along with GCC; see the file COPYING3.  If not see
>  #include "fold-const.h"
>  #include "gimple-range.h"
>  
> +// Return the bitmask inherent in a range.
> +
> +static irange_bitmask
> +get_bitmask_from_range (tree type,
> + const wide_int &min, const wide_int &max)
> +{
> +  unsigned prec = TYPE_PRECISION (type);
> +
> +  // All the bits of a singleton are known.
> +  if (min == max)
> +    {
> +      wide_int mask = wi::zero (prec);
> +      wide_int value = min;
> +      return irange_bitmask (value, mask);
> +    }
> +
> +  wide_int xorv = min ^ max;
> +
> +  if (xorv != 0)
> +    xorv = wi::mask (prec - wi::clz (xorv), false, prec);
> +
> +  return irange_bitmask (wi::zero (prec), min | xorv);
> +}
> +
>  void
>  irange::accept (const vrange_visitor &v) const
>  {
> @@ -1832,31 +1856,6 @@ irange::invert ()
>  verify_range ();
>  }
>  
> -// Return the bitmask inherent in the range.
> -
> -irange_bitmask
> -irange::get_bitmask_from_range () const
> -{
> -  unsigned prec = TYPE_PRECISION (type ());
> -  wide_int min = lower_bound ();
> -  wide_int max = upper_bound ();
> -
> -  // All the bits of a singleton are known.
> -  if (min == max)
> -    {
> -      wide_int mask = wi::zero (prec);
> -      wide_int value = lower_bound ();
> -      return irange_bitmask (value, mask);
> -    }
> -
> -  wide_int xorv = min ^ max;
> -
> -  if (xorv != 0)
> -    xorv = wi::mask (prec - wi::clz (xorv), false, prec);
> -
> -  return irange_bitmask (wi::zero (prec), min | xorv);
> -}
> -
>  // Remove trailing ranges that this bitmask indicates can't exist.
>  
>  void
> @@ -1978,7 +1977,8 @@ irange::get_bitmask () const
>// in the mask.
>//
>// See also the note in irange_bitmask::intersect.
> -  irange_bitmask bm = get_bitmask_from_range ();
> +  irange_bitmask bm
> +    = get_bitmask_from_range (type (), lower_bound (), upper_bound ());
>if (!m_bitmask.unknown_p ())
>  bm.intersect (m_bitmask);
>return bm;
> diff --git a/gcc/value-range.h b/gcc/value-range.h
> index 9531df56988..dc5b153a83e 100644
> --- a/gcc/value-range.h
> +++ b/gcc/value-range.h
> @@ -351,7 +351,6 @@ private:
>bool varying_compatible_p () const;
>bool intersect_bitmask (const irange &r);
>bool union_bitmask (const irange &r);
> -  irange_bitmask get_bitmask_from_range () const;
>bool set_range_from_bitmask ();
>  
>bool intersect (const wide_int& lb, const wide_int& ub);
> -- 
> 2.44.0
>
> From 03c70de43177a97ec5e9c243aafc798c1f37e6d8 Mon Sep 17 00:00:00 2001
> From: Aldy Hernandez 
> Date: Wed, 20 Mar 2024 06:25:52 +0100
> Subject: [PATCH] Implement minimum prange class exhibiting VRP slowdown.

Re: 1.76% performance loss in VRP due to inlining

2024-04-30 Thread Aldy Hernandez via Gcc
On Tue, Apr 30, 2024 at 9:58 AM Richard Biener
 wrote:
>
> On Fri, Apr 26, 2024 at 11:45 AM Aldy Hernandez via Gcc  
> wrote:
> >
> > Hi folks!
> >
> > In implementing prange (pointer ranges), I have found a 1.74% slowdown
> > in VRP, even without any code path actually using the code.  I have
> > tracked this down to irange::get_bitmask() being compiled differently
> > with and without the bare bones patch.  With the patch,
> > irange::get_bitmask() has a lot of code inlined into it, particularly
> > get_bitmask_from_range() and consequently the wide_int_storage code.
> >
> > I don't know whether this is expected behavior, and if it is, how to
> > mitigate it.  I have tried declaring get_bitmask_from_range() inline,
> > but that didn't help.  OTOH, using __attribute__((always_inline))
> > helps a bit, but not entirely.  What does help is inlining
> > irange::get_bitmask() entirely, but that seems like a big hammer.
>
> You can use -Winline to see why we don't inline an inline declared
> function.  I would guess the unit-growth limit kicks in?

Ah, will do.  Thanks.

>
> Did you check a release checking compiler?  That might still
> inline things.

Yes, we only measure performance with release builds.

Aldy



Re: 1.76% performance loss in VRP due to inlining

2024-04-30 Thread Richard Biener via Gcc
On Fri, Apr 26, 2024 at 11:45 AM Aldy Hernandez via Gcc  wrote:
>
> Hi folks!
>
> In implementing prange (pointer ranges), I have found a 1.74% slowdown
> in VRP, even without any code path actually using the code.  I have
> tracked this down to irange::get_bitmask() being compiled differently
> with and without the bare bones patch.  With the patch,
> irange::get_bitmask() has a lot of code inlined into it, particularly
> get_bitmask_from_range() and consequently the wide_int_storage code.
>
> I don't know whether this is expected behavior, and if it is, how to
> mitigate it.  I have tried declaring get_bitmask_from_range() inline,
> but that didn't help.  OTOH, using __attribute__((always_inline))
> helps a bit, but not entirely.  What does help is inlining
> irange::get_bitmask() entirely, but that seems like a big hammer.

You can use -Winline to see why we don't inline an inline declared
function.  I would guess the unit-growth limit kicks in?

Did you check a release checking compiler?  That might still
inline things.

> The overall slowdown in compilation is 0.26%, because VRP is a
> relatively fast pass, but a measurable pass slowdown is something we'd
> like to avoid.
>
> What's the recommended approach here?
>
> For the curious, I am attaching before and after copies of
> value-range.s.  I am also attaching the two patches needed to
> reproduce the problem on mainline.  The first patch is merely setup.
> It is the second patch that exhibits the problem.  Notice there are no
> uses of prange yet.
>
> Thanks.
> Aldy


1.76% performance loss in VRP due to inlining

2024-04-26 Thread Aldy Hernandez via Gcc
Hi folks!

In implementing prange (pointer ranges), I have found a 1.74% slowdown
in VRP, even without any code path actually using the code.  I have
tracked this down to irange::get_bitmask() being compiled differently
with and without the bare bones patch.  With the patch,
irange::get_bitmask() has a lot of code inlined into it, particularly
get_bitmask_from_range() and consequently the wide_int_storage code.

I don't know whether this is expected behavior, and if it is, how to
mitigate it.  I have tried declaring get_bitmask_from_range() inline,
but that didn't help.  OTOH, using __attribute__((always_inline))
helps a bit, but not entirely.  What does help is inlining
irange::get_bitmask() entirely, but that seems like a big hammer.

The overall slowdown in compilation is 0.26%, because VRP is a
relatively fast pass, but a measurable pass slowdown is something we'd
like to avoid.

What's the recommended approach here?

For the curious, I am attaching before and after copies of
value-range.s.  I am also attaching the two patches needed to
reproduce the problem on mainline.  The first patch is merely setup.
It is the second patch that exhibits the problem.  Notice there are no
uses of prange yet.

Thanks.
Aldy
From ee63833c5f56064ef47c2bb9debd485f77d00171 Mon Sep 17 00:00:00 2001
From: Aldy Hernandez 
Date: Tue, 19 Mar 2024 18:04:55 +0100
Subject: [PATCH] Move get_bitmask_from_range out of irange class.

---
 gcc/value-range.cc | 52 +++---
 gcc/value-range.h  |  1 -
 2 files changed, 26 insertions(+), 27 deletions(-)

diff --git a/gcc/value-range.cc b/gcc/value-range.cc
index 70375f7abf9..0f81ce32615 100644
--- a/gcc/value-range.cc
+++ b/gcc/value-range.cc
@@ -31,6 +31,30 @@ along with GCC; see the file COPYING3.  If not see
 #include "fold-const.h"
 #include "gimple-range.h"
 
+// Return the bitmask inherent in a range.
+
+static irange_bitmask
+get_bitmask_from_range (tree type,
+			const wide_int &min, const wide_int &max)
+{
+  unsigned prec = TYPE_PRECISION (type);
+
+  // All the bits of a singleton are known.
+  if (min == max)
+    {
+      wide_int mask = wi::zero (prec);
+      wide_int value = min;
+      return irange_bitmask (value, mask);
+    }
+
+  wide_int xorv = min ^ max;
+
+  if (xorv != 0)
+    xorv = wi::mask (prec - wi::clz (xorv), false, prec);
+
+  return irange_bitmask (wi::zero (prec), min | xorv);
+}
+
 void
 irange::accept (const vrange_visitor &v) const
 {
@@ -1832,31 +1856,6 @@ irange::invert ()
 verify_range ();
 }
 
-// Return the bitmask inherent in the range.
-
-irange_bitmask
-irange::get_bitmask_from_range () const
-{
-  unsigned prec = TYPE_PRECISION (type ());
-  wide_int min = lower_bound ();
-  wide_int max = upper_bound ();
-
-  // All the bits of a singleton are known.
-  if (min == max)
-    {
-      wide_int mask = wi::zero (prec);
-      wide_int value = lower_bound ();
-      return irange_bitmask (value, mask);
-    }
-
-  wide_int xorv = min ^ max;
-
-  if (xorv != 0)
-    xorv = wi::mask (prec - wi::clz (xorv), false, prec);
-
-  return irange_bitmask (wi::zero (prec), min | xorv);
-}
-
 // Remove trailing ranges that this bitmask indicates can't exist.
 
 void
@@ -1978,7 +1977,8 @@ irange::get_bitmask () const
   // in the mask.
   //
   // See also the note in irange_bitmask::intersect.
-  irange_bitmask bm = get_bitmask_from_range ();
+  irange_bitmask bm
+    = get_bitmask_from_range (type (), lower_bound (), upper_bound ());
   if (!m_bitmask.unknown_p ())
 bm.intersect (m_bitmask);
   return bm;
diff --git a/gcc/value-range.h b/gcc/value-range.h
index 9531df56988..dc5b153a83e 100644
--- a/gcc/value-range.h
+++ b/gcc/value-range.h
@@ -351,7 +351,6 @@ private:
   bool varying_compatible_p () const;
   bool intersect_bitmask (const irange &r);
   bool union_bitmask (const irange &r);
-  irange_bitmask get_bitmask_from_range () const;
   bool set_range_from_bitmask ();
 
   bool intersect (const wide_int& lb, const wide_int& ub);
-- 
2.44.0

From 03c70de43177a97ec5e9c243aafc798c1f37e6d8 Mon Sep 17 00:00:00 2001
From: Aldy Hernandez 
Date: Wed, 20 Mar 2024 06:25:52 +0100
Subject: [PATCH] Implement minimum prange class exhibiting VRP slowdown.

---
 gcc/value-range-pretty-print.cc |  25 +++
 gcc/value-range-pretty-print.h  |   1 +
 gcc/value-range.cc  | 274 
 gcc/value-range.h   | 196 +++
 4 files changed, 496 insertions(+)

diff --git a/gcc/value-range-pretty-print.cc b/gcc/value-range-pretty-print.cc
index c75cbea3955..154253e047f 100644
--- a/gcc/value-range-pretty-print.cc
+++ b/gcc/value-range-pretty-print.cc
@@ -113,6 +113,31 @@ vrange_printer::print_irange_bitmasks (const irange &r) const
   pp_string (pp, p);
 }
 
+void
+vrange_printer::visit (const prange &r) const
+{
+  pp_string (pp, "[prange] ");
+  if (r.undefined_p ())
+    {
+      pp_string (pp, "UNDEFINED");
+      return;
+    }
+  dump_generic_node (pp, r.type (), 0