
Re: [PATCH] AArch64: Improve address rematerialization costs

2022-05-12 Thread Richard Sandiford via Gcc-patches

Wilco Dijkstra  writes:
> Hi Richard,
>
>> But even if the costs are too high, the patch seems to be overcompensating.
>> It doesn't make logical sense for an ADRP+LDR to be cheaper than an LDR.
>
> An LDR is not a replacement for ADRP+LDR; you need a store in addition to the
> original ADRP+LDR. Basically a simple spill would be comparing these 2
> sequences:
>
> ADRP x0, ...
> LDR x0, [x0, ...]
> STR x0, [SP, ...]
> ...
> LDR x0, [SP, ...]
>
>
> ADRP x0, ...
> LDR x0, [x0, ...]
> ...
> ADRP x0, ...
> LDR x0, [x0, ...]
>
> Obviously it's far cheaper to do the latter than the former.

Sure.  Like I say, I'm not disagreeing with the intent of reducing
spilling and promoting rematerialisation.  I agree we should do that.

I'm just disagreeing with the approach of using rtx_costs.  The rtx_cost
hook isn't being asked the question: is spilling this better value than
rematerialising it?  It's being asked for the cost of an operation, on
the understanding that that cost will be compared with the cost of other
operations.  An ADRP+LDR operation then ought to be at least as costly
as an LDR, because in a two-way comparison, it is.

[…]

>> Maybe it would help to turn the question around for a minute.  Can we
>> describe the cases in which it's *better* for the RA to spill a constant
>> address to the stack and reload it, rather than rematerialise on demand?
>
> Rematerialization is almost always better than spilling and reloading from the
> stack. If the constant requires multiple instructions and there are more than
> 2 references it would be better for codesize to spill, but for performance it
> is better to rematerialize unless there are many references.
>
> You also want to prefer rematerialization over spilling a different live range
> when other aspects are comparable.

Yeah, that's what I thought the answer would be.  So the question is:
why is the RA choosing to spill and reload rather than rematerialise
these values?  Does it not know how to rematerialise them, and so we
rely on earlier passes not reusing the constants?  Or does the RA
know how but decides it isn't worthwhile, because of the way that
the RA uses the target costs?  If the latter, I would be much happier with
a new hook that allows the target to force the RA to rematerialise a given
value, if that's the heuristic we want to use when optimising for speed.

Thanks,
Richard

Re: [PATCH] AArch64: Improve address rematerialization costs

2022-05-12 Thread Wilco Dijkstra via Gcc-patches

Hi Richard,

> But even if the costs are too high, the patch seems to be overcompensating.
> It doesn't make logical sense for an ADRP+LDR to be cheaper than an LDR.

An LDR is not a replacement for ADRP+LDR; you need a store in addition to the
original ADRP+LDR. Basically a simple spill would be comparing these 2
sequences:

ADRP x0, ...
LDR x0, [x0, ...]
STR x0, [SP, ...]
...
LDR x0, [SP, ...]


ADRP x0, ...
LDR x0, [x0, ...]
...
ADRP x0, ...
LDR x0, [x0, ...]

Obviously it's far cheaper to do the latter than the former.

> Giving X zero cost means that a sequence like:
>
>   (set (reg x0) X)
>   (set (reg x1) X)
>
> should stay as-is rather than be changed to:
>
>   (set (reg x0) X)
>   (set (reg x1) (reg x0))
>
> I don't think we want that for multi-instruction constants when
> optimising for size.

I don't believe this is a real problem. The cost queries for address constants
come from register allocation; I don't see them affect other optimizations.

> Yeah, I wasn't suggesting that we increase the spill costs.  I'm saying

I'm saying that because we've set the spill costs low on purpose to work around
register allocation bugs. There have been some fixes since, so increasing the
spill costs may now be feasible (but not trivial).

> that we should look at whether the target-independent RA heuristics need
> to change, whether new target hooks are needed, etc.  We shouldn't go
> into this with the assumption that the target-independent code is
> invariant and that any fix must be in existing aarch64 hooks (rtx costs
> or spill costs).

But what bug do you think exists in target-independent code? It behaves
correctly once we supply more accurate costs. If there were no rematerialization
irrespective of the cost settings then you could claim there was a bug.

> Maybe it would help to turn the question around for a minute.  Can we
> describe the cases in which it's *better* for the RA to spill a constant
> address to the stack and reload it, rather than rematerialise on demand?

Rematerialization is almost always better than spilling and reloading from the
stack. If the constant requires multiple instructions and there are more than 2
references it would be better for codesize to spill, but for performance it is
better to rematerialize unless there are many references.
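
To put rough numbers on the codesize point, here is a minimal sketch that only
counts instructions. It assumes the value cannot stay in a register between
uses and ignores latency and cache effects; the function names are made up for
illustration and are not GCC code.

```cpp
#include <cassert>

// insns_per_value: instructions needed to compute the constant
//                  (for example 2 for ADRP+LDR).
// uses:            number of places where the value is needed.

// Spill strategy: compute once, store to the stack once, then issue one
// stack load for every use after the first.
int spill_insns(int insns_per_value, int uses) {
  assert(insns_per_value >= 1 && uses >= 1);
  return insns_per_value + 1 + (uses - 1);
}

// Rematerialization strategy: recompute the value at every use.
int remat_insns(int insns_per_value, int uses) {
  assert(insns_per_value >= 1 && uses >= 1);
  return insns_per_value * uses;
}
```

Under this model a two-instruction constant ties at two uses (4 instructions
either way, as in the two sequences above), and spilling only becomes smaller
from the third use on. A single-instruction constant is always smaller when
rematerialized.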

You also want to prefer rematerialization over spilling a different live range
when other aspects are comparable.

Cheers,
Wilco

Re: [PATCH] AArch64: Improve address rematerialization costs

2022-05-12 Thread Wilco Dijkstra via Gcc-patches

Hi,

>> It's also said that chosen alternatives might be the reason that
>> rematerialization is not chosen and alternatives are chosen based on reload
>> heuristics, not based on actual costs.
>
> Thanks for the pointer.  Yeah, it'd be interesting to know if this
> is the same issue, although I fear the only way of knowing for sure
> is to fix it first and see whether both targets benefit. ;-)

I don't believe this is the same issue - there are indeed lots of register
allocation problems, and many are caused by the complex design. All the
alternatives and register classes create a huge cross product, making it almost
impossible to make good allocation decisions even if they were accurately costed.

I've found that the correct way to deal with this is to reduce all this choice
as much as possible. That means splitting instructions into simpler ones with
fewer alternatives and register classes. You also need to block it from treating
all register classes as equivalent - on AArch64 we had to force floating point
values to be allocated to floating point registers (which is obviously how any
register allocator should work by default), but maybe x86 doesn't do that yet.

Cheers,
Wilco

Re: [PATCH] AArch64: Improve address rematerialization costs

2022-05-12 Thread Richard Sandiford via Gcc-patches

Richard Biener  writes:
> On Wed, May 11, 2022 at 2:23 PM Richard Sandiford via Gcc-patches
>  wrote:
>>
>> Wilco Dijkstra  writes:
>> > Hi Richard,
>> >
>> >> Yeah, I'm not disagreeing with any of that.  It's just a question of
>> >> whether the problem should be fixed by artificially lowering the general
>> >> rtx costs with one particular user (RA spill costs) in mind, or whether
>> >> it should be fixed by making the RA spill code take the factors above
>> >> into account.
>> >
>> > The RA spill code already works fine on immediates but not on address
>> > constants. And the reason is that the current rtx costs for addresses are
>> > set artificially high without justification (I checked the patch that
>> > increased the costs but there was nothing explaining why it was beneficial).
>>
>> But even if the costs are too high, the patch seems to be overcompensating.
>> It doesn't make logical sense for an ADRP+LDR to be cheaper than an LDR.
>>
>> Giving X zero cost means that a sequence like:
>>
>>   (set (reg x0) X)
>>   (set (reg x1) X)
>>
>> should stay as-is rather than be changed to:
>>
>>   (set (reg x0) X)
>>   (set (reg x1) (reg x0))
>>
>> I don't think we want that for multi-instruction constants when
>> optimising for size.
>>
>> > It's certainly possible to experiment with increasing the spill costs, but
>> > that won't improve the issue with address constants unless they are at least
>> > doubled. And it has the effect of halving all rtx costs in the register
>> > allocator which is likely to cause regressions. So we'd need to adjust many
>> > rtx costs to keep the allocator working, plus fix any further regressions
>> > this causes.
>>
>> Yeah, I wasn't suggesting that we increase the spill costs.  I'm saying
>> that we should look at whether the target-independent RA heuristics need
>> to change, whether new target hooks are needed, etc.  We shouldn't go
>> into this with the assumption that the target-independent code is
>> invariant and that any fix must be in existing aarch64 hooks (rtx costs
>> or spill costs).
>>
>> Maybe it would help to turn the question around for a minute.  Can we
>> describe the cases in which it's *better* for the RA to spill a constant
>> address to the stack and reload it, rather than rematerialise on demand?
>
> From the discussion in PR102178 it seems that LRA cannot rematerialize
> all "constants" (though here it is constant pool loads).  Some constants
> might also not be 'constant'.   See the PR for more fun "spilling" behavior
> on x86_64.
>
> It's also said that chosen alternatives might be the reason that
> rematerialization is not chosen and alternatives are chosen based on reload
> heuristics, not based on actual costs.

Thanks for the pointer.  Yeah, it'd be interesting to know if this
is the same issue, although I fear the only way of knowing for sure
is to fix it first and see whether both targets benefit. ;-)

Richard

Re: [PATCH] AArch64: Improve address rematerialization costs

2022-05-11 Thread Richard Biener via Gcc-patches

On Wed, May 11, 2022 at 2:23 PM Richard Sandiford via Gcc-patches
 wrote:
>
> Wilco Dijkstra  writes:
> > Hi Richard,
> >
> >> Yeah, I'm not disagreeing with any of that.  It's just a question of
> >> whether the problem should be fixed by artificially lowering the general
> >> rtx costs with one particular user (RA spill costs) in mind, or whether
> >> it should be fixed by making the RA spill code take the factors above
> >> into account.
> >
> > The RA spill code already works fine on immediates but not on address
> > constants. And the reason is that the current rtx costs for addresses are
> > set artificially high without justification (I checked the patch that
> > increased the costs but there was nothing explaining why it was beneficial).
>
> But even if the costs are too high, the patch seems to be overcompensating.
> It doesn't make logical sense for an ADRP+LDR to be cheaper than an LDR.
>
> Giving X zero cost means that a sequence like:
>
>   (set (reg x0) X)
>   (set (reg x1) X)
>
> should stay as-is rather than be changed to:
>
>   (set (reg x0) X)
>   (set (reg x1) (reg x0))
>
> I don't think we want that for multi-instruction constants when
> optimising for size.
>
> > It's certainly possible to experiment with increasing the spill costs, but
> > that won't improve the issue with address constants unless they are at least
> > doubled. And it has the effect of halving all rtx costs in the register
> > allocator which is likely to cause regressions. So we'd need to adjust many
> > rtx costs to keep the allocator working, plus fix any further regressions
> > this causes.
>
> Yeah, I wasn't suggesting that we increase the spill costs.  I'm saying
> that we should look at whether the target-independent RA heuristics need
> to change, whether new target hooks are needed, etc.  We shouldn't go
> into this with the assumption that the target-independent code is
> invariant and that any fix must be in existing aarch64 hooks (rtx costs
> or spill costs).
>
> Maybe it would help to turn the question around for a minute.  Can we
> describe the cases in which it's *better* for the RA to spill a constant
> address to the stack and reload it, rather than rematerialise on demand?

From the discussion in PR102178 it seems that LRA cannot rematerialize
all "constants" (though here it is constant pool loads).  Some constants
might also not be 'constant'.   See the PR for more fun "spilling" behavior
on x86_64.

It's also said that chosen alternatives might be the reason that
rematerialization is not chosen and alternatives are chosen based on reload
heuristics, not based on actual costs.

Richard.

> Thanks,
> Richard

Re: [PATCH] AArch64: Improve address rematerialization costs

2022-05-11 Thread Richard Sandiford via Gcc-patches

Wilco Dijkstra  writes:
> Hi Richard,
>
>> Yeah, I'm not disagreeing with any of that.  It's just a question of
>> whether the problem should be fixed by artificially lowering the general
>> rtx costs with one particular user (RA spill costs) in mind, or whether
>> it should be fixed by making the RA spill code take the factors above
>> into account.
>
> The RA spill code already works fine on immediates but not on address
> constants. And the reason is that the current rtx costs for addresses are
> set artificially high without justification (I checked the patch that
> increased the costs but there was nothing explaining why it was beneficial).

But even if the costs are too high, the patch seems to be overcompensating.
It doesn't make logical sense for an ADRP+LDR to be cheaper than an LDR.

Giving X zero cost means that a sequence like:

  (set (reg x0) X)
  (set (reg x1) X)

should stay as-is rather than be changed to:

  (set (reg x0) X)
  (set (reg x1) (reg x0))

I don't think we want that for multi-instruction constants when
optimising for size.
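
As a rough illustration of that two-way comparison, here is a minimal sketch of
the decision a cost-driven pass would make for the second set. The helper name
and its parameters are hypothetical and do not correspond to any particular
GCC pass or hook.

```cpp
#include <cassert>

// Hypothetical helper, not a GCC function.
// Returns true when a pass that compares costs would rewrite
//   (set (reg x1) X)
// into
//   (set (reg x1) (reg x0))
// that is, when copying the register is strictly cheaper than recomputing X.
bool prefer_register_copy(int cost_of_x, int cost_of_reg_copy) {
  assert(cost_of_x >= 0 && cost_of_reg_copy >= 0);
  return cost_of_reg_copy < cost_of_x;
}
```

With X costed at zero no copy can ever be strictly cheaper, so under this
sketch the duplicated computation of X is always kept, however many
instructions X really takes.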

> It's certainly possible to experiment with increasing the spill costs, but
> that won't improve the issue with address constants unless they are at least
> doubled. And it has the effect of halving all rtx costs in the register
> allocator which is likely to cause regressions. So we'd need to adjust many
> rtx costs to keep the allocator working, plus fix any further regressions
> this causes.

Yeah, I wasn't suggesting that we increase the spill costs.  I'm saying
that we should look at whether the target-independent RA heuristics need
to change, whether new target hooks are needed, etc.  We shouldn't go
into this with the assumption that the target-independent code is
invariant and that any fix must be in existing aarch64 hooks (rtx costs
or spill costs).

Maybe it would help to turn the question around for a minute.  Can we
describe the cases in which it's *better* for the RA to spill a constant
address to the stack and reload it, rather than rematerialise on demand?

Thanks,
Richard

Re: [PATCH] AArch64: Improve address rematerialization costs

2022-05-11 Thread Wilco Dijkstra via Gcc-patches

Hi Richard,

> Yeah, I'm not disagreeing with any of that.  It's just a question of
> whether the problem should be fixed by artificially lowering the general
> rtx costs with one particular user (RA spill costs) in mind, or whether
> it should be fixed by making the RA spill code take the factors above
> into account.

The RA spill code already works fine on immediates but not on address
constants. And the reason is that the current rtx costs for addresses are
set artificially high without justification (I checked the patch that increased
the costs but there was nothing explaining why it was beneficial).

It's certainly possible to experiment with increasing the spill costs, but that
won't improve the issue with address constants unless they are at least doubled.
And it has the effect of halving all rtx costs in the register allocator which
is likely to cause regressions. So we'd need to adjust many rtx costs to keep
the allocator working, plus fix any further regressions this causes.

Cheers,
Wilco

Re: [PATCH] AArch64: Improve address rematerialization costs

2022-05-10 Thread Richard Sandiford via Gcc-patches

Wilco Dijkstra  writes:
> Hi Richard,
>
>>> There isn't really a better way of doing this within the existing costing 
>>> code.
>>
>> Yeah, I was wondering whether we could change something there.
>> ADRP+LDR is logically more expensive than a single LDR, especially
>> when optimising for size, so I think it's reasonable for the rtx_costs
>> to say so.  But that doesn't/shouldn't mean that spilling is better
>> (for either size or speed).
>>
>> So it feels like there's something missing in the way the costs are
>> being applied.
>
> Calculating accurate spill costs is hard. Spill optimization is done later, so
> until then you can't know the actual cost of a spill decision already made.
> Spills are also more expensive than you think due to store latency, more
> dirty cachelines etc. There is little benefit in lifting an ADRP to the start
> of a function while keeping ADD/LDR close to references. Basically ADRP/MOV
> are very cheap, so it's a waste to allocate these to long-lived registers.
>
> Given that there are significant codesize and performance improvements,
> it is clear that doing more rematerialization is better even in cases where it
> takes 2 instructions to recompute the address. Binaries show a significant
> reduction in stack-based loads and stores.

Yeah, I'm not disagreeing with any of that.  It's just a question of
whether the problem should be fixed by artificially lowering the general
rtx costs with one particular user (RA spill costs) in mind, or whether
it should be fixed by making the RA spill code take the factors above
into account.

Thanks,
Richard

Re: [PATCH] AArch64: Improve address rematerialization costs

2022-05-10 Thread Wilco Dijkstra via Gcc-patches

Hi Richard,

>> There isn't really a better way of doing this within the existing costing 
>> code.
>
> Yeah, I was wondering whether we could change something there.
> ADRP+LDR is logically more expensive than a single LDR, especially
> when optimising for size, so I think it's reasonable for the rtx_costs
> to say so.  But that doesn't/shouldn't mean that spilling is better
> (for either size or speed).
>
> So it feels like there's something missing in the way the costs are
> being applied.

Calculating accurate spill costs is hard. Spill optimization is done later, so
until then you can't know the actual cost of a spill decision already made.
Spills are also more expensive than you think due to store latency, more
dirty cachelines etc. There is little benefit in lifting an ADRP to the start of
a function while keeping ADD/LDR close to references. Basically ADRP/MOV are
very cheap, so it's a waste to allocate these to long-lived registers.

Given that there are significant codesize and performance improvements,
it is clear that doing more rematerialization is better even in cases where it
takes 2 instructions to recompute the address. Binaries show a significant
reduction in stack-based loads and stores.

Cheers,
Wilco

Re: [PATCH] AArch64: Improve address rematerialization costs

2022-05-09 Thread Richard Sandiford via Gcc-patches

Wilco Dijkstra via Gcc-patches  writes:
> Hi Richard,
>
>> I'm not questioning the results, but I think we need to look in more
>> detail why rematerialisation requires such low costs.  The point of
>> comparison should be against a spill and reload, so any constant
>> that is as cheap as a load should be rematerialised.  If that isn't
>> happening then it sounds like changes are needed elsewhere.
>
> The simple answer is that rematerializable expressions must have a lower cost
> than the spill cost (potentially of something else), otherwise
> rematerialization will never happen. The previous costs were set way too high
> (e.g. 12 for ADRP+LDR vs 4 for a reload). This patch basically ensures that is
> indeed the case. In principle a zero cost works fine for anything that can be
> rematerialized. However it may use more instructions than a spill (of
> something else), so a small non-zero cost avoids bloating codesize.
>
> There isn't really a better way of doing this within the existing costing
> code.

Yeah, I was wondering whether we could change something there.
ADRP+LDR is logically more expensive than a single LDR, especially
when optimising for size, so I think it's reasonable for the rtx_costs
to say so.  But that doesn't/shouldn't mean that spilling is better
(for either size or speed).

So it feels like there's something missing in the way the costs are
being applied.

Thanks,
Richard

> We could try doubling or quadrupling the spill costs but that would create a
> lot of fallout since it affects everything.

Re: [PATCH] AArch64: Improve address rematerialization costs

2022-05-09 Thread Wilco Dijkstra via Gcc-patches

Hi Richard,

> I'm not questioning the results, but I think we need to look in more
> detail why rematerialisation requires such low costs.  The point of
> comparison should be against a spill and reload, so any constant
> that is as cheap as a load should be rematerialised.  If that isn't
> happening then it sounds like changes are needed elsewhere.

The simple answer is that rematerializable expressions must have a lower cost
than the spill cost (potentially of something else), otherwise
rematerialization will never happen. The previous costs were set way too high
(e.g. 12 for ADRP+LDR vs 4 for a reload). This patch basically ensures that is
indeed the case. In principle a zero cost works fine for anything that can be
rematerialized. However it may use more instructions than a spill (of something
else), so a small non-zero cost avoids bloating codesize.

There isn't really a better way of doing this within the existing costing code.
We could try doubling or quadrupling the spill costs but that would create a
lot of fallout since it affects everything.
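
For reference, a minimal sketch of what the cost expressions in the patch
evaluate to. COSTS_N_INSNS is reproduced from GCC's rtl.h, where one
instruction is 4 cost units; the constant names and the helper are made up for
illustration, and the 12 and 4 are simply the figures quoted above.

```cpp
#include <cassert>

// Same definition as in GCC's rtl.h.
#define COSTS_N_INSNS(N) ((N) * 4)

// Costs assigned by the patch (names are illustrative only).
const int symbol_ref_cost = 0;
const int got_access_cost = COSTS_N_INSNS (1) / 2;  // GOT access, small PIC
const int high_cost = 0;
const int lo_sum_cost = COSTS_N_INSNS (3) / 4;

// Figures quoted in this thread.
const int old_adrp_ldr_cost = 12;
const int reload_cost = 4;

// Hypothetical helper: an expression is only a rematerialization candidate
// if its cost is lower than the cost of a reload.
bool cheaper_than_reload(int cost) {
  assert(cost >= 0);
  return cost < reload_cost;
}
```

The GOT cost works out to 2 and the LO_SUM cost to 3, so every new value stays
below the reload cost of 4 while the old ADRP+LDR cost of 12 does not.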

Cheers,
Wilco

Re: [PATCH] AArch64: Improve address rematerialization costs

2022-05-09 Thread Richard Sandiford via Gcc-patches

Wilco Dijkstra  writes:
> Improve rematerialization costs of addresses.  The current costs are set too
> high which results in extra register pressure and spilling.  Using lower costs
> means addresses will be rematerialized more often rather than being spilled or
> causing spills.  This results in significant codesize reductions and
> performance gains.  SPECINT2017 improves by 0.27% with LTO and 0.16% without
> LTO.  Codesize is 0.12% smaller.

I'm not questioning the results, but I think we need to look in more
detail why rematerialisation requires such low costs.  The point of
comparison should be against a spill and reload, so any constant
that is as cheap as a load should be rematerialised.  If that isn't
happening then it sounds like changes are needed elsewhere.

Thanks,
Richard

> Passes bootstrap and regress. OK for commit?
>
> ChangeLog:
> 2021-06-01  Wilco Dijkstra  
>
> * config/aarch64/aarch64.cc (aarch64_rtx_costs): Use better
> rematerialization costs for HIGH, LO_SUM and SYMBOL_REF.
>
> ---
>
> diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> index 43d87d1b9c4ef1a85094e51f81745f98f1ef27fb..7341849121ffd6b3b0b77c9730e74e751742e852 100644
> --- a/gcc/config/aarch64/aarch64.cc
> +++ b/gcc/config/aarch64/aarch64.cc
> @@ -14529,45 +14529,28 @@ cost_plus:
>   return false;  /* All arguments need to be in registers.  */
> }
>
> +/* The following costs are used for rematerialization of addresses.
> +   Set a low cost for all global accesses - this ensures they are
> +   preferred for rematerialization, blocks them from being spilled
> +   and reduces register pressure.  The result is significant codesize
> +   reductions and performance gains. */
> +
>  case SYMBOL_REF:
> +  *cost = 0;
>
> -  if (aarch64_cmodel == AARCH64_CMODEL_LARGE
> - || aarch64_cmodel == AARCH64_CMODEL_SMALL_SPIC)
> -   {
> - /* LDR.  */
> - if (speed)
> -   *cost += extra_cost->ldst.load;
> -   }
> -  else if (aarch64_cmodel == AARCH64_CMODEL_SMALL
> -  || aarch64_cmodel == AARCH64_CMODEL_SMALL_PIC)
> -   {
> - /* ADRP, followed by ADD.  */
> - *cost += COSTS_N_INSNS (1);
> - if (speed)
> -   *cost += 2 * extra_cost->alu.arith;
> -   }
> -  else if (aarch64_cmodel == AARCH64_CMODEL_TINY
> -  || aarch64_cmodel == AARCH64_CMODEL_TINY_PIC)
> -   {
> - /* ADR.  */
> - if (speed)
> -   *cost += extra_cost->alu.arith;
> -   }
> +  /* Use a separate rematerialization cost for GOT accesses.  */
> +  if (aarch64_cmodel == AARCH64_CMODEL_SMALL_PIC
> + && aarch64_classify_symbol (x, 0) == SYMBOL_SMALL_GOT_4G)
> +   *cost = COSTS_N_INSNS (1) / 2;
>
> -  if (flag_pic)
> -   {
> - /* One extra load instruction, after accessing the GOT.  */
> - *cost += COSTS_N_INSNS (1);
> - if (speed)
> -   *cost += extra_cost->ldst.load;
> -   }
>return true;
>
>  case HIGH:
> +  *cost = 0;
> +  return true;
> +
>  case LO_SUM:
> -  /* ADRP/ADD (immediate).  */
> -  if (speed)
> -   *cost += extra_cost->alu.arith;
> +  *cost = COSTS_N_INSNS (3) / 4;
>return true;
>
>  case ZERO_EXTRACT:

Re: [PATCH] AArch64: Improve address rematerialization costs

2021-11-24 Thread Wilco Dijkstra via Gcc-patches

Hi Richard,

> Can you fold in the rtx costs part of the original GOT relaxation patch?

Sure, see below for the updated version.

> I don't think there's enough information here for me to be able to review
> the patch though.  I'll need to find testcases, look in detail at what
> the rtl passes are doing, and try to work out whether (and why) this is
> a good way of fixing things.

Well today GCC does everything with costs rather than backend callbacks.
I'd be interested in hearing about alternatives that have the same effect 
without a callback that allows a backend to decide between spilling and
rematerialization.

Cheers,
Wilco


v2: fold in GOT remat cost

Improve rematerialization costs of addresses.  The current costs are set too
high which results in extra register pressure and spilling.  Using lower costs
means addresses will be rematerialized more often rather than being spilled or
causing spills.  This results in significant codesize reductions and
performance gains.  SPECINT2017 improves by 0.27% with LTO and 0.16% without
LTO.  Codesize is 0.12% smaller.

Passes bootstrap and regress. OK for commit?

ChangeLog:
2021-06-01  Wilco Dijkstra  

* config/aarch64/aarch64.c (aarch64_rtx_costs): Use better
rematerialization costs for HIGH, LO_SUM and SYMBOL_REF.
---

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 39de231d8ac6d10362cdd2b48eb9bd9de60c6703..a7f99ece55383168fb0f77e5c11c501d0bb2f013 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -13610,45 +13610,28 @@ cost_plus:
  return false;  /* All arguments need to be in registers.  */
}
 
+/* The following costs are used for rematerialization of addresses.
+   Set a low cost for all global accesses - this ensures they are
+   preferred for rematerialization, blocks them from being spilled
+   and reduces register pressure.  The result is significant codesize
+   reductions and performance gains. */
+
 case SYMBOL_REF:
 
-  if (aarch64_cmodel == AARCH64_CMODEL_LARGE
- || aarch64_cmodel == AARCH64_CMODEL_SMALL_SPIC)
-   {
- /* LDR.  */
- if (speed)
-   *cost += extra_cost->ldst.load;
-   }
-  else if (aarch64_cmodel == AARCH64_CMODEL_SMALL
-  || aarch64_cmodel == AARCH64_CMODEL_SMALL_PIC)
-   {
- /* ADRP, followed by ADD.  */
- *cost += COSTS_N_INSNS (1);
- if (speed)
-   *cost += 2 * extra_cost->alu.arith;
-   }
-  else if (aarch64_cmodel == AARCH64_CMODEL_TINY
-  || aarch64_cmodel == AARCH64_CMODEL_TINY_PIC)
-   {
- /* ADR.  */
- if (speed)
-   *cost += extra_cost->alu.arith;
-   }
+  *cost = 0;
+  /* Use a separate rematerialization cost for GOT accesses.  */
+  if (aarch64_cmodel == AARCH64_CMODEL_SMALL_PIC
+ && aarch64_classify_symbol (x, 0) == SYMBOL_SMALL_GOT_4G)
+   *cost = COSTS_N_INSNS (1) / 2;
 
-  if (flag_pic)
-   {
- /* One extra load instruction, after accessing the GOT.  */
- *cost += COSTS_N_INSNS (1);
- if (speed)
-   *cost += extra_cost->ldst.load;
-   }
   return true;
 
 case HIGH:
+  *cost = 0;
+  return true;
+
 case LO_SUM:
-  /* ADRP/ADD (immediate).  */
-  if (speed)
-   *cost += extra_cost->alu.arith;
+  *cost = COSTS_N_INSNS (3) / 4;
   return true;
 
 case ZERO_EXTRACT:

Re: [PATCH] AArch64: Improve address rematerialization costs

2021-11-04 Thread Richard Sandiford via Gcc-patches

Wilco Dijkstra  writes:
> ping

Can you fold in the rtx costs part of the original GOT relaxation patch?

I don't think there's enough information here for me to be able to review
the patch though.  I'll need to find testcases, look in detail at what
the rtl passes are doing, and try to work out whether (and why) this is
a good way of fixing things.

I don't mind doing that, but I don't think I'll have time before stage 3.

Thanks,
Richard

>
>
> From: Wilco Dijkstra
> Sent: 02 June 2021 11:21
> To: GCC Patches 
> Cc: Kyrylo Tkachov ; Richard Sandiford 
> 
> Subject: [PATCH] AArch64: Improve address rematerialization costs
>
> Hi,
>
> Given the large improvements from better register allocation of GOT accesses,
> I decided to generalize it to get large gains for normal addressing too:
>
> Improve rematerialization costs of addresses.  The current costs are set too
> high which results in extra register pressure and spilling.  Using lower costs
> means addresses will be rematerialized more often rather than being spilled or
> causing spills.  This results in significant codesize reductions and
> performance gains.  SPECINT2017 improves by 0.27% with LTO and 0.16% without
> LTO.  Codesize is 0.12% smaller.
>
> Passes bootstrap and regress. OK for commit?
>
> ChangeLog:
> 2021-06-01  Wilco Dijkstra  
>
> * config/aarch64/aarch64.c (aarch64_rtx_costs): Use better
> rematerialization costs for HIGH, LO_SUM and SYMBOL_REF.
>
> ---
>
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index 641c83b479e76cbcc75b299eb7ae5f634d9db7cd..08245827daa3f8199b29031e754244c078f0f500 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -13444,45 +13444,22 @@ cost_plus:
>return false;  /* All arguments need to be in registers.  */
>  }
>
> -case SYMBOL_REF:
> +/* The following costs are used for rematerialization of addresses.
> +   Set a low cost for all global accesses - this ensures they are
> +   preferred for rematerialization, blocks them from being spilled
> +   and reduces register pressure.  The result is significant codesize
> +   reductions and performance gains. */
>
> -  if (aarch64_cmodel == AARCH64_CMODEL_LARGE
> - || aarch64_cmodel == AARCH64_CMODEL_SMALL_SPIC)
> -   {
> - /* LDR.  */
> - if (speed)
> -   *cost += extra_cost->ldst.load;
> -   }
> -  else if (aarch64_cmodel == AARCH64_CMODEL_SMALL
> -  || aarch64_cmodel == AARCH64_CMODEL_SMALL_PIC)
> -   {
> - /* ADRP, followed by ADD.  */
> - *cost += COSTS_N_INSNS (1);
> - if (speed)
> -   *cost += 2 * extra_cost->alu.arith;
> -   }
> -  else if (aarch64_cmodel == AARCH64_CMODEL_TINY
> -  || aarch64_cmodel == AARCH64_CMODEL_TINY_PIC)
> -   {
> - /* ADR.  */
> - if (speed)
> -   *cost += extra_cost->alu.arith;
> -   }
> -
> -  if (flag_pic)
> -   {
> - /* One extra load instruction, after accessing the GOT.  */
> - *cost += COSTS_N_INSNS (1);
> - if (speed)
> -   *cost += extra_cost->ldst.load;
> -   }
> +case SYMBOL_REF:
> +  *cost = 0;
>return true;
>
>  case HIGH:
> +  *cost = 0;
> +  return true;
> +
>  case LO_SUM:
> -  /* ADRP/ADD (immediate).  */
> -  if (speed)
> -   *cost += extra_cost->alu.arith;
> +  *cost = COSTS_N_INSNS (3) / 4;
>return true;
>
>  case ZERO_EXTRACT:

Re: [PATCH] AArch64: Improve address rematerialization costs

2021-11-04 Thread Wilco Dijkstra via Gcc-patches


ping


From: Wilco Dijkstra
Sent: 02 June 2021 11:21
To: GCC Patches 
Cc: Kyrylo Tkachov ; Richard Sandiford 

Subject: [PATCH] AArch64: Improve address rematerialization costs 
 
Hi,

Given the large improvements from better register allocation of GOT accesses,
I decided to generalize it to get large gains for normal addressing too:

Improve rematerialization costs of addresses.  The current costs are set too
high which results in extra register pressure and spilling.  Using lower costs
means addresses will be rematerialized more often rather than being spilled or
causing spills.  This results in significant codesize reductions and
performance gains.  SPECINT2017 improves by 0.27% with LTO and 0.16% without
LTO.  Codesize is 0.12% smaller.

Passes bootstrap and regress. OK for commit?

ChangeLog:
2021-06-01  Wilco Dijkstra

    * config/aarch64/aarch64.c (aarch64_rtx_costs): Use better
    rematerialization costs for HIGH, LO_SUM and SYMBOL_REF.

---

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 641c83b479e76cbcc75b299eb7ae5f634d9db7cd..08245827daa3f8199b29031e754244c078f0f500 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -13444,45 +13444,22 @@ cost_plus:
           return false;  /* All arguments need to be in registers.  */
         }
 
-    case SYMBOL_REF:
+    /* The following costs are used for rematerialization of addresses.
+       Set a low cost for all global accesses - this ensures they are
+       preferred for rematerialization, blocks them from being spilled
+       and reduces register pressure.  The result is significant codesize
+       reductions and performance gains. */
 
-      if (aarch64_cmodel == AARCH64_CMODEL_LARGE
-          || aarch64_cmodel == AARCH64_CMODEL_SMALL_SPIC)
-        {
-          /* LDR.  */
-          if (speed)
-            *cost += extra_cost->ldst.load;
-        }
-      else if (aarch64_cmodel == AARCH64_CMODEL_SMALL
-               || aarch64_cmodel == AARCH64_CMODEL_SMALL_PIC)
-        {
-          /* ADRP, followed by ADD.  */
-          *cost += COSTS_N_INSNS (1);
-          if (speed)
-            *cost += 2 * extra_cost->alu.arith;
-        }
-      else if (aarch64_cmodel == AARCH64_CMODEL_TINY
-               || aarch64_cmodel == AARCH64_CMODEL_TINY_PIC)
-        {
-          /* ADR.  */
-          if (speed)
-            *cost += extra_cost->alu.arith;
-        }
-
-      if (flag_pic)
-        {
-          /* One extra load instruction, after accessing the GOT.  */
-          *cost += COSTS_N_INSNS (1);
-          if (speed)
-            *cost += extra_cost->ldst.load;
-        }
+    case SYMBOL_REF:
+      *cost = 0;
       return true;
 
     case HIGH:
+      *cost = 0;
+      return true;
+
     case LO_SUM:
-      /* ADRP/ADD (immediate).  */
-      if (speed)
-        *cost += extra_cost->alu.arith;
+      *cost = COSTS_N_INSNS (3) / 4;
       return true;
 
     case ZERO_EXTRACT:
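
For readers checking the numbers in the hunk above, here is a minimal sketch of the arithmetic. It assumes the definition of COSTS_N_INSNS from GCC's rtl.h, where one instruction costs 4 units; the enum, patched_cost and check_patched_costs are hypothetical names used only for illustration and do not appear in aarch64.c.

```c
#include <assert.h>

/* As defined in GCC's rtl.h: one instruction is worth 4 cost units.  */
#define COSTS_N_INSNS(N) ((N) * 4)

/* Hypothetical stand-ins for the three rtx codes touched by the patch.  */
enum addr_code { ADDR_SYMBOL_REF, ADDR_HIGH, ADDR_LO_SUM };

/* Hypothetical helper: the cost the patched switch would assign.  */
static int
patched_cost (enum addr_code code)
{
  switch (code)
    {
    case ADDR_SYMBOL_REF:
    case ADDR_HIGH:
      return 0;
    case ADDR_LO_SUM:
      /* (3 * 4) / 4 == 3 units, i.e. three quarters of one instruction.  */
      return COSTS_N_INSNS (3) / 4;
    }
  /* Anything else keeps the default of one instruction.  */
  return COSTS_N_INSNS (1);
}

/* Sanity checks on the arithmetic.  */
static void
check_patched_costs (void)
{
  assert (patched_cost (ADDR_SYMBOL_REF) == 0);
  assert (patched_cost (ADDR_HIGH) == 0);
  assert (patched_cost (ADDR_LO_SUM) == 3);
  /* LO_SUM ends up cheaper than a single default-cost instruction.  */
  assert (patched_cost (ADDR_LO_SUM) < COSTS_N_INSNS (1));
}
```

Under these assumptions, SYMBOL_REF and HIGH cost nothing and LO_SUM costs 3 units, just below the 4 units of a single instruction.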

Re: [PATCH] AArch64: Improve address rematerialization costs

2021-10-20 Thread Wilco Dijkstra via Gcc-patches

ping



Re: [PATCH] AArch64: Improve address rematerialization costs

2021-06-02 Thread Wilco Dijkstra via Gcc-patches

Hi Richard,

> No.  It's never correct to completely wipe out the existing cost - you 
> don't know the context where this is being used.
> 
> The most you can do is not add any additional cost.

Remember that aarch64_rtx_costs starts like this:

  /* By default, assume that everything has equivalent cost to the
     cheapest instruction.  Any additional costs are applied as a delta
     above this default.  */
  *cost = COSTS_N_INSNS (1);

This is literally the last statement executed before the big switch...
Given the cost is always initialized, there is no existing cost besides this
default value, and thus changing it to something else is not an issue.
We could of course do something like:

*cost -= COSTS_N_INSNS (1);

But that is less clear and problematic if the default value ever changes.

Cheers,
Wilco
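
The point above about assignment versus subtraction can be sketched as a toy model. This is not the real aarch64_rtx_costs: the enum and the toy_cost_assign, toy_cost_subtract and check_toy_costs functions are hypothetical, and the default cost is passed in as a parameter purely so the effect of changing it is visible. COSTS_N_INSNS is defined as in GCC's rtl.h.

```c
#include <assert.h>

/* As defined in GCC's rtl.h: one instruction is worth 4 cost units.  */
#define COSTS_N_INSNS(N) ((N) * 4)

/* Hypothetical rtx codes for the toy model.  */
enum toy_code { TOY_SYMBOL_REF, TOY_OTHER };

/* Variant 1: overwrite the default, as the patch does with "*cost = 0".  */
static int
toy_cost_assign (enum toy_code code, int default_cost)
{
  int cost = default_cost;   /* Set just before the switch.  */
  switch (code)
    {
    case TOY_SYMBOL_REF:
      cost = 0;
      break;
    default:
      break;
    }
  return cost;
}

/* Variant 2: subtract a hard-coded COSTS_N_INSNS (1) from the default.  */
static int
toy_cost_subtract (enum toy_code code, int default_cost)
{
  int cost = default_cost;
  switch (code)
    {
    case TOY_SYMBOL_REF:
      cost -= COSTS_N_INSNS (1);
      break;
    default:
      break;
    }
  return cost;
}

/* Both variants agree today; only the assignment survives a new default.  */
static void
check_toy_costs (void)
{
  assert (toy_cost_assign (TOY_SYMBOL_REF, COSTS_N_INSNS (1)) == 0);
  assert (toy_cost_subtract (TOY_SYMBOL_REF, COSTS_N_INSNS (1)) == 0);
  assert (toy_cost_assign (TOY_SYMBOL_REF, COSTS_N_INSNS (2)) == 0);
  assert (toy_cost_subtract (TOY_SYMBOL_REF, COSTS_N_INSNS (2)) == 4);
}
```

With the current default of COSTS_N_INSNS (1) the two forms give the same result; if the default were ever raised, the subtracting form would silently stop producing zero.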

Re: [PATCH] AArch64: Improve address rematerialization costs

2021-06-02 Thread Richard Earnshaw via Gcc-patches





On 02/06/2021 11:21, Wilco Dijkstra via Gcc-patches wrote:
> Hi,
>
> Given the large improvements from better register allocation of GOT accesses,
> I decided to generalize it to get large gains for normal addressing too:
>
> Improve rematerialization costs of addresses.  The current costs are set too
> high which results in extra register pressure and spilling.  Using lower costs
> means addresses will be rematerialized more often rather than being spilled or
> causing spills.  This results in significant codesize reductions and performance
> gains.  SPECINT2017 improves by 0.27% with LTO and 0.16% without LTO.  Codesize
> is 0.12% smaller.
>
> Passes bootstrap and regress. OK for commit?
>
> ChangeLog:
> 2021-06-01  Wilco Dijkstra
>
>     * config/aarch64/aarch64.c (aarch64_rtx_costs): Use better
>     rematerialization costs for HIGH, LO_SUM and SYMBOL_REF.
>
> ---
>
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index 641c83b479e76cbcc75b299eb7ae5f634d9db7cd..08245827daa3f8199b29031e754244c078f0f500 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -13444,45 +13444,22 @@ cost_plus:
>            return false;  /* All arguments need to be in registers.  */
>          }
>
> -    case SYMBOL_REF:
> +    /* The following costs are used for rematerialization of addresses.
> +       Set a low cost for all global accesses - this ensures they are
> +       preferred for rematerialization, blocks them from being spilled
> +       and reduces register pressure.  The result is significant codesize
> +       reductions and performance gains. */
>
> -      if (aarch64_cmodel == AARCH64_CMODEL_LARGE
> -          || aarch64_cmodel == AARCH64_CMODEL_SMALL_SPIC)
> -        {
> -          /* LDR.  */
> -          if (speed)
> -            *cost += extra_cost->ldst.load;
> -        }
> -      else if (aarch64_cmodel == AARCH64_CMODEL_SMALL
> -               || aarch64_cmodel == AARCH64_CMODEL_SMALL_PIC)
> -        {
> -          /* ADRP, followed by ADD.  */
> -          *cost += COSTS_N_INSNS (1);
> -          if (speed)
> -            *cost += 2 * extra_cost->alu.arith;
> -        }
> -      else if (aarch64_cmodel == AARCH64_CMODEL_TINY
> -               || aarch64_cmodel == AARCH64_CMODEL_TINY_PIC)
> -        {
> -          /* ADR.  */
> -          if (speed)
> -            *cost += extra_cost->alu.arith;
> -        }
> -
> -      if (flag_pic)
> -        {
> -          /* One extra load instruction, after accessing the GOT.  */
> -          *cost += COSTS_N_INSNS (1);
> -          if (speed)
> -            *cost += extra_cost->ldst.load;
> -        }
> +    case SYMBOL_REF:
> +      *cost = 0;
>        return true;

No.  It's never correct to completely wipe out the existing cost - you
don't know the context where this is being used.

The most you can do is not add any additional cost.

Similarly for all the other cases.

>
>      case HIGH:
> +      *cost = 0;
> +      return true;
> +
>      case LO_SUM:
> -      /* ADRP/ADD (immediate).  */
> -      if (speed)
> -        *cost += extra_cost->alu.arith;
> +      *cost = COSTS_N_INSNS (3) / 4;
>        return true;
>
>      case ZERO_EXTRACT:
