RE: [x86_64 PATCH] Tweak -Os costs for scalar-to-vector pass.

2021-08-23 Thread Roger Sayle

Many thanks.  Here’s the version that I've committed, with a ??? comment as
requested (and even a no-op else clause to make the logic easier to follow).

2021-08-24  Roger Sayle  
Richard Biener  

gcc/ChangeLog
	* config/i386/i386-features.c (compute_convert_gain): Provide
	more accurate values for CONST_INT, when optimizing for size.
	* config/i386/i386.c (COSTS_N_BYTES): Move definition from here...
	* config/i386/i386.h (COSTS_N_BYTES): ...to here.

Cheers,
Roger
--

-Original Message-
From: Richard Biener  
Sent: 23 August 2021 14:47
To: Roger Sayle 
Cc: GCC Patches 
Subject: Re: [x86_64 PATCH] Tweak -Os costs for scalar-to-vector pass.

On Fri, Aug 20, 2021 at 9:55 PM Roger Sayle  wrote:
> Hi Richard,
>
> Benchmarking this patch using CSiBE on x86_64-pc-linux-gnu with -Os -m32 
> saves 2432 bytes.
> Of the 893 tests, 34 have size differences, 30 are improvements, 4 are 
> regressions (of a few bytes).
>
> > Also I'm missing an 'else' - in the default case there's no cost/benefit of 
> > using SSE vs. GPR regs?
> > For SSE it would be a constant pool load.
>
> The code size regression I primarily wanted to tackle was the zero 
> vs. non-zero case when dealing with immediate operands, which was the 
> piece affected by my and Jakub's xor improvements.
>
> Alas my first attempt to specify a non-zero gain in the default 
> (doesn't fit in SImode) case increased the code size slightly.  The 
> use of the constant pool complicates things, as the number of times 
> the same value is used becomes an issue.  If the constant being loaded 
> is unique, then clearly the increase in constant pool size should 
> (ideally) be taken into account.  But if the same constant is used 
> multiple times in a chain (or is already in the constant pool), the 
> observed cost is much cheaper.  Empirically, a value of zero isn't a 
> poor choice, so the decision on whether to use vector instructions is shifted 
> to the gains from operations being performed, rather than the loading of 
> integer constants.  No doubt, like rtx_costs, these are free parameters that 
> future generations will continue to tweak and refine.
>
> Given that this patch reduces code size with -Os, both with and without -m32, 
> ok for mainline?

OK if you add a comment for the missing 'else'.

Thanks,
Richard.

> Thanks in advance,
> Roger
> --

diff --git a/gcc/config/i386/i386-features.c b/gcc/config/i386/i386-features.c
index d9c6652..5a99ea7 100644
--- a/gcc/config/i386/i386-features.c
+++ b/gcc/config/i386/i386-features.c
@@ -610,12 +610,40 @@ general_scalar_chain::compute_convert_gain ()
 
  case CONST_INT:
if (REG_P (dst))
- /* DImode can be immediate for TARGET_64BIT and SImode always.  */
- igain += m * COSTS_N_INSNS (1);
+ {
+   if (optimize_insn_for_size_p ())
+ {
+   /* xor (2 bytes) vs. xorps (3 bytes).  */
+   if (src == const0_rtx)
+ igain -= COSTS_N_BYTES (1);
+   /* movdi_internal vs. movv2di_internal.  */
+   /* => mov (5 bytes) vs. movaps (7 bytes).  */
+   else if (x86_64_immediate_operand (src, SImode))
+ igain -= COSTS_N_BYTES (2);
+   else
+ /* ??? Larger immediate constants are placed in the
+constant pool, where the size benefit/impact of
+STV conversion is affected by whether and how
+often each constant pool entry is shared/reused.
+The value below is empirically derived from the
+CSiBE benchmark (and the optimal value may drift
+over time).  */
+ igain += COSTS_N_BYTES (0);
+ }
+   else
+ {
+   /* DImode can be immediate for TARGET_64BIT
+  and SImode always.  */
+   igain += m * COSTS_N_INSNS (1);
+   igain -= vector_const_cost (src);
+ }
+ }
else if (MEM_P (dst))
- igain += (m * ix86_cost->int_store[2]
-   - ix86_cost->sse_store[sse_cost_idx]);
-   igain -= vector_const_cost (src);
+ {
+   igain += (m * ix86_cost->int_store[2]
+ - ix86_cost->sse_store[sse_cost_idx]);
+   igain -= vector_const_cost (src);
+ }
break;
 
  default:
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 4d4ab6a..5abf2a6 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -19982,8 +19982,6 @@ ix86_division_cost (const struct processor_costs *cost,
 return cost->divide[MODE_INDEX (mode)];

Re: [x86_64 PATCH] Tweak -Os costs for scalar-to-vector pass.

2021-08-23 Thread Richard Biener via Gcc-patches
On Fri, Aug 20, 2021 at 9:55 PM Roger Sayle  wrote:
>
>
> Hi Richard,
>
> Benchmarking this patch using CSiBE on x86_64-pc-linux-gnu with -Os -m32 
> saves 2432 bytes.
> Of the 893 tests, 34 have size differences, 30 are improvements, 4 are 
> regressions (of a few bytes).
>
> > Also I'm missing an 'else' - in the default case there's no cost/benefit of 
> > using SSE vs. GPR regs?
> > For SSE it would be a constant pool load.
>
> The code size regression I primarily wanted to tackle was the zero vs.
> non-zero case when dealing with immediate operands, which was the piece
> affected by my and Jakub's xor improvements.
>
> Alas my first attempt to specify a non-zero gain in the default (doesn't
> fit in SImode) case increased the code size slightly.  The use of the
> constant pool complicates things, as the number of times the same value
> is used becomes an issue.  If the constant being loaded is unique, then
> clearly the increase in constant pool size should (ideally) be taken
> into account.  But if the same constant is used multiple times in a
> chain (or is already in the constant pool), the observed cost is much
> cheaper.  Empirically, a value of zero isn't a poor choice, so the
> decision on whether to use vector instructions is shifted to the gains
> from the operations being performed, rather than the loading of integer
> constants.  No doubt, like rtx_costs, these are free parameters that
> future generations will continue to tweak and refine.
>
> Given that this patch reduces code size with -Os, both with and without -m32, 
> ok for mainline?

OK if you add a comment for the missing 'else'.

Thanks,
Richard.

> Thanks in advance,
> Roger
> --
>
> -Original Message-
> From: Richard Biener 
> Sent: 20 August 2021 08:29
> To: Roger Sayle 
> Cc: GCC Patches 
> Subject: Re: [x86_64 PATCH] Tweak -Os costs for scalar-to-vector pass.
>
> On Thu, Aug 19, 2021 at 6:01 PM Roger Sayle  
> wrote:
> >
> >
> > Doh!  ENOPATCH.
> >
> > -Original Message-
> > From: Roger Sayle 
> > Sent: 19 August 2021 16:59
> > To: 'GCC Patches' 
> > Subject: [x86_64 PATCH] Tweak -Os costs for scalar-to-vector pass.
> >
> >
> > Back in June I briefly mentioned in one of my gcc-patches posts that a
> > change that should have always reduced code size, would mysteriously
> > occasionally result in slightly larger code (according to CSiBE):
> > https://gcc.gnu.org/pipermail/gcc-patches/2021-June/573233.html
> >
> > Investigating further, the cause turns out to be that x86_64's
> > scalar-to-vector (stv) pass is relying on poor estimates of the size
> > costs/benefits.  This patch tweaks the backend's compute_convert_gain
> > method to provide slightly more accurate values when compiling with -Os.
> > Compilation without -Os is (should be) unaffected.  And for
> > completeness, I'll mention that the stv pass is a net win for code
> > size so it's much better to improve its heuristics than simply gate
> > the pass on !optimize_for_size.
> >
> > The net effect of this change is to save 1399 bytes on the CSiBE code
> > size benchmark when compiling with -Os.
> >
> > This patch has been tested on x86_64-pc-linux-gnu with "make bootstrap"
> > and "make -k check" with no new failures.
> >
> > Ok for mainline?
>
> +   /* xor (2 bytes) vs. xorps (3 bytes).  */
> +   if (src == const0_rtx)
> + igain -= COSTS_N_BYTES (1);
> +   /* movdi_internal vs. movv2di_internal.  */
> +   /* => mov (5 bytes) vs. movaps (7 bytes).  */
> +   else if (x86_64_immediate_operand (src, SImode))
> + igain -= COSTS_N_BYTES (2);
>
> doesn't it need two GPR xors for 32-bit DImode and two movs?  Thus the non-SSE 
> cost should be scaled by 'm'?  For const0_rtx we may eventually re-use the zero 
> reg for the high part, so that is eventually correct.
>
> Also I'm missing an 'else' - in the default case there's no cost/benefit of 
> using SSE vs. GPR regs?  For SSE it would be a constant pool load.
>
> I also wonder, since I now see COSTS_N_BYTES for the first time (heh), 
> whether with -Os we'd need to replace all COSTS_N_INSNS (1) scaling with 
> COSTS_N_BYTES scaling?  OTOH costs_add_n_insns uses COSTS_N_INSNS for the 
> size part as well.
>
> That said, it looks like we're eventually mixing apples and oranges now or 
> even previously?
>
> Thanks,
> Richard.
>
> >
> >
> > 2021-08-19  Roger Sayle  
> >
> > gcc/ChangeLog
> > * config/i386/i386-features.c (compute_convert_gain): Provide
> > more accurate values for CONST_INT, when optimizing for size.
> > * config/i386/i386.c (COSTS_N_BYTES): Move definition from here...
> > * config/i386/i386.h (COSTS_N_BYTES): to here.
> >
> > Roger
> > --
> >
>


RE: [x86_64 PATCH] Tweak -Os costs for scalar-to-vector pass.

2021-08-20 Thread Roger Sayle


Hi Richard,

Benchmarking this patch using CSiBE on x86_64-pc-linux-gnu with -Os -m32 saves 
2432 bytes.
Of the 893 tests, 34 have size differences, 30 are improvements, 4 are 
regressions (of a few bytes).

> Also I'm missing an 'else' - in the default case there's no cost/benefit of 
> using SSE vs. GPR regs?
> For SSE it would be a constant pool load.

The code size regression I primarily wanted to tackle was the zero vs.
non-zero case when dealing with immediate operands, which was the piece
affected by my and Jakub's xor improvements.

Alas my first attempt to specify a non-zero gain in the default (doesn't
fit in SImode) case increased the code size slightly.  The use of the
constant pool complicates things, as the number of times the same value
is used becomes an issue.  If the constant being loaded is unique, then
clearly the increase in constant pool size should (ideally) be taken into
account.  But if the same constant is used multiple times in a chain (or
is already in the constant pool), the observed cost is much cheaper.
Empirically, a value of zero isn't a poor choice, so the decision on
whether to use vector instructions is shifted to the gains from the
operations being performed, rather than the loading of integer constants.
No doubt, like rtx_costs, these are free parameters that future
generations will continue to tweak and refine.

Given that this patch reduces code size with -Os, both with and without -m32, 
ok for mainline?

Thanks in advance,
Roger
--

-Original Message-
From: Richard Biener  
Sent: 20 August 2021 08:29
To: Roger Sayle 
Cc: GCC Patches 
Subject: Re: [x86_64 PATCH] Tweak -Os costs for scalar-to-vector pass.

On Thu, Aug 19, 2021 at 6:01 PM Roger Sayle  wrote:
>
>
> Doh!  ENOPATCH.
>
> -Original Message-
> From: Roger Sayle 
> Sent: 19 August 2021 16:59
> To: 'GCC Patches' 
> Subject: [x86_64 PATCH] Tweak -Os costs for scalar-to-vector pass.
>
>
> Back in June I briefly mentioned in one of my gcc-patches posts that a 
> change that should have always reduced code size, would mysteriously 
> occasionally result in slightly larger code (according to CSiBE):
> https://gcc.gnu.org/pipermail/gcc-patches/2021-June/573233.html
>
> Investigating further, the cause turns out to be that x86_64's 
> scalar-to-vector (stv) pass is relying on poor estimates of the size 
> costs/benefits.  This patch tweaks the backend's compute_convert_gain 
> method to provide slightly more accurate values when compiling with -Os.
> Compilation without -Os is (should be) unaffected.  And for 
> completeness, I'll mention that the stv pass is a net win for code 
> size so it's much better to improve its heuristics than simply gate 
> the pass on !optimize_for_size.
>
> The net effect of this change is to save 1399 bytes on the CSiBE code 
> size benchmark when compiling with -Os.
>
> This patch has been tested on x86_64-pc-linux-gnu with "make bootstrap"
> and "make -k check" with no new failures.
>
> Ok for mainline?

+   /* xor (2 bytes) vs. xorps (3 bytes).  */
+   if (src == const0_rtx)
+ igain -= COSTS_N_BYTES (1);
+   /* movdi_internal vs. movv2di_internal.  */
+   /* => mov (5 bytes) vs. movaps (7 bytes).  */
+   else if (x86_64_immediate_operand (src, SImode))
+ igain -= COSTS_N_BYTES (2);

doesn't it need two GPR xors for 32-bit DImode and two movs?  Thus the non-SSE 
cost should be scaled by 'm'?  For const0_rtx we may eventually re-use the zero 
reg for the high part, so that is eventually correct.

Also I'm missing an 'else' - in the default case there's no cost/benefit of 
using SSE vs. GPR regs?  For SSE it would be a constant pool load.

I also wonder, since I now see COSTS_N_BYTES for the first time (heh), whether 
with -Os we'd need to replace all COSTS_N_INSNS (1) scaling with COSTS_N_BYTES 
scaling?  OTOH costs_add_n_insns uses COSTS_N_INSNS for the size part as well.

That said, it looks like we're eventually mixing apples and oranges now or even 
previously?

Thanks,
Richard.

>
>
> 2021-08-19  Roger Sayle  
>
> gcc/ChangeLog
> * config/i386/i386-features.c (compute_convert_gain): Provide
> more accurate values for CONST_INT, when optimizing for size.
> * config/i386/i386.c (COSTS_N_BYTES): Move definition from here...
> * config/i386/i386.h (COSTS_N_BYTES): to here.
>
> Roger
> --
>



RE: [x86_64 PATCH] Tweak -Os costs for scalar-to-vector pass.

2021-08-20 Thread Roger Sayle


Hi Richard,
Fortunately, it's (moderately) safe to mix COSTS_N_INSNS and
COSTS_N_BYTES in the i386 backend.  The average length of an
x86_64 instruction in typical code is between 2 and 3 bytes, so
the definitions of N*4 for COSTS_N_INSNS (N) and N*2 for
COSTS_N_BYTES (N) allow the two to be mixed with no more approximation
error than the more common problem, that rtx_costs usually
encodes cycle counts (but converted to units of COSTS_N_INSNS).

The thing that I like about STV's use of a "gain calculator" is that
it allows much more accurate fine tuning.  Many passes, particularly
combine, base their decisions on computing two total cost estimates
(both large approximate values) and then comparing those totals.
As any physicist will confirm, it's much better to parameterize on
the observed delta between those two large approximate values.

But you're also right, that I need to run CSiBE with -m32 and get back
to you with the results.

Roger
--

-Original Message-
From: Richard Biener  
Sent: 20 August 2021 08:29
To: Roger Sayle 
Cc: GCC Patches 
Subject: Re: [x86_64 PATCH] Tweak -Os costs for scalar-to-vector pass.

On Thu, Aug 19, 2021 at 6:01 PM Roger Sayle  wrote:
>
>
> Doh!  ENOPATCH.
>
> -Original Message-
> From: Roger Sayle 
> Sent: 19 August 2021 16:59
> To: 'GCC Patches' 
> Subject: [x86_64 PATCH] Tweak -Os costs for scalar-to-vector pass.
>
>
> Back in June I briefly mentioned in one of my gcc-patches posts that a 
> change that should have always reduced code size, would mysteriously 
> occasionally result in slightly larger code (according to CSiBE):
> https://gcc.gnu.org/pipermail/gcc-patches/2021-June/573233.html
>
> Investigating further, the cause turns out to be that x86_64's 
> scalar-to-vector (stv) pass is relying on poor estimates of the size 
> costs/benefits.  This patch tweaks the backend's compute_convert_gain 
> method to provide slightly more accurate values when compiling with -Os.
> Compilation without -Os is (should be) unaffected.  And for 
> completeness, I'll mention that the stv pass is a net win for code 
> size so it's much better to improve its heuristics than simply gate 
> the pass on !optimize_for_size.
>
> The net effect of this change is to save 1399 bytes on the CSiBE code 
> size benchmark when compiling with -Os.
>
> This patch has been tested on x86_64-pc-linux-gnu with "make bootstrap"
> and "make -k check" with no new failures.
>
> Ok for mainline?

+   /* xor (2 bytes) vs. xorps (3 bytes).  */
+   if (src == const0_rtx)
+ igain -= COSTS_N_BYTES (1);
+   /* movdi_internal vs. movv2di_internal.  */
+   /* => mov (5 bytes) vs. movaps (7 bytes).  */
+   else if (x86_64_immediate_operand (src, SImode))
+ igain -= COSTS_N_BYTES (2);

doesn't it need two GPR xors for 32-bit DImode and two movs?  Thus the non-SSE 
cost should be scaled by 'm'?  For const0_rtx we may eventually re-use the zero 
reg for the high part, so that is eventually correct.

Also I'm missing an 'else' - in the default case there's no cost/benefit of 
using SSE vs. GPR regs?  For SSE it would be a constant pool load.

I also wonder, since I now see COSTS_N_BYTES for the first time (heh), whether 
with -Os we'd need to replace all COSTS_N_INSNS (1) scaling with COSTS_N_BYTES 
scaling?  OTOH costs_add_n_insns uses COSTS_N_INSNS for the size part as well.

That said, it looks like we're eventually mixing apples and oranges now or even 
previously?

Thanks,
Richard.

>
>
> 2021-08-19  Roger Sayle  
>
> gcc/ChangeLog
> * config/i386/i386-features.c (compute_convert_gain): Provide
> more accurate values for CONST_INT, when optimizing for size.
> * config/i386/i386.c (COSTS_N_BYTES): Move definition from here...
> * config/i386/i386.h (COSTS_N_BYTES): to here.
>
> Roger
> --
>



Re: [x86_64 PATCH] Tweak -Os costs for scalar-to-vector pass.

2021-08-20 Thread Richard Biener via Gcc-patches
On Thu, Aug 19, 2021 at 6:01 PM Roger Sayle  wrote:
>
>
> Doh!  ENOPATCH.
>
> -Original Message-
> From: Roger Sayle 
> Sent: 19 August 2021 16:59
> To: 'GCC Patches' 
> Subject: [x86_64 PATCH] Tweak -Os costs for scalar-to-vector pass.
>
>
> Back in June I briefly mentioned in one of my gcc-patches posts that a
> change that should have always reduced code size, would mysteriously
> occasionally result in slightly larger code (according to CSiBE):
> https://gcc.gnu.org/pipermail/gcc-patches/2021-June/573233.html
>
> Investigating further, the cause turns out to be that x86_64's
> scalar-to-vector (stv) pass is relying on poor estimates of the size
> costs/benefits.  This patch tweaks the backend's compute_convert_gain method
> to provide slightly more accurate values when compiling with -Os.
> Compilation without -Os is (should be) unaffected.  And for completeness,
> I'll mention that the stv pass is a net win for code size so it's much
> better to improve its heuristics than simply gate the pass on
> !optimize_for_size.
>
> The net effect of this change is to save 1399 bytes on the CSiBE code size
> benchmark when compiling with -Os.
>
> This patch has been tested on x86_64-pc-linux-gnu with "make bootstrap"
> and "make -k check" with no new failures.
>
> Ok for mainline?

+   /* xor (2 bytes) vs. xorps (3 bytes).  */
+   if (src == const0_rtx)
+ igain -= COSTS_N_BYTES (1);
+   /* movdi_internal vs. movv2di_internal.  */
+   /* => mov (5 bytes) vs. movaps (7 bytes).  */
+   else if (x86_64_immediate_operand (src, SImode))
+ igain -= COSTS_N_BYTES (2);

doesn't it need two GPR xors for 32-bit DImode and two movs?  Thus
the non-SSE cost should be scaled by 'm'?  For const0_rtx we may
eventually re-use the zero reg for the high part, so that is eventually
correct.

Also I'm missing an 'else' - in the default case there's no cost/benefit
of using SSE vs. GPR regs?  For SSE it would be a constant pool
load.

I also wonder, since I now see COSTS_N_BYTES for the first time (heh),
whether with -Os we'd need to replace all COSTS_N_INSNS (1)
scaling with COSTS_N_BYTES scaling?  OTOH costs_add_n_insns
uses COSTS_N_INSNS for the size part as well.

That said, it looks like we're eventually mixing apples and oranges
now or even previously?

Thanks,
Richard.

>
>
> 2021-08-19  Roger Sayle  
>
> gcc/ChangeLog
> * config/i386/i386-features.c (compute_convert_gain): Provide
> more accurate values for CONST_INT, when optimizing for size.
> * config/i386/i386.c (COSTS_N_BYTES): Move definition from here...
> * config/i386/i386.h (COSTS_N_BYTES): to here.
>
> Roger
> --
>


RE: [x86_64 PATCH] Tweak -Os costs for scalar-to-vector pass.

2021-08-19 Thread Roger Sayle

Doh!  ENOPATCH.

-Original Message-
From: Roger Sayle  
Sent: 19 August 2021 16:59
To: 'GCC Patches' 
Subject: [x86_64 PATCH] Tweak -Os costs for scalar-to-vector pass.


Back in June I briefly mentioned in one of my gcc-patches posts that a
change that should have always reduced code size, would mysteriously
occasionally result in slightly larger code (according to CSiBE):
https://gcc.gnu.org/pipermail/gcc-patches/2021-June/573233.html

Investigating further, the cause turns out to be that x86_64's
scalar-to-vector (stv) pass is relying on poor estimates of the size
costs/benefits.  This patch tweaks the backend's compute_convert_gain method
to provide slightly more accurate values when compiling with -Os.
Compilation without -Os is (should be) unaffected.  And for completeness,
I'll mention that the stv pass is a net win for code size so it's much
better to improve its heuristics than simply gate the pass on
!optimize_for_size.

The net effect of this change is to save 1399 bytes on the CSiBE code size
benchmark when compiling with -Os.

This patch has been tested on x86_64-pc-linux-gnu with "make bootstrap"
and "make -k check" with no new failures.

Ok for mainline?


2021-08-19  Roger Sayle  

gcc/ChangeLog
* config/i386/i386-features.c (compute_convert_gain): Provide
more accurate values for CONST_INT, when optimizing for size.
* config/i386/i386.c (COSTS_N_BYTES): Move definition from here...
* config/i386/i386.h (COSTS_N_BYTES): to here.

Roger
--

diff --git a/gcc/config/i386/i386-features.c b/gcc/config/i386/i386-features.c
index d9c6652..cdae3dc 100644
--- a/gcc/config/i386/i386-features.c
+++ b/gcc/config/i386/i386-features.c
@@ -610,12 +610,31 @@ general_scalar_chain::compute_convert_gain ()
 
  case CONST_INT:
if (REG_P (dst))
- /* DImode can be immediate for TARGET_64BIT and SImode always.  */
- igain += m * COSTS_N_INSNS (1);
+ {
+   if (optimize_insn_for_size_p ())
+ {
+   /* xor (2 bytes) vs. xorps (3 bytes).  */
+   if (src == const0_rtx)
+ igain -= COSTS_N_BYTES (1);
+   /* movdi_internal vs. movv2di_internal.  */
+   /* => mov (5 bytes) vs. movaps (7 bytes).  */
+   else if (x86_64_immediate_operand (src, SImode))
+ igain -= COSTS_N_BYTES (2);
+ }
+   else
+ {
+   /* DImode can be immediate for TARGET_64BIT
+  and SImode always.  */
+   igain += m * COSTS_N_INSNS (1);
+   igain -= vector_const_cost (src);
+ }
+ }
else if (MEM_P (dst))
- igain += (m * ix86_cost->int_store[2]
-   - ix86_cost->sse_store[sse_cost_idx]);
-   igain -= vector_const_cost (src);
+ {
+   igain += (m * ix86_cost->int_store[2]
+ - ix86_cost->sse_store[sse_cost_idx]);
+   igain -= vector_const_cost (src);
+ }
break;
 
  default:
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 4d4ab6a..5abf2a6 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -19982,8 +19982,6 @@ ix86_division_cost (const struct processor_costs *cost,
 return cost->divide[MODE_INDEX (mode)];
 }
 
-#define COSTS_N_BYTES(N) ((N) * 2)
-
 /* Return cost of shift in MODE.
If CONSTANT_OP1 is true, the op1 value is known and set in OP1_VAL.
AND_IN_OP1 specify in op1 is result of and and SHIFT_AND_TRUNCATE
diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
index 21fe51b..edbfcaf 100644
--- a/gcc/config/i386/i386.h
+++ b/gcc/config/i386/i386.h
@@ -88,6 +88,11 @@ struct stringop_algs
   } size [MAX_STRINGOP_ALGS];
 };
 
+/* Analog of COSTS_N_INSNS when optimizing for size.  */
+#ifndef COSTS_N_BYTES
+#define COSTS_N_BYTES(N) ((N) * 2)
+#endif
+
 /* Define the specific costs for a given cpu.  NB: hard_register is used
by TARGET_REGISTER_MOVE_COST and TARGET_MEMORY_MOVE_COST to compute
hard register move costs by register allocator.  Relative costs of