Re: [x86_64 PATCH] Tweak -Os costs for scalar-to-vector pass.

Richard Biener via Gcc-patches Mon, 23 Aug 2021 06:47:55 -0700

On Fri, Aug 20, 2021 at 9:55 PM Roger Sayle <ro...@nextmovesoftware.com> wrote:
>
>
> Hi Richard,
>
> Benchmarking this patch using CSiBE on x86_64-pc-linux-gnu with -Os -m32
> saves 2432 bytes.  Of the 893 tests, 34 have size differences: 30 are
> improvements and 4 are regressions (of a few bytes).
>
> > Also I'm missing an 'else' - in the default case there's no cost/benefit
> > of using SSE vs. GPR regs?  For SSE it would be a constant pool load.
>
> The code size regression I primarily wanted to tackle was the zero vs.
> non-zero case when dealing with immediate operands, which was the piece
> affected by my and Jakub's xor improvements.
>
> Alas, my first attempt to specify a non-zero gain in the default (doesn't
> fit in SImode) case increased the code size slightly.  The use of the
> constant pool complicates things, as the number of times the same value is
> used becomes an issue.  If the constant being loaded is unique, then
> clearly the increase in constant pool size should (ideally) be taken into
> account.  But if the same constant is used multiple times in a chain (or
> is already in the constant pool), the observed cost is much cheaper.
> Empirically, a value of zero isn't a poor choice, so the decision on
> whether to use vector instructions is shifted to the gains from operations
> being performed, rather than the loading of integer constants.  No doubt,
> like rtx_costs, these are free parameters that future generations will
> continue to tweak and refine.
>
> Given that this patch reduces code size with -Os, both with and without
> -m32, ok for mainline?


OK if you add a comment for the missing 'else'.
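
To make the shape of that comment concrete, here is a minimal standalone
sketch of the three-way decision discussed above.  It is a model only, not
the code in i386-features.c: the enum and the function name are hypothetical,
and the two macro definitions are written from memory of the GCC sources
(COSTS_N_INSNS in rtl.h, COSTS_N_BYTES in the i386 backend), so treat them as
assumptions.  The byte counts are the ones quoted in the patch hunk.

```c
#include <assert.h>

/* Assumed scaling: one "insn" is 4 cost units and one byte is 2 units,
   so COSTS_N_BYTES (2) equals COSTS_N_INSNS (1).  */
#define COSTS_N_INSNS(N) ((N) * 4)
#define COSTS_N_BYTES(N) ((N) * 2)

/* Hypothetical classification of the integer constant being loaded.  */
enum const_kind
{
  CONST_ZERO,         /* src == const0_rtx in the real pass */
  CONST_FITS_SIMODE,  /* x86_64_immediate_operand (src, SImode) */
  CONST_OTHER         /* anything wider */
};

/* Return the -Os gain adjustment for loading a constant of KIND into an
   SSE register instead of a GPR.  Negative means the vector form is
   larger.  */
int
const_load_size_gain (enum const_kind kind)
{
  int igain = 0;

  assert (kind >= CONST_ZERO && kind <= CONST_OTHER);

  if (kind == CONST_ZERO)
    /* xor (2 bytes) vs. xorps (3 bytes).  */
    igain -= COSTS_N_BYTES (1);
  else if (kind == CONST_FITS_SIMODE)
    /* mov (5 bytes) vs. movaps (7 bytes).  */
    igain -= COSTS_N_BYTES (2);
  /* else: the SSE form would need a constant pool load, whose size cost
     depends on whether the constant is shared or already in the pool.
     No adjustment is made here; a gain of zero was the empirical choice
     reported in this thread.  */

  return igain;
}
```

With these assumed macros, the zero case costs 2 units, the SImode-immediate
case costs 4 units (the same as one COSTS_N_INSNS), and the default case
leaves the gain untouched.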

Thanks,
Richard.

> Thanks in advance,
> Roger
> --
>
> -----Original Message-----
> From: Richard Biener <richard.guent...@gmail.com>
> Sent: 20 August 2021 08:29
> To: Roger Sayle <ro...@nextmovesoftware.com>
> Cc: GCC Patches <gcc-patches@gcc.gnu.org>
> Subject: Re: [x86_64 PATCH] Tweak -Os costs for scalar-to-vector pass.
>
> On Thu, Aug 19, 2021 at 6:01 PM Roger Sayle <ro...@nextmovesoftware.com> wrote:
> >
> >
> > Doh!  ENOPATCH.
> >
> > -----Original Message-----
> > From: Roger Sayle <ro...@nextmovesoftware.com>
> > Sent: 19 August 2021 16:59
> > To: 'GCC Patches' <gcc-patches@gcc.gnu.org>
> > Subject: [x86_64 PATCH] Tweak -Os costs for scalar-to-vector pass.
> >
> >
> > Back in June I briefly mentioned in one of my gcc-patches posts that a
> > change that should have always reduced code size, would mysteriously
> > occasionally result in slightly larger code (according to CSiBE):
> > https://gcc.gnu.org/pipermail/gcc-patches/2021-June/573233.html
> >
> > Investigating further, the cause turns out to be that x86_64's
> > scalar-to-vector (stv) pass is relying on poor estimates of the size
> > costs/benefits.  This patch tweaks the backend's compute_convert_gain
> > method to provide slightly more accurate values when compiling with -Os.
> > Compilation without -Os is (should be) unaffected.  And for
> > completeness, I'll mention that the stv pass is a net win for code
> > size so it's much better to improve its heuristics than simply gate
> > the pass on !optimize_for_size.
> >
> > The net effect of this change is to save 1399 bytes on the CSiBE code
> > size benchmark when compiling with -Os.
> >
> > This patch has been tested on x86_64-pc-linux-gnu with "make bootstrap"
> > and "make -k check" with no new failures.
> >
> > Ok for mainline?
>
> +                   /* xor (2 bytes) vs. xorps (3 bytes).  */
> +                   if (src == const0_rtx)
> +                     igain -= COSTS_N_BYTES (1);
> +                   /* movdi_internal vs. movv2di_internal.  */
> +                   /* => mov (5 bytes) vs. movaps (7 bytes).  */
> +                   else if (x86_64_immediate_operand (src, SImode))
> +                     igain -= COSTS_N_BYTES (2);
>
> Doesn't it need two GPR xors and two movs for 32-bit DImode?  Thus the
> non-SSE cost should be times 'm'?  For const0_rtx we may eventually re-use
> the zero reg for the high part so that is eventually correct.
>
> Also I'm missing an 'else' - in the default case there's no cost/benefit
> of using SSE vs. GPR regs?  For SSE it would be a constant pool load.
>
> I also wonder, since I now see COSTS_N_BYTES for the first time (heh), 
> whether with -Os we'd need to replace all COSTS_N_INSNS (1) scaling with 
> COSTS_N_BYTES scaling?  OTOH costs_add_n_insns uses COSTS_N_INSNS for the 
> size part as well.
>
> That said, it looks like we're eventually mixing apples and oranges now or 
> even previously?
>
> Thanks,
> Richard.
>
> >
> >
> > 2021-08-19  Roger Sayle  <ro...@nextmovesoftware.com>
> >
> > gcc/ChangeLog
> >         * config/i386/i386-features.c (compute_convert_gain): Provide
> >         more accurate values for CONST_INT, when optimizing for size.
> >         * config/i386/i386.c (COSTS_N_BYTES): Move definition from here...
> >         * config/i386/i386.h (COSTS_N_BYTES): to here.
> >
> > Roger
> > --
> >
>
