On Fri, Jul 29, 2022 at 12:18 AM Roger Sayle <ro...@nextmovesoftware.com> wrote:
>
>
> This patch improves TImode STV by adding support for logical shifts by
> integer constants that are multiples of 8.  For the test case:
>
> __int128 a, b;
> void foo() { a = b << 16; }
>
> on x86_64, gcc -O2 currently generates:
>
>         movq    b(%rip), %rax
>         movq    b+8(%rip), %rdx
>         shldq   $16, %rax, %rdx
>         salq    $16, %rax
>         movq    %rax, a(%rip)
>         movq    %rdx, a+8(%rip)
>         ret
>
> with this patch we now generate:
>
>         movdqa  b(%rip), %xmm0
>         pslldq  $2, %xmm0
>         movaps  %xmm0, a(%rip)
>         ret
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, both with and without --target_board=unix{-m32},
> with no new failures.  Ok for mainline?
>
>
> 2022-07-28  Roger Sayle  <ro...@nextmovesoftware.com>
>
> gcc/ChangeLog
>         * config/i386/i386-features.cc (compute_convert_gain): Add gain
>         for converting suitable TImode shift to a V1TImode shift.
>         (timode_scalar_chain::convert_insn): Add support for converting
>         suitable ASHIFT and LSHIFTRT.
>         (timode_scalar_to_vector_candidate_p): Consider logical shifts
>         by integer constants that are multiples of 8 to be candidates.
>
> gcc/testsuite/ChangeLog
>         * gcc.target/i386/sse4_1-stv-7.c: New test case.
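
For readers following along, the equivalence the patch relies on can be
checked in plain C. This is only a sketch, not code from the patch: the
helper names (byte_shift_left, byte_shift_right, check_equivalence,
self_check) are made up for illustration, and it assumes a little-endian
target and a compiler that provides unsigned __int128 (GCC or Clang on
x86_64). It models pslldq/psrldq as whole-byte moves with zero fill and
compares them against 128-bit logical shifts by multiples of 8.

```c
#include <assert.h>
#include <string.h>

typedef unsigned __int128 u128;

/* Model of pslldq: move the 16 bytes up by nbytes, zero-filling the low
   bytes.  Assumes little-endian byte order (hypothetical helper).  */
static u128 byte_shift_left(u128 x, unsigned nbytes)
{
    unsigned char src[16], dst[16] = {0};
    u128 r;
    memcpy(src, &x, 16);
    if (nbytes < 16)
        memcpy(dst + nbytes, src, 16 - nbytes);
    memcpy(&r, dst, 16);
    return r;
}

/* Model of psrldq: move the 16 bytes down by nbytes, zero-filling the
   high bytes.  Same little-endian assumption.  */
static u128 byte_shift_right(u128 x, unsigned nbytes)
{
    unsigned char src[16], dst[16] = {0};
    u128 r;
    memcpy(src, &x, 16);
    if (nbytes < 16)
        memcpy(dst, src + nbytes, 16 - nbytes);
    memcpy(&r, dst, 16);
    return r;
}

/* Return 1 if, for every bit count that is a multiple of 8 in [0, 120],
   the logical shift of x matches the corresponding byte shift.  */
static int check_equivalence(u128 x)
{
    for (unsigned bits = 0; bits < 128; bits += 8) {
        if ((x << bits) != byte_shift_left(x, bits / 8))
            return 0;
        if ((x >> bits) != byte_shift_right(x, bits / 8))
            return 0;
    }
    return 1;
}

/* Small self-check over a few sample values.  */
static void self_check(void)
{
    u128 v = ((u128)0x0123456789abcdefULL << 64) | 0xfedcba9876543210ULL;
    assert(check_equivalence(v));
    assert(check_equivalence((u128)1));
    assert(check_equivalence(~(u128)0));
}
```

The b << 16 case from the test above corresponds to
byte_shift_left (b, 2), matching the pslldq $2 in the new code sequence.
Shift counts that are not multiples of 8 have no whole-byte equivalent,
which is why only those counts are treated as candidates.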

+ case ASHIFT:
+ case LSHIFTRT:
+  /* For logical shifts by constant multiples of 8. */
+  igain = optimize_insn_for_size_p () ? COSTS_N_BYTES (4)
+      : COSTS_N_INSNS (1);

Isn't the conversion a universal win for -O2 as well as for -Os? The
conversion to/from XMM register is already accounted for, so for -Os
substituting shldq/salq with pslldq should always be a win. I'd expect
the cost calculation to be similar to the
general_scalar_chain::compute_convert_gain cost calculation with m =
2.

Uros.
