On Sun, Jun 5, 2022 at 7:19 PM Roger Sayle <ro...@nextmovesoftware.com> wrote:
>
>
> This patch extends the recent and;cmp to not;test optimization to also
> perform this transformation for TImode on TARGET_64BIT and DImode on -m32.
> One motivation for this is that it's a step to fixing the current failure
> of gcc.target/i386/pr65105-5.c on -m32.
>
> A more direct benefit for x86_64 is that the following code:
>
> int foo(__int128 x, __int128 y)
> {
>   return (x & y) == y;
> }
>
> improves (with -O2 -mbmi) from:
>
>         movq    %rdi, %r8
>         movq    %rsi, %rdi
>         movq    %rdx, %rsi
>         andq    %rcx, %rdi
>         movq    %r8, %rax
>         andq    %rdx, %rax
>         movq    %rdi, %rdx
>         xorq    %rsi, %rax
>         xorq    %rcx, %rdx
>         orq     %rdx, %rax
>         sete    %al
>         movzbl  %al, %eax
>         ret
>
> to the much better:
>
>         movq    %rdi, %r8
>         movq    %rsi, %rdi
>         andn    %rdx, %r8, %rax
>         andn    %rcx, %rdi, %rsi
>         orq     %rsi, %rax
>         sete    %al
>         movzbl  %al, %eax
>         ret
>
> The major theme of this patch is to generalize many of i386.md's
> *di3_doubleword patterns to become *<dwi>_doubleword patterns, i.e.
> whenever there exists a "double word" optimization for DImode with -m32,
> there should be an equivalent TImode optimization on TARGET_64BIT.
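
The and;cmp to not;test rewrite quoted above relies on the identity that
(x & y) == y holds exactly when (~x & y) == 0, and on the fact that a
double-word test against zero can be done by OR-ing the two word-sized
results. A minimal C sketch of that reasoning follows. It is not taken from
the patch: the function names are hypothetical, unsigned __int128 (a GCC/Clang
extension on 64-bit targets) is used instead of the signed type in the quoted
example to keep the word extraction simple, and the "split" form only mirrors
the shape of the andn/andn/or sequence rather than any actual RTL.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper names; illustration only, not code from the patch. */

/* Original form from the quoted example: and, then compare against y. */
static int subset_and_cmp(unsigned __int128 x, unsigned __int128 y)
{
    return (x & y) == y;
}

/* Rewritten form: and-not, then test against zero. */
static int subset_not_test(unsigned __int128 x, unsigned __int128 y)
{
    return (~x & y) == 0;
}

/* Double-word form: one and-not per 64-bit half, OR the halves together,
   and test the combined result against zero. */
static int subset_split(uint64_t xlo, uint64_t xhi,
                        uint64_t ylo, uint64_t yhi)
{
    return ((~xlo & ylo) | (~xhi & yhi)) == 0;
}

/* Returns 1 when all three forms give the same answer for these inputs. */
static int forms_agree(uint64_t xlo, uint64_t xhi,
                       uint64_t ylo, uint64_t yhi)
{
    unsigned __int128 x = ((unsigned __int128)xhi << 64) | xlo;
    unsigned __int128 y = ((unsigned __int128)yhi << 64) | ylo;
    int a = subset_and_cmp(x, y);
    int b = subset_not_test(x, y);
    int c = subset_split(xlo, xhi, ylo, yhi);
    return a == b && b == c;
}
```

Calling forms_agree() with a few values that set bits in either half is a
quick way to check that the three forms never disagree.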

No, please do not mix two different themes in one patch.

OTOH, the only TImode optimizations that can be used with SSE registers
are logic instructions and some constant shifts; there is no
TImode arithmetic. I assume your end goal is to introduce STV for
TImode on 64-bit targets, because the DImode patterns for x86_32 were
introduced to avoid early decomposition by the middle end and to split
the instructions that STV didn't convert to vector instructions after
the STV pass. So, let's start with basic V1TImode support before
optimizations are introduced.

Uros.

> The following patch has been tested on x86_64-pc-linux-gnu with
> make bootstrap and make -k check, where on TARGET_64BIT there are
> no new failures, but paradoxically with --target_board=unix{-m32}
> the other dg-final clause in gcc.target/i386/pr65105-5.c now fails.
> Counter-intuitively, this is progress, and pr65105-5.c may now be
> fixed (without using peephole2) simply by tweaking the STV pass to
> handle andn/test (in a follow-up patch).
> OK for mainline?
>
>
> 2022-06-05  Roger Sayle  <ro...@nextmovesoftware.com>
>
> gcc/ChangeLog
>         * config/i386/i386.cc (ix86_rtx_costs) <COMPARE>: Provide costs
>         for double word comparisons and tests (comparisons against zero).
>         * config/i386/i386.md (*test<mode>_not_doubleword): Split DWI
>         and;cmp into andn;cmp $0 as a pre-reload splitter.
>         (define_expand and<mode>3): Generalize from SWIM1248x to SWIDWI.
>         (define_insn_and_split "*anddi3_doubleword"): Rename/generalize...
>         (define_insn_and_split "*and<dwi>3_doubleword"): ... to this.
>         (define_insn "*andndi3_doubleword"): Rename and generalize...
>         (define_insn "*andn<mode>3_doubleword"): ... to this.
>         (define_split): Split andn when TARGET_BMI for both <DWI> modes.
>         (define_split): Split andn when !TARGET_BMI for both <DWI> modes.
>         (define_expand <any_or><mode>3): Generalize from SWIM1248x to
>         SWIDWI.
>         (define_insn_and_split "*<any_or><dwi>3_doubleword"): Generalize
>         from DI mode to both <DWI> modes.
>
> gcc/testsuite/ChangeLog
>         * gcc.target/i386/testnot-3.c: New test case.
>
>
> Thanks again,
> Roger
> --
>
