On Tue, Oct 17, 2023 at 3:08 PM Roger Sayle <ro...@nextmovesoftware.com> wrote:
>
>
> This patch is the backend piece of a solution to PRs 101955 and 106245,
> that adds a define_insn_and_split to the i386 backend, to perform sign
> extension of a single (least significant) bit using AND $1 then NEG.
>
> Previously, (x<<31)>>31 would be generated as
>
>         sall    $31, %eax       // 3 bytes
>         sarl    $31, %eax       // 3 bytes
>
> with this patch the backend now generates:
>
>         andl    $1, %eax        // 3 bytes
>         negl    %eax            // 2 bytes
>
> Not only is this smaller in size, but microbenchmarking confirms
> that it's a performance win on both Intel and AMD; Intel sees only a
> 2% improvement (perhaps just a size effect), but AMD sees a 7% win.
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, both with and without --target_board=unix{-m32}
> with no new failures.  Ok for mainline?
>
>
> 2023-10-17  Roger Sayle  <ro...@nextmovesoftware.com>
>
> gcc/ChangeLog
>         PR middle-end/101955
>         PR tree-optimization/106245
>         * config/i386/i386.md (*extv<mode>_1_0): New define_insn_and_split.
>
> gcc/testsuite/ChangeLog
>         PR middle-end/101955
>         PR tree-optimization/106245
>         * gcc.target/i386/pr106245-2.c: New test case.
>         * gcc.target/i386/pr106245-3.c: New 32-bit test case.
>         * gcc.target/i386/pr106245-4.c: New 64-bit test case.
>         * gcc.target/i386/pr106245-5.c: Likewise.

+;; Split sign-extension of single least significant bit as and x,$1;neg x
+(define_insn_and_split "*extv<mode>_1_0"
+  [(set (match_operand:SWI48 0 "register_operand" "=r")
+ (sign_extract:SWI48 (match_operand:SWI48 1 "register_operand" "0")
+    (const_int 1)
+    (const_int 0)))
+   (clobber (reg:CC FLAGS_REG))]
+  ""
+  "#"
+  "&& 1"

No need to use "&&" for an empty insn constraint. Just use
"reload_completed" in this case.

+  [(parallel [(set (match_dup 0) (and:SWI48 (match_dup 1) (const_int 1)))
+      (clobber (reg:CC FLAGS_REG))])
+   (parallel [(set (match_dup 0) (neg:SWI48 (match_dup 0)))
+      (clobber (reg:CC FLAGS_REG))])])

Did you intend to split this after reload? If this is the case, then
reload_completed is missing.

Uros.

Reply via email to