On Tue, Oct 17, 2023 at 3:08 PM Roger Sayle <ro...@nextmovesoftware.com> wrote:
>
> This patch is the backend piece of a solution to PRs 101955 and 106245,
> that adds a define_insn_and_split to the i386 backend, to perform sign
> extension of a single (least significant) bit using AND $1 then NEG.
>
> Previously, (x<<31)>>31 would be generated as
>
>         sall    $31, %eax       // 3 bytes
>         sarl    $31, %eax       // 3 bytes
>
> with this patch the backend now generates:
>
>         andl    $1, %eax        // 3 bytes
>         negl    %eax            // 2 bytes
>
> Not only is this smaller in size, but microbenchmarking confirms
> that it's a performance win on both Intel and AMD; Intel sees only a
> 2% improvement (perhaps just a size effect), but AMD sees a 7% win.
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, both with and without --target_board=unix{-m32}
> with no new failures.  Ok for mainline?
>
>
> 2023-10-17  Roger Sayle  <ro...@nextmovesoftware.com>
>
> gcc/ChangeLog
>         PR middle-end/101955
>         PR tree-optimization/106245
>         * config/i386/i386.md (*extv<mode>_1_0): New define_insn_and_split.
>
> gcc/testsuite/ChangeLog
>         PR middle-end/101955
>         PR tree-optimization/106245
>         * gcc.target/i386/pr106245-2.c: New test case.
>         * gcc.target/i386/pr106245-3.c: New 32-bit test case.
>         * gcc.target/i386/pr106245-4.c: New 64-bit test case.
>         * gcc.target/i386/pr106245-5.c: Likewise.
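For reference, a minimal C sketch of the identity the new pattern relies on: for a 32-bit value, moving bit 0 into the sign bit and arithmetic-shifting it back, (x<<31)>>31, gives the same result as -(x & 1), i.e. 0 when bit 0 is clear and -1 when it is set. The function names below are hypothetical and are not taken from the pr106245 test cases. The sketch assumes GCC/Clang semantics for converting an out-of-range unsigned value to int32_t (modulo 2^32 wrap) and for right-shifting a negative value (arithmetic shift).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Old idiom (sall $31; sarl $31): shift bit 0 into the sign bit, then
   arithmetic-shift it back down.  The left shift is done on uint32_t to
   avoid signed-overflow UB.
   ASSUMPTION: converting the result back to int32_t wraps modulo 2^32 and
   ">>" on a negative int32_t is an arithmetic shift (true for GCC/Clang). */
static int32_t extv_shift(int32_t x)
{
    return (int32_t)((uint32_t)x << 31) >> 31;
}

/* Form produced by the new splitter (andl $1; negl).
   x & 1 is 0 or 1, so the negation never overflows. */
static int32_t extv_and_neg(int32_t x)
{
    return -(x & 1);
}

/* Hypothetical self-check: both forms must agree on these edge cases. */
static void check_extv_forms(void)
{
    const int32_t samples[] = { 0, 1, 2, 3, -1, -2,
                                INT32_MAX, INT32_MIN, 0x12345679 };
    for (size_t i = 0; i < sizeof samples / sizeof samples[0]; i++)
        assert(extv_shift(samples[i]) == extv_and_neg(samples[i]));
}
```

Calling check_extv_forms() from a test driver exercises both forms on a few edge cases, including INT32_MIN (bit 0 clear) and -1 (bit 0 set).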
+;; Split sign-extension of single least significant bit as and x,$1;neg x
+(define_insn_and_split "*extv<mode>_1_0"
+  [(set (match_operand:SWI48 0 "register_operand" "=r")
+       (sign_extract:SWI48 (match_operand:SWI48 1 "register_operand" "0")
+                           (const_int 1)
+                           (const_int 0)))
+   (clobber (reg:CC FLAGS_REG))]
+  ""
+  "#"
+  "&& 1"

No need to use "&&" for an empty insn constraint. Just use
"reload_completed" in this case.

+  [(parallel [(set (match_dup 0) (and:SWI48 (match_dup 1) (const_int 1)))
+             (clobber (reg:CC FLAGS_REG))])
+   (parallel [(set (match_dup 0) (neg:SWI48 (match_dup 0)))
+             (clobber (reg:CC FLAGS_REG))])])

Did you intend to split this after reload? If this is the case, then
reload_completed is missing.

Uros.