On Fri, Jan 6, 2023 at 6:55 PM Takayuki 'January June' Suwa <jjsuwa_sys3...@yahoo.co.jp> wrote: > > This patch optimizes the operation of cutting and splicing two register > values at a specified bit position, in other words, combining (bitwise > ORing) bits 0 through (C-1) of the register with bits C through 31 > of the other, where C is the specified immediate integer 1 through 31. > > This typically applies to signedness copy of floating point number or > __builtin_return_address() if the windowed register ABI, and saves one > instruction compared to four shifts and a bitwise OR by the RTL > generation pass.
While I indeed see this kind of change, e.g.: - extui a3, a3, 27, 5 - slli a2, a2, 5 - srli a2, a2, 5 - slli a3, a3, 27 - or a2, a2, a3 + slli a2, a2, 5 + extui a3, a3, 27, 5 + ssai 5 + src a2, a3, a2 I also see the following: - movi.n a6, -4 - and a5, a5, a6 - extui a3, a3, 0, 2 - or a3, a3, a5 + srli a5, a5, 2 + slli a3, a3, 30 + ssai 30 + src a3, a5, a3 i.e. after the split there's the same number of instructions, but the new sequence is one byte longer than the original one because of the movi.n. Looking at a bunch of linux builds I observe a slight code size growth in call0 kernels and a slight code size reduction in windowed kernels. > gcc/ChangeLog: > > * config/xtensa/xtensa.md (*splice_bits): > New insn_and_split pattern. > --- > gcc/config/xtensa/xtensa.md | 47 +++++++++++++++++++++++++++++++++++++ > 1 file changed, 47 insertions(+) > > diff --git a/gcc/config/xtensa/xtensa.md b/gcc/config/xtensa/xtensa.md > index 0a26d3dccf4..36ec1b1918e 100644 > --- a/gcc/config/xtensa/xtensa.md > +++ b/gcc/config/xtensa/xtensa.md > @@ -746,6 +746,53 @@ > (set_attr "mode" "SI") > (set_attr "length" "3")]) > > +(define_insn_and_split "*splice_bits" > + [(set (match_operand:SI 0 "register_operand" "=a") > + (ior:SI (and:SI (match_operand:SI 1 "register_operand" "r") > + (match_operand:SI 3 "const_int_operand" "i")) > + (and:SI (match_operand:SI 2 "register_operand" "r") > + (match_operand:SI 4 "const_int_operand" "i"))))] > + > + "!optimize_debug && optimize > + && INTVAL (operands[3]) + INTVAL (operands[4]) == -1 > + && (exact_log2 (INTVAL (operands[3]) + 1) > 0 > + || exact_log2 (INTVAL (operands[4]) + 1) > 0)" > + "#" > + "&& can_create_pseudo_p ()" > + [(set (match_dup 5) > + (ashift:SI (match_dup 1) > + (match_dup 4))) > + (set (match_dup 6) > + (lshiftrt:SI (match_dup 2) > + (match_dup 3))) > + (set (match_dup 0) > + (ior:SI (lshiftrt:SI (match_dup 5) > + (match_dup 4)) > + (ashift:SI (match_dup 6) > + (match_dup 3))))] > +{ > + int shift; > + if (INTVAL (operands[3]) < 0) > + { > + rtx x; > + x = operands[1], operands[1] = operands[2], operands[2] = x; > + x = operands[3], operands[3] = operands[4], operands[4] = x; > + } > + shift = floor_log2 (INTVAL (operands[3]) + 1); > + operands[3] = GEN_INT (shift); > + operands[4] = GEN_INT (32 - shift); > + operands[5] = gen_reg_rtx (SImode); > + operands[6] = gen_reg_rtx (SImode); > +} > + [(set_attr "type" "arith") > + (set_attr "mode" "SI") > + (set (attr "length") > + (if_then_else (match_test "TARGET_DENSITY > + && (INTVAL (operands[3]) == 0x7FFFFFFF > + || INTVAL (operands[4]) == > 0x7FFFFFFF)") > + (const_int 11) > + (const_int 12)))]) I wonder how the length could be 11 here? I always see 4 3-byte instructions generated by this pattern. -- Thanks. -- Max