https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105930
Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|needs-bisection             |

--- Comment #17 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
So, I've tried:
--- gcc/config/i386/i386.md.jj  2022-06-13 10:53:26.739290704 +0200
+++ gcc/config/i386/i386.md     2022-06-14 11:09:24.467024047 +0200
@@ -13734,14 +13734,13 @@
 ;; shift instructions and a scratch register.
 
 (define_insn_and_split "ix86_rotl<dwi>3_doubleword"
- [(set (match_operand:<DWI> 0 "register_operand" "=r")
-       (rotate:<DWI> (match_operand:<DWI> 1 "register_operand" "0")
-                     (match_operand:QI 2 "<shift_immediate_operand>" "<S>")))
-  (clobber (reg:CC FLAGS_REG))
-  (clobber (match_scratch:DWIH 3 "=&r"))]
- ""
+ [(set (match_operand:<DWI> 0 "register_operand")
+       (rotate:<DWI> (match_operand:<DWI> 1 "register_operand")
+                     (match_operand:QI 2 "<shift_immediate_operand>")))
+  (clobber (reg:CC FLAGS_REG))]
+ "ix86_pre_reload_split ()"
  "#"
- "reload_completed"
+ "&& 1"
  [(set (match_dup 3) (match_dup 4))
   (parallel
    [(set (match_dup 4)
@@ -13764,6 +13763,7 @@
                     (match_dup 6)))) 0)))
      (clobber (reg:CC FLAGS_REG))])]
 {
+  operands[3] = gen_reg_rtx (<MODE>mode);
   operands[6] = GEN_INT (GET_MODE_BITSIZE (<MODE>mode) - 1);
   operands[7] = GEN_INT (GET_MODE_BITSIZE (<MODE>mode));
 
@@ -13771,14 +13771,13 @@
 })
 
 (define_insn_and_split "ix86_rotr<dwi>3_doubleword"
- [(set (match_operand:<DWI> 0 "register_operand" "=r")
-       (rotatert:<DWI> (match_operand:<DWI> 1 "register_operand" "0")
-                       (match_operand:QI 2 "<shift_immediate_operand>" "<S>")))
-  (clobber (reg:CC FLAGS_REG))
-  (clobber (match_scratch:DWIH 3 "=&r"))]
- ""
+ [(set (match_operand:<DWI> 0 "register_operand")
+       (rotatert:<DWI> (match_operand:<DWI> 1 "register_operand")
+                       (match_operand:QI 2 "<shift_immediate_operand>")))
+  (clobber (reg:CC FLAGS_REG))]
+ "ix86_pre_reload_split ()"
  "#"
- "reload_completed"
+ "&& 1"
  [(set (match_dup 3) (match_dup 4))
   (parallel
    [(set (match_dup 4)
@@ -13801,6 +13800,7 @@
                     (match_dup 6)))) 0)))
      (clobber (reg:CC FLAGS_REG))])]
 {
+  operands[3] = gen_reg_rtx (<MODE>mode);
   operands[6] = GEN_INT (GET_MODE_BITSIZE (<MODE>mode) - 1);
   operands[7] = GEN_INT (GET_MODE_BITSIZE (<MODE>mode));
 
On the #c0 test with -O2 -m32 -mno-mmx -mno-sse it makes some difference, but
not as much as one would hope for:
Numbers from gcc 11.3.1 20220614, 11.3.1 20220614 with the patch, 13.0.0
20220610, 13.0.0 20220614 with the patch:
sub on %esp       428    2556    2620    2556
fn size in B    21657   23186   28413   23534
.s lines         6199    3942    7260    4198
So, trunk patched with the above patch results in significantly fewer
instructions, but larger (more of them use 32-bit immediates, mostly in form of
whatever(%esp) memory source operand).
And the stack usage is high.
I think the patch is still a good idea, it gives the RA more options, but we
should investigate why it consumes so much more stack and results in larger
code.
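
For readers without the #c0 test at hand, here is a minimal sketch of the kind of source the two patterns are aimed at: a 64-bit rotate by a constant count, which with -m32 is a double-word operation (<DWI> would be DImode there). This is not the testcase from the bug. The function names are hypothetical, and it is an assumption that GCC recognizes these shift/or idioms as rotates and expands them through ix86_rotl<dwi>3_doubleword and ix86_rotr<dwi>3_doubleword.

```c
#include <stdint.h>

/* Hypothetical example, not taken from the bug report.
   Rotate a 64-bit value left by a constant 13 bits.  On ia32 the value
   lives in a register pair, so the rotate needs the double-word pattern
   plus one temporary (operands[3] in the patch).  */
uint64_t
rotl64_by_13 (uint64_t x)
{
  return (x << 13) | (x >> (64 - 13));
}

/* Same idea for a right rotate by a constant 13 bits.  */
uint64_t
rotr64_by_13 (uint64_t x)
{
  return (x >> 13) | (x << (64 - 13));
}
```

Compiling this with the flags quoted above (-O2 -m32 -mno-mmx -mno-sse) and -S, with and without the patch, should show whether the temporary is picked before or after register allocation for a small function.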