https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105930

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|needs-bisection             |

--- Comment #17 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
So, I've tried:
--- gcc/config/i386/i386.md.jj  2022-06-13 10:53:26.739290704 +0200
+++ gcc/config/i386/i386.md     2022-06-14 11:09:24.467024047 +0200
@@ -13734,14 +13734,13 @@
 ;; shift instructions and a scratch register.

 (define_insn_and_split "ix86_rotl<dwi>3_doubleword"
- [(set (match_operand:<DWI> 0 "register_operand" "=r")
-       (rotate:<DWI> (match_operand:<DWI> 1 "register_operand" "0")
-                    (match_operand:QI 2 "<shift_immediate_operand>" "<S>")))
-  (clobber (reg:CC FLAGS_REG))
-  (clobber (match_scratch:DWIH 3 "=&r"))]
- ""
+ [(set (match_operand:<DWI> 0 "register_operand")
+       (rotate:<DWI> (match_operand:<DWI> 1 "register_operand")
+                    (match_operand:QI 2 "<shift_immediate_operand>")))
+  (clobber (reg:CC FLAGS_REG))]
+ "ix86_pre_reload_split ()"
  "#"
- "reload_completed"
+ "&& 1"
  [(set (match_dup 3) (match_dup 4))
   (parallel
    [(set (match_dup 4)
@@ -13764,6 +13763,7 @@
                                                       (match_dup 6)))) 0)))
     (clobber (reg:CC FLAGS_REG))])]
 {
+  operands[3] = gen_reg_rtx (<MODE>mode);
   operands[6] = GEN_INT (GET_MODE_BITSIZE (<MODE>mode) - 1);
   operands[7] = GEN_INT (GET_MODE_BITSIZE (<MODE>mode));

@@ -13771,14 +13771,13 @@
 })

 (define_insn_and_split "ix86_rotr<dwi>3_doubleword"
- [(set (match_operand:<DWI> 0 "register_operand" "=r")
-       (rotatert:<DWI> (match_operand:<DWI> 1 "register_operand" "0")
-                      (match_operand:QI 2 "<shift_immediate_operand>" "<S>")))
-  (clobber (reg:CC FLAGS_REG))
-  (clobber (match_scratch:DWIH 3 "=&r"))]
- ""
+ [(set (match_operand:<DWI> 0 "register_operand")
+       (rotatert:<DWI> (match_operand:<DWI> 1 "register_operand")
+                      (match_operand:QI 2 "<shift_immediate_operand>")))
+  (clobber (reg:CC FLAGS_REG))]
+ "ix86_pre_reload_split ()"
  "#"
- "reload_completed"
+ "&& 1"
  [(set (match_dup 3) (match_dup 4))
   (parallel
    [(set (match_dup 4)
@@ -13801,6 +13800,7 @@
                                                     (match_dup 6)))) 0)))
     (clobber (reg:CC FLAGS_REG))])]
 {
+  operands[3] = gen_reg_rtx (<MODE>mode);
   operands[6] = GEN_INT (GET_MODE_BITSIZE (<MODE>mode) - 1);
   operands[7] = GEN_INT (GET_MODE_BITSIZE (<MODE>mode));


On the #c0 test with -O2 -m32 -mno-mmx -mno-sse it makes some difference, but
not as much as one would hope for:
Numbers from gcc 11.3.1 20220614, 11.3.1 20220614 with the patch, 13.0.0
20220610, and 13.0.0 20220614 with the patch:

                11.3.1   11.3.1+patch   13.0.0   13.0.0+patch
sub on %esp        428           2556     2620           2556
fn size in B     21657          23186    28413          23534
.s lines          6199           3942     7260           4198
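So, trunk patched with the above patch results in significantly fewer
instructions than either unpatched compiler, but in larger code than unpatched
GCC 11: more of the instructions use 32-bit displacements, mostly in the form of
offset(%esp) memory source operands, since with a frame this large most stack
slots are out of reach of an 8-bit displacement.
And the stack usage is high: 2556 bytes versus 428 for unpatched GCC 11.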

I think the patch is still a good idea, since splitting before reload gives the
RA more options, but we should investigate why it consumes so much more stack
and results in larger code.
