https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119091
--- Comment #4 from Milan Tripkovic ---

In the cse1 pass (as seen in this Godbolt link: https://godbolt.org/z/bYGbffTj9), the compiler recognizes patterns too early, which leads to suboptimal code generation. The following transformations occur:

Instruction Merge 1: set r140 (mv) + add const 0x101 is transformed into set r141 const 0x1010101.
Transformation: mv + add -> mvconst_internal

Instruction Merge 2: set r141 + ashift const 0x20 is transformed into set r139 const 0x101010100000000.
Transformation: mvconst_internal (mv + add) + shift -> mvconst_internal

When the split1 pass later processes these patterns, they expand into five instructions: MV, ADD, MV, ADD, and SHIFT. The value 0x1010101 is therefore built twice, once for r141 and once as the first part of r139, which is significant instruction redundancy.

We tried to fix this by disabling mvconst_internal pattern recognition in recog.cc:insn_invalid_p for the cse pass and in changes.cc:recog_level2 for the fwprop pass. This keeps the pattern from being recognized until the combine pass.
New RTL state at the start of the combine pass:
```
(note 5 0 4 2 [bb 2] NOTE_INSN_BASIC_BLOCK)
(note 4 5 8 2 NOTE_INSN_FUNCTION_BEG)
(insn 8 4 9 2 (set (reg:DI 140)
        (const_int 16842752 [0x1010000])) "1.c":8:3 275 {*movdi_64bit})
(insn 9 8 10 2 (set (reg:DI 141)
        (plus:DI (reg:DI 140)
            (const_int 257 [0x101]))) "1.c":8:3 5 {*adddi3}
     (expr_list:REG_EQUAL (const_int 16843009 [0x1010101])))
(insn 10 9 11 2 (set (reg:DI 139)
        (ashift:DI (reg:DI 141)
            (const_int 32 [0x20]))) "1.c":8:3 297 {ashldi3}
     (expr_list:REG_EQUAL (const_int 72340172821233664 [0x101010100000000])))
(insn 11 10 14 2 (set (reg:DI 138 [ t ])
        (asm_operands:DI ("") ("=r") 0 [(reg:DI 139)] ...)))
(insn 14 11 19 2 (set (reg:DI 142 [ _2 ])
        (ior:DI (reg:DI 138 [ t ])
            (reg:DI 141))) "1.c":9:12 107 {*iordi3})
```

By delaying the transformation, the combine pass handles the logic more efficiently:
It first merges insn 8 and insn 9 into a single set: (set (reg:DI 141) (const_int 16843009 [0x1010101])).
It then recognizes the mvconst_internal pattern (mv + shift).
Consequently, only one mvconst_internal is generated.

Final result: during the split1 pass, this expands into only three instructions (MV, ADD, SHIFT) instead of five, eliminating the redundancy.

Is this diagnosis of the root cause, premature pattern recognition in cse1, correct? If not, what direction should be taken to properly address this issue?
