https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84369
Roman Zhuykov <zhroma at ispras dot ru> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |zhroma at ispras dot ru

--- Comment #3 from Roman Zhuykov <zhroma at ispras dot ru> ---
I compared the modulo-sched DDGs in “power8 vs power9” modes, and the main
difference is not the combined instruction mentioned by Peter, but the
dependency cost of movsi_internal1.

For these two instructions:

(insn 23 22 25 4 (set (mem:SI (plus:DI (reg/f:DI 126 [ regstat_n_sets_and_refs.1_9 ])
                (reg:DI 141 [ ivtmp.26 ])) [2 MEM[base: regstat_n_sets_and_refs.1_9, index: ivtmp.26_35, offset: 0B]+0 S4 A32])
        (reg:SI 148 [ _7->n_refs ])) "sms-10.c":50:40 502 {*movsi_internal1}
     (expr_list:REG_DEAD (reg:SI 148 [ _7->n_refs ])
        (nil)))

(insn 32 31 33 4 (set (mem:SI (plus:DI (reg/f:DI 159)
                (reg:DI 141 [ ivtmp.26 ])) [2 MEM[base: _44, index: ivtmp.26_35, offset: 0B]+0 S4 A32])
        (reg:SI 154)) "sms-10.c":51:40 502 {*movsi_internal1}
     (expr_list:REG_DEAD (reg:SI 154)
        (nil)))

the insn_default_latency function (and then insn_sched_cost) returns
significantly different values: 5 for power8, 0 for power9. There are other
movsi_internal1 instructions in this loop whose costs also differ, but only by
1 cycle (“3 vs 4”), which is hopefully correct.

The same thing (“5 vs 0” cost) also broke this test on 32-bit powerpc. There is
no combining difference there, only the cost issue, but it also prevents
modulo-sched from succeeding.

I’m not familiar with .md files, so I ask somebody to look at the “5 vs 0”
issue. If such a cost difference is correct, then a good solution seems to be
simply skipping this test on power9 CPUs.