https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84369

Roman Zhuykov <zhroma at ispras dot ru> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |zhroma at ispras dot ru

--- Comment #3 from Roman Zhuykov <zhroma at ispras dot ru> ---
I compared the modulo-sched DDGs in “power8 vs power9” modes, and the main
difference is not the combined instruction mentioned by Peter but the cost of
the movsi_internal1 dependencies. For these two instructions:

(insn 23 22 25 4 (set (mem:SI (plus:DI (reg/f:DI 126 [ regstat_n_sets_and_refs.1_9 ])
                (reg:DI 141 [ ivtmp.26 ])) [2 MEM[base: regstat_n_sets_and_refs.1_9, index: ivtmp.26_35, offset: 0B]+0 S4 A32])
        (reg:SI 148 [ _7->n_refs ])) "sms-10.c":50:40 502 {*movsi_internal1}
     (expr_list:REG_DEAD (reg:SI 148 [ _7->n_refs ])
        (nil)))

(insn 32 31 33 4 (set (mem:SI (plus:DI (reg/f:DI 159)
                (reg:DI 141 [ ivtmp.26 ])) [2 MEM[base: _44, index: ivtmp.26_35, offset: 0B]+0 S4 A32])
        (reg:SI 154)) "sms-10.c":51:40 502 {*movsi_internal1}
     (expr_list:REG_DEAD (reg:SI 154)
        (nil)))

insn_default_latency (and therefore insn_sched_cost) returns significantly
different values: 5 for power8, 0 for power9.
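
To make the “and therefore” explicit, here is a minimal sketch of how the two
values relate. It is a simplified model, not the actual GCC code: the enum and
the two *_sketch functions are hypothetical stand-ins, and the only real data
in it are the 5 and 0 observed above for these two stores.

```c
/* Hypothetical CPU selector standing in for -mcpu=power8 / -mcpu=power9.  */
enum cpu_sketch { CPU_POWER8, CPU_POWER9 };

/* Stand-in for the generated insn_default_latency: returns the default
   latency observed for the two stores above on each CPU.  */
static int
default_latency_sketch (enum cpu_sketch cpu)
{
  return cpu == CPU_POWER8 ? 5 : 0;
}

/* Simplified model of the scheduler cost: it takes the default latency
   and clamps a negative value to zero, so 5 stays 5 and 0 stays 0.  */
static int
sched_cost_sketch (enum cpu_sketch cpu)
{
  int cost = default_latency_sketch (cpu);
  return cost < 0 ? 0 : cost;
}
```

Under this model the scheduler cost for these stores is whatever the pipeline
description yields as the default latency, which is why the question below is
about the .md files rather than about modulo-sched itself.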

There are other movsi_internal1 instructions in this loop whose costs also
differ, but only by 1 cycle (“3 vs 4”); hopefully that change is correct.

The same “5 vs 0” cost difference also broke this test on 32-bit powerpc. There
is no combining difference there, only the cost issue, but it still prevents
modulo-sched from succeeding.

I’m not familiar with .md files, so I’d ask somebody to look at the “5 vs 0”
issue. If such a cost difference is correct, then a good solution seems to be
simply skipping this test on power9 CPUs.
