On 07/17/2012 12:51 PM, David Edelsohn wrote:
The change to power4-store-update looks incorrect or at least incomplete.

These reservations and others were changed by Vlad in March/April 2004
to fix a consistency check that he introduced at the time. Note that
the dispatch units for the final choice also is wrong and duplicated
from the previous line for power4-store-update:

    |(du3_power4+du4_power4,lsu2_power4)\
     |(du3_power4+du4_power4,lsu2_power4))+\

It looks like the final line should be

du4_power4+du1_power4,lsu1_power4

This 4th line should be removed altogether. If only the 4th dispatch slot is 
available and a cracked (or microcoded) insn is next, a nop will fill that slot 
and the cracked insn will start a new dispatch group.


On POWER4, the dispatch slot forces specific function units.  I don't
remember and don't have the documents handy to know if the function
units for store-update should be

2-1
2-2
1-1
1-2

or

2-1
2-2
1-2
1-1

I think it is the latter.

On Power4, the dispatch to issue/execution unit affinity is as follows (4 
dispatch slots, 2 FXU/LSU issue/execution units):
1 -> 1
2 -> 2
3 -> 2
4 -> 1

Update form stores are cracked into a store and addi, and the store is dual 
issued to the LSU/FXU. So based on this info I would think it should look like 
the following (untested).

@@ -141,12 +141,10 @@ (define_insn_reservation "power4-store-u
        (eq_attr "cpu" "power4"))
   "((du1_power4+du2_power4,lsu1_power4)\
     |(du2_power4+du3_power4,lsu2_power4)\
-    |(du3_power4+du4_power4,lsu2_power4)\
     |(du3_power4+du4_power4,lsu2_power4))+\
-   ((nothing,iu2_power4,iu1_power4)\
+   ((nothing,iu1_power4,iu2_power4)\
     |(nothing,iu2_power4,iu2_power4)\
-    |(nothing,iu1_power4,iu2_power4)\
-    |(nothing,iu1_power4,iu2_power4))")
+    |(nothing,iu2_power4,iu1_power4))")

-Pat


If you look at the thread from 2009, we discuss that the POWER4
scheduler description already is an approximation because an accurate
description creates an unreasonably large automata.  A lot of the
problem is the 1 cycle delay for dependent integer ops.

Reply via email to