On 07/17/2012 12:51 PM, David Edelsohn wrote:
The change to power4-store-update looks incorrect or at least incomplete.
These reservations and others were changed by Vlad in March/April 2004
to fix a consistency check that he introduced at the time. Note that
the dispatch units for the final choice are also wrong, duplicated
from the previous line of power4-store-update:
|(du3_power4+du4_power4,lsu2_power4)\
|(du3_power4+du4_power4,lsu2_power4))+\
It looks like the final line should be
du4_power4+du1_power4,lsu1_power4
This 4th line should be removed altogether. If only the 4th dispatch slot is
available and a cracked (or microcoded) insn is next, a nop will fill that slot
and the cracked insn will start a new dispatch group.
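A rough Python sketch of that grouping rule, assuming (as in this thread) four non-branch dispatch slots per group and ignoring the branch slot and microcoded insns. `form_groups` and the `(name, n_slots)` encoding are hypothetical illustrations, not anything in GCC:

```python
# Hypothetical model of POWER4 dispatch-group formation as described above:
# 4 non-branch slots per group (branch slot and microcoded insns ignored).
SLOTS_PER_GROUP = 4


def form_groups(insns):
    """Pack (name, n_slots) pairs into dispatch groups.

    n_slots is 1 for a simple insn and 2 for a cracked one. A cracked insn
    must occupy two slots of the same group, so if fewer slots are free a
    nop pads out the current group and the insn starts a new one.
    """
    groups, current = [], []
    for name, n_slots in insns:
        if n_slots > SLOTS_PER_GROUP:
            raise ValueError(f"{name} needs more slots than a group has")
        free = SLOTS_PER_GROUP - len(current)
        if n_slots > free:
            # e.g. only slot 4 left: fill it with a nop and close the group
            current.extend(["nop"] * free)
            groups.append(current)
            current = []
        current.extend([name] * n_slots)
        if len(current) == SLOTS_PER_GROUP:
            groups.append(current)
            current = []
    if current:
        groups.append(current)  # trailing partial group
    return groups


# Three simple insns leave only slot 4, so the cracked stwu starts a new group.
print(form_groups([("add", 1), ("sub", 1), ("or", 1), ("stwu", 2)]))
# [['add', 'sub', 'or', 'nop'], ['stwu', 'stwu']]
```

Under this model a cracked insn never straddles slot 4 of one group and slot 1 of the next, which is why a 4+1 alternative has no place in the reservation.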
On POWER4, the dispatch slot determines specific function units. I don't
remember, and don't have the documents handy to check, whether the function
units for store-update should be
2-1
2-2
1-1
1-2
or
2-1
2-2
1-2
1-1
I think it is the latter.
On POWER4, the affinity from dispatch slot to issue/execution unit is as follows (4
dispatch slots, 2 FXU/LSU issue/execution units):
1 -> 1
2 -> 2
3 -> 2
4 -> 1
Update-form stores are cracked into a store and an addi, and the store is
dual-issued to the LSU and FXU. Based on this, I think the reservation should
look like the following (untested):
@@ -141,12 +141,10 @@ (define_insn_reservation "power4-store-u
(eq_attr "cpu" "power4"))
"((du1_power4+du2_power4,lsu1_power4)\
|(du2_power4+du3_power4,lsu2_power4)\
- |(du3_power4+du4_power4,lsu2_power4)\
|(du3_power4+du4_power4,lsu2_power4))+\
- ((nothing,iu2_power4,iu1_power4)\
+ ((nothing,iu1_power4,iu2_power4)\
|(nothing,iu2_power4,iu2_power4)\
- |(nothing,iu1_power4,iu2_power4)\
- |(nothing,iu1_power4,iu2_power4))")
+ |(nothing,iu2_power4,iu1_power4))")
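The slot affinity above can be turned into the three reservation alternatives mechanically. The sketch below uses hypothetical helpers (`store_update_alternatives`, `to_reservation`), not part of GCC. It assumes the store half uses the LSU and FXU of its own slot's unit, and that the first `iu` in each alternative is the store's and the second is the addi's:

```python
# Hypothetical helper: derive the power4-store-update alternatives from the
# slot -> unit affinity stated above (1->1, 2->2, 3->2, 4->1).
AFFINITY = {1: 1, 2: 2, 3: 2, 4: 1}


def store_update_alternatives():
    """Return ((slot_a, slot_b), lsu_unit, (store_iu, addi_iu)) per placement.

    A cracked store-update takes two adjacent slots of one group, so the
    placements are 1+2, 2+3 and 3+4 (never 4+1). Assumption: the store half
    goes to the LSU and FXU of its own slot's unit; the addi uses the FXU of
    the next slot's unit.
    """
    alts = []
    for first in (1, 2, 3):
        second = first + 1
        store_unit = AFFINITY[first]
        addi_unit = AFFINITY[second]
        alts.append(((first, second), store_unit, (store_unit, addi_unit)))
    return alts


def to_reservation(alts):
    """Render the alternatives in the same shape as the patched reservation."""
    dus = "|".join(
        f"(du{a}_power4+du{b}_power4,lsu{lsu}_power4)" for (a, b), lsu, _ in alts
    )
    ius = "|".join(
        f"(nothing,iu{s}_power4,iu{t}_power4)" for _, _, (s, t) in alts
    )
    return f"({dus})+({ius})"


for alt in store_update_alternatives():
    print(alt)
# ((1, 2), 1, (1, 2))
# ((2, 3), 2, (2, 2))
# ((3, 4), 2, (2, 1))
```

Ignoring whitespace and the backslash continuations, `to_reservation(store_update_alternatives())` yields the same string as the patched lines of the hunk, which makes a quick sanity check on the hand edit.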
-Pat
If you look at the thread from 2009, we discussed that the POWER4
scheduler description is already an approximation, because an accurate
description creates an unreasonably large automaton. Much of the
problem is the 1-cycle delay for dependent integer ops.