On Thu, 6 Nov 2025, Richard Braun wrote:
> On Thu, Nov 06, 2025 at 09:50:24AM +0300, Alexander Monakov wrote:
> > Try define_bypass (its "common" use is for situations where latency
> > between instructions A and B is shorter than A would have otherwise,
> > but using it for longer exceptional latency should work as well).
>
> I've looked at define_bypass, but my problem is that, according to
> its description, it only applies to instructions with data dependency,
> not functional unit (or reservation) conflicts. Am I wrong about that ?

You're not wrong. I didn't quite catch what you were asking (and didn't
inspect the PDF before replying). Likewise, Benoit's suggestion doesn't
look appropriate either. Let me try again. You said:

> Specifically, mpydp has a latency of 4 cycles, but if it is followed
> by mpyspdp, then it's 7. See SPRUFE8B [1] 4.3.2 .M-Unit Constraints for
> the details.

So, looking at the PDF now, I see that the mpydp latency, as commonly
understood and as GCC uses the term, is 10 cycles, from the first read in
stage E1 to the last write in stage E10. What you (and the PDF) refer to as
"4 cycles" is the period during which an execution resource is tied up by
mpydp and is unavailable to other instructions, blocking them. In GCC that is
modeled as "unit reservations"; for mpydp we have

  (define_insn_reservation "mpydp_m1n" 10
    (and (eq_attr "type" "mpydp")
         (and (eq_attr "cross" "n")
              (and (eq_attr "units" "m")
                   (eq_attr "dest_regfile" "a"))))
    "(m1)*4,nothing*4,m1w*2")

The 10 in the first line is the latency; the last line gives the unit
reservations.
And for mpyspdp we have

  (define_insn_reservation "mpyspdp_m1n" 7
    (and (eq_attr "type" "mpyspdp")
         (and (eq_attr "cross" "n")
              (and (eq_attr "units" "m")
                   (eq_attr "dest_regfile" "a"))))
    "(m1)*2,nothing*3,m1w*2")

which correctly models what's illustrated as 'Xw' in table 4-32: if mpydp issues
on cycle i, mpyspdp cannot issue on cycle i+4, because they would conflict on
writeback on cycle i+9.
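
If you want to double-check that arithmetic, here is a rough Python sketch
that expands the two reservation strings cycle by cycle and reports where they
overlap. It is not anything from GCC: the helper names expand() and
conflicts() are made up for illustration, and the parser only understands the
simple "unit*N,nothing*N" subset used in the two reservations quoted here, not
the full reservation syntax.

```python
# Reservation strings copied from the two define_insn_reservation patterns.
MPYDP = "(m1)*4,nothing*4,m1w*2"
MPYSPDP = "(m1)*2,nothing*3,m1w*2"


def expand(reservation):
    """Expand a simple reservation string into one set of units per cycle.

    Assumption: only comma-separated "unit*N" or "nothing*N" items appear,
    optionally wrapped in parentheses. Alternatives ("|") and simultaneous
    reservations ("+") are not handled.
    """
    cycles = []
    for item in reservation.split(","):
        unit, _, count = item.strip().partition("*")
        unit = unit.strip("()")
        repeat = int(count) if count else 1
        for _ in range(repeat):
            cycles.append(set() if unit == "nothing" else {unit})
    return cycles


def conflicts(first, second, offset):
    """Cycles (relative to the issue of `first`) where both insns need a unit.

    `second` is assumed to issue `offset` cycles after `first`.
    """
    a = expand(first)
    b = expand(second)
    hits = []
    for k, units in enumerate(b):
        cycle = offset + k
        if cycle < len(a) and a[cycle] & units:
            hits.append(cycle)
    return hits


if __name__ == "__main__":
    for offset in range(0, 7):
        print(offset, conflicts(MPYDP, MPYSPDP, offset))
```

With mpyspdp issued 4 cycles after mpydp, the only overlap is m1w on relative
cycle 9, which is the i+9 writeback conflict; from an offset of 5 onwards
there is no overlap. The same function could be pointed at a revised
reservation string to see which extra issue slots an additional unit would
block.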

I guess your issue is that it doesn't model what the PDF calls 'Xu', where
the two instructions would contend for the multiplication circuit. If so, you
can introduce another unit to model that circuit, and replace the appropriate
parts of 'nothing*4' and 'nothing*3' with reservations of that unit.

HTH
Alexander