On 26/11/2012, at 1:28 PM, Greg McGary wrote:

> I'm working on a port to a VLIW DSP with an exposed pipeline (i.e., no
> interlocks).  Some operations OP have as much as 2-cycle latency on values
> of the call-preserved regs CPR.  E.g., if the callee's epilogue restores a
> CPR in the delay slot of the return instruction, then any OP with that CPR
> as input needs to schedule 2 clocks after the call in order to get the
> expected value.  If OP schedules immediately after the call, then it will
> get the callee's value prior to the epilogue restore.
> 
> The easy, low-performance way to solve the problem is to schedule
> epilogues to restore CPRs before the return and its delay slot.  The
> harder, usually better performing way is to manage dependences in the
> caller so that uses of CPRs for OPs that require extra cycles schedule
> at sufficient distance from the call.
> 
> How shall I introduce these dependences for only the scheduler?  As an
> experiment, I added CLOBBERs to the call insn, which created true
> dependences between the call and downstream instructions that read the
> CPRs, but had the undesired effect of perturbing dataflow across calls.
> I'm thinking sched-deps needs new code for targets with
> TARGET_SCHED_EXPOSED_PIPELINE to add dependences for call-insn producers
> and CPR-user consumers.

You essentially need a fix-up pass just before the end of compilation
(machine-dependent reorg, if memory serves me right) to space instructions
consuming values from CPRs away from the CALL_INSNs that set those CPRs.
I.e., for 99% of compilation you don't care about this restriction; it's only
the very last VLIW bundling and delay slot passes that need to know about it.
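
To make the idea concrete, here is a toy model of such a fix-up pass in plain
C.  It is only a sketch under loud assumptions: it is not GCC code, it treats
the instruction stream as one insn per cycle, and the names (insn_kind,
CPR_LATENCY, space_cpr_uses) are made up for illustration.  It pads with NOPs
so that a slow CPR consumer never issues fewer than CPR_LATENCY slots after
the most recent call.

```c
#include <stddef.h>

/* Hypothetical instruction classes for the toy model. */
enum insn_kind { INSN_CALL, INSN_SLOW_CPR_USE, INSN_OTHER, INSN_NOP };

/* Assumed minimum distance, in issue slots, between a call and a
   consumer of a call-preserved register (2 in the case discussed). */
#define CPR_LATENCY 2

/* Copy IN to OUT, inserting NOPs so that every INSN_SLOW_CPR_USE sits at
   least CPR_LATENCY slots after the most recent call.  Returns the number
   of insns written, or (size_t)-1 if OUT is too small. */
size_t space_cpr_uses(const enum insn_kind *in, size_t n_in,
                      enum insn_kind *out, size_t out_cap)
{
    size_t n_out = 0;
    /* Distance the next insn would have from the last call; starting at
       CPR_LATENCY means "no call is pending". */
    size_t since_call = CPR_LATENCY;

    for (size_t i = 0; i < n_in; i++) {
        if (in[i] == INSN_SLOW_CPR_USE) {
            /* Pad until the consumer is far enough from the call. */
            while (since_call < CPR_LATENCY) {
                if (n_out == out_cap)
                    return (size_t)-1;
                out[n_out++] = INSN_NOP;
                since_call++;
            }
        }
        if (n_out == out_cap)
            return (size_t)-1;
        out[n_out++] = in[i];

        if (in[i] == INSN_CALL)
            since_call = 1;      /* the next slot is 1 after the call */
        else if (since_call < CPR_LATENCY)
            since_call++;        /* saturate at CPR_LATENCY */
    }
    return n_out;
}
```

For the input CALL, SLOW_CPR_USE, OTHER this model emits CALL, NOP,
SLOW_CPR_USE, OTHER; for CALL, OTHER, SLOW_CPR_USE it leaves the sequence
alone, since the unrelated insn already fills the gap.  A real pass would
prefer to fill the gap with useful work rather than a NOP, which is what the
scheduler is for.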

You probably want to make the 2nd scheduler pass run as part of
machine-dependent reorg (as ia64 does) and enable an additional constraint
(through a scheduling bypass) for the scheduler DFA to space CALL_INSNs from
their consumers by at least 2 cycles.  One challenge here is that the
scheduler operates on basic blocks, and it is difficult to track dependencies
across basic block boundaries.  To work around the basic-block scope of the
scheduler you could emit dummy instructions at the beginning of basic blocks
that have predecessors that end with CALL_INSNs.  These dummy instructions
would set the appropriate registers (probably just assign the register to
itself), and you would have a bypass (see define_bypass) between these dummy
instructions and consumers to guarantee the 2-cycle delay.
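
As a rough illustration of the block-selection step only, the following
self-contained C sketch marks the blocks that would get a dummy instruction.
It is again a toy model, not GCC code: struct block, MAX_PREDS and
mark_dummy_blocks are hypothetical names, and the CFG is just an array of
blocks with predecessor indices.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_PREDS 4   /* arbitrary limit for the toy CFG */

/* Hypothetical basic-block record for the toy model. */
struct block {
    bool   ends_with_call;      /* last insn of the block is a call */
    size_t n_preds;             /* number of valid entries in preds */
    size_t preds[MAX_PREDS];    /* indices of predecessor blocks */
};

/* Set NEEDS_DUMMY[i] when block i has at least one predecessor that ends
   with a call, i.e. when a dummy "reg = reg" insn would be emitted at its
   head.  Returns the number of blocks marked. */
size_t mark_dummy_blocks(const struct block *blocks, size_t n,
                         bool *needs_dummy)
{
    size_t count = 0;

    for (size_t i = 0; i < n; i++) {
        needs_dummy[i] = false;
        for (size_t p = 0; p < blocks[i].n_preds; p++) {
            if (blocks[blocks[i].preds[p]].ends_with_call) {
                needs_dummy[i] = true;
                count++;
                break;          /* one call-ending predecessor is enough */
            }
        }
    }
    return count;
}
```

In a real port the marked blocks would receive the dummy insn, and the bypass
in the machine description would then keep slow consumers in that block 2
cycles away from it.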

--
Maxim Kuvyrkov
CodeSourcery / Mentor Graphics


