Thanks for the clarification, Korey. I was just in the middle of composing an email asking for exactly this kind of explanation.
There's still one aspect that confuses me, though: why does translation need to be separated out of xc->read() and xc->write()? The model for all the other CPUs is that translation is part of that process. I can believe that in the InOrder model you may want to separate the translation and the cache access into separate cycles, but I think that could be done without involving the StaticInst at all. Basically, I'd think you could view the call to xc->read() or xc->write() from initiateAcc as "kicking off" the access; whether the translation and cache access both get started right away or happen in separate phases is up to the CPU model. (And as I think has already been mentioned, the fact that they can happen right away is really a broken historical artifact of using the atomic rather than the new timing-based translation call.)

OK, so as I wrote that last paragraph, one complication became apparent to me: I think what I wrote applies for read(), but possibly not for write(), as the EA computation and the translation could both be kicked off before the store data is available. This leads me to my second question: now that EAComp is not a separate sub-instruction with its own source operand list, how do you distinguish the EA operands from the store data operand, so that the EA computation can proceed before the store data is ready?

Right now I can't say definitively that I have a better solution than your #1, but I think if we consider all the issues together (your needs for the in-order model, the need to transition O3 to the timing-based translation model, etc.) we should be able to come up with something a little cleaner.

Steve

On Sat, Apr 11, 2009 at 10:43 AM, Korey Sewell <ksew...@umich.edu> wrote:
> I'll attempt to paint a clearer picture of the situation here (sorry if I
> confused earlier):
>
> Typically, an M5 CPU model wants to do the TLB translation and the memory
> access on the same cycle.
> This is triggered by an instruction calling "execute" (SimpleCPU models)
> or "initiateAcc" (O3CPU).
>
> In the execute (or initiateAcc) function of an instruction, there is a call
> to read or write which, as Steve mentions below, interfaces with the CPU
> model. Typically, the function will look something like this:
> "
> Fault Ldl_l::execute(AtomicSimpleCPU *xc,
>                      Trace::InstRecord *traceData) const
> {
>     ...
>     if (fault == NoFault) {
>         fault = xc->read(EA, (uint32_t&)Mem, memAccessFlags);
>         Ra = Mem;
>     }
>     ...
> }
> "
>
> The key thing to note is that in that read() call the instruction is
> giving the size of the access and the flags for that access as arguments.
> Those arguments are member variables of the instruction object.
>
> Because the instruction gave the CPU model those local variables, the
> AtomicSimpleCPU::read() function has all the info it needs to create the
> memory request object, use that request to do the translation, and
> finally do the data access.
>
> Now, the problem here is: what if you want to do the translation without
> doing the read access? You are going to need the size of the access and also
> the flags for that access.
>
> So how do we do that and at the same time keep the basic framework that
> Steve outlines below?
>
> #1. My current solution was to create accessors for the size() and flags
> inside the instruction. That way, the CPU model can just query for that info
> when it needs to create a request for that object to do a translation or to
> do a data access. However, as Steve notes, this kind of goes out of bounds of
> how M5 traditionally interfaces with the CPU.
>
> #2. Another solution might be to create a "translate" function for an
> instruction, with an interface similar to how the read function works. The
> translate function would need to calculate the effective address based on
> the instruction's operands and then it could call back to the CPU model.
> The thing that I wouldn't be a fan of is that it would potentially mark the
> 2nd place where we are unnecessarily computing the EA (the 1st is
> initiateAcc). Something like:
> "
> Fault Ldl_l::translate(AtomicSimpleCPU *xc,
>                        Trace::InstRecord *traceData) const
> {
>     ...
>     if (fault == NoFault) {
>         fault = xc->translate(EA, (uint32_t&)Mem, memAccessFlags);
>     }
>     ...
> }
> "
>
> #3. Anyone else?
>
> So I hope that sums up the problem and what I was trying to do to fix it.
> There probably is a better way to do it than #1, but I'm just not sure, so
> anybody's thoughts would be welcome. Once we come to consensus, I'll
> commit the patches.

_______________________________________________
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev
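
For concreteness, here is a minimal, self-contained C++ sketch of option #1 from the quoted message: the instruction exposes accessors for its access size and flags, and the CPU model uses them to build a request and translate in one stage, then performs the data access in a later stage. Everything in it (ToyLoad, ToyInOrderCPU, MemRequest, memAccessSize(), translateStage(), accessStage(), and the identity-mapped 256-byte memory) is a hypothetical stand-in made up for illustration. None of it is M5's actual class hierarchy or interface.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical stand-ins for illustration only; not M5's real types.
using Addr = std::uint64_t;
enum class Fault { None, PageFault };
constexpr Addr kMemBytes = 256;   // size of the toy memory

// The information a CPU model needs to translate and then access memory.
struct MemRequest {
    Addr vaddr = 0;
    Addr paddr = 0;
    unsigned size = 0;    // access size in bytes
    unsigned flags = 0;   // access flags
    bool translated = false;
};

// Option #1: the instruction exposes its access size and flags, so the
// CPU model can build a request without calling read() or write().
class ToyLoad {
  public:
    ToyLoad(Addr base, Addr disp) : base_(base), disp_(disp) {}
    Addr computeEA() const { return base_ + disp_; }
    unsigned memAccessSize() const { return sizeof(std::uint32_t); }
    unsigned memAccessFlags() const { return 0; }
  private:
    Addr base_;
    Addr disp_;
};

class ToyInOrderCPU {
  public:
    ToyInOrderCPU() : mem_(kMemBytes, 0) {}

    // Test helper: write a 32-bit value directly into the toy memory.
    void poke32(Addr paddr, std::uint32_t v) {
        assert(paddr + sizeof v <= kMemBytes);
        std::memcpy(&mem_[paddr], &v, sizeof v);
    }

    // Stage A: query the instruction, build the request, translate it.
    // No data access happens here.
    Fault translateStage(const ToyLoad &inst) {
        req_ = MemRequest();
        req_.vaddr = inst.computeEA();
        req_.size = inst.memAccessSize();
        req_.flags = inst.memAccessFlags();
        if (req_.vaddr + req_.size > kMemBytes)
            return Fault::PageFault;      // toy "unmapped" condition
        req_.paddr = req_.vaddr;          // toy identity mapping
        req_.translated = true;
        return Fault::None;
    }

    // Stage B (a later cycle): do the data access with the stored request.
    Fault accessStage(std::uint32_t &data) const {
        if (!req_.translated)
            return Fault::PageFault;
        std::memcpy(&data, &mem_[req_.paddr], req_.size);
        return Fault::None;
    }

  private:
    std::vector<std::uint8_t> mem_;
    MemRequest req_;
};
```

In this sketch a caller would run translateStage() on one cycle and accessStage() on a later one. The instruction never calls back into the CPU for the translation, which is exactly the departure from the usual xc->read()/xc->write() pattern discussed above.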