Thanks for the clarification, Korey.  I was just in the middle of
composing an email asking for exactly this kind of explanation.

There's still one aspect that confuses me though: why does translation
need to be separated out of xc->read() and xc->write()?  The model for
all the other CPUs is that translation is part of that process.  I can
believe that in the InOrder model you may want to separate the
translation and the cache access into separate cycles, but I think
that could be done without involving the StaticInst at all.  Basically
I'd think you could view the call to xc->read() or xc->write() from
initiateAcc as "kicking off" the access, but whether the translation and
cache access both get started right away or they happen in separate
phases is up to the CPU model.  (And as I think has already been
mentioned, the fact that they can happen right away is really a broken
historical artifact of using atomic rather than the new timing-based
translation call.)
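
To make that a bit more concrete, here's a rough sketch of what I have
in mind (the class and helper names are purely illustrative, not
actual M5 code):
"
// Hypothetical sketch: initiateAcc() just calls xc->read(); how the
// access is staged is entirely up to the CPU model.

// An atomic-style model can do everything in one shot:
Fault SomeAtomicModel::read(Addr ea, uint32_t &data, unsigned flags)
{
    Request req(ea, sizeof(data), flags);   // build the request
    Fault fault = translateAtomic(&req);    // translate right away...
    if (fault == NoFault)
        fault = accessCache(&req, &data);   // ...and access right away
    return fault;
}

// A timing-style model can split the same call into phases:
Fault SomeTimingModel::read(Addr ea, uint32_t &data, unsigned flags)
{
    RequestPtr req = new Request(ea, sizeof(data), flags);
    startTranslation(req);  // phase 1: translation (may take cycles)
    // phase 2: the cache access is issued from the translation
    // completion callback, not from here.
    return NoFault;
}
"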

OK, so as I wrote that last paragraph, one complication became
apparent to me: I think what I wrote applies for read(), but possibly
not for write(), as the EA computation and the translation could both
be kicked off before the store data is available.  This leads me to my
second question: now that EAComp is not a separate sub-instruction
with its own source operand list, how do you distinguish the EA
operands from the store data operand to allow the EA computation to
possibly proceed before the store data is ready?
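
Just to illustrate the kind of distinction I'm asking about (this is
purely hypothetical, nothing like it exists today), I'm imagining the
instruction tagging its sources along these lines:
"
// Hypothetical sketch: tag each source register so the scheduler can
// tell the address sources apart from the store data source.
class StaticInst
{
  protected:
    enum SrcKind { SrcEA, SrcStoreData };
    std::vector<SrcKind> srcKind;  // parallel to the source reg list

  public:
    bool isEASrc(int i) const { return srcKind[i] == SrcEA; }
    // The issue logic could then start the EA computation (and the
    // translation) once all SrcEA operands are ready, without
    // waiting for the SrcStoreData operand.
};
"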

Right now I can't say definitively that I have a better solution than
your #1, but I think if we consider all the issues together (your
needs for the in-order model, the need to transition O3 to the
timing-based translation model, etc.) we should be able to come up
with something a little cleaner.

Steve

On Sat, Apr 11, 2009 at 10:43 AM, Korey Sewell <ksew...@umich.edu> wrote:
> I'll attempt to paint a clearer picture of the situation here (sorry if I
> confused things earlier):
>
> Typically, an M5 CPU Model wants to do the TLB translation and the Memory
> Access on the same cycle. This is triggered by an instruction calling
> "execute" (SimpleCPU models) or "initiateAcc" (O3CPU).
>
> In the execute (or initiateAcc) functions of an instruction, there is a call
> to read or write which, as Steve mentions below, interfaces with the CPU
> model. Typically, the function will look something like this:
> "
>  Fault Ldl_l::execute(AtomicSimpleCPU *xc,
>                       Trace::InstRecord *traceData) const
>  {
>      ...
>      if (fault == NoFault) {
>          fault = xc->read(EA, (uint32_t&)Mem, memAccessFlags);
>          Ra = Mem;
>      }
>      ...
>  }
> "
>
> The key thing to note is that in that read() call the instruction is
> giving the size of the access and the flags for that access as arguments.
> Those arguments are member variables of the instruction object.
>
> Because the instruction passes the CPU model those member variables, the
> AtomicSimpleCPU::read() function has all the info it needs to create the
> memory request object, then use that request to do the translation and
> finally do the data access.
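>
> In other words (very roughly, and with approximate helper names --
> this is not the exact source), AtomicSimpleCPU::read() does
> something like:
> "
>  // Rough sketch only; helper names are approximate.
>  template <class T>
>  Fault AtomicSimpleCPU::read(Addr addr, T &data, unsigned flags)
>  {
>      // The size comes from sizeof(T) and the flags come from the
>      // instruction, so the CPU can build the request itself...
>      Request req(addr, sizeof(T), flags);
>
>      // ...translate it...
>      Fault fault = translateDataReadReq(&req);
>
>      // ...and do the data access in the same call.
>      if (fault == NoFault)
>          fault = doAtomicAccess(&req, (uint8_t *)&data);
>      return fault;
>  }
> "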
>
> Now, the problem here is: what if you want to do the translation without
> doing the read access? You are still going to need the size of the access
> and the flags for that access.
>
> So how do we do that and at the same time keep the basic framework that
> Steve outlines below?
>
> #1. My current solution was to create accessors for the size and flags
> inside the instruction. That way, the CPU model can just query for that info
> whenever it needs to create a request for that instruction, whether for a
> translation or for a data access. However, as Steve notes, this kind of goes
> beyond how M5 traditionally interfaces with the CPU.
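>
> For concreteness, the accessor idea would look roughly like this (the
> names here are just illustrative):
> "
>  // Illustrative sketch of #1: expose the access parameters so the
>  // CPU model can build the Request itself.
>  class StaticInst
>  {
>      // ...
>    public:
>      virtual int memAccSize() const { return 0; }
>      virtual unsigned memAccFlags() const { return 0; }
>  };
>
>  // Then, in the CPU model, when it is time to translate (or to do
>  // the data access), something like:
>  //   RequestPtr req = new Request(ea, inst->memAccSize(),
>  //                                inst->memAccFlags());
>  //   startTranslation(req);   // hypothetical helper
> "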
>
> #2. Another solution might be to create a "translate" function for an
> instruction, with an interface similar to how the read function works. The
> translate function would need to calculate the effective address based on
> the instruction's operands and then call back to the CPU model. The thing
> I wouldn't be a fan of is that it would potentially mark the 2nd place
> where we are unnecessarily computing the EA (the 1st is initiateAcc).
> Something like:
> "
>  Fault Ldl_l::translate(AtomicSimpleCPU *xc,
>                         Trace::InstRecord *traceData) const
>  {
>      ...
>      if (fault == NoFault) {
>          fault = xc->translate(EA, (uint32_t&)Mem, memAccessFlags);
>      }
>      ...
>  }
> "
>
> #3. Anyone else?
>
> So I hope that sums up the problem and what I was trying to do to fix it.
> There probably is a better way to do it than #1, but I'm just not sure what
> it is, so anybody's thoughts would be welcomed. Once we come to a consensus,
> I'll commit the patches.
