I actually looked at the code a bit this time, and I have a hypothesis that
the problem arises from two similar but fundamentally different models of
"bypassing" potential event-based delays:

main() {
    x_will_callback = x();
    if (!x_will_callback) y();
}

x() {
    if (...) { sched_callback(&cb); return true; }
    else { return false; }
}

cb() { y(); }

as opposed to:

main() { x(); }

x() {
    if (...) { sched_callback(&cb); }
    else { y(); /* or cb(); */ }
}

cb() { y(); }

Both of these have the overall effect of calling x() and then y(), sometimes
with a delay and sometimes not.  However, in the latter case y() is called
from inside the call to x(), which leads to problems when the caller doesn't
expect that... basically this is the root of the initiateAcc/completeAcc
problem.  Also, if there's a cycle (like there is in our pipeline) where you
do x, y, z, x, y, z, x, y, z, then each bypassed step adds a stack frame and,
as Gabe points out, you can run into stack overflow problems too.

My hypothesis is that the old TimingSimpleCPU code worked because it always
did the former, and Gabe has introduced two points that do the latter: one
in timingTranslate(), and one in fetch().  I think the right solution is
that for each of these we should either change it into the first model or
eliminate the bypass option altogether and always do a separately scheduled
callback.

I think the distinction of having main() call y() directly rather than
cb() is potentially important, as this gives you points where you can do
slightly different things depending on whether you took the event path or
bypassed it.  It also (to me) provides some logical separation between "what
comes next" (the code in y()) and how you got there.

Coming at this from a different angle: while the code is getting
increasingly messy (or maybe it's just inherently complex), I'd say a
significant fraction of the complexity comes from dealing with
cache/page-crossing memory operations, which I don't think would be
significantly improved by a global restructuring.  (Let me know if anyone
thinks otherwise.)  Thus I'm not too keen on a significant restructuring,
since I think the code will still be messy afterward.

On Wed, May 6, 2009 at 11:42 AM, Gabriel Michael Black <
gbl...@eecs.umich.edu> wrote:

> The example I mentioned would be if
> you have a microcode loop that doesn't touch memory to, for instance,
> stall until you get an interrupt or a countdown expires for a small
> delay.


Although I agree that it's good to avoid this possibility altogether, I'd
argue that any microcode loop like the one you describe is broken.  If for
no other reason than power dissipation, I don't think you'd ever want to
busy-wait in a real system, and even if you did, we certainly wouldn't want
to write it that way in m5 for performance reasons.

Steve
_______________________________________________
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev