Re: [m5-dev] [PATCH] CPU: Make the CPU wait until initiateAcc finishes before calling completeAcc

2010-03-22 Thread Steve Reinhardt
Thanks to Brad's prompting, I took another look at this tonight.

I think the fundamental problem is that the transition to
split-transaction translation didn't move all the operations that
depend on the translation completing into the translation callback or
something that was guaranteed to run after the translation callback.
The req-isUncacheable() test that Brad moved around in his recent
patch (which I commented on just a few minutes ago) is one example of
that.  Another case is that we're calling traceData-setData() in
read() even though the translation may not be done yet (how does that
work?  are we calling setData() again when the real data comes back,
so we haven't noticed that this is broken?).  The other traceData
examples (like setAddr() or setData() in write()) are more like
anti-dependences; there's no direct dependence on the translation
finishing, but there is a dependence on it not deleting the traceData
object.

So in other words the problem isn't that the finishTranslation() or
completeAcc() calls are happening too soon, it's that the code that
runs after the translation is initiated in read()/write() (or in
initiateAcc() from which it is called) should not be making
assumptions about whether finishTranslation() or completeAcc() has or
has not been called.  I think it's pretty easy to fix this, in some
cases by moving independent code up above where the translation is
initiated, and in other cases by moving dependent code into
finishTranslation().

Interestingly, there are a number of cases where we trace the store
data twice, once in the CPU's write() function and again in the ISA's
initiateAcc() function.  I propose to just get rid of the latter.

As I mentioned in the other email, I'm tempted to just get rid of the
recordEvent() calls, unless someone actually uses these.  I see that
Nate created the basic recordEvent() infrastructure in 2004, and right
now it's only uncached accesses and faults that get recorded, which
seems
very ad hoc and irregular.

I'll try and make these changes and test them as soon as I can.  I
thought I'd send this note out first though in case anyone has
feedback before I get started.

Steve
___
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev


Re: [m5-dev] [PATCH] CPU: Make the CPU wait until initiateAcc finishes before calling completeAcc

2010-03-22 Thread Gabe Black
It's been a while since I've looked at this, but I want to make sure I
remember to respond so I don't want to wait until I have a chance to
re-research all this. Isn't the problem that initiateAcc ends up calling
completeAcc mid-function, that cleans up after the access, then
initateAcc gets control back and tries to finish initiating the access?
The problem to me seems to be that initiateAcc doesn't know whether it's
just letting the rest of the CPU know something needs to happen later,
or if it's actually causing that thing to happen right away. The other
part, completeAcc, doesn't know if it's happening in the middle of
initiateAcc, or if its in a callback. If it's in initiateAcc there are
things it (or maybe initiateAcc, is that what you're saying?) shouldn't
touch. If it's in a callback, it's likely the last thing to handle the
instruction/translation/packet/whatever, so it needs to be sure to clean
up everything. These seem like contradictory responsibilities. What I
had been advocating before (if I remember right) was to disambiguate
what the circumstances each function can expect by breaking up the call
chains so that the functions don't end up nested inside each other
unpredictably. Like I said, though, I'm making sure I respond rather
than getting my facts straight, so please let me know if I'm missing
your point (likely) or misunderstanding something.

Gabe

Steve Reinhardt wrote:
 Thanks to Brad's prompting, I took another look at this tonight.

 I think the fundamental problem is that the transition to
 split-transaction translation didn't move all the operations that
 depend on the translation completing into the translation callback or
 something that was guaranteed to run after the translation callback.
 The req-isUncacheable() test that Brad moved around in his recent
 patch (which I commented on just a few minutes ago) is one example of
 that.  Another case is that we're calling traceData-setData() in
 read() even though the translation may not be done yet (how does that
 work?  are we calling setData() again when the real data comes back,
 so we haven't noticed that this is broken?).  The other traceData
 examples (like setAddr() or setData() in write()) are more like
 anti-dependences; there's no direct dependence on the translation
 finishing, but there is a dependence on it not deleting the traceData
 object.

 So in other words the problem isn't that the finishTranslation() or
 completeAcc() calls are happening too soon, it's that the code that
 runs after the translation is initiated in read()/write() (or in
 initiateAcc() from which it is called) should not be making
 assumptions about whether finishTranslation() or completeAcc() has or
 has not been called.  I think it's pretty easy to fix this, in some
 cases by moving independent code up above where the translation is
 initiated, and in other cases by moving dependent code into
 finishTranslation().

 Interestingly, there are a number of cases where we trace the store
 data twice, once in the CPU's write() function and again in the ISA's
 initiateAcc() function.  I propose to just get rid of the latter.

 As I mentioned in the other email, I'm tempted to just get rid of the
 recordEvent() calls, unless someone actually uses these.  I see that
 Nate created the basic recordEvent() infrastructure in 2004, and right
 now it's only uncached accesses and faults that get recorded, which
 seems
 very ad hoc and irregular.

 I'll try and make these changes and test them as soon as I can.  I
 thought I'd send this note out first though in case anyone has
 feedback before I get started.

 Steve
 ___
 m5-dev mailing list
 m5-dev@m5sim.org
 http://m5sim.org/mailman/listinfo/m5-dev
   

___
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev


Re: [m5-dev] [PATCH] CPU: Make the CPU wait until initiateAcc finishes before calling completeAcc

2010-03-22 Thread Steve Reinhardt
Hi Gabe,

On Sun, Mar 21, 2010 at 5:22 PM, Gabe Black gbl...@eecs.umich.edu wrote:
 It's been a while since I've looked at this, but I want to make sure I
 remember to respond so I don't want to wait until I have a chance to
 re-research all this. Isn't the problem that initiateAcc ends up calling
 completeAcc mid-function, that cleans up after the access, then
 initateAcc gets control back and tries to finish initiating the access?

Yes and no... you're right that that's how we viewed it before (so
your memory is good), and that's one way of looking at it, and it's
not incorrect.  I'm saying that if we look at the problem differently
I think there's an easier solution.

 The problem to me seems to be that initiateAcc doesn't know whether it's
 just letting the rest of the CPU know something needs to happen later,
 or if it's actually causing that thing to happen right away. The other
 part, completeAcc, doesn't know if it's happening in the middle of
 initiateAcc, or if its in a callback. If it's in initiateAcc there are
 things it (or maybe initiateAcc, is that what you're saying?) shouldn't
 touch. If it's in a callback, it's likely the last thing to handle the
 instruction/translation/packet/whatever, so it needs to be sure to clean
 up everything. These seem like contradictory responsibilities.

You've got the right idea there... basically initiateAcc() actually
initiates the acc when it calls read() or write(), and it can't count
on the translation being complete when read()/write() returns, but
conversely it also shouldn't count on the translation *not* being
complete when read()/write() returns.  So there really shouldn't be
any code after read()/write() that should care one way or the other.
If we get rid of the code that does, then we'll solve the problem
without adding any complexity to the control flow.

As a specific example, the one thing that appears to give us trouble
in initiateAcc() is this:

fault = xc-write((uint%(mem_acc_size)d_t)Mem, EA,
  memAccessFlags, NULL);
if (traceData) { traceData-setData(Mem); }

...and the root problem is that if the access completes in write(),
traceData may not be valid any more.  The ironic thing here is that
(1) the value of Mem doesn't depend on write(), so the line could be
moved before the call; and (2) traceData-setData() is also called in
write() itself, so this whole line is redundant.  So we can fix this
specific problem by either (1) moving the traceData line above write()
or (2) preferably, deleting the line entirely.

Similarly, read()/write() initiate the translation when they call
translateTiming(), and there shouldn't be any code in read()/write()
after that point that depends on the translation having completed *or
not*.  Interestingly there is code (the req-isUncacheable() check,
and a setData() call in read()) that seems like it requires the
translation to have completed, which is just as incorrect.  That code
should be moved somewhere else (like finishTranslation()) where it can
be guaranteed that the translation has finished.

 What I
 had been advocating before (if I remember right) was to disambiguate
 what the circumstances each function can expect by breaking up the call
 chains so that the functions don't end up nested inside each other
 unpredictably. Like I said, though, I'm making sure I respond rather
 than getting my facts straight, so please let me know if I'm missing
 your point (likely) or misunderstanding something.

That would work, but this approach is simpler since it doesn't depend
on figuring out what circumstances you're in.

Steve
___
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev


Re: [m5-dev] [PATCH] CPU: Make the CPU wait until initiateAcc finishes before calling completeAcc

2010-03-22 Thread Gabe Black
Ok. This sounds plausible to me, and I'm glad it's not going to be so
much work. I'm sure you already thought of this, but along the lines of
what we were talking about in that other thread, something that
important and subtle will deserve a big obvious comment or two.

Gabe

Steve Reinhardt wrote:
 Hi Gabe,

 On Sun, Mar 21, 2010 at 5:22 PM, Gabe Black gbl...@eecs.umich.edu wrote:
   
 It's been a while since I've looked at this, but I want to make sure I
 remember to respond so I don't want to wait until I have a chance to
 re-research all this. Isn't the problem that initiateAcc ends up calling
 completeAcc mid-function, that cleans up after the access, then
 initateAcc gets control back and tries to finish initiating the access?
 

 Yes and no... you're right that that's how we viewed it before (so
 your memory is good), and that's one way of looking at it, and it's
 not incorrect.  I'm saying that if we look at the problem differently
 I think there's an easier solution.

   
 The problem to me seems to be that initiateAcc doesn't know whether it's
 just letting the rest of the CPU know something needs to happen later,
 or if it's actually causing that thing to happen right away. The other
 part, completeAcc, doesn't know if it's happening in the middle of
 initiateAcc, or if its in a callback. If it's in initiateAcc there are
 things it (or maybe initiateAcc, is that what you're saying?) shouldn't
 touch. If it's in a callback, it's likely the last thing to handle the
 instruction/translation/packet/whatever, so it needs to be sure to clean
 up everything. These seem like contradictory responsibilities.
 

 You've got the right idea there... basically initiateAcc() actually
 initiates the acc when it calls read() or write(), and it can't count
 on the translation being complete when read()/write() returns, but
 conversely it also shouldn't count on the translation *not* being
 complete when read()/write() returns.  So there really shouldn't be
 any code after read()/write() that should care one way or the other.
 If we get rid of the code that does, then we'll solve the problem
 without adding any complexity to the control flow.

 As a specific example, the one thing that appears to give us trouble
 in initiateAcc() is this:

 fault = xc-write((uint%(mem_acc_size)d_t)Mem, EA,
   memAccessFlags, NULL);
 if (traceData) { traceData-setData(Mem); }

 ...and the root problem is that if the access completes in write(),
 traceData may not be valid any more.  The ironic thing here is that
 (1) the value of Mem doesn't depend on write(), so the line could be
 moved before the call; and (2) traceData-setData() is also called in
 write() itself, so this whole line is redundant.  So we can fix this
 specific problem by either (1) moving the traceData line above write()
 or (2) preferably, deleting the line entirely.

 Similarly, read()/write() initiate the translation when they call
 translateTiming(), and there shouldn't be any code in read()/write()
 after that point that depends on the translation having completed *or
 not*.  Interestingly there is code (the req-isUncacheable() check,
 and a setData() call in read()) that seems like it requires the
 translation to have completed, which is just as incorrect.  That code
 should be moved somewhere else (like finishTranslation()) where it can
 be guaranteed that the translation has finished.

   
 What I
 had been advocating before (if I remember right) was to disambiguate
 what the circumstances each function can expect by breaking up the call
 chains so that the functions don't end up nested inside each other
 unpredictably. Like I said, though, I'm making sure I respond rather
 than getting my facts straight, so please let me know if I'm missing
 your point (likely) or misunderstanding something.
 

 That would work, but this approach is simpler since it doesn't depend
 on figuring out what circumstances you're in.

 Steve
 ___
 m5-dev mailing list
 m5-dev@m5sim.org
 http://m5sim.org/mailman/listinfo/m5-dev
   

___
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev


Re: [m5-dev] [PATCH] CPU: Make the CPU wait until initiateAcc finishes before calling completeAcc

2009-04-28 Thread Steve Reinhardt
OK, thinking about this a bit more, it seems like the root of the problem is
that the timing translation is calling the callback right away instead of
deferring it to a later cycle.  I assume this only happens on a TLB miss for
a store conditional?  Is there any other path by which the completeAcc
callback can be invoked before the initiateAcc completes?  I agree that this
seems like a somewhat heavyweight solution for this rare case.

If this is the only scenario, what if we create a new event in the TLB to
call the completeAcc callback after delaying for the TLB access latency?
Presumably there's already an event that does this in the more common cases,
so maybe that can be tweaked to handle this case too.  I haven't actually
looked at the TLB code but thought I'd throw the idea out anyway.

Steve
___
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev


Re: [m5-dev] [PATCH] CPU: Make the CPU wait until initiateAcc finishes before calling completeAcc

2009-04-28 Thread Gabriel Michael Black
This bug happens when both the translation component and the memory  
access component finish inline before returning. When that happens,  
completeAcc is effectively called from initiateAcc and you get into  
trouble.

The translation component will complete inline in all cases except TLB  
misses in X86. It will complete inline in all cases with any other  
ISA. Since this bug came up on Alpha, that part of things is  
unavoidable right now.

The second part comes in when the memory access component essentially  
doesn't happen. I'm a little fuzzy on the details, but if a store  
conditional fails, the CPU can skip talking to the memory access and  
go directly to completeDataAccess.

The overhead my change would add is two writes to a bool and two ifs,  
and I would have to imagine that's a lot more lightweight than  
creating, initializing, and scheduling a new event for all types of  
translation. My change also applies only once to each memory  
instruction where some unaligned accesses may be translated in two  
chunks.

We could probably streamline things a little and only check if we're  
in initiateAcc on code paths that skip going to memory.

Gabe

Quoting Steve Reinhardt ste...@gmail.com:

 OK, thinking about this a bit more, it seems like the root of the problem is
 that the timing translation is calling the callback right away instead of
 deferring it to a later cycle.  I assume this only happens on a TLB miss for
 a store conditional?  Is there any other path by which the completeAcc
 callback can be invoked before the initiateAcc completes?  I agree that this
 seems like a somewhat heavyweight solution for this rare case.

 If this is the only scenario, what if we create a new event in the TLB to
 call the completeAcc callback after delaying for the TLB access latency?
 Presumably there's already an event that does this in the more common cases,
 so maybe that can be tweaked to handle this case too.  I haven't actually
 looked at the TLB code but thought I'd throw the idea out anyway.

 Steve



___
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev


Re: [m5-dev] [PATCH] CPU: Make the CPU wait until initiateAcc finishes before calling completeAcc

2009-04-28 Thread nathan binkert
 This bug happens when both the translation component and the memory
 access component finish inline before returning. When that happens,
 completeAcc is effectively called from initiateAcc and you get into
 trouble.

 The translation component will complete inline in all cases except TLB
 misses in X86. It will complete inline in all cases with any other
 ISA. Since this bug came up on Alpha, that part of things is
 unavoidable right now.

 The second part comes in when the memory access component essentially
 doesn't happen. I'm a little fuzzy on the details, but if a store
 conditional fails, the CPU can skip talking to the memory access and
 go directly to completeDataAccess.

 The overhead my change would add is two writes to a bool and two ifs,
 and I would have to imagine that's a lot more lightweight than
 creating, initializing, and scheduling a new event for all types of
 translation. My change also applies only once to each memory
 instruction where some unaligned accesses may be translated in two
 chunks.

 We could probably streamline things a little and only check if we're
 in initiateAcc on code paths that skip going to memory.

What if you use my change instead?  (Get rid of the traceData function
parameter and use xc-traceData)

Incidentally, there's a bunch of if (traceData) ... in the generated
code and it's not wrapped with #if TRACING_ON  Would be nice if that
could be changed as part of this too.

  Nate
___
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev


Re: [m5-dev] [PATCH] CPU: Make the CPU wait until initiateAcc finishes before calling completeAcc

2009-04-28 Thread Gabriel Michael Black
Quoting nathan binkert n...@binkert.org:

 I think the difference is that passing it in explicitly means that  
 you could
 keep it in a register for the duration of the function, and if you can put
 it in a callee-save then maybe you could save a few dereferences.  In
 contrast the indirection would have to be repeated after every  
 function call
 (which is exactly the behavior we'd be relying on here).

 Wouldn't we need to make it xc-traceData() to allow for all the
 redirection we've got in there?

 What redirection?  Or do you mean because traceData is protected in
 the simple CPU?  You could, but then again, you could just make it
 public.  Otherwise, I'm not sure what you're talking about.  The
 execute function is generated for each cpu model separately and it
 knows the type.

   Nate
 ___
 m5-dev mailing list
 m5-dev@m5sim.org
 http://m5sim.org/mailman/listinfo/m5-dev


I'm talking about how there's the exec context, and then the proxy  
exec context, and the thread context, and the simple thread context,  
and the dyninst, and the etc. etc., and calls to, for instance, read a  
register can travel through several layers before actually being  
serviced. By making traceData a direct member of the very first exec  
context interface, we're precluding passing it back to other layers  
that may want some input. That's not to say I don't think it will  
work, because I actually think it would, but it seems inconsistent  
with how the rest of that interface goes together.

Also this still doesn't address the fact that, one, completeAcc is  
being called before initiateAcc has fully run, and two, that the trace  
information is being printed before initiateAcc has written all its  
results.

Gabe
___
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev


Re: [m5-dev] [PATCH] CPU: Make the CPU wait until initiateAcc finishes before calling completeAcc

2009-04-28 Thread nathan binkert
 What redirection?  Or do you mean because traceData is protected in
 the simple CPU?  You could, but then again, you could just make it
 public.  Otherwise, I'm not sure what you're talking about.  The
 execute function is generated for each cpu model separately and it
 knows the type.

 I'm talking about how there's the exec context, and then the proxy
 exec context, and the thread context, and the simple thread context,
 and the dyninst, and the etc. etc., and calls to, for instance, read a
 register can travel through several layers before actually being
 serviced. By making traceData a direct member of the very first exec
 context interface, we're precluding passing it back to other layers
 that may want some input. That's not to say I don't think it will
 work, because I actually think it would, but it seems inconsistent
 with how the rest of that interface goes together.
I'm specifically talking about the parameter passed into the exec
function which is either SimpleCPU or BaseDynInst.  Those both have a
traceData parameter and it can just be used directly without any fancy
indirection.  I am only talking about this code specifically.

 Also this still doesn't address the fact that, one, completeAcc is
 being called before initiateAcc has fully run, and two, that the trace
 information is being printed before initiateAcc has written all its
 results.
Can't we just ensure that foo-traceData is set to NULL after it is
deleted?  Then the setData that's done in initiateAcc after the write
won't complete.

Also, looking at the code more closely, is there some reason taht we
can't just move the setData line before the write?  Wouldn't that
solve the problem?  It seems to me that the program wouldn't change
since the contents of Mem should not change due to xc-write?
(correct?)

Sorry if I'm just being unhelpful.  I don't know this code super well.

  Nate
___
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev


Re: [m5-dev] [PATCH] CPU: Make the CPU wait until initiateAcc finishes before calling completeAcc

2009-04-28 Thread Korey Sewell
So my original point (that obviously wasnt conveyed very well) was
more trying to re-work how this STL_C case works rather than kind of
patch it up so it works in the current framework.

Seems to be me that completeAcc should never get called before
initiateAcc in any circumstance. I know that shortcut is because you
STL_C fails early, but even so, why cant it just finish like any other
instruction. What benefit are we gettiing from early completing it
(especially in a SimpleCPU)? Am I getting this fundamental issue? If
I'm not getting that wrong, then we can just fix the SimpleCPU to not
handle this special case in this fashion and all will be fine.

So trying to play around with traceData in my opinion is the wrong way
to solve it. If you are going to do that, Nate's idea of writing the
traceData before the access sounds like it will work.But Gabe's way
works too, so whatever works there is fine with me.

Long term, I'm just thinking that if I call initiateAcc and there is a
risk of completeAcc happening before it finishes seems like it breaks
the interface that the CPU/INST should use. And then in the future,
this is going to be a nasty bug that will trip somebody up (because
completeAcc might do something that screws things up early).

On Tue, Apr 28, 2009 at 5:34 PM, nathan binkert n...@binkert.org wrote:
 What redirection?  Or do you mean because traceData is protected in
 the simple CPU?  You could, but then again, you could just make it
 public.  Otherwise, I'm not sure what you're talking about.  The
 execute function is generated for each cpu model separately and it
 knows the type.

 I'm talking about how there's the exec context, and then the proxy
 exec context, and the thread context, and the simple thread context,
 and the dyninst, and the etc. etc., and calls to, for instance, read a
 register can travel through several layers before actually being
 serviced. By making traceData a direct member of the very first exec
 context interface, we're precluding passing it back to other layers
 that may want some input. That's not to say I don't think it will
 work, because I actually think it would, but it seems inconsistent
 with how the rest of that interface goes together.
 I'm specifically talking about the parameter passed into the exec
 function which is either SimpleCPU or BaseDynInst.  Those both have a
 traceData parameter and it can just be used directly without any fancy
 indirection.  I am only talking about this code specifically.

 Also this still doesn't address the fact that, one, completeAcc is
 being called before initiateAcc has fully run, and two, that the trace
 information is being printed before initiateAcc has written all its
 results.
 Can't we just ensure that foo-traceData is set to NULL after it is
 deleted?  Then the setData that's done in initiateAcc after the write
 won't complete.

 Also, looking at the code more closely, is there some reason taht we
 can't just move the setData line before the write?  Wouldn't that
 solve the problem?  It seems to me that the program wouldn't change
 since the contents of Mem should not change due to xc-write?
 (correct?)

 Sorry if I'm just being unhelpful.  I don't know this code super well.

  Nate
 ___
 m5-dev mailing list
 m5-dev@m5sim.org
 http://m5sim.org/mailman/listinfo/m5-dev




-- 
===
Korey L Sewell
Computer Science  Engineering
University of Michigan
___
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev


Re: [m5-dev] [PATCH] CPU: Make the CPU wait until initiateAcc finishes before calling completeAcc

2009-04-28 Thread Korey Sewell
 As far as moving around the code, that would be easier said than done.
 That's all generated and I don't know how easy it would be to get the
 isa_parser to generate something structured differently that would
 still make sense in all the other instructions of all the other ISAs.
 Steve?

I was mentioning something like this earlier, but in
arch/alpha/mem.isa the initiateAcc code is generated from the
LoadStore::initiateAcc template. If you just give Store conditionals
its own template for initiateAcc (like it already has for
completeAcc), I think that will be a fix too (assuming you set the
tracedata early or handle it some other way)...

-- 
===
Korey L Sewell
Computer Science  Engineering
University of Michigan
___
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev


Re: [m5-dev] [PATCH] CPU: Make the CPU wait until initiateAcc finishes before calling completeAcc

2009-04-28 Thread Gabriel Michael Black
Quoting Korey Sewell ksew...@umich.edu:

 So my original point (that obviously wasnt conveyed very well) was
 more trying to re-work how this STL_C case works rather than kind of
 patch it up so it works in the current framework.

 Seems to be me that completeAcc should never get called before
 initiateAcc in any circumstance. I know that shortcut is because you
 STL_C fails early, but even so, why cant it just finish like any other
 instruction. What benefit are we gettiing from early completing it
 (especially in a SimpleCPU)? Am I getting this fundamental issue? If
 I'm not getting that wrong, then we can just fix the SimpleCPU to not
 handle this special case in this fashion and all will be fine.

 So trying to play around with traceData in my opinion is the wrong way
 to solve it. If you are going to do that, Nate's idea of writing the
 traceData before the access sounds like it will work.But Gabe's way
 works too, so whatever works there is fine with me.

 Long term, I'm just thinking that if I call initiateAcc and there is a
 risk of completeAcc happening before it finishes seems like it breaks
 the interface that the CPU/INST should use. And then in the future,
 this is going to be a nasty bug that will trip somebody up (because
 completeAcc might do something that screws things up early).

I agree with everything you've said here, except for one half thing  
and I've got a comment on one other. First, my change is actually  
doing what your saying. It's not mucking around with traceData, it's  
actually putting off calling completeDataAccess (which calls  
completeAcc) if we're in the middle of initiateAcc until we come out  
of it. That's the half.

The other thing is the reason STC doesn't work like the other memory  
instructions. In those other cases where you're calling into memory,  
you're going to get a response some time later. When that happens,  
initiateAcc has since finished running and you can end up in  
completeAcc and be ok. When you don't have a delay in translation or  
when you go to memory, then there's no later point where you'll be  
doing anything for that instruction and you need to deal with it now.

One completely different solution would be to do what you're saying  
and just not have the timing simple CPU work like it does. I think  
it's organized a lot like the atomic simple CPU because it used to be  
a part of it, but really a more natural organization would be as a  
state machine where activity was broken by hard lines into states.  
There might be a little more overhead that way, but it would make it  
much easier for non-experts to see what the heck is going on in there.  
It would also help the experts avoid introducing bugs because there  
are so many interactions to consider.

Gabe


 On Tue, Apr 28, 2009 at 5:34 PM, nathan binkert n...@binkert.org wrote:
 What redirection?  Or do you mean because traceData is protected in
 the simple CPU?  You could, but then again, you could just make it
 public.  Otherwise, I'm not sure what you're talking about.  The
 execute function is generated for each cpu model separately and it
 knows the type.

 I'm talking about how there's the exec context, and then the proxy
 exec context, and the thread context, and the simple thread context,
 and the dyninst, and the etc. etc., and calls to, for instance, read a
 register can travel through several layers before actually being
 serviced. By making traceData a direct member of the very first exec
 context interface, we're precluding passing it back to other layers
 that may want some input. That's not to say I don't think it will
 work, because I actually think it would, but it seems inconsistent
 with how the rest of that interface goes together.
 I'm specifically talking about the parameter passed into the exec
 function which is either SimpleCPU or BaseDynInst.  Those both have a
 traceData parameter and it can just be used directly without any fancy
 indirection.  I am only talking about this code specifically.

 Also this still doesn't address the fact that, one, completeAcc is
 being called before initiateAcc has fully run, and two, that the trace
 information is being printed before initiateAcc has written all its
 results.
 Can't we just ensure that foo-traceData is set to NULL after it is
 deleted?  Then the setData that's done in initiateAcc after the write
 won't complete.

 Also, looking at the code more closely, is there some reason taht we
 can't just move the setData line before the write?  Wouldn't that
 solve the problem?  It seems to me that the program wouldn't change
 since the contents of Mem should not change due to xc-write?
 (correct?)

 Sorry if I'm just being unhelpful.  I don't know this code super well.

  Nate
 ___
 m5-dev mailing list
 m5-dev@m5sim.org
 http://m5sim.org/mailman/listinfo/m5-dev




 --
 ===
 Korey L Sewell
 Computer Science  Engineering
 University of Michigan
 

Re: [m5-dev] [PATCH] CPU: Make the CPU wait until initiateAcc finishes before calling completeAcc

2009-04-28 Thread nathan binkert
 I'm specifically talking about the parameter passed into the exec
 function which is either SimpleCPU or BaseDynInst.  Those both have a
 traceData parameter and it can just be used directly without any fancy
 indirection.  I am only talking about this code specifically.

 I'm confused now. If we're passing it in, then it's a local variable
 on the stack and we can't set it to NULL externally.
Right, I don't like this.

 If it's a local
 member of the exec context, it's not transparently modifiable by
 deeper layers of the interface.
What does this mean?  Instead of doing traceData, you do
xc-traceData.  Then someone could set destroy xc-traceData and set
it to NULL and we'd be OK.

 If it's a function call, we're paying
 a bunch of function call overhead. Which of these do (do not) apply?
This does not apply.

 Also, looking at the code more closely, is there some reason taht we
 can't just move the setData line before the write?  Wouldn't that
 solve the problem?  It seems to me that the program wouldn't change
 since the contents of Mem should not change due to xc-write?
 (correct?)

 That would ideally fix the segfault, aka the symptom, but it doesn't
 fix the core issue. The second half of initiateAcc won't be able to
 participate in the trace since traceData is dumped right before being
 deleted. That portion will also actually run -after- completeAcc, but
 only sometimes. That sounds like a source for a lot of confusion and
 bugs (like this one) to me.


 As far as moving around the code, that would be easier said than done.
 That's all generated and I don't know how easy it would be to get the
 isa_parser to generate something structured differently that would
 still make sense in all the other instructions of all the other ISAs.
 Steve?
Ok, whatever the correct solution, I think we should minimally do these things:
1) Remove the parameter,
2) change traceData to xc-traceData
3) and surround uses of traceData with #if TRACING_ON


  Nate
___
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev


Re: [m5-dev] [PATCH] CPU: Make the CPU wait until initiateAcc finishes before calling completeAcc

2009-04-28 Thread Gabriel Michael Black
Quoting nathan binkert n...@binkert.org:

 I'm specifically talking about the parameter passed into the exec
 function which is either SimpleCPU or BaseDynInst.  Those both have a
 traceData parameter and it can just be used directly without any fancy
 indirection.  I am only talking about this code specifically.

 I'm confused now. If we're passing it in, then it's a local variable
 on the stack and we can't set it to NULL externally.
 Right, I don't like this.

 If it's a local
 member of the exec context, it's not transparently modifiable by
 deeper layers of the interface.
 What does this mean?  Instead of doing traceData, you do
 xc-traceData.  Then someone could set destroy xc-traceData and set
 it to NULL and we'd be OK.

A parallel example would be getting a pointer to the TLBs. In that  
case, you don't just have a TLB pointer member on the exec context,  
you have a function called something like getDTBPtr() that percolates  
to the right place that actually has the current value. That's just  
how this stuff has been put together up to now, I assume for a reason,  
and we'd be diverging from that. I don't know of any calamitous  
problem that would introduce, but we'd be partially undermining the  
flexibility we've built in for other values.


 3) and surround uses of traceData with #if TRACING_ON

I meant to mention it before, but I think this bit is very doable and  
makes a lot of sense.

Gabe
___
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev


Re: [m5-dev] [PATCH] CPU: Make the CPU wait until initiateAcc finishes before calling completeAcc

2009-04-28 Thread nathan binkert
 One completely different solution would be to do what you're saying
 and just not have the timing simple CPU work like it does. I think
 it's organized a lot like the atomic simple CPU because it used to be
 a part of it, but really a more natural organization would be as a
 state machine where activity was broken by hard lines into states.
 There might be a little more overhead that way, but it would make it
 much easier for non-experts to see what the heck is going on in there.
 It would also help the experts avoid introducing bugs because there
 are so many interactions to consider.

This seems pretty reasonable to me.  There are already states in the
models to deal with cache misses and such, so adding another state for
the different phases of a memory access seems reasonable.  The impact
probably wouldn't be huge given the fact that the cache is probably
pretty expensive compared to the CPU when executing memory ops.

  Nate
___
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev


Re: [m5-dev] [PATCH] CPU: Make the CPU wait until initiateAcc finishes before calling completeAcc

2009-04-28 Thread nathan binkert
 So a tentative plan would be:
 1. Commit my patch for now so the bug is fixed.
 2. Add the #if TRACING_ON.
 3.5 Hash out what to do with traceData
 3.5 Reorganize the timing simple CPU.

 Does anyone object?

I guess I'd rather not do #1 if it isn't the solution because things
like that have a tendency to live on.

  Nate
___
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev


Re: [m5-dev] [PATCH] CPU: Make the CPU wait until initiateAcc finishes before calling completeAcc

2009-04-28 Thread Steve Reinhardt
Whew, I don't check my email for one afternoon and you guys make me pay for
it...

I'm still curious about the possibility of using an event to call the
completeAcc() callback in the case where the translation fails on a store
conditional.  This solution seems particularly attractive to me if it's just
a case of making this particular scenario follow the existing x86-tlb-miss
path rather than the default path, in which case there shouldn't be much new
code at all.  Maybe it's a little more complicated than that if we have to
import some of the x86-tlb-miss code into alpha, but it still doesn't seem
that bad.  I think Gabe misunderstood me; I'm not talking about doing this
for all TLB hit responses, just this one specific case.  There may well be a
subtlety that makes this not as clean as I envision it, but to me it solves
the root cause of the problem, which is that completeAcc() is getting called
before initiateAcc() terminates, and does it in a way that doesn't clutter
the logic in SimpleTimingCPU the way Gabe's current fix does.

Adding in the #if TRACING_ON lines is a no-brainer but is really independent
of this bug.

I think moving the traceData variable into the exec context is not a bad
idea (though not strongly compelling to me either), and does happen to have
the side effect of addressing this particular bug.  However I agree with
Gabe that it's really a very limited solution and doesn't address the
conceptual issue that caused this bug, which leaves us open to future
similar bugs occurring.  Maybe that will never happen, but it makes this a
less compelling solution to the bug itself.  But if we're going to do it
anyway, it could work as a short-term fix for the bug.

The SimpleTimingCPU is getting complex, but it's not obvious to me how
adding more explicit states would clean it up significantly, particularly if
we avoid adding Gabe's current patch (which does muddle up the control flow
a bit more than it already is).  I'm open to learning more about what people
envision here though.

Steve

On Tue, Apr 28, 2009 at 4:05 PM, nathan binkert n...@binkert.org wrote:

  So a tentative plan would be:
  1. Commit my patch for now so the bug is fixed.
  2. Add the #if TRACING_ON.
  3.5 Hash out what to do with traceData
  3.5 Reorganize the timing simple CPU.
 
  Does anyone object?

 I guess I'd rather not do #1 if it isn't the solution because things
 like that have a tendency to live on.

  Nate
 ___
 m5-dev mailing list
 m5-dev@m5sim.org
 http://m5sim.org/mailman/listinfo/m5-dev

___
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev


Re: [m5-dev] [PATCH] CPU: Make the CPU wait until initiateAcc finishes before calling completeAcc

2009-04-28 Thread Gabe Black
Steve Reinhardt wrote:
 Whew, I don't check my email for one afternoon and you guys make me
 pay for it...

 I'm still curious about the possibility of using an event to call the
 completeAcc() callback in the case where the translation fails on a
 store conditional.  This solution seems particularly attractive to me
 if it's just a case of making this particular scenario follow the
 existing x86-tlb-miss path rather than the default path, in which case
 there shouldn't be much new code at all.  Maybe it's a little more
 complicated than that if we have to import some of the x86-tlb-miss
 code into alpha, but it still doesn't seem that bad.  I think Gabe
 misunderstood me; I'm not talking about doing this for all TLB hit
 responses, just this one specific case.  There may well be a subtlety
 that makes this not as clean as I envision it, but to me it solves the
 root cause of the problem, which is that completeAcc() is getting
 called before initiateAcc() terminates, and does it in a way that
 doesn't clutter the logic in SimpleTimingCPU the way Gabe's current
 fix does.

 Adding in the #if TRACING_ON lines is a no-brainer but is really
 independent of this bug.

 I think moving the traceData variable into the exec context is not a
 bad idea (though not strongly compelling to me either), and does
 happen to have the side effect of addressing this particular bug. 
 However I agree with Gabe that it's really a very limited solution and
 doesn't address the conceptual issue that caused this bug, which
 leaves us open to future similar bugs occurring.  Maybe that will
 never happen, but it makes this a less compelling solution to the bug
 itself.  But if we're going to do it anyway, it could work as a
 short-term fix for the bug.

 The SimpleTimingCPU is getting complex, but it's not obvious to me how
 adding more explicit states would clean it up significantly,
 particularly if we avoid adding Gabe's current patch (which does
 muddle up the control flow a bit more than it already is).  I'm open
 to learning more about what people envision here though.

I'm not sure if it will be hugely better. I'd have to just do it and see
how it comes out, I suppose. Wouldn't scheduling an event be similar to
what I have, just using the event queue instead of the new packet
pointer? I could imagine it being better, actually, so I'm tentatively
convinced. There shouldn't be a need to move any x86 code into Alpha to
make that work, but I wouldn't swear to that.

Gabe
___
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev


Re: [m5-dev] [PATCH] CPU: Make the CPU wait until initiateAcc finishes before calling completeAcc

2009-04-28 Thread Steve Reinhardt
On Tue, Apr 28, 2009 at 4:43 PM, Gabe Black gbl...@eecs.umich.edu wrote:

 I'm not sure if it will be hugely better. I'd have to just do it and see
 how it comes out, I suppose. Wouldn't scheduling an event be similar to
 what I have, just using the event queue instead of the new packet
 pointer?


Yea, at a certain level, the effect is similar--just deferring the callback
until later--which is not bad, since that's what addresses the problem.  I
think that using the existing event queue mechanism (which is explicitly
defined for deferring stuff until later) is less convoluted than adding a
handful of new variables and control code to cover just this one specific
situation.



 I could imagine it being better, actually, so I'm tentatively
 convinced. There shouldn't be a need to move any x86 code into Alpha to
 make that work, but I wouldn't swear to that.


Thanks for giving it a try.

Steve
___
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev


Re: [m5-dev] [PATCH] CPU: Make the CPU wait until initiateAcc finishes before calling completeAcc

2009-04-27 Thread Gabriel Michael Black
Here is a hypothetical series of events that should hopefully  
illustrate what's going on.

1. The store conditional comes up for execution and its initiateAcc is called.
2. initiateAcc calls xc-write() to actually do the required store.
3. write calls into the TLB's translateTiming function to do the translation.
4. Alpha's translateTiming is just a wrapper around translateAtomic,  
so it calls the callback in place.
5. The callback in the CPU recognizes the store conditional failed and  
calls completeAcc to finish up the instruction. The callback also  
deletes extra data associated with the instruction including traceData.

At this point, the call stack looks approximately like:

initiateAcc-write-translateTiming-callback

Note that initiateAcc hasn't been returned to so it hasn't had a  
chance to finish. Then:

6. The various functions return until we're back to initiateAcc.
7. initiateAcc uses its local (and now invalid) copy of the traceData  
pointer to store the result of the instruction executing.
8. BOOM segfault.


What my patch does is set a flag right before calling initiateAcc. If  
that flag is still set once you get to the completeDataAccess  
function, which is ultimately where the instruction is finished, it  
doesn't actually do anything then. Instead it tucks away the  
arguements of the function and returns, allowing initiateAcc to  
finish. After it finishes, then it checks whether completeDataAccess  
was skipped, and if so calls it then. That way, completeDataAccess  
isn't called until after initiateAcc is done which is what  
instructions expect.

The overhead I was talking about is from two sources. The first is  
that that flag is set and cleared every time initiateAcc is called.  
Second, completeDataAccess checks that flag every time it's called.  
These are not large overheads but they're on a hot path.

Gabe

Quoting Korey Sewell ksew...@umich.edu:

 Hi Gabe,
 I'm trying to figure out what's going and help find a potentially better
 solution (if there is one), since I'm operating under the assumption that
 adding a small overhead for the common case memory access to support the
 special case (stl_c) should be avoided if possible. (SIDEBAR: In looking at
 the code another pass, I should point out it looks like timing.hh/atomic.hh
 are in need of some comments on virtually all of the function
 declarations.)   So anyway, I'll try to take another stab at helping out,
 but if nobody else thinks this is a big deal, then that's fine with me.
 ===
 From your description of the problem Gabe, you are saying that when a STL_C
 gets called, the problems stem from the initiateAcc function which tries to
 call a member function on traceData which was deleted during the cleanup.
 The patch you posted seems to fiddle around with the completionPkt more than
 the traceData so like I said I'm trying to follow you and figure out why we
 cant make this a special case solution instead of a wrapper (w/the
 inInitiateAcc flag) on every access.

 The initiateAcc code is listed here:
  Fault Stl_c::initiateAcc(AtomicSimpleCPU *xc,
   Trace::InstRecord *traceData) const
 {
 

 if (fault == NoFault) {
 fault = xc-write((uint32_t)Mem, EA,
   memAccessFlags, NULL);
 if (traceData) { traceData-setData(Mem); }
 }

 return fault;
 }

 I think you are also saying that the xc-write() triggers a split TLB
 translation (sendSplitData) that eventually calls completeDataAccess and
 then completeAcc?

 Then, I think you are saying that the code falls back to the initiateAcc and
 tries:
 if (traceData) { traceData-setData(Mem); }

 Which potentially wont be valid since you deleted in the translation fault
 function (though I think that should get caught anyway).

 Is that the correct way to interpret your problem? What am I missing here?

 Thanks,
 Korey

 On Sun, Apr 26, 2009 at 4:32 PM, Gabe Black gbl...@eecs.umich.edu wrote:

 I should add, I realize it's not very clear what's going on, but it
 would take a lot more effort to explain everything in enough detail to
 get the problem across than for you (or anyone else) to look at the
 code. If you do look into the code, it's not a functional problem as far
 as how the timing CPU works or organizes events per se. It's a C++
 problem where things may or may not end up being called from the middle
 of each other.

 Gabe

 Gabe Black wrote:
  You should read my description, read the code, and then once you
  understand the problem comment on my solution.
 
  Gabe
 
  Korey Sewell wrote:
 
  Yea, I was on that thread. Your explanation wasnt particularly in
  great detail there (files?, line #s?) so its hard to follow without
  getting your hand dirty on where this code is you are referring to.
 
   I'll comment and you can clarify my assumptions...
 
  Oooh. I see what's broken. This is a result of my changes to
  allow delaying 

Re: [m5-dev] [PATCH] CPU: Make the CPU wait until initiateAcc finishes before calling completeAcc

2009-04-27 Thread Gabriel Michael Black
initiateAcc looks something like the following:

Fault initiateAcc(..., traceData,...)
{
 ...
 write();  traceData becomes invalid in here
 if (traceData) traceData-setData(Mem);
 ...
}

so I'm not sure how that would work.

Gabe

Quoting nathan binkert n...@binkert.org:

 7. initiateAcc uses its local (and now invalid) copy of the traceData
 pointer to store the result of the instruction executing.
 What if you just read this pointer again and not store a local copy?


   Nate
 ___
 m5-dev mailing list
 m5-dev@m5sim.org
 http://m5sim.org/mailman/listinfo/m5-dev




___
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev


Re: [m5-dev] [PATCH] CPU: Make the CPU wait until initiateAcc finishes before calling completeAcc

2009-04-27 Thread nathan binkert
 initiateAcc looks something like the following:

 Fault initiateAcc(..., traceData,...)
 {
    ...
    write();  traceData becomes invalid in here
    if (traceData) traceData-setData(Mem);
    ...
 }

 so I'm not sure how that would work.

Why is traceData passed in as a parameter and not just grabbed from
whatever the first parameter of the function is (i.e. the dyn inst or
the CPU object)?

  Nate
___
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev


Re: [m5-dev] [PATCH] CPU: Make the CPU wait until initiateAcc finishes before calling completeAcc

2009-04-27 Thread Gabriel Michael Black
Quoting nathan binkert n...@binkert.org:

 initiateAcc looks something like the following:

 Fault initiateAcc(..., traceData,...)
 {
...
write();  traceData becomes invalid in here
if (traceData) traceData-setData(Mem);
...
 }

 so I'm not sure how that would work.

 Why is traceData passed in as a parameter and not just grabbed from
 whatever the first parameter of the function is (i.e. the dyn inst or
 the CPU object)?

   Nate
 ___
 m5-dev mailing list
 m5-dev@m5sim.org
 http://m5sim.org/mailman/listinfo/m5-dev


I don't know for sure, but I would guess that would be a lot less  
efficient and more complicated. traceData gets used for every output  
operand so you'd be calling through the exec context a lot, even when  
it's just going to return NULL. This is how the isa_parser has  
generated this code since I started working with it back with SPARC  
and probably before that. The real issue is that initiateAcc is  
getting the rug pulled out from under it before it finishes running,  
so even if it came from the exec context you're trace information  
would be printed and discarded prematurely.

Gabe
___
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev


Re: [m5-dev] [PATCH] CPU: Make the CPU wait until initiateAcc finishes before calling completeAcc

2009-04-26 Thread Gabe Black
This patch should fix the issue Geoff was having where tracing would
cause a segfault. If you could please give it a try, Geoff, let me know
if it actually solves the problem. This will add a small overhead for
every memory instruction even though the old behavior is wrong only in
certain circumstances. I considered trying to avoid that overhead in
cases where the problem can't happen, but then tracing would change the
behavior of the CPU and that seems like a bad idea. Assuming this
actually fixes the bug and nobody objects I'll commit this soon.

Gabe

gbl...@eecs.umich.edu wrote:
 # HG changeset patch
 # User Gabe Black gbl...@eecs.umich.edu
 # Date 1240729422 25200
 # Node ID 284ff3c27bbddacf5763e7b5a54e99e4dcd1912a
 # Parent  8652636856b3d24fb0088fb1af5f5dca5008d9c8
 CPU: Make the CPU wait until initiateAcc finishes before calling completeAcc.

 diff --git a/src/cpu/simple/timing.cc b/src/cpu/simple/timing.cc
 --- a/src/cpu/simple/timing.cc
 +++ b/src/cpu/simple/timing.cc
 @@ -115,6 +115,8 @@
  ifetch_pkt = dcache_pkt = NULL;
  drainEvent = NULL;
  previousTick = 0;
 +inInitiateAcc = false;
 +completionPkt = NULL;
  changeState(SimObject::Running);
  }
  
 @@ -758,7 +760,13 @@
  if (curStaticInst 
  curStaticInst-isMemRef()  !curStaticInst-isDataPrefetch()) {
  // load or store: just send to dcache
 +inInitiateAcc = true;
  Fault fault = curStaticInst-initiateAcc(this, traceData);
 +inInitiateAcc = false;
 +if (completionPkt) {
 +completeDataAccess(completionPkt);
 +completionPkt = NULL;
 +}
  if (_status != Running) {
  // instruction will complete in dcache response callback
  assert(_status == DcacheWaitResponse ||
 @@ -856,6 +864,10 @@
  void
  TimingSimpleCPU::completeDataAccess(PacketPtr pkt)
  {
 +if (inInitiateAcc) {
 +completionPkt = pkt;
 +return;
 +}
  // received a response from the dcache: complete the load or store
  // instruction
  assert(!pkt-isError());
 diff --git a/src/cpu/simple/timing.hh b/src/cpu/simple/timing.hh
 --- a/src/cpu/simple/timing.hh
 +++ b/src/cpu/simple/timing.hh
 @@ -317,6 +317,9 @@
  
  Tick previousTick;
  
 +bool inInitiateAcc;
 +PacketPtr completionPkt;
 +
public:
  
  virtual Port *getPort(const std::string if_name, int idx = -1);
 ___
 m5-dev mailing list
 m5-dev@m5sim.org
 http://m5sim.org/mailman/listinfo/m5-dev
   

___
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev


Re: [m5-dev] [PATCH] CPU: Make the CPU wait until initiateAcc finishes before calling completeAcc

2009-04-26 Thread Korey Sewell
Can you explain how the completeAcc ever gets called before the initiatieAcc
finishes in a SimpleCPU?

I'm thinking, why isnt the SimpleCPU just stalling if there is a delay
waiting for a memory access.If that's the right behavior, then isnt the
solution to just stall the CPU (or even sleep it) and wait for initiateAcc
to finish rather than try to work around the initiateAcc/completeAcc
simultaneous call? I guess I dont understand fully the problem but it seems
workable without a delay in overhead for all other memory accesses.

Also, would this be a issue for any other CPU models (O3 or InOrder)?

On Sun, Apr 26, 2009 at 3:22 AM, Gabe Black gbl...@eecs.umich.edu wrote:

 This patch should fix the issue Geoff was having where tracing would
 cause a segfault. If you could please give it a try, Geoff, let me know
 if it actually solves the problem. This will add a small overhead for
 every memory instruction even though the old behavior is wrong only in
 certain circumstances. I considered trying to avoid that overhead in
 cases where the problem can't happen, but then tracing would change the
 behavior of the CPU and that seems like a bad idea. Assuming this
 actually fixes the bug and nobody objects I'll commit this soon.

 Gabe

 gbl...@eecs.umich.edu wrote:
  # HG changeset patch
  # User Gabe Black gbl...@eecs.umich.edu
  # Date 1240729422 25200
  # Node ID 284ff3c27bbddacf5763e7b5a54e99e4dcd1912a
  # Parent  8652636856b3d24fb0088fb1af5f5dca5008d9c8
  CPU: Make the CPU wait until initiateAcc finishes before calling
 completeAcc.
 
  diff --git a/src/cpu/simple/timing.cc b/src/cpu/simple/timing.cc
  --- a/src/cpu/simple/timing.cc
  +++ b/src/cpu/simple/timing.cc
  @@ -115,6 +115,8 @@
   ifetch_pkt = dcache_pkt = NULL;
   drainEvent = NULL;
   previousTick = 0;
  +inInitiateAcc = false;
  +completionPkt = NULL;
   changeState(SimObject::Running);
   }
 
  @@ -758,7 +760,13 @@
   if (curStaticInst 
   curStaticInst-isMemRef() 
 !curStaticInst-isDataPrefetch()) {
   // load or store: just send to dcache
  +inInitiateAcc = true;
   Fault fault = curStaticInst-initiateAcc(this, traceData);
  +inInitiateAcc = false;
  +if (completionPkt) {
  +completeDataAccess(completionPkt);
  +completionPkt = NULL;
  +}
   if (_status != Running) {
   // instruction will complete in dcache response callback
   assert(_status == DcacheWaitResponse ||
  @@ -856,6 +864,10 @@
   void
   TimingSimpleCPU::completeDataAccess(PacketPtr pkt)
   {
  +if (inInitiateAcc) {
  +completionPkt = pkt;
  +return;
  +}
   // received a response from the dcache: complete the load or store
   // instruction
   assert(!pkt-isError());
  diff --git a/src/cpu/simple/timing.hh b/src/cpu/simple/timing.hh
  --- a/src/cpu/simple/timing.hh
  +++ b/src/cpu/simple/timing.hh
  @@ -317,6 +317,9 @@
 
   Tick previousTick;
 
  +bool inInitiateAcc;
  +PacketPtr completionPkt;
  +
 public:
 
   virtual Port *getPort(const std::string if_name, int idx = -1);
  ___
  m5-dev mailing list
  m5-dev@m5sim.org
  http://m5sim.org/mailman/listinfo/m5-dev
 

 ___
 m5-dev mailing list
 m5-dev@m5sim.org
 http://m5sim.org/mailman/listinfo/m5-dev




-- 
--
Korey L Sewell
Graduate Student - PhD Candidate
Computer Science  Engineering
University of Michigan
___
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev


Re: [m5-dev] [PATCH] CPU: Make the CPU wait until initiateAcc finishes before calling completeAcc

2009-04-26 Thread Gabe Black
There's an explanation already in the thread with the subject  Memory
corruption inm5devrepositorywhenusing   
--trace-flags=ExecEnable. Stalling or sleeping doesn't make sense in
this situation since it's not that sort of problem.

Gabe

Korey Sewell wrote:
 Can you explain how the completeAcc ever gets called before the
 initiatieAcc finishes in a SimpleCPU?

 I'm thinking, why isnt the SimpleCPU just stalling if there is a delay
 waiting for a memory access.If that's the right behavior, then isnt
 the solution to just stall the CPU (or even sleep it) and wait for
 initiateAcc to finish rather than try to work around the
 initiateAcc/completeAcc simultaneous call? I guess I dont understand
 fully the problem but it seems workable without a delay in overhead
 for all other memory accesses.

 Also, would this be a issue for any other CPU models (O3 or InOrder)?

 On Sun, Apr 26, 2009 at 3:22 AM, Gabe Black gbl...@eecs.umich.edu
 mailto:gbl...@eecs.umich.edu wrote:

 This patch should fix the issue Geoff was having where tracing would
 cause a segfault. If you could please give it a try, Geoff, let me
 know
 if it actually solves the problem. This will add a small overhead for
 every memory instruction even though the old behavior is wrong only in
 certain circumstances. I considered trying to avoid that overhead in
 cases where the problem can't happen, but then tracing would
 change the
 behavior of the CPU and that seems like a bad idea. Assuming this
 actually fixes the bug and nobody objects I'll commit this soon.

 Gabe

 gbl...@eecs.umich.edu mailto:gbl...@eecs.umich.edu wrote:
  # HG changeset patch
  # User Gabe Black gbl...@eecs.umich.edu
 mailto:gbl...@eecs.umich.edu
  # Date 1240729422 25200
  # Node ID 284ff3c27bbddacf5763e7b5a54e99e4dcd1912a
  # Parent  8652636856b3d24fb0088fb1af5f5dca5008d9c8
  CPU: Make the CPU wait until initiateAcc finishes before calling
 completeAcc.
 
  diff --git a/src/cpu/simple/timing.cc b/src/cpu/simple/timing.cc
  --- a/src/cpu/simple/timing.cc
  +++ b/src/cpu/simple/timing.cc
  @@ -115,6 +115,8 @@
   ifetch_pkt = dcache_pkt = NULL;
   drainEvent = NULL;
   previousTick = 0;
  +inInitiateAcc = false;
  +completionPkt = NULL;
   changeState(SimObject::Running);
   }
 
  @@ -758,7 +760,13 @@
   if (curStaticInst 
   curStaticInst-isMemRef() 
 !curStaticInst-isDataPrefetch()) {
   // load or store: just send to dcache
  +inInitiateAcc = true;
   Fault fault = curStaticInst-initiateAcc(this, traceData);
  +inInitiateAcc = false;
  +if (completionPkt) {
  +completeDataAccess(completionPkt);
  +completionPkt = NULL;
  +}
   if (_status != Running) {
   // instruction will complete in dcache response
 callback
   assert(_status == DcacheWaitResponse ||
  @@ -856,6 +864,10 @@
   void
   TimingSimpleCPU::completeDataAccess(PacketPtr pkt)
   {
  +if (inInitiateAcc) {
  +completionPkt = pkt;
  +return;
  +}
   // received a response from the dcache: complete the load
 or store
   // instruction
   assert(!pkt-isError());
  diff --git a/src/cpu/simple/timing.hh b/src/cpu/simple/timing.hh
  --- a/src/cpu/simple/timing.hh
  +++ b/src/cpu/simple/timing.hh
  @@ -317,6 +317,9 @@
 
   Tick previousTick;
 
  +bool inInitiateAcc;
  +PacketPtr completionPkt;
  +
 public:
 
   virtual Port *getPort(const std::string if_name, int idx =
 -1);
  ___
  m5-dev mailing list
  m5-dev@m5sim.org mailto:m5-dev@m5sim.org
  http://m5sim.org/mailman/listinfo/m5-dev
 

 ___
 m5-dev mailing list
 m5-dev@m5sim.org mailto:m5-dev@m5sim.org
 http://m5sim.org/mailman/listinfo/m5-dev




 -- 
 --
 Korey L Sewell
 Graduate Student - PhD Candidate
 Computer Science  Engineering
 University of Michigan
 

 ___
 m5-dev mailing list
 m5-dev@m5sim.org
 http://m5sim.org/mailman/listinfo/m5-dev
   

___
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev


Re: [m5-dev] [PATCH] CPU: Make the CPU wait until initiateAcc finishes before calling completeAcc

2009-04-26 Thread Korey Sewell
Yea, I was on that thread. Your explanation wasnt particularly in great
detail there (files?, line #s?) so its hard to follow without getting your
hand dirty on where this code is you are referring to.

 I'll comment and you can clarify my assumptions...

Oooh. I see what's broken. This is a result of my changes to
allow delaying translation. What happens is that Stl_c goes into
initiateAcc. That function calls write on the CPU which calls into the
TLB which calls the translation callback which recognizes a failed store
conditional which completes the instruction execution with completeAcc
and cleans up. The call stack then collapses back to the initiateAcc
which is still waiting to finish and which then tries to call a member
function on traceData which was deleted during the cleanup. The problem
here is not fundamentally complicated, but the mechanisms involved are.
One solution would be to record the fact that we're still in
initiateAcc, and if we are wait for the call stack to collapse back down
to initiateAcc's caller before calling into completeAcc. That matches
the semantics an instruction would expect more, I think, where the
initiateAcc/completeAcc pair are called sequentially.

A couple of thoughts come to mind:
1. If the problem is with traceData and specifially just with store
conditionals, what prevents you from a solution that handles that is a small
overhead on that case instead of on every memory access? Particularly,  why
not add a StoreCondInitiateAcc template in mem.isa that handles this
special case? In there after the write to the CPU, you can maybe check to
see if you want to mess with traceData stuff or not depending on if the
conditional is already failed or whatever.

2. If the traceData is deleted, then why is it available to be accessed in
initiateAcc anyway? Isnt it enclosed by a if (traceData)?  Shouldnt that
return false if that is a NULL pointer?

3. Lastly, it sounds like this could be a slightly good candidate case for
having the ability to functionally split TLB access from the timing memory
access since in this case you want to do the access depending on something
that happens with the TLB...

On Sun, Apr 26, 2009 at 2:43 PM, Gabe Black gbl...@eecs.umich.edu wrote:

 There's an explanation already in the thread with the subject  Memory
 corruption inm5devrepositorywhenusing
 --trace-flags=ExecEnable. Stalling or sleeping doesn't make sense in
 this situation since it's not that sort of problem.

 Gabe

 Korey Sewell wrote:
  Can you explain how the completeAcc ever gets called before the
  initiatieAcc finishes in a SimpleCPU?
 
  I'm thinking, why isnt the SimpleCPU just stalling if there is a delay
  waiting for a memory access.If that's the right behavior, then isnt
  the solution to just stall the CPU (or even sleep it) and wait for
  initiateAcc to finish rather than try to work around the
  initiateAcc/completeAcc simultaneous call? I guess I dont understand
  fully the problem but it seems workable without a delay in overhead
  for all other memory accesses.
 
  Also, would this be a issue for any other CPU models (O3 or InOrder)?
 
  On Sun, Apr 26, 2009 at 3:22 AM, Gabe Black gbl...@eecs.umich.edu
  mailto:gbl...@eecs.umich.edu wrote:
 
  This patch should fix the issue Geoff was having where tracing would
  cause a segfault. If you could please give it a try, Geoff, let me
  know
  if it actually solves the problem. This will add a small overhead for
  every memory instruction even though the old behavior is wrong only
 in
  certain circumstances. I considered trying to avoid that overhead in
  cases where the problem can't happen, but then tracing would
  change the
  behavior of the CPU and that seems like a bad idea. Assuming this
  actually fixes the bug and nobody objects I'll commit this soon.
 
  Gabe
 
  gbl...@eecs.umich.edu mailto:gbl...@eecs.umich.edu wrote:
   # HG changeset patch
   # User Gabe Black gbl...@eecs.umich.edu
  mailto:gbl...@eecs.umich.edu
   # Date 1240729422 25200
   # Node ID 284ff3c27bbddacf5763e7b5a54e99e4dcd1912a
   # Parent  8652636856b3d24fb0088fb1af5f5dca5008d9c8
   CPU: Make the CPU wait until initiateAcc finishes before calling
  completeAcc.
  
   diff --git a/src/cpu/simple/timing.cc b/src/cpu/simple/timing.cc
   --- a/src/cpu/simple/timing.cc
   +++ b/src/cpu/simple/timing.cc
   @@ -115,6 +115,8 @@
ifetch_pkt = dcache_pkt = NULL;
drainEvent = NULL;
previousTick = 0;
   +inInitiateAcc = false;
   +completionPkt = NULL;
changeState(SimObject::Running);
}
  
   @@ -758,7 +760,13 @@
if (curStaticInst 
curStaticInst-isMemRef() 
  !curStaticInst-isDataPrefetch()) {
// load or store: just send to dcache
   +inInitiateAcc = true;
Fault fault = 

Re: [m5-dev] [PATCH] CPU: Make the CPU wait until initiateAcc finishes before calling completeAcc

2009-04-26 Thread Gabe Black
You should read my description, read the code, and then once you
understand the problem comment on my solution.

Gabe

Korey Sewell wrote:
 Yea, I was on that thread. Your explanation wasnt particularly in
 great detail there (files?, line #s?) so its hard to follow without
 getting your hand dirty on where this code is you are referring to.

  I'll comment and you can clarify my assumptions...

 Oooh. I see what's broken. This is a result of my changes to
 allow delaying translation. What happens is that Stl_c goes into
 initiateAcc. That function calls write on the CPU which calls into the
 TLB which calls the translation callback which recognizes a failed store
 conditional which completes the instruction execution with completeAcc
 and cleans up. The call stack then collapses back to the initiateAcc
 which is still waiting to finish and which then tries to call a member
 function on traceData which was deleted during the cleanup. The problem
 here is not fundamentally complicated, but the mechanisms involved are.
 One solution would be to record the fact that we're still in
 initiateAcc, and if we are wait for the call stack to collapse back down
 to initiateAcc's caller before calling into completeAcc. That matches
 the semantics an instruction would expect more, I think, where the
 initiateAcc/completeAcc pair are called sequentially.

 A couple of thoughts come to mind:
 1. If the problem is with traceData and specifially just with store
 conditionals, what prevents you from a solution that handles that is a
 small overhead on that case instead of on every memory access?
 Particularly,  why not add a StoreCondInitiateAcc template in
 mem.isa that handles this special case? In there after the write to
 the CPU, you can maybe check to see if you want to mess with traceData
 stuff or not depending on if the conditional is already failed or
 whatever.

 2. If the traceData is deleted, then why is it available to be
 accessed in initiateAcc anyway? Isnt it enclosed by a if
 (traceData)?  Shouldnt that return false if that is a NULL pointer?

 3. Lastly, it sounds like this could be a slightly good candidate case
 for having the ability to functionally split TLB access from the
 timing memory access since in this case you want to do the access
 depending on something that happens with the TLB...

 On Sun, Apr 26, 2009 at 2:43 PM, Gabe Black gbl...@eecs.umich.edu
 mailto:gbl...@eecs.umich.edu wrote:

 There's an explanation already in the thread with the subject  Memory
 corruption inm5devrepositorywhenusing
 --trace-flags=ExecEnable. Stalling or sleeping doesn't make
 sense in
 this situation since it's not that sort of problem.

 Gabe

 Korey Sewell wrote:
  Can you explain how the completeAcc ever gets called before the
  initiatieAcc finishes in a SimpleCPU?
 
  I'm thinking, why isnt the SimpleCPU just stalling if there is a
 delay
  waiting for a memory access.If that's the right behavior, then isnt
  the solution to just stall the CPU (or even sleep it) and wait for
  initiateAcc to finish rather than try to work around the
  initiateAcc/completeAcc simultaneous call? I guess I dont understand
  fully the problem but it seems workable without a delay in overhead
  for all other memory accesses.
 
  Also, would this be a issue for any other CPU models (O3 or
 InOrder)?
 
  On Sun, Apr 26, 2009 at 3:22 AM, Gabe Black
 gbl...@eecs.umich.edu mailto:gbl...@eecs.umich.edu
  mailto:gbl...@eecs.umich.edu mailto:gbl...@eecs.umich.edu
 wrote:
 
  This patch should fix the issue Geoff was having where
 tracing would
  cause a segfault. If you could please give it a try, Geoff,
 let me
  know
  if it actually solves the problem. This will add a small
 overhead for
  every memory instruction even though the old behavior is
 wrong only in
  certain circumstances. I considered trying to avoid that
 overhead in
  cases where the problem can't happen, but then tracing would
  change the
  behavior of the CPU and that seems like a bad idea. Assuming
 this
  actually fixes the bug and nobody objects I'll commit this soon.
 
  Gabe
 
  gbl...@eecs.umich.edu mailto:gbl...@eecs.umich.edu
 mailto:gbl...@eecs.umich.edu mailto:gbl...@eecs.umich.edu wrote:
   # HG changeset patch
   # User Gabe Black gbl...@eecs.umich.edu
 mailto:gbl...@eecs.umich.edu
  mailto:gbl...@eecs.umich.edu mailto:gbl...@eecs.umich.edu
   # Date 1240729422 25200
   # Node ID 284ff3c27bbddacf5763e7b5a54e99e4dcd1912a
   # Parent  8652636856b3d24fb0088fb1af5f5dca5008d9c8
   CPU: Make the CPU wait until initiateAcc finishes before
 calling
  completeAcc.
  
   diff --git a/src/cpu/simple/timing.cc
 

Re: [m5-dev] [PATCH] CPU: Make the CPU wait until initiateAcc finishes before calling completeAcc

2009-04-26 Thread Gabe Black
I should add, I realize it's not very clear what's going on, but it
would take a lot more effort to explain everything in enough detail to
get the problem across than for you (or anyone else) to look at the
code. If you do look into the code, it's not a functional problem as far
as how the timing CPU works or organizes events per se. It's a C++
problem where things may or may not end up being called from the middle
of each other.

Gabe

Gabe Black wrote:
 You should read my description, read the code, and then once you
 understand the problem comment on my solution.

 Gabe

 Korey Sewell wrote:
   
 Yea, I was on that thread. Your explanation wasnt particularly in
 great detail there (files?, line #s?) so its hard to follow without
 getting your hand dirty on where this code is you are referring to.

  I'll comment and you can clarify my assumptions...

 Oooh. I see what's broken. This is a result of my changes to
 allow delaying translation. What happens is that Stl_c goes into
 initiateAcc. That function calls write on the CPU which calls into the
 TLB which calls the translation callback which recognizes a failed store
 conditional which completes the instruction execution with completeAcc
 and cleans up. The call stack then collapses back to the initiateAcc
 which is still waiting to finish and which then tries to call a member
 function on traceData which was deleted during the cleanup. The problem
 here is not fundamentally complicated, but the mechanisms involved are.
 One solution would be to record the fact that we're still in
 initiateAcc, and if we are wait for the call stack to collapse back down
 to initiateAcc's caller before calling into completeAcc. That matches
 the semantics an instruction would expect more, I think, where the
 initiateAcc/completeAcc pair are called sequentially.

 A couple of thoughts come to mind:
 1. If the problem is with traceData and specifially just with store
 conditionals, what prevents you from a solution that handles that is a
 small overhead on that case instead of on every memory access?
 Particularly,  why not add a StoreCondInitiateAcc template in
 mem.isa that handles this special case? In there after the write to
 the CPU, you can maybe check to see if you want to mess with traceData
 stuff or not depending on if the conditional is already failed or
 whatever.

 2. If the traceData is deleted, then why is it available to be
 accessed in initiateAcc anyway? Isnt it enclosed by a if
 (traceData)?  Shouldnt that return false if that is a NULL pointer?

 3. Lastly, it sounds like this could be a slightly good candidate case
 for having the ability to functionally split TLB access from the
 timing memory access since in this case you want to do the access
 depending on something that happens with the TLB...

 On Sun, Apr 26, 2009 at 2:43 PM, Gabe Black gbl...@eecs.umich.edu
 mailto:gbl...@eecs.umich.edu wrote:

 There's an explanation already in the thread with the subject  Memory
 corruption inm5devrepositorywhenusing
 --trace-flags=ExecEnable. Stalling or sleeping doesn't make
 sense in
 this situation since it's not that sort of problem.

 Gabe

 Korey Sewell wrote:
  Can you explain how the completeAcc ever gets called before the
  initiatieAcc finishes in a SimpleCPU?
 
  I'm thinking, why isnt the SimpleCPU just stalling if there is a
 delay
  waiting for a memory access.If that's the right behavior, then isnt
  the solution to just stall the CPU (or even sleep it) and wait for
  initiateAcc to finish rather than try to work around the
  initiateAcc/completeAcc simultaneous call? I guess I dont understand
  fully the problem but it seems workable without a delay in overhead
  for all other memory accesses.
 
  Also, would this be a issue for any other CPU models (O3 or
 InOrder)?
 
  On Sun, Apr 26, 2009 at 3:22 AM, Gabe Black
 gbl...@eecs.umich.edu mailto:gbl...@eecs.umich.edu
  mailto:gbl...@eecs.umich.edu mailto:gbl...@eecs.umich.edu
 wrote:
 
  This patch should fix the issue Geoff was having where
 tracing would
  cause a segfault. If you could please give it a try, Geoff,
 let me
  know
  if it actually solves the problem. This will add a small
 overhead for
  every memory instruction even though the old behavior is
 wrong only in
  certain circumstances. I considered trying to avoid that
 overhead in
  cases where the problem can't happen, but then tracing would
  change the
  behavior of the CPU and that seems like a bad idea. Assuming
 this
  actually fixes the bug and nobody objects I'll commit this soon.
 
  Gabe
 
  gbl...@eecs.umich.edu mailto:gbl...@eecs.umich.edu
 mailto:gbl...@eecs.umich.edu mailto:gbl...@eecs.umich.edu wrote:
   # HG changeset patch
   #