Re: [m5-dev] [PATCH] CPU: Make the CPU wait until initiateAcc finishes before calling completeAcc

2009-04-28 Thread Steve Reinhardt
OK, thinking about this a bit more, it seems like the root of the problem is
that the timing translation is calling the callback right away instead of
deferring it to a later cycle.  I assume this only happens on a TLB miss for
a store conditional?  Is there any other path by which the completeAcc
callback can be invoked before the initiateAcc completes?  I agree that this
seems like a somewhat heavyweight solution for this rare case.

If this is the only scenario, what if we create a new event in the TLB to
call the completeAcc callback after delaying for the TLB access latency?
Presumably there's already an event that does this in the more common cases,
so maybe that can be tweaked to handle this case too.  I haven't actually
looked at the TLB code but thought I'd throw the idea out anyway.

Steve
___
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev


[m5-dev] Split Translation

2009-04-28 Thread nathan binkert
Hi Gabe,

[ I wrote this e-mail almost a month and a half ago and I just noticed
that it never got sent.  I think it is still relevant]

I've started looking at your patch and in particular some of the
translation code in the TimingSimpleCPU.  I have a couple of comments.

1) You call dcachePort.peerBlockSize() every time you enter this
function.  That call is probably actually something that you should
cache to avoid the extra virtual function call.
2) You split the translation if the address crosses blocks.  It seems
that it'd be better to check to see if the access crosses pages, not
blocks to save the extra work.

  Nate
___
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev


Re: [m5-dev] [PATCH] CPU: Make the CPU wait until initiateAcc finishes before calling completeAcc

2009-04-28 Thread Gabriel Michael Black
This bug happens when both the translation component and the memory  
access component finish inline before returning. When that happens,  
completeAcc is effectively called from initiateAcc and you get into  
trouble.

The translation component will complete inline in all cases except TLB  
misses in X86. It will complete inline in all cases with any other  
ISA. Since this bug came up on Alpha, that part of things is  
unavoidable right now.

The second part comes in when the memory access component essentially  
doesn't happen. I'm a little fuzzy on the details, but if a store  
conditional fails, the CPU can skip talking to the memory access and  
go directly to completeDataAccess.

The overhead my change would add is two writes to a bool and two ifs,  
and I would have to imagine that's a lot more lightweight than  
creating, initializing, and scheduling a new event for all types of  
translation. My change also applies only once to each memory  
instruction where some unaligned accesses may be translated in two  
chunks.

We could probably streamline things a little and only check if we're  
in initiateAcc on code paths that skip going to memory.

Gabe

Quoting Steve Reinhardt ste...@gmail.com:

 OK, thinking about this a bit more, it seems like the root of the problem is
 that the timing translation is calling the callback right away instead of
 deferring it to a later cycle.  I assume this only happens on a TLB miss for
 a store conditional?  Is there any other path by which the completeAcc
 callback can be invoked before the initiateAcc completes?  I agree that this
 seems like a somewhat heavyweight solution for this rare case.

 If this is the only scenario, what if we create a new event in the TLB to
 call the completeAcc callback after delaying for the TLB access latency?
 Presumably there's already an event that does this in the more common cases,
 so maybe that can be tweaked to handle this case too.  I haven't actually
 looked at the TLB code but thought I'd throw the idea out anyway.

 Steve



___
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev


Re: [m5-dev] Split Translation

2009-04-28 Thread Gabriel Michael Black
Quoting nathan binkert n...@binkert.org:

 Hi Gabe,

 [ I wrote this e-mail almost a month and a half ago and I just noticed
 that it never got sent.  I think it is still relevant]

 I've started looking at your patch and in particular some of the
 translation code in the TimingSimpleCPU.  I have a couple of comments.

 1) You call dcachePort.peerBlockSize() every time you enter this
 function.  That call is probably actually something that you should
 cache to avoid the extra virtual function call.

Would that be something to cache when the CPU is constructed?

 2) You split the translation if the address crosses blocks.  It seems
 that it'd be better to check to see if the access crosses pages, not
 blocks to save the extra work.

I was told I should use the peer block size because otherwise the peer  
would get upset. I have no problem with adjusting it to whatever size  
works.

Gabe

___
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev


Re: [m5-dev] Split Translation

2009-04-28 Thread nathan binkert
 [ I wrote this e-mail almost a month and a half ago and I just noticed
 that it never got sent.  I think it is still relevant]

 I've started looking at your patch and in particular some of the
 translation code in the TimingSimpleCPU.  I have a couple of comments.

 1) You call dcachePort.peerBlockSize() every time you enter this
 function.  That call is probably actually something that you should
 cache to avoid the extra virtual function call.

 Would that be something to cache when the CPU is constructed?
I think in the constructor is too early, but all ports should be
connected by the time init() or startup() happen.

 2) You split the translation if the address crosses blocks.  It seems
 that it'd be better to check to see if the access crosses pages, not
 blocks to save the extra work.

 I was told I should use the peer block size because otherwise the peer
 would get upset. I have no problem with adjusting it to whatever size
 works.
Well, I guess it depends if you treat split translations separately
from split caches.  I don't know the code well enough, but you could
have one translation that required the cache accesses be split.  I
don't know how the translation code fits into the flow well enough to
make a determination on this.

  Nate
___
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev


Re: [m5-dev] [PATCH] CPU: Make the CPU wait until initiateAcc finishes before calling completeAcc

2009-04-28 Thread nathan binkert
 This bug happens when both the translation component and the memory
 access component finish inline before returning. When that happens,
 completeAcc is effectively called from initiateAcc and you get into
 trouble.

 The translation component will complete inline in all cases except TLB
 misses in X86. It will complete inline in all cases with any other
 ISA. Since this bug came up on Alpha, that part of things is
 unavoidable right now.

 The second part comes in when the memory access component essentially
 doesn't happen. I'm a little fuzzy on the details, but if a store
 conditional fails, the CPU can skip talking to the memory access and
 go directly to completeDataAccess.

 The overhead my change would add is two writes to a bool and two ifs,
 and I would have to imagine that's a lot more lightweight than
 creating, initializing, and scheduling a new event for all types of
 translation. My change also applies only once to each memory
 instruction where some unaligned accesses may be translated in two
 chunks.

 We could probably streamline things a little and only check if we're
 in initiateAcc on code paths that skip going to memory.

What if you use my change instead?  (Get rid of the traceData function
parameter and use xc-traceData)

Incidentally, there's a bunch of if (traceData) ... in the generated
code and it's not wrapped with #if TRACING_ON  Would be nice if that
could be changed as part of this too.

  Nate
___
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev


Re: [m5-dev] [PATCH] CPU: Make the CPU wait until initiateAcc finishes before calling completeAcc

2009-04-28 Thread Gabriel Michael Black
Quoting nathan binkert n...@binkert.org:

 I think the difference is that passing it in explicitly means that  
 you could
 keep it in a register for the duration of the function, and if you can put
 it in a callee-save then maybe you could save a few dereferences.  In
 contrast the indirection would have to be repeated after every  
 function call
 (which is exactly the behavior we'd be relying on here).

 Wouldn't we need to make it xc-traceData() to allow for all the
 redirection we've got in there?

 What redirection?  Or do you mean because traceData is protected in
 the simple CPU?  You could, but then again, you could just make it
 public.  Otherwise, I'm not sure what you're talking about.  The
 execute function is generated for each cpu model separately and it
 knows the type.

   Nate
 ___
 m5-dev mailing list
 m5-dev@m5sim.org
 http://m5sim.org/mailman/listinfo/m5-dev


I'm talking about how there's the exec context, and then the proxy  
exec context, and the thread context, and the simple thread context,  
and the dyninst, and the etc. etc., and calls to, for instance, read a  
register can travel through several layers before actually being  
serviced. By making traceData a direct member of the very first exec  
context interface, we're precluding passing it back to other layers  
that may want some input. That's not to say I don't think it will  
work, because I actually think it would, but it seems inconsistent  
with how the rest of that interface goes together.

Also this still doesn't address the fact that, one, completeAcc is  
being called before initiateAcc has fully run, and two, that the trace  
information is being printed before initiateAcc has written all its  
results.

Gabe
___
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev


Re: [m5-dev] [PATCH] CPU: Make the CPU wait until initiateAcc finishes before calling completeAcc

2009-04-28 Thread nathan binkert
 What redirection?  Or do you mean because traceData is protected in
 the simple CPU?  You could, but then again, you could just make it
 public.  Otherwise, I'm not sure what you're talking about.  The
 execute function is generated for each cpu model separately and it
 knows the type.

 I'm talking about how there's the exec context, and then the proxy
 exec context, and the thread context, and the simple thread context,
 and the dyninst, and the etc. etc., and calls to, for instance, read a
 register can travel through several layers before actually being
 serviced. By making traceData a direct member of the very first exec
 context interface, we're precluding passing it back to other layers
 that may want some input. That's not to say I don't think it will
 work, because I actually think it would, but it seems inconsistent
 with how the rest of that interface goes together.
I'm specifically talking about the parameter passed into the exec
function which is either SimpleCPU or BaseDynInst.  Those both have a
traceData parameter and it can just be used directly without any fancy
indirection.  I am only talking about this code specifically.

 Also this still doesn't address the fact that, one, completeAcc is
 being called before initiateAcc has fully run, and two, that the trace
 information is being printed before initiateAcc has written all its
 results.
Can't we just ensure that foo-traceData is set to NULL after it is
deleted?  Then the setData that's done in initiateAcc after the write
won't complete.

Also, looking at the code more closely, is there some reason taht we
can't just move the setData line before the write?  Wouldn't that
solve the problem?  It seems to me that the program wouldn't change
since the contents of Mem should not change due to xc-write?
(correct?)

Sorry if I'm just being unhelpful.  I don't know this code super well.

  Nate
___
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev


Re: [m5-dev] [PATCH] CPU: Make the CPU wait until initiateAcc finishes before calling completeAcc

2009-04-28 Thread Korey Sewell
So my original point (that obviously wasnt conveyed very well) was
more trying to re-work how this STL_C case works rather than kind of
patch it up so it works in the current framework.

Seems to be me that completeAcc should never get called before
initiateAcc in any circumstance. I know that shortcut is because you
STL_C fails early, but even so, why cant it just finish like any other
instruction. What benefit are we gettiing from early completing it
(especially in a SimpleCPU)? Am I getting this fundamental issue? If
I'm not getting that wrong, then we can just fix the SimpleCPU to not
handle this special case in this fashion and all will be fine.

So trying to play around with traceData in my opinion is the wrong way
to solve it. If you are going to do that, Nate's idea of writing the
traceData before the access sounds like it will work.But Gabe's way
works too, so whatever works there is fine with me.

Long term, I'm just thinking that if I call initiateAcc and there is a
risk of completeAcc happening before it finishes seems like it breaks
the interface that the CPU/INST should use. And then in the future,
this is going to be a nasty bug that will trip somebody up (because
completeAcc might do something that screws things up early).

On Tue, Apr 28, 2009 at 5:34 PM, nathan binkert n...@binkert.org wrote:
 What redirection?  Or do you mean because traceData is protected in
 the simple CPU?  You could, but then again, you could just make it
 public.  Otherwise, I'm not sure what you're talking about.  The
 execute function is generated for each cpu model separately and it
 knows the type.

 I'm talking about how there's the exec context, and then the proxy
 exec context, and the thread context, and the simple thread context,
 and the dyninst, and the etc. etc., and calls to, for instance, read a
 register can travel through several layers before actually being
 serviced. By making traceData a direct member of the very first exec
 context interface, we're precluding passing it back to other layers
 that may want some input. That's not to say I don't think it will
 work, because I actually think it would, but it seems inconsistent
 with how the rest of that interface goes together.
 I'm specifically talking about the parameter passed into the exec
 function which is either SimpleCPU or BaseDynInst.  Those both have a
 traceData parameter and it can just be used directly without any fancy
 indirection.  I am only talking about this code specifically.

 Also this still doesn't address the fact that, one, completeAcc is
 being called before initiateAcc has fully run, and two, that the trace
 information is being printed before initiateAcc has written all its
 results.
 Can't we just ensure that foo-traceData is set to NULL after it is
 deleted?  Then the setData that's done in initiateAcc after the write
 won't complete.

 Also, looking at the code more closely, is there some reason taht we
 can't just move the setData line before the write?  Wouldn't that
 solve the problem?  It seems to me that the program wouldn't change
 since the contents of Mem should not change due to xc-write?
 (correct?)

 Sorry if I'm just being unhelpful.  I don't know this code super well.

  Nate
 ___
 m5-dev mailing list
 m5-dev@m5sim.org
 http://m5sim.org/mailman/listinfo/m5-dev




-- 
===
Korey L Sewell
Computer Science  Engineering
University of Michigan
___
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev


Re: [m5-dev] [PATCH] CPU: Make the CPU wait until initiateAcc finishes before calling completeAcc

2009-04-28 Thread Korey Sewell
 As far as moving around the code, that would be easier said than done.
 That's all generated and I don't know how easy it would be to get the
 isa_parser to generate something structured differently that would
 still make sense in all the other instructions of all the other ISAs.
 Steve?

I was mentioning something like this earlier, but in
arch/alpha/mem.isa the initiateAcc code is generated from the
LoadStore::initiateAcc template. If you just give Store conditionals
its own template for initiateAcc (like it already has for
completeAcc), I think that will be a fix too (assuming you set the
tracedata early or handle it some other way)...

-- 
===
Korey L Sewell
Computer Science  Engineering
University of Michigan
___
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev


Re: [m5-dev] [PATCH] CPU: Make the CPU wait until initiateAcc finishes before calling completeAcc

2009-04-28 Thread Gabriel Michael Black
Quoting Korey Sewell ksew...@umich.edu:

 So my original point (that obviously wasnt conveyed very well) was
 more trying to re-work how this STL_C case works rather than kind of
 patch it up so it works in the current framework.

 Seems to be me that completeAcc should never get called before
 initiateAcc in any circumstance. I know that shortcut is because you
 STL_C fails early, but even so, why cant it just finish like any other
 instruction. What benefit are we gettiing from early completing it
 (especially in a SimpleCPU)? Am I getting this fundamental issue? If
 I'm not getting that wrong, then we can just fix the SimpleCPU to not
 handle this special case in this fashion and all will be fine.

 So trying to play around with traceData in my opinion is the wrong way
 to solve it. If you are going to do that, Nate's idea of writing the
 traceData before the access sounds like it will work.But Gabe's way
 works too, so whatever works there is fine with me.

 Long term, I'm just thinking that if I call initiateAcc and there is a
 risk of completeAcc happening before it finishes seems like it breaks
 the interface that the CPU/INST should use. And then in the future,
 this is going to be a nasty bug that will trip somebody up (because
 completeAcc might do something that screws things up early).

I agree with everything you've said here, except for one half thing  
and I've got a comment on one other. First, my change is actually  
doing what your saying. It's not mucking around with traceData, it's  
actually putting off calling completeDataAccess (which calls  
completeAcc) if we're in the middle of initiateAcc until we come out  
of it. That's the half.

The other thing is the reason STC doesn't work like the other memory  
instructions. In those other cases where you're calling into memory,  
you're going to get a response some time later. When that happens,  
initiateAcc has since finished running and you can end up in  
completeAcc and be ok. When you don't have a delay in translation or  
when you go to memory, then there's no later point where you'll be  
doing anything for that instruction and you need to deal with it now.

One completely different solution would be to do what you're saying  
and just not have the timing simple CPU work like it does. I think  
it's organized a lot like the atomic simple CPU because it used to be  
a part of it, but really a more natural organization would be as a  
state machine where activity was broken by hard lines into states.  
There might be a little more overhead that way, but it would make it  
much easier for non-experts to see what the heck is going on in there.  
It would also help the experts avoid introducing bugs because there  
are so many interactions to consider.

Gabe


 On Tue, Apr 28, 2009 at 5:34 PM, nathan binkert n...@binkert.org wrote:
 What redirection?  Or do you mean because traceData is protected in
 the simple CPU?  You could, but then again, you could just make it
 public.  Otherwise, I'm not sure what you're talking about.  The
 execute function is generated for each cpu model separately and it
 knows the type.

 I'm talking about how there's the exec context, and then the proxy
 exec context, and the thread context, and the simple thread context,
 and the dyninst, and the etc. etc., and calls to, for instance, read a
 register can travel through several layers before actually being
 serviced. By making traceData a direct member of the very first exec
 context interface, we're precluding passing it back to other layers
 that may want some input. That's not to say I don't think it will
 work, because I actually think it would, but it seems inconsistent
 with how the rest of that interface goes together.
 I'm specifically talking about the parameter passed into the exec
 function which is either SimpleCPU or BaseDynInst.  Those both have a
 traceData parameter and it can just be used directly without any fancy
 indirection.  I am only talking about this code specifically.

 Also this still doesn't address the fact that, one, completeAcc is
 being called before initiateAcc has fully run, and two, that the trace
 information is being printed before initiateAcc has written all its
 results.
 Can't we just ensure that foo-traceData is set to NULL after it is
 deleted?  Then the setData that's done in initiateAcc after the write
 won't complete.

 Also, looking at the code more closely, is there some reason taht we
 can't just move the setData line before the write?  Wouldn't that
 solve the problem?  It seems to me that the program wouldn't change
 since the contents of Mem should not change due to xc-write?
 (correct?)

 Sorry if I'm just being unhelpful.  I don't know this code super well.

  Nate
 ___
 m5-dev mailing list
 m5-dev@m5sim.org
 http://m5sim.org/mailman/listinfo/m5-dev




 --
 ===
 Korey L Sewell
 Computer Science  Engineering
 University of Michigan
 

Re: [m5-dev] [PATCH] CPU: Make the CPU wait until initiateAcc finishes before calling completeAcc

2009-04-28 Thread nathan binkert
 I'm specifically talking about the parameter passed into the exec
 function which is either SimpleCPU or BaseDynInst.  Those both have a
 traceData parameter and it can just be used directly without any fancy
 indirection.  I am only talking about this code specifically.

 I'm confused now. If we're passing it in, then it's a local variable
 on the stack and we can't set it to NULL externally.
Right, I don't like this.

 If it's a local
 member of the exec context, it's not transparently modifiable by
 deeper layers of the interface.
What does this mean?  Instead of doing traceData, you do
xc-traceData.  Then someone could set destroy xc-traceData and set
it to NULL and we'd be OK.

 If it's a function call, we're paying
 a bunch of function call overhead. Which of these do (do not) apply?
This does not apply.

 Also, looking at the code more closely, is there some reason taht we
 can't just move the setData line before the write?  Wouldn't that
 solve the problem?  It seems to me that the program wouldn't change
 since the contents of Mem should not change due to xc-write?
 (correct?)

 That would ideally fix the segfault, aka the symptom, but it doesn't
 fix the core issue. The second half of initiateAcc won't be able to
 participate in the trace since traceData is dumped right before being
 deleted. That portion will also actually run -after- completeAcc, but
 only sometimes. That sounds like a source for a lot of confusion and
 bugs (like this one) to me.


 As far as moving around the code, that would be easier said than done.
 That's all generated and I don't know how easy it would be to get the
 isa_parser to generate something structured differently that would
 still make sense in all the other instructions of all the other ISAs.
 Steve?
Ok, whatever the correct solution, I think we should minimally do these things:
1) Remove the parameter,
2) change traceData to xc-traceData
3) and surround uses of traceData with #if TRACING_ON


  Nate
___
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev


Re: [m5-dev] [PATCH] CPU: Make the CPU wait until initiateAcc finishes before calling completeAcc

2009-04-28 Thread Gabriel Michael Black
Quoting nathan binkert n...@binkert.org:

 I'm specifically talking about the parameter passed into the exec
 function which is either SimpleCPU or BaseDynInst.  Those both have a
 traceData parameter and it can just be used directly without any fancy
 indirection.  I am only talking about this code specifically.

 I'm confused now. If we're passing it in, then it's a local variable
 on the stack and we can't set it to NULL externally.
 Right, I don't like this.

 If it's a local
 member of the exec context, it's not transparently modifiable by
 deeper layers of the interface.
 What does this mean?  Instead of doing traceData, you do
 xc-traceData.  Then someone could set destroy xc-traceData and set
 it to NULL and we'd be OK.

A parallel example would be getting a pointer to the TLBs. In that  
case, you don't just have a TLB pointer member on the exec context,  
you have a function called something like getDTBPtr() that percolates  
to the right place that actually has the current value. That's just  
how this stuff has been put together up to now, I assume for a reason,  
and we'd be diverging from that. I don't know of any calamitous  
problem that would introduce, but we'd be partially undermining the  
flexibility we've built in for other values.


 3) and surround uses of traceData with #if TRACING_ON

I meant to mention it before, but I think this bit is very doable and  
makes a lot of sense.

Gabe
___
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev


Re: [m5-dev] [PATCH] CPU: Make the CPU wait until initiateAcc finishes before calling completeAcc

2009-04-28 Thread nathan binkert
 One completely different solution would be to do what you're saying
 and just not have the timing simple CPU work like it does. I think
 it's organized a lot like the atomic simple CPU because it used to be
 a part of it, but really a more natural organization would be as a
 state machine where activity was broken by hard lines into states.
 There might be a little more overhead that way, but it would make it
 much easier for non-experts to see what the heck is going on in there.
 It would also help the experts avoid introducing bugs because there
 are so many interactions to consider.

This seems pretty reasonable to me.  There are already states in the
models to deal with cache misses and such, so adding another state for
the different phases of a memory access seems reasonable.  The impact
probably wouldn't be huge given the fact that the cache is probably
pretty expensive compared to the CPU when executing memory ops.

  Nate
___
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev


Re: [m5-dev] [PATCH] CPU: Make the CPU wait until initiateAcc finishes before calling completeAcc

2009-04-28 Thread nathan binkert
 So a tentative plan would be:
 1. Commit my patch for now so the bug is fixed.
 2. Add the #if TRACING_ON.
 3.5 Hash out what to do with traceData
 3.5 Reorganize the timing simple CPU.

 Does anyone object?

I guess I'd rather not do #1 if it isn't the solution because things
like that have a tendency to live on.

  Nate
___
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev


Re: [m5-dev] [PATCH] CPU: Make the CPU wait until initiateAcc finishes before calling completeAcc

2009-04-28 Thread Steve Reinhardt
Whew, I don't check my email for one afternoon and you guys make me pay for
it...

I'm still curious about the possibility of using an event to call the
completeAcc() callback in the case where the translation fails on a store
conditional.  This solution seems particularly attractive to me if it's just
a case of making this particular scenario follow the existing x86-tlb-miss
path rather than the default path, in which case there shouldn't be much new
code at all.  Maybe it's a little more complicated than that if we have to
import some of the x86-tlb-miss code into alpha, but it still doesn't seem
that bad.  I think Gabe misunderstood me; I'm not talking about doing this
for all TLB hit responses, just this one specific case.  There may well be a
subtlety that makes this not as clean as I envision it, but to me it solves
the root cause of the problem, which is that completeAcc() is getting called
before initiateAcc() terminates, and does it in a way that doesn't clutter
the logic in SimpleTimingCPU the way Gabe's current fix does.

Adding in the #if TRACING_ON lines is a no-brainer but is really independent
of this bug.

I think moving the traceData variable into the exec context is not a bad
idea (though not strongly compelling to me either), and does happen to have
the side effect of addressing this particular bug.  However I agree with
Gabe that it's really a very limited solution and doesn't address the
conceptual issue that caused this bug, which leaves us open to future
similar bugs occurring.  Maybe that will never happen, but it makes this a
less compelling solution to the bug itself.  But if we're going to do it
anyway, it could work as a short-term fix for the bug.

The SimpleTimingCPU is getting complex, but it's not obvious to me how
adding more explicit states would clean it up significantly, particularly if
we avoid adding Gabe's current patch (which does muddle up the control flow
a bit more than it already is).  I'm open to learning more about what people
envision here though.

Steve

On Tue, Apr 28, 2009 at 4:05 PM, nathan binkert n...@binkert.org wrote:

  So a tentative plan would be:
  1. Commit my patch for now so the bug is fixed.
  2. Add the #if TRACING_ON.
  3.5 Hash out what to do with traceData
  3.5 Reorganize the timing simple CPU.
 
  Does anyone object?

 I guess I'd rather not do #1 if it isn't the solution because things
 like that have a tendency to live on.

  Nate
 ___
 m5-dev mailing list
 m5-dev@m5sim.org
 http://m5sim.org/mailman/listinfo/m5-dev

___
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev


Re: [m5-dev] Split Translation

2009-04-28 Thread Steve Reinhardt
On Tue, Apr 28, 2009 at 1:19 PM, nathan binkert n...@binkert.org wrote:

  2) You split the translation if the address crosses blocks.  It seems
  that it'd be better to check to see if the access crosses pages, not
  blocks to save the extra work.
 
  I was told I should use the peer block size because otherwise the peer
  would get upset. I have no problem with adjusting it to whatever size
  works.
 Well, I guess it depends if you treat split translations separately
 from split caches.  I don't know the code well enough, but you could
 have one translation that required the cache accesses be split.  I
 don't know how the translation code fits into the flow well enough to
 make a determination on this.


I don't see much benefit in doing the split twice (once for page crossings
and a second time for cache line crossings), though if there was a way to
isolate the bulk of the code in a single function that gets called twice it
wouldn't be too bad.  There is some merit in only calling the TLB once on a
single-page cache-line-split transaction though, particularly because
calling it twice in this case will make your TLB access and hit rates
artificially high.  Doing two separate split checks might be the easiest way
of doing this, or there could be a separate test on cache splits that copies
the translation from one request to the other instead of calling the TLB a
second time if the two accesses are on the same page.

Steve
___
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev


Re: [m5-dev] Split Translation

2009-04-28 Thread nathan binkert
 I don't see much benefit in doing the split twice (once for page crossings
 and a second time for cache line crossings), though if there was a way to
 isolate the bulk of the code in a single function that gets called twice it
 wouldn't be too bad.  There is some merit in only calling the TLB once on a
 single-page cache-line-split transaction though, particularly because
 calling it twice in this case will make your TLB access and hit rates
 artificially high.  Doing two separate split checks might be the easiest way
 of doing this, or there could be a separate test on cache splits that copies
 the translation from one request to the other instead of calling the TLB a
 second time if the two accesses are on the same page.

I was mostly referring to the realism.  If unaligned accesses are
indeed very fast on modern x86, then people might intentionally start
to use them even more.

  Nate
___
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev


Re: [m5-dev] [PATCH] CPU: Make the CPU wait until initiateAcc finishes before calling completeAcc

2009-04-28 Thread Gabe Black
Steve Reinhardt wrote:
 Whew, I don't check my email for one afternoon and you guys make me
 pay for it...

 I'm still curious about the possibility of using an event to call the
 completeAcc() callback in the case where the translation fails on a
 store conditional.  This solution seems particularly attractive to me
 if it's just a case of making this particular scenario follow the
 existing x86-tlb-miss path rather than the default path, in which case
 there shouldn't be much new code at all.  Maybe it's a little more
 complicated than that if we have to import some of the x86-tlb-miss
 code into alpha, but it still doesn't seem that bad.  I think Gabe
 misunderstood me; I'm not talking about doing this for all TLB hit
 responses, just this one specific case.  There may well be a subtlety
 that makes this not as clean as I envision it, but to me it solves the
 root cause of the problem, which is that completeAcc() is getting
 called before initiateAcc() terminates, and does it in a way that
 doesn't clutter the logic in SimpleTimingCPU the way Gabe's current
 fix does.

 Adding in the #if TRACING_ON lines is a no-brainer but is really
 independent of this bug.

 I think moving the traceData variable into the exec context is not a
 bad idea (though not strongly compelling to me either), and does
 happen to have the side effect of addressing this particular bug. 
 However I agree with Gabe that it's really a very limited solution and
 doesn't address the conceptual issue that caused this bug, which
 leaves us open to future similar bugs occurring.  Maybe that will
 never happen, but it makes this a less compelling solution to the bug
 itself.  But if we're going to do it anyway, it could work as a
 short-term fix for the bug.

 The SimpleTimingCPU is getting complex, but it's not obvious to me how
 adding more explicit states would clean it up significantly,
 particularly if we avoid adding Gabe's current patch (which does
 muddle up the control flow a bit more than it already is).  I'm open
 to learning more about what people envision here though.

I'm not sure if it will be hugely better. I'd have to just do it and see
how it comes out, I suppose. Wouldn't scheduling an event be similar to
what I have, just using the event queue instead of the new packet
pointer? I could imagine it being better, actually, so I'm tentatively
convinced. There shouldn't be a need to move any x86 code into Alpha to
make that work, but I wouldn't swear to that.

Gabe
___
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev


Re: [m5-dev] [PATCH] CPU: Make the CPU wait until initiateAcc finishes before calling completeAcc

2009-04-28 Thread Steve Reinhardt
On Tue, Apr 28, 2009 at 4:43 PM, Gabe Black gbl...@eecs.umich.edu wrote:

 I'm not sure if it will be hugely better. I'd have to just do it and see
 how it comes out, I suppose. Wouldn't scheduling an event be similar to
 what I have, just using the event queue instead of the new packet
 pointer?


Yea, at a certain level, the effect is similar--just deferring the callback
until later--which is not bad, since that's what addresses the problem.  I
think that using the existing event queue mechanism (which is explicitly
defined for deferring stuff until later) is less convoluted than adding a
handful of new variables and control code to cover just this one specific
situation.



 I could imagine it being better, actually, so I'm tentatively
 convinced. There shouldn't be a need to move any x86 code into Alpha to
 make that work, but I wouldn't swear to that.


Thanks for giving it a try.

Steve
___
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev