Re: [m5-dev] [PATCH] CPU: Make the CPU wait until initiateAcc finishes before calling completeAcc
OK, thinking about this a bit more, it seems like the root of the problem is that the timing translation is calling the callback right away instead of deferring it to a later cycle. I assume this only happens on a TLB miss for a store conditional? Is there any other path by which the completeAcc callback can be invoked before the initiateAcc completes? I agree that this seems like a somewhat heavyweight solution for this rare case. If this is the only scenario, what if we create a new event in the TLB to call the completeAcc callback after delaying for the TLB access latency? Presumably there's already an event that does this in the more common cases, so maybe that can be tweaked to handle this case too. I haven't actually looked at the TLB code but thought I'd throw the idea out anyway. Steve ___ m5-dev mailing list m5-dev@m5sim.org http://m5sim.org/mailman/listinfo/m5-dev
[m5-dev] Split Translation
Hi Gabe, [ I wrote this e-mail almost a month and a half ago and I just noticed that it never got sent. I think it is still relevant] I've started looking at your patch and in particular some of the translation code in the TimingSimpleCPU. I have a couple of comments. 1) You call dcachePort.peerBlockSize() every time you enter this function. That call is probably actually something that you should cache to avoid the extra virtual function call. 2) You split the translation if the address crosses blocks. It seems that it'd be better to check to see if the access crosses pages, not blocks to save the extra work. Nate ___ m5-dev mailing list m5-dev@m5sim.org http://m5sim.org/mailman/listinfo/m5-dev
Re: [m5-dev] [PATCH] CPU: Make the CPU wait until initiateAcc finishes before calling completeAcc
This bug happens when both the translation component and the memory access component finish inline before returning. When that happens, completeAcc is effectively called from initiateAcc and you get into trouble. The translation component will complete inline in all cases except TLB misses in X86. It will complete inline in all cases with any other ISA. Since this bug came up on Alpha, that part of things is unavoidable right now. The second part comes in when the memory access component essentially doesn't happen. I'm a little fuzzy on the details, but if a store conditional fails, the CPU can skip talking to the memory access and go directly to completeDataAccess. The overhead my change would add is two writes to a bool and two ifs, and I would have to imagine that's a lot more lightweight than creating, initializing, and scheduling a new event for all types of translation. My change also applies only once to each memory instruction where some unaligned accesses may be translated in two chunks. We could probably streamline things a little and only check if we're in initiateAcc on code paths that skip going to memory. Gabe Quoting Steve Reinhardt ste...@gmail.com: OK, thinking about this a bit more, it seems like the root of the problem is that the timing translation is calling the callback right away instead of deferring it to a later cycle. I assume this only happens on a TLB miss for a store conditional? Is there any other path by which the completeAcc callback can be invoked before the initiateAcc completes? I agree that this seems like a somewhat heavyweight solution for this rare case. If this is the only scenario, what if we create a new event in the TLB to call the completeAcc callback after delaying for the TLB access latency? Presumably there's already an event that does this in the more common cases, so maybe that can be tweaked to handle this case too. I haven't actually looked at the TLB code but thought I'd throw the idea out anyway. Steve ___ m5-dev mailing list m5-dev@m5sim.org http://m5sim.org/mailman/listinfo/m5-dev
Re: [m5-dev] Split Translation
Quoting nathan binkert n...@binkert.org: Hi Gabe, [ I wrote this e-mail almost a month and a half ago and I just noticed that it never got sent. I think it is still relevant] I've started looking at your patch and in particular some of the translation code in the TimingSimpleCPU. I have a couple of comments. 1) You call dcachePort.peerBlockSize() every time you enter this function. That call is probably actually something that you should cache to avoid the extra virtual function call. Would that be something to cache when the CPU is constructed? 2) You split the translation if the address crosses blocks. It seems that it'd be better to check to see if the access crosses pages, not blocks to save the extra work. I was told I should use the peer block size because otherwise the peer would get upset. I have no problem with adjusting it to whatever size works. Gabe ___ m5-dev mailing list m5-dev@m5sim.org http://m5sim.org/mailman/listinfo/m5-dev
Re: [m5-dev] Split Translation
[ I wrote this e-mail almost a month and a half ago and I just noticed that it never got sent. I think it is still relevant] I've started looking at your patch and in particular some of the translation code in the TimingSimpleCPU. I have a couple of comments. 1) You call dcachePort.peerBlockSize() every time you enter this function. That call is probably actually something that you should cache to avoid the extra virtual function call. Would that be something to cache when the CPU is constructed? I think in the constructor is too early, but all ports should be connected by the time init() or startup() happen. 2) You split the translation if the address crosses blocks. It seems that it'd be better to check to see if the access crosses pages, not blocks to save the extra work. I was told I should use the peer block size because otherwise the peer would get upset. I have no problem with adjusting it to whatever size works. Well, I guess it depends if you treat split translations separately from split caches. I don't know the code well enough, but you could have one translation that required the cache accesses be split. I don't know how the translation code fits into the flow well enough to make a determination on this. Nate ___ m5-dev mailing list m5-dev@m5sim.org http://m5sim.org/mailman/listinfo/m5-dev
Re: [m5-dev] [PATCH] CPU: Make the CPU wait until initiateAcc finishes before calling completeAcc
This bug happens when both the translation component and the memory access component finish inline before returning. When that happens, completeAcc is effectively called from initiateAcc and you get into trouble. The translation component will complete inline in all cases except TLB misses in X86. It will complete inline in all cases with any other ISA. Since this bug came up on Alpha, that part of things is unavoidable right now. The second part comes in when the memory access component essentially doesn't happen. I'm a little fuzzy on the details, but if a store conditional fails, the CPU can skip talking to the memory access and go directly to completeDataAccess. The overhead my change would add is two writes to a bool and two ifs, and I would have to imagine that's a lot more lightweight than creating, initializing, and scheduling a new event for all types of translation. My change also applies only once to each memory instruction where some unaligned accesses may be translated in two chunks. We could probably streamline things a little and only check if we're in initiateAcc on code paths that skip going to memory. What if you use my change instead? (Get rid of the traceData function parameter and use xc-traceData) Incidentally, there's a bunch of if (traceData) ... in the generated code and it's not wrapped with #if TRACING_ON Would be nice if that could be changed as part of this too. Nate ___ m5-dev mailing list m5-dev@m5sim.org http://m5sim.org/mailman/listinfo/m5-dev
Re: [m5-dev] [PATCH] CPU: Make the CPU wait until initiateAcc finishes before calling completeAcc
Quoting nathan binkert n...@binkert.org: I think the difference is that passing it in explicitly means that you could keep it in a register for the duration of the function, and if you can put it in a callee-save then maybe you could save a few dereferences. In contrast the indirection would have to be repeated after every function call (which is exactly the behavior we'd be relying on here). Wouldn't we need to make it xc-traceData() to allow for all the redirection we've got in there? What redirection? Or do you mean because traceData is protected in the simple CPU? You could, but then again, you could just make it public. Otherwise, I'm not sure what you're talking about. The execute function is generated for each cpu model separately and it knows the type. Nate ___ m5-dev mailing list m5-dev@m5sim.org http://m5sim.org/mailman/listinfo/m5-dev I'm talking about how there's the exec context, and then the proxy exec context, and the thread context, and the simple thread context, and the dyninst, and the etc. etc., and calls to, for instance, read a register can travel through several layers before actually being serviced. By making traceData a direct member of the very first exec context interface, we're precluding passing it back to other layers that may want some input. That's not to say I don't think it will work, because I actually think it would, but it seems inconsistent with how the rest of that interface goes together. Also this still doesn't address the fact that, one, completeAcc is being called before initiateAcc has fully run, and two, that the trace information is being printed before initiateAcc has written all its results. Gabe ___ m5-dev mailing list m5-dev@m5sim.org http://m5sim.org/mailman/listinfo/m5-dev
Re: [m5-dev] [PATCH] CPU: Make the CPU wait until initiateAcc finishes before calling completeAcc
What redirection? Or do you mean because traceData is protected in the simple CPU? You could, but then again, you could just make it public. Otherwise, I'm not sure what you're talking about. The execute function is generated for each cpu model separately and it knows the type. I'm talking about how there's the exec context, and then the proxy exec context, and the thread context, and the simple thread context, and the dyninst, and the etc. etc., and calls to, for instance, read a register can travel through several layers before actually being serviced. By making traceData a direct member of the very first exec context interface, we're precluding passing it back to other layers that may want some input. That's not to say I don't think it will work, because I actually think it would, but it seems inconsistent with how the rest of that interface goes together. I'm specifically talking about the parameter passed into the exec function which is either SimpleCPU or BaseDynInst. Those both have a traceData parameter and it can just be used directly without any fancy indirection. I am only talking about this code specifically. Also this still doesn't address the fact that, one, completeAcc is being called before initiateAcc has fully run, and two, that the trace information is being printed before initiateAcc has written all its results. Can't we just ensure that foo-traceData is set to NULL after it is deleted? Then the setData that's done in initiateAcc after the write won't complete. Also, looking at the code more closely, is there some reason taht we can't just move the setData line before the write? Wouldn't that solve the problem? It seems to me that the program wouldn't change since the contents of Mem should not change due to xc-write? (correct?) Sorry if I'm just being unhelpful. I don't know this code super well. Nate ___ m5-dev mailing list m5-dev@m5sim.org http://m5sim.org/mailman/listinfo/m5-dev
Re: [m5-dev] [PATCH] CPU: Make the CPU wait until initiateAcc finishes before calling completeAcc
So my original point (that obviously wasnt conveyed very well) was more trying to re-work how this STL_C case works rather than kind of patch it up so it works in the current framework. Seems to be me that completeAcc should never get called before initiateAcc in any circumstance. I know that shortcut is because you STL_C fails early, but even so, why cant it just finish like any other instruction. What benefit are we gettiing from early completing it (especially in a SimpleCPU)? Am I getting this fundamental issue? If I'm not getting that wrong, then we can just fix the SimpleCPU to not handle this special case in this fashion and all will be fine. So trying to play around with traceData in my opinion is the wrong way to solve it. If you are going to do that, Nate's idea of writing the traceData before the access sounds like it will work.But Gabe's way works too, so whatever works there is fine with me. Long term, I'm just thinking that if I call initiateAcc and there is a risk of completeAcc happening before it finishes seems like it breaks the interface that the CPU/INST should use. And then in the future, this is going to be a nasty bug that will trip somebody up (because completeAcc might do something that screws things up early). On Tue, Apr 28, 2009 at 5:34 PM, nathan binkert n...@binkert.org wrote: What redirection? Or do you mean because traceData is protected in the simple CPU? You could, but then again, you could just make it public. Otherwise, I'm not sure what you're talking about. The execute function is generated for each cpu model separately and it knows the type. I'm talking about how there's the exec context, and then the proxy exec context, and the thread context, and the simple thread context, and the dyninst, and the etc. etc., and calls to, for instance, read a register can travel through several layers before actually being serviced. By making traceData a direct member of the very first exec context interface, we're precluding passing it back to other layers that may want some input. That's not to say I don't think it will work, because I actually think it would, but it seems inconsistent with how the rest of that interface goes together. I'm specifically talking about the parameter passed into the exec function which is either SimpleCPU or BaseDynInst. Those both have a traceData parameter and it can just be used directly without any fancy indirection. I am only talking about this code specifically. Also this still doesn't address the fact that, one, completeAcc is being called before initiateAcc has fully run, and two, that the trace information is being printed before initiateAcc has written all its results. Can't we just ensure that foo-traceData is set to NULL after it is deleted? Then the setData that's done in initiateAcc after the write won't complete. Also, looking at the code more closely, is there some reason taht we can't just move the setData line before the write? Wouldn't that solve the problem? It seems to me that the program wouldn't change since the contents of Mem should not change due to xc-write? (correct?) Sorry if I'm just being unhelpful. I don't know this code super well. Nate ___ m5-dev mailing list m5-dev@m5sim.org http://m5sim.org/mailman/listinfo/m5-dev -- === Korey L Sewell Computer Science Engineering University of Michigan ___ m5-dev mailing list m5-dev@m5sim.org http://m5sim.org/mailman/listinfo/m5-dev
Re: [m5-dev] [PATCH] CPU: Make the CPU wait until initiateAcc finishes before calling completeAcc
As far as moving around the code, that would be easier said than done. That's all generated and I don't know how easy it would be to get the isa_parser to generate something structured differently that would still make sense in all the other instructions of all the other ISAs. Steve? I was mentioning something like this earlier, but in arch/alpha/mem.isa the initiateAcc code is generated from the LoadStore::initiateAcc template. If you just give Store conditionals its own template for initiateAcc (like it already has for completeAcc), I think that will be a fix too (assuming you set the tracedata early or handle it some other way)... -- === Korey L Sewell Computer Science Engineering University of Michigan ___ m5-dev mailing list m5-dev@m5sim.org http://m5sim.org/mailman/listinfo/m5-dev
Re: [m5-dev] [PATCH] CPU: Make the CPU wait until initiateAcc finishes before calling completeAcc
Quoting Korey Sewell ksew...@umich.edu: So my original point (that obviously wasnt conveyed very well) was more trying to re-work how this STL_C case works rather than kind of patch it up so it works in the current framework. Seems to be me that completeAcc should never get called before initiateAcc in any circumstance. I know that shortcut is because you STL_C fails early, but even so, why cant it just finish like any other instruction. What benefit are we gettiing from early completing it (especially in a SimpleCPU)? Am I getting this fundamental issue? If I'm not getting that wrong, then we can just fix the SimpleCPU to not handle this special case in this fashion and all will be fine. So trying to play around with traceData in my opinion is the wrong way to solve it. If you are going to do that, Nate's idea of writing the traceData before the access sounds like it will work.But Gabe's way works too, so whatever works there is fine with me. Long term, I'm just thinking that if I call initiateAcc and there is a risk of completeAcc happening before it finishes seems like it breaks the interface that the CPU/INST should use. And then in the future, this is going to be a nasty bug that will trip somebody up (because completeAcc might do something that screws things up early). I agree with everything you've said here, except for one half thing and I've got a comment on one other. First, my change is actually doing what your saying. It's not mucking around with traceData, it's actually putting off calling completeDataAccess (which calls completeAcc) if we're in the middle of initiateAcc until we come out of it. That's the half. The other thing is the reason STC doesn't work like the other memory instructions. In those other cases where you're calling into memory, you're going to get a response some time later. When that happens, initiateAcc has since finished running and you can end up in completeAcc and be ok. When you don't have a delay in translation or when you go to memory, then there's no later point where you'll be doing anything for that instruction and you need to deal with it now. One completely different solution would be to do what you're saying and just not have the timing simple CPU work like it does. I think it's organized a lot like the atomic simple CPU because it used to be a part of it, but really a more natural organization would be as a state machine where activity was broken by hard lines into states. There might be a little more overhead that way, but it would make it much easier for non-experts to see what the heck is going on in there. It would also help the experts avoid introducing bugs because there are so many interactions to consider. Gabe On Tue, Apr 28, 2009 at 5:34 PM, nathan binkert n...@binkert.org wrote: What redirection? Or do you mean because traceData is protected in the simple CPU? You could, but then again, you could just make it public. Otherwise, I'm not sure what you're talking about. The execute function is generated for each cpu model separately and it knows the type. I'm talking about how there's the exec context, and then the proxy exec context, and the thread context, and the simple thread context, and the dyninst, and the etc. etc., and calls to, for instance, read a register can travel through several layers before actually being serviced. By making traceData a direct member of the very first exec context interface, we're precluding passing it back to other layers that may want some input. That's not to say I don't think it will work, because I actually think it would, but it seems inconsistent with how the rest of that interface goes together. I'm specifically talking about the parameter passed into the exec function which is either SimpleCPU or BaseDynInst. Those both have a traceData parameter and it can just be used directly without any fancy indirection. I am only talking about this code specifically. Also this still doesn't address the fact that, one, completeAcc is being called before initiateAcc has fully run, and two, that the trace information is being printed before initiateAcc has written all its results. Can't we just ensure that foo-traceData is set to NULL after it is deleted? Then the setData that's done in initiateAcc after the write won't complete. Also, looking at the code more closely, is there some reason taht we can't just move the setData line before the write? Wouldn't that solve the problem? It seems to me that the program wouldn't change since the contents of Mem should not change due to xc-write? (correct?) Sorry if I'm just being unhelpful. I don't know this code super well. Nate ___ m5-dev mailing list m5-dev@m5sim.org http://m5sim.org/mailman/listinfo/m5-dev -- === Korey L Sewell Computer Science Engineering University of Michigan
Re: [m5-dev] [PATCH] CPU: Make the CPU wait until initiateAcc finishes before calling completeAcc
I'm specifically talking about the parameter passed into the exec function which is either SimpleCPU or BaseDynInst. Those both have a traceData parameter and it can just be used directly without any fancy indirection. I am only talking about this code specifically. I'm confused now. If we're passing it in, then it's a local variable on the stack and we can't set it to NULL externally. Right, I don't like this. If it's a local member of the exec context, it's not transparently modifiable by deeper layers of the interface. What does this mean? Instead of doing traceData, you do xc-traceData. Then someone could set destroy xc-traceData and set it to NULL and we'd be OK. If it's a function call, we're paying a bunch of function call overhead. Which of these do (do not) apply? This does not apply. Also, looking at the code more closely, is there some reason taht we can't just move the setData line before the write? Wouldn't that solve the problem? It seems to me that the program wouldn't change since the contents of Mem should not change due to xc-write? (correct?) That would ideally fix the segfault, aka the symptom, but it doesn't fix the core issue. The second half of initiateAcc won't be able to participate in the trace since traceData is dumped right before being deleted. That portion will also actually run -after- completeAcc, but only sometimes. That sounds like a source for a lot of confusion and bugs (like this one) to me. As far as moving around the code, that would be easier said than done. That's all generated and I don't know how easy it would be to get the isa_parser to generate something structured differently that would still make sense in all the other instructions of all the other ISAs. Steve? Ok, whatever the correct solution, I think we should minimally do these things: 1) Remove the parameter, 2) change traceData to xc-traceData 3) and surround uses of traceData with #if TRACING_ON Nate ___ m5-dev mailing list m5-dev@m5sim.org http://m5sim.org/mailman/listinfo/m5-dev
Re: [m5-dev] [PATCH] CPU: Make the CPU wait until initiateAcc finishes before calling completeAcc
Quoting nathan binkert n...@binkert.org: I'm specifically talking about the parameter passed into the exec function which is either SimpleCPU or BaseDynInst. Those both have a traceData parameter and it can just be used directly without any fancy indirection. I am only talking about this code specifically. I'm confused now. If we're passing it in, then it's a local variable on the stack and we can't set it to NULL externally. Right, I don't like this. If it's a local member of the exec context, it's not transparently modifiable by deeper layers of the interface. What does this mean? Instead of doing traceData, you do xc-traceData. Then someone could set destroy xc-traceData and set it to NULL and we'd be OK. A parallel example would be getting a pointer to the TLBs. In that case, you don't just have a TLB pointer member on the exec context, you have a function called something like getDTBPtr() that percolates to the right place that actually has the current value. That's just how this stuff has been put together up to now, I assume for a reason, and we'd be diverging from that. I don't know of any calamitous problem that would introduce, but we'd be partially undermining the flexibility we've built in for other values. 3) and surround uses of traceData with #if TRACING_ON I meant to mention it before, but I think this bit is very doable and makes a lot of sense. Gabe ___ m5-dev mailing list m5-dev@m5sim.org http://m5sim.org/mailman/listinfo/m5-dev
Re: [m5-dev] [PATCH] CPU: Make the CPU wait until initiateAcc finishes before calling completeAcc
One completely different solution would be to do what you're saying and just not have the timing simple CPU work like it does. I think it's organized a lot like the atomic simple CPU because it used to be a part of it, but really a more natural organization would be as a state machine where activity was broken by hard lines into states. There might be a little more overhead that way, but it would make it much easier for non-experts to see what the heck is going on in there. It would also help the experts avoid introducing bugs because there are so many interactions to consider. This seems pretty reasonable to me. There are already states in the models to deal with cache misses and such, so adding another state for the different phases of a memory access seems reasonable. The impact probably wouldn't be huge given the fact that the cache is probably pretty expensive compared to the CPU when executing memory ops. Nate ___ m5-dev mailing list m5-dev@m5sim.org http://m5sim.org/mailman/listinfo/m5-dev
Re: [m5-dev] [PATCH] CPU: Make the CPU wait until initiateAcc finishes before calling completeAcc
So a tentative plan would be: 1. Commit my patch for now so the bug is fixed. 2. Add the #if TRACING_ON. 3.5 Hash out what to do with traceData 3.5 Reorganize the timing simple CPU. Does anyone object? I guess I'd rather not do #1 if it isn't the solution because things like that have a tendency to live on. Nate ___ m5-dev mailing list m5-dev@m5sim.org http://m5sim.org/mailman/listinfo/m5-dev
Re: [m5-dev] [PATCH] CPU: Make the CPU wait until initiateAcc finishes before calling completeAcc
Whew, I don't check my email for one afternoon and you guys make me pay for it... I'm still curious about the possibility of using an event to call the completeAcc() callback in the case where the translation fails on a store conditional. This solution seems particularly attractive to me if it's just a case of making this particular scenario follow the existing x86-tlb-miss path rather than the default path, in which case there shouldn't be much new code at all. Maybe it's a little more complicated than that if we have to import some of the x86-tlb-miss code into alpha, but it still doesn't seem that bad. I think Gabe misunderstood me; I'm not talking about doing this for all TLB hit responses, just this one specific case. There may well be a subtlety that makes this not as clean as I envision it, but to me it solves the root cause of the problem, which is that completeAcc() is getting called before initiateAcc() terminates, and does it in a way that doesn't clutter the logic in SimpleTimingCPU the way Gabe's current fix does. Adding in the #if TRACING_ON lines is a no-brainer but is really independent of this bug. I think moving the traceData variable into the exec context is not a bad idea (though not strongly compelling to me either), and does happen to have the side effect of addressing this particular bug. However I agree with Gabe that it's really a very limited solution and doesn't address the conceptual issue that caused this bug, which leaves us open to future similar bugs occurring. Maybe that will never happen, but it makes this a less compelling solution to the bug itself. But if we're going to do it anyway, it could work as a short-term fix for the bug. The SimpleTimingCPU is getting complex, but it's not obvious to me how adding more explicit states would clean it up significantly, particularly if we avoid adding Gabe's current patch (which does muddle up the control flow a bit more than it already is). I'm open to learning more about what people envision here though. Steve On Tue, Apr 28, 2009 at 4:05 PM, nathan binkert n...@binkert.org wrote: So a tentative plan would be: 1. Commit my patch for now so the bug is fixed. 2. Add the #if TRACING_ON. 3.5 Hash out what to do with traceData 3.5 Reorganize the timing simple CPU. Does anyone object? I guess I'd rather not do #1 if it isn't the solution because things like that have a tendency to live on. Nate ___ m5-dev mailing list m5-dev@m5sim.org http://m5sim.org/mailman/listinfo/m5-dev ___ m5-dev mailing list m5-dev@m5sim.org http://m5sim.org/mailman/listinfo/m5-dev
Re: [m5-dev] Split Translation
On Tue, Apr 28, 2009 at 1:19 PM, nathan binkert n...@binkert.org wrote: 2) You split the translation if the address crosses blocks. It seems that it'd be better to check to see if the access crosses pages, not blocks to save the extra work. I was told I should use the peer block size because otherwise the peer would get upset. I have no problem with adjusting it to whatever size works. Well, I guess it depends if you treat split translations separately from split caches. I don't know the code well enough, but you could have one translation that required the cache accesses be split. I don't know how the translation code fits into the flow well enough to make a determination on this. I don't see much benefit in doing the split twice (once for page crossings and a second time for cache line crossings), though if there was a way to isolate the bulk of the code in a single function that gets called twice it wouldn't be too bad. There is some merit in only calling the TLB once on a single-page cache-line-split transaction though, particularly because calling it twice in this case will make your TLB access and hit rates artificially high. Doing two separate split checks might be the easiest way of doing this, or there could be a separate test on cache splits that copies the translation from one request to the other instead of calling the TLB a second time if the two accesses are on the same page. Steve ___ m5-dev mailing list m5-dev@m5sim.org http://m5sim.org/mailman/listinfo/m5-dev
Re: [m5-dev] Split Translation
I don't see much benefit in doing the split twice (once for page crossings and a second time for cache line crossings), though if there was a way to isolate the bulk of the code in a single function that gets called twice it wouldn't be too bad. There is some merit in only calling the TLB once on a single-page cache-line-split transaction though, particularly because calling it twice in this case will make your TLB access and hit rates artificially high. Doing two separate split checks might be the easiest way of doing this, or there could be a separate test on cache splits that copies the translation from one request to the other instead of calling the TLB a second time if the two accesses are on the same page. I was mostly referring to the realism. If unaligned accesses are indeed very fast on modern x86, then people might intentionally start to use them even more. Nate ___ m5-dev mailing list m5-dev@m5sim.org http://m5sim.org/mailman/listinfo/m5-dev
Re: [m5-dev] [PATCH] CPU: Make the CPU wait until initiateAcc finishes before calling completeAcc
Steve Reinhardt wrote: Whew, I don't check my email for one afternoon and you guys make me pay for it... I'm still curious about the possibility of using an event to call the completeAcc() callback in the case where the translation fails on a store conditional. This solution seems particularly attractive to me if it's just a case of making this particular scenario follow the existing x86-tlb-miss path rather than the default path, in which case there shouldn't be much new code at all. Maybe it's a little more complicated than that if we have to import some of the x86-tlb-miss code into alpha, but it still doesn't seem that bad. I think Gabe misunderstood me; I'm not talking about doing this for all TLB hit responses, just this one specific case. There may well be a subtlety that makes this not as clean as I envision it, but to me it solves the root cause of the problem, which is that completeAcc() is getting called before initiateAcc() terminates, and does it in a way that doesn't clutter the logic in SimpleTimingCPU the way Gabe's current fix does. Adding in the #if TRACING_ON lines is a no-brainer but is really independent of this bug. I think moving the traceData variable into the exec context is not a bad idea (though not strongly compelling to me either), and does happen to have the side effect of addressing this particular bug. However I agree with Gabe that it's really a very limited solution and doesn't address the conceptual issue that caused this bug, which leaves us open to future similar bugs occurring. Maybe that will never happen, but it makes this a less compelling solution to the bug itself. But if we're going to do it anyway, it could work as a short-term fix for the bug. The SimpleTimingCPU is getting complex, but it's not obvious to me how adding more explicit states would clean it up significantly, particularly if we avoid adding Gabe's current patch (which does muddle up the control flow a bit more than it already is). I'm open to learning more about what people envision here though. I'm not sure if it will be hugely better. I'd have to just do it and see how it comes out, I suppose. Wouldn't scheduling an event be similar to what I have, just using the event queue instead of the new packet pointer? I could imagine it being better, actually, so I'm tentatively convinced. There shouldn't be a need to move any x86 code into Alpha to make that work, but I wouldn't swear to that. Gabe ___ m5-dev mailing list m5-dev@m5sim.org http://m5sim.org/mailman/listinfo/m5-dev
Re: [m5-dev] [PATCH] CPU: Make the CPU wait until initiateAcc finishes before calling completeAcc
On Tue, Apr 28, 2009 at 4:43 PM, Gabe Black gbl...@eecs.umich.edu wrote: I'm not sure if it will be hugely better. I'd have to just do it and see how it comes out, I suppose. Wouldn't scheduling an event be similar to what I have, just using the event queue instead of the new packet pointer? Yea, at a certain level, the effect is similar--just deferring the callback until later--which is not bad, since that's what addresses the problem. I think that using the existing event queue mechanism (which is explicitly defined for deferring stuff until later) is less convoluted than adding a handful of new variables and control code to cover just this one specific situation. I could imagine it being better, actually, so I'm tentatively convinced. There shouldn't be a need to move any x86 code into Alpha to make that work, but I wouldn't swear to that. Thanks for giving it a try. Steve ___ m5-dev mailing list m5-dev@m5sim.org http://m5sim.org/mailman/listinfo/m5-dev