Re: [m5-dev] [PATCH] CPU: Make the CPU wait until initiateAcc finishes before calling completeAcc
Thanks to Brad's prompting, I took another look at this tonight. I think the fundamental problem is that the transition to split-transaction translation didn't move all the operations that depend on the translation completing into the translation callback or something that was guaranteed to run after the translation callback. The req-isUncacheable() test that Brad moved around in his recent patch (which I commented on just a few minutes ago) is one example of that. Another case is that we're calling traceData-setData() in read() even though the translation may not be done yet (how does that work? are we calling setData() again when the real data comes back, so we haven't noticed that this is broken?). The other traceData examples (like setAddr() or setData() in write()) are more like anti-dependences; there's no direct dependence on the translation finishing, but there is a dependence on it not deleting the traceData object. So in other words the problem isn't that the finishTranslation() or completeAcc() calls are happening too soon, it's that the code that runs after the translation is initiated in read()/write() (or in initiateAcc() from which it is called) should not be making assumptions about whether finishTranslation() or completeAcc() has or has not been called. I think it's pretty easy to fix this, in some cases by moving independent code up above where the translation is initiated, and in other cases by moving dependent code into finishTranslation(). Interestingly, there are a number of cases where we trace the store data twice, once in the CPU's write() function and again in the ISA's initiateAcc() function. I propose to just get rid of the latter. As I mentioned in the other email, I'm tempted to just get rid of the recordEvent() calls, unless someone actually uses these. I see that Nate created the basic recordEvent() infrastructure in 2004, and right now it's only uncached accesses and faults that get recorded, which seems very ad hoc and irregular. I'll try and make these changes and test them as soon as I can. I thought I'd send this note out first though in case anyone has feedback before I get started. Steve ___ m5-dev mailing list m5-dev@m5sim.org http://m5sim.org/mailman/listinfo/m5-dev
Re: [m5-dev] [PATCH] CPU: Make the CPU wait until initiateAcc finishes before calling completeAcc
It's been a while since I've looked at this, but I want to make sure I remember to respond so I don't want to wait until I have a chance to re-research all this. Isn't the problem that initiateAcc ends up calling completeAcc mid-function, that cleans up after the access, then initateAcc gets control back and tries to finish initiating the access? The problem to me seems to be that initiateAcc doesn't know whether it's just letting the rest of the CPU know something needs to happen later, or if it's actually causing that thing to happen right away. The other part, completeAcc, doesn't know if it's happening in the middle of initiateAcc, or if its in a callback. If it's in initiateAcc there are things it (or maybe initiateAcc, is that what you're saying?) shouldn't touch. If it's in a callback, it's likely the last thing to handle the instruction/translation/packet/whatever, so it needs to be sure to clean up everything. These seem like contradictory responsibilities. What I had been advocating before (if I remember right) was to disambiguate what the circumstances each function can expect by breaking up the call chains so that the functions don't end up nested inside each other unpredictably. Like I said, though, I'm making sure I respond rather than getting my facts straight, so please let me know if I'm missing your point (likely) or misunderstanding something. Gabe Steve Reinhardt wrote: Thanks to Brad's prompting, I took another look at this tonight. I think the fundamental problem is that the transition to split-transaction translation didn't move all the operations that depend on the translation completing into the translation callback or something that was guaranteed to run after the translation callback. The req-isUncacheable() test that Brad moved around in his recent patch (which I commented on just a few minutes ago) is one example of that. Another case is that we're calling traceData-setData() in read() even though the translation may not be done yet (how does that work? are we calling setData() again when the real data comes back, so we haven't noticed that this is broken?). The other traceData examples (like setAddr() or setData() in write()) are more like anti-dependences; there's no direct dependence on the translation finishing, but there is a dependence on it not deleting the traceData object. So in other words the problem isn't that the finishTranslation() or completeAcc() calls are happening too soon, it's that the code that runs after the translation is initiated in read()/write() (or in initiateAcc() from which it is called) should not be making assumptions about whether finishTranslation() or completeAcc() has or has not been called. I think it's pretty easy to fix this, in some cases by moving independent code up above where the translation is initiated, and in other cases by moving dependent code into finishTranslation(). Interestingly, there are a number of cases where we trace the store data twice, once in the CPU's write() function and again in the ISA's initiateAcc() function. I propose to just get rid of the latter. As I mentioned in the other email, I'm tempted to just get rid of the recordEvent() calls, unless someone actually uses these. I see that Nate created the basic recordEvent() infrastructure in 2004, and right now it's only uncached accesses and faults that get recorded, which seems very ad hoc and irregular. I'll try and make these changes and test them as soon as I can. I thought I'd send this note out first though in case anyone has feedback before I get started. Steve ___ m5-dev mailing list m5-dev@m5sim.org http://m5sim.org/mailman/listinfo/m5-dev ___ m5-dev mailing list m5-dev@m5sim.org http://m5sim.org/mailman/listinfo/m5-dev
Re: [m5-dev] [PATCH] CPU: Make the CPU wait until initiateAcc finishes before calling completeAcc
Hi Gabe, On Sun, Mar 21, 2010 at 5:22 PM, Gabe Black gbl...@eecs.umich.edu wrote: It's been a while since I've looked at this, but I want to make sure I remember to respond so I don't want to wait until I have a chance to re-research all this. Isn't the problem that initiateAcc ends up calling completeAcc mid-function, that cleans up after the access, then initateAcc gets control back and tries to finish initiating the access? Yes and no... you're right that that's how we viewed it before (so your memory is good), and that's one way of looking at it, and it's not incorrect. I'm saying that if we look at the problem differently I think there's an easier solution. The problem to me seems to be that initiateAcc doesn't know whether it's just letting the rest of the CPU know something needs to happen later, or if it's actually causing that thing to happen right away. The other part, completeAcc, doesn't know if it's happening in the middle of initiateAcc, or if its in a callback. If it's in initiateAcc there are things it (or maybe initiateAcc, is that what you're saying?) shouldn't touch. If it's in a callback, it's likely the last thing to handle the instruction/translation/packet/whatever, so it needs to be sure to clean up everything. These seem like contradictory responsibilities. You've got the right idea there... basically initiateAcc() actually initiates the acc when it calls read() or write(), and it can't count on the translation being complete when read()/write() returns, but conversely it also shouldn't count on the translation *not* being complete when read()/write() returns. So there really shouldn't be any code after read()/write() that should care one way or the other. If we get rid of the code that does, then we'll solve the problem without adding any complexity to the control flow. As a specific example, the one thing that appears to give us trouble in initiateAcc() is this: fault = xc-write((uint%(mem_acc_size)d_t)Mem, EA, memAccessFlags, NULL); if (traceData) { traceData-setData(Mem); } ...and the root problem is that if the access completes in write(), traceData may not be valid any more. The ironic thing here is that (1) the value of Mem doesn't depend on write(), so the line could be moved before the call; and (2) traceData-setData() is also called in write() itself, so this whole line is redundant. So we can fix this specific problem by either (1) moving the traceData line above write() or (2) preferably, deleting the line entirely. Similarly, read()/write() initiate the translation when they call translateTiming(), and there shouldn't be any code in read()/write() after that point that depends on the translation having completed *or not*. Interestingly there is code (the req-isUncacheable() check, and a setData() call in read()) that seems like it requires the translation to have completed, which is just as incorrect. That code should be moved somewhere else (like finishTranslation()) where it can be guaranteed that the translation has finished. What I had been advocating before (if I remember right) was to disambiguate what the circumstances each function can expect by breaking up the call chains so that the functions don't end up nested inside each other unpredictably. Like I said, though, I'm making sure I respond rather than getting my facts straight, so please let me know if I'm missing your point (likely) or misunderstanding something. That would work, but this approach is simpler since it doesn't depend on figuring out what circumstances you're in. Steve ___ m5-dev mailing list m5-dev@m5sim.org http://m5sim.org/mailman/listinfo/m5-dev
Re: [m5-dev] [PATCH] CPU: Make the CPU wait until initiateAcc finishes before calling completeAcc
Ok. This sounds plausible to me, and I'm glad it's not going to be so much work. I'm sure you already thought of this, but along the lines of what we were talking about in that other thread, something that important and subtle will deserve a big obvious comment or two. Gabe Steve Reinhardt wrote: Hi Gabe, On Sun, Mar 21, 2010 at 5:22 PM, Gabe Black gbl...@eecs.umich.edu wrote: It's been a while since I've looked at this, but I want to make sure I remember to respond so I don't want to wait until I have a chance to re-research all this. Isn't the problem that initiateAcc ends up calling completeAcc mid-function, that cleans up after the access, then initateAcc gets control back and tries to finish initiating the access? Yes and no... you're right that that's how we viewed it before (so your memory is good), and that's one way of looking at it, and it's not incorrect. I'm saying that if we look at the problem differently I think there's an easier solution. The problem to me seems to be that initiateAcc doesn't know whether it's just letting the rest of the CPU know something needs to happen later, or if it's actually causing that thing to happen right away. The other part, completeAcc, doesn't know if it's happening in the middle of initiateAcc, or if its in a callback. If it's in initiateAcc there are things it (or maybe initiateAcc, is that what you're saying?) shouldn't touch. If it's in a callback, it's likely the last thing to handle the instruction/translation/packet/whatever, so it needs to be sure to clean up everything. These seem like contradictory responsibilities. You've got the right idea there... basically initiateAcc() actually initiates the acc when it calls read() or write(), and it can't count on the translation being complete when read()/write() returns, but conversely it also shouldn't count on the translation *not* being complete when read()/write() returns. So there really shouldn't be any code after read()/write() that should care one way or the other. If we get rid of the code that does, then we'll solve the problem without adding any complexity to the control flow. As a specific example, the one thing that appears to give us trouble in initiateAcc() is this: fault = xc-write((uint%(mem_acc_size)d_t)Mem, EA, memAccessFlags, NULL); if (traceData) { traceData-setData(Mem); } ...and the root problem is that if the access completes in write(), traceData may not be valid any more. The ironic thing here is that (1) the value of Mem doesn't depend on write(), so the line could be moved before the call; and (2) traceData-setData() is also called in write() itself, so this whole line is redundant. So we can fix this specific problem by either (1) moving the traceData line above write() or (2) preferably, deleting the line entirely. Similarly, read()/write() initiate the translation when they call translateTiming(), and there shouldn't be any code in read()/write() after that point that depends on the translation having completed *or not*. Interestingly there is code (the req-isUncacheable() check, and a setData() call in read()) that seems like it requires the translation to have completed, which is just as incorrect. That code should be moved somewhere else (like finishTranslation()) where it can be guaranteed that the translation has finished. What I had been advocating before (if I remember right) was to disambiguate what the circumstances each function can expect by breaking up the call chains so that the functions don't end up nested inside each other unpredictably. Like I said, though, I'm making sure I respond rather than getting my facts straight, so please let me know if I'm missing your point (likely) or misunderstanding something. That would work, but this approach is simpler since it doesn't depend on figuring out what circumstances you're in. Steve ___ m5-dev mailing list m5-dev@m5sim.org http://m5sim.org/mailman/listinfo/m5-dev ___ m5-dev mailing list m5-dev@m5sim.org http://m5sim.org/mailman/listinfo/m5-dev
Re: [m5-dev] [PATCH] CPU: Make the CPU wait until initiateAcc finishes before calling completeAcc
OK, thinking about this a bit more, it seems like the root of the problem is that the timing translation is calling the callback right away instead of deferring it to a later cycle. I assume this only happens on a TLB miss for a store conditional? Is there any other path by which the completeAcc callback can be invoked before the initiateAcc completes? I agree that this seems like a somewhat heavyweight solution for this rare case. If this is the only scenario, what if we create a new event in the TLB to call the completeAcc callback after delaying for the TLB access latency? Presumably there's already an event that does this in the more common cases, so maybe that can be tweaked to handle this case too. I haven't actually looked at the TLB code but thought I'd throw the idea out anyway. Steve ___ m5-dev mailing list m5-dev@m5sim.org http://m5sim.org/mailman/listinfo/m5-dev
Re: [m5-dev] [PATCH] CPU: Make the CPU wait until initiateAcc finishes before calling completeAcc
This bug happens when both the translation component and the memory access component finish inline before returning. When that happens, completeAcc is effectively called from initiateAcc and you get into trouble. The translation component will complete inline in all cases except TLB misses in X86. It will complete inline in all cases with any other ISA. Since this bug came up on Alpha, that part of things is unavoidable right now. The second part comes in when the memory access component essentially doesn't happen. I'm a little fuzzy on the details, but if a store conditional fails, the CPU can skip talking to the memory access and go directly to completeDataAccess. The overhead my change would add is two writes to a bool and two ifs, and I would have to imagine that's a lot more lightweight than creating, initializing, and scheduling a new event for all types of translation. My change also applies only once to each memory instruction where some unaligned accesses may be translated in two chunks. We could probably streamline things a little and only check if we're in initiateAcc on code paths that skip going to memory. Gabe Quoting Steve Reinhardt ste...@gmail.com: OK, thinking about this a bit more, it seems like the root of the problem is that the timing translation is calling the callback right away instead of deferring it to a later cycle. I assume this only happens on a TLB miss for a store conditional? Is there any other path by which the completeAcc callback can be invoked before the initiateAcc completes? I agree that this seems like a somewhat heavyweight solution for this rare case. If this is the only scenario, what if we create a new event in the TLB to call the completeAcc callback after delaying for the TLB access latency? Presumably there's already an event that does this in the more common cases, so maybe that can be tweaked to handle this case too. I haven't actually looked at the TLB code but thought I'd throw the idea out anyway. Steve ___ m5-dev mailing list m5-dev@m5sim.org http://m5sim.org/mailman/listinfo/m5-dev
Re: [m5-dev] [PATCH] CPU: Make the CPU wait until initiateAcc finishes before calling completeAcc
This bug happens when both the translation component and the memory access component finish inline before returning. When that happens, completeAcc is effectively called from initiateAcc and you get into trouble. The translation component will complete inline in all cases except TLB misses in X86. It will complete inline in all cases with any other ISA. Since this bug came up on Alpha, that part of things is unavoidable right now. The second part comes in when the memory access component essentially doesn't happen. I'm a little fuzzy on the details, but if a store conditional fails, the CPU can skip talking to the memory access and go directly to completeDataAccess. The overhead my change would add is two writes to a bool and two ifs, and I would have to imagine that's a lot more lightweight than creating, initializing, and scheduling a new event for all types of translation. My change also applies only once to each memory instruction where some unaligned accesses may be translated in two chunks. We could probably streamline things a little and only check if we're in initiateAcc on code paths that skip going to memory. What if you use my change instead? (Get rid of the traceData function parameter and use xc-traceData) Incidentally, there's a bunch of if (traceData) ... in the generated code and it's not wrapped with #if TRACING_ON Would be nice if that could be changed as part of this too. Nate ___ m5-dev mailing list m5-dev@m5sim.org http://m5sim.org/mailman/listinfo/m5-dev
Re: [m5-dev] [PATCH] CPU: Make the CPU wait until initiateAcc finishes before calling completeAcc
Quoting nathan binkert n...@binkert.org: I think the difference is that passing it in explicitly means that you could keep it in a register for the duration of the function, and if you can put it in a callee-save then maybe you could save a few dereferences. In contrast the indirection would have to be repeated after every function call (which is exactly the behavior we'd be relying on here). Wouldn't we need to make it xc-traceData() to allow for all the redirection we've got in there? What redirection? Or do you mean because traceData is protected in the simple CPU? You could, but then again, you could just make it public. Otherwise, I'm not sure what you're talking about. The execute function is generated for each cpu model separately and it knows the type. Nate ___ m5-dev mailing list m5-dev@m5sim.org http://m5sim.org/mailman/listinfo/m5-dev I'm talking about how there's the exec context, and then the proxy exec context, and the thread context, and the simple thread context, and the dyninst, and the etc. etc., and calls to, for instance, read a register can travel through several layers before actually being serviced. By making traceData a direct member of the very first exec context interface, we're precluding passing it back to other layers that may want some input. That's not to say I don't think it will work, because I actually think it would, but it seems inconsistent with how the rest of that interface goes together. Also this still doesn't address the fact that, one, completeAcc is being called before initiateAcc has fully run, and two, that the trace information is being printed before initiateAcc has written all its results. Gabe ___ m5-dev mailing list m5-dev@m5sim.org http://m5sim.org/mailman/listinfo/m5-dev
Re: [m5-dev] [PATCH] CPU: Make the CPU wait until initiateAcc finishes before calling completeAcc
What redirection? Or do you mean because traceData is protected in the simple CPU? You could, but then again, you could just make it public. Otherwise, I'm not sure what you're talking about. The execute function is generated for each cpu model separately and it knows the type. I'm talking about how there's the exec context, and then the proxy exec context, and the thread context, and the simple thread context, and the dyninst, and the etc. etc., and calls to, for instance, read a register can travel through several layers before actually being serviced. By making traceData a direct member of the very first exec context interface, we're precluding passing it back to other layers that may want some input. That's not to say I don't think it will work, because I actually think it would, but it seems inconsistent with how the rest of that interface goes together. I'm specifically talking about the parameter passed into the exec function which is either SimpleCPU or BaseDynInst. Those both have a traceData parameter and it can just be used directly without any fancy indirection. I am only talking about this code specifically. Also this still doesn't address the fact that, one, completeAcc is being called before initiateAcc has fully run, and two, that the trace information is being printed before initiateAcc has written all its results. Can't we just ensure that foo-traceData is set to NULL after it is deleted? Then the setData that's done in initiateAcc after the write won't complete. Also, looking at the code more closely, is there some reason taht we can't just move the setData line before the write? Wouldn't that solve the problem? It seems to me that the program wouldn't change since the contents of Mem should not change due to xc-write? (correct?) Sorry if I'm just being unhelpful. I don't know this code super well. Nate ___ m5-dev mailing list m5-dev@m5sim.org http://m5sim.org/mailman/listinfo/m5-dev
Re: [m5-dev] [PATCH] CPU: Make the CPU wait until initiateAcc finishes before calling completeAcc
So my original point (that obviously wasnt conveyed very well) was more trying to re-work how this STL_C case works rather than kind of patch it up so it works in the current framework. Seems to be me that completeAcc should never get called before initiateAcc in any circumstance. I know that shortcut is because you STL_C fails early, but even so, why cant it just finish like any other instruction. What benefit are we gettiing from early completing it (especially in a SimpleCPU)? Am I getting this fundamental issue? If I'm not getting that wrong, then we can just fix the SimpleCPU to not handle this special case in this fashion and all will be fine. So trying to play around with traceData in my opinion is the wrong way to solve it. If you are going to do that, Nate's idea of writing the traceData before the access sounds like it will work.But Gabe's way works too, so whatever works there is fine with me. Long term, I'm just thinking that if I call initiateAcc and there is a risk of completeAcc happening before it finishes seems like it breaks the interface that the CPU/INST should use. And then in the future, this is going to be a nasty bug that will trip somebody up (because completeAcc might do something that screws things up early). On Tue, Apr 28, 2009 at 5:34 PM, nathan binkert n...@binkert.org wrote: What redirection? Or do you mean because traceData is protected in the simple CPU? You could, but then again, you could just make it public. Otherwise, I'm not sure what you're talking about. The execute function is generated for each cpu model separately and it knows the type. I'm talking about how there's the exec context, and then the proxy exec context, and the thread context, and the simple thread context, and the dyninst, and the etc. etc., and calls to, for instance, read a register can travel through several layers before actually being serviced. By making traceData a direct member of the very first exec context interface, we're precluding passing it back to other layers that may want some input. That's not to say I don't think it will work, because I actually think it would, but it seems inconsistent with how the rest of that interface goes together. I'm specifically talking about the parameter passed into the exec function which is either SimpleCPU or BaseDynInst. Those both have a traceData parameter and it can just be used directly without any fancy indirection. I am only talking about this code specifically. Also this still doesn't address the fact that, one, completeAcc is being called before initiateAcc has fully run, and two, that the trace information is being printed before initiateAcc has written all its results. Can't we just ensure that foo-traceData is set to NULL after it is deleted? Then the setData that's done in initiateAcc after the write won't complete. Also, looking at the code more closely, is there some reason taht we can't just move the setData line before the write? Wouldn't that solve the problem? It seems to me that the program wouldn't change since the contents of Mem should not change due to xc-write? (correct?) Sorry if I'm just being unhelpful. I don't know this code super well. Nate ___ m5-dev mailing list m5-dev@m5sim.org http://m5sim.org/mailman/listinfo/m5-dev -- === Korey L Sewell Computer Science Engineering University of Michigan ___ m5-dev mailing list m5-dev@m5sim.org http://m5sim.org/mailman/listinfo/m5-dev
Re: [m5-dev] [PATCH] CPU: Make the CPU wait until initiateAcc finishes before calling completeAcc
As far as moving around the code, that would be easier said than done. That's all generated and I don't know how easy it would be to get the isa_parser to generate something structured differently that would still make sense in all the other instructions of all the other ISAs. Steve? I was mentioning something like this earlier, but in arch/alpha/mem.isa the initiateAcc code is generated from the LoadStore::initiateAcc template. If you just give Store conditionals its own template for initiateAcc (like it already has for completeAcc), I think that will be a fix too (assuming you set the tracedata early or handle it some other way)... -- === Korey L Sewell Computer Science Engineering University of Michigan ___ m5-dev mailing list m5-dev@m5sim.org http://m5sim.org/mailman/listinfo/m5-dev
Re: [m5-dev] [PATCH] CPU: Make the CPU wait until initiateAcc finishes before calling completeAcc
Quoting Korey Sewell ksew...@umich.edu: So my original point (that obviously wasnt conveyed very well) was more trying to re-work how this STL_C case works rather than kind of patch it up so it works in the current framework. Seems to be me that completeAcc should never get called before initiateAcc in any circumstance. I know that shortcut is because you STL_C fails early, but even so, why cant it just finish like any other instruction. What benefit are we gettiing from early completing it (especially in a SimpleCPU)? Am I getting this fundamental issue? If I'm not getting that wrong, then we can just fix the SimpleCPU to not handle this special case in this fashion and all will be fine. So trying to play around with traceData in my opinion is the wrong way to solve it. If you are going to do that, Nate's idea of writing the traceData before the access sounds like it will work.But Gabe's way works too, so whatever works there is fine with me. Long term, I'm just thinking that if I call initiateAcc and there is a risk of completeAcc happening before it finishes seems like it breaks the interface that the CPU/INST should use. And then in the future, this is going to be a nasty bug that will trip somebody up (because completeAcc might do something that screws things up early). I agree with everything you've said here, except for one half thing and I've got a comment on one other. First, my change is actually doing what your saying. It's not mucking around with traceData, it's actually putting off calling completeDataAccess (which calls completeAcc) if we're in the middle of initiateAcc until we come out of it. That's the half. The other thing is the reason STC doesn't work like the other memory instructions. In those other cases where you're calling into memory, you're going to get a response some time later. When that happens, initiateAcc has since finished running and you can end up in completeAcc and be ok. When you don't have a delay in translation or when you go to memory, then there's no later point where you'll be doing anything for that instruction and you need to deal with it now. One completely different solution would be to do what you're saying and just not have the timing simple CPU work like it does. I think it's organized a lot like the atomic simple CPU because it used to be a part of it, but really a more natural organization would be as a state machine where activity was broken by hard lines into states. There might be a little more overhead that way, but it would make it much easier for non-experts to see what the heck is going on in there. It would also help the experts avoid introducing bugs because there are so many interactions to consider. Gabe On Tue, Apr 28, 2009 at 5:34 PM, nathan binkert n...@binkert.org wrote: What redirection? Or do you mean because traceData is protected in the simple CPU? You could, but then again, you could just make it public. Otherwise, I'm not sure what you're talking about. The execute function is generated for each cpu model separately and it knows the type. I'm talking about how there's the exec context, and then the proxy exec context, and the thread context, and the simple thread context, and the dyninst, and the etc. etc., and calls to, for instance, read a register can travel through several layers before actually being serviced. By making traceData a direct member of the very first exec context interface, we're precluding passing it back to other layers that may want some input. That's not to say I don't think it will work, because I actually think it would, but it seems inconsistent with how the rest of that interface goes together. I'm specifically talking about the parameter passed into the exec function which is either SimpleCPU or BaseDynInst. Those both have a traceData parameter and it can just be used directly without any fancy indirection. I am only talking about this code specifically. Also this still doesn't address the fact that, one, completeAcc is being called before initiateAcc has fully run, and two, that the trace information is being printed before initiateAcc has written all its results. Can't we just ensure that foo-traceData is set to NULL after it is deleted? Then the setData that's done in initiateAcc after the write won't complete. Also, looking at the code more closely, is there some reason taht we can't just move the setData line before the write? Wouldn't that solve the problem? It seems to me that the program wouldn't change since the contents of Mem should not change due to xc-write? (correct?) Sorry if I'm just being unhelpful. I don't know this code super well. Nate ___ m5-dev mailing list m5-dev@m5sim.org http://m5sim.org/mailman/listinfo/m5-dev -- === Korey L Sewell Computer Science Engineering University of Michigan
Re: [m5-dev] [PATCH] CPU: Make the CPU wait until initiateAcc finishes before calling completeAcc
I'm specifically talking about the parameter passed into the exec function which is either SimpleCPU or BaseDynInst. Those both have a traceData parameter and it can just be used directly without any fancy indirection. I am only talking about this code specifically. I'm confused now. If we're passing it in, then it's a local variable on the stack and we can't set it to NULL externally. Right, I don't like this. If it's a local member of the exec context, it's not transparently modifiable by deeper layers of the interface. What does this mean? Instead of doing traceData, you do xc-traceData. Then someone could set destroy xc-traceData and set it to NULL and we'd be OK. If it's a function call, we're paying a bunch of function call overhead. Which of these do (do not) apply? This does not apply. Also, looking at the code more closely, is there some reason taht we can't just move the setData line before the write? Wouldn't that solve the problem? It seems to me that the program wouldn't change since the contents of Mem should not change due to xc-write? (correct?) That would ideally fix the segfault, aka the symptom, but it doesn't fix the core issue. The second half of initiateAcc won't be able to participate in the trace since traceData is dumped right before being deleted. That portion will also actually run -after- completeAcc, but only sometimes. That sounds like a source for a lot of confusion and bugs (like this one) to me. As far as moving around the code, that would be easier said than done. That's all generated and I don't know how easy it would be to get the isa_parser to generate something structured differently that would still make sense in all the other instructions of all the other ISAs. Steve? Ok, whatever the correct solution, I think we should minimally do these things: 1) Remove the parameter, 2) change traceData to xc-traceData 3) and surround uses of traceData with #if TRACING_ON Nate ___ m5-dev mailing list m5-dev@m5sim.org http://m5sim.org/mailman/listinfo/m5-dev
Re: [m5-dev] [PATCH] CPU: Make the CPU wait until initiateAcc finishes before calling completeAcc
Quoting nathan binkert n...@binkert.org: I'm specifically talking about the parameter passed into the exec function which is either SimpleCPU or BaseDynInst. Those both have a traceData parameter and it can just be used directly without any fancy indirection. I am only talking about this code specifically. I'm confused now. If we're passing it in, then it's a local variable on the stack and we can't set it to NULL externally. Right, I don't like this. If it's a local member of the exec context, it's not transparently modifiable by deeper layers of the interface. What does this mean? Instead of doing traceData, you do xc-traceData. Then someone could set destroy xc-traceData and set it to NULL and we'd be OK. A parallel example would be getting a pointer to the TLBs. In that case, you don't just have a TLB pointer member on the exec context, you have a function called something like getDTBPtr() that percolates to the right place that actually has the current value. That's just how this stuff has been put together up to now, I assume for a reason, and we'd be diverging from that. I don't know of any calamitous problem that would introduce, but we'd be partially undermining the flexibility we've built in for other values. 3) and surround uses of traceData with #if TRACING_ON I meant to mention it before, but I think this bit is very doable and makes a lot of sense. Gabe ___ m5-dev mailing list m5-dev@m5sim.org http://m5sim.org/mailman/listinfo/m5-dev
Re: [m5-dev] [PATCH] CPU: Make the CPU wait until initiateAcc finishes before calling completeAcc
One completely different solution would be to do what you're saying and just not have the timing simple CPU work like it does. I think it's organized a lot like the atomic simple CPU because it used to be a part of it, but really a more natural organization would be as a state machine where activity was broken by hard lines into states. There might be a little more overhead that way, but it would make it much easier for non-experts to see what the heck is going on in there. It would also help the experts avoid introducing bugs because there are so many interactions to consider. This seems pretty reasonable to me. There are already states in the models to deal with cache misses and such, so adding another state for the different phases of a memory access seems reasonable. The impact probably wouldn't be huge given the fact that the cache is probably pretty expensive compared to the CPU when executing memory ops. Nate ___ m5-dev mailing list m5-dev@m5sim.org http://m5sim.org/mailman/listinfo/m5-dev
Re: [m5-dev] [PATCH] CPU: Make the CPU wait until initiateAcc finishes before calling completeAcc
So a tentative plan would be: 1. Commit my patch for now so the bug is fixed. 2. Add the #if TRACING_ON. 3.5 Hash out what to do with traceData 3.5 Reorganize the timing simple CPU. Does anyone object? I guess I'd rather not do #1 if it isn't the solution because things like that have a tendency to live on. Nate ___ m5-dev mailing list m5-dev@m5sim.org http://m5sim.org/mailman/listinfo/m5-dev
Re: [m5-dev] [PATCH] CPU: Make the CPU wait until initiateAcc finishes before calling completeAcc
Whew, I don't check my email for one afternoon and you guys make me pay for it... I'm still curious about the possibility of using an event to call the completeAcc() callback in the case where the translation fails on a store conditional. This solution seems particularly attractive to me if it's just a case of making this particular scenario follow the existing x86-tlb-miss path rather than the default path, in which case there shouldn't be much new code at all. Maybe it's a little more complicated than that if we have to import some of the x86-tlb-miss code into alpha, but it still doesn't seem that bad. I think Gabe misunderstood me; I'm not talking about doing this for all TLB hit responses, just this one specific case. There may well be a subtlety that makes this not as clean as I envision it, but to me it solves the root cause of the problem, which is that completeAcc() is getting called before initiateAcc() terminates, and does it in a way that doesn't clutter the logic in SimpleTimingCPU the way Gabe's current fix does. Adding in the #if TRACING_ON lines is a no-brainer but is really independent of this bug. I think moving the traceData variable into the exec context is not a bad idea (though not strongly compelling to me either), and does happen to have the side effect of addressing this particular bug. However I agree with Gabe that it's really a very limited solution and doesn't address the conceptual issue that caused this bug, which leaves us open to future similar bugs occurring. Maybe that will never happen, but it makes this a less compelling solution to the bug itself. But if we're going to do it anyway, it could work as a short-term fix for the bug. The SimpleTimingCPU is getting complex, but it's not obvious to me how adding more explicit states would clean it up significantly, particularly if we avoid adding Gabe's current patch (which does muddle up the control flow a bit more than it already is). I'm open to learning more about what people envision here though. Steve On Tue, Apr 28, 2009 at 4:05 PM, nathan binkert n...@binkert.org wrote: So a tentative plan would be: 1. Commit my patch for now so the bug is fixed. 2. Add the #if TRACING_ON. 3.5 Hash out what to do with traceData 3.5 Reorganize the timing simple CPU. Does anyone object? I guess I'd rather not do #1 if it isn't the solution because things like that have a tendency to live on. Nate ___ m5-dev mailing list m5-dev@m5sim.org http://m5sim.org/mailman/listinfo/m5-dev ___ m5-dev mailing list m5-dev@m5sim.org http://m5sim.org/mailman/listinfo/m5-dev
Re: [m5-dev] [PATCH] CPU: Make the CPU wait until initiateAcc finishes before calling completeAcc
Steve Reinhardt wrote: Whew, I don't check my email for one afternoon and you guys make me pay for it... I'm still curious about the possibility of using an event to call the completeAcc() callback in the case where the translation fails on a store conditional. This solution seems particularly attractive to me if it's just a case of making this particular scenario follow the existing x86-tlb-miss path rather than the default path, in which case there shouldn't be much new code at all. Maybe it's a little more complicated than that if we have to import some of the x86-tlb-miss code into alpha, but it still doesn't seem that bad. I think Gabe misunderstood me; I'm not talking about doing this for all TLB hit responses, just this one specific case. There may well be a subtlety that makes this not as clean as I envision it, but to me it solves the root cause of the problem, which is that completeAcc() is getting called before initiateAcc() terminates, and does it in a way that doesn't clutter the logic in SimpleTimingCPU the way Gabe's current fix does. Adding in the #if TRACING_ON lines is a no-brainer but is really independent of this bug. I think moving the traceData variable into the exec context is not a bad idea (though not strongly compelling to me either), and does happen to have the side effect of addressing this particular bug. However I agree with Gabe that it's really a very limited solution and doesn't address the conceptual issue that caused this bug, which leaves us open to future similar bugs occurring. Maybe that will never happen, but it makes this a less compelling solution to the bug itself. But if we're going to do it anyway, it could work as a short-term fix for the bug. The SimpleTimingCPU is getting complex, but it's not obvious to me how adding more explicit states would clean it up significantly, particularly if we avoid adding Gabe's current patch (which does muddle up the control flow a bit more than it already is). I'm open to learning more about what people envision here though. I'm not sure if it will be hugely better. I'd have to just do it and see how it comes out, I suppose. Wouldn't scheduling an event be similar to what I have, just using the event queue instead of the new packet pointer? I could imagine it being better, actually, so I'm tentatively convinced. There shouldn't be a need to move any x86 code into Alpha to make that work, but I wouldn't swear to that. Gabe ___ m5-dev mailing list m5-dev@m5sim.org http://m5sim.org/mailman/listinfo/m5-dev
Re: [m5-dev] [PATCH] CPU: Make the CPU wait until initiateAcc finishes before calling completeAcc
On Tue, Apr 28, 2009 at 4:43 PM, Gabe Black gbl...@eecs.umich.edu wrote: I'm not sure if it will be hugely better. I'd have to just do it and see how it comes out, I suppose. Wouldn't scheduling an event be similar to what I have, just using the event queue instead of the new packet pointer? Yea, at a certain level, the effect is similar--just deferring the callback until later--which is not bad, since that's what addresses the problem. I think that using the existing event queue mechanism (which is explicitly defined for deferring stuff until later) is less convoluted than adding a handful of new variables and control code to cover just this one specific situation. I could imagine it being better, actually, so I'm tentatively convinced. There shouldn't be a need to move any x86 code into Alpha to make that work, but I wouldn't swear to that. Thanks for giving it a try. Steve ___ m5-dev mailing list m5-dev@m5sim.org http://m5sim.org/mailman/listinfo/m5-dev
Re: [m5-dev] [PATCH] CPU: Make the CPU wait until initiateAcc finishes before calling completeAcc
Here is a hypothetical series of events that should hopefully illustrate what's going on. 1. The store conditional comes up for execution and its initiateAcc is called. 2. initiateAcc calls xc-write() to actually do the required store. 3. write calls into the TLB's translateTiming function to do the translation. 4. Alpha's translateTiming is just a wrapper around translateAtomic, so it calls the callback in place. 5. The callback in the CPU recognizes the store conditional failed and calls completeAcc to finish up the instruction. The callback also deletes extra data associated with the instruction including traceData. At this point, the call stack looks approximately like: initiateAcc-write-translateTiming-callback Note that initiateAcc hasn't been returned to so it hasn't had a chance to finish. Then: 6. The various functions return until we're back to initiateAcc. 7. initiateAcc uses its local (and now invalid) copy of the traceData pointer to store the result of the instruction executing. 8. BOOM segfault. What my patch does is set a flag right before calling initiateAcc. If that flag is still set once you get to the completeDataAccess function, which is ultimately where the instruction is finished, it doesn't actually do anything then. Instead it tucks away the arguements of the function and returns, allowing initiateAcc to finish. After it finishes, then it checks whether completeDataAccess was skipped, and if so calls it then. That way, completeDataAccess isn't called until after initiateAcc is done which is what instructions expect. The overhead I was talking about is from two sources. The first is that that flag is set and cleared every time initiateAcc is called. Second, completeDataAccess checks that flag every time it's called. These are not large overheads but they're on a hot path. Gabe Quoting Korey Sewell ksew...@umich.edu: Hi Gabe, I'm trying to figure out what's going and help find a potentially better solution (if there is one), since I'm operating under the assumption that adding a small overhead for the common case memory access to support the special case (stl_c) should be avoided if possible. (SIDEBAR: In looking at the code another pass, I should point out it looks like timing.hh/atomic.hh are in need of some comments on virtually all of the function declarations.) So anyway, I'll try to take another stab at helping out, but if nobody else thinks this is a big deal, then that's fine with me. === From your description of the problem Gabe, you are saying that when a STL_C gets called, the problems stem from the initiateAcc function which tries to call a member function on traceData which was deleted during the cleanup. The patch you posted seems to fiddle around with the completionPkt more than the traceData so like I said I'm trying to follow you and figure out why we cant make this a special case solution instead of a wrapper (w/the inInitiateAcc flag) on every access. The initiateAcc code is listed here: Fault Stl_c::initiateAcc(AtomicSimpleCPU *xc, Trace::InstRecord *traceData) const { if (fault == NoFault) { fault = xc-write((uint32_t)Mem, EA, memAccessFlags, NULL); if (traceData) { traceData-setData(Mem); } } return fault; } I think you are also saying that the xc-write() triggers a split TLB translation (sendSplitData) that eventually calls completeDataAccess and then completeAcc? Then, I think you are saying that the code falls back to the initiateAcc and tries: if (traceData) { traceData-setData(Mem); } Which potentially wont be valid since you deleted in the translation fault function (though I think that should get caught anyway). Is that the correct way to interpret your problem? What am I missing here? Thanks, Korey On Sun, Apr 26, 2009 at 4:32 PM, Gabe Black gbl...@eecs.umich.edu wrote: I should add, I realize it's not very clear what's going on, but it would take a lot more effort to explain everything in enough detail to get the problem across than for you (or anyone else) to look at the code. If you do look into the code, it's not a functional problem as far as how the timing CPU works or organizes events per se. It's a C++ problem where things may or may not end up being called from the middle of each other. Gabe Gabe Black wrote: You should read my description, read the code, and then once you understand the problem comment on my solution. Gabe Korey Sewell wrote: Yea, I was on that thread. Your explanation wasnt particularly in great detail there (files?, line #s?) so its hard to follow without getting your hand dirty on where this code is you are referring to. I'll comment and you can clarify my assumptions... Oooh. I see what's broken. This is a result of my changes to allow delaying
Re: [m5-dev] [PATCH] CPU: Make the CPU wait until initiateAcc finishes before calling completeAcc
initiateAcc looks something like the following: Fault initiateAcc(..., traceData,...) { ... write(); traceData becomes invalid in here if (traceData) traceData-setData(Mem); ... } so I'm not sure how that would work. Gabe Quoting nathan binkert n...@binkert.org: 7. initiateAcc uses its local (and now invalid) copy of the traceData pointer to store the result of the instruction executing. What if you just read this pointer again and not store a local copy? Nate ___ m5-dev mailing list m5-dev@m5sim.org http://m5sim.org/mailman/listinfo/m5-dev ___ m5-dev mailing list m5-dev@m5sim.org http://m5sim.org/mailman/listinfo/m5-dev
Re: [m5-dev] [PATCH] CPU: Make the CPU wait until initiateAcc finishes before calling completeAcc
initiateAcc looks something like the following: Fault initiateAcc(..., traceData,...) { ... write(); traceData becomes invalid in here if (traceData) traceData-setData(Mem); ... } so I'm not sure how that would work. Why is traceData passed in as a parameter and not just grabbed from whatever the first parameter of the function is (i.e. the dyn inst or the CPU object)? Nate ___ m5-dev mailing list m5-dev@m5sim.org http://m5sim.org/mailman/listinfo/m5-dev
Re: [m5-dev] [PATCH] CPU: Make the CPU wait until initiateAcc finishes before calling completeAcc
Quoting nathan binkert n...@binkert.org: initiateAcc looks something like the following: Fault initiateAcc(..., traceData,...) { ... write(); traceData becomes invalid in here if (traceData) traceData-setData(Mem); ... } so I'm not sure how that would work. Why is traceData passed in as a parameter and not just grabbed from whatever the first parameter of the function is (i.e. the dyn inst or the CPU object)? Nate ___ m5-dev mailing list m5-dev@m5sim.org http://m5sim.org/mailman/listinfo/m5-dev I don't know for sure, but I would guess that would be a lot less efficient and more complicated. traceData gets used for every output operand so you'd be calling through the exec context a lot, even when it's just going to return NULL. This is how the isa_parser has generated this code since I started working with it back with SPARC and probably before that. The real issue is that initiateAcc is getting the rug pulled out from under it before it finishes running, so even if it came from the exec context you're trace information would be printed and discarded prematurely. Gabe ___ m5-dev mailing list m5-dev@m5sim.org http://m5sim.org/mailman/listinfo/m5-dev
Re: [m5-dev] [PATCH] CPU: Make the CPU wait until initiateAcc finishes before calling completeAcc
This patch should fix the issue Geoff was having where tracing would cause a segfault. If you could please give it a try, Geoff, let me know if it actually solves the problem. This will add a small overhead for every memory instruction even though the old behavior is wrong only in certain circumstances. I considered trying to avoid that overhead in cases where the problem can't happen, but then tracing would change the behavior of the CPU and that seems like a bad idea. Assuming this actually fixes the bug and nobody objects I'll commit this soon. Gabe gbl...@eecs.umich.edu wrote: # HG changeset patch # User Gabe Black gbl...@eecs.umich.edu # Date 1240729422 25200 # Node ID 284ff3c27bbddacf5763e7b5a54e99e4dcd1912a # Parent 8652636856b3d24fb0088fb1af5f5dca5008d9c8 CPU: Make the CPU wait until initiateAcc finishes before calling completeAcc. diff --git a/src/cpu/simple/timing.cc b/src/cpu/simple/timing.cc --- a/src/cpu/simple/timing.cc +++ b/src/cpu/simple/timing.cc @@ -115,6 +115,8 @@ ifetch_pkt = dcache_pkt = NULL; drainEvent = NULL; previousTick = 0; +inInitiateAcc = false; +completionPkt = NULL; changeState(SimObject::Running); } @@ -758,7 +760,13 @@ if (curStaticInst curStaticInst-isMemRef() !curStaticInst-isDataPrefetch()) { // load or store: just send to dcache +inInitiateAcc = true; Fault fault = curStaticInst-initiateAcc(this, traceData); +inInitiateAcc = false; +if (completionPkt) { +completeDataAccess(completionPkt); +completionPkt = NULL; +} if (_status != Running) { // instruction will complete in dcache response callback assert(_status == DcacheWaitResponse || @@ -856,6 +864,10 @@ void TimingSimpleCPU::completeDataAccess(PacketPtr pkt) { +if (inInitiateAcc) { +completionPkt = pkt; +return; +} // received a response from the dcache: complete the load or store // instruction assert(!pkt-isError()); diff --git a/src/cpu/simple/timing.hh b/src/cpu/simple/timing.hh --- a/src/cpu/simple/timing.hh +++ b/src/cpu/simple/timing.hh @@ -317,6 +317,9 @@ Tick previousTick; +bool inInitiateAcc; +PacketPtr completionPkt; + public: virtual Port *getPort(const std::string if_name, int idx = -1); ___ m5-dev mailing list m5-dev@m5sim.org http://m5sim.org/mailman/listinfo/m5-dev ___ m5-dev mailing list m5-dev@m5sim.org http://m5sim.org/mailman/listinfo/m5-dev
Re: [m5-dev] [PATCH] CPU: Make the CPU wait until initiateAcc finishes before calling completeAcc
Can you explain how the completeAcc ever gets called before the initiatieAcc finishes in a SimpleCPU? I'm thinking, why isnt the SimpleCPU just stalling if there is a delay waiting for a memory access.If that's the right behavior, then isnt the solution to just stall the CPU (or even sleep it) and wait for initiateAcc to finish rather than try to work around the initiateAcc/completeAcc simultaneous call? I guess I dont understand fully the problem but it seems workable without a delay in overhead for all other memory accesses. Also, would this be a issue for any other CPU models (O3 or InOrder)? On Sun, Apr 26, 2009 at 3:22 AM, Gabe Black gbl...@eecs.umich.edu wrote: This patch should fix the issue Geoff was having where tracing would cause a segfault. If you could please give it a try, Geoff, let me know if it actually solves the problem. This will add a small overhead for every memory instruction even though the old behavior is wrong only in certain circumstances. I considered trying to avoid that overhead in cases where the problem can't happen, but then tracing would change the behavior of the CPU and that seems like a bad idea. Assuming this actually fixes the bug and nobody objects I'll commit this soon. Gabe gbl...@eecs.umich.edu wrote: # HG changeset patch # User Gabe Black gbl...@eecs.umich.edu # Date 1240729422 25200 # Node ID 284ff3c27bbddacf5763e7b5a54e99e4dcd1912a # Parent 8652636856b3d24fb0088fb1af5f5dca5008d9c8 CPU: Make the CPU wait until initiateAcc finishes before calling completeAcc. diff --git a/src/cpu/simple/timing.cc b/src/cpu/simple/timing.cc --- a/src/cpu/simple/timing.cc +++ b/src/cpu/simple/timing.cc @@ -115,6 +115,8 @@ ifetch_pkt = dcache_pkt = NULL; drainEvent = NULL; previousTick = 0; +inInitiateAcc = false; +completionPkt = NULL; changeState(SimObject::Running); } @@ -758,7 +760,13 @@ if (curStaticInst curStaticInst-isMemRef() !curStaticInst-isDataPrefetch()) { // load or store: just send to dcache +inInitiateAcc = true; Fault fault = curStaticInst-initiateAcc(this, traceData); +inInitiateAcc = false; +if (completionPkt) { +completeDataAccess(completionPkt); +completionPkt = NULL; +} if (_status != Running) { // instruction will complete in dcache response callback assert(_status == DcacheWaitResponse || @@ -856,6 +864,10 @@ void TimingSimpleCPU::completeDataAccess(PacketPtr pkt) { +if (inInitiateAcc) { +completionPkt = pkt; +return; +} // received a response from the dcache: complete the load or store // instruction assert(!pkt-isError()); diff --git a/src/cpu/simple/timing.hh b/src/cpu/simple/timing.hh --- a/src/cpu/simple/timing.hh +++ b/src/cpu/simple/timing.hh @@ -317,6 +317,9 @@ Tick previousTick; +bool inInitiateAcc; +PacketPtr completionPkt; + public: virtual Port *getPort(const std::string if_name, int idx = -1); ___ m5-dev mailing list m5-dev@m5sim.org http://m5sim.org/mailman/listinfo/m5-dev ___ m5-dev mailing list m5-dev@m5sim.org http://m5sim.org/mailman/listinfo/m5-dev -- -- Korey L Sewell Graduate Student - PhD Candidate Computer Science Engineering University of Michigan ___ m5-dev mailing list m5-dev@m5sim.org http://m5sim.org/mailman/listinfo/m5-dev
Re: [m5-dev] [PATCH] CPU: Make the CPU wait until initiateAcc finishes before calling completeAcc
There's an explanation already in the thread with the subject Memory corruption inm5devrepositorywhenusing --trace-flags=ExecEnable. Stalling or sleeping doesn't make sense in this situation since it's not that sort of problem. Gabe Korey Sewell wrote: Can you explain how the completeAcc ever gets called before the initiatieAcc finishes in a SimpleCPU? I'm thinking, why isnt the SimpleCPU just stalling if there is a delay waiting for a memory access.If that's the right behavior, then isnt the solution to just stall the CPU (or even sleep it) and wait for initiateAcc to finish rather than try to work around the initiateAcc/completeAcc simultaneous call? I guess I dont understand fully the problem but it seems workable without a delay in overhead for all other memory accesses. Also, would this be a issue for any other CPU models (O3 or InOrder)? On Sun, Apr 26, 2009 at 3:22 AM, Gabe Black gbl...@eecs.umich.edu mailto:gbl...@eecs.umich.edu wrote: This patch should fix the issue Geoff was having where tracing would cause a segfault. If you could please give it a try, Geoff, let me know if it actually solves the problem. This will add a small overhead for every memory instruction even though the old behavior is wrong only in certain circumstances. I considered trying to avoid that overhead in cases where the problem can't happen, but then tracing would change the behavior of the CPU and that seems like a bad idea. Assuming this actually fixes the bug and nobody objects I'll commit this soon. Gabe gbl...@eecs.umich.edu mailto:gbl...@eecs.umich.edu wrote: # HG changeset patch # User Gabe Black gbl...@eecs.umich.edu mailto:gbl...@eecs.umich.edu # Date 1240729422 25200 # Node ID 284ff3c27bbddacf5763e7b5a54e99e4dcd1912a # Parent 8652636856b3d24fb0088fb1af5f5dca5008d9c8 CPU: Make the CPU wait until initiateAcc finishes before calling completeAcc. diff --git a/src/cpu/simple/timing.cc b/src/cpu/simple/timing.cc --- a/src/cpu/simple/timing.cc +++ b/src/cpu/simple/timing.cc @@ -115,6 +115,8 @@ ifetch_pkt = dcache_pkt = NULL; drainEvent = NULL; previousTick = 0; +inInitiateAcc = false; +completionPkt = NULL; changeState(SimObject::Running); } @@ -758,7 +760,13 @@ if (curStaticInst curStaticInst-isMemRef() !curStaticInst-isDataPrefetch()) { // load or store: just send to dcache +inInitiateAcc = true; Fault fault = curStaticInst-initiateAcc(this, traceData); +inInitiateAcc = false; +if (completionPkt) { +completeDataAccess(completionPkt); +completionPkt = NULL; +} if (_status != Running) { // instruction will complete in dcache response callback assert(_status == DcacheWaitResponse || @@ -856,6 +864,10 @@ void TimingSimpleCPU::completeDataAccess(PacketPtr pkt) { +if (inInitiateAcc) { +completionPkt = pkt; +return; +} // received a response from the dcache: complete the load or store // instruction assert(!pkt-isError()); diff --git a/src/cpu/simple/timing.hh b/src/cpu/simple/timing.hh --- a/src/cpu/simple/timing.hh +++ b/src/cpu/simple/timing.hh @@ -317,6 +317,9 @@ Tick previousTick; +bool inInitiateAcc; +PacketPtr completionPkt; + public: virtual Port *getPort(const std::string if_name, int idx = -1); ___ m5-dev mailing list m5-dev@m5sim.org mailto:m5-dev@m5sim.org http://m5sim.org/mailman/listinfo/m5-dev ___ m5-dev mailing list m5-dev@m5sim.org mailto:m5-dev@m5sim.org http://m5sim.org/mailman/listinfo/m5-dev -- -- Korey L Sewell Graduate Student - PhD Candidate Computer Science Engineering University of Michigan ___ m5-dev mailing list m5-dev@m5sim.org http://m5sim.org/mailman/listinfo/m5-dev ___ m5-dev mailing list m5-dev@m5sim.org http://m5sim.org/mailman/listinfo/m5-dev
Re: [m5-dev] [PATCH] CPU: Make the CPU wait until initiateAcc finishes before calling completeAcc
Yea, I was on that thread. Your explanation wasnt particularly in great detail there (files?, line #s?) so its hard to follow without getting your hand dirty on where this code is you are referring to. I'll comment and you can clarify my assumptions... Oooh. I see what's broken. This is a result of my changes to allow delaying translation. What happens is that Stl_c goes into initiateAcc. That function calls write on the CPU which calls into the TLB which calls the translation callback which recognizes a failed store conditional which completes the instruction execution with completeAcc and cleans up. The call stack then collapses back to the initiateAcc which is still waiting to finish and which then tries to call a member function on traceData which was deleted during the cleanup. The problem here is not fundamentally complicated, but the mechanisms involved are. One solution would be to record the fact that we're still in initiateAcc, and if we are wait for the call stack to collapse back down to initiateAcc's caller before calling into completeAcc. That matches the semantics an instruction would expect more, I think, where the initiateAcc/completeAcc pair are called sequentially. A couple of thoughts come to mind: 1. If the problem is with traceData and specifially just with store conditionals, what prevents you from a solution that handles that is a small overhead on that case instead of on every memory access? Particularly, why not add a StoreCondInitiateAcc template in mem.isa that handles this special case? In there after the write to the CPU, you can maybe check to see if you want to mess with traceData stuff or not depending on if the conditional is already failed or whatever. 2. If the traceData is deleted, then why is it available to be accessed in initiateAcc anyway? Isnt it enclosed by a if (traceData)? Shouldnt that return false if that is a NULL pointer? 3. Lastly, it sounds like this could be a slightly good candidate case for having the ability to functionally split TLB access from the timing memory access since in this case you want to do the access depending on something that happens with the TLB... On Sun, Apr 26, 2009 at 2:43 PM, Gabe Black gbl...@eecs.umich.edu wrote: There's an explanation already in the thread with the subject Memory corruption inm5devrepositorywhenusing --trace-flags=ExecEnable. Stalling or sleeping doesn't make sense in this situation since it's not that sort of problem. Gabe Korey Sewell wrote: Can you explain how the completeAcc ever gets called before the initiatieAcc finishes in a SimpleCPU? I'm thinking, why isnt the SimpleCPU just stalling if there is a delay waiting for a memory access.If that's the right behavior, then isnt the solution to just stall the CPU (or even sleep it) and wait for initiateAcc to finish rather than try to work around the initiateAcc/completeAcc simultaneous call? I guess I dont understand fully the problem but it seems workable without a delay in overhead for all other memory accesses. Also, would this be a issue for any other CPU models (O3 or InOrder)? On Sun, Apr 26, 2009 at 3:22 AM, Gabe Black gbl...@eecs.umich.edu mailto:gbl...@eecs.umich.edu wrote: This patch should fix the issue Geoff was having where tracing would cause a segfault. If you could please give it a try, Geoff, let me know if it actually solves the problem. This will add a small overhead for every memory instruction even though the old behavior is wrong only in certain circumstances. I considered trying to avoid that overhead in cases where the problem can't happen, but then tracing would change the behavior of the CPU and that seems like a bad idea. Assuming this actually fixes the bug and nobody objects I'll commit this soon. Gabe gbl...@eecs.umich.edu mailto:gbl...@eecs.umich.edu wrote: # HG changeset patch # User Gabe Black gbl...@eecs.umich.edu mailto:gbl...@eecs.umich.edu # Date 1240729422 25200 # Node ID 284ff3c27bbddacf5763e7b5a54e99e4dcd1912a # Parent 8652636856b3d24fb0088fb1af5f5dca5008d9c8 CPU: Make the CPU wait until initiateAcc finishes before calling completeAcc. diff --git a/src/cpu/simple/timing.cc b/src/cpu/simple/timing.cc --- a/src/cpu/simple/timing.cc +++ b/src/cpu/simple/timing.cc @@ -115,6 +115,8 @@ ifetch_pkt = dcache_pkt = NULL; drainEvent = NULL; previousTick = 0; +inInitiateAcc = false; +completionPkt = NULL; changeState(SimObject::Running); } @@ -758,7 +760,13 @@ if (curStaticInst curStaticInst-isMemRef() !curStaticInst-isDataPrefetch()) { // load or store: just send to dcache +inInitiateAcc = true; Fault fault =
Re: [m5-dev] [PATCH] CPU: Make the CPU wait until initiateAcc finishes before calling completeAcc
You should read my description, read the code, and then once you understand the problem comment on my solution. Gabe Korey Sewell wrote: Yea, I was on that thread. Your explanation wasnt particularly in great detail there (files?, line #s?) so its hard to follow without getting your hand dirty on where this code is you are referring to. I'll comment and you can clarify my assumptions... Oooh. I see what's broken. This is a result of my changes to allow delaying translation. What happens is that Stl_c goes into initiateAcc. That function calls write on the CPU which calls into the TLB which calls the translation callback which recognizes a failed store conditional which completes the instruction execution with completeAcc and cleans up. The call stack then collapses back to the initiateAcc which is still waiting to finish and which then tries to call a member function on traceData which was deleted during the cleanup. The problem here is not fundamentally complicated, but the mechanisms involved are. One solution would be to record the fact that we're still in initiateAcc, and if we are wait for the call stack to collapse back down to initiateAcc's caller before calling into completeAcc. That matches the semantics an instruction would expect more, I think, where the initiateAcc/completeAcc pair are called sequentially. A couple of thoughts come to mind: 1. If the problem is with traceData and specifially just with store conditionals, what prevents you from a solution that handles that is a small overhead on that case instead of on every memory access? Particularly, why not add a StoreCondInitiateAcc template in mem.isa that handles this special case? In there after the write to the CPU, you can maybe check to see if you want to mess with traceData stuff or not depending on if the conditional is already failed or whatever. 2. If the traceData is deleted, then why is it available to be accessed in initiateAcc anyway? Isnt it enclosed by a if (traceData)? Shouldnt that return false if that is a NULL pointer? 3. Lastly, it sounds like this could be a slightly good candidate case for having the ability to functionally split TLB access from the timing memory access since in this case you want to do the access depending on something that happens with the TLB... On Sun, Apr 26, 2009 at 2:43 PM, Gabe Black gbl...@eecs.umich.edu mailto:gbl...@eecs.umich.edu wrote: There's an explanation already in the thread with the subject Memory corruption inm5devrepositorywhenusing --trace-flags=ExecEnable. Stalling or sleeping doesn't make sense in this situation since it's not that sort of problem. Gabe Korey Sewell wrote: Can you explain how the completeAcc ever gets called before the initiatieAcc finishes in a SimpleCPU? I'm thinking, why isnt the SimpleCPU just stalling if there is a delay waiting for a memory access.If that's the right behavior, then isnt the solution to just stall the CPU (or even sleep it) and wait for initiateAcc to finish rather than try to work around the initiateAcc/completeAcc simultaneous call? I guess I dont understand fully the problem but it seems workable without a delay in overhead for all other memory accesses. Also, would this be a issue for any other CPU models (O3 or InOrder)? On Sun, Apr 26, 2009 at 3:22 AM, Gabe Black gbl...@eecs.umich.edu mailto:gbl...@eecs.umich.edu mailto:gbl...@eecs.umich.edu mailto:gbl...@eecs.umich.edu wrote: This patch should fix the issue Geoff was having where tracing would cause a segfault. If you could please give it a try, Geoff, let me know if it actually solves the problem. This will add a small overhead for every memory instruction even though the old behavior is wrong only in certain circumstances. I considered trying to avoid that overhead in cases where the problem can't happen, but then tracing would change the behavior of the CPU and that seems like a bad idea. Assuming this actually fixes the bug and nobody objects I'll commit this soon. Gabe gbl...@eecs.umich.edu mailto:gbl...@eecs.umich.edu mailto:gbl...@eecs.umich.edu mailto:gbl...@eecs.umich.edu wrote: # HG changeset patch # User Gabe Black gbl...@eecs.umich.edu mailto:gbl...@eecs.umich.edu mailto:gbl...@eecs.umich.edu mailto:gbl...@eecs.umich.edu # Date 1240729422 25200 # Node ID 284ff3c27bbddacf5763e7b5a54e99e4dcd1912a # Parent 8652636856b3d24fb0088fb1af5f5dca5008d9c8 CPU: Make the CPU wait until initiateAcc finishes before calling completeAcc. diff --git a/src/cpu/simple/timing.cc
Re: [m5-dev] [PATCH] CPU: Make the CPU wait until initiateAcc finishes before calling completeAcc
I should add, I realize it's not very clear what's going on, but it would take a lot more effort to explain everything in enough detail to get the problem across than for you (or anyone else) to look at the code. If you do look into the code, it's not a functional problem as far as how the timing CPU works or organizes events per se. It's a C++ problem where things may or may not end up being called from the middle of each other. Gabe Gabe Black wrote: You should read my description, read the code, and then once you understand the problem comment on my solution. Gabe Korey Sewell wrote: Yea, I was on that thread. Your explanation wasnt particularly in great detail there (files?, line #s?) so its hard to follow without getting your hand dirty on where this code is you are referring to. I'll comment and you can clarify my assumptions... Oooh. I see what's broken. This is a result of my changes to allow delaying translation. What happens is that Stl_c goes into initiateAcc. That function calls write on the CPU which calls into the TLB which calls the translation callback which recognizes a failed store conditional which completes the instruction execution with completeAcc and cleans up. The call stack then collapses back to the initiateAcc which is still waiting to finish and which then tries to call a member function on traceData which was deleted during the cleanup. The problem here is not fundamentally complicated, but the mechanisms involved are. One solution would be to record the fact that we're still in initiateAcc, and if we are wait for the call stack to collapse back down to initiateAcc's caller before calling into completeAcc. That matches the semantics an instruction would expect more, I think, where the initiateAcc/completeAcc pair are called sequentially. A couple of thoughts come to mind: 1. If the problem is with traceData and specifially just with store conditionals, what prevents you from a solution that handles that is a small overhead on that case instead of on every memory access? Particularly, why not add a StoreCondInitiateAcc template in mem.isa that handles this special case? In there after the write to the CPU, you can maybe check to see if you want to mess with traceData stuff or not depending on if the conditional is already failed or whatever. 2. If the traceData is deleted, then why is it available to be accessed in initiateAcc anyway? Isnt it enclosed by a if (traceData)? Shouldnt that return false if that is a NULL pointer? 3. Lastly, it sounds like this could be a slightly good candidate case for having the ability to functionally split TLB access from the timing memory access since in this case you want to do the access depending on something that happens with the TLB... On Sun, Apr 26, 2009 at 2:43 PM, Gabe Black gbl...@eecs.umich.edu mailto:gbl...@eecs.umich.edu wrote: There's an explanation already in the thread with the subject Memory corruption inm5devrepositorywhenusing --trace-flags=ExecEnable. Stalling or sleeping doesn't make sense in this situation since it's not that sort of problem. Gabe Korey Sewell wrote: Can you explain how the completeAcc ever gets called before the initiatieAcc finishes in a SimpleCPU? I'm thinking, why isnt the SimpleCPU just stalling if there is a delay waiting for a memory access.If that's the right behavior, then isnt the solution to just stall the CPU (or even sleep it) and wait for initiateAcc to finish rather than try to work around the initiateAcc/completeAcc simultaneous call? I guess I dont understand fully the problem but it seems workable without a delay in overhead for all other memory accesses. Also, would this be a issue for any other CPU models (O3 or InOrder)? On Sun, Apr 26, 2009 at 3:22 AM, Gabe Black gbl...@eecs.umich.edu mailto:gbl...@eecs.umich.edu mailto:gbl...@eecs.umich.edu mailto:gbl...@eecs.umich.edu wrote: This patch should fix the issue Geoff was having where tracing would cause a segfault. If you could please give it a try, Geoff, let me know if it actually solves the problem. This will add a small overhead for every memory instruction even though the old behavior is wrong only in certain circumstances. I considered trying to avoid that overhead in cases where the problem can't happen, but then tracing would change the behavior of the CPU and that seems like a bad idea. Assuming this actually fixes the bug and nobody objects I'll commit this soon. Gabe gbl...@eecs.umich.edu mailto:gbl...@eecs.umich.edu mailto:gbl...@eecs.umich.edu mailto:gbl...@eecs.umich.edu wrote: # HG changeset patch #