Anybody?

Gabe Black wrote:
> If nobody has anything to say about the issue itself, letting me know which part of my ramblings is the least comprehensible would also be helpful.
>
> Gabe
>
> Gabriel Michael Black wrote:
>> My little diagram was missing a few "*"s. Here's a corrected version. The "*"s after Exec and completeAcc are for faults that would happen on the way back to PreInst.
>>
>> Gabe
>>
>> (PreInst)-*->(Fetch)/--------------------------->\/-(Exec)*
>>                     \--->(ITLB)-*|->(I cache)-|->/\-(initiateAcc)-*->
>>                                                      (DTLB)-*|->(D cache)-|->(completeAcc)*
>>
>> Quoting Gabriel Michael Black <gbl...@eecs.umich.edu>:
>>> Quoting Steve Reinhardt <ste...@gmail.com>:
>>>> On Sun, May 3, 2009 at 12:09 AM, Gabe Black <gbl...@eecs.umich.edu> wrote:
>>>>> While this does avoid the segfault, it also causes some other bug which crashes just about any of the simple timing regressions. I hadn't actually tried any of the quick regressions when I sent that out, since my other testing had tricked me into thinking everything was fine. I think it has something to do with faulting accesses not dealing with the fault right away and instead continuing into the remainder of completeIfetch. Rather than try to bandaid this into working, I think I'll just go for it and see what reorganizing the code buys me.
>>>>
>>>> It seems like anything that uses the timing-mode translation would have to be prepared to not know whether a translation succeeds or not until a later event is scheduled.... are you saying that this change exposes a fundamental problem in the structure of the simple timing CPU with regard to how it deals with timing-mode translation? That's what it sounds like to me, but I just wanted to clarify.
>>>>
>>>> Thanks,
>>>>
>>>> Steve
>>>
>>> Fundamental is probably too strong a word. Ingrained is probably better. The simple timing CPU is now pretty different from how it started life, and I think that shows a bit. It's been split off of atomic, and has gained delayable translation, microcode, unaligned accesses, variable instruction sizes, memory mapped registers, and there may be other things I'm forgetting. Those things have been folded in and are working, but I think a lot of the complexity comes from the code not having been designed to accommodate them originally.
>>>
>>> This is actually a good opportunity to discuss how the timing CPU is put together and what I'd like to do with it. To start, this is what the lifetime of an instruction looks like. Points where the flow may be delayed using the event queue are marked with "|". Points where the flow may be halted by a fault are marked with a "*". This will probably also look like garbage without fixed width fonts.
>>>
>>> (PreInst)-*->(Fetch)/--------------------------->\/-(Exec)
>>>                     \--->(ITLB)-*|->(I cache)-|->/\-(initiateAcc)--->
>>>                                                      (DTLB)-*|->(D cache)-|->(completeAcc)
>>>
>>> The problem we started with comes from initiateAcc going directly into the DTLB portion without finishing first. Generally, we can run into problems where we go through this process avoiding all the "|"s, or by coincidence not delaying on them, and get farther and farther ahead of ourselves and/or build up a deeper and deeper pile of cruft on the stack.
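
As a rough illustration of the stack problem described above, here is a toy, self-contained C++ sketch of the call-the-next-stage-directly pattern. The names are made up; this is not the actual m5 SimpleTimingCPU code. When nothing ever defers to the event queue, every instruction leaves more frames on the call stack.

    // Toy illustration only -- hypothetical names, not the m5 code.  Each
    // stage calls the next one directly when no delay is needed, so nothing
    // ever returns until the "macroop" finally stops looping, and the call
    // stack grows one fetch/execute pair per instruction.

    #include <cstdio>

    struct ToyCpu {
        int depth = 0;          // live fetch frames at the deepest point
        int instsLeft = 10000;  // raise this and the stack eventually overflows

        bool delayNeeded() { return false; }    // pretend nothing ever waits

        void fetch() {
            ++depth;
            if (delayNeeded())
                return;         // would schedule an event and unwind here
            execute();          // otherwise dive straight into the next stage
        }

        void execute() {
            if (--instsLeft > 0)
                fetch();        // loop back without ever unwinding first
        }
    };

    int main() {
        ToyCpu cpu;
        cpu.fetch();
        std::printf("frames piled up before unwinding: %d\n", cpu.depth);
        return 0;
    }
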
>>> If a macroop is implemented, for some reason, to loop around and around inside itself waiting for, say, an interrupt to happen, all of the "|"s would be skipped and the call stack would build until it overflowed. What I would like to do, then, is structure the code so that calls never venture too far from their origin and return home before starting the next task.
>>>
>>> To get there, there are several types of control flow to consider:
>>> 1. The end of the instruction, where control loops back to PreInst (which checks for interrupts and PC related events).
>>> 2. A fault, which is invoked and returns to PreInst.
>>> 3. A potential delay which doesn't happen, which needs to fall back into the flow so that it can continue to the next step.
>>> 4. A potential delay which -does- happen, which needs to fall back into the flow and then fall out of it so that the delay can happen in the event queue.
>>> 5. The flow being continued because whatever the CPU was waiting for has happened.
>>>
>>> As I said, the way this works now is that each step calls the next if it should happen immediately, and otherwise the callback after the delay starts things up again. That has the nice properties of localizing a lot of things to the point where they're relevant, like checking for interrupts, and letting the different pieces be started whenever is convenient. I've talked about the problems at length.
>>>
>>> Instead of having every part call the following parts, what I'd like to do is have a function which can be stopped and started at will and which calls all the component operations as child peers. Unfortunately, it's been really difficult coming up with something that can do this efficiently and which provides a mechanism for all the possible forms of control flow I listed above.
>>>
>>> One idea I had was to set up a switch statement where each phase of the execution flow was a case. Cases would not have breaks between them, so that if execution should continue it would flow right into the next. The individual phases could be jumped to directly to allow restarting things after some sort of delay.
>>>
>>> There are three major problems with this approach, though. First, the execution flow as shown is not linear, so it can't be implemented directly as a single chain of steps with no control flow. Second, it pulls decisions away from where they'd be made locally, e.g. checking whether to actually do a fetch for whatever reason away from where the fetch would start. Third, it provides no easy way to stop in the middle of things to handle a fault without constantly checking whether there's one to deal with.
>>>
>>> In order to allow faults I was thinking of some sort of try/catch mechanism, but that just seems ugly.
>>>
>>> The point of all this is, I think the way the CPU is built is broken as a result of significant feature creep. I think conceptually my way is better, but I'm having a hard time figuring out how to actually implement it without it turning into a big ugly mess. If anybody has a suggestion for how to make this work, please let me know.
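
For concreteness, here is a minimal sketch of the fall-through-switch idea described above, with made-up names (ToyTimingCpu, Phase, advance, and the stubbed helpers); it is not the SimpleTimingCPU and it glosses over the non-linear load/store path. Each case either falls into the next or returns after arranging for advance() to be re-entered at the right phase from the event queue, and the explicit faulted() check shows the third objection in action.

    // Hedged sketch of one possible shape for a restartable, fall-through
    // driver.  All names are hypothetical.

    enum class Phase { PreInst, Fetch, Execute, Done };

    class ToyTimingCpu {
      public:
        // Entered once at the start and re-entered by event callbacks, which
        // pass in the phase to resume at.
        void advance(Phase resumeAt) {
            switch (resumeAt) {
              case Phase::PreInst:
                checkForInterrupts();
                // fall through
              case Phase::Fetch:
                if (!tryFetchNow())      // ITLB or I-cache needs time: the
                    return;              // completion event will later call
                                         // advance(Phase::Execute)
                // fall through
              case Phase::Execute:
                if (faulted()) {         // stop mid-flow, handle the fault,
                    invokeFault();       // and restart cleanly at PreInst
                    scheduleNextInst();  // via the event queue
                    return;
                }
                executeInst();
                // fall through
              case Phase::Done:
                scheduleNextInst();      // bounce through the event queue back
                break;                   // to PreInst instead of recursing here
            }
        }

      private:
        bool tryFetchNow()        { return true; }   // stubs just for the sketch
        bool faulted()            { return false; }
        void checkForInterrupts() {}
        void invokeFault()        {}
        void executeInst()        {}
        void scheduleNextInst()   {}
    };

Even in this toy form the three objections are visible: the memory path would need a second, non-linear arm; the decision about whether to fetch at all has moved up into advance(); and every fault-prone phase ends up carrying its own faulted() check.
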
>>>
>>> Gabe
_______________________________________________
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev