Anybody?

Gabe Black wrote:
> If nobody has anything to say about the issue itself, letting me know which part of my ramblings is the least comprehensible would also be helpful.
>
> Gabe
>
> Gabriel Michael Black wrote:
>> My little diagram was missing a few "*"s. Here's a corrected version. The "*"s after Exec and completeAcc are for faults that would happen on the way back to PreInst.
>>
>> Gabe
>>
>> (PreInst)-*->(Fetch)/--------------------------->\/-(Exec)*
>>                     \--->(ITLB)-*|->(I cache)-|->/\-(initiateAcc)-*->
>>                                                      (DTLB)-*|->(D cache)-|->(completeAcc)*
>>
>> Quoting Gabriel Michael Black <gbl...@eecs.umich.edu>:
>>> Quoting Steve Reinhardt <ste...@gmail.com>:
>>>> On Sun, May 3, 2009 at 12:09 AM, Gabe Black <gbl...@eecs.umich.edu> wrote:
>>>>> While this does avoid the segfault, it also causes some other bug which crashes just about any of the simple timing regressions. I hadn't actually tried any of the quick regressions when I sent that out, since my other testing had tricked me into thinking everything was fine. I think it has something to do with faulting accesses not dealing with the fault right away and instead continuing into the remainder of completeIfetch. Rather than try to bandaid this into working, I think I'll just go for it and see what reorganizing the code buys me.
>>>>
>>>> It seems like anything that uses the timing-mode translation would have to be prepared to not know whether a translation succeeds or not until a later event is scheduled.... are you saying that this change exposes a fundamental problem in the structure of the simple timing CPU with regard to how it deals with timing-mode translation? That's what it sounds like to me, but I just wanted to clarify.
>>>>
>>>> Thanks,
>>>>
>>>> Steve
>>>
>>> Fundamental is probably too strong a word. Ingrained is probably better. The simple timing CPU is now pretty different from how it started life, and I think that shows a bit. It's been split off of atomic, and has gained delayable translation, microcode, unaligned accesses, variable instruction sizes, memory mapped registers, and there may be other things I'm forgetting. Those things have been folded in and are working, but I think a lot of the complexity comes from the code not having been designed to accommodate them originally.
>>>
>>> This is actually a good opportunity to discuss how the timing CPU is put together and what I'd like to do with it. To start, this is what the lifetime of an instruction looks like. Points where the flow may be delayed using the event queue are marked with "|". Points where the flow may be halted by a fault are marked with a "*". This will probably also look like garbage without fixed width fonts.
>>>
>>> (PreInst)-*->(Fetch)/--------------------------->\/-(Exec)
>>>                     \--->(ITLB)-*|->(I cache)-|->/\-(initiateAcc)--->
>>>                                                      (DTLB)-*|->(D cache)-|->(completeAcc)
>>>
>>> The problem we started with comes from initiateAcc going directly into the DTLB portion without finishing first. Generally, we can run into problems where we go through this process avoiding all the "|"s, or by coincidence not delaying on them, and get farther and farther ahead of ourselves and/or build up a deeper and deeper pile of cruft on the stack.
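
As a rough illustration of the stack problem described above, here is a toy, self-contained C++ sketch of the call-the-next-stage-directly pattern. The names are made up; this is not the actual m5 SimpleTimingCPU code. When nothing ever defers to the event queue, every instruction leaves more frames on the call stack.

    // Toy illustration only -- hypothetical names, not the m5 code.  Each
    // stage calls the next one directly when no delay is needed, so nothing
    // ever returns until the "macroop" finally stops looping, and the call
    // stack grows one fetch/execute pair per instruction.

    #include <cstdio>

    struct ToyCpu {
        int depth = 0;          // live fetch frames at the deepest point
        int instsLeft = 10000;  // raise this and the stack eventually overflows

        bool delayNeeded() { return false; }    // pretend nothing ever waits

        void fetch() {
            ++depth;
            if (delayNeeded())
                return;         // would schedule an event and unwind here
            execute();          // otherwise dive straight into the next stage
        }

        void execute() {
            if (--instsLeft > 0)
                fetch();        // loop back without ever unwinding first
        }
    };

    int main() {
        ToyCpu cpu;
        cpu.fetch();
        std::printf("frames piled up before unwinding: %d\n", cpu.depth);
        return 0;
    }
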
>>> If a macroop is implemented, for some reason, to loop around and around inside itself waiting for, say, an interrupt to happen, all of the "|"s would be skipped and the call stack would build until it overflowed. What I would like to do, then, is structure the code so that calls never venture too far from their origin and return home before starting the next task.
>>>
>>> To get there, there are several types of control flow to consider:
>>> 1. The end of the instruction, where control loops back to PreInst (which checks for interrupts and PC related events).
>>> 2. A fault, which is invoked and returns to PreInst.
>>> 3. A potential delay which doesn't happen, which needs to fall back into the flow so that it can continue to the next step.
>>> 4. A potential delay which -does- happen, which needs to fall back into the flow and then fall out of it so that the delay can happen in the event queue.
>>> 5. The flow being continued because whatever the CPU was waiting for has happened.
>>>
>>> As I said, the way this works now is that each step calls the next if it should happen immediately, and otherwise the callback after the delay starts things up again. That has the nice properties of localizing a lot of things to the point where they're relevant, like checking for interrupts, and letting the different pieces be started whenever is convenient. I've talked about the problems at length.
>>>
>>> Instead of having every part call the following parts, what I'd like to do is have a function which can be stopped and started at will and which calls all the component operations as child peers. Unfortunately, it's been really difficult coming up with something that can do this efficiently and which provides a mechanism for all the possible forms of control flow I listed above.
>>>
>>> One idea I had was to set up a switch statement where each phase of the execution flow was a case. Cases would not have breaks between them, so that if execution should continue it would flow right into the next. The individual phases could be jumped to directly to allow restarting things after some sort of delay.
>>>
>>> There are three major problems with this approach, though. First, the execution flow as shown is not linear, so it can't be implemented directly as a single chain of steps with no control flow. Second, it pulls decisions away from where they'd be made locally, e.g. checking whether to actually do a fetch for whatever reason away from where the fetch would start. Third, it provides no easy way to stop in the middle of things to handle a fault without constantly checking whether there's one to deal with.
>>>
>>> In order to allow faults I was thinking of some sort of try/catch mechanism, but that just seems ugly.
>>>
>>> The point of all this is, I think the way the CPU is built is broken as a result of significant feature creep. I think conceptually my way is better, but I'm having a hard time figuring out how to actually implement it without it turning into a big ugly mess. If anybody has a suggestion for how to make this work, please let me know.
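
For concreteness, here is a minimal sketch of the fall-through-switch idea described above, with made-up names (ToyTimingCpu, Phase, advance, and the stubbed helpers); it is not the SimpleTimingCPU and it glosses over the non-linear load/store path. Each case either falls into the next or returns after arranging for advance() to be re-entered at the right phase from the event queue, and the explicit faulted() check shows the third objection in action.

    // Hedged sketch of one possible shape for a restartable, fall-through
    // driver.  All names are hypothetical.

    enum class Phase { PreInst, Fetch, Execute, Done };

    class ToyTimingCpu {
      public:
        // Entered once at the start and re-entered by event callbacks, which
        // pass in the phase to resume at.
        void advance(Phase resumeAt) {
            switch (resumeAt) {
              case Phase::PreInst:
                checkForInterrupts();
                // fall through
              case Phase::Fetch:
                if (!tryFetchNow())      // ITLB or I-cache needs time: the
                    return;              // completion event will later call
                                         // advance(Phase::Execute)
                // fall through
              case Phase::Execute:
                if (faulted()) {         // stop mid-flow, handle the fault,
                    invokeFault();       // and restart cleanly at PreInst
                    scheduleNextInst();  // via the event queue
                    return;
                }
                executeInst();
                // fall through
              case Phase::Done:
                scheduleNextInst();      // bounce through the event queue back
                break;                   // to PreInst instead of recursing here
            }
        }

      private:
        bool tryFetchNow()        { return true; }   // stubs just for the sketch
        bool faulted()            { return false; }
        void checkForInterrupts() {}
        void invokeFault()        {}
        void executeInst()        {}
        void scheduleNextInst()   {}
    };

Even in this toy form the three objections are visible: the memory path would need a second, non-linear arm; the decision about whether to fetch at all has moved up into advance(); and every fault-prone phase ends up carrying its own faulted() check.
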
>>>
>>> Gabe
_______________________________________________
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev