Steve Reinhardt wrote:
> All the discussion of different extensions to TimingSimpleCPU got me
> thinking again about what a mess it is.  I walked through the code
> with Brad & Joel a few weeks ago, and it's still the same basic
> structure of everything being driven by callbacks, with numerous cases
> where we call the next callback directly because some stage is getting
> bypassed.  That was confusing enough already, but now we have about
> twice as many of these situations, and several different ways of
> implementing them (some callbacks come via ports, then there's
> WholeTranslationState::finish() which uses a virtual function override
> (that's redirected in simple/timing.hh just to keep you on your toes),
> then there's DataTranslation which derives from WholeTranslationState
> and catches the finish() method and redirects it to
> finishTranslation() using a template...).
>
> I'm not sure there's a good solution to the
> sometimes-bypassed-chained-callbacks structure (it seems inherent in
> the way it needs to work) other than good documentation.  But if we
> regularize how those callbacks are handled that would help a lot. One
> way to do this is to pass translation requests to the TLBs via ports
> (e.g., dtb->sendTiming(rqst)).  Then everything would be
> message-driven, and all the callbacks would come through different
> ports.  Once you understand how ports work then you could figure it
> out yourself.
>
> A second step that's somewhat independent but still seems nicely
> complementary is to push all the unaligned access ugliness out of the
> CPU.  The basic steps wouldn't change much, but the complexity would
> be hidden from the CPU, and could be omitted for ISAs that don't have
> to deal with it.  The cleanest way would be to create a shim object
> that takes a potentially unaligned request from the CPU and does the
> split/recombine if it is a line/page crosser but just forwards it
> otherwise.  I think we'd definitely want to go this way for the
> caches, since we don't really want to push the complexity into the
> cache either, but I could see skipping the shim and just embedding the
> logic in the TLB for the ISAs that need it, since the TLB is already
> ISA-specific (though we'd still want to use a common mechanism like
> the WholeTranslationState thing).
>
> This mechanism could then work for all the CPU models... was there a
> reason we didn't do it this way in the first place?  If we thought it
> would be too much overhead, I say forget it; at this point I'm willing
> to pay a little runtime overhead to clean up this code.  And I'm not
> sure it would be any more overhead than what we already have anyway.
>
> Thoughts?  Volunteers?  :-)
>
> Steve
> _______________________________________________
> m5-dev mailing list
> m5-dev@m5sim.org
> http://m5sim.org/mailman/listinfo/m5-dev
>   
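
For concreteness, here is a minimal sketch of the split step such a shim might perform. It assumes a simplified request carrying only an address and a size, a power-of-two block size, and an access no larger than one block, so at most one boundary can be crossed. `Request`, `splitIfCrossing`, and `recombine` are hypothetical names, not M5's Request class or any existing M5 interface.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical, simplified request: just an address and a size in bytes.
struct Request {
    uint64_t addr;
    unsigned size;
};

// Split a possibly unaligned request so that no piece crosses a blockSize
// boundary (blockSize must be a power of two). Assumes the access fits in
// one block, so the result has one or two pieces.
std::vector<Request> splitIfCrossing(const Request &req, uint64_t blockSize)
{
    assert(req.size >= 1 && req.size <= blockSize);
    uint64_t firstBlock = req.addr & ~(blockSize - 1);
    uint64_t lastBlock = (req.addr + req.size - 1) & ~(blockSize - 1);
    if (firstBlock == lastBlock)
        return {req};  // not a crosser: forward unchanged
    unsigned firstSize = static_cast<unsigned>(lastBlock - req.addr);
    return {Request{req.addr, firstSize},
            Request{lastBlock, req.size - firstSize}};
}

// Recombine the data returned for the two pieces, in address order, into
// the buffer the CPU originally asked for.
void recombine(uint8_t *dest, const uint8_t *low, unsigned lowSize,
               const uint8_t *high, unsigned highSize)
{
    std::memcpy(dest, low, lowSize);
    std::memcpy(dest + lowSize, high, highSize);
}
```

A non-crossing access comes back as a single unchanged request, which is the "just forwards it otherwise" case; an ISA that never issues unaligned accesses would simply never take the two-piece path.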

I can't say it was -the- reason, but one reason is that the TLBs as-is
don't actually send the packets for the CPU, so they can't split
anything into multiple transactions easily. I'm intrigued by the idea of
putting the TLB behind a port or port like interface, maybe even
exporting the TLB outside of the CPU's guts and putting it inline with
external accesses. There are three problems with that, though. First,
the TLB would likely need some alternative way to pass a fault back to
the CPU. Maybe the request would have a fault pointer field? Second, the
TLB is the thing that recognizes when an access is to memory mapped
control state within the CPU. It would need a way to communicate with
the CPU to get/set those values. Third, the control state that actually
-runs- the TLB is maintained by the CPU, namely what mode it's in, etc.
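
A rough sketch of the port-like shape, assuming a hypothetical `TranslationRequest` that carries its own fault pointer (the first problem) and a TLB whose mode bit is pushed down by the CPU rather than read back from it (the third). `ToyTlb`, `TranslationReceiver`, and `RecordingCpu` are illustrative names only; `sendTiming` just mirrors the name in Steve's example and is not M5's actual port API. Memory-mapped CPU control state (the second problem) is not modeled.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <string>

// Hypothetical fault object; a real one would carry much more.
struct Fault {
    std::string name;
};

// Hypothetical request that carries its own result, including a fault
// pointer, so failures travel back through the same message.
struct TranslationRequest {
    uint64_t vaddr = 0;
    uint64_t paddr = 0;
    std::shared_ptr<Fault> fault;  // null means translation succeeded
};

// Anything that can receive a completed translation (e.g. a CPU-side port).
class TranslationReceiver {
  public:
    virtual ~TranslationReceiver() = default;
    virtual void recvTranslation(TranslationRequest &req) = 0;
};

// Toy TLB reached only through a send-style call; it answers by calling
// back its connected peer, never by returning a value.
class ToyTlb {
  public:
    explicit ToyTlb(TranslationReceiver &p) : peer(p) {}

    // Control state the CPU pushes down explicitly.
    void setUserMode(bool user) { userMode = user; }

    void sendTiming(TranslationRequest &req)
    {
        assert(!req.fault);  // a request should arrive untranslated
        // Toy policy: top half of the address space is kernel-only,
        // everything else is identity-mapped.
        if (userMode && req.vaddr >= (uint64_t{1} << 63))
            req.fault = std::make_shared<Fault>(Fault{"PrivilegeFault"});
        else
            req.paddr = req.vaddr;
        peer.recvTranslation(req);  // same path for success and fault
    }

  private:
    TranslationReceiver &peer;
    bool userMode = true;
};

// Stand-in CPU that just records what came back.
struct RecordingCpu : TranslationReceiver {
    int completions = 0;
    uint64_t lastPaddr = 0;
    std::shared_ptr<Fault> lastFault;
    void recvTranslation(TranslationRequest &req) override
    {
        ++completions;
        lastPaddr = req.paddr;
        lastFault = req.fault;
    }
};
```

Because success and failure arrive through the same callback, the CPU has exactly one place to resume, no matter which path the TLB took.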

This also brings up another idea I've been rolling around for a while.
Why is all the control state local to the miscregfile or its descendant, the
ISA object? Why don't we put control state that matters to the TLB, or
at least a copy of it, in the TLB itself and then communicate it back
and forth as necessary? That would be easier to code (or so I'm
guessing), since you'd just have the state right there; faster, since it
avoids calling out for it; and a closer conceptual match to real
hardware, where all the control state isn't put in one huge blob
someplace. The same thing could be done for other structures like the
interrupt controller, and maybe the decoder and/or predecoder. Speaking
of the decoder, it would be nice to make that a little stateful as well.
As it is now in, say, ARM, the decoder has to rediscover what mode it's in
over and over. I'm guessing it would be better to explicitly switch its
state (or swap the decoder out entirely) when changing modes instead,
although that might add a fair amount of complexity. Perhaps the decoder
should be an object instead of a bare function? I'm less sure how that
would work. It could, hypothetically, allow us to reclaim the two PC bits
currently commandeered to signal the mode.
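
As a minimal sketch of the decoder-as-object idea, assuming a two-mode ISA loosely modeled on ARM/Thumb: the decoder remembers its mode and is told explicitly when it changes, so nothing has to be recovered from PC bits on each decode. `StatefulDecoder`, `DecodeMode`, and the string "decode" result are hypothetical placeholders; a real decoder would produce instruction objects.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical modes, loosely modeled on ARM and Thumb.
enum class DecodeMode { Arm, Thumb };

// The decoder as an object rather than a bare function: it holds the
// current mode, so the mode doesn't have to be rediscovered on every decode.
class StatefulDecoder {
  public:
    // Called once by whatever changes the mode, not on every instruction.
    void setMode(DecodeMode m) { mode = m; }
    DecodeMode currentMode() const { return mode; }

    // Placeholder "decode": tags the raw bits with the remembered mode.
    // A real decoder would build an instruction object here.
    std::string decode(uint32_t machInst) const
    {
        return (mode == DecodeMode::Thumb ? "thumb:" : "arm:") +
               std::to_string(machInst);
    }

  private:
    DecodeMode mode = DecodeMode::Arm;
};
```

The cost moves from every decode to every mode switch, which should be the rarer event.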

Gabe
