All the discussion of different extensions to TimingSimpleCPU got me
thinking again about what a mess it is.  I walked through the code
with Brad & Joel a few weeks ago, and it's still the same basic
structure of everything being driven by callbacks, with numerous cases
where we call the next callback directly because some stage is getting
bypassed.  That was confusing enough already, but now we have about
twice as many of these situations, and several different ways of
implementing them: some callbacks come via ports; then there's
WholeTranslationState::finish(), which uses a virtual function override
(that's redirected in simple/timing.hh just to keep you on your toes);
then there's DataTranslation, which derives from WholeTranslationState,
catches the finish() method, and redirects it to
finishTranslation() using a template...

I'm not sure there's a good solution to the
sometimes-bypassed-chained-callbacks structure (it seems inherent in
the way it needs to work) other than good documentation.  But if we
regularize how those callbacks are handled that would help a lot. One
way to do this is to pass translation requests to the TLBs via ports
(e.g., dtb->sendTiming(rqst)).  Then everything would be
message-driven, and all the callbacks would come through different
ports.  Once you understand how ports work, you could figure out the
rest yourself.
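
To make the port idea concrete, here is a rough, self-contained sketch.
Everything in it (Port, ToyTlb, ToyCpu, TranslationReq, the fixed-offset
"translation") is a hypothetical stand-in I made up for illustration, not
the real m5 classes, and the TLB responds immediately instead of
scheduling an event.  The only point is that the request goes out through
a port and the completion comes back through the same port pair, with no
separate finish() path.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical translation request; not the real m5 Request class.
struct TranslationReq {
    uint64_t vaddr = 0;
    uint64_t paddr = 0;
    bool translated = false;
};

// Toy port: whatever is sent on one side arrives at the peer's recvTiming().
class Port {
  public:
    virtual ~Port() = default;
    void bind(Port &peer) { peer_ = &peer; peer.peer_ = this; }
    void sendTiming(TranslationReq *req) {
        assert(peer_ != nullptr);  // the port must be bound before use
        peer_->recvTiming(req);
    }
  protected:
    virtual void recvTiming(TranslationReq *req) = 0;
  private:
    Port *peer_ = nullptr;
};

// Toy TLB: "translates" by adding a fixed offset, then replies via its port.
class ToyTlb {
    uint64_t offset_;
  public:
    class TlbPort : public Port {
      public:
        explicit TlbPort(ToyTlb &tlb) : tlb_(tlb) {}
      protected:
        void recvTiming(TranslationReq *req) override {
            req->paddr = req->vaddr + tlb_.offset_;
            req->translated = true;
            sendTiming(req);  // the response is just another port message
        }
      private:
        ToyTlb &tlb_;
    };
    TlbPort port;
    explicit ToyTlb(uint64_t offset) : offset_(offset), port(*this) {}
};

// Toy CPU: translation completion arrives through dtbPort like any response.
class ToyCpu {
  public:
    class DtbPort : public Port {
      public:
        explicit DtbPort(ToyCpu &cpu) : cpu_(cpu) {}
      protected:
        void recvTiming(TranslationReq *req) override {
            cpu_.finishTranslation(req);
        }
      private:
        ToyCpu &cpu_;
    };
    DtbPort dtbPort;
    std::vector<uint64_t> completed;  // physical addresses seen so far
    ToyCpu() : dtbPort(*this) {}
    void access(TranslationReq *req) { dtbPort.sendTiming(req); }
  private:
    void finishTranslation(TranslationReq *req) {
        completed.push_back(req->paddr);
    }
};

// Wire a CPU to a TLB, translate one address, return what the CPU received.
uint64_t translateOnce(uint64_t vaddr, uint64_t offset) {
    ToyCpu cpu;
    ToyTlb tlb(offset);
    cpu.dtbPort.bind(tlb.port);
    TranslationReq req;
    req.vaddr = vaddr;
    cpu.access(&req);
    return cpu.completed.at(0);
}
```

In a real version the TLB would presumably schedule the response for a
later tick, but the CPU side would look the same: one recvTiming() per
port, and nothing else to chase.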

A second step that's somewhat independent but still seems nicely
complementary is to push all the unaligned access ugliness out of the
CPU.  The basic steps wouldn't change much, but the complexity would
be hidden from the CPU, and could be omitted for ISAs that don't have
to deal with it.  The cleanest way would be to create a shim object
that takes a potentially unaligned request from the CPU and does the
split/recombine if it is a line/page crosser but just forwards it
otherwise.  I think we'd definitely want to go this way for the
caches, since we don't really want to push the complexity into the
cache either, but I could see skipping the shim and just embedding the
logic in the TLB for the ISAs that need it, since the TLB is already
ISA-specific (though we'd still want to use a common mechanism like
the WholeTranslationState thing).
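
As a rough sketch of the "split if it's a crosser, otherwise just
forward" decision the shim would make, something like the following
would do.  Access and splitAccess are names I invented for this example,
and it assumes the block size (line or page) is a power of two and that
no access is larger than one block, so there are at most two pieces.  The
recombine half on the response path is left out.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical access descriptor; not an m5 type.
struct Access {
    uint64_t addr;
    unsigned size;
};

// Return the access unchanged if it fits in one block, else two pieces
// split at the block boundary.
std::vector<Access> splitAccess(const Access &a, unsigned blockSize) {
    // Assumption: blockSize is a nonzero power of two.
    assert(blockSize != 0 && (blockSize & (blockSize - 1)) == 0);
    // Assumption: the access spans at most two blocks.
    assert(a.size > 0 && a.size <= blockSize);

    const uint64_t mask = ~uint64_t(blockSize - 1);
    const uint64_t firstBlock = a.addr & mask;
    const uint64_t lastBlock = (a.addr + a.size - 1) & mask;

    if (firstBlock == lastBlock)
        return {a};  // not a crosser: forward as-is

    // Bytes from the start address up to the boundary go in the first piece.
    const unsigned firstSize = unsigned(lastBlock - a.addr);
    return {{a.addr, firstSize}, {lastBlock, a.size - firstSize}};
}
```

For example, an 8-byte access at 0x3c with 64-byte lines would come back
as two 4-byte pieces at 0x3c and 0x40, while the same access at 0x10
would come back untouched.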

This mechanism could then work for all the CPU models... was there a
reason we didn't do it this way in the first place?  If we thought it
would be too much overhead, I say forget it; at this point I'm willing
to pay a little runtime overhead to clean up this code.  And I'm not
sure it would be any more overhead than what we already have anyway.

Thoughts?  Volunteers?  :-)

Steve
_______________________________________________
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev