All the discussion of different extensions to TimingSimpleCPU got me thinking again about what a mess it is. I walked through the code with Brad & Joel a few weeks ago, and it still has the same basic structure: everything is driven by callbacks, with numerous cases where we call the next callback directly because some stage is getting bypassed. That was confusing enough already, but now we have about twice as many of these situations, and several different ways of implementing them. Some callbacks come via ports; then there's WholeTranslationState::finish(), which uses a virtual function override (that's redirected in simple/timing.hh just to keep you on your toes); then there's DataTranslation, which derives from WholeTranslationState, catches the finish() method, and redirects it to finishTranslation() using a template...
I'm not sure there's a good solution to the sometimes-bypassed-chained-callbacks structure (it seems inherent in the way it needs to work) other than good documentation. But if we regularize how those callbacks are handled, that would help a lot. One way to do this is to pass translation requests to the TLBs via ports (e.g., dtb->sendTiming(rqst)). Then everything would be message-driven, and all the callbacks would come through different ports. Once you understand how ports work, you could figure out the rest yourself.

A second step that's somewhat independent but still seems nicely complementary is to push all the unaligned access ugliness out of the CPU. The basic steps wouldn't change much, but the complexity would be hidden from the CPU, and could be omitted for ISAs that don't have to deal with it. The cleanest way would be to create a shim object that takes a potentially unaligned request from the CPU and does the split/recombine if it is a line/page crosser, but just forwards it otherwise. I think we'd definitely want to go this way for the caches, since we don't really want to push the complexity into the cache either. For translation, though, I could see skipping the shim and just embedding the logic in the TLB for the ISAs that need it, since the TLB is already ISA-specific (though we'd still want to use a common mechanism like the WholeTranslationState thing).

This mechanism could then work for all the CPU models... was there a reason we didn't do it this way in the first place? If we thought it would be too much overhead, I say forget it; at this point I'm willing to pay a little runtime overhead to clean up this code. And I'm not sure it would be any more overhead than what we already have anyway.

Thoughts? Volunteers? :-)

Steve
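
For concreteness, here is a rough sketch of the split/recombine shim idea. It is standalone C++, not m5 code: Req, splitAtBoundary(), and SplitShim are hypothetical names made up for illustration, the two callbacks stand in for whatever ports would actually connect the shim, and it assumes only one request is outstanding at a time and that an access is never larger than one block.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical stand-in for a memory request; NOT the real m5 Request class.
struct Req
{
    std::uint64_t addr;
    unsigned size;
};

// Return the request unchanged if it fits in one block (cache line or page),
// otherwise split it into two fragments at the boundary it crosses.
// Assumes blockSize is a power of two and 0 < size <= blockSize.
std::vector<Req>
splitAtBoundary(const Req &r, unsigned blockSize)
{
    assert(r.size > 0 && r.size <= blockSize);
    const std::uint64_t mask = ~static_cast<std::uint64_t>(blockSize - 1);
    const std::uint64_t firstBlock = r.addr & mask;
    const std::uint64_t lastBlock = (r.addr + r.size - 1) & mask;
    if (firstBlock == lastBlock)
        return {r};
    const unsigned firstSize = static_cast<unsigned>(lastBlock - r.addr);
    return {Req{r.addr, firstSize}, Req{lastBlock, r.size - firstSize}};
}

// Hypothetical shim that sits between the CPU and a TLB or cache.
// The CPU sees exactly one completion per request, split or not.
class SplitShim
{
  public:
    using Callback = std::function<void(const Req &)>;

    SplitShim(unsigned _blockSize, Callback _sendDown, Callback _notifyUp)
        : blockSize(_blockSize), sendDown(std::move(_sendDown)),
          notifyUp(std::move(_notifyUp))
    {}

    // Called by the CPU with a potentially unaligned request.
    void
    send(const Req &r)
    {
        assert(pending == 0);  // sketch assumes one request in flight
        const std::vector<Req> frags = splitAtBoundary(r, blockSize);
        orig = r;
        pending = frags.size();  // set first, in case a reply is immediate
        for (const Req &f : frags)
            sendDown(f);
    }

    // Called once per fragment when the downstream side responds.
    void
    recvResponse()
    {
        assert(pending > 0);
        if (--pending == 0)
            notifyUp(orig);  // recombined: report the original request
    }

  private:
    unsigned blockSize;
    Callback sendDown;
    Callback notifyUp;
    Req orig{0, 0};
    std::size_t pending = 0;
};
```

With 64-byte blocks, an 8-byte access at 0x13c would go downstream as two 4-byte fragments (0x13c and 0x140), and the CPU-side callback would fire only after both responses arrive; an aligned access at 0x100 would be forwarded as-is. The same object could in principle sit in front of either a TLB or a cache, just with a different block size.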