On Mon, 24 Jan 2011, Nilay Vaish wrote:

On Mon, 24 Jan 2011, Steve Reinhardt wrote:

Yes, that's right.  So there's probably no big win in trying to further
reduce the number of calls to lookup() in Ruby; the possibilities I see for
improvement are:
1. Adding an instruction buffer to SimpleCPU so we don't do a cache lookup
on *every* instruction fetch
2. Trying again to make the lookup() calls themselves faster (for example, a
lookup that hits the MRU block should really only take a handful of
instructions, while IIRC we were seeing much larger costs for the hash table
lookup)
3. Moving on to some other area (like the Histogram thing)
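
As a rough illustration of option 1, here is a minimal sketch of a one-line instruction buffer. It is not SimpleCPU code: the FetchBuffer class, its methods, and the 64-byte line size used in the example are all assumptions made for illustration. The buffer remembers which cache line was fetched last, so only a fetch that leaves that line pays for a cache lookup.

```cpp
#include <cstdint>

// Hypothetical one-line fetch buffer; none of these names come from m5.
class FetchBuffer {
  public:
    explicit FetchBuffer(unsigned lineBytes) : lineBytes_(lineBytes) {}

    // Returns true if the fetch is served from the buffered line.
    // Otherwise counts one cache lookup and refills the buffer.
    bool fetch(uint64_t pc) {
        const uint64_t line = pc / lineBytes_;
        if (valid_ && line == line_)
            return true;
        ++cacheLookups_;
        line_ = line;
        valid_ = true;
        return false;
    }

    // Would have to be called whenever the buffered line may have
    // changed, e.g. on a store to it (self-modifying code).
    void invalidate() { valid_ = false; }

    unsigned cacheLookups() const { return cacheLookups_; }

  private:
    unsigned lineBytes_;
    uint64_t line_ = 0;
    bool valid_ = false;
    unsigned cacheLookups_ = 0;
};
```

With 4-byte instructions and 64-byte lines, straight-line code would do one lookup per 16 fetches in this sketch. When to call invalidate() is the open question for self-modifying code.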

#1 is not a Ruby issue, and could well be different under x86 since (1) x86
has a byte-stream-oriented predecoder so it doesn't do a fetch per
instruction anyway and (2) you may have to worry about self-modifying code.
Gabe, how many bytes at a time does the x86 predecoder fetch?  If it
doesn't currently grab a cache line at a time, could it be made to do so,
and do you know if that would cause any issues with SMC?

Nilay, I'd appreciate your comments on #2 and whether you think that's worth
pursuing or whether we should move on to #3.

Steve


I now understand why the ratio is 2:1. Before every instruction fetch, the data cache is looked up to make sure that it does not contain the cache block; only then is the instruction cache looked up. Similarly, before any data access, the instruction cache is looked up. This is probably done to handle self-modifying code correctly.

Steve, we can try caching the MRU cache block. We can also try replacing the hash table with a two-dimensional array indexed by cache set and cache way.
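
A minimal sketch of the set/way array idea, to make the lookup cost concrete. This is not the Ruby CacheMemory code: SetAssocTags, Entry, and the parameters are hypothetical, and the sketch assumes the set index is the line address modulo the number of sets. A lookup then costs one index computation plus at most `assoc` tag compares, with no hashing.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical tag entry; a real one would also carry coherence state.
struct Entry {
    bool valid = false;
    uint64_t tag = 0;
};

// Hypothetical tag array indexed as entries_[set][way].
class SetAssocTags {
  public:
    SetAssocTags(int numSets, int assoc, int blockBits)
        : numSets_(numSets), assoc_(assoc), blockBits_(blockBits),
          entries_(numSets, std::vector<Entry>(assoc)) {}

    // Returns the way that holds addr's line, or -1 on a miss.
    int findWay(uint64_t addr) const {
        const uint64_t line = addr >> blockBits_;
        const std::vector<Entry> &set = entries_[line % numSets_];
        for (int way = 0; way < assoc_; ++way) {
            if (set[way].valid && set[way].tag == line)
                return way;
        }
        return -1;
    }

    // Places addr's line in the given way of its set.
    void insert(uint64_t addr, int way) {
        const uint64_t line = addr >> blockBits_;
        Entry &e = entries_[line % numSets_][way];
        e.valid = true;
        e.tag = line;
    }

  private:
    int numSets_;
    int assoc_;
    int blockBits_;
    std::vector<std::vector<Entry>> entries_;
};
```

The sketch stores the full line address as the tag for simplicity. Whether this beats the hash table would still have to be measured.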

There are calls to CacheMemory::isTagPresent() in Sequencer.cc. These calls are made just before calls to setMRU(). I am thinking of folding these calls to isTagPresent() into setMRU(), which calls CacheMemory::findTagInSet() anyway.
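
To illustrate the folding, here is a sketch using a stand-in class, not the real CacheMemory. TagStore, setMRUIfPresent(), and the lookup counter are hypothetical, and the actual signatures in Sequencer.cc and CacheMemory may differ. The point is only that a presence check followed by setMRU() searches twice, while a combined call searches once and reports presence through its return value.

```cpp
#include <cstdint>
#include <unordered_map>

// Hypothetical stand-in for a tag store; counts the searches it performs.
class TagStore {
  public:
    void allocate(uint64_t lineAddr) { lastUse_[lineAddr] = 0; }

    // Existing pattern, first call: one search.
    bool isTagPresent(uint64_t lineAddr) {
        return find(lineAddr) != lastUse_.end();
    }

    // Existing pattern, second call: another search for the same line.
    void setMRU(uint64_t lineAddr, uint64_t now) {
        auto it = find(lineAddr);
        if (it != lastUse_.end())
            it->second = now;
    }

    // Folded version: one search, returns whether the tag was present.
    bool setMRUIfPresent(uint64_t lineAddr, uint64_t now) {
        auto it = find(lineAddr);
        if (it == lastUse_.end())
            return false;
        it->second = now;
        return true;
    }

    unsigned lookups() const { return lookups_; }
    uint64_t lastUse(uint64_t lineAddr) const { return lastUse_.at(lineAddr); }

  private:
    using Map = std::unordered_map<uint64_t, uint64_t>;

    Map::iterator find(uint64_t lineAddr) {
        ++lookups_;
        return lastUse_.find(lineAddr);
    }

    Map lastUse_;
    unsigned lookups_ = 0;
};
```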

--
Nilay


I tried caching the index of the MRU block, so that the hash table need
not be looked up. It is hard to tell whether there is a speedup or not. When
I run m5.prof, the profile results show that the time taken by
CacheMemory::lookup() drops from ~5.5% to ~4%. But when I run m5.fast,
the execution time increases by 2%.

Earlier, when we had the same address being looked up multiple times in
succession, this change made sense. Right now, I am uncertain about whether
this change improves performance.
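
For reference, a minimal sketch of the kind of MRU caching described above. It is not the actual patch: MruCachedIndex and its members are hypothetical names, and std::unordered_map stands in for the hash table. The last address that hit is remembered along with its index, so a repeated lookup of that address skips the hash table.

```cpp
#include <cstdint>
#include <unordered_map>

// Hypothetical address-to-index map with a one-entry MRU shortcut.
class MruCachedIndex {
  public:
    void insert(uint64_t lineAddr, int index) {
        table_[lineAddr] = index;
        // Keep the shortcut consistent if the MRU line is re-inserted.
        if (mruValid_ && mruAddr_ == lineAddr)
            mruIndex_ = index;
    }

    void erase(uint64_t lineAddr) {
        table_.erase(lineAddr);
        if (mruValid_ && mruAddr_ == lineAddr)
            mruValid_ = false;
    }

    // Returns the cached index, or -1 if the line is not present.
    int lookup(uint64_t lineAddr) {
        if (mruValid_ && mruAddr_ == lineAddr) {
            ++fastHits_;          // served without touching the hash table
            return mruIndex_;
        }
        auto it = table_.find(lineAddr);
        if (it == table_.end())
            return -1;
        mruValid_ = true;
        mruAddr_ = lineAddr;
        mruIndex_ = it->second;
        return mruIndex_;
    }

    unsigned fastHits() const { return fastHits_; }

  private:
    std::unordered_map<uint64_t, int> table_;
    bool mruValid_ = false;
    uint64_t mruAddr_ = 0;
    int mruIndex_ = -1;
    unsigned fastHits_ = 0;
};
```

The shortcut only pays off when the same address is looked up repeatedly in succession. When consecutive lookups differ, every call pays for the extra compare and the bookkeeping on top of the hash table search, which may be one reason the benefit is unclear.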

--
Nilay
_______________________________________________
m5-dev mailing list
[email protected]
http://m5sim.org/mailman/listinfo/m5-dev
