Re: [m5-dev] Profile Results for Mesh Network

Nilay Vaish Thu, 27 Jan 2011 04:36:51 -0800

On Mon, 24 Jan 2011, Nilay Vaish wrote:

On Mon, 24 Jan 2011, Steve Reinhardt wrote:
Yes, that's right.  So there's probably no big win in trying to further
reduce the number of calls to lookup() in Ruby; the possibilities I see for
improvement are:
1. Adding an instruction buffer to SimpleCPU so we don't do a cache lookup
on *every* instruction fetch
2. Trying again to make the lookup() calls themselves faster (for example, a
lookup that hits the MRU block should really only take a handful of
instructions, while IIRC we were seeing much larger costs for the hashtable
lookup)
3. Moving on to some other area (like the Histogram thing)

#1 is not a Ruby issue, and could well be different under x86 since (1) x86
has a byte-stream-oriented predecoder so it doesn't do a fetch per
instruction anyway and (2) you may have to worry about self-modifying code.
Gabe, how many bytes at a time does the x86 predecoder fetch?  If it
doesn't currently grab a cache line at a time, could it be made to do so,
and do you know if that would cause any issues with SMC?
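The sketch below illustrates idea #1 under stated assumptions. It assumes a 64-byte line and a hypothetical `FetchBuffer` class whose `readLine` hook stands in for one cache access. Neither exists in M5. The buffer serves fetches from the most recently read line. A store that hits the buffered line invalidates it, which is one way to keep self-modifying code correct.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <utility>

// Assumed line size; not taken from any M5 configuration.
constexpr uint64_t kLineBytes = 64;

// Hypothetical one-line instruction buffer in front of the I-cache.
class FetchBuffer {
  public:
    // readLine stands in for a full cache-line access (hypothetical hook).
    explicit FetchBuffer(std::function<void(uint64_t, uint8_t *)> readLine)
        : readLine_(std::move(readLine)) {}

    // Return one instruction byte; the cache is touched only when the
    // address falls outside the currently buffered line.
    uint8_t fetchByte(uint64_t addr) {
        uint64_t line = addr & ~(kLineBytes - 1);
        if (!valid_ || line != lineAddr_) {
            readLine_(line, data_);
            lineAddr_ = line;
            valid_ = true;
            ++cacheAccesses;
        }
        return data_[addr - line];
    }

    // A store into the buffered line (e.g. self-modifying code) drops the
    // buffer so the next fetch re-reads the updated bytes.
    void notifyStore(uint64_t addr) {
        if (valid_ && (addr & ~(kLineBytes - 1)) == lineAddr_)
            valid_ = false;
    }

    unsigned cacheAccesses = 0;  // number of line reads actually issued

  private:
    std::function<void(uint64_t, uint8_t *)> readLine_;
    uint64_t lineAddr_ = 0;
    bool valid_ = false;
    uint8_t data_[kLineBytes] = {};
};
```

With 4-byte instructions, sixteen sequential fetches from one line cost a single cache access instead of sixteen.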
Nilay, I'd appreciate your comments on #2 and whether you think that's worth
pursuing or should we move on to #3.

Steve
I now understand why the ratio is 2:1. Before every instruction fetch, the data cache is looked up to make sure that it does not contain the cache block. After that, the instruction cache is looked up. Similarly, before any data access, the instruction cache is looked up. This is probably to handle self-modifying code correctly.
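A toy model can reproduce the 2:1 ratio: each access costs two tag lookups. This is a sketch, not Ruby's Sequencer. `ToyCache`, `L1Pair`, and their methods are hypothetical. The probe of the other L1 simply discards its result, where a real protocol would act on it.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_set>

// Toy tag store that counts how often it is probed (hypothetical).
struct ToyCache {
    std::unordered_set<uint64_t> lines;
    unsigned probes = 0;

    bool isTagPresent(uint64_t line) {
        ++probes;
        return lines.count(line) != 0;
    }
};

// Hypothetical model of the access pattern described above: each access
// first probes the *other* L1, then the one it actually targets.
struct L1Pair {
    ToyCache icache, dcache;

    bool fetch(uint64_t line) {
        dcache.isTagPresent(line);  // is there a (possibly modified) data copy?
        return icache.isTagPresent(line);
    }

    bool load(uint64_t line) {
        icache.isTagPresent(line);  // is the line also held as instructions?
        return dcache.isTagPresent(line);
    }
};
```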
Steve, we can try caching the MRU cache block. We can also try replacing the hash table with a two-dimensional array indexed by cache set and cache way.
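The two-dimensional alternative might look like the sketch below. It assumes a flattened `[set][way]` vector and uses hypothetical names (`TagArray`, `findWay`), not Ruby's actual `CacheMemory`. The set index is the line address modulo the number of sets. A lookup therefore scans at most `numWays` tags instead of hashing the full address.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical set-associative tag store laid out as sets x ways, in place
// of an address -> index hash table. The geometry is supplied by the caller.
class TagArray {
  public:
    TagArray(unsigned numSets, unsigned numWays, unsigned lineBits)
        : sets_(numSets), ways_(numWays), lineBits_(lineBits),
          tags_(numSets * numWays), valid_(numSets * numWays, false) {}

    unsigned setOf(uint64_t addr) const { return (addr >> lineBits_) % sets_; }
    uint64_t tagOf(uint64_t addr) const { return (addr >> lineBits_) / sets_; }

    // Returns the way holding addr's line, or -1 if it is not present.
    int findWay(uint64_t addr) const {
        unsigned set = setOf(addr);
        uint64_t tag = tagOf(addr);
        for (unsigned w = 0; w < ways_; ++w) {
            unsigned i = set * ways_ + w;
            if (valid_[i] && tags_[i] == tag)
                return static_cast<int>(w);
        }
        return -1;
    }

    // Place addr's line in the given way of its set.
    void insert(uint64_t addr, unsigned way) {
        unsigned i = setOf(addr) * ways_ + way;
        tags_[i] = tagOf(addr);
        valid_[i] = true;
    }

  private:
    unsigned sets_, ways_, lineBits_;
    std::vector<uint64_t> tags_;  // flattened [set][way]
    std::vector<bool> valid_;
};
```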
There are calls to CacheMemory::isTagPresent() in Sequencer.cc. These calls are made just before calls to setMRU(). I am thinking of folding these calls to isTagPresent() into setMRU(), which calls CacheMemory::findTagInSet() anyway.
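The folding is sketched below on a hypothetical `ToyCacheMemory`. The sketch assumes `setMRU()` can be changed to return whether the tag was found, which may not match the actual `CacheMemory` signature. The `lookups` counter shows the caller going from two lookups to one.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

// Hypothetical stand-in for a cache whose setMRU() already has to locate
// the block. Names echo the thread, but this is not Ruby's actual code.
class ToyCacheMemory {
  public:
    void allocate(uint64_t line) { lastUse_[line] = 0; }

    // Old pattern: `if (isTagPresent(a)) setMRU(a);` costs two lookups.
    bool isTagPresent(uint64_t line) {
        ++lookups;
        return lastUse_.count(line) != 0;
    }

    // Folded version: one lookup both tests presence and updates recency.
    bool setMRU(uint64_t line) {
        ++lookups;
        auto it = lastUse_.find(line);
        if (it == lastUse_.end())
            return false;
        it->second = ++clock_;
        return true;
    }

    unsigned lookups = 0;

  private:
    std::unordered_map<uint64_t, uint64_t> lastUse_;  // line -> last-use time
    uint64_t clock_ = 0;
};
```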
--
Nilay


I tried caching the index of the MRU block so that the hash table need
not be looked up. It is hard to tell whether there is a speedup. When
I run m5.prof, the profile results show that the time taken by
CacheMemory::lookup() drops from ~5.5% to ~4%. But when I run m5.fast,
the execution time increases by 2%.
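The MRU-index idea can be sketched as follows. The sketch uses a hypothetical `MruLookup` class over `std::unordered_map` rather than Ruby's own table. A hit on the cached line skips hashing. Every other lookup pays an extra compare, and every erase or reinsert must invalidate the cached entry.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

// Hypothetical lookup front-ended by a one-entry MRU cache of
// (line address -> block index). Not Ruby's actual CacheMemory code.
class MruLookup {
  public:
    void insert(uint64_t line, int index) {
        table_[line] = index;
        if (mruValid_ && line == mruLine_)
            mruValid_ = false;  // cached index may now be stale
    }

    // Returns the block index for line, or -1 if absent.
    int lookup(uint64_t line) {
        if (mruValid_ && line == mruLine_)
            return mruIndex_;  // fast path: no hash computation
        ++hashProbes;
        auto it = table_.find(line);
        if (it == table_.end())
            return -1;
        mruLine_ = line;
        mruIndex_ = it->second;
        mruValid_ = true;
        return mruIndex_;
    }

    void erase(uint64_t line) {
        table_.erase(line);
        if (mruValid_ && line == mruLine_)
            mruValid_ = false;
    }

    unsigned hashProbes = 0;  // lookups that fell through to the hash table

  private:
    std::unordered_map<uint64_t, int> table_;
    uint64_t mruLine_ = 0;
    int mruIndex_ = -1;
    bool mruValid_ = false;
};
```

The fast path helps only when consecutive lookups hit the same line. This fits the observation below that the change made sense when the same address was looked up several times in succession.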

Earlier, when the same address was being looked up multiple times in
succession, this change made sense. Right now, I am uncertain whether
this change improves performance.

--
Nilay
_______________________________________________
m5-dev mailing list
[email protected]
http://m5sim.org/mailman/listinfo/m5-dev
