On Thu, Jan 27, 2011 at 4:36 AM, Nilay Vaish <ni...@cs.wisc.edu> wrote:

> I tried caching the index for the MRU block, so that the hash table need
> not be looked up. It is hard to tell whether there is a speedup or not.
> When I run m5.prof, the profile results show that the time taken by
> CacheMemory::lookup() drops from ~5.5% to ~4%. But when I run m5.fast,
> the time of execution increases by 2%.
>
> Earlier, when we had the same address being looked up multiple times in
> succession, this change made sense. Right now, I am uncertain about
> whether this change improves performance.



Hi Nilay,

Are you talking about caching the MRU block for the entire cache?  I agree
that that isn't likely to help much now that we've eliminated the redundant
lookup() calls.  I was thinking of a conventional set-associative cache
lookup (a 2D array, or a 1D array of sets where each set is represented by a
linked list) where the MRU block of each set is moved to the head of the
list.  For an L1 cache in particular, most of the accesses to each set will
be to the MRU block within that set, so even though you have a linear search
within the set, you will often only have to do one comparison.  See
accessBlock() in mem/cache/tags/lru.cc for an example of this.  And even if
you don't hit the MRU block, most L1s have relatively small associativity so
a linear search is not that bad.  I have a hard time believing that this
wouldn't be substantially faster than a hash table lookup.
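To make the idea concrete, here's a rough sketch of that kind of lookup: a 1D array of sets, each set a linked list with the MRU block kept at the head. This is just an illustration of the move-to-front scheme, not the actual code in mem/cache/tags/lru.cc; all the names here are made up, and it assumes a power-of-two number of sets and folds the index bits into the tag for brevity.

```cpp
#include <cstdint>
#include <iterator>
#include <list>
#include <vector>

// Illustrative sketch only -- not M5's LRU tags. One std::list per
// set; a hit splices the block to the head, so the next access to
// that set usually succeeds on the first comparison.
struct Block {
    uint64_t tag;
    bool valid;
};

class SetAssocCache {
  public:
    // numSets must be a power of two so the mask works.
    SetAssocCache(unsigned numSets, unsigned assoc)
        : sets(numSets), setMask(numSets - 1)
    {
        for (auto &s : sets)
            s.assign(assoc, Block{0, false});
    }

    // Linear search within the set, MRU first. For brevity the
    // "tag" here keeps the index bits, so a plain equality test
    // identifies the block.
    Block *findBlock(uint64_t addr, unsigned blkShift = 6)
    {
        uint64_t tag = addr >> blkShift;
        auto &set = sets[tag & setMask];
        for (auto it = set.begin(); it != set.end(); ++it) {
            if (it->valid && it->tag == tag) {
                set.splice(set.begin(), set, it);  // move MRU to head
                return &set.front();
            }
        }
        return nullptr;  // miss: searched the whole set
    }

    // Victimize the tail (LRU) and make the new block MRU.
    void insert(uint64_t addr, unsigned blkShift = 6)
    {
        uint64_t tag = addr >> blkShift;
        auto &set = sets[tag & setMask];
        auto last = std::prev(set.end());
        *last = Block{tag, true};
        set.splice(set.begin(), set, last);
    }

  private:
    std::vector<std::list<Block>> sets;
    uint64_t setMask;
};
```

The point is that on the common path (L1 hit to the MRU block) this does exactly one comparison and no hashing; the full linear scan only happens on a miss or a non-MRU hit, which is cheap at L1-like associativities.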

For caches that are more highly associative, for L2/L3 caches where the
probability of hitting the MRU block is not as high, or for snoop lookups
where you are typically searching the whole set to verify that a block is
*not* there, a hash table *may* make more sense.  (Though even in that
case I wonder whether a whole-cache hash table vs. per-set hash tables is a
better idea.)  But these should all be much less frequent than L1 hit
lookups, so I would think the set-associative lookup would still be faster
overall.  The optimal strategy in the long run may be to have both types of
lookup implemented and to choose the right one at configuration time based
on the cache's associativity and position... we sort of tried to do this in
the original M5 code with the LRU and FALRU tags, but didn't really push the
concept all the way to figuring out the optimal crossover point.
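For contrast, the whole-cache hash-table flavor could look something like the sketch below. This is hypothetical (std::unordered_map standing in for whatever hash table CacheMemory actually uses, made-up names); the property it illustrates is that one probe answers "present or not" in O(1) expected time regardless of associativity, which is exactly the question a snoop asks.

```cpp
#include <cstdint>
#include <unordered_map>

// Hypothetical whole-cache hash-table tag store, keyed by block
// address (address with the offset bits shifted off). Illustrative
// only -- not M5's CacheMemory.
class HashTagStore {
  public:
    explicit HashTagStore(unsigned blkShift = 6) : shift(blkShift) {}

    void insert(uint64_t addr) { table[addr >> shift] = true; }
    void invalidate(uint64_t addr) { table.erase(addr >> shift); }

    // One expected-O(1) probe, independent of associativity. The
    // win over a linear set scan is biggest when the common answer
    // is "not here", as it is for snoops.
    bool isPresent(uint64_t addr) const {
        return table.count(addr >> shift) != 0;
    }

  private:
    unsigned shift;
    std::unordered_map<uint64_t, bool> table;
};
```

The trade-off is that every lookup pays the hashing cost, even the L1-hit common case that the MRU scheme answers with a single comparison, which is why the crossover point matters.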

My mention above about snooping being a bad case for the set-associative
linear-search lookup makes me wonder about the double icache/dcache lookup
you mentioned... that could be part of the problem, and it's not necessary.
There should be a real coherence protocol between the two caches, so that
you only have to snoop the other cache in the same situations where you'd
have to snoop another core's data cache... i.e., if either cache has an
exclusive copy, or is only reading a shared copy, there's no point in doing
that second lookup.  Is there any notion in the protocol of having different
protocol states for the icache and dcache?
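The filtering condition I have in mind is roughly the sketch below, assuming MESI-like states per L1 line purely as an illustration (I'm not claiming these are the states the protocol actually tracks):

```cpp
// Illustrative only: when can an L1 skip snooping its sibling
// (icache <-> dcache)? Assumes hypothetical MESI-like states.
enum class State { Invalid, Shared, Exclusive, Modified };

bool needCrossSnoop(State myState, bool isWrite)
{
    // An Exclusive or Modified copy here implies no other cache,
    // including the sibling i/d cache, can hold the line.
    if (myState == State::Exclusive || myState == State::Modified)
        return false;

    // A read hit on a Shared copy is also safe without a snoop;
    // only a write (which must invalidate other copies) or a miss
    // still needs to check the other cache.
    if (myState == State::Shared && !isWrite)
        return false;

    return true;
}
```

In other words, the second lookup would only happen in the same situations where you'd have to snoop another core's cache, rather than on every access.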

Steve
_______________________________________________
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev
