On Thu, Jan 27, 2011 at 4:36 AM, Nilay Vaish <ni...@cs.wisc.edu> wrote:
> I tried caching the index for the MRU block, so that the hash table need
> not be looked up. It is hard to tell if there is a speedup or not. When
> I run m5.prof, profile results show that the time taken by
> CacheMemory::lookup() drops from ~5.5% to ~4%. But when I run m5.fast,
> the time of execution increases by 2%.
>
> Earlier, when we had the same address being looked up multiple times in
> succession, this change made sense. Right now, I'm uncertain about whether
> this change improves performance.

Hi Nilay,

Are you talking about caching the MRU block for the entire cache? I agree that that isn't likely to help much now that we've eliminated the redundant lookup() calls.

I was thinking of a conventional set-associative cache lookup (a 2D array, or a 1D array of sets where each set is represented by a linked list) where the MRU block of each set is moved to the head of the list. For an L1 cache in particular, most of the accesses to each set will be to the MRU block within that set, so even though you have a linear search within the set, you will often only have to do one comparison. See accessBlock() in mem/cache/tags/lru.cc for an example of this. And even if you don't hit the MRU block, most L1s have relatively small associativity, so a linear search is not that bad. I have a hard time believing that this wouldn't be substantially faster than a hash table lookup.

For caches that are more highly associative, or L2/L3 caches where the probability of hitting the MRU block is not as high, or for snoop lookups where you are typically searching the whole set to verify that a block is *not* there, a hash table *may* make more sense. (Though even in that case I wonder whether a whole-cache hash table vs. per-set hash tables is a better idea.) But these should all be much less frequent than L1 hit lookups, so I would think the set-associative lookup would still be faster overall.
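To make the idea concrete, here is a minimal sketch of an MRU-first set-associative lookup with per-set linked lists and move-to-front on a hit, in the spirit of accessBlock() in mem/cache/tags/lru.cc. All names and the address-decoding details here are illustrative, not M5's actual code; it assumes power-of-two set counts and block sizes.

```cpp
#include <cstdint>
#include <iterator>
#include <list>
#include <vector>

// Illustrative sketch only (not M5's actual tag classes).
struct Block {
    uint64_t tag;    // for simplicity, the stored "tag" is the whole addr >> setShift
    bool valid;
};

class SetAssocCache {
    std::vector<std::list<Block>> sets;  // one list per set, MRU block at the front
    uint64_t setMask;                    // numSets must be a power of two
    unsigned setShift;                   // log2(blkSize); blkSize must be a power of two
public:
    SetAssocCache(unsigned numSets, unsigned assoc, unsigned blkSize)
        : sets(numSets), setMask(numSets - 1), setShift(0)
    {
        while ((1u << setShift) < blkSize)
            ++setShift;
        for (auto &s : sets)
            for (unsigned i = 0; i < assoc; ++i)
                s.push_back(Block{0, false});
    }

    // Linear search within the set; on a hit, splice the block to the head
    // so the common case (re-accessing the MRU block) needs one comparison.
    Block *accessBlock(uint64_t addr) {
        uint64_t tag = addr >> setShift;
        auto &set = sets[(addr >> setShift) & setMask];
        for (auto it = set.begin(); it != set.end(); ++it) {
            if (it->valid && it->tag == tag) {
                set.splice(set.begin(), set, it);  // move-to-front (MRU)
                return &set.front();
            }
        }
        return nullptr;  // miss
    }

    // Fill on a miss: victimize the LRU block at the tail, then make it MRU.
    void insertBlock(uint64_t addr) {
        auto &set = sets[(addr >> setShift) & setMask];
        set.back() = Block{addr >> setShift, true};
        set.splice(set.begin(), set, std::prev(set.end()));
    }
};
```

Note that std::list::splice just relinks nodes, so the move-to-front costs no allocation or copying; a real implementation would more likely use an intrusive list within a preallocated block array.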
The optimal strategy in the long run may be to have both types of lookup implemented and to choose the right one at configuration time based on the cache's associativity and position... we sort of tried to do this in the original M5 code with the LRU and FALRU tags, but didn't really push the concept all the way to figuring out the optimal crossover point.

My mention above about snooping being a bad case for the set-associative linear-search lookup makes me wonder about the double icache/dcache lookup you mentioned... that could be part of the problem, and it's not necessary. There should be a real coherence protocol between the two caches, so that you only have to snoop the other cache in the same situations where you'd have to snoop another core's data cache... i.e., if either cache has an exclusive copy, or is only reading a shared copy, there's no point in doing that second lookup. Is there any notion in the protocol of having different protocol states for the icache and dcache?

Steve
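The configuration-time selection could be as simple as a small factory that picks a tag-lookup implementation from the cache's parameters. Everything below is hypothetical: the class names, and especially the crossover point (assoc <= 8), are placeholders; as noted above, finding the real crossover would take measurement.

```cpp
#include <cstring>
#include <memory>

// Hypothetical tag-lookup interface and two implementations (sketch only).
struct Tags {
    virtual ~Tags() = default;
    virtual const char *kind() const = 0;
};
struct LinearMruTags : Tags {  // linear scan with MRU move-to-front
    const char *kind() const override { return "linear-mru"; }
};
struct HashTags : Tags {       // hash-table lookup
    const char *kind() const override { return "hash"; }
};

// Choose a lookup strategy at configuration time. The threshold of 8 is a
// made-up placeholder, not a measured crossover point.
std::unique_ptr<Tags> makeTags(unsigned assoc, bool snoopHeavy) {
    if (assoc <= 8 && !snoopHeavy)
        return std::make_unique<LinearMruTags>();  // L1-style: short scan wins
    return std::make_unique<HashTags>();           // wide or snoop-heavy cache
}
```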
_______________________________________________
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev