On Sun, Jan 23, 2011 at 4:08 PM, Nilay Vaish <ni...@cs.wisc.edu> wrote:
> On Sun, 23 Jan 2011, Korey Sewell wrote:
>> In sendFetch(), it calls sendTiming(), which would then call recvTiming()
>> on the cache port, since those two should be bound as peers.
>>
>> I'm a little unsure of how the RubyPort, Sequencer, CacheMemory, and
>> CacheController (?) relationship is working (right now at least), but the
>> relationship between sendTiming and recvTiming is the key concept that
>> connects two memory objects, unless things have changed.
>>
>> On Sun, Jan 23, 2011 at 3:51 PM, Nilay Vaish <ni...@cs.wisc.edu> wrote:
>>> I dug more into the code today. There are three paths along which calls
>>> are made to RubyPort::M5Port::recvTiming(), which eventually results in
>>> calls to CacheMemory::lookup().
>>>
>>> 1. TimingSimpleCPU::sendFetch() - 140 million
>>> 2. TimingSimpleCPU::handleReadPacket() - 30 million
>>> 3. TimingSimpleCPU::handleWritePacket() - 18 million
>>>
>>> The number of times the last two functions are called is very close to
>>> the total number of memory references (48 million) for all the CPUs
>>> together. The number of lookup() calls is about 392 million. If we take
>>> into account the calls to sendFetch(), then the ratio of lookup() calls
>>> to the number of requests pushed into Ruby drops to 2 to 1, from an
>>> earlier estimate of 8 to 1.
>>>
>>> My question would be: why does sendFetch() make calls to recvTiming()?
>
> Some more reading revealed that sendFetch() is calling recvTiming for
> instruction cache accesses, whereas the other two calls (handleReadPacket
> and handleWritePacket) are for data cache accesses.

Yes, that's right. So there's probably no big win in trying to further
reduce the number of calls to lookup() in Ruby; the possibilities I see
for improvement are:

1. Adding an instruction buffer to SimpleCPU so we don't do a cache lookup
   on *every* instruction fetch
2. Trying again to make the lookup() calls themselves faster (for example,
   a lookup that hits the MRU block should really only take a handful of
   instructions, while IIRC we were seeing much larger costs for the hash
   table lookup)
3. Moving on to some other area (like the Histogram thing)

#1 is not a Ruby issue, and could well be different under x86, since (1)
x86 has a byte-stream-oriented predecoder, so it doesn't do a fetch per
instruction anyway, and (2) you may have to worry about self-modifying
code. Gabe, how many bytes at a time does the x86 predecoder fetch? If it
doesn't currently grab a cache line at a time, could it be made to do so,
and do you know if that would cause any issues with SMC?

Nilay, I'd appreciate your comments on #2 and whether you think that's
worth pursuing, or should we move on to #3.

Steve
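[Editor's note] Option #1 above — buffering a fetched line so that consecutive instruction fetches from the same cache line skip the lookup — can be sketched as a small standalone class. This is an illustrative sketch only; `FetchBuffer` and its methods are hypothetical names, not the actual SimpleCPU or gem5 interface:

```cpp
#include <cstdint>

// Hypothetical line-sized instruction fetch buffer (not gem5 code).
// One real icache access fills the buffer with a whole line; subsequent
// fetches that fall in the same line need no cache lookup at all.
class FetchBuffer {
  public:
    explicit FetchBuffer(uint64_t line_size) : line_size_(line_size) {}

    // True if the fetch at 'pc' is covered by the buffered line,
    // i.e. no cache lookup is needed for this instruction.
    bool covers(uint64_t pc) const {
        return valid_ && (pc & ~(line_size_ - 1)) == line_addr_;
    }

    // Called after a real icache access: remember the line containing 'pc'.
    void fill(uint64_t pc) {
        line_addr_ = pc & ~(line_size_ - 1);
        valid_ = true;
    }

    // Must be called when the line may have changed underneath us,
    // e.g. on self-modifying code or an invalidation from the coherence
    // protocol -- this is exactly the SMC concern raised above.
    void invalidate() { valid_ = false; }

  private:
    uint64_t line_size_;       // assumed power of two
    uint64_t line_addr_ = 0;   // base address of the buffered line
    bool valid_ = false;
};
```

With a 64-byte line and sequential 4-byte instructions, this would turn 16 lookups into 1, which is consistent with the 140M sendFetch() calls dominating the lookup() count in the profile above.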
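[Editor's note] The MRU fast path in option #2 — checking the set's most-recently-used way before doing any wider search — can be sketched as follows. This is a minimal sketch, not the real CacheMemory::lookup(); `SimpleTagStore`, `Block`, and `install()` are hypothetical names, and the real code's hash-table organization is replaced here by a plain set-associative array for clarity:

```cpp
#include <cstdint>
#include <vector>

// Minimal cache block for the sketch (hypothetical, not gem5's types).
struct Block {
    uint64_t tag = 0;
    bool valid = false;
};

class SimpleTagStore {
  public:
    SimpleTagStore(int num_sets, int assoc)
        : assoc_(assoc), blocks_(num_sets * assoc), mru_way_(num_sets, 0) {}

    // Test helper to place a block directly (hypothetical).
    void install(uint64_t set, int way, uint64_t tag) {
        Block &b = blocks_[set * assoc_ + way];
        b.tag = tag;
        b.valid = true;
    }

    // Fast path first: a hit in the set's MRU way costs only a compare
    // and a load -- "a handful of instructions", as suggested above.
    // Only on an MRU miss do we scan the remaining ways.
    Block *lookup(uint64_t set, uint64_t tag) {
        Block *mru = &blocks_[set * assoc_ + mru_way_[set]];
        if (mru->valid && mru->tag == tag)
            return mru;                       // common case: MRU hit
        for (int way = 0; way < assoc_; ++way) {
            Block *b = &blocks_[set * assoc_ + way];
            if (b->valid && b->tag == tag) {
                mru_way_[set] = way;          // promote to MRU
                return b;
            }
        }
        return nullptr;                       // miss
    }

  private:
    int assoc_;
    std::vector<Block> blocks_;   // num_sets * assoc, way-major within a set
    std::vector<int> mru_way_;    // per-set MRU way index
};
```

Since instruction fetch streams are highly repetitive, most of the 392M lookups should hit this fast path, which is the cost gap versus the hash-table lookup that the message above is pointing at.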
_______________________________________________
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev