On Sun, Jan 23, 2011 at 4:08 PM, Nilay Vaish <ni...@cs.wisc.edu> wrote:

> On Sun, 23 Jan 2011, Korey Sewell wrote:
>
>  In sendFetch(), it calls sendTiming(), which in turn calls recvTiming()
>> on the cache port, since those two ports should be bound as peers.
>>
>> I'm a little unsure of how the RubyPort, Sequencer, CacheMemory, and
>> CacheController (?) relationship works (right now at least), but the
>> relationship between sendTiming() and recvTiming() is the key concept that
>> connects two memory objects, unless things have changed.
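[Editor's note: the peer-binding idea described above can be sketched roughly as follows. This is a simplified illustration, not the real gem5/M5 Port API; all names here are hypothetical.]

```cpp
#include <cassert>

// Sketch of the M5 port-peer idea: two ports are bound as peers, and
// sendTiming() on one side delivers the packet to its peer's recvTiming().
struct Packet { int addr; };

struct Port {
    Port *peer = nullptr;
    bool received = false;

    // Bind two ports together as peers.
    static void bind(Port &a, Port &b) { a.peer = &b; b.peer = &a; }

    // Sending on this port invokes recvTiming() on the peer.
    bool sendTiming(const Packet &pkt) {
        return peer ? peer->recvTiming(pkt) : false;
    }

    // Receiver side; a real cache port would do its block lookup here.
    bool recvTiming(const Packet &) {
        received = true;
        return true;
    }
};
```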
>>
>> On Sun, Jan 23, 2011 at 3:51 PM, Nilay Vaish <ni...@cs.wisc.edu> wrote:
>>
>>  I dug more into the code today. There are three paths along which calls
>>> are made to RubyPort::M5Port::recvTiming(), which eventually result in
>>> calls to CacheMemory::lookup().
>>>
>>> 1. TimingSimpleCPU::sendFetch() - 140 million
>>> 2. TimingSimpleCPU::handleReadPacket() - 30 million
>>> 3. TimingSimpleCPU::handleWritePacket() - 18 million
>>>
>>> The number of times the last two functions are called is very close to the
>>> total number of memory references (48 million) for all the CPUs together.
>>> The number of lookup() calls is about 392 million. If we take into account
>>> the calls from sendFetch(), then the ratio of lookup() calls to requests
>>> pushed into Ruby drops to about 2 to 1, from the earlier estimate of
>>> 8 to 1.
>>>
>>> My question would be: why does sendFetch() make calls to recvTiming()?
>>>
>>>
> Some more reading revealed that sendFetch() is calling recvTiming() for
> instruction cache accesses, whereas the other two calls (handleReadPacket
> and handleWritePacket) are for data cache accesses.
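[Editor's note: the counts quoted above can be sanity-checked with simple arithmetic: the three call sites sum to 188 million requests against 392 million lookup() calls, which is where the roughly 2-to-1 ratio comes from.]

```cpp
// Sanity check on the ratio discussed in the thread: requests entering Ruby
// via the three call sites vs. total CacheMemory::lookup() calls.
long long requests() {
    const long long sendFetch   = 140000000;  // instruction fetches
    const long long handleRead  =  30000000;  // data reads
    const long long handleWrite =  18000000;  // data writes
    return sendFetch + handleRead + handleWrite;  // 188 million total
}

double lookupRatio() {
    const long long lookups = 392000000;          // lookup() call count
    return double(lookups) / double(requests());  // roughly 2 to 1
}
```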


Yes, that's right.  So there's probably no big win in trying to further
reduce the number of calls to lookup() in Ruby; the possibilities I see for
improvement are:
1. Adding an instruction buffer to SimpleCPU so we don't do a cache lookup
on *every* instruction fetch
2. Trying again to make the lookup() calls themselves faster (for example, a
lookup that hits the MRU block should really only take a handful of
instructions, while IIRC we were seeing much larger costs for the hash table
lookup)
3. Moving on to some other area (like the Histogram thing)
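[Editor's note: option #2 could take the shape of a cached most-recently-used block pointer sitting in front of the hash table, so a hit on the MRU block costs only one tag compare. A minimal sketch with hypothetical names, not Ruby's actual CacheMemory interface:]

```cpp
#include <unordered_map>

// Sketch of an MRU fast path in front of a hash-table cache directory.
// Names are hypothetical; Ruby's CacheMemory is organized differently.
struct Block { long long tag; int data; };

class CacheDir {
    std::unordered_map<long long, Block> table;
    Block *mru = nullptr;   // most recently returned block
public:
    void insert(long long tag, int data) {
        mru = &(table[tag] = Block{tag, data});
    }
    // Fast path: one compare against the MRU tag avoids hashing entirely.
    Block *lookup(long long tag) {
        if (mru && mru->tag == tag)
            return mru;                 // MRU hit: a handful of instructions
        auto it = table.find(tag);      // slow path: full hash-table lookup
        if (it == table.end())
            return nullptr;
        return mru = &it->second;
    }
};
```

(unordered_map element addresses stay valid across rehashing, so caching the pointer is safe here.)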

#1 is not a Ruby issue, and could well be different under x86 since (1) x86
has a byte-stream-oriented predecoder so it doesn't do a fetch per
instruction anyway and (2) you may have to worry about self-modifying code.
 Gabe, how many bytes at a time does the x86 predecoder fetch?  If it
doesn't currently grab a cache line at a time, could it be made to do so,
and do you know if that would cause any issues with SMC?
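[Editor's note: option #1 could be as simple as a line-sized fetch buffer in front of the instruction port, so sequential fetches that fall within the buffered line never touch the cache. A sketch under an assumed 64-byte line size; this is not gem5's actual fetch path, and SMC invalidation is not modeled.]

```cpp
#include <cstdint>

// Sketch of a line-sized instruction fetch buffer: only fetches that miss
// the currently buffered line go out to the cache.
class FetchBuffer {
    static const uint64_t LineBytes = 64;
    uint64_t lineAddr = ~uint64_t(0);  // buffered line address (starts invalid)
    int cacheAccesses = 0;             // how often we had to go to the cache
public:
    // Returns true if this fetch required a cache access.
    bool fetch(uint64_t pc) {
        uint64_t line = pc & ~(LineBytes - 1);
        if (line == lineAddr)
            return false;              // hit in the fetch buffer
        lineAddr = line;               // refill from the cache (not modeled)
        ++cacheAccesses;
        return true;
    }
    int accesses() const { return cacheAccesses; }
};
```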

Nilay, I'd appreciate your comments on #2 and whether you think that's worth
pursuing, or whether we should move on to #3.

Steve
_______________________________________________
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev
