Some more data from the same simulation, this time from m5out/stats.txt. The number of memory references is slightly more than 30,000,000 where as the number of lookups in the cache is about 256,000,000. So that would be a ratio of 1 : 8.5. I suspect that the reason for this might be that every cache controller performs lookup for every memory reference.

It seems to me that only one thread was running all the time which executed mostly on processors 0 and 4. Is the rate at which sim tick is incremented different from the processor clock rate? The simulation was allowed to run for 200 billion sim ticks while the processor cycles is just 400 million.

Given the way the processors behaved during this simulation, I think we should profile a simulation in which all the processors are active most of the time.

--
Nilay


system.cpu0.not_idle_fraction 0.170783 # Percentage of non-idle cycles system.cpu0.numCycles 399995398 # number of cpu cycles simulated system.cpu0.num_insts 9504409 # Number of instructions executed system.cpu0.num_refs 2258648 # Number of memory references

system.cpu1.not_idle_fraction 0.002503 # Percentage of non-idle cycles system.cpu1.numCycles 398441993 # number of cpu cycles simulated system.cpu1.num_insts 212019 # Number of instructions executed system.cpu1.num_refs 71465 # Number of memory references

CPUs 2 and 3 behavior is same as 1.

system.cpu4.not_idle_fraction 0.823388 # Percentage of non-idle cycles system.cpu4.numCycles 400000000 # number of cpu cycles simulated system.cpu4.num_insts 80686910 # Number of instructions executed system.cpu4.num_refs 28047398 # Number of memory references

system.cpu5.not_idle_fraction 0.000185 # Percentage of non-idle cycles system.cpu5.numCycles 399900914 # number of cpu cycles simulated system.cpu5.num_insts 8413 # Number of instructions executed system.cpu5.num_refs 2862 # Number of memory references

CPUs 6 and 7 also have stats similar to 5.



On Wed, 19 Jan 2011, Steve Reinhardt wrote:

What's the deal with Histogram::add()?  Either it's too slow or it's being
called too much, I'd say, unless we're tracking some incredibly vital
statistics there.  Can you use the call graph part of the profile to find
where most of the calls are coming from?

Also, can you look at the stats and verify that the number of calls to
CacheMemory::lookup() is not much larger than the total number of CPU memory
references?

Another thing to note is that even though PerfectSwitch::wakeup() is not a
huge amount of time, the time per call is pretty large, so there may be
opportunities for a lot of improvement there.

Thanks,

Steve

On Wed, Jan 19, 2011 at 9:20 AM, Nilay Vaish <ni...@cs.wisc.edu> wrote:

I profiled m5 again, using the following command.

./build/ALPHA_FS_MOESI_hammer/m5.prof ./configs/example/ruby_fs.py
--maxtick 200000000000 -n 8 --topology Mesh --mesh-rows 2 --num-l2cache 8
--num-dir 8

Results have been copied below. CacheMemory::lookup() still consumes some
time but is lot less than before. PrefectSwitch can be a candidate for
re-design. But given that, PerfectSwitch is taking only 3% of the time, it
would not yield that much gain. May be we need to look at these different
functions in a more holistic fashion.

--
Nilay


 %   cumulative   self              self     total
 time   seconds   seconds    calls   s/call   s/call  name
 7.29     35.89    35.89 606750284     0.00     0.00  Histogram::add(long
long)
 5.84     64.62    28.73 256533483     0.00     0.00
CacheMemory::lookup(Address const&)
 4.56     87.06    22.44 124360139     0.00     0.00
L1Cache_Controller::wakeup()
 3.88    106.18    19.12 121283110     0.00     0.00
RubyPort::M5Port::recvTiming(Packet*)
 3.13    121.60    15.42  6875704     0.00     0.00 PerfectSwitch::wakeup()
 3.00    136.38    14.78 39527382     0.00     0.00
MemoryControl::executeCycle()
 2.95    150.91    14.53 90855686     0.00     0.00
BaseSimpleCPU::preExecute()
 2.68    164.10    13.19 147750111     0.00     0.00
MessageBuffer::enqueue(RefCountingPtr<Message>, long long)
 2.41    175.96    11.86 180281626     0.00     0.00
RubyEventQueueNode::process()
 2.07    186.17    10.21 302054741     0.00     0.00
EventQueue::serviceOne()
 2.03    196.15     9.98 121176116     0.00     0.00
Sequencer::getRequestStatus(RubyRequest const&)


_______________________________________________
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev

Reply via email to