Re: [m5-dev] Profile Results for Mesh Network

Nilay Vaish Wed, 19 Jan 2011 15:56:57 -0800

Some more data from the same simulation, this time from m5out/stats.txt.The number of memory references is slightly more than 30,000,000 where asthe number of lookups in the cache is about 256,000,000. So that would bea ratio of 1 : 8.5. I suspect that the reason for this might be that everycache controller performs lookup for every memory reference.

It seems to me that only one thread was running all the time whichexecuted mostly on processors 0 and 4. Is the rate at which sim tick isincremented different from the processor clock rate? The simulation wasallowed to run for 200 billion sim ticks while the processor cycles isjust 400 million.

Given the way the processors behaved during this simulation, I think weshould profile a simulation in which all the processors are active most ofthe time.


--
Nilay

system.cpu0.not_idle_fraction 0.170783# Percentage of non-idle cyclessystem.cpu0.numCycles 399995398# number of cpu cycles simulatedsystem.cpu0.num_insts 9504409# Number of instructions executedsystem.cpu0.num_refs 2258648# Number of memory references

system.cpu1.not_idle_fraction 0.002503# Percentage of non-idle cyclessystem.cpu1.numCycles 398441993# number of cpu cycles simulatedsystem.cpu1.num_insts 212019# Number of instructions executedsystem.cpu1.num_refs 71465# Number of memory references


CPUs 2 and 3 behavior is same as 1.

system.cpu4.not_idle_fraction 0.823388# Percentage of non-idle cyclessystem.cpu4.numCycles 400000000# number of cpu cycles simulatedsystem.cpu4.num_insts 80686910# Number of instructions executedsystem.cpu4.num_refs 28047398# Number of memory references

system.cpu5.not_idle_fraction 0.000185# Percentage of non-idle cyclessystem.cpu5.numCycles 399900914# number of cpu cycles simulatedsystem.cpu5.num_insts 8413# Number of instructions executedsystem.cpu5.num_refs 2862# Number of memory references


CPUs 6 and 7 also have stats similar to 5.



On Wed, 19 Jan 2011, Steve Reinhardt wrote:

What's the deal with Histogram::add()?  Either it's too slow or it's being
called too much, I'd say, unless we're tracking some incredibly vital
statistics there.  Can you use the call graph part of the profile to find
where most of the calls are coming from?

Also, can you look at the stats and verify that the number of calls to
CacheMemory::lookup() is not much larger than the total number of CPU memory
references?

Another thing to note is that even though PerfectSwitch::wakeup() is not a
huge amount of time, the time per call is pretty large, so there may be
opportunities for a lot of improvement there.

Thanks,

Steve

On Wed, Jan 19, 2011 at 9:20 AM, Nilay Vaish <ni...@cs.wisc.edu> wrote:

I profiled m5 again, using the following command.

./build/ALPHA_FS_MOESI_hammer/m5.prof ./configs/example/ruby_fs.py
--maxtick 200000000000 -n 8 --topology Mesh --mesh-rows 2 --num-l2cache 8
--num-dir 8

Results have been copied below. CacheMemory::lookup() still consumes some
time but is lot less than before. PrefectSwitch can be a candidate for
re-design. But given that, PerfectSwitch is taking only 3% of the time, it
would not yield that much gain. May be we need to look at these different
functions in a more holistic fashion.

--
Nilay


 %   cumulative   self              self     total
 time   seconds   seconds    calls   s/call   s/call  name
 7.29     35.89    35.89 606750284     0.00     0.00  Histogram::add(long
long)
 5.84     64.62    28.73 256533483     0.00     0.00
CacheMemory::lookup(Address const&)
 4.56     87.06    22.44 124360139     0.00     0.00
L1Cache_Controller::wakeup()
 3.88    106.18    19.12 121283110     0.00     0.00
RubyPort::M5Port::recvTiming(Packet*)
 3.13    121.60    15.42  6875704     0.00     0.00 PerfectSwitch::wakeup()
 3.00    136.38    14.78 39527382     0.00     0.00
MemoryControl::executeCycle()
 2.95    150.91    14.53 90855686     0.00     0.00
BaseSimpleCPU::preExecute()
 2.68    164.10    13.19 147750111     0.00     0.00
MessageBuffer::enqueue(RefCountingPtr<Message>, long long)
 2.41    175.96    11.86 180281626     0.00     0.00
RubyEventQueueNode::process()
 2.07    186.17    10.21 302054741     0.00     0.00
EventQueue::serviceOne()
 2.03    196.15     9.98 121176116     0.00     0.00
Sequencer::getRequestStatus(RubyRequest const&)

_______________________________________________
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev

Re: [m5-dev] Profile Results for Mesh Network

Reply via email to