Hi,

   We are running a couple of mobile benchmarks on the 'arm_detailed' CPU 
model. We were trying to understand why the fetch rate is much lower than the 
peak value of 3, and had a few doubts regarding the CPU stats.


The stats show the fetch rate to be 0.68, and 'fetch.rateDist' gives a mean 
value of 0.79. I'm guessing this is the cumulative effect of Icache/ITLB 
misses and any stalls further down the pipeline propagating back. The 
rateDist says that for 62.91% of active cycles fetch is stalled (zero 
instructions fetched), which comes to 10715080955 cycles.
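
To double-check how I am reading the distribution, here is a minimal sketch that recomputes the mean and the zero-fetch share from the rateDist buckets listed in the stats below. The variable names are my own and are not gem5 identifiers.

```python
# Bucket counts copied from system.cpu.fetch.rateDist::0..3 in the stats below.
# Key = instructions fetched in a cycle, value = number of cycles sampled.
rate_dist = {0: 10715080955, 1: 2335316944, 2: 801023396, 3: 3182006386}

samples = sum(rate_dist.values())                       # rateDist::samples
fetched = sum(n * cycles for n, cycles in rate_dist.items())
mean = fetched / samples                                # rateDist::mean
zero_fraction = rate_dist[0] / samples                  # share of zero-fetch cycles

print(f"samples      = {samples}")
print(f"mean         = {mean:.6f}")
print(f"zero-fetch % = {100 * zero_fraction:.2f}")
```

The recomputed values agree with the reported rateDist::samples, rateDist::mean and the 62.91% bucket, so the 10715080955 figure is simply the rateDist::0 count.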


When I add up the Icache/ITLB stalls, the squash cycles, and all of the 
ROB/IQ/Reg full events (which I think should be an upper bound on stalls in 
issue), the total comes to 3002034153, which falls well short of the fetch 
stalled cycles. I can't explain the gap, nor work out which is the bigger 
bottleneck for the system. Should I be summing some other factors as well? 
Also, is it right to assume that the ROB etc. full event counts are the same 
as the number of cycles for which these structures were full and blocking 
issue?
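
The sum can be reproduced with the sketch below, using the values from the stats that follow. The exact set of counters is an assumption about what went into the total: it matches 3002034153 only when MiscStallCycles and IcacheWaitRetryStallCycles are counted along with the Icache/ITLB and squash cycles. Treating each "full" event as one blocked cycle is also an assumption, and is the very thing I am unsure about.

```python
# Fetch-side stall counters (cycles), copied from the stats below.
fetch_stalls = {
    "icacheStallCycles": 609617403,
    "TlbCycles": 93020890,
    "SquashCycles": 172209058,
    "MiscStallCycles": 3247256,
    "IcacheWaitRetryStallCycles": 5776185,
}

# Rename "full" events. ASSUMPTION: one event is counted as one stalled cycle.
rename_full_events = {
    "ROBFullEvents": 1128354270,
    "IQFullEvents": 720135276,
    "FullRegisterEvents": 269673815,
}

zero_fetch_cycles = 10715080955  # system.cpu.fetch.rateDist::0

accounted = sum(fetch_stalls.values()) + sum(rename_full_events.values())
gap = zero_fetch_cycles - accounted

print(f"accounted = {accounted}")
print(f"gap       = {gap} ({100 * gap / zero_fetch_cycles:.1f}% of zero-fetch cycles)")
```

LQFullEvents, SQFullEvents and the pending trap/quiesce stall cycles are not part of this sum.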


Below is a snippet of the stats I'm seeing:


system.cpu.fetch.icacheStallCycles          609617403                       # Number of cycles fetch is stalled on an Icache miss
system.cpu.fetch.Cycles                   16092927838                       # Number of cycles fetch has run and was not squashing or locked
system.cpu.fetch.SquashCycles               172209058                       # Number of cycles fetch has spent squashing
system.cpu.fetch.TlbCycles                   93020890                       # Number of cycles fetch has spent waiting for tlb
system.cpu.fetch.MiscStallCycles              3247256                       # Number of cycles fetch has spent waiting on interrupts, or bad addresses, or out of MSHRs
system.cpu.fetch.PendingTrapStallCycles      12272540                       # Number of stall cycles due to pending traps
system.cpu.fetch.PendingQuiesceStallCycles    130461040                       # Number of stall cycles due to pending quiesce instructions
system.cpu.fetch.IcacheWaitRetryStallCycles      5776185                       # Number of stall cycles due to full MSHR
system.cpu.fetch.IcacheSquashes               8472461                       # Number of outstanding Icache misses that were squashed
system.cpu.fetch.ItlbSquashes                 1248005                       # Number of outstanding ITLB misses that were squashed


system.cpu.fetch.rateDist::samples        17033427681                       # Number of instructions fetched each cycle (Total)
system.cpu.fetch.rateDist::mean              0.791584                       # Number of instructions fetched each cycle (Total)
system.cpu.fetch.rateDist::stdev             1.174687                       # Number of instructions fetched each cycle (Total)
system.cpu.fetch.rateDist::0              10715080955     62.91%     62.91% # Number of instructions fetched each cycle (Total)
system.cpu.fetch.rateDist::1               2335316944     13.71%     76.62% # Number of instructions fetched each cycle (Total)
system.cpu.fetch.rateDist::2                801023396      4.70%     81.32% # Number of instructions fetched each cycle (Total)
system.cpu.fetch.rateDist::3               3182006386     18.68%    100.00% # Number of instructions fetched each cycle (Total)
system.cpu.fetch.rateDist::total          17033427681                       # Number of instructions fetched each cycle (Total)
system.cpu.fetch.branchRate                  0.086368                       # Number of branch fetches per cycle
system.cpu.fetch.rate                        0.680447                       # Number of inst fetches per cycle


system.cpu.rename.IdleCycles               1667481057                       # Number of cycles rename is idle
system.cpu.rename.BlockCycles              6024965503                       # Number of cycles rename is blocking
system.cpu.rename.serializeStallCycles     1032188263                       # count of cycles rename stalled for serializing inst
system.cpu.rename.RunCycles                4524460126                       # Number of cycles rename is running
system.cpu.rename.UnblockCycles            3710459723                       # Number of cycles rename is unblocking
system.cpu.rename.RenamedInsts            11991651016                       # Number of instructions processed by rename
system.cpu.rename.SquashedInsts             116686316                       # Number of squashed instructions processed by rename
system.cpu.rename.ROBFullEvents            1128354270                       # Number of times rename has blocked due to ROB full
system.cpu.rename.IQFullEvents              720135276                       # Number of times rename has blocked due to IQ full
system.cpu.rename.LQFullEvents             2532836436                       # Number of times rename has blocked due to LQ full
system.cpu.rename.SQFullEvents              380062496                       # Number of times rename has blocked due to SQ full
system.cpu.rename.FullRegisterEvents        269673815                       # Number of times there has been no free register
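
In case anyone wants to redo the sums on their own run, here is a rough sketch of how counters like these could be pulled out of a stats dump. It assumes only the "name value ... # description" layout shown above and is not an official gem5 tool; `parse_stats` and `to_number` are hypothetical helper names. Wrapped description lines are skipped because their second field is not a number.

```python
def to_number(token):
    """Return an int when possible, otherwise a float (ValueError if neither)."""
    try:
        return int(token)
    except ValueError:
        return float(token)


def parse_stats(text):
    """Map each stat name to its first numeric value."""
    stats = {}
    for line in text.splitlines():
        # Drop the trailing "# description" part, then split on whitespace.
        fields = line.split("#", 1)[0].split()
        if len(fields) < 2:
            continue
        try:
            stats[fields[0]] = to_number(fields[1])
        except ValueError:
            continue  # e.g. a wrapped description line such as "Number of ..."
    return stats


# Three entries copied from the dump above, with the line wrapping left in.
SAMPLE = """\
system.cpu.fetch.icacheStallCycles          609617403                       #
Number of cycles fetch is stalled on an Icache miss
system.cpu.fetch.rateDist::0              10715080955     62.91%     62.91% #
Number of instructions fetched each cycle (Total)
system.cpu.fetch.rate                        0.680447                       #
Number of inst fetches per cycle
"""

stats = parse_stats(SAMPLE)
for name, value in stats.items():
    print(name, value)
```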


(sorry about the verbose stats.)

Would be glad if somebody could help me understand how the stalls split up.


Thanks,


Best Regards,

Brian Coutinho