Hello,

Depending on the ILP of the code and the depth of your EXE stage (modeled only indirectly in gem5, through instruction latencies), an IPC around 0.7 can be good performance. Sometimes the issue stage is blocked simply because instructions are waiting for the results of earlier instructions. I'm not sure whether such stalls appear explicitly anywhere in stats.txt, but it could be an interesting stat to add.
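To see which stall-related counters stats.txt does contain, a quick filter over the file helps. Below is a minimal Python sketch. It assumes the usual stats.txt layout of one `name value [pct cum] # description` entry per line. The keyword list (stall, block, full, squash, tlb) is a heuristic chosen for this sketch, not a gem5 naming rule, so it may miss or over-include counters. The function names `parse_stats` and `stall_counters` are hypothetical. Counters ending in `Cycles` are kept apart from those ending in `Events`, since the two are not interchangeable.

```python
import re

# Keywords that tend to appear in stall-related gem5 stat names. This list is
# a heuristic chosen for this sketch, not a gem5 naming convention.
STALL_PATTERN = re.compile(r"stall|block|full|squash|tlb", re.IGNORECASE)


def _number(token):
    """Parse an integer if possible, otherwise a float (raises ValueError)."""
    try:
        return int(token)
    except ValueError:
        return float(token)


def parse_stats(text):
    """Map stat name -> value for lines shaped like 'name value ... # desc'.

    Assumes the common stats.txt layout. Separator lines such as
    '---------- Begin Simulation Statistics ----------' are skipped. If the
    text holds several stat dumps, later values overwrite earlier ones.
    """
    stats = {}
    for line in text.splitlines():
        if line.startswith("-"):
            continue
        fields = line.split("#", 1)[0].split()
        if len(fields) < 2:
            continue
        try:
            stats[fields[0]] = _number(fields[1])
        except ValueError:
            continue  # second field is not numeric, so not a value line
    return stats


def stall_counters(stats, prefix="system.cpu."):
    """Split stall-looking stats into cycle counters and event counters.

    Names ending in 'Cycles' count cycles; names ending in 'Events' count
    occurrences. The two should not be added together as if both were cycles.
    """
    cycles, events = {}, {}
    for name, value in stats.items():
        if not name.startswith(prefix) or not STALL_PATTERN.search(name):
            continue
        if name.endswith("Cycles"):
            cycles[name] = value
        elif name.endswith("Events"):
            events[name] = value
    return cycles, events
```

For a real run, something like `stall_counters(parse_stats(open("m5out/stats.txt").read()))` should work, assuming the default m5out output directory; anything the keywords do not catch (for example dependency waits in the issue stage) will simply not show up.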
Regards,
--
Fernando A. Endo, PhD student and researcher
Université de Grenoble, UJF
France

2015-04-12 14:49 GMT+02:00 Gedare Bloom <[email protected]>:
> Events != cycles.
>
> And you may count rename.BlockCycles + rename.UnblockCycles +
> rename.serializeStallCycles.
>
>
> On April 12, 2015, at 1:01 AM, Brian Coutinho <[email protected]> wrote:
> >
> > Hi,
> >
> > We are running a couple of mobile benchmarks on the 'arm_detailed' CPU
> > model. We are trying to understand why the fetch rate is much lower than
> > the peak value of 3, and have a few questions about the CPU stats.
> >
> > The stats show the fetch rate to be 0.68, and 'fetch.rateDist' gives a
> > mean of 0.79. I'm guessing this is the cumulative effect of I-cache/ITLB
> > misses plus stalls further down the pipeline propagating back. The
> > rateDist says that fetch is stalled for 62% of active cycles, which comes
> > to 10715080955 cycles.
> >
> > When I add up the I-cache/ITLB stalls, the squash cycles, and all of the
> > ROB/IQ/register-full events (which I think should be an upper bound on
> > stalls in issue), the total is 3002034153, which falls well short of the
> > fetch-stalled cycles. I can't explain the gap, nor tell which is the
> > bigger bottleneck for the system. Should I be summing some other factors
> > as well? Also, is it right to assume that the ROB/IQ/etc. full event
> > counts equal the number of cycles for which these structures were full
> > and blocking issue?
> >
> > Below is a snippet of the stats I'm seeing:
> >
> > system.cpu.fetch.icacheStallCycles             609617403    # Number of cycles fetch is stalled on an Icache miss
> > system.cpu.fetch.Cycles                      16092927838    # Number of cycles fetch has run and was not squashing or locked
> > system.cpu.fetch.SquashCycles                  172209058    # Number of cycles fetch has spent squashing
> > system.cpu.fetch.TlbCycles                      93020890    # Number of cycles fetch has spent waiting for tlb
> > system.cpu.fetch.MiscStallCycles                 3247256    # Number of cycles fetch has spent waiting on interrupts, or bad addresses, or out of MSHRs
> > system.cpu.fetch.PendingTrapStallCycles         12272540    # Number of stall cycles due to pending traps
> > system.cpu.fetch.PendingQuiesceStallCycles     130461040    # Number of stall cycles due to pending quiesce instructions
> > system.cpu.fetch.IcacheWaitRetryStallCycles      5776185    # Number of stall cycles due to full MSHR
> > system.cpu.fetch.IcacheSquashes                  8472461    # Number of outstanding Icache misses that were squashed
> > system.cpu.fetch.ItlbSquashes                    1248005    # Number of outstanding ITLB misses that were squashed
> >
> > system.cpu.fetch.rateDist::samples           17033427681    # Number of instructions fetched each cycle (Total)
> > system.cpu.fetch.rateDist::mean                 0.791584    # Number of instructions fetched each cycle (Total)
> > system.cpu.fetch.rateDist::stdev                1.174687    # Number of instructions fetched each cycle (Total)
> > system.cpu.fetch.rateDist::0                 10715080955   62.91%   62.91%    # Number of instructions fetched each cycle (Total)
> > system.cpu.fetch.rateDist::1                  2335316944   13.71%   76.62%    # Number of instructions fetched each cycle (Total)
> > system.cpu.fetch.rateDist::2                   801023396    4.70%   81.32%    # Number of instructions fetched each cycle (Total)
> > system.cpu.fetch.rateDist::3                  3182006386   18.68%  100.00%    # Number of instructions fetched each cycle (Total)
> >
> > system.cpu.fetch.rateDist::total             17033427681    # Number of instructions fetched each cycle (Total)
> > system.cpu.fetch.branchRate                     0.086368    # Number of branch fetches per cycle
> > system.cpu.fetch.rate                           0.680447    # Number of inst fetches per cycle
> >
> > system.cpu.rename.IdleCycles                  1667481057    # Number of cycles rename is idle
> > system.cpu.rename.BlockCycles                 6024965503    # Number of cycles rename is blocking
> > system.cpu.rename.serializeStallCycles        1032188263    # count of cycles rename stalled for serializing inst
> > system.cpu.rename.RunCycles                   4524460126    # Number of cycles rename is running
> > system.cpu.rename.UnblockCycles               3710459723    # Number of cycles rename is unblocking
> > system.cpu.rename.RenamedInsts               11991651016    # Number of instructions processed by rename
> > system.cpu.rename.SquashedInsts                116686316    # Number of squashed instructions processed by rename
> > system.cpu.rename.ROBFullEvents               1128354270    # Number of times rename has blocked due to ROB full
> > system.cpu.rename.IQFullEvents                 720135276    # Number of times rename has blocked due to IQ full
> > system.cpu.rename.LQFullEvents                2532836436    # Number of times rename has blocked due to LQ full
> > system.cpu.rename.SQFullEvents                 380062496    # Number of times rename has blocked due to SQ full
> > system.cpu.rename.FullRegisterEvents           269673815    # Number of times there has been no free register
> >
> > (Sorry about the verbose stats.)
> >
> > I would be glad if somebody could help me understand how the stalls split up.
> >
> > Thanks,
> >
> > Best Regards,
> > Brian Coutinho
> >
> > _______________________________________________
> > gem5-users mailing list
> > [email protected]
> > http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users
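The arithmetic in the question can be checked directly against the quoted numbers. The Python sketch below recomputes the rateDist mean from its buckets, reproduces the 3002034153 total, and sets it beside Gedare's suggested sum of rename cycle counters. With the quoted values, 3002034153 is exactly icacheStallCycles + TlbCycles + SquashCycles + MiscStallCycles + IcacheWaitRetryStallCycles plus the ROB, IQ and free-register full events, so it mixes cycle counts with event counts. The dictionary and function names are hypothetical, chosen for this illustration only.

```python
# Numbers copied from the stats.txt excerpt quoted in this thread.
FETCH_RATE_DIST = {0: 10715080955, 1: 2335316944, 2: 801023396, 3: 3182006386}

FETCH_STALL_CYCLES = {
    "icacheStallCycles": 609617403,
    "TlbCycles": 93020890,
    "SquashCycles": 172209058,
    "MiscStallCycles": 3247256,
    "PendingTrapStallCycles": 12272540,
    "PendingQuiesceStallCycles": 130461040,
    "IcacheWaitRetryStallCycles": 5776185,
}

# Event counts: how many times rename blocked, not how many cycles.
RENAME_FULL_EVENTS = {
    "ROBFullEvents": 1128354270,
    "IQFullEvents": 720135276,
    "FullRegisterEvents": 269673815,
}

# Cycle counts, as suggested by Gedare.
RENAME_STALL_CYCLES = {
    "BlockCycles": 6024965503,
    "UnblockCycles": 3710459723,
    "serializeStallCycles": 1032188263,
}


def dist_mean(dist):
    """Mean of a histogram given as {bucket value: sample count}."""
    samples = sum(dist.values())
    return sum(value * count for value, count in dist.items()) / samples


def question_sum():
    """Fetch stall cycles (without trap/quiesce stalls) plus full events.

    This adds cycles to events, so it is not a cycle budget.
    """
    excluded = {"PendingTrapStallCycles", "PendingQuiesceStallCycles"}
    fetch = sum(v for k, v in FETCH_STALL_CYCLES.items() if k not in excluded)
    return fetch + sum(RENAME_FULL_EVENTS.values())


def rename_stall_cycles():
    """Rename block + unblock + serialize-stall cycles (all in cycles)."""
    return sum(RENAME_STALL_CYCLES.values())


if __name__ == "__main__":
    print(f"rateDist mean:         {dist_mean(FETCH_RATE_DIST):.6f}")
    print(f"zero-fetch cycles:     {FETCH_RATE_DIST[0]}")
    print(f"sum from the question: {question_sum()}")
    print(f"rename stall cycles:   {rename_stall_cycles()}")
```

Run as a script, it prints a mean of 0.791584 (matching rateDist::mean), 3002034153 for the question's sum, and 10767613489 rename stall cycles against 10715080955 cycles with zero instructions fetched. The rename figure being close to the zero-fetch count is suggestive, not conclusive: these counters are not mutually exclusive and need not line up cycle for cycle with the cycles in which fetch delivered nothing.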
