I am using sim_ticks for my experiments, and other metrics derived from
that. I am calculating DRAM power and latency. In fact, the "cycles*"
(like cycles_all_precharge_nCKE, etc)
metrics used in the dram.cc are incorrect, since currently they calculate
the difference between curTick-busy_until[current_bank], (or something
similar) which are in
"Ticks", running on the sim_ticks clocks. Whereas they should actually be
measuring the number of memory clock (or system) cycles.
A simple fix is:
sys_freq = cpu_freq / cpu_ratio;
sys_cycles = (cycles * sys_freq) / 1E12; //since sim_ticks is for 1THz
Let me know if this is incorrect.
- Sujay
On Wed, 26 Sep 2007, Jonas Diemer wrote:
Hi,
Thanks for the fix, Kevin. The stats look much better now. Here are the
results of the very same experiments with your patch included:
Run 1:
sim_insts 13149912
sim_ticks 7559300000
system.cpu0.cpi_total 1.149828
system.cpu0.numCycles 15120138
Run 2:
sim_insts 13146947
sim_ticks 5679495000
system.cpu0.cpi_total 0.864125
system.cpu0.numCycles 11360602
Now, CPI and numCycles (almost) exactly correlate with sim_ticks.
I wouldn't call that bug a slight descripancy, though. It basically rendered
the CPI (and IPC) metrics invalid up to now - correct me if I judge this
wrong. I just want to point this out clearly since other people may have
relied on these metrics in their experiments.
Also, can someone please make sure that this goes into the main tree?
Thanks again.
- Jonas
On Tuesday 25 September 2007 21:48:21 Kevin Lim wrote:
Actually my math is still a little off. It'll give you ticks and not
necessarily cycles. I've attached a patch that should give you cycles
instead.
%: diff -u m5-2.0b3/src/cpu/o3/ m5-2.0b3-fixed/src/cpu/o3/
diff -u m5-2.0b3/src/cpu/o3/cpu.cc m5-2.0b3-fixed/src/cpu/o3/cpu.cc
--- m5-2.0b3/src/cpu/o3/cpu.cc 2007-04-18 17:55:27.000000000 -0400
+++ m5-2.0b3-fixed/src/cpu/o3/cpu.cc 2007-09-25 15:42:56.000000000 -0400
@@ -1393,7 +1393,8 @@
DPRINTF(Activity, "Waking up CPU\n");
- idleCycles += (curTick - 1) - lastRunningCycle;
+ idleCycles += ((curTick - 1) - lastRunningCycle) / this->clock;
+ numCycles += ((curTick - 1) - lastRunningCycle) / this->clock;
tickEvent.schedule(nextCycle());
}
Kevin Lim wrote:
Hi Jonas,
This is a slight discrepancy in how the model tracks the total number
of cycles the CPU has been running. For efficiency we let the
detailed CPU not schedule itself on the event queue if it has no
activity (e.g. it's waiting on a long cache miss and the ROB fills up,
or it can't fetch any new instructions). The numCycles statistic only
tracks cycles where the CPU is active, which now that I think about
it, is really just a function of the model and isn't fully correlated
with reality. There's another statistic, idleCycles, that shows the
number of cycles that the CPU has spent inactive. Hopefully between
your two runs you'll find that while numCycles is similar, run 1 (with
the slower memory) should have a much higher idleCycles. CPI and IPC
really should be calculated using the sum of numCycles and idleCycles
to provide the proper number of cycles that the CPU has
You should be able to fix this by adding this line in cpu.cc, around
line 1396:
numCycles += (curTick - 1) - lastRunningCycle;
This'll make numCycles properly add in the number of cycles that the
CPU was spent inactive.
I hope this helps,
Kevin
Jonas Diemer wrote:
Hi,
I have been doing some benchmarks (h.264 decoder from MediaBench2) on
a single core system with L1 and L2 caches and varying memory
bandwidth in SE mode.
I couldn't make sense of some stats:
Run 1:
sim_insts 13149912
sim_ticks 7559300000
system.cpu0.cpi_total 0.910270
system.cpu0.numCycles 11969969
Run 2:
sim_insts 13146947
sim_ticks 5679495000
system.cpu0.cpi_total 0.854693
system.cpu0.numCycles 11236603
For both runs, I ran the benchmark to completion (sim_insts are
almost identical - why not perfectly the same?). The only difference
is the memory bus clock speed (run 1 has about 10x less memory
bandwidth).
The two runs have a very different (>30%) sim_ticks (which I believed
was the number of clock cycles from start to end of simulations, thus
should be a measure for "total execution time"), but the cpi and
numCycles differ much less (<7%).
How can this mismatch be explained?
Regards,
Jonas.
_______________________________________________
m5-users mailing list
[email protected]
http://m5sim.org/cgi-bin/mailman/listinfo/m5-users
_______________________________________________
m5-users mailing list
[email protected]
http://m5sim.org/cgi-bin/mailman/listinfo/m5-users
--
Dipl.-Ing. Jonas Diemer
Institut für Datentechnik und Kommunikationsnetze
(Institute of Computer and Communication Network Engineering)
Hans-Sommer-Str. 66
D-38106 Braunschweig
Germany
Telefon: +49 531 391 3752
Telefax: +49 531 391 4587
E-Mail: [EMAIL PROTECTED]
Web: http://www.ida.ing.tu-bs.de/
_______________________________________________
m5-users mailing list
[email protected]
http://m5sim.org/cgi-bin/mailman/listinfo/m5-users
_______________________________________________
m5-users mailing list
[email protected]
http://m5sim.org/cgi-bin/mailman/listinfo/m5-users