Hi Yongbing,
I don't have a definitive solution for you, but I have a few questions that may
help you localize the problem:
(1) Which type of model (functional or arm_detailed) did you run to collect the
stats?
In theory, you should run 'arm_detailed' so that speculative misses are taken
into account.
(2) 'arm_detailed' uses a 64B cache line, while the Cortex-A9 has a 32B cache
line. Did you take this into account (i.e. change the 'arm_detailed' cache line
size to 32B)?
(3) I'm not sure about Chromium, but browsers in general may use the GPU for
compositing on real HW, so the execution path will differ from the SW-only
BBench run in gem5.
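
To give a feel for point (2), here is a minimal sketch of how line size alone can change a miss count. It is a toy direct-mapped cache written for illustration, not gem5's cache model; the function name, the 32 KiB capacity, and the sequential 4-byte fetch stream are all assumptions of mine.

```python
def count_misses(addresses, line_size, num_lines):
    """Count misses in a toy direct-mapped cache (illustrative, not gem5)."""
    tags = [None] * num_lines           # line address currently held per set
    misses = 0
    for addr in addresses:
        line_addr = addr // line_size   # cache line this byte address falls in
        index = line_addr % num_lines   # direct-mapped set index
        if tags[index] != line_addr:    # first touch or conflict: miss
            misses += 1
            tags[index] = line_addr
    return misses


# Hypothetical stream: 4-byte instructions fetched sequentially over 64 KiB.
stream = range(0, 64 * 1024, 4)
CACHE_BYTES = 32 * 1024                 # assumed capacity, same for both runs

misses_64 = count_misses(stream, 64, CACHE_BYTES // 64)
misses_32 = count_misses(stream, 32, CACHE_BYTES // 32)
print(misses_64, misses_32)
```

For this purely sequential stream the 32B configuration takes twice as many misses as the 64B one. Real code with branches and reuse will behave differently, so treat this only as a reason to check the line size setting.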
Orangeade
Yongbing wrote:
Hi all,
I recently compared micro-architectural metrics, such as L1 cache misses,
collected by gem5 with those collected by performance counters on a real ARM
platform, and found a very large difference. For example, the I-cache misses
per 1k instructions for BBench were about 30 as measured by hardware
performance counters (referring to the paper published by Anthony Gutierrez
in IISWC'2011), but only about 3 in gem5. That is about a 10x difference.
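
For reference, the misses-per-1k-instructions metric used above can be sketched as follows. The raw counts are made-up values chosen only to reproduce the figures of about 30 and about 3 quoted above; they are not actual gem5 stats or hardware counter readings, and the function name is hypothetical.

```python
def mpki(misses, instructions):
    """Misses per 1000 committed instructions."""
    return 1000.0 * misses / instructions


# Hypothetical raw counts, chosen only to match the quoted figures.
hw_mpki = mpki(misses=3_000_000, instructions=100_000_000)    # real hardware
gem5_mpki = mpki(misses=300_000, instructions=100_000_000)    # gem5

ratio = hw_mpki / gem5_mpki
print(hw_mpki, gem5_mpki, ratio)
```

When comparing the two platforms, both numbers should be normalized by the same instruction count definition (for example committed instructions), otherwise the ratio is not meaningful.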
_______________________________________________
gem5-users mailing list
[email protected]
http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users