Good afternoon,
I have been using gem5 to study the performance of several different
computer architectures. However, I've noticed that I may be unable to
accurately model differences in cycle counts between programs.
Take, for example, these two programs (the first does a constant
multiply on each iteration, the second a constant add):
#include <stdint.h>

int main(void)
{
    for (uint32_t i = 0; i < 1000; i++) {
        uint32_t x = 5 * 6;
        if (x != 30) {
            return 1;
        }
    }
    return 0;
}

#include <stdint.h>

int main(void)
{
    for (uint32_t i = 0; i < 1000; i++) {
        uint32_t x = 5 + 6;
        if (x != 11) {
            return 1;
        }
    }
    return 0;
}
Compiling and running each one individually on a basic RISC-V CPU
config, they both exit at exactly the same tick, 1,297,721,000. In a
real system, however, each multiply operation would take longer than an
add, and I'd expect 1,000 multiplications to show at least a tiny
difference in performance. Unless I'm missing something, this also
makes it difficult for my own research to analyze relative performance.
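(By a "basic config" I mean something roughly like the minimal SE-mode
script below. It's a sketch in the style of the learning-gem5
"simple.py" example rather than my exact setup; each program is
cross-compiled statically for RISC-V, e.g. with riscv64-linux-gnu-gcc
-static, and run with the RISCV build of gem5. Some names, such as
TimingSimpleCPU and the port names, vary a bit between gem5 versions.)

import m5
from m5.objects import *

system = System()
system.clk_domain = SrcClockDomain(clock='1GHz',
                                   voltage_domain=VoltageDomain())
system.mem_mode = 'timing'
system.mem_ranges = [AddrRange('512MB')]

# one simple CPU, no caches, one memory controller
system.cpu = TimingSimpleCPU()
system.membus = SystemXBar()
system.cpu.icache_port = system.membus.cpu_side_ports
system.cpu.dcache_port = system.membus.cpu_side_ports
system.cpu.createInterruptController()

system.mem_ctrl = MemCtrl()
system.mem_ctrl.dram = DDR3_1600_8x8()
system.mem_ctrl.dram.range = system.mem_ranges[0]
system.mem_ctrl.port = system.membus.mem_side_ports
system.system_port = system.membus.cpu_side_ports

# run one of the cross-compiled test binaries in syscall-emulation mode
binary = './mul_test'   # hypothetical path to the test program
process = Process(cmd=[binary])
system.cpu.workload = process
system.cpu.createThreads()
system.workload = SEWorkload.init_compatible(binary)

root = Root(full_system=False, system=system)
m5.instantiate()
event = m5.simulate()
print('Exiting @ tick %d because %s' % (m5.curTick(), event.getCause()))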
Even custom instructions seem to execute in a single CPU cycle
regardless of how the hardware would be implemented.
Is there a good way to define cycle delays in my gem5 environment? I
could implement a "multiply" function that inserts a bunch of no-ops,
but that would quickly become unwieldy as program complexity grows.
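To give a sense of what I'm after, here is a rough sketch of the kind
of knob I have in mind, based on the O3 model's functional-unit pool.
The class names follow the stock descriptions in
src/cpu/o3/FuncUnitConfig.py, the latency numbers are made up purely
for illustration, and depending on the gem5 version the CPU class is
called DerivO3CPU or RiscvO3CPU:

# Hedged sketch, not my actual config: slow down integer multiply by
# customizing the O3 CPU's functional-unit pool.  IntALU, FP_ALU,
# FP_MultDiv, ReadPort and WritePort are the stock units from
# src/cpu/o3/FuncUnitConfig.py; SlowIntMultDiv stands in for the
# default IntMultDiv unit, with invented latencies.
from m5.objects import (DerivO3CPU, FUPool, FUDesc, OpDesc,
                        IntALU, FP_ALU, FP_MultDiv, ReadPort, WritePort)

class SlowIntMultDiv(FUDesc):
    # multiply: 5 cycles; divide: 20 cycles and unpipelined (made-up numbers)
    opList = [OpDesc(opClass='IntMult', opLat=5),
              OpDesc(opClass='IntDiv', opLat=20, pipelined=False)]
    count = 1

class MyFUPool(FUPool):
    # keep the stock units for everything else
    FUList = [IntALU(), SlowIntMultDiv(), FP_ALU(), FP_MultDiv(),
              ReadPort(), WritePort()]

cpu = DerivO3CPU()          # RiscvO3CPU on more recent gem5 releases
cpu.fuPool = MyFUPool()

From what I can tell, this sort of opLat parameter only exists on the
detailed models (O3 and Minor each have per-functional-unit latencies),
while the simple CPU models appear to run every instruction in a fixed
time, which may be part of what I'm seeing.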
I've written a small blog post
<https://fleker.medium.com/modeling-memristors-to-execute-physically-accurate-imply-operations-in-gem5-ef888b7dc49b>
exploring some of what I've tried over the past week. If anyone here
has any suggestions, I'd be interested to hear them.
Thanks,
Nick