Good afternoon,

I have been using gem5 to study the performance of several different computer architectures. However, I have noticed that I may be unable to accurately model differences in cycle counts between programs.

Take, for example, these two programs:

#include <stdint.h>

int main(void)
{
    for (uint32_t i = 0; i < 1000; i++) {
        uint32_t x = 5 * 6;
        if (x != 30) {
            return 1;
        }
    }
    return 0;
}

#include <stdint.h>

int main(void)
{
    for (uint32_t i = 0; i < 1000; i++) {
        uint32_t x = 5 + 6;
        if (x != 11) {
            return 1;
        }
    }
    return 0;
}

When I compile and run each program individually on a basic RISC-V CPU config, both exit at exactly tick 1,297,721,000. However, on real hardware a multiply takes longer than an add, so I would expect 1000 multiplications to show at least a small difference in performance. Unless I'm missing something, this makes it hard for my own research to analyze the relative performance of different architectures.
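
For reference, the "basic config" is essentially the learning_gem5 SE-mode tutorial script, sketched below. The binary path and memory size are placeholders from my setup, and some names (MemCtrl, SEWorkload, cpu_side_ports/mem_side_ports) only match recent gem5 releases, so treat it as approximate:

import m5
from m5.objects import *

system = System()
system.clk_domain = SrcClockDomain(clock='1GHz',
                                   voltage_domain=VoltageDomain())
system.mem_mode = 'timing'
system.mem_ranges = [AddrRange('512MB')]

# Single in-order CPU from the RISC-V build (build/RISCV/gem5.opt)
system.cpu = TimingSimpleCPU()
system.membus = SystemXBar()
system.cpu.icache_port = system.membus.cpu_side_ports
system.cpu.dcache_port = system.membus.cpu_side_ports
system.cpu.createInterruptController()

# Simple DDR3 memory controller covering the whole address range
system.mem_ctrl = MemCtrl()
system.mem_ctrl.dram = DDR3_1600_8x8()
system.mem_ctrl.dram.range = system.mem_ranges[0]
system.mem_ctrl.port = system.membus.mem_side_ports
system.system_port = system.membus.cpu_side_ports

binary = 'test_mult'   # placeholder: the cross-compiled test program
system.workload = SEWorkload.init_compatible(binary)
process = Process()
process.cmd = [binary]
system.cpu.workload = process
system.cpu.createThreads()

root = Root(full_system=False, system=system)
m5.instantiate()
event = m5.simulate()
print('Exited @ tick %d because %s' % (m5.curTick(), event.getCause()))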

Even custom instructions seem to execute in a single CPU cycle regardless of how the hardware would be implemented.

Is there a good way to define cycle delays in my gem5 environment? I could implement a "multiply" function that inserts a bunch of no-ops, but that approach would become unwieldy as program complexity grows.
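
As far as I can tell, the simple CPU models execute every non-memory instruction in a single cycle, which would explain the identical tick counts above. Is the right approach to switch to MinorCPU or the O3 CPU and give it a custom functional-unit pool with per-op-class latencies, something like the sketch below? The latency numbers are invented, the class names (FUPool, FUDesc, OpDesc, DerivO3CPU vs. O3CPU) vary between gem5 releases, and I haven't verified this end to end, so it is only meant to show the shape of what I'm after. Presumably the op class assigned to a custom instruction could be given its own latency the same way.

from m5.objects import FUPool, FUDesc, OpDesc

# Invented latencies: 1-cycle ALU ops, 4-cycle pipelined multiply,
# 20-cycle unpipelined divide, 2-cycle memory accesses.
class SlowIntALU(FUDesc):
    opList = [OpDesc(opClass='IntAlu', opLat=1)]
    count = 2

class SlowIntMultDiv(FUDesc):
    opList = [OpDesc(opClass='IntMult', opLat=4),
              OpDesc(opClass='IntDiv', opLat=20, pipelined=False)]
    count = 1

class SlowRdWrPort(FUDesc):
    opList = [OpDesc(opClass='MemRead', opLat=2),
              OpDesc(opClass='MemWrite', opLat=2)]
    count = 1

class SlowMultFUPool(FUPool):
    # A real pool also needs FUs for every other op class the binary
    # exercises (floating point, etc.); this only covers the integer
    # and memory classes used by the test programs above.
    FUList = [SlowIntALU(), SlowIntMultDiv(), SlowRdWrPort()]

# then, on an out-of-order CPU:
#     system.cpu = DerivO3CPU()          # O3CPU / RiscvO3CPU in newer releases
#     system.cpu.fuPool = SlowMultFUPool()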

I've written a small blog post <https://fleker.medium.com/modeling-memristors-to-execute-physically-accurate-imply-operations-in-gem5-ef888b7dc49b> exploring some of what I've tried in the past week. If anyone here has any suggestions I'd be interested to hear them.

Thanks,

Nick