In SE mode, there are going to be at least small (and perhaps large)
differences between how a program runs in gem5 and on a real system. Some of
those will come from system calls that are not implemented exactly right in
gem5, or that behave slightly differently because they are actually executed
on your host machine. Some differences are unavoidable, for example if the
program reads the system time. Comparing execution one-to-one like this can
help find bugs, but it has to be set up very carefully and may still not
yield usable information. That also assumes that every system call the
program uses is implemented in gem5 in the first place.
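A one-to-one comparison usually means lining up a trace from gem5 with a trace from the real machine and finding the first step where they disagree. Below is a minimal Python sketch of that step only. It assumes both traces have already been captured in a hypothetical `(pc, registers)` format; collecting them is not shown, and this is not a gem5 or GDB output format. The `ignore` parameter stands in for the careful tuning mentioned above. Registers expected to differ, such as ones holding system-call results, can be skipped.

```python
from typing import Dict, Iterable, List, Optional, Tuple

# Hypothetical record format: the program counter plus the register values
# observed when execution reaches that pc (like stopping at a breakpoint).
Record = Tuple[int, Dict[str, int]]


def _fmt(value: Optional[int]) -> str:
    """Render a register value, or 'missing' if one trace lacks it."""
    return "missing" if value is None else hex(value)


def first_divergence(
    sim: List[Record],
    hw: List[Record],
    ignore: Iterable[str] = (),
) -> Optional[Tuple[int, Optional[int], str]]:
    """Return (step, pc, reason) for the first mismatch, or None if none.

    Registers named in `ignore` are skipped, e.g. ones expected to differ
    because they hold system-call results such as the current time.
    """
    skip = set(ignore)
    for step, ((pc_s, regs_s), (pc_h, regs_h)) in enumerate(zip(sim, hw)):
        if pc_s != pc_h:
            return step, pc_s, f"pc: {pc_s:#x} vs {pc_h:#x}"
        for name in sorted(regs_s.keys() | regs_h.keys()):
            if name in skip:
                continue
            if regs_s.get(name) != regs_h.get(name):
                return step, pc_s, (
                    f"{name}: {_fmt(regs_s.get(name))} vs {_fmt(regs_h.get(name))}"
                )
    if len(sim) != len(hw):
        # One trace ended early, e.g. because the simulated run faulted.
        return min(len(sim), len(hw)), None, (
            f"trace lengths differ: {len(sim)} vs {len(hw)}"
        )
    return None


# Illustrative two-step traces loosely modeled on the report below
# (constructed by hand, not captured from either system).
sim = [(0x404204, {"x17": 0xFFFFFFFFFFFFFFFA}), (0x404208, {"x17": 0x0})]
hw = [(0x404204, {"x17": 0xFFFFFFFFFFFFFFFA}), (0x404208, {"x17": 0x1})]
step, pc, why = first_divergence(sim, hw)
print(step, hex(pc), why)  # 1 0x404208 x17: 0x0 vs 0x1
```

Stopping at the first mismatch keeps the output focused on where the two runs start to diverge. Later differences are often just consequences of that first one.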

Gabe

On Wed, Jan 5, 2022 at 9:47 PM Balazs Gerofi via gem5-users <
gem5-users@gem5.org> wrote:

> Dear gem5 users/devs,
>
> We are using gem5 for some architectural exploration on top of the Fujitsu
> A64FX chip (based on the O3_Arm model).
> We are running into a mysterious error, where a relatively simple program
> fails with a page fault on gem5, but it runs fine on the real hardware.
> With a few extensions to gem5's gdb stub to make it work in SE mode, we
> narrowed the issue down to the following problem.
>
> The application code around the failure is:
>
>    0x00000000004041ec <+1828>:  b.le    0x4046d4 <diff+3084>
>    0x00000000004041f0 <+1832>:  and     x14, x10, #0x7
>    0x00000000004041f4 <+1836>:  adrp    x13, 0x422000 <matrix+3964>
>    0x00000000004041f8 <+1840>:  ldr     x0, [x13, #160]
>    0x00000000004041fc <+1844>:  neg     x17, x14
>    0x0000000000404200 <+1848>:  str     x17, [sp, #120]
>    0x0000000000404204 <+1852>:  ldr     w17, [sp, #104]
>    0x0000000000404208 <+1856>:  sxtw    x12, w2
>
> Setting a breakpoint at 0x0000000000404208, we observe the following
> register state:
> Breakpoint 2, 0x0000000000404208 in diff () at
> omp-tasks/alignment/alignment_for/alignment.c:338
> 338           RR[N] = hh = t = t-gh;
> (gdb) i r
> x0             0x432380            4400000
> x1             0x438365            4424549
> x2             0x4b                75
> x3             0x26                38
> x4             0xffffffa6          4294967206
> x5             0x0                 0
> x6             0x421084            4329604
> x7             0x7ffffe0b68        549755685736
> x8             0x26                38
> x9             0xffffffdb          4294967259
> x10            0x26                38
> x11            0x0                 0
> x12            0xffffffda          4294967258
> x13            0x422000            4333568
> x14            0x6                 6
> x15            0x7ffffcd1f0        549755605488
> x16            0x0                 0
> x17            0x0                 0
> x18            0x7ffffd1fe8        549755625448
> x19            0xfffffe4f          4294966863
> x20            0xfffff9bb          4294965691
> x21            0xfffffd31          4294966577
> x22            0x43aeb8            4435640
> x23            0xfffffc23          4294966307
> x24            0xfffff9bb          4294965691
> x25            0x1                 1
> x26            0x956               2390
> x27            0x25                37
> x28            0x7ffffd6ecc        549755645644
> x29            0x7ffffdbcec        549755665644
> x30            0x0                 0
> sp             0x7ffffcd170        0x7ffffcd170
> pc             0x404208            0x404208 <diff+1856>
> cpsr           0x0                 [ EL=0 ]
> fpsr           <unavailable>
> fpcr           <unavailable>
>
> As seen from the code above, the previous instruction (at
> 0x0000000000404204) loads the 32-bit word at $sp+104 into $w17; however,
> the memory contents and the register value differ:
> (gdb) p *(int *)($sp+104)
> $651 = 1
> (gdb) p $w17
> $652 = 0
>
> Setting a breakpoint at the same location on real HW, we get 1 in $w17 (as
> expected from the memory contents).
>
> Any idea/suggestion what might be causing this and how we could
> potentially fix it?
>
>
> Thanks and bests,
> Balazs
>
> -------------
> Balazs Gerofi
> Research Scientist
> High-Performance Artificial Intelligence Systems Research Team
> RIKEN Center for Computational Science, Japan
> https://bgerofi.github.io/
> _______________________________________________
> gem5-users mailing list -- gem5-users@gem5.org
> To unsubscribe send an email to gem5-users-le...@gem5.org
>
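The register dump in the quoted report can be cross-checked against the instruction semantics. Below is a minimal Python sketch, a hypothetical model and not gem5 code, of the architectural effects of the instructions from 0x4041f0 to 0x404204. It assumes a little-endian stack and that sp does not change within the sequence. The `adrp`/`ldr x0` pair is omitted because it does not feed x17. The inputs are the only values taken from the report: x10 = 0x26 and the word 1 at sp+104.

The sketch reproduces x14 = 0x6 from the dump. It also shows that `str x17, [sp, #120]` writes bytes 120..127, so it cannot overlap the 4-byte load from `[sp, #104]`. Finally, it predicts x17 = 0x1 after the load, which matches the real hardware rather than the 0x0 seen in gem5.

```python
import struct

MASK64 = (1 << 64) - 1  # X registers are 64 bits wide


def and_imm(xn: int, imm: int) -> int:
    """and Xd, Xn, #imm: bitwise AND with an immediate."""
    return (xn & imm) & MASK64


def neg(xn: int) -> int:
    """neg Xd, Xn: two's-complement negation, (0 - Xn) mod 2**64."""
    return (-xn) & MASK64


def str_x(mem: bytearray, off: int, xt: int) -> None:
    """str Xt, [sp, #off]: 8-byte little-endian store to bytes off..off+7."""
    struct.pack_into("<Q", mem, off, xt)


def ldr_w(mem: bytearray, off: int) -> int:
    """ldr Wt, [sp, #off]: 4-byte load; writing Wt zero-extends into Xt."""
    (word,) = struct.unpack_from("<I", mem, off)
    return word  # already in [0, 2**32), i.e. zero-extended to 64 bits


stack = bytearray(128)                 # models bytes [sp, sp + 128)
struct.pack_into("<i", stack, 104, 1)  # gdb: *(int *)($sp+104) == 1

x10 = 0x26                             # from the gem5 register dump
x14 = and_imm(x10, 0x7)                # and  x14, x10, #0x7
x17 = neg(x14)                         # neg  x17, x14
str_x(stack, 120, x17)                 # str  x17, [sp, #120]
x17 = ldr_w(stack, 104)                # ldr  w17, [sp, #104]

print(hex(x14), hex(x17))  # 0x6 0x1
```

Because a W-register write zero-extends, a correct load here leaves x17 = 0x1 regardless of the 0xfffffffffffffffa that `neg` put in x17 just before.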