In SE mode, there will always be at least small (and perhaps large) differences between how a program runs in gem5 and how it runs on a real system. Some of those come from system calls that are not implemented exactly right in gem5, or that behave slightly differently because they actually execute on your host machine. Others are unavoidable, for example when the program checks the system time. Comparing execution one to one like this can help find bugs, but it has to be tuned very carefully and may never give usable information. That also assumes that all the system calls the program uses are implemented in gem5 to begin with.
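If you do want to try a one-to-one comparison, a rough sketch of the idea is below. It assumes you have already dumped both runs into the same simple format, a list of (pc, registers) entries; that format and the name first_divergence are made up for illustration and are not something gem5 or gdb produce directly. The ignore argument is where the "careful tuning" goes: registers you expect to differ (for example values derived from the system time) have to be filtered out by hand.

```python
from itertools import zip_longest


def first_divergence(sim_trace, hw_trace, ignore=()):
    """Return (step, sim_pc, hw_pc, differing_regs) for the first mismatch.

    Each trace is a list of (pc, regs) tuples, where regs maps a register
    name to its value. This trace format is hypothetical. Returns None if
    the two traces agree everywhere.
    """
    skipped = set(ignore)
    for step, (sim, hw) in enumerate(zip_longest(sim_trace, hw_trace)):
        if sim is None or hw is None:
            # One run ended before the other: report the surviving pc.
            sim_pc = None if sim is None else sim[0]
            hw_pc = None if hw is None else hw[0]
            return step, sim_pc, hw_pc, []
        sim_pc, sim_regs = sim
        hw_pc, hw_regs = hw
        names = (set(sim_regs) | set(hw_regs)) - skipped
        diff = sorted(n for n in names if sim_regs.get(n) != hw_regs.get(n))
        if sim_pc != hw_pc or diff:
            return step, sim_pc, hw_pc, diff
    return None


# Single-entry traces built from the values reported in this thread:
# at pc 0x404208, w17 is 0 in gem5 and 1 on real hardware.
gem5_trace = [(0x404208, {"w17": 0})]
hw_trace = [(0x404208, {"w17": 1})]
print(first_divergence(gem5_trace, hw_trace))
```

In practice the hard part is producing two traces that line up at all, since any difference in system call behavior shifts everything after it.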
Gabe

On Wed, Jan 5, 2022 at 9:47 PM Balazs Gerofi via gem5-users <gem5-users@gem5.org> wrote:

> Dear gem5 users/devs,
>
> We are using gem5 for some architectural exploration on top of the Fujitsu
> A64FX chip (based on the O3_Arm model).
> We are running into a mysterious error, where a relatively simple program
> fails with a page fault on gem5, but it runs fine on the real hardware.
> With a few extensions to gem5's gdb stub to make it work in SE mode we
> narrowed down the issue to the following problem.
>
> The application code around the failure is:
>
>    0x00000000004041ec <+1828>:  b.le    0x4046d4 <diff+3084>
>    0x00000000004041f0 <+1832>:  and     x14, x10, #0x7
>    0x00000000004041f4 <+1836>:  adrp    x13, 0x422000 <matrix+3964>
>    0x00000000004041f8 <+1840>:  ldr     x0, [x13, #160]
>    0x00000000004041fc <+1844>:  neg     x17, x14
>    0x0000000000404200 <+1848>:  str     x17, [sp, #120]
>    0x0000000000404204 <+1852>:  ldr     w17, [sp, #104]
>    0x0000000000404208 <+1856>:  sxtw    x12, w2
>
> Setting a breakpoint to 0x0000000000404208, we observe the following
> register state:
>
> Breakpoint 2, 0x0000000000404208 in diff () at
> omp-tasks/alignment/alignment_for/alignment.c:338
> 338         RR[N] = hh = t = t-gh;
> (gdb) i r
> x0             0x432380            4400000
> x1             0x438365            4424549
> x2             0x4b                75
> x3             0x26                38
> x4             0xffffffa6          4294967206
> x5             0x0                 0
> x6             0x421084            4329604
> x7             0x7ffffe0b68        549755685736
> x8             0x26                38
> x9             0xffffffdb          4294967259
> x10            0x26                38
> x11            0x0                 0
> x12            0xffffffda          4294967258
> x13            0x422000            4333568
> x14            0x6                 6
> x15            0x7ffffcd1f0        549755605488
> x16            0x0                 0
> x17            0x0                 0
> x18            0x7ffffd1fe8        549755625448
> x19            0xfffffe4f          4294966863
> x20            0xfffff9bb          4294965691
> x21            0xfffffd31          4294966577
> x22            0x43aeb8            4435640
> x23            0xfffffc23          4294966307
> x24            0xfffff9bb          4294965691
> x25            0x1                 1
> x26            0x956               2390
> x27            0x25                37
> x28            0x7ffffd6ecc        549755645644
> x29            0x7ffffdbcec        549755665644
> x30            0x0                 0
> sp             0x7ffffcd170        0x7ffffcd170
> pc             0x404208            0x404208 <diff+1856>
> cpsr           0x0                 [ EL=0 ]
> fpsr           <unavailable>
> fpcr           <unavailable>
>
> As seen from the code above, the previous instruction (at
> 0x0000000000404204) loads $sp+104 into $w17, however, the memory contents
> and the register value differ:
>
> (gdb) p *(int *)($sp+104)
> $651 = 1
> (gdb) p $w17
> $652 = 0
>
> Setting a breakpoint to the same location on real HW we get 1 in $w17 (as
> expected based on the memory content).
>
> Any idea/suggestion what might be causing this and how we could
> potentially fix it?
>
> Thanks and bests,
> Balazs
>
> -------------
> Balazs Gerofi
> Research Scientist
> High-Performance Artificial Intelligence Systems Research Team
> RIKEN Center for Computational Science, Japan
> https://bgerofi.github.io/
> _______________________________________________
> gem5-users mailing list -- gem5-users@gem5.org
> To unsubscribe send an email to gem5-users-le...@gem5.org