Dear gem5 users/devs,

We are using gem5 for some architectural exploration on top of the Fujitsu A64FX chip (based on the O3_Arm model). We are running into a puzzling error: a relatively simple program fails with a page fault on gem5, but runs fine on the real hardware. With a few extensions to gem5's gdb stub to make it work in SE mode, we narrowed the issue down to the following problem.
The application code around the failure is:

    0x00000000004041ec <+1828>:  b.le  0x4046d4 <diff+3084>
    0x00000000004041f0 <+1832>:  and   x14, x10, #0x7
    0x00000000004041f4 <+1836>:  adrp  x13, 0x422000 <matrix+3964>
    0x00000000004041f8 <+1840>:  ldr   x0, [x13, #160]
    0x00000000004041fc <+1844>:  neg   x17, x14
    0x0000000000404200 <+1848>:  str   x17, [sp, #120]
    0x0000000000404204 <+1852>:  ldr   w17, [sp, #104]
    0x0000000000404208 <+1856>:  sxtw  x12, w2

Setting a breakpoint at 0x0000000000404208, we observe the following register state:

    Breakpoint 2, 0x0000000000404208 in diff () at omp-tasks/alignment/alignment_for/alignment.c:338
    338         RR[N] = hh = t = t-gh;
    (gdb) i r
    x0    0x432380      4400000
    x1    0x438365      4424549
    x2    0x4b          75
    x3    0x26          38
    x4    0xffffffa6    4294967206
    x5    0x0           0
    x6    0x421084      4329604
    x7    0x7ffffe0b68  549755685736
    x8    0x26          38
    x9    0xffffffdb    4294967259
    x10   0x26          38
    x11   0x0           0
    x12   0xffffffda    4294967258
    x13   0x422000      4333568
    x14   0x6           6
    x15   0x7ffffcd1f0  549755605488
    x16   0x0           0
    x17   0x0           0
    x18   0x7ffffd1fe8  549755625448
    x19   0xfffffe4f    4294966863
    x20   0xfffff9bb    4294965691
    x21   0xfffffd31    4294966577
    x22   0x43aeb8      4435640
    x23   0xfffffc23    4294966307
    x24   0xfffff9bb    4294965691
    x25   0x1           1
    x26   0x956         2390
    x27   0x25          37
    x28   0x7ffffd6ecc  549755645644
    x29   0x7ffffdbcec  549755665644
    x30   0x0           0
    sp    0x7ffffcd170  0x7ffffcd170
    pc    0x404208      0x404208 <diff+1856>
    cpsr  0x0           [ EL=0 ]
    fpsr  <unavailable>
    fpcr  <unavailable>

As the code above shows, the previous instruction (at 0x0000000000404204) loads the 32-bit word at $sp+104 into $w17. However, the memory contents and the register value differ:

    (gdb) p *(int *)($sp+104)
    $651 = 1
    (gdb) p $w17
    $652 = 0

Setting a breakpoint at the same location on the real hardware, we get 1 in $w17 (as expected from the memory contents).

Any ideas or suggestions as to what might be causing this, and how we could potentially fix it?
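To spell out the behaviour we expect from the load, here is a minimal Python sketch of `ldr w17, [sp, #104]`. It is only an illustrative model, not gem5 code: the helper name `ldr_w` and the toy stack buffer are hypothetical, and it assumes a little-endian target. The only values taken from the session above are the sp value and the word stored at $sp+104.

```python
# Minimal model of `ldr w17, [sp, #104]` on little-endian AArch64.
# Hypothetical helper and toy memory window, for illustration only.

def ldr_w(memory: bytes, base: int, sp: int, offset: int) -> int:
    """Load the 32-bit word at sp+offset and zero-extend it to 64 bits.

    `memory` is a byte window whose first byte sits at address `base`.
    """
    start = sp + offset - base
    word = int.from_bytes(memory[start:start + 4], "little")
    # Writing a W register clears the upper 32 bits of the X register.
    return word & 0xFFFFFFFF

SP = 0x7ffffcd170                 # sp from the register dump
stack = bytearray(256)            # toy stack window starting at SP
stack[104:108] = (1).to_bytes(4, "little")  # gdb: *(int *)($sp+104) == 1

x17 = ldr_w(bytes(stack), SP, SP, 104)
print(hex(x17))  # prints 0x1
```

Under this model x17 should read 1 once the load has completed, which matches the real hardware; gem5 reports 0 at the same breakpoint.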
Thanks and bests,
Balazs

-------------
Balazs Gerofi
Research Scientist
High-Performance Artificial Intelligence Systems Research Team
RIKEN Center for Computational Science, Japan
https://bgerofi.github.io/