Hi Matt, Gabe,

On the develop branch, the code seems to run without any errors. I suppose this is because things have been reworked in develop.

The backtrace generated by the debug build on the stable branch is:

7335000: system.cpu: T0 : 0x7ffff801bbdd @_end+140737354234813. 3 : CALL_NEAR_I : subi rsp, rsp, 0x8 : IntAlu : D=0x00007fffffffed48
7335000: system.cpu: T0 : 0x7ffff801bbdd @_end+140737354234813. 4 : CALL_NEAR_I : wrip t7, t1 : IntAlu :
7447000: system.cpu: T0 : 0x7ffff801d080 @_end+140737354240096 : hint
7447000: system.cpu: T0 : 0x7ffff801d080 @_end+140737354240096. 0 : HINT_NOP : fault NoFault : No_OpClass :
7447000: system.cpu: T0 : 0x7ffff801d084 @_end+140737354240100 : mov eax, 0xc
7447000: system.cpu: T0 : 0x7ffff801d084 @_end+140737354240100. 0 : MOV_R_I : limm eax, 0xc : IntAlu : D=0x000000000000000c
build/X86/arch/x86/insts/static_inst.cc:254: panic: Unknown register class: 1066703648
Memory Usage: 643980 KBytes
Program aborted at tick 7455000
--- BEGIN LIBC BACKTRACE ---
../build/X86/gem5.debug(+0xfcebed)[0x55f53b785bed]
../build/X86/gem5.debug(+0xff1b11)[0x55f53b7a8b11]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x15420)[0x7fdcfff9f420]
/lib/x86_64-linux-gnu/libc.so.6(gsignal+0xcb)[0x7fdcff14618b]
/lib/x86_64-linux-gnu/libc.so.6(abort+0x12b)[0x7fdcff125859]
../build/X86/gem5.debug(+0x1d29b8)[0x55f53a9899b8]
../build/X86/gem5.debug(+0x1f7537)[0x55f53a9ae537]
../build/X86/gem5.debug(+0x2f6934)[0x55f53aaad934]
../build/X86/gem5.debug(+0x8b9881)[0x55f53b070881]
../build/X86/gem5.debug(+0x8b14cd)[0x55f53b0684cd]
../build/X86/gem5.debug(+0x8b1c22)[0x55f53b068c22]
../build/X86/gem5.debug(+0x970b91)[0x55f53b127b91]
../build/X86/gem5.debug(+0x96ee43)[0x55f53b125e43]
../build/X86/gem5.debug(+0x96e49d)[0x55f53b12549d]
../build/X86/gem5.debug(+0x96ca3b)[0x55f53b123a3b]
../build/X86/gem5.debug(+0x980254)[0x55f53b137254]
../build/X86/gem5.debug(+0x97c995)[0x55f53b133995]
../build/X86/gem5.debug(+0x987884)[0x55f53b13e884]
../build/X86/gem5.debug(+0x2030ae)[0x55f53a9ba0ae]
../build/X86/gem5.debug(+0x2003d0)[0x55f53a9b73d0]
../build/X86/gem5.debug(+0xfddf5c)[0x55f53b794f5c]
../build/X86/gem5.debug(+0x1005cc3)[0x55f53b7bccc3]
../build/X86/gem5.debug(+0x10058c3)[0x55f53b7bc8c3]
../build/X86/gem5.debug(+0xfaab48)[0x55f53b761b48]
../build/X86/gem5.debug(+0xfa8e1e)[0x55f53b75fe1e]
../build/X86/gem5.debug(+0xfa5183)[0x55f53b75c183]
../build/X86/gem5.debug(+0xfa51ee)[0x55f53b75c1ee]
../build/X86/gem5.debug(+0xaedbb5)[0x55f53b2a4bb5]
/lib/x86_64-linux-gnu/libpython3.8.so.1.0(+0x2a8718)[0x7fdd00255718]
/lib/x86_64-linux-gnu/libpython3.8.so.1.0(_PyEval_EvalFrameDefault+0x8dd8)[0x7fdd0002af48]
/lib/x86_64-linux-gnu/libpython3.8.so.1.0(_PyEval_EvalCodeWithName+0x8fb)[0x7fdd00177ecb]
/lib/x86_64-linux-gnu/libpython3.8.so.1.0(_PyFunction_Vectorcall+0x94)[0x7fdd002550f4]
--- END LIBC BACKTRACE ---

I am leaning towards Gabe’s idea that the real bug is that the RegId itself is bogus, since a different register class value is reported on each run.

I am sorry for the late response.

Nirmit

From: mattdsinclair.w...@gmail.com <mattdsinclair.w...@gmail.com>
Sent: Wednesday, December 1, 2021 11:07 PM
To: Gabe Black <gabe.bl...@gmail.com>
Cc: gem5 users mailing list <gem5-users@gem5.org>; Nirmit Jallawar <jalla...@wisc.edu>
Subject: Re: [gem5-users] Unrecognized register class when using the "Exec" debug flag

Thanks Gabe. Good catch about the actual value -- I just saw a negative number and assumed -1, whoops. Based on what Nirmit is seeing, it seems like HINT_NOP or MOV_R_I must be the instruction causing the fault, but yeah, a backtrace will probably help confirm.

Nirmit, can you please try running stable with a debug build (to get a backtrace) and develop with a release build, and let us know what you see?

Matt

On Wed, Dec 1, 2021 at 10:47 PM Gabe Black <gabe.bl...@gmail.com> wrote:

I realize this is probably a hard question to answer with Exec being broken, but do you know what instruction is causing the problem? HINT_NOP?
Probably the first thing that someone should do (if they haven't already) is to run this under gdb and see what the backtrace looks like, since that would give us a lot more info to work with.

Looking at the info we have here, I see that the return from classValue() is -854770912 (not -1?), which to me looks like junk. I think probably what's happening is that the RegId being passed to the instruction's printReg function is from a bad pointer of some sort, which is why it doesn't know how to print the register name.

The RegId in this case refers to a particular register/operand, not the instruction as a whole. For instance, when the previous instruction prints out eax, that would be a RegId with classValue() (member regClass) set to IntRegClass, and regIdx set to INTREG_RAX. This works a little differently now and is in the process of being significantly reworked, although the gist is largely the same, particularly in the details involved here. The RegId structure tells you what type of register you're dealing with, aka its class, and also which particular register within that space you're referring to. The printReg method is trying to figure out what the name of that register is so it can be printed as part of the disassembly.

I think the real bug is going to be that the RegId itself is bogus, and so when it's operated on, its random junk will lead to random behavior or errors. It could be, for instance, that the instruction is trying to print a register name in its disassembly, but it doesn't actually *have* a register value set up in that slot and so is using uninitialized values. Typically the instructions would try to print out, say, destination register 0 when forming the disassembly string. Alternatively, O3 could have done something wacky and could be trying to do something with a nonsense instruction. I would personally lean towards the first option, but without more info it's hard to tell.

I would also suggest trying this with develop.
I don't think that's a *solution* to the problem, but it would possibly help isolate a cause. Like I said, how things work in develop is a little bit different, so we might get more info by also seeing what happens in those slightly different circumstances.

Gabe

On Wed, Dec 1, 2021 at 8:30 PM Matt Sinclair <mattdsinclair.w...@gmail.com> wrote:

Hi Gabe,

I was trying to dig through the RegClass code earlier to figure out why the value is -1 for this instruction, and the only thing that I can think of is that HINT_NOP needs a RegClass value set for it, but it isn't set for some reason (which is not 100% clear to me). You know this code much better than I do though, hence I was hoping you might see something I'm not seeing. Since this error is happening on a clean checkout of gem5 on stable, it seems like a bug that anyone could face if they use the Exec debug flag.

Thanks,
Matt

---------- Forwarded message ---------
From: Nirmit Jallawar via gem5-users <gem5-users@gem5.org>
Date: Wed, Dec 1, 2021 at 10:25 PM
Subject: [gem5-users] Unrecognized register class when using the "Exec" debug flag
To: gem5-users@gem5.org <gem5-users@gem5.org>
Cc: Nirmit Jallawar <jalla...@wisc.edu>

Hi all,

I was trying to run a gem5 simulation using the O3CPU, but gem5 hits a “panic” when running with the “Exec” debug flag enabled. I have built gem5 for the x86 ISA, and am using the stable branch. The full log can be found in the zip linked below (crash_debug_log).

The error in the log seems to be related to this:

build/X86/arch/x86/insts/static_inst.cc:253: panic: Unrecognized register class.

On further debugging, it seems that the register class value is being set to -1:

….
7335000: system.cpu: T0 : 0x7ffff801bbdd @_end+140737354234813.
2 : CALL_NEAR_I : stis t7, SS:[rsp + 0xfffffffffffffff8] : MemWrite : D=0x00007ffff801bbe2 A=0x7fffffffed48
7335000: system.cpu: T0 : 0x7ffff801bbdd @_end+140737354234813. 3 : CALL_NEAR_I : subi rsp, rsp, 0x8 : IntAlu : D=0x00007fffffffed48
7335000: system.cpu: T0 : 0x7ffff801bbdd @_end+140737354234813. 4 : CALL_NEAR_I : wrip t7, t1 : IntAlu :
7447000: system.cpu: T0 : 0x7ffff801d080 @_end+140737354240096 : hint
7447000: system.cpu: T0 : 0x7ffff801d080 @_end+140737354240096. 0 : HINT_NOP : fault NoFault : No_OpClass :
7447000: system.cpu: T0 : 0x7ffff801d084 @_end+140737354240100 : mov eax, 0xc
7447000: system.cpu: T0 : 0x7ffff801d084 @_end+140737354240100. 0 : MOV_R_I : limm eax, 0xc : IntAlu : D=0x000000000000000c
build/X86/arch/x86/insts/static_inst.cc:254: panic: Unknown register class: -854770912 (reg.classValue())
Memory Usage: 632228 KBytes
Program aborted at tick 7455000
--- BEGIN LIBC BACKTRACE ---
….

The error does not appear when using no debug flags or using flags like 'IEW'.

The command used to run the simulation is:

../build/X86/gem5.opt --debug-flags=Exec DAXPY-newCPU.py daxpy --cpu O3CPU

If needed, you can find the related files here:
https://drive.google.com/file/d/1Sxg-c9Gy0NU2r3_nd88A_le18C5RkuR_/view?usp=sharing

I would appreciate any help on this.

Best,
Nirmit

_______________________________________________
gem5-users mailing list -- gem5-users@gem5.org
To unsubscribe send an email to gem5-users-le...@gem5.org
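
For reference, the RegId description in Gabe's message above (a register class plus an index within that class, which printReg turns into a name) can be sketched roughly as follows. This is only an illustrative sketch and not gem5's actual code: the enum values, the name table, and the helpers regName and wouldPanic are hypothetical, and a C++ exception stands in for gem5's panic. It is meant to show why a RegId filled with junk (for example, read through a bad pointer or from an uninitialized operand slot) falls through to the "Unknown register class" path, using the junk value from the log.

```cpp
#include <cassert>    // for the checks mentioned in the usage note
#include <stdexcept>
#include <string>

// Hypothetical, simplified register classes (not gem5's real enum).
// The fixed underlying type makes any int value representable.
enum RegClass : int { IntRegClass = 0, FloatRegClass = 1, MiscRegClass = 2 };

// Simplified stand-in for RegId: which kind of register, and which one.
struct RegId
{
    RegClass regClass;
    int regIdx;

    int classValue() const { return static_cast<int>(regClass); }
};

// Rough analogue of printReg: map (class, index) to a printable name.
// Throws where the real code would panic on an unknown class.
std::string
regName(const RegId &reg)
{
    // Assumed name table for illustration only.
    static const char *intNames[] = {"rax", "rcx", "rdx", "rbx"};
    constexpr int numIntNames = 4;

    switch (reg.classValue()) {
      case IntRegClass:
        if (reg.regIdx >= 0 && reg.regIdx < numIntNames)
            return intNames[reg.regIdx];
        return "int" + std::to_string(reg.regIdx);
      case FloatRegClass:
        return "fpr" + std::to_string(reg.regIdx);
      case MiscRegClass:
        return "misc" + std::to_string(reg.regIdx);
      default:
        throw std::runtime_error("Unknown register class: " +
                                 std::to_string(reg.classValue()));
    }
}

// True if naming this register would reach the unknown-class path.
bool
wouldPanic(const RegId &reg)
{
    try {
        regName(reg);
        return false;
    } catch (const std::runtime_error &) {
        return true;
    }
}
```

With these definitions, regName(RegId{IntRegClass, 0}) gives "rax", while wouldPanic(RegId{static_cast<RegClass>(-854770912), 0}) is true because no case matches the junk class value. A valid RegId would never carry such a class, which is consistent with the idea that the RegId itself, rather than the printing code, is what is broken.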