Hi Matt, Gabe,

Running the code on the develop branch, it seems to run without any errors. I 
suppose this is because things have been reworked in develop.

The backtrace generated by the debug build on the stable branch is:

7335000: system.cpu: T0 : 0x7ffff801bbdd @_end+140737354234813. 3 :   
CALL_NEAR_I : subi   rsp, rsp, 0x8 : IntAlu :  D=0x00007fffffffed48
7335000: system.cpu: T0 : 0x7ffff801bbdd @_end+140737354234813. 4 :   
CALL_NEAR_I : wrip   t7, t1 : IntAlu :
7447000: system.cpu: T0 : 0x7ffff801d080 @_end+140737354240096    : hint
7447000: system.cpu: T0 : 0x7ffff801d080 @_end+140737354240096. 0 :   HINT_NOP 
: fault   NoFault : No_OpClass :
7447000: system.cpu: T0 : 0x7ffff801d084 @_end+140737354240100    : mov eax, 0xc
7447000: system.cpu: T0 : 0x7ffff801d084 @_end+140737354240100. 0 :   MOV_R_I : 
limm   eax, 0xc : IntAlu :  D=0x000000000000000c
build/X86/arch/x86/insts/static_inst.cc:254: panic: Unknown register class: 
1066703648
Memory Usage: 643980 KBytes
Program aborted at tick 7455000
--- BEGIN LIBC BACKTRACE ---
../build/X86/gem5.debug(+0xfcebed)[0x55f53b785bed]
../build/X86/gem5.debug(+0xff1b11)[0x55f53b7a8b11]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x15420)[0x7fdcfff9f420]
/lib/x86_64-linux-gnu/libc.so.6(gsignal+0xcb)[0x7fdcff14618b]
/lib/x86_64-linux-gnu/libc.so.6(abort+0x12b)[0x7fdcff125859]
../build/X86/gem5.debug(+0x1d29b8)[0x55f53a9899b8]
../build/X86/gem5.debug(+0x1f7537)[0x55f53a9ae537]
../build/X86/gem5.debug(+0x2f6934)[0x55f53aaad934]
../build/X86/gem5.debug(+0x8b9881)[0x55f53b070881]
../build/X86/gem5.debug(+0x8b14cd)[0x55f53b0684cd]
../build/X86/gem5.debug(+0x8b1c22)[0x55f53b068c22]
../build/X86/gem5.debug(+0x970b91)[0x55f53b127b91]
../build/X86/gem5.debug(+0x96ee43)[0x55f53b125e43]
../build/X86/gem5.debug(+0x96e49d)[0x55f53b12549d]
../build/X86/gem5.debug(+0x96ca3b)[0x55f53b123a3b]
../build/X86/gem5.debug(+0x980254)[0x55f53b137254]
../build/X86/gem5.debug(+0x97c995)[0x55f53b133995]
../build/X86/gem5.debug(+0x987884)[0x55f53b13e884]
../build/X86/gem5.debug(+0x2030ae)[0x55f53a9ba0ae]
../build/X86/gem5.debug(+0x2003d0)[0x55f53a9b73d0]
../build/X86/gem5.debug(+0xfddf5c)[0x55f53b794f5c]
../build/X86/gem5.debug(+0x1005cc3)[0x55f53b7bccc3]
../build/X86/gem5.debug(+0x10058c3)[0x55f53b7bc8c3]
../build/X86/gem5.debug(+0xfaab48)[0x55f53b761b48]
../build/X86/gem5.debug(+0xfa8e1e)[0x55f53b75fe1e]
../build/X86/gem5.debug(+0xfa5183)[0x55f53b75c183]
../build/X86/gem5.debug(+0xfa51ee)[0x55f53b75c1ee]
../build/X86/gem5.debug(+0xaedbb5)[0x55f53b2a4bb5]
/lib/x86_64-linux-gnu/libpython3.8.so.1.0(+0x2a8718)[0x7fdd00255718]
/lib/x86_64-linux-gnu/libpython3.8.so.1.0(_PyEval_EvalFrameDefault+0x8dd8)[0x7fdd0002af48]
/lib/x86_64-linux-gnu/libpython3.8.so.1.0(_PyEval_EvalCodeWithName+0x8fb)[0x7fdd00177ecb]
/lib/x86_64-linux-gnu/libpython3.8.so.1.0(_PyFunction_Vectorcall+0x94)[0x7fdd002550f4]
--- END LIBC BACKTRACE ---

I am leaning towards Gabe’s idea that the real bug is that the RegId itself is 
bogus, since a different register class value is reported on each run.

I am sorry for the late response.

Nirmit

From: mattdsinclair.w...@gmail.com <mattdsinclair.w...@gmail.com>
Sent: Wednesday, December 1, 2021 11:07 PM
To: Gabe Black <gabe.bl...@gmail.com>
Cc: gem5 users mailing list <gem5-users@gem5.org>; Nirmit Jallawar 
<jalla...@wisc.edu>
Subject: Re: [gem5-users] Unrecognized register class when using the "Exec" 
debug flag

Thanks Gabe.  Good catch about the actual value -- I just saw a negative number 
and assumed -1, whoops.  Based on what Nirmit is seeing, it seems like HINT_NOP 
or MOV_R_I must be the instruction causing the fault, but yeah a backtrace will 
probably help confirm.

Nirmit, can you please try running stable with a debug build (to get a 
backtrace) and develop with a release build and let us know what you see?

Matt

On Wed, Dec 1, 2021 at 10:47 PM Gabe Black 
<gabe.bl...@gmail.com> wrote:
I realize this is probably a hard question to answer with Exec being broken, 
but do you know what instruction is causing the problem? HINT_NOP? Probably the 
first thing that someone should do (if they haven't already) is to run this 
under gdb and see what the backtrace looks like, since that would give us a lot 
more info to work with.

Looking at the info we have here, I see that the return from classValue() is 
-854770912 (not -1?) which to me looks like junk. I think probably what's 
happening is that the RegId being passed to the instruction's printReg function 
is from a bad pointer of some sort which is why it doesn't know how to print 
the register name. The RegId in this case refers to a particular 
register/operand, not the instruction as a whole. For instance, when the 
previous instruction prints out eax, that would be a RegId with classValue() 
(member regClass) set to IntRegClass, and regIdx set to INTREG_RAX.

This works a little differently now and is in the process of being 
significantly reworked, although the gist is largely the same, particularly in 
the details involved here. The RegId structure tells you what type of register 
you're dealing with, aka its class, and also which particular register within 
that space you're referring to. The printReg method is trying to figure out 
what the name of that register is so it can be printed as part of the 
disassembly.
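To make the description above concrete, here is a minimal sketch of the idea, not the actual gem5 code. The member names regClass and regIdx, classValue(), IntRegClass, INTREG_RAX and printReg come from this thread; everything else (the other enum values, the name table, returning a string instead of panicking) is an assumption made purely for illustration.

```cpp
#include <sstream>
#include <string>

// Simplified stand-ins for illustration only; NOT the real gem5 definitions.
enum RegClass { IntRegClass, FloatRegClass, MiscRegClass };
enum IntRegIndex { INTREG_RAX, INTREG_RCX, INTREG_RDX, INTREG_RBX };

struct RegId
{
    // Stored as plain ints here so that a junk class value can be modelled.
    int regClass;  // which kind of register this is (its class)
    int regIdx;    // which register within that class

    int classValue() const { return regClass; }
};

// Hypothetical version of printReg: map a RegId to a printable name.
// The real function panics on an unknown class; this sketch returns the
// message as a string so the behaviour is easy to inspect.
std::string
printReg(const RegId &reg)
{
    static const char *intNames[] = {"rax", "rcx", "rdx", "rbx"};
    std::ostringstream os;

    switch (reg.classValue()) {
      case IntRegClass:
        if (reg.regIdx >= 0 && reg.regIdx < 4)
            os << intNames[reg.regIdx];
        else
            os << "int" << reg.regIdx;
        break;
      case FloatRegClass:
        os << "fpr" << reg.regIdx;
        break;
      case MiscRegClass:
        os << "misc" << reg.regIdx;
        break;
      default:
        // A RegId built from garbage memory ends up here.
        os << "Unknown register class: " << reg.classValue();
        break;
    }
    return os.str();
}
```

In this sketch, printReg(RegId{IntRegClass, INTREG_RAX}) yields a register name, while a RegId whose class field holds junk such as -854770912 falls through to the default branch, which is the situation the panic message points at.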

I think the real bug is going to be that the RegId itself is bogus, and so when 
it's operated on, its random junk will lead to random behavior or errors. It 
could be, for instance, that the instruction is trying to print a register name 
in its disassembly, but it doesn't actually *have* a register value set up in 
that slot and so is using uninitialized values. Typically the instructions 
would try to print out, say, destination register 0 when forming the 
disassembly string. Alternatively, O3 could have done something wacky and 
could be trying to do something with a nonsense instruction. I would personally 
lean towards the first option, but without more info it's hard to tell.

I would also suggest trying this with develop. I don't think that's a 
*solution* to the problem, but it would possibly help isolate a cause. Like I 
said, how things work in develop is a little bit different, so we might get 
more info by also seeing what happens in those slightly different circumstances.

Gabe

On Wed, Dec 1, 2021 at 8:30 PM Matt Sinclair 
<mattdsinclair.w...@gmail.com> wrote:
Hi Gabe,

I was trying to dig through the RegClass code earlier to figure out why the 
value is -1 for this instruction, and the only thing that I can think of is 
HINT_NOP needs a RegClass value set for it, but it isn't set for some reason 
(which is not 100% clear to me).  You know this code much better than I do 
though, hence I was hoping you might see something I'm not seeing.

Since this error is happening on a clean checkout of gem5 on stable, it seems 
like a bug that anyone could face if they use the Exec debug flag.

Thanks,
Matt

---------- Forwarded message ---------
From: Nirmit Jallawar via gem5-users 
<gem5-users@gem5.org>
Date: Wed, Dec 1, 2021 at 10:25 PM
Subject: [gem5-users] Unrecognized register class when using the "Exec" debug 
flag
To: gem5-users@gem5.org <gem5-users@gem5.org>
Cc: Nirmit Jallawar <jalla...@wisc.edu>

Hi all,

I was trying to run a gem5 simulation using the O3CPU, but gem5 panics when 
running with the “Exec” debug flag enabled. I have built gem5 for the x86 ISA 
and am using the stable branch.
The full log can be found in the zip linked below (crash_debug_log).
The error in the log seems to be related to this:
build/X86/arch/x86/insts/static_inst.cc:253: panic: Unrecognized register class.

On further debugging, it seems that the register class value is being set to -1:
….
7335000: system.cpu: T0 : 0x7ffff801bbdd @_end+140737354234813. 2 :   
CALL_NEAR_I : stis   t7, SS:[rsp + 0xfffffffffffffff8] : MemWrite :  
D=0x00007ffff801bbe2 A=0x7fffffffed48
7335000: system.cpu: T0 : 0x7ffff801bbdd @_end+140737354234813. 3 :   
CALL_NEAR_I : subi   rsp, rsp, 0x8 : IntAlu :  D=0x00007fffffffed48
7335000: system.cpu: T0 : 0x7ffff801bbdd @_end+140737354234813. 4 :   
CALL_NEAR_I : wrip   t7, t1 : IntAlu :
7447000: system.cpu: T0 : 0x7ffff801d080 @_end+140737354240096    : hint
7447000: system.cpu: T0 : 0x7ffff801d080 @_end+140737354240096. 0 :   HINT_NOP 
: fault   NoFault : No_OpClass :
7447000: system.cpu: T0 : 0x7ffff801d084 @_end+140737354240100    : mov eax, 0xc
7447000: system.cpu: T0 : 0x7ffff801d084 @_end+140737354240100. 0 :   MOV_R_I : 
limm   eax, 0xc : IntAlu :  D=0x000000000000000c
build/X86/arch/x86/insts/static_inst.cc:254: panic: Unknown register class: 
-854770912 (reg.classValue())
Memory Usage: 632228 KBytes
Program aborted at tick 7455000
--- BEGIN LIBC BACKTRACE ---
….

The error does not appear when using no debug flags or using flags like 'IEW'.

The command used to run the simulation is:

../build/X86/gem5.opt --debug-flags=Exec DAXPY-newCPU.py daxpy --cpu O3CPU

If needed, you can find the related files here: 
https://drive.google.com/file/d/1Sxg-c9Gy0NU2r3_nd88A_le18C5RkuR_/view?usp=sharing

I would appreciate any help on this.



Best,

Nirmit





_______________________________________________
gem5-users mailing list -- gem5-users@gem5.org
To unsubscribe send an email to gem5-users-le...@gem5.org