I tried it, and it solved the particular segfault problem. Thanks!
However, a new problem came up further down the execution stream. It is also
from a garbage instruction. Invalid register mode is extracted from the
garbage binary and used in intRegInMode(). Later during the rename stage,
flattenIntIndex() extracts this mode out and found out it's invalid and
panics.
decoder.cc
if (val == 0x4) {
*const uint32_t mode = bits(machInst, 4,
0);*
switch (bits(machInst, 24, 21)) {
case 0x2:
return new
SRS_STORE_IMM_AN_PY_SN_UN_WN_SZ8(machInst, *mode*,
SrsOp::DecrementAfter, false);
SRS_STORE_IMM_AY_PY_SN_UN_WY_SZ8::SRS_STORE_IMM_AY_PY_SN_UN_WY_SZ8(ExtMachInst
machInst,
uint32_t _regMode, int _mode, bool _wb)
: SrsOp("srs", machInst, MemWriteOp,
(OperatingMode)*_regMode*, (AddrMode)_mode, _wb)
{
_srcRegIdx[0] = MISCREG_CPSR + Ctrl_Base_DepTag;
_srcRegIdx[1] = (condCode == COND_AL || condCode == COND_UC) ?
INTREG_ZERO : INTREG_CONDCODES;
* **_srcRegIdx[2] = intRegInMode((OperatingMode)regMode, INTREG_SP);*
int
flattenIntIndex(int reg)
{
assert(reg >= 0);
if (reg < NUM_ARCH_INTREGS) {
return intRegMap[reg];
} else if (reg < NUM_INTREGS) {
return reg;
} else {
int mode = reg / intRegsPerMode;
reg = reg % intRegsPerMode;
switch (mode) {
case MODE_USER:
case MODE_SYSTEM:
return INTREG_USR(reg);
case MODE_FIQ:
return INTREG_FIQ(reg);
case MODE_IRQ:
return INTREG_IRQ(reg);
case MODE_SVC:
return INTREG_SVC(reg);
case MODE_MON:
return INTREG_MON(reg);
case MODE_ABORT:
return INTREG_ABT(reg);
case MODE_UNDEFINED:
return INTREG_UND(reg);
* default:*
* panic("Flattening into an unknown mode.\n");*
}
}
On Mon, Jul 12, 2010 at 10:04 PM, Min Kyu Jeong <[email protected]> wrote:
> I will try it tomorrow and let you know.
>
>
> On Mon, Jul 12, 2010 at 6:51 PM, Gabriel Michael Black <
> [email protected]> wrote:
>
>> Did that patch fix it?
>>
>> Gabe
>>
>>
>> Quoting Gabe Black <[email protected]>:
>>
>> Here's more or less what's going on as far as the register index. The
>>> load microop needs to store into register 1, and it needs to be sure it
>>> stores into the version visible from the "user" mode. It does that by
>>> applying the intRegInMode function which shifts the register index 1 by
>>> MODE_USER * the number of integer registers. Later, the flattenIntIndex
>>> function is called to unambiguously figure out what a particular
>>> register index goes with given the current values of various ISA state
>>> (specifically the CPU mode for ARM) or other conditions flagged by
>>> putting the register index in a particular range. You can see in the
>>> "else" clause that the mode is re-extracted from the index using an
>>> integer division and the offset of 1 is extracted using a mod. This is
>>> then translated into the actual register visible from that mode with
>>> that index. From this point forward, the CPU can pretend the integer
>>> register file is one big flat contiguous space and totally ignore the
>>> ISAs register indexing semantics.
>>>
>>> One additional mechanism is at work when actually storing the register
>>> index in the StaticInst object. There are really three different types
>>> of register indices, integer, float and misc (which could have also been
>>> called "other" or "control"), but these are all stored in the same array
>>> with no flag to distinguish them. To be able to tell them apart later,
>>> an offset is added to them so that the integer indexes are all from 0 to
>>> FP_Base_DepTag - 1, the floating point registers are all from
>>> FP_Base_DepTag to Ctrl_Base_DepTag - 1 (inconsistently named for
>>> historical reasons), and the misc registers are from Ctrl_Base_DepTag
>>> and up. This is a fairly fragile system since if a, say, integer index
>>> is large enough, it might spill into the fp or misc range and be
>>> misidentified later. I'd like to replace this system with
>>> multidimensional indices that track the type explicitly, but I won't get
>>> into the specifics here.
>>>
>>> A flaw in the combination of these two systems is why this particular
>>> index isn't being handled correctly. The FP_Base_DepTag is being set to
>>> NumIntRegs, but in reality because of the intRegInMode function, integer
>>> indices can be a lot larger than that. When this particular index is
>>> being processed, 1->577 is even bigger than Ctrl_Base_DepTag, so it gets
>>> interpreted as a misc index. These aren't renamed and are passed
>>> directly to the ISA object to interpret/use as array indexes. I think
>>> the reason this works fine on the simple CPUs is that they always know
>>> explicitly what type of register something is, so when they go to undo
>>> the DepTag offset, they just pull out the right one automatically. O3
>>> isn't able to do that. I have a patch attached that simply multiplies
>>> the FP_Base_DepTag value by 32 since 31 is the largest MODE_* constant.
>>> In a more final version I'd want to do something that used the real
>>> upper limit instead of just knowing multiplying by 32 gives the right
>>> answer.
>>>
>>> I'm not completely convinced this will solve your segfault, though. It
>>> looks like that is caused by a bad DynInst pointer ending up in one of
>>> the TimeBuffer structures used to pass values between stages. When that
>>> goes out of scope and attempts to reference count itself, the junk
>>> pointer is dereferenced and causes a segfault. You can tell the pointer
>>> is bad because in 64 bit x86, pointers have to be canonical, or in other
>>> words be sign extended beyond the largest implemented virtual address
>>> bit. Anything starting with an 8 as the MSB is pretty much guaranteed to
>>> be bogus. It could still be, however, that writing beyond the end of a
>>> register file trampled on that structure and corrupted the pointer.
>>> There are asserts in the ISA object's functions that should prevent
>>> that, but those would be disabled if you were running with m5.fast (not
>>> recommended unless you're 100% certain everything is working).
>>>
>>> Gabe
>>>
>>> Min Kyu Jeong wrote:
>>>
>>>> The following is the excerpt from the disassembly.
>>>>
>>>> 117c: e321f013 msr CPSR_c, #19 ; 0x13
>>>> 1180: e24fd08c sub sp, pc, #140 ; 0x8c
>>>> 1184: e321f011 msr CPSR_c, #17 ; 0x11
>>>> 1188: e24fd094 sub sp, pc, #148 ; 0x94
>>>> 118c: e321f012 msr CPSR_c, #18 ; 0x12
>>>> 1190: e24fd09c sub sp, pc, #156 ; 0x9c
>>>> 1194: e321f01b msr CPSR_c, #27 ; 0x1b
>>>> 1198: e24fd0a4 sub sp, pc, #164 ; 0xa4
>>>> 119c: e321f017 msr CPSR_c, #23 ; 0x17
>>>> 11a0: e24fd0ac sub sp, pc, #172 ; 0xac
>>>> 11a4: e321f01f msr CPSR_c, #31 ; 0x1f
>>>> 11a8: e24fd0b4 sub sp, pc, #180 ; 0xb4
>>>> 11ac: e321f013 msr CPSR_c, #19 ; 0x13
>>>> 11b0: ea000002 b 11c0 <skipLabel_00000002>
>>>>
>>>> 000011b4 <LabStr_00000002>:
>>>> 11b4: 5f444441 69736162 00315f63 ADD_basic_1.
>>>>
>>>>
>>>> After x11b0, branch is predicated fall-through and the string label is
>>>> fetched. The particular bytes that causes the mess is of address 11b8,
>>>> so I think it is the second 4B chunk of the label: x69736162
>>>>
>>>> The following is the relevant part of trace from the run. There are
>>>> some additional prints that I added.
>>>>
>>>> 17680000: system.cpu.fetch: [tid:0]: Instruction PC 0x11b4 (0) created
>>>> [sn:1585]
>>>> 17680000: system.cpu.fetch: [tid:0]: Instruction is: svcpl
>>>> 17680000: global: MicroLdrUop, regIdx : 577
>>>> 17680000: global: MicroLdrUop, regIdx : 581
>>>> 17680000: global: MicroLdrUop, regIdx : 582
>>>> 17680000: global: MicroLdrUop, regIdx : 584
>>>> 17680000: global: MicroLdrUop, regIdx : 589
>>>> 17680000: global: MicroLdrUop, regIdx : 590
>>>> 17680000: system.cpu.fetch: [tid:0]: Instruction PC 0x11b8 (0) created
>>>> [sn:1586]
>>>> 17680000: system.cpu.fetch: [tid:0]: Instruction is: addi_uopvs
>>>> r34, r3, #0
>>>> 17680000: system.cpu.fetch: [tid:0]: Instruction PC 0x11b8 (1) created
>>>> [sn:1587]
>>>> 17680000: system.cpu.fetch: [tid:0]: Instruction is: subi_uopvs
>>>> r3, r3, #24
>>>> 17680000: system.cpu.fetch: [tid:0]: Done fetching, reached fetch
>>>> bandwidth for this cycle.
>>>> 17680000: system.cpu.fetch: [tid:0]: Setting PC to 0x0011b8.
>>>> ...
>>>> 17690000: system.cpu.fetch: [tid:0]: Instruction PC 0x11b8 (2) created
>>>> [sn:1588]
>>>> 17690000: system.cpu.fetch: [tid:0]: Instruction is: ldr_uopvs
>>>> XXX, [r34, #24]
>>>> 17690000: system.cpu.fetch: [tid:0]: Instruction PC 0x11b8 (3) created
>>>> [sn:1589]
>>>> 17690000: system.cpu.fetch: [tid:0]: Instruction is: ldr_uopvs
>>>> XXX, [r34, #20]
>>>> 17690000: system.cpu.fetch: [tid:0]: Instruction PC 0x11b8 (4) created
>>>> [sn:1590]
>>>> 17690000: system.cpu.fetch: [tid:0]: Instruction is: ldr_uopvs
>>>> XXX, [r34, #16]
>>>> 17690000: system.cpu.fetch: [tid:0]: Instruction PC 0x11b8 (5) created
>>>> [sn:1591]
>>>> 17690000: system.cpu.fetch: [tid:0]: Instruction is: ldr_uopvs
>>>> XXX, [r34, #12]
>>>> 17690000: system.cpu.fetch: [tid:0]: Instruction PC 0x11b8 (6) created
>>>> [sn:1592]
>>>> 17690000: system.cpu.fetch: [tid:0]: Instruction is: ldr_uopvs
>>>> XXX, [r34, #8]
>>>> 17690000: system.cpu.fetch: [tid:0]: Instruction PC 0x11b8 (7) created
>>>> [sn:1593]
>>>> 17690000: system.cpu.fetch: [tid:0]: Instruction is: ldr_uopvs
>>>> XXX, [r34, #4]
>>>> ...
>>>> 18440000: system.cpu.rename: [tid:0]: Processing instruction [sn:1588]
>>>> with PC 0x11b8.
>>>> 18440000: system.cpu.rename: Adjusting reg index from 105 to 105.
>>>> 18440000: system.cpu.rename: [tid:0]: Looking up arch reg 105, got
>>>> physical reg 512.
>>>> 18440000: system.cpu.rename: [tid:0]: Register 512 is ready.
>>>> 18440000: global: [sn:1588] has 1 ready out of 4 sources. RTI 0)
>>>> 18440000: system.cpu.rename: Flattening index 35 to 35.
>>>> 18440000: system.cpu.rename: [tid:0]: Looking up arch reg 35, got
>>>> physical reg 147.
>>>> 18440000: system.cpu.rename: [tid:0]: Register 147 is ready.
>>>> 18440000: global: [sn:1588] has 2 ready out of 4 sources. RTI 0)
>>>> 18440000: system.cpu.rename: Flattening index 34 to 34.
>>>> 18440000: system.cpu.rename: [tid:0]: Looking up arch reg 34, got
>>>> physical reg 72.
>>>> 18440000: system.cpu.rename: [tid:0]: Register 72 is not ready.
>>>> 18440000: system.cpu.rename: Adjusting reg index from 577 to 577.
>>>> 18440000: system.cpu.rename: [tid:0]: Looking up arch reg 577, got
>>>> physical reg 984.
>>>> 18440000: system.cpu.rename: [tid:0]: Register 984 is ready.
>>>> 18440000: global: [sn:1588] has 3 ready out of 4 sources. RTI 0)
>>>> 18440000: system.cpu.rename: Adjusting reg index from 577 to 577.
>>>> 18440000: global: Renamed misc reg 472
>>>> *18440000: global: Renamed reg 472 to physical reg 984 old mapping was
>>>> 984*
>>>> *18440000: system.cpu.rename: [tid:0]: Renaming arch reg 577 to
>>>> physical reg 984.*
>>>> 18440000: system.cpu.rename: [tid:0]: Adding instruction to history
>>>> buffer (size=3), [sn:1588].
>>>>
>>>>
>>>>
>>>> On Fri, Jul 9, 2010 at 4:23 PM, Gabriel Michael Black
>>>> <[email protected] <mailto:[email protected]>> wrote:
>>>>
>>>> Thanks for the extra info which should be very helpful. Can you
>>>> please tell us what the actual bytes are for the junk instruction?
>>>>
>>>>
>>>> Gabe
>>>>
>>>> Quoting Min Kyu Jeong <[email protected] <mailto:[email protected]
>>>> >>:
>>>>
>>>> I looked into this thing, but still don't fully understand how
>>>> the
>>>> out-of-bound reg index causes segfault. Instead, I will just
>>>> describe what
>>>> is happening hoping someone would catch a clue from it.
>>>>
>>>> The register index that goes out of bound is the architectural
>>>> register
>>>> index, stored in StaticInst class _destRegIdx[0]. The
>>>> particular StaticInst
>>>> I am getting this from is MicroLdrUop. In the constructor of
>>>> the MacroMemOp
>>>> (This invalid garbage instruction from mispredicted path is
>>>> decoded as
>>>> LdmStm), MicroLdrUop instances are generated. The destination
>>>> register
>>>> indices for uops are generated from the bit vector, and there
>>>> is this bit of
>>>> code
>>>>
>>>> if (force_user) {
>>>> regIdx = instRegInMode(MODE_USER, regIdx);
>>>> }
>>>>
>>>> that changes regIdx from 1 to 577. This is stored in
>>>> _destRegIdx[0] variable
>>>> of the MicroLdrUop StaticInst. During renaming, 577 is renamed
>>>> to 984 = 577
>>>> - numLogicalRegs + numPhysicalRegs
>>>>
>>>> This instRegInMode() is what I found suspicous, since the reg
>>>> window is
>>>> handled by ArmISA::flattenIntIndex() call.
>>>>
>>>> Anyways, the the simulation segfaults during advance()
>>>> function call of the
>>>> timebuffer at the end of the NEXT tick. The following is the
>>>> call stack.
>>>>
>>>>
>>>> #0 0x000000000040a2fa in RefCounted::decref
>>>> (this=0x8d4810708d48c84d) at
>>>> build/ARM_FS/base/refcnt.hh:51
>>>> #1 0x0000000000761daa in
>>>> RefCountingPtr<BaseO3DynInst<O3CPUImpl> >::del
>>>> (this=0x9cd200) at build/ARM_FS/base/refcnt.hh:69
>>>> #2 0x0000000000761dc1 in ~RefCountingPtr (this=0x9cd200) at
>>>> build/ARM_FS/base/refcnt.hh:85
>>>> #3 0x000000000077ebf9 in ~commitComm (this=0x9cd1a8) at
>>>> build/ARM_FS/cpu/o3/comm.hh:153
>>>> #4 0x000000000077ec49 in ~TimeBufStruct (this=0x9ccec8) at
>>>> build/ARM_FS/cpu/o3/comm.hh:110
>>>> #5 0x00000000007884c4 in TimeBuffer<TimeBufStruct<O3CPUImpl>
>>>> >::advance
>>>> (this=0x1991038) at build/ARM_FS/base/timebuf.hh:187
>>>> #6 0x0000000000798c31 in FullO3CPU<O3CPUImpl>::tick
>>>> (this=0x198d310) at
>>>> build/ARM_FS/cpu/o3/cpu.cc:523
>>>>
>>>> I made this segfault goes away by overriding the idx 577 to 0.
>>>>
>>>> Thanks,
>>>>
>>>> On Wed, Jun 30, 2010 at 2:17 PM, Gabriel Michael Black <
>>>> [email protected] <mailto:[email protected]>> wrote:
>>>>
>>>> Could you be more specific? There are a lot of register
>>>> related indexes and
>>>> I'm not sure exactly which ones you're talking about.
>>>> Could you walk through
>>>> what's happening from the illegal encoding, through the
>>>> decoder, through the
>>>> CPU and up to the segfault? I don't totally understand the
>>>> mechanics of the
>>>> failure at the moment, but my gut reaction is that the
>>>> decoder should have
>>>> returned an "Unknown(machInst)" when the index was out of
>>>> bounds. I'm not
>>>> convinced that's what's happening, though.
>>>>
>>>>
>>>> Gabe
>>>>
>>>> Quoting Min Kyu Jeong <[email protected]
>>>> <mailto:[email protected]>>:
>>>>
>>>> I just found a case somewhat related to this. Not exactly
>>>> an assertion,
>>>>
>>>> but
>>>> a segfault from the mispredicated path (non)instructions.
>>>>
>>>> When the operand register index is out of the range,
>>>> the call to
>>>> timeBuffer.advance() right after the renaming of such
>>>> registers causes
>>>> segfault . I bypassed this problem by making that
>>>> out-of-bound register
>>>> index to ZERO registers during the renaming (more
>>>> particularly, during
>>>> index
>>>> flattening). I think raising a fault would be a better
>>>> solution, but
>>>> holding
>>>> off from actually doing it. Any suggestion would be
>>>> appreciated.
>>>>
>>>> ps. is the [m5-dev] tag in the title added by the
>>>> mailing list, or should
>>>> I
>>>> add it myself?
>>>>
>>>> Thanks,
>>>>
>>>> Min
>>>>
>>>> On Mon, Jun 14, 2010 at 3:52 PM, Gabriel Michael Black <
>>>> [email protected] <mailto:[email protected]>>
>>>> wrote:
>>>>
>>>> It's important to distinguish between M5 making
>>>> sense, and the code it's
>>>>
>>>> executing making sense. We shouldn't (and I hope
>>>> don't) have any asserts
>>>> that check conditions controllable from the
>>>> simulated code since those
>>>> should generally just cause a fault and may, as
>>>> you point out, be
>>>> mispeculated. It's fine to check that M5 is
>>>> internally consistent,
>>>> though.
>>>> This is supposed to work in all the CPU models and
>>>> as far as I know
>>>> generally does. M5's CPU models should, to the
>>>> first order, correctly do
>>>> whatever whacky, nonsensical things the
>>>> instruction memory tells it to do
>>>> without complaining. If you've found a case where
>>>> it doesn't (which has
>>>> happened before) please let us know so we can fix it.
>>>>
>>>> Gabe
>>>>
>>>>
>>>> Quoting Min Kyu Jeong <[email protected]
>>>> <mailto:[email protected]>>:
>>>>
>>>> Is it possible that the speculatively fetched
>>>> instructions can cause
>>>>
>>>> programming assertions to fail? Until a branch
>>>> is resolved, whatever
>>>> (even
>>>> non-instructions) in the predicted path could
>>>> be fetched and decoded.
>>>> Can't
>>>> assertions on instruction sanity fail for those?
>>>>
>>>> I am trying to make O3 CPU model for ARM
>>>> working. In many cases the
>>>> first
>>>> instruction is a branch followed by a
>>>> interrupt vector table. I was
>>>> wondering if such cases exist for other CPU
>>>> models and if it is, handled
>>>> how.
>>>>
>>>> Thanks,
>>>>
>>>> Min
>>>>
>>>>
>>>>
>>>> _______________________________________________
>>>> m5-dev mailing list
>>>> [email protected] <mailto:[email protected]>
>>>> http://m5sim.org/mailman/listinfo/m5-dev
>>>>
>>>>
>>>>
>>>>
>>>> _______________________________________________
>>>> m5-dev mailing list
>>>> [email protected] <mailto:[email protected]>
>>>> http://m5sim.org/mailman/listinfo/m5-dev
>>>>
>>>>
>>>>
>>>>
>>>> _______________________________________________
>>>> m5-dev mailing list
>>>> [email protected] <mailto:[email protected]>
>>>> http://m5sim.org/mailman/listinfo/m5-dev
>>>>
>>>>
>>>> ------------------------------------------------------------------------
>>>>
>>>> _______________________________________________
>>>> m5-dev mailing list
>>>> [email protected]
>>>> http://m5sim.org/mailman/listinfo/m5-dev
>>>>
>>>>
>>>
>>>
>>
>> _______________________________________________
>> m5-dev mailing list
>> [email protected]
>> http://m5sim.org/mailman/listinfo/m5-dev
>>
>
>
_______________________________________________
m5-dev mailing list
[email protected]
http://m5sim.org/mailman/listinfo/m5-dev