That one I think should be fixed in the decoder. If the mode isn't valid, an Uknown(machInst) should be returned rather than the Srs instruction. I'll try to get the previous patch cleaned up and committed and a patch for this to you this evening. I think it would be a worthwhile but time consuming and tedious exercise to make sure the bad instruction encodings (like in this case) actually cause undefined exceptions rather than blissfully plowing ahead.

Gabe

Quoting Min Kyu Jeong <[email protected]>:

I tried it, and it solved the particular segfault problem. Thanks!

However, a new problem came up further down the execution stream. It is also
from a garbage instruction. Invalid register mode is extracted from the
garbage binary and used in intRegInMode(). Later during the rename stage,
flattenIntIndex() extracts this mode out and found out it's invalid and
panics.

decoder.cc
                              if (val == 0x4) {
                                  *const uint32_t mode = bits(machInst, 4,
0);*
                                  switch (bits(machInst, 24, 21)) {
                                    case 0x2:
                                      return new
SRS_STORE_IMM_AN_PY_SN_UN_WN_SZ8(machInst, *mode*,
                                              SrsOp::DecrementAfter, false);


SRS_STORE_IMM_AY_PY_SN_UN_WY_SZ8::SRS_STORE_IMM_AY_PY_SN_UN_WY_SZ8(ExtMachInst
machInst,
            uint32_t _regMode, int _mode, bool _wb)
         : SrsOp("srs", machInst, MemWriteOp,
                 (OperatingMode)*_regMode*, (AddrMode)_mode, _wb)
    {

_srcRegIdx[0] = MISCREG_CPSR + Ctrl_Base_DepTag;
_srcRegIdx[1] = (condCode == COND_AL || condCode == COND_UC) ?
               INTREG_ZERO : INTREG_CONDCODES;
* **_srcRegIdx[2] = intRegInMode((OperatingMode)regMode, INTREG_SP);*



        int
        flattenIntIndex(int reg)
        {
            assert(reg >= 0);
            if (reg < NUM_ARCH_INTREGS) {
                return intRegMap[reg];
            } else if (reg < NUM_INTREGS) {
                return reg;
            } else {
                int mode = reg / intRegsPerMode;
                reg = reg % intRegsPerMode;
                switch (mode) {
                  case MODE_USER:
                  case MODE_SYSTEM:
                    return INTREG_USR(reg);
                  case MODE_FIQ:
                    return INTREG_FIQ(reg);
                  case MODE_IRQ:
                    return INTREG_IRQ(reg);
                  case MODE_SVC:
                    return INTREG_SVC(reg);
                  case MODE_MON:
                    return INTREG_MON(reg);
                  case MODE_ABORT:
                    return INTREG_ABT(reg);
                  case MODE_UNDEFINED:
                    return INTREG_UND(reg);
*                  default:*
*                    panic("Flattening into an unknown mode.\n");*
                }
            }



On Mon, Jul 12, 2010 at 10:04 PM, Min Kyu Jeong <[email protected]> wrote:

I will try it tomorrow and let you know.


On Mon, Jul 12, 2010 at 6:51 PM, Gabriel Michael Black <
[email protected]> wrote:

Did that patch fix it?

Gabe


Quoting Gabe Black <[email protected]>:

 Here's more or less what's going on as far as the register index. The
load microop needs to store into register 1, and it needs to be sure it
stores into the version visible from the "user" mode. It does that by
applying the intRegInMode function which shifts the register index 1 by
MODE_USER * the number of integer registers. Later, the flattenIntIndex
function is called to unambiguously figure out what a particular
register index goes with given the current values of various ISA state
(specifically the CPU mode for ARM) or other conditions flagged by
putting the register index in a particular range. You can see in the
"else" clause that the mode is re-extracted from the index using an
integer division and the offset of 1 is extracted using a mod. This is
then translated into the actual register visible from that mode with
that index. From this point forward, the CPU can pretend the integer
register file is one big flat contiguous space and totally ignore the
ISAs register indexing semantics.

One additional mechanism is at work when actually storing the register
index in the StaticInst object. There are really three different types
of register indices, integer, float and misc (which could have also been
called "other" or "control"), but these are all stored in the same array
with no flag to distinguish them. To be able to tell them apart later,
an offset is added to them so that the integer indexes are all from 0 to
FP_Base_DepTag - 1, the floating point registers are all from
FP_Base_DepTag to Ctrl_Base_DepTag - 1 (inconsistently named for
historical reasons), and the misc registers are from Ctrl_Base_DepTag
and up. This is a fairly fragile system since if a, say, integer index
is large enough, it might spill into the fp or misc range and be
misidentified later. I'd like to replace this system with
multidimensional indices that track the type explicitly, but I won't get
into the specifics here.

A flaw in the combination of these two systems is why this particular
index isn't being handled correctly. The FP_Base_DepTag is being set to
NumIntRegs, but in reality because of the intRegInMode function, integer
indices can be a lot larger than that. When this particular index is
being processed, 1->577 is even bigger than Ctrl_Base_DepTag, so it gets
interpreted as a misc index. These aren't renamed and are passed
directly to the ISA object to interpret/use as array indexes. I think
the reason this works fine on the simple CPUs is that they always know
explicitly what type of register something is, so when they go to undo
the DepTag offset, they just pull out the right one automatically. O3
isn't able to do that. I have a patch attached that simply multiplies
the FP_Base_DepTag value by 32 since 31 is the largest MODE_* constant.
In a more final version I'd want to do something that used the real
upper limit instead of just knowing multiplying by 32 gives the right
answer.

I'm not completely convinced this will solve your segfault, though. It
looks like that is caused by a bad DynInst pointer ending up in one of
the TimeBuffer structures used to pass values between stages. When that
goes out of scope and attempts to reference count itself, the junk
pointer is dereferenced and causes a segfault. You can tell the pointer
is bad because in 64 bit x86, pointers have to be canonical, or in other
words be sign extended beyond the largest implemented virtual address
bit. Anything starting with an 8 as the MSB is pretty much guaranteed to
be bogus. It could still be, however, that writing beyond the end of a
register file trampled on that structure and corrupted the pointer.
There are asserts in the ISA object's functions that should prevent
that, but those would be disabled if you were running with m5.fast (not
recommended unless you're 100% certain everything is working).

Gabe

Min Kyu Jeong wrote:

The following is the excerpt from the disassembly.

       117c: e321f013 msr CPSR_c, #19 ; 0x13
       1180: e24fd08c sub sp, pc, #140 ; 0x8c
       1184: e321f011 msr CPSR_c, #17 ; 0x11
       1188: e24fd094 sub sp, pc, #148 ; 0x94
       118c: e321f012 msr CPSR_c, #18 ; 0x12
       1190: e24fd09c sub sp, pc, #156 ; 0x9c
       1194: e321f01b msr CPSR_c, #27 ; 0x1b
       1198: e24fd0a4 sub sp, pc, #164 ; 0xa4
       119c: e321f017 msr CPSR_c, #23 ; 0x17
       11a0: e24fd0ac sub sp, pc, #172 ; 0xac
       11a4: e321f01f msr CPSR_c, #31 ; 0x1f
       11a8: e24fd0b4 sub sp, pc, #180 ; 0xb4
       11ac: e321f013 msr CPSR_c, #19 ; 0x13
       11b0: ea000002 b 11c0 <skipLabel_00000002>

   000011b4 <LabStr_00000002>:
       11b4: 5f444441 69736162 00315f63              ADD_basic_1.


After x11b0, branch is predicated fall-through and the string label is
fetched. The particular bytes that causes the mess is of address 11b8,
so I think it is the second 4B chunk of the label: x69736162

The following is the relevant part of trace from the run. There are
some additional prints that I added.

17680000: system.cpu.fetch: [tid:0]: Instruction PC 0x11b4 (0) created
[sn:1585]
17680000: system.cpu.fetch: [tid:0]: Instruction is:   svcpl
17680000: global: MicroLdrUop, regIdx : 577
17680000: global: MicroLdrUop, regIdx : 581
17680000: global: MicroLdrUop, regIdx : 582
17680000: global: MicroLdrUop, regIdx : 584
17680000: global: MicroLdrUop, regIdx : 589
17680000: global: MicroLdrUop, regIdx : 590
17680000: system.cpu.fetch: [tid:0]: Instruction PC 0x11b8 (0) created
[sn:1586]
17680000: system.cpu.fetch: [tid:0]: Instruction is:   addi_uopvs
r34, r3, #0
17680000: system.cpu.fetch: [tid:0]: Instruction PC 0x11b8 (1) created
[sn:1587]
17680000: system.cpu.fetch: [tid:0]: Instruction is:   subi_uopvs
r3, r3, #24
17680000: system.cpu.fetch: [tid:0]: Done fetching, reached fetch
bandwidth for this cycle.
17680000: system.cpu.fetch: [tid:0]: Setting PC to 0x0011b8.
...
17690000: system.cpu.fetch: [tid:0]: Instruction PC 0x11b8 (2) created
[sn:1588]
17690000: system.cpu.fetch: [tid:0]: Instruction is:   ldr_uopvs
XXX, [r34, #24]
17690000: system.cpu.fetch: [tid:0]: Instruction PC 0x11b8 (3) created
[sn:1589]
17690000: system.cpu.fetch: [tid:0]: Instruction is:   ldr_uopvs
XXX, [r34, #20]
17690000: system.cpu.fetch: [tid:0]: Instruction PC 0x11b8 (4) created
[sn:1590]
17690000: system.cpu.fetch: [tid:0]: Instruction is:   ldr_uopvs
XXX, [r34, #16]
17690000: system.cpu.fetch: [tid:0]: Instruction PC 0x11b8 (5) created
[sn:1591]
17690000: system.cpu.fetch: [tid:0]: Instruction is:   ldr_uopvs
XXX, [r34, #12]
17690000: system.cpu.fetch: [tid:0]: Instruction PC 0x11b8 (6) created
[sn:1592]
17690000: system.cpu.fetch: [tid:0]: Instruction is:   ldr_uopvs
XXX, [r34, #8]
17690000: system.cpu.fetch: [tid:0]: Instruction PC 0x11b8 (7) created
[sn:1593]
17690000: system.cpu.fetch: [tid:0]: Instruction is:   ldr_uopvs
XXX, [r34, #4]
...
18440000: system.cpu.rename: [tid:0]: Processing instruction [sn:1588]
with PC 0x11b8.
18440000: system.cpu.rename: Adjusting reg index from 105 to 105.
18440000: system.cpu.rename: [tid:0]: Looking up arch reg 105, got
physical reg 512.
18440000: system.cpu.rename: [tid:0]: Register 512 is ready.
18440000: global: [sn:1588] has 1 ready out of 4 sources. RTI 0)
18440000: system.cpu.rename: Flattening index 35 to 35.
18440000: system.cpu.rename: [tid:0]: Looking up arch reg 35, got
physical reg 147.
18440000: system.cpu.rename: [tid:0]: Register 147 is ready.
18440000: global: [sn:1588] has 2 ready out of 4 sources. RTI 0)
18440000: system.cpu.rename: Flattening index 34 to 34.
18440000: system.cpu.rename: [tid:0]: Looking up arch reg 34, got
physical reg 72.
18440000: system.cpu.rename: [tid:0]: Register 72 is not ready.
18440000: system.cpu.rename: Adjusting reg index from 577 to 577.
18440000: system.cpu.rename: [tid:0]: Looking up arch reg 577, got
physical reg 984.
18440000: system.cpu.rename: [tid:0]: Register 984 is ready.
18440000: global: [sn:1588] has 3 ready out of 4 sources. RTI 0)
18440000: system.cpu.rename: Adjusting reg index from 577 to 577.
18440000: global: Renamed misc reg 472
*18440000: global: Renamed reg 472 to physical reg 984 old mapping was
984*
*18440000: system.cpu.rename: [tid:0]: Renaming arch reg 577 to
physical reg 984.*
18440000: system.cpu.rename: [tid:0]: Adding instruction to history
buffer (size=3), [sn:1588].



On Fri, Jul 9, 2010 at 4:23 PM, Gabriel Michael Black
<[email protected] <mailto:[email protected]>> wrote:

   Thanks for the extra info which should be very helpful. Can you
   please tell us what the actual bytes are for the junk instruction?


   Gabe

   Quoting Min Kyu Jeong <[email protected] <mailto:[email protected]
>>:

       I looked into this thing, but still don't fully understand how
the
       out-of-bound reg index causes segfault. Instead, I will just
       describe what
       is happening hoping someone would catch a clue from it.

       The register index that goes out of bound is the architectural
       register
       index, stored in StaticInst class _destRegIdx[0]. The
       particular StaticInst
       I am getting this from is MicroLdrUop. In the constructor of
       the MacroMemOp
       (This invalid garbage instruction from mispredicted path is
       decoded as
       LdmStm), MicroLdrUop instances are generated. The destination
       register
       indices for uops are generated from the bit vector, and there
       is this bit of
       code

       if (force_user) {
         regIdx = instRegInMode(MODE_USER, regIdx);
       }

       that changes regIdx from 1 to 577. This is stored in
       _destRegIdx[0] variable
       of the MicroLdrUop StaticInst. During renaming, 577 is renamed
       to 984 = 577
       - numLogicalRegs + numPhysicalRegs

       This instRegInMode() is what I found suspicous, since the reg
       window is
       handled by ArmISA::flattenIntIndex() call.

       Anyways, the the simulation segfaults during advance()
       function call of the
       timebuffer at the end of the NEXT tick. The following is the
       call stack.


       #0  0x000000000040a2fa in RefCounted::decref
       (this=0x8d4810708d48c84d) at
       build/ARM_FS/base/refcnt.hh:51
       #1  0x0000000000761daa in
       RefCountingPtr<BaseO3DynInst<O3CPUImpl> >::del
       (this=0x9cd200) at build/ARM_FS/base/refcnt.hh:69
       #2  0x0000000000761dc1 in ~RefCountingPtr (this=0x9cd200) at
       build/ARM_FS/base/refcnt.hh:85
       #3  0x000000000077ebf9 in ~commitComm (this=0x9cd1a8) at
       build/ARM_FS/cpu/o3/comm.hh:153
       #4  0x000000000077ec49 in ~TimeBufStruct (this=0x9ccec8) at
       build/ARM_FS/cpu/o3/comm.hh:110
       #5  0x00000000007884c4 in TimeBuffer<TimeBufStruct<O3CPUImpl>
       >::advance
       (this=0x1991038) at build/ARM_FS/base/timebuf.hh:187
       #6  0x0000000000798c31 in FullO3CPU<O3CPUImpl>::tick
       (this=0x198d310) at
       build/ARM_FS/cpu/o3/cpu.cc:523

       I made this segfault goes away by overriding the idx 577 to 0.

       Thanks,

       On Wed, Jun 30, 2010 at 2:17 PM, Gabriel Michael Black <
       [email protected] <mailto:[email protected]>> wrote:

           Could you be more specific? There are a lot of register
           related indexes and
           I'm not sure exactly which ones you're talking about.
           Could you walk through
           what's happening from the illegal encoding, through the
           decoder, through the
           CPU and up to the segfault? I don't totally understand the
           mechanics of the
           failure at the moment, but my gut reaction is that the
           decoder should have
           returned an "Unknown(machInst)" when the index was out of
           bounds. I'm not
           convinced that's what's happening, though.


           Gabe

           Quoting Min Kyu Jeong <[email protected]
           <mailto:[email protected]>>:

            I just found a case somewhat related to this. Not exactly
           an assertion,

               but
               a segfault from the mispredicated path (non)instructions.

               When the operand register index is out of the range,
               the call to
               timeBuffer.advance() right after the renaming of such
               registers causes
               segfault . I bypassed this problem by making that
               out-of-bound register
               index to ZERO registers during the renaming (more
               particularly, during
               index
               flattening). I think raising a fault would be a better
               solution, but
               holding
               off from actually doing it. Any suggestion would be
               appreciated.

               ps. is the [m5-dev] tag in the title added by the
               mailing list, or should
               I
               add it myself?

               Thanks,

               Min

               On Mon, Jun 14, 2010 at 3:52 PM, Gabriel Michael Black <
               [email protected] <mailto:[email protected]>>
               wrote:

                It's important to distinguish between M5 making
               sense, and the code it's

                   executing making sense. We shouldn't (and I hope
                   don't) have any asserts
                   that check conditions controllable from the
                   simulated code since those
                   should generally just cause a fault and may, as
                   you point out, be
                   mispeculated. It's fine to check that M5 is
                   internally consistent,
                   though.
                   This is supposed to work in all the CPU models and
                   as far as I know
                   generally does. M5's CPU models should, to the
                   first order, correctly do
                   whatever whacky, nonsensical things the
                   instruction memory tells it to do
                   without complaining. If you've found a case where
                   it doesn't (which has
                   happened before) please let us know so we can fix it.

                   Gabe


                   Quoting Min Kyu Jeong <[email protected]
                   <mailto:[email protected]>>:

                    Is it possible that the speculatively fetched
                   instructions can cause

                       programming assertions to fail? Until a branch
                       is resolved, whatever
                       (even
                       non-instructions) in the predicted path could
                       be fetched and decoded.
                       Can't
                       assertions on instruction sanity fail for those?

                       I am trying to make O3 CPU model for ARM
                       working. In many cases the
                       first
                       instruction is a branch followed by a
                       interrupt vector table. I was
                       wondering if such cases exist for other CPU
                       models and if it is, handled
                       how.

                       Thanks,

                       Min



                   _______________________________________________
                   m5-dev mailing list
                   [email protected] <mailto:[email protected]>
                   http://m5sim.org/mailman/listinfo/m5-dev




           _______________________________________________
           m5-dev mailing list
           [email protected] <mailto:[email protected]>
           http://m5sim.org/mailman/listinfo/m5-dev




   _______________________________________________
   m5-dev mailing list
   [email protected] <mailto:[email protected]>
   http://m5sim.org/mailman/listinfo/m5-dev


------------------------------------------------------------------------

_______________________________________________
m5-dev mailing list
[email protected]
http://m5sim.org/mailman/listinfo/m5-dev





_______________________________________________
m5-dev mailing list
[email protected]
http://m5sim.org/mailman/listinfo/m5-dev






_______________________________________________
m5-dev mailing list
[email protected]
http://m5sim.org/mailman/listinfo/m5-dev

Reply via email to