Did that patch fix it?

Gabe

Quoting Gabe Black <gbl...@eecs.umich.edu>:

Here's more or less what's going on as far as the register index. The
load microop needs to store into register 1, and it needs to be sure it
stores into the version visible from the "user" mode. It does that by
applying the intRegInMode function which shifts the register index 1 by
MODE_USER * the number of integer registers. Later, the flattenIntIndex
function is called to unambiguously figure out what a particular
register index goes with given the current values of various ISA state
(specifically the CPU mode for ARM) or other conditions flagged by
putting the register index in a particular range. You can see in the
"else" clause that the mode is re-extracted from the index using an
integer division and the offset of 1 is extracted using a mod. This is
then translated into the actual register visible from that mode with
that index. From this point forward, the CPU can pretend the integer
register file is one big flat contiguous space and totally ignore the
ISAs register indexing semantics.

One additional mechanism is at work when actually storing the register
index in the StaticInst object. There are really three different types
of register indices, integer, float and misc (which could have also been
called "other" or "control"), but these are all stored in the same array
with no flag to distinguish them. To be able to tell them apart later,
an offset is added to them so that the integer indexes are all from 0 to
FP_Base_DepTag - 1, the floating point registers are all from
FP_Base_DepTag to Ctrl_Base_DepTag - 1 (inconsistently named for
historical reasons), and the misc registers are from Ctrl_Base_DepTag
and up. This is a fairly fragile system since if a, say, integer index
is large enough, it might spill into the fp or misc range and be
misidentified later. I'd like to replace this system with
multidimensional indices that track the type explicitly, but I won't get
into the specifics here.

A flaw in the combination of these two systems is why this particular
index isn't being handled correctly. The FP_Base_DepTag is being set to
NumIntRegs, but in reality because of the intRegInMode function, integer
indices can be a lot larger than that. When this particular index is
being processed, 1->577 is even bigger than Ctrl_Base_DepTag, so it gets
interpreted as a misc index. These aren't renamed and are passed
directly to the ISA object to interpret/use as array indexes. I think
the reason this works fine on the simple CPUs is that they always know
explicitly what type of register something is, so when they go to undo
the DepTag offset, they just pull out the right one automatically. O3
isn't able to do that. I have a patch attached that simply multiplies
the FP_Base_DepTag value by 32 since 31 is the largest MODE_* constant.
In a more final version I'd want to do something that used the real
upper limit instead of just knowing multiplying by 32 gives the right
answer.

I'm not completely convinced this will solve your segfault, though. It
looks like that is caused by a bad DynInst pointer ending up in one of
the TimeBuffer structures used to pass values between stages. When that
goes out of scope and attempts to reference count itself, the junk
pointer is dereferenced and causes a segfault. You can tell the pointer
is bad because in 64 bit x86, pointers have to be canonical, or in other
words be sign extended beyond the largest implemented virtual address
bit. Anything starting with an 8 as the MSB is pretty much guaranteed to
be bogus. It could still be, however, that writing beyond the end of a
register file trampled on that structure and corrupted the pointer.
There are asserts in the ISA object's functions that should prevent
that, but those would be disabled if you were running with m5.fast (not
recommended unless you're 100% certain everything is working).

Gabe

Min Kyu Jeong wrote:
The following is the excerpt from the disassembly.

        117c: e321f013 msr CPSR_c, #19 ; 0x13
        1180: e24fd08c sub sp, pc, #140 ; 0x8c
        1184: e321f011 msr CPSR_c, #17 ; 0x11
        1188: e24fd094 sub sp, pc, #148 ; 0x94
        118c: e321f012 msr CPSR_c, #18 ; 0x12
        1190: e24fd09c sub sp, pc, #156 ; 0x9c
        1194: e321f01b msr CPSR_c, #27 ; 0x1b
        1198: e24fd0a4 sub sp, pc, #164 ; 0xa4
        119c: e321f017 msr CPSR_c, #23 ; 0x17
        11a0: e24fd0ac sub sp, pc, #172 ; 0xac
        11a4: e321f01f msr CPSR_c, #31 ; 0x1f
        11a8: e24fd0b4 sub sp, pc, #180 ; 0xb4
        11ac: e321f013 msr CPSR_c, #19 ; 0x13
        11b0: ea000002 b 11c0 <skipLabel_00000002>

    000011b4 <LabStr_00000002>:
        11b4: 5f444441 69736162 00315f63              ADD_basic_1.


After x11b0, branch is predicated fall-through and the string label is
fetched. The particular bytes that causes the mess is of address 11b8,
so I think it is the second 4B chunk of the label: x69736162

The following is the relevant part of trace from the run. There are
some additional prints that I added.

17680000: system.cpu.fetch: [tid:0]: Instruction PC 0x11b4 (0) created
[sn:1585]
17680000: system.cpu.fetch: [tid:0]: Instruction is:   svcpl
17680000: global: MicroLdrUop, regIdx : 577
17680000: global: MicroLdrUop, regIdx : 581
17680000: global: MicroLdrUop, regIdx : 582
17680000: global: MicroLdrUop, regIdx : 584
17680000: global: MicroLdrUop, regIdx : 589
17680000: global: MicroLdrUop, regIdx : 590
17680000: system.cpu.fetch: [tid:0]: Instruction PC 0x11b8 (0) created
[sn:1586]
17680000: system.cpu.fetch: [tid:0]: Instruction is:   addi_uopvs
r34, r3, #0
17680000: system.cpu.fetch: [tid:0]: Instruction PC 0x11b8 (1) created
[sn:1587]
17680000: system.cpu.fetch: [tid:0]: Instruction is:   subi_uopvs
r3, r3, #24
17680000: system.cpu.fetch: [tid:0]: Done fetching, reached fetch
bandwidth for this cycle.
17680000: system.cpu.fetch: [tid:0]: Setting PC to 0x0011b8.
...
17690000: system.cpu.fetch: [tid:0]: Instruction PC 0x11b8 (2) created
[sn:1588]
17690000: system.cpu.fetch: [tid:0]: Instruction is:   ldr_uopvs
XXX, [r34, #24]
17690000: system.cpu.fetch: [tid:0]: Instruction PC 0x11b8 (3) created
[sn:1589]
17690000: system.cpu.fetch: [tid:0]: Instruction is:   ldr_uopvs
XXX, [r34, #20]
17690000: system.cpu.fetch: [tid:0]: Instruction PC 0x11b8 (4) created
[sn:1590]
17690000: system.cpu.fetch: [tid:0]: Instruction is:   ldr_uopvs
XXX, [r34, #16]
17690000: system.cpu.fetch: [tid:0]: Instruction PC 0x11b8 (5) created
[sn:1591]
17690000: system.cpu.fetch: [tid:0]: Instruction is:   ldr_uopvs
XXX, [r34, #12]
17690000: system.cpu.fetch: [tid:0]: Instruction PC 0x11b8 (6) created
[sn:1592]
17690000: system.cpu.fetch: [tid:0]: Instruction is:   ldr_uopvs
XXX, [r34, #8]
17690000: system.cpu.fetch: [tid:0]: Instruction PC 0x11b8 (7) created
[sn:1593]
17690000: system.cpu.fetch: [tid:0]: Instruction is:   ldr_uopvs
XXX, [r34, #4]
...
18440000: system.cpu.rename: [tid:0]: Processing instruction [sn:1588]
with PC 0x11b8.
18440000: system.cpu.rename: Adjusting reg index from 105 to 105.
18440000: system.cpu.rename: [tid:0]: Looking up arch reg 105, got
physical reg 512.
18440000: system.cpu.rename: [tid:0]: Register 512 is ready.
18440000: global: [sn:1588] has 1 ready out of 4 sources. RTI 0)
18440000: system.cpu.rename: Flattening index 35 to 35.
18440000: system.cpu.rename: [tid:0]: Looking up arch reg 35, got
physical reg 147.
18440000: system.cpu.rename: [tid:0]: Register 147 is ready.
18440000: global: [sn:1588] has 2 ready out of 4 sources. RTI 0)
18440000: system.cpu.rename: Flattening index 34 to 34.
18440000: system.cpu.rename: [tid:0]: Looking up arch reg 34, got
physical reg 72.
18440000: system.cpu.rename: [tid:0]: Register 72 is not ready.
18440000: system.cpu.rename: Adjusting reg index from 577 to 577.
18440000: system.cpu.rename: [tid:0]: Looking up arch reg 577, got
physical reg 984.
18440000: system.cpu.rename: [tid:0]: Register 984 is ready.
18440000: global: [sn:1588] has 3 ready out of 4 sources. RTI 0)
18440000: system.cpu.rename: Adjusting reg index from 577 to 577.
18440000: global: Renamed misc reg 472
*18440000: global: Renamed reg 472 to physical reg 984 old mapping was
984*
*18440000: system.cpu.rename: [tid:0]: Renaming arch reg 577 to
physical reg 984.*
18440000: system.cpu.rename: [tid:0]: Adding instruction to history
buffer (size=3), [sn:1588].



On Fri, Jul 9, 2010 at 4:23 PM, Gabriel Michael Black
<gbl...@eecs.umich.edu <mailto:gbl...@eecs.umich.edu>> wrote:

    Thanks for the extra info which should be very helpful. Can you
    please tell us what the actual bytes are for the junk instruction?


    Gabe

    Quoting Min Kyu Jeong <mkje...@gmail.com <mailto:mkje...@gmail.com>>:

        I looked into this thing, but still don't fully understand how the
        out-of-bound reg index causes segfault. Instead, I will just
        describe what
        is happening hoping someone would catch a clue from it.

        The register index that goes out of bound is the architectural
        register
        index, stored in StaticInst class _destRegIdx[0]. The
        particular StaticInst
        I am getting this from is MicroLdrUop. In the constructor of
        the MacroMemOp
        (This invalid garbage instruction from mispredicted path is
        decoded as
        LdmStm), MicroLdrUop instances are generated. The destination
        register
        indices for uops are generated from the bit vector, and there
        is this bit of
        code

        if (force_user) {
          regIdx = instRegInMode(MODE_USER, regIdx);
        }

        that changes regIdx from 1 to 577. This is stored in
        _destRegIdx[0] variable
        of the MicroLdrUop StaticInst. During renaming, 577 is renamed
        to 984 = 577
        - numLogicalRegs + numPhysicalRegs

        This instRegInMode() is what I found suspicous, since the reg
        window is
        handled by ArmISA::flattenIntIndex() call.

        Anyways, the the simulation segfaults during advance()
        function call of the
        timebuffer at the end of the NEXT tick. The following is the
        call stack.


        #0  0x000000000040a2fa in RefCounted::decref
        (this=0x8d4810708d48c84d) at
        build/ARM_FS/base/refcnt.hh:51
        #1  0x0000000000761daa in
        RefCountingPtr<BaseO3DynInst<O3CPUImpl> >::del
        (this=0x9cd200) at build/ARM_FS/base/refcnt.hh:69
        #2  0x0000000000761dc1 in ~RefCountingPtr (this=0x9cd200) at
        build/ARM_FS/base/refcnt.hh:85
        #3  0x000000000077ebf9 in ~commitComm (this=0x9cd1a8) at
        build/ARM_FS/cpu/o3/comm.hh:153
        #4  0x000000000077ec49 in ~TimeBufStruct (this=0x9ccec8) at
        build/ARM_FS/cpu/o3/comm.hh:110
        #5  0x00000000007884c4 in TimeBuffer<TimeBufStruct<O3CPUImpl>
        >::advance
        (this=0x1991038) at build/ARM_FS/base/timebuf.hh:187
        #6  0x0000000000798c31 in FullO3CPU<O3CPUImpl>::tick
        (this=0x198d310) at
        build/ARM_FS/cpu/o3/cpu.cc:523

        I made this segfault goes away by overriding the idx 577 to 0.

        Thanks,

        On Wed, Jun 30, 2010 at 2:17 PM, Gabriel Michael Black <
        gbl...@eecs.umich.edu <mailto:gbl...@eecs.umich.edu>> wrote:

            Could you be more specific? There are a lot of register
            related indexes and
            I'm not sure exactly which ones you're talking about.
            Could you walk through
            what's happening from the illegal encoding, through the
            decoder, through the
            CPU and up to the segfault? I don't totally understand the
            mechanics of the
            failure at the moment, but my gut reaction is that the
            decoder should have
            returned an "Unknown(machInst)" when the index was out of
            bounds. I'm not
            convinced that's what's happening, though.


            Gabe

            Quoting Min Kyu Jeong <mkje...@gmail.com
            <mailto:mkje...@gmail.com>>:

             I just found a case somewhat related to this. Not exactly
            an assertion,

                but
                a segfault from the mispredicated path (non)instructions.

                When the operand register index is out of the range,
                the call to
                timeBuffer.advance() right after the renaming of such
                registers causes
                segfault . I bypassed this problem by making that
                out-of-bound register
                index to ZERO registers during the renaming (more
                particularly, during
                index
                flattening). I think raising a fault would be a better
                solution, but
                holding
                off from actually doing it. Any suggestion would be
                appreciated.

                ps. is the [m5-dev] tag in the title added by the
                mailing list, or should
                I
                add it myself?

                Thanks,

                Min

                On Mon, Jun 14, 2010 at 3:52 PM, Gabriel Michael Black <
                gbl...@eecs.umich.edu <mailto:gbl...@eecs.umich.edu>>
                wrote:

                 It's important to distinguish between M5 making
                sense, and the code it's

                    executing making sense. We shouldn't (and I hope
                    don't) have any asserts
                    that check conditions controllable from the
                    simulated code since those
                    should generally just cause a fault and may, as
                    you point out, be
                    mispeculated. It's fine to check that M5 is
                    internally consistent,
                    though.
                    This is supposed to work in all the CPU models and
                    as far as I know
                    generally does. M5's CPU models should, to the
                    first order, correctly do
                    whatever whacky, nonsensical things the
                    instruction memory tells it to do
                    without complaining. If you've found a case where
                    it doesn't (which has
                    happened before) please let us know so we can fix it.

                    Gabe


                    Quoting Min Kyu Jeong <mkje...@gmail.com
                    <mailto:mkje...@gmail.com>>:

                     Is it possible that the speculatively fetched
                    instructions can cause

                        programming assertions to fail? Until a branch
                        is resolved, whatever
                        (even
                        non-instructions) in the predicted path could
                        be fetched and decoded.
                        Can't
                        assertions on instruction sanity fail for those?

                        I am trying to make O3 CPU model for ARM
                        working. In many cases the
                        first
                        instruction is a branch followed by a
                        interrupt vector table. I was
                        wondering if such cases exist for other CPU
                        models and if it is, handled
                        how.

                        Thanks,

                        Min



                    _______________________________________________
                    m5-dev mailing list
                    m5-dev@m5sim.org <mailto:m5-dev@m5sim.org>
                    http://m5sim.org/mailman/listinfo/m5-dev




            _______________________________________________
            m5-dev mailing list
            m5-dev@m5sim.org <mailto:m5-dev@m5sim.org>
            http://m5sim.org/mailman/listinfo/m5-dev




    _______________________________________________
    m5-dev mailing list
    m5-dev@m5sim.org <mailto:m5-dev@m5sim.org>
    http://m5sim.org/mailman/listinfo/m5-dev


------------------------------------------------------------------------

_______________________________________________
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev





_______________________________________________
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev

Reply via email to