Re: [gem5-dev] Bug in x86 stack instructions

Jason Lowe-Power Thu, 26 Jan 2017 09:07:49 -0800

I think I get it now. x86 is complicated!

Here's a first stab at the patch: http://reviews.gem5.org/r/3793/


Cheers,
Jason

On Thu, Jan 26, 2017 at 5:23 AM Gabe Black <gabebl...@google.com> wrote:

> As far as explicitly vs. implicitly using the stack segment, I wasn't able
> to find a very conclusive answer while looking through the manuals. This
> would be something to check by running a little test program on an actual
> computer, although I suspect implicit vs. explicit does matter.
>
> One interesting thing I found while looking for that which might be worth
> looking into is that instructions which reference rsp or rbp also
> implicitly use the stack segment. I remember knowing that, but I don't
> remember whether or not gem5 handles that properly.
>
> Unfortunately, we can't really keep track of the effective address size in
> the instruction, just whether we should be using the default size or the
> alternative size. This is because whether or not we're using the default is
> fixed based on the instruction encoding, while how that maps to the actual
> address size is based on that and also segmentation state. The decode cache
> doesn't (as far as I remember) keep track of that state, so it would
> potentially return an instruction which was set up based on now faulty
> assumptions.
>
> As far as whether or not AddrSizeFlagBit can be removed entirely,
> unfortunately it can't. The problem ultimately stems from the fact that
> gem5 assumes that virtual to physical mappings don't change the page
> alignment of a memory address. That means that the address that's presented
> to the CPU as a virtual address needs to have its potentially non-page
> aligned segment base value added into it already, and it has to be a full
> width address. For instance, if the data segment for a byte load has a base
> of 0x3 and an offset of 0xff8, the linear address (post segmentation
> address) you're accessing is going to be 0x1001. As far as the CPU is
> concerned the virtual address and the linear address look like they're from
> different cache lines and different pages.
>
> When presented to the TLB, the address than has to have the segment base
> removed from it again so that the original offset can be checked against
> the segment limit values. This gets tricky when the segment is expand down,
> since in that case the limit value is actually an effective lower bound,
> and the upper bound is either 0xFFFF or 0xFFFFFFFF depending on the size of
> the "d" bit in the appropriate segment descriptor. Once you do the
> subtraction, you need to know how to truncate the result you get, or sign
> extension, carry bits, etc., cause a problem and, if I remember correctly,
> cause spurious GP faults.
>
> This could all be simplified if the TLB applied the segment bases, but then
> I think that would likely complicate the CPU models significantly.
>
> Gabe
>
> On Wed, Jan 25, 2017 at 1:28 PM, Jason Lowe-Power <ja...@lowepower.com>
> wrote:
>
> > I'm leaning towards deleting the AddrSizeFlagBit. Gabe, can you shed some
> > light on the purpose of this flag? It's only ever used for masking
> > addresses in the TLB:
> > http://grok.gem5.org/xref/gem5/src/arch/x86/tlb.cc#318
> > http://grok.gem5.org/xref/gem5/src/arch/x86/tlb.cc#330
> >
> > From what I can understand, it should be unnecessary to mask the address
> > here. In the instruction implementation, the address is already masked if
> > it is a 32-bit mode instruction:
> > http://grok.gem5.org/xref/gem5/src/arch/x86/isa/microops/ldstop.isa#433
> >
> > It seems straightforward to add addressSize=ssz to the st instructions
> > where necessary. I'm confused as to why the TLB ever needs to know what
> the
> > address mode of the instruction is. Why/how would the TLB ever receive a
> > request with an address > 4GB for a 32-bit mode instruction?
> >
> > Thanks again for the help!
> >
> > Cheers,
> > Jason
> >
> > On Wed, Jan 25, 2017 at 10:37 AM Steve Reinhardt <ste...@gmail.com>
> wrote:
> >
> > > Thanks, Gabe! That's all really helpful. I appreciate you taking the
> time
> > > to respond.
> > >
> > > One concern I have is that the portion of the manual Jason quoted says
> > the
> > > rule holds "For instructions that
> > > *implicitly* access the stack segment (SS)" (emphasis mine)... would
> you
> > > say then that instructions that *explicitly* choose SS as the segment
> > > (e.g., with a segment override prefix) would not follow this rule? If
> so,
> > > then it may not be correct to do things inside the instruction based
> > solely
> > > on the fact that segment==SS. Or maybe that's a case that just can't
> > occur
> > > so we don't have to worry about it?
> > >
> > > Also, would there be any harm in setting the AddrSizeFlagBit based on
> > > whether addressSize==32 rather than looking at the legacy.addr bit? If
> > not,
> > > this seems like a simpler solution to me.
> > >
> > > Thanks again!
> > >
> > > Steve
> > >
> > >
> > > On Wed, Jan 25, 2017 at 12:41 AM, Gabe Black <gabebl...@google.com>
> > wrote:
> > >
> > > > Forgive me if I'm repeating a bit of what you said, Steve, but I
> > looked a
> > > > little more and there's more to fix.
> > > >
> > > > Also, if you look at x86/isa/microops/ldstop.isa, you'll find the
> > python
> > > > classes which receive the parameters of the load and store microops
> and
> > > > translate them into C++ instantiations of the underlying microop
> > classes.
> > > > In their __init__ methods, you'll see at the end where the
> > > AddrSizeFlagBit
> > > > is set based on whether or not the machInst.legacy.addr bit is set,
> > > > basically whether the address size prefix had been found when the
> > > > instruction was decoded. For the two classes involved (LdStOp and
> > > > BigLdStOp), I think you'll want to add a check which doesn't set
> > > > AddrSizeFlagBit if the segment is the stack segment.
> > > >
> > > > Additionally, you'll want to make sure the TLB knows to use the stack
> > > > segment size (as opposed to either the default or alternate address
> > > sizes)
> > > > when manipulating the segment offset. You could add a new ISA
> specific
> > > flag
> > > > for x86 memory requests (there seems to be room for exactly one more
> in
> > > > arch/x86/ldstflags.hh) and set it if the segment is the stack in
> those
> > ld
> > > > and st ops, similar to how AddrSizeFlagBit is being set. Then in the
> > TLB,
> > > > specifically where it's calculating logSize, you'll want to make it
> > > > recognize your new flag and use the stack size from the m5Reg (called
> > > just
> > > > m5Reg.stack, I think) instead of either of the address size values.
> > > >
> > > > This is a little gross because it means there's some calculation
> going
> > on
> > > > when address translation is being done (frequently) when it could
> have
> > > been
> > > > done earlier during decode (less frequently). One reason I did things
> > > this
> > > > way (all speculation, I've long since forgot) is that it makes the
> > > decoded
> > > > instructions hold less state. The same bytes may not always refer to
> > the
> > > > same sizes depending on the segmentation state in the machine, so if
> > > those
> > > > particular bytes are being decoded, the gem5 decode cache needs to
> > either
> > > > consider that segmentation state when deciding what instruction to
> spit
> > > > out. If it didn't, it might return an instruction which statically
> says
> > > to
> > > > use 32 bit addresses (for instance), when now segmentation state says
> > it
> > > > the same bytes should mean it should use 16 bit addresses. Instead,
> it
> > > canh
> > > > be factored in dynamically using external state (the m5Reg) to an
> > > > instruction that says which version of the sizes to use but doesn't
> > care
> > > > what values those actually map to at that moment.
> > > >
> > > > Gabe
> > > >
> > > > On Wed, Jan 25, 2017 at 12:03 AM, Gabe Black <gabebl...@google.com>
> > > wrote:
> > > >
> > > > > Hi. I think you guys are generally interpreting this code
> correctly,
> > > but
> > > > > I'll throw in my two cents. If you look at the pseudo code here:
> > > > >
> > > > > http://www.felixcloutier.com/x86/CALL.html
> > > > >
> > > > > then it appears that the size of the return address you push onto
> the
> > > > > stack should be based on the operand size, ie. the data size. That
> > > would
> > > > > mean that the displacement for the store should be -env.dataSize
> like
> > > it
> > > > > is. By the same logic, the subtraction below it should also be
> > > > env.dataSize
> > > > > (symbolically dsz) and not ssz.
> > > > >
> > > > > You are also correct, I think, that since the instruction
> references
> > > the
> > > > > stack segment, it's address size should be set to the stack size
> when
> > > > doing
> > > > > memory operations. Adding addressSize=ssz to the st microop should
> do
> > > > that.
> > > > >
> > > > > According to that pseudo code, the size of the target IP is also
> the
> > > > > operand size, but that's the size the wrip microop will use by
> > default
> > > > and
> > > > > so should be fine as written.
> > > > >
> > > > > I think if you make those two changes (change ssz to dsz in the
> sub,
> > > add
> > > > > addressSize=ssz to the st), then this macroop definition will be
> > > correct.
> > > > > It would be worthwhile to check the other definitions of CALL and
> > make
> > > > sure
> > > > > they aren't similarly broken.
> > > > >
> > > > > Gabe
> > > > >
> > > > > On Tue, Jan 24, 2017 at 3:28 PM, Jason Lowe-Power <
> > ja...@lowepower.com
> > > >
> > > > > wrote:
> > > > >
> > > > >> Thanks for helping me work this out.
> > > > >>
> > > > >> I got the binary working by completely removing the
> AddrSizeFlagBit.
> > > The
> > > > >> only place this bit is used is in the LdStOp (it's set to true if
> > > > >> legacy.addr is true) and then, in the TLB, if the AddrSizeFlagBit
> is
> > > set
> > > > >> it
> > > > >> truncates the address to 32-bits removing the upper-order bits.
> > > > >>
> > > > >> By removing this flag bit, and adding "addressSize=ssz" to the st
> > > > >> micro-op,
> > > > >> the binary is now working.
> > > > >>
> > > > >> My next question: How can I have any confidence that this doesn't
> > > break
> > > > >> other things? Especially when we're talking about 32-bit
> > compatibility
> > > > >> mode. Other than the hello32 binary, are there other things I
> should
> > > > run?
> > > > >>
> > > > >> My intuition says this is an OK change. The addr field in the
> > Request
> > > > >> should never have the upper-order 32 bits set if the instruction
> is
> > a
> > > > >> 32-bit-mode instruction. The instruction implementation already
> > takes
> > > > care
> > > > >> of this, as I found with this bug.
> > > > >>
> > > > >> Other thoughts? Does anyone know if it is definitely required to
> use
> > > > >> AddrSizeFlagBit to truncate the address in the TLB?
> > > > >>
> > > > >> If not, I'll post a patch for this tomorrow.
> > > > >>
> > > > >> Thanks,
> > > > >> Jason
> > > > >>
> > > > >> On Tue, Jan 24, 2017 at 12:19 PM Steve Reinhardt <
> ste...@gmail.com>
> > > > >> wrote:
> > > > >>
> > > > >> > Hmm, seems like there's a little bit of an inconsistency in that
> > the
> > > > >> > request is using the legacy.addr bit (which is set by the
> decoder
> > > when
> > > > >> it
> > > > >> > sees the address size override prefix [1]) directly, while the
> > > > >> legacy.addr
> > > > >> > bit is also used to calculate the addressSize value [2] but can
> be
> > > > >> > overridden (as we are doing). So if addressSize is overridden
> then
> > > > >> > AddrSizeFlagBit should no longer be set based on legacy.addr.
> > > > >> >
> > > > >> > Or another way of looking at it is that the same process of
> > checking
> > > > >> > legacy.addr to override the address size is done in two places,
> > once
> > > > in
> > > > >> the
> > > > >> > decoder [2] and again in the TLB [3] and it's not clear how to
> > > > suppress
> > > > >> the
> > > > >> > latter one.
> > > > >> >
> > > > >> > I suppose one gnarly way of doing it would be to go into the
> > > micro-op
> > > > >> > definition somewhere and clear the AddrSizeFlagBit out of
> memFlags
> > > if
> > > > >> > addressSize != env.addressSize (i.e., the address size was
> > > > overridden).
> > > > >> > There's probably a cleaner way, but that's the easiest path I
> can
> > > > think
> > > > >> of
> > > > >> > (if it even works).
> > > > >> >
> > > > >> > [1] http://grok.gem5.org/xref/gem5/src/arch/x86/decoder.cc#195
> > > > >> > [2] http://grok.gem5.org/xref/gem5/src/arch/x86/decoder.cc#390
> > > > >> > [3]
> http://grok.gem5.org/source/xref/gem5/src/arch/x86/tlb.cc#315
> > > > >> >
> > > > >> >
> > > > >> > On Tue, Jan 24, 2017 at 8:37 AM, Jason Lowe-Power <
> > > > ja...@lowepower.com>
> > > > >> > wrote:
> > > > >> >
> > > > >> > > Hi Steve,
> > > > >> > >
> > > > >> > > That was super helpful. I'm now a step closer to solving this!
> > > > >> > >
> > > > >> > > Your suggestion of =ssz, lead me to search for the uses of
> that
> > in
> > > > >> x86,
> > > > >> > and
> > > > >> > > it turns out that almost all of other stack instructions have
> > > > >> > dataSize=ssz.
> > > > >> > > So, I added both dataSize=ssz and addressSize=ssz to the call
> > > > >> > instruction,
> > > > >> > > though I think only the addressSize is actually needed, but
> I'm
> > > not
> > > > >> > > certain.
> > > > >> > >
> > > > >> > > Now, the address is passed to the Request object correctly,
> but
> > > the
> > > > >> > program
> > > > >> > > still fails. The problem now is that the request object is
> > getting
> > > > >> > > the AddrSizeFlagBit set to true, because machInst.legacy.addr
> is
> > > > true.
> > > > >> > > Thus, the TLB only uses the lower 32 bits of the 64-bit
> address.
> > > > >> > >
> > > > >> > > Any ideas on how to change the instruction's memFlags from the
> > > > >> macro-op
> > > > >> > > definition? They are set on
> > > > >> > > http://repo.gem5.org/gem5/file/cd7f3a1dbf55/src/arch/
> > > > >> > > x86/isa/microops/ldstop.isa#l334
> > > > >> > > .
> > > > >> > >
> > > > >> > > It would be nice if we could fix this in the decoder. I think
> > the
> > > > >> logic
> > > > >> > > should be "if the address prefix is set and this is an
> implicit
> > > > stack
> > > > >> > > operation, ignore the address prefix". However, I'm not sure
> how
> > > to
> > > > >> tell
> > > > >> > if
> > > > >> > > the instruction is "an implicit stack operation" from the
> > decoder.
> > > > >> > >
> > > > >> > > Thanks,
> > > > >> > > Jason
> > > > >> > >
> > > > >> > > On Tue, Jan 24, 2017 at 9:05 AM Steve Reinhardt <
> > ste...@gmail.com
> > > >
> > > > >> > wrote:
> > > > >> > >
> > > > >> > > My recollection of how all this works is that the arguments to
> > the
> > > > >> 'st'
> > > > >> > > micro-op get turned into arguments to a call to the StoreOp
> > > > >> constructor:
> > > > >> > >
> > > > >> > >    597 <http://repo.gem5.o, and that address doesn't
> > > > >> > > existrg/gem5/file/cd7f3a1dbf55/src/arch/x86/isa/microops/
> > > > >> ldstop.isa#l597
> > > > >> > > <http://repo.gem5.org/gem5/file/cd7f3a1dbf55/src/arch/
> > > > >> > > x86/isa/microops/ldstop.isa#l597>
> > > > >> > > >
> > > > >> > >         class StoreOp(LdStOp):
> > > > >> > >    598 <
> > > > >> > > http://repo.gem5.org/gem5/file/cd7f3a1dbf55/src/arch/
> > > > >> > > x86/isa/microops/ldstop.isa#l598
> > > > >> > > >
> > > > >> > >             def __init__(self, data, segment, addr, disp = 0,
> > > > >> > >    599 <
> > > > >> > > http://repo.gem5.org/gem5/file/cd7f3a1dbf55/src/arch/
> > > > >> > > x86/isa/microops/ldstop.isa#l599
> > > > >> > > >
> > > > >> > >                     dataSize="env.dataSize",
> > > > >> > >    600 <
> > > > >> > > http://repo.gem5.org/gem5/file/cd7f3a1dbf55/src/arch/
> > > > >> > > x86/isa/microops/ldstop.isa#l600
> > > > >> > > >
> > > > >> > >                     addressSize="env.addressSize",
> > > > >> > >    601 <
> > > > >> > > http://repo.gem5.org/gem5/file/cd7f3a1dbf55/src/arch/
> > > > >> > > x86/isa/microops/ldstop.isa#l601
> > > > >> > > >
> > > > >> > >                     atCPL0=False, nonSpec=False):
> > > > >> > >
> > > > >> > > so the "-env.dataSize" you see is actually the displacement
> for
> > > the
> > > > >> > store,
> > > > >> > > not the dataSize or addressSize.  I think the problem is that
> > the
> > > > >> > > addressSize is using the env,addressSize that's calculated the
> > > > >> "normal"
> > > > >> > > way, i.e., including the effects of the override prefix.
> > > > >> > >
> > > > >> > > Poking around some more, there's an 'ssz' symbol that maps to
> > > > >> > env.stackSize
> > > > >> > > [1] which gets used a lot in stack operations; for example,
> > right
> > > > >> after
> > > > >> > the
> > > > >> > > store in CALL_NEAR_I, 'ssz' is subtracted off of the stack
> > > pointer.
> > > > >> The
> > > > >> > > calculation of env.stackSize seems to follow the rule you
> > > mentioned
> > > > >> about
> > > > >> > > being fixed at 64 bits in 64-bit mode [2, 3].
> > > > >> > >
> > > > >> > > So it might be sufficient to add ', addressSize=ssz' to the
> end
> > of
> > > > the
> > > > >> > 'st'
> > > > >> > > micro-op. Oddly though I don't see addressSize being set on
> any
> > > > other
> > > > >> > stack
> > > > >> > > ops (just dataSize), so I wonder if this is a problem with
> other
> > > > stack
> > > > >> > > instructions or not. If so, it might be useful to have an
> > override
> > > > >> > > hardwired in at some lower level to check if the segment is ss
> > and
> > > > >> force
> > > > >> > > the address size to be stackSize (rather than adding this
> extra
> > > > >> parameter
> > > > >> > > in a dozen different places). I wouldn't know where best to do
> > > that
> > > > >> > though,
> > > > >> > > so the manual override seems best for now.
> > > > >> > >
> > > > >> > > Steve
> > > > >> > >
> > > > >> > > [1] http://grok.gem5.org/xref/gem5/src/arch/x86/isa/
> > > > microasm.isa#107
> > > > >> > > [2] http://grok.gem5.org/xref/gem5/src/arch/x86/isa.cc#91
> > > > >> > > [3]
> http://grok.gem5.org/xref/gem5/src/arch/x86/decoder.cc#400
> > > > >> > >
> > > > >> > >
> > > > >> > >
> > > > >> > >
> > > > >> > > On Mon, Jan 23, 2017 at 4:04 PM, Jason Lowe-Power <
> > > > >> ja...@lowepower.com>
> > > > >> > > wrote:
> > > > >> > >
> > > > >> > > > To those of you with more x86 ISA implementation knowledge
> > than
> > > I
> > > > >> have:
> > > > >> > > >
> > > > >> > > > I've been working through a bug one of our users found
> (thanks
> > > > >> > > Sanchayan!).
> > > > >> > > > It looks like current versions of ld use the 0x67
> instruction
> > > > prefix
> > > > >> > > > (address size override) as an optimization instead of using
> a
> > > nop.
> > > > >> See
> > > > >> > > >
> https://www.sourceware.org/ml/binutils/2016-05/msg00323.html.
> > > > >> > > >
> > > > >> > > > This causes the call instruction to be decoded with with the
> > > > >> "address
> > > > >> > > size
> > > > >> > > > override prefix", which is correct, in a sense. From what I
> > can
> > > > >> tell,
> > > > >> > > this
> > > > >> > > > is passed to the call instruction via "-env.dataSize" (see
> > call
> > > > >> > > instruction
> > > > >> > > > implementation below).
> > > > >> > > >
> > > > >> > > > def macroop CALL_NEAR_I
> > > > >> > > > {
> > > > >> > > >     # Make the default data size of calls 64 bits in 64 bit
> > mode
> > > > >> > > >     .adjust_env oszIn64Override
> > > > >> > > >     .function_call
> > > > >> > > >
> > > > >> > > >     limm t1, imm
> > > > >> > > >     rdip t7
> > > > >> > > >     # Check target of call
> > > > >> > > >     st t7, ss, [0, t0, rsp], "-env.dataSize"
> > > > >> > > >     subi rsp, rsp, ssz
> > > > >> > > >     wrip t7, t1
> > > > >> > > > };
> > > > >> > > >
> > > > >> > > > Now, the bug is, according to the x86 manual, "For
> > instructions
> > > > that
> > > > >> > > > implicitly access the stack segment (SS), the address size
> for
> > > > stack
> > > > >> > > > accesses is determined by the D (default) bit in the
> > > stack-segment
> > > > >> > > > descriptor. In 64-bit mode, the D bit is ignored, and all
> > stack
> > > > >> > > references
> > > > >> > > > have a 64-bit address size." See
> > > > >> > > > https://support.amd.com/TechDocs/24594.pdf page
> > > > >> > > > 9.
> > > > >> > > >
> > > > >> > > > Thoughts on how to fix this?
> > > > >> > > >
> > > > >> > > > Thanks,
> > > > >> > > > Jason
> > > > >> > > > _______________________________________________
> > > > >> > > > gem5-dev mailing list
> > > > >> > > > gem5-dev@gem5.org
> > > > >> > > > http://m5sim.org/mailman/listinfo/gem5-dev
> > > > >> > > >
> > > > >> > > _______________________________________________
> > > > >> > > gem5-dev mailing list
> > > > >> > > gem5-dev@gem5.org
> > > > >> > > http://m5sim.org/mailman/listinfo/gem5-dev
> > > > >> > >
> > > > >> > > --
> > > > >> > >
> > > > >> > > Jason
> > > > >> > > _______________________________________________
> > > > >> > > gem5-dev mailing list
> > > > >> > > gem5-dev@gem5.org
> > > > >> > > http://m5sim.org/mailman/listinfo/gem5-dev
> > > > >> > >
> > > > >> > _______________________________________________
> > > > >> > gem5-dev mailing list
> > > > >> > gem5-dev@gem5.org
> > > > >> > http://m5sim.org/mailman/listinfo/gem5-dev
> > > > >> >
> > > > >> _______________________________________________
> > > > >> gem5-dev mailing list
> > > > >> gem5-dev@gem5.org
> > > > >> http://m5sim.org/mailman/listinfo/gem5-dev
> > > > >>
> > > > >
> > > > >
> > > > _______________________________________________
> > > > gem5-dev mailing list
> > > > gem5-dev@gem5.org
> > > > http://m5sim.org/mailman/listinfo/gem5-dev
> > > >
> > > _______________________________________________
> > > gem5-dev mailing list
> > > gem5-dev@gem5.org
> > > http://m5sim.org/mailman/listinfo/gem5-dev
> > >
> > _______________________________________________
> > gem5-dev mailing list
> > gem5-dev@gem5.org
> > http://m5sim.org/mailman/listinfo/gem5-dev
> >
> _______________________________________________
> gem5-dev mailing list
> gem5-dev@gem5.org
> http://m5sim.org/mailman/listinfo/gem5-dev
>
_______________________________________________
gem5-dev mailing list
gem5-dev@gem5.org
http://m5sim.org/mailman/listinfo/gem5-dev

Re: [gem5-dev] Bug in x86 stack instructions

Reply via email to