Re: [gem5-dev] Bug in x86 stack instructions

Gabe Black Wed, 25 Jan 2017 17:55:32 -0800

As far as explicitly vs. implicitly using the stack segment, I wasn't able
to find a very conclusive answer while looking through the manuals. This
would be something to check by running a little test program on an actual
computer, although I suspect implicit vs. explicit does matter.


One interesting thing I found while looking for that which might be worth
looking into is that instructions which reference rsp or rbp also
implicitly use the stack segment. I remember knowing that, but I don't
remember whether or not gem5 handles that properly.

Unfortunately, we can't really keep track of the effective address size in
the instruction, just whether we should be using the default size or the
alternative size. This is because whether or not we're using the default is
fixed based on the instruction encoding, while how that maps to the actual
address size is based on that and also segmentation state. The decode cache
doesn't (as far as I remember) keep track of that state, so it would
potentially return an instruction which was set up based on now faulty
assumptions.

As far as whether or not AddrSizeFlagBit can be removed entirely,
unfortunately it can't. The problem ultimately stems from the fact that
gem5 assumes that virtual to physical mappings don't change the page
alignment of a memory address. That means that the address that's presented
to the CPU as a virtual address needs to have its potentially non-page
aligned segment base value added into it already, and it has to be a full
width address. For instance, if the data segment for a byte load has a base
of 0x3 and an offset of 0xff8, the linear address (post segmentation
address) you're accessing is going to be 0x1001. As far as the CPU is
concerned the virtual address and the linear address look like they're from
different cache lines and different pages.

When presented to the TLB, the address than has to have the segment base
removed from it again so that the original offset can be checked against
the segment limit values. This gets tricky when the segment is expand down,
since in that case the limit value is actually an effective lower bound,
and the upper bound is either 0xFFFF or 0xFFFFFFFF depending on the size of
the "d" bit in the appropriate segment descriptor. Once you do the
subtraction, you need to know how to truncate the result you get, or sign
extension, carry bits, etc., cause a problem and, if I remember correctly,
cause spurious GP faults.

This could all be simplified if the TLB applied the segment bases, but then
I think that would likely complicate the CPU models significantly.

Gabe

On Wed, Jan 25, 2017 at 1:28 PM, Jason Lowe-Power <ja...@lowepower.com>
wrote:

> I'm leaning towards deleting the AddrSizeFlagBit. Gabe, can you shed some
> light on the purpose of this flag? It's only ever used for masking
> addresses in the TLB:
> http://grok.gem5.org/xref/gem5/src/arch/x86/tlb.cc#318
> http://grok.gem5.org/xref/gem5/src/arch/x86/tlb.cc#330
>
> From what I can understand, it should be unnecessary to mask the address
> here. In the instruction implementation, the address is already masked if
> it is a 32-bit mode instruction:
> http://grok.gem5.org/xref/gem5/src/arch/x86/isa/microops/ldstop.isa#433
>
> It seems straightforward to add addressSize=ssz to the st instructions
> where necessary. I'm confused as to why the TLB ever needs to know what the
> address mode of the instruction is. Why/how would the TLB ever receive a
> request with an address > 4GB for a 32-bit mode instruction?
>
> Thanks again for the help!
>
> Cheers,
> Jason
>
> On Wed, Jan 25, 2017 at 10:37 AM Steve Reinhardt <ste...@gmail.com> wrote:
>
> > Thanks, Gabe! That's all really helpful. I appreciate you taking the time
> > to respond.
> >
> > One concern I have is that the portion of the manual Jason quoted says
> the
> > rule holds "For instructions that
> > *implicitly* access the stack segment (SS)" (emphasis mine)... would you
> > say then that instructions that *explicitly* choose SS as the segment
> > (e.g., with a segment override prefix) would not follow this rule? If so,
> > then it may not be correct to do things inside the instruction based
> solely
> > on the fact that segment==SS. Or maybe that's a case that just can't
> occur
> > so we don't have to worry about it?
> >
> > Also, would there be any harm in setting the AddrSizeFlagBit based on
> > whether addressSize==32 rather than looking at the legacy.addr bit? If
> not,
> > this seems like a simpler solution to me.
> >
> > Thanks again!
> >
> > Steve
> >
> >
> > On Wed, Jan 25, 2017 at 12:41 AM, Gabe Black <gabebl...@google.com>
> wrote:
> >
> > > Forgive me if I'm repeating a bit of what you said, Steve, but I
> looked a
> > > little more and there's more to fix.
> > >
> > > Also, if you look at x86/isa/microops/ldstop.isa, you'll find the
> python
> > > classes which receive the parameters of the load and store microops and
> > > translate them into C++ instantiations of the underlying microop
> classes.
> > > In their __init__ methods, you'll see at the end where the
> > AddrSizeFlagBit
> > > is set based on whether or not the machInst.legacy.addr bit is set,
> > > basically whether the address size prefix had been found when the
> > > instruction was decoded. For the two classes involved (LdStOp and
> > > BigLdStOp), I think you'll want to add a check which doesn't set
> > > AddrSizeFlagBit if the segment is the stack segment.
> > >
> > > Additionally, you'll want to make sure the TLB knows to use the stack
> > > segment size (as opposed to either the default or alternate address
> > sizes)
> > > when manipulating the segment offset. You could add a new ISA specific
> > flag
> > > for x86 memory requests (there seems to be room for exactly one more in
> > > arch/x86/ldstflags.hh) and set it if the segment is the stack in those
> ld
> > > and st ops, similar to how AddrSizeFlagBit is being set. Then in the
> TLB,
> > > specifically where it's calculating logSize, you'll want to make it
> > > recognize your new flag and use the stack size from the m5Reg (called
> > just
> > > m5Reg.stack, I think) instead of either of the address size values.
> > >
> > > This is a little gross because it means there's some calculation going
> on
> > > when address translation is being done (frequently) when it could have
> > been
> > > done earlier during decode (less frequently). One reason I did things
> > this
> > > way (all speculation, I've long since forgot) is that it makes the
> > decoded
> > > instructions hold less state. The same bytes may not always refer to
> the
> > > same sizes depending on the segmentation state in the machine, so if
> > those
> > > particular bytes are being decoded, the gem5 decode cache needs to
> either
> > > consider that segmentation state when deciding what instruction to spit
> > > out. If it didn't, it might return an instruction which statically says
> > to
> > > use 32 bit addresses (for instance), when now segmentation state says
> it
> > > the same bytes should mean it should use 16 bit addresses. Instead, it
> > canh
> > > be factored in dynamically using external state (the m5Reg) to an
> > > instruction that says which version of the sizes to use but doesn't
> care
> > > what values those actually map to at that moment.
> > >
> > > Gabe
> > >
> > > On Wed, Jan 25, 2017 at 12:03 AM, Gabe Black <gabebl...@google.com>
> > wrote:
> > >
> > > > Hi. I think you guys are generally interpreting this code correctly,
> > but
> > > > I'll throw in my two cents. If you look at the pseudo code here:
> > > >
> > > > http://www.felixcloutier.com/x86/CALL.html
> > > >
> > > > then it appears that the size of the return address you push onto the
> > > > stack should be based on the operand size, ie. the data size. That
> > would
> > > > mean that the displacement for the store should be -env.dataSize like
> > it
> > > > is. By the same logic, the subtraction below it should also be
> > > env.dataSize
> > > > (symbolically dsz) and not ssz.
> > > >
> > > > You are also correct, I think, that since the instruction references
> > the
> > > > stack segment, it's address size should be set to the stack size when
> > > doing
> > > > memory operations. Adding addressSize=ssz to the st microop should do
> > > that.
> > > >
> > > > According to that pseudo code, the size of the target IP is also the
> > > > operand size, but that's the size the wrip microop will use by
> default
> > > and
> > > > so should be fine as written.
> > > >
> > > > I think if you make those two changes (change ssz to dsz in the sub,
> > add
> > > > addressSize=ssz to the st), then this macroop definition will be
> > correct.
> > > > It would be worthwhile to check the other definitions of CALL and
> make
> > > sure
> > > > they aren't similarly broken.
> > > >
> > > > Gabe
> > > >
> > > > On Tue, Jan 24, 2017 at 3:28 PM, Jason Lowe-Power <
> ja...@lowepower.com
> > >
> > > > wrote:
> > > >
> > > >> Thanks for helping me work this out.
> > > >>
> > > >> I got the binary working by completely removing the AddrSizeFlagBit.
> > The
> > > >> only place this bit is used is in the LdStOp (it's set to true if
> > > >> legacy.addr is true) and then, in the TLB, if the AddrSizeFlagBit is
> > set
> > > >> it
> > > >> truncates the address to 32-bits removing the upper-order bits.
> > > >>
> > > >> By removing this flag bit, and adding "addressSize=ssz" to the st
> > > >> micro-op,
> > > >> the binary is now working.
> > > >>
> > > >> My next question: How can I have any confidence that this doesn't
> > break
> > > >> other things? Especially when we're talking about 32-bit
> compatibility
> > > >> mode. Other than the hello32 binary, are there other things I should
> > > run?
> > > >>
> > > >> My intuition says this is an OK change. The addr field in the
> Request
> > > >> should never have the upper-order 32 bits set if the instruction is
> a
> > > >> 32-bit-mode instruction. The instruction implementation already
> takes
> > > care
> > > >> of this, as I found with this bug.
> > > >>
> > > >> Other thoughts? Does anyone know if it is definitely required to use
> > > >> AddrSizeFlagBit to truncate the address in the TLB?
> > > >>
> > > >> If not, I'll post a patch for this tomorrow.
> > > >>
> > > >> Thanks,
> > > >> Jason
> > > >>
> > > >> On Tue, Jan 24, 2017 at 12:19 PM Steve Reinhardt <ste...@gmail.com>
> > > >> wrote:
> > > >>
> > > >> > Hmm, seems like there's a little bit of an inconsistency in that
> the
> > > >> > request is using the legacy.addr bit (which is set by the decoder
> > when
> > > >> it
> > > >> > sees the address size override prefix [1]) directly, while the
> > > >> legacy.addr
> > > >> > bit is also used to calculate the addressSize value [2] but can be
> > > >> > overridden (as we are doing). So if addressSize is overridden then
> > > >> > AddrSizeFlagBit should no longer be set based on legacy.addr.
> > > >> >
> > > >> > Or another way of looking at it is that the same process of
> checking
> > > >> > legacy.addr to override the address size is done in two places,
> once
> > > in
> > > >> the
> > > >> > decoder [2] and again in the TLB [3] and it's not clear how to
> > > suppress
> > > >> the
> > > >> > latter one.
> > > >> >
> > > >> > I suppose one gnarly way of doing it would be to go into the
> > micro-op
> > > >> > definition somewhere and clear the AddrSizeFlagBit out of memFlags
> > if
> > > >> > addressSize != env.addressSize (i.e., the address size was
> > > overridden).
> > > >> > There's probably a cleaner way, but that's the easiest path I can
> > > think
> > > >> of
> > > >> > (if it even works).
> > > >> >
> > > >> > [1] http://grok.gem5.org/xref/gem5/src/arch/x86/decoder.cc#195
> > > >> > [2] http://grok.gem5.org/xref/gem5/src/arch/x86/decoder.cc#390
> > > >> > [3] http://grok.gem5.org/source/xref/gem5/src/arch/x86/tlb.cc#315
> > > >> >
> > > >> >
> > > >> > On Tue, Jan 24, 2017 at 8:37 AM, Jason Lowe-Power <
> > > ja...@lowepower.com>
> > > >> > wrote:
> > > >> >
> > > >> > > Hi Steve,
> > > >> > >
> > > >> > > That was super helpful. I'm now a step closer to solving this!
> > > >> > >
> > > >> > > Your suggestion of =ssz, lead me to search for the uses of that
> in
> > > >> x86,
> > > >> > and
> > > >> > > it turns out that almost all of other stack instructions have
> > > >> > dataSize=ssz.
> > > >> > > So, I added both dataSize=ssz and addressSize=ssz to the call
> > > >> > instruction,
> > > >> > > though I think only the addressSize is actually needed, but I'm
> > not
> > > >> > > certain.
> > > >> > >
> > > >> > > Now, the address is passed to the Request object correctly, but
> > the
> > > >> > program
> > > >> > > still fails. The problem now is that the request object is
> getting
> > > >> > > the AddrSizeFlagBit set to true, because machInst.legacy.addr is
> > > true.
> > > >> > > Thus, the TLB only uses the lower 32 bits of the 64-bit address.
> > > >> > >
> > > >> > > Any ideas on how to change the instruction's memFlags from the
> > > >> macro-op
> > > >> > > definition? They are set on
> > > >> > > http://repo.gem5.org/gem5/file/cd7f3a1dbf55/src/arch/
> > > >> > > x86/isa/microops/ldstop.isa#l334
> > > >> > > .
> > > >> > >
> > > >> > > It would be nice if we could fix this in the decoder. I think
> the
> > > >> logic
> > > >> > > should be "if the address prefix is set and this is an implicit
> > > stack
> > > >> > > operation, ignore the address prefix". However, I'm not sure how
> > to
> > > >> tell
> > > >> > if
> > > >> > > the instruction is "an implicit stack operation" from the
> decoder.
> > > >> > >
> > > >> > > Thanks,
> > > >> > > Jason
> > > >> > >
> > > >> > > On Tue, Jan 24, 2017 at 9:05 AM Steve Reinhardt <
> ste...@gmail.com
> > >
> > > >> > wrote:
> > > >> > >
> > > >> > > My recollection of how all this works is that the arguments to
> the
> > > >> 'st'
> > > >> > > micro-op get turned into arguments to a call to the StoreOp
> > > >> constructor:
> > > >> > >
> > > >> > >    597 <http://repo.gem5.o, and that address doesn't
> > > >> > > existrg/gem5/file/cd7f3a1dbf55/src/arch/x86/isa/microops/
> > > >> ldstop.isa#l597
> > > >> > > <http://repo.gem5.org/gem5/file/cd7f3a1dbf55/src/arch/
> > > >> > > x86/isa/microops/ldstop.isa#l597>
> > > >> > > >
> > > >> > >         class StoreOp(LdStOp):
> > > >> > >    598 <
> > > >> > > http://repo.gem5.org/gem5/file/cd7f3a1dbf55/src/arch/
> > > >> > > x86/isa/microops/ldstop.isa#l598
> > > >> > > >
> > > >> > >             def __init__(self, data, segment, addr, disp = 0,
> > > >> > >    599 <
> > > >> > > http://repo.gem5.org/gem5/file/cd7f3a1dbf55/src/arch/
> > > >> > > x86/isa/microops/ldstop.isa#l599
> > > >> > > >
> > > >> > >                     dataSize="env.dataSize",
> > > >> > >    600 <
> > > >> > > http://repo.gem5.org/gem5/file/cd7f3a1dbf55/src/arch/
> > > >> > > x86/isa/microops/ldstop.isa#l600
> > > >> > > >
> > > >> > >                     addressSize="env.addressSize",
> > > >> > >    601 <
> > > >> > > http://repo.gem5.org/gem5/file/cd7f3a1dbf55/src/arch/
> > > >> > > x86/isa/microops/ldstop.isa#l601
> > > >> > > >
> > > >> > >                     atCPL0=False, nonSpec=False):
> > > >> > >
> > > >> > > so the "-env.dataSize" you see is actually the displacement for
> > the
> > > >> > store,
> > > >> > > not the dataSize or addressSize.  I think the problem is that
> the
> > > >> > > addressSize is using the env,addressSize that's calculated the
> > > >> "normal"
> > > >> > > way, i.e., including the effects of the override prefix.
> > > >> > >
> > > >> > > Poking around some more, there's an 'ssz' symbol that maps to
> > > >> > env.stackSize
> > > >> > > [1] which gets used a lot in stack operations; for example,
> right
> > > >> after
> > > >> > the
> > > >> > > store in CALL_NEAR_I, 'ssz' is subtracted off of the stack
> > pointer.
> > > >> The
> > > >> > > calculation of env.stackSize seems to follow the rule you
> > mentioned
> > > >> about
> > > >> > > being fixed at 64 bits in 64-bit mode [2, 3].
> > > >> > >
> > > >> > > So it might be sufficient to add ', addressSize=ssz' to the end
> of
> > > the
> > > >> > 'st'
> > > >> > > micro-op. Oddly though I don't see addressSize being set on any
> > > other
> > > >> > stack
> > > >> > > ops (just dataSize), so I wonder if this is a problem with other
> > > stack
> > > >> > > instructions or not. If so, it might be useful to have an
> override
> > > >> > > hardwired in at some lower level to check if the segment is ss
> and
> > > >> force
> > > >> > > the address size to be stackSize (rather than adding this extra
> > > >> parameter
> > > >> > > in a dozen different places). I wouldn't know where best to do
> > that
> > > >> > though,
> > > >> > > so the manual override seems best for now.
> > > >> > >
> > > >> > > Steve
> > > >> > >
> > > >> > > [1] http://grok.gem5.org/xref/gem5/src/arch/x86/isa/
> > > microasm.isa#107
> > > >> > > [2] http://grok.gem5.org/xref/gem5/src/arch/x86/isa.cc#91
> > > >> > > [3] http://grok.gem5.org/xref/gem5/src/arch/x86/decoder.cc#400
> > > >> > >
> > > >> > >
> > > >> > >
> > > >> > >
> > > >> > > On Mon, Jan 23, 2017 at 4:04 PM, Jason Lowe-Power <
> > > >> ja...@lowepower.com>
> > > >> > > wrote:
> > > >> > >
> > > >> > > > To those of you with more x86 ISA implementation knowledge
> than
> > I
> > > >> have:
> > > >> > > >
> > > >> > > > I've been working through a bug one of our users found (thanks
> > > >> > > Sanchayan!).
> > > >> > > > It looks like current versions of ld use the 0x67 instruction
> > > prefix
> > > >> > > > (address size override) as an optimization instead of using a
> > nop.
> > > >> See
> > > >> > > > https://www.sourceware.org/ml/binutils/2016-05/msg00323.html.
> > > >> > > >
> > > >> > > > This causes the call instruction to be decoded with with the
> > > >> "address
> > > >> > > size
> > > >> > > > override prefix", which is correct, in a sense. From what I
> can
> > > >> tell,
> > > >> > > this
> > > >> > > > is passed to the call instruction via "-env.dataSize" (see
> call
> > > >> > > instruction
> > > >> > > > implementation below).
> > > >> > > >
> > > >> > > > def macroop CALL_NEAR_I
> > > >> > > > {
> > > >> > > >     # Make the default data size of calls 64 bits in 64 bit
> mode
> > > >> > > >     .adjust_env oszIn64Override
> > > >> > > >     .function_call
> > > >> > > >
> > > >> > > >     limm t1, imm
> > > >> > > >     rdip t7
> > > >> > > >     # Check target of call
> > > >> > > >     st t7, ss, [0, t0, rsp], "-env.dataSize"
> > > >> > > >     subi rsp, rsp, ssz
> > > >> > > >     wrip t7, t1
> > > >> > > > };
> > > >> > > >
> > > >> > > > Now, the bug is, according to the x86 manual, "For
> instructions
> > > that
> > > >> > > > implicitly access the stack segment (SS), the address size for
> > > stack
> > > >> > > > accesses is determined by the D (default) bit in the
> > stack-segment
> > > >> > > > descriptor. In 64-bit mode, the D bit is ignored, and all
> stack
> > > >> > > references
> > > >> > > > have a 64-bit address size." See
> > > >> > > > https://support.amd.com/TechDocs/24594.pdf page
> > > >> > > > 9.
> > > >> > > >
> > > >> > > > Thoughts on how to fix this?
> > > >> > > >
> > > >> > > > Thanks,
> > > >> > > > Jason
> > > >> > > > _______________________________________________
> > > >> > > > gem5-dev mailing list
> > > >> > > > gem5-dev@gem5.org
> > > >> > > > http://m5sim.org/mailman/listinfo/gem5-dev
> > > >> > > >
> > > >> > > _______________________________________________
> > > >> > > gem5-dev mailing list
> > > >> > > gem5-dev@gem5.org
> > > >> > > http://m5sim.org/mailman/listinfo/gem5-dev
> > > >> > >
> > > >> > > --
> > > >> > >
> > > >> > > Jason
> > > >> > > _______________________________________________
> > > >> > > gem5-dev mailing list
> > > >> > > gem5-dev@gem5.org
> > > >> > > http://m5sim.org/mailman/listinfo/gem5-dev
> > > >> > >
> > > >> > _______________________________________________
> > > >> > gem5-dev mailing list
> > > >> > gem5-dev@gem5.org
> > > >> > http://m5sim.org/mailman/listinfo/gem5-dev
> > > >> >
> > > >> _______________________________________________
> > > >> gem5-dev mailing list
> > > >> gem5-dev@gem5.org
> > > >> http://m5sim.org/mailman/listinfo/gem5-dev
> > > >>
> > > >
> > > >
> > > _______________________________________________
> > > gem5-dev mailing list
> > > gem5-dev@gem5.org
> > > http://m5sim.org/mailman/listinfo/gem5-dev
> > >
> > _______________________________________________
> > gem5-dev mailing list
> > gem5-dev@gem5.org
> > http://m5sim.org/mailman/listinfo/gem5-dev
> >
> _______________________________________________
> gem5-dev mailing list
> gem5-dev@gem5.org
> http://m5sim.org/mailman/listinfo/gem5-dev
>
_______________________________________________
gem5-dev mailing list
gem5-dev@gem5.org
http://m5sim.org/mailman/listinfo/gem5-dev

Re: [gem5-dev] Bug in x86 stack instructions

Reply via email to