I think I get it now. x86 is complicated! Here's a first stab at the patch: http://reviews.gem5.org/r/3793/
Cheers, Jason On Thu, Jan 26, 2017 at 5:23 AM Gabe Black <gabebl...@google.com> wrote: > As far as explicitly vs. implicitly using the stack segment, I wasn't able > to find a very conclusive answer while looking through the manuals. This > would be something to check by running a little test program on an actual > computer, although I suspect implicit vs. explicit does matter. > > One interesting thing I found while looking for that which might be worth > looking into is that instructions which reference rsp or rbp also > implicitly use the stack segment. I remember knowing that, but I don't > remember whether or not gem5 handles that properly. > > Unfortunately, we can't really keep track of the effective address size in > the instruction, just whether we should be using the default size or the > alternative size. This is because whether or not we're using the default is > fixed based on the instruction encoding, while how that maps to the actual > address size is based on that and also segmentation state. The decode cache > doesn't (as far as I remember) keep track of that state, so it would > potentially return an instruction which was set up based on now faulty > assumptions. > > As far as whether or not AddrSizeFlagBit can be removed entirely, > unfortunately it can't. The problem ultimately stems from the fact that > gem5 assumes that virtual to physical mappings don't change the page > alignment of a memory address. That means that the address that's presented > to the CPU as a virtual address needs to have its potentially non-page > aligned segment base value added into it already, and it has to be a full > width address. For instance, if the data segment for a byte load has a base > of 0x3 and an offset of 0xff8, the linear address (post segmentation > address) you're accessing is going to be 0x1001. As far as the CPU is > concerned the virtual address and the linear address look like they're from > different cache lines and different pages. > > When presented to the TLB, the address than has to have the segment base > removed from it again so that the original offset can be checked against > the segment limit values. This gets tricky when the segment is expand down, > since in that case the limit value is actually an effective lower bound, > and the upper bound is either 0xFFFF or 0xFFFFFFFF depending on the size of > the "d" bit in the appropriate segment descriptor. Once you do the > subtraction, you need to know how to truncate the result you get, or sign > extension, carry bits, etc., cause a problem and, if I remember correctly, > cause spurious GP faults. > > This could all be simplified if the TLB applied the segment bases, but then > I think that would likely complicate the CPU models significantly. > > Gabe > > On Wed, Jan 25, 2017 at 1:28 PM, Jason Lowe-Power <ja...@lowepower.com> > wrote: > > > I'm leaning towards deleting the AddrSizeFlagBit. Gabe, can you shed some > > light on the purpose of this flag? It's only ever used for masking > > addresses in the TLB: > > http://grok.gem5.org/xref/gem5/src/arch/x86/tlb.cc#318 > > http://grok.gem5.org/xref/gem5/src/arch/x86/tlb.cc#330 > > > > From what I can understand, it should be unnecessary to mask the address > > here. In the instruction implementation, the address is already masked if > > it is a 32-bit mode instruction: > > http://grok.gem5.org/xref/gem5/src/arch/x86/isa/microops/ldstop.isa#433 > > > > It seems straightforward to add addressSize=ssz to the st instructions > > where necessary. I'm confused as to why the TLB ever needs to know what > the > > address mode of the instruction is. Why/how would the TLB ever receive a > > request with an address > 4GB for a 32-bit mode instruction? > > > > Thanks again for the help! > > > > Cheers, > > Jason > > > > On Wed, Jan 25, 2017 at 10:37 AM Steve Reinhardt <ste...@gmail.com> > wrote: > > > > > Thanks, Gabe! That's all really helpful. I appreciate you taking the > time > > > to respond. > > > > > > One concern I have is that the portion of the manual Jason quoted says > > the > > > rule holds "For instructions that > > > *implicitly* access the stack segment (SS)" (emphasis mine)... would > you > > > say then that instructions that *explicitly* choose SS as the segment > > > (e.g., with a segment override prefix) would not follow this rule? If > so, > > > then it may not be correct to do things inside the instruction based > > solely > > > on the fact that segment==SS. Or maybe that's a case that just can't > > occur > > > so we don't have to worry about it? > > > > > > Also, would there be any harm in setting the AddrSizeFlagBit based on > > > whether addressSize==32 rather than looking at the legacy.addr bit? If > > not, > > > this seems like a simpler solution to me. > > > > > > Thanks again! > > > > > > Steve > > > > > > > > > On Wed, Jan 25, 2017 at 12:41 AM, Gabe Black <gabebl...@google.com> > > wrote: > > > > > > > Forgive me if I'm repeating a bit of what you said, Steve, but I > > looked a > > > > little more and there's more to fix. > > > > > > > > Also, if you look at x86/isa/microops/ldstop.isa, you'll find the > > python > > > > classes which receive the parameters of the load and store microops > and > > > > translate them into C++ instantiations of the underlying microop > > classes. > > > > In their __init__ methods, you'll see at the end where the > > > AddrSizeFlagBit > > > > is set based on whether or not the machInst.legacy.addr bit is set, > > > > basically whether the address size prefix had been found when the > > > > instruction was decoded. For the two classes involved (LdStOp and > > > > BigLdStOp), I think you'll want to add a check which doesn't set > > > > AddrSizeFlagBit if the segment is the stack segment. > > > > > > > > Additionally, you'll want to make sure the TLB knows to use the stack > > > > segment size (as opposed to either the default or alternate address > > > sizes) > > > > when manipulating the segment offset. You could add a new ISA > specific > > > flag > > > > for x86 memory requests (there seems to be room for exactly one more > in > > > > arch/x86/ldstflags.hh) and set it if the segment is the stack in > those > > ld > > > > and st ops, similar to how AddrSizeFlagBit is being set. Then in the > > TLB, > > > > specifically where it's calculating logSize, you'll want to make it > > > > recognize your new flag and use the stack size from the m5Reg (called > > > just > > > > m5Reg.stack, I think) instead of either of the address size values. > > > > > > > > This is a little gross because it means there's some calculation > going > > on > > > > when address translation is being done (frequently) when it could > have > > > been > > > > done earlier during decode (less frequently). One reason I did things > > > this > > > > way (all speculation, I've long since forgot) is that it makes the > > > decoded > > > > instructions hold less state. The same bytes may not always refer to > > the > > > > same sizes depending on the segmentation state in the machine, so if > > > those > > > > particular bytes are being decoded, the gem5 decode cache needs to > > either > > > > consider that segmentation state when deciding what instruction to > spit > > > > out. If it didn't, it might return an instruction which statically > says > > > to > > > > use 32 bit addresses (for instance), when now segmentation state says > > it > > > > the same bytes should mean it should use 16 bit addresses. Instead, > it > > > canh > > > > be factored in dynamically using external state (the m5Reg) to an > > > > instruction that says which version of the sizes to use but doesn't > > care > > > > what values those actually map to at that moment. > > > > > > > > Gabe > > > > > > > > On Wed, Jan 25, 2017 at 12:03 AM, Gabe Black <gabebl...@google.com> > > > wrote: > > > > > > > > > Hi. I think you guys are generally interpreting this code > correctly, > > > but > > > > > I'll throw in my two cents. If you look at the pseudo code here: > > > > > > > > > > http://www.felixcloutier.com/x86/CALL.html > > > > > > > > > > then it appears that the size of the return address you push onto > the > > > > > stack should be based on the operand size, ie. the data size. That > > > would > > > > > mean that the displacement for the store should be -env.dataSize > like > > > it > > > > > is. By the same logic, the subtraction below it should also be > > > > env.dataSize > > > > > (symbolically dsz) and not ssz. > > > > > > > > > > You are also correct, I think, that since the instruction > references > > > the > > > > > stack segment, it's address size should be set to the stack size > when > > > > doing > > > > > memory operations. Adding addressSize=ssz to the st microop should > do > > > > that. > > > > > > > > > > According to that pseudo code, the size of the target IP is also > the > > > > > operand size, but that's the size the wrip microop will use by > > default > > > > and > > > > > so should be fine as written. > > > > > > > > > > I think if you make those two changes (change ssz to dsz in the > sub, > > > add > > > > > addressSize=ssz to the st), then this macroop definition will be > > > correct. > > > > > It would be worthwhile to check the other definitions of CALL and > > make > > > > sure > > > > > they aren't similarly broken. > > > > > > > > > > Gabe > > > > > > > > > > On Tue, Jan 24, 2017 at 3:28 PM, Jason Lowe-Power < > > ja...@lowepower.com > > > > > > > > > wrote: > > > > > > > > > >> Thanks for helping me work this out. > > > > >> > > > > >> I got the binary working by completely removing the > AddrSizeFlagBit. > > > The > > > > >> only place this bit is used is in the LdStOp (it's set to true if > > > > >> legacy.addr is true) and then, in the TLB, if the AddrSizeFlagBit > is > > > set > > > > >> it > > > > >> truncates the address to 32-bits removing the upper-order bits. > > > > >> > > > > >> By removing this flag bit, and adding "addressSize=ssz" to the st > > > > >> micro-op, > > > > >> the binary is now working. > > > > >> > > > > >> My next question: How can I have any confidence that this doesn't > > > break > > > > >> other things? Especially when we're talking about 32-bit > > compatibility > > > > >> mode. Other than the hello32 binary, are there other things I > should > > > > run? > > > > >> > > > > >> My intuition says this is an OK change. The addr field in the > > Request > > > > >> should never have the upper-order 32 bits set if the instruction > is > > a > > > > >> 32-bit-mode instruction. The instruction implementation already > > takes > > > > care > > > > >> of this, as I found with this bug. > > > > >> > > > > >> Other thoughts? Does anyone know if it is definitely required to > use > > > > >> AddrSizeFlagBit to truncate the address in the TLB? > > > > >> > > > > >> If not, I'll post a patch for this tomorrow. > > > > >> > > > > >> Thanks, > > > > >> Jason > > > > >> > > > > >> On Tue, Jan 24, 2017 at 12:19 PM Steve Reinhardt < > ste...@gmail.com> > > > > >> wrote: > > > > >> > > > > >> > Hmm, seems like there's a little bit of an inconsistency in that > > the > > > > >> > request is using the legacy.addr bit (which is set by the > decoder > > > when > > > > >> it > > > > >> > sees the address size override prefix [1]) directly, while the > > > > >> legacy.addr > > > > >> > bit is also used to calculate the addressSize value [2] but can > be > > > > >> > overridden (as we are doing). So if addressSize is overridden > then > > > > >> > AddrSizeFlagBit should no longer be set based on legacy.addr. > > > > >> > > > > > >> > Or another way of looking at it is that the same process of > > checking > > > > >> > legacy.addr to override the address size is done in two places, > > once > > > > in > > > > >> the > > > > >> > decoder [2] and again in the TLB [3] and it's not clear how to > > > > suppress > > > > >> the > > > > >> > latter one. > > > > >> > > > > > >> > I suppose one gnarly way of doing it would be to go into the > > > micro-op > > > > >> > definition somewhere and clear the AddrSizeFlagBit out of > memFlags > > > if > > > > >> > addressSize != env.addressSize (i.e., the address size was > > > > overridden). > > > > >> > There's probably a cleaner way, but that's the easiest path I > can > > > > think > > > > >> of > > > > >> > (if it even works). > > > > >> > > > > > >> > [1] http://grok.gem5.org/xref/gem5/src/arch/x86/decoder.cc#195 > > > > >> > [2] http://grok.gem5.org/xref/gem5/src/arch/x86/decoder.cc#390 > > > > >> > [3] > http://grok.gem5.org/source/xref/gem5/src/arch/x86/tlb.cc#315 > > > > >> > > > > > >> > > > > > >> > On Tue, Jan 24, 2017 at 8:37 AM, Jason Lowe-Power < > > > > ja...@lowepower.com> > > > > >> > wrote: > > > > >> > > > > > >> > > Hi Steve, > > > > >> > > > > > > >> > > That was super helpful. I'm now a step closer to solving this! > > > > >> > > > > > > >> > > Your suggestion of =ssz, lead me to search for the uses of > that > > in > > > > >> x86, > > > > >> > and > > > > >> > > it turns out that almost all of other stack instructions have > > > > >> > dataSize=ssz. > > > > >> > > So, I added both dataSize=ssz and addressSize=ssz to the call > > > > >> > instruction, > > > > >> > > though I think only the addressSize is actually needed, but > I'm > > > not > > > > >> > > certain. > > > > >> > > > > > > >> > > Now, the address is passed to the Request object correctly, > but > > > the > > > > >> > program > > > > >> > > still fails. The problem now is that the request object is > > getting > > > > >> > > the AddrSizeFlagBit set to true, because machInst.legacy.addr > is > > > > true. > > > > >> > > Thus, the TLB only uses the lower 32 bits of the 64-bit > address. > > > > >> > > > > > > >> > > Any ideas on how to change the instruction's memFlags from the > > > > >> macro-op > > > > >> > > definition? They are set on > > > > >> > > http://repo.gem5.org/gem5/file/cd7f3a1dbf55/src/arch/ > > > > >> > > x86/isa/microops/ldstop.isa#l334 > > > > >> > > . > > > > >> > > > > > > >> > > It would be nice if we could fix this in the decoder. I think > > the > > > > >> logic > > > > >> > > should be "if the address prefix is set and this is an > implicit > > > > stack > > > > >> > > operation, ignore the address prefix". However, I'm not sure > how > > > to > > > > >> tell > > > > >> > if > > > > >> > > the instruction is "an implicit stack operation" from the > > decoder. > > > > >> > > > > > > >> > > Thanks, > > > > >> > > Jason > > > > >> > > > > > > >> > > On Tue, Jan 24, 2017 at 9:05 AM Steve Reinhardt < > > ste...@gmail.com > > > > > > > > >> > wrote: > > > > >> > > > > > > >> > > My recollection of how all this works is that the arguments to > > the > > > > >> 'st' > > > > >> > > micro-op get turned into arguments to a call to the StoreOp > > > > >> constructor: > > > > >> > > > > > > >> > > 597 <http://repo.gem5.o, and that address doesn't > > > > >> > > existrg/gem5/file/cd7f3a1dbf55/src/arch/x86/isa/microops/ > > > > >> ldstop.isa#l597 > > > > >> > > <http://repo.gem5.org/gem5/file/cd7f3a1dbf55/src/arch/ > > > > >> > > x86/isa/microops/ldstop.isa#l597> > > > > >> > > > > > > > >> > > class StoreOp(LdStOp): > > > > >> > > 598 < > > > > >> > > http://repo.gem5.org/gem5/file/cd7f3a1dbf55/src/arch/ > > > > >> > > x86/isa/microops/ldstop.isa#l598 > > > > >> > > > > > > > >> > > def __init__(self, data, segment, addr, disp = 0, > > > > >> > > 599 < > > > > >> > > http://repo.gem5.org/gem5/file/cd7f3a1dbf55/src/arch/ > > > > >> > > x86/isa/microops/ldstop.isa#l599 > > > > >> > > > > > > > >> > > dataSize="env.dataSize", > > > > >> > > 600 < > > > > >> > > http://repo.gem5.org/gem5/file/cd7f3a1dbf55/src/arch/ > > > > >> > > x86/isa/microops/ldstop.isa#l600 > > > > >> > > > > > > > >> > > addressSize="env.addressSize", > > > > >> > > 601 < > > > > >> > > http://repo.gem5.org/gem5/file/cd7f3a1dbf55/src/arch/ > > > > >> > > x86/isa/microops/ldstop.isa#l601 > > > > >> > > > > > > > >> > > atCPL0=False, nonSpec=False): > > > > >> > > > > > > >> > > so the "-env.dataSize" you see is actually the displacement > for > > > the > > > > >> > store, > > > > >> > > not the dataSize or addressSize. I think the problem is that > > the > > > > >> > > addressSize is using the env,addressSize that's calculated the > > > > >> "normal" > > > > >> > > way, i.e., including the effects of the override prefix. > > > > >> > > > > > > >> > > Poking around some more, there's an 'ssz' symbol that maps to > > > > >> > env.stackSize > > > > >> > > [1] which gets used a lot in stack operations; for example, > > right > > > > >> after > > > > >> > the > > > > >> > > store in CALL_NEAR_I, 'ssz' is subtracted off of the stack > > > pointer. > > > > >> The > > > > >> > > calculation of env.stackSize seems to follow the rule you > > > mentioned > > > > >> about > > > > >> > > being fixed at 64 bits in 64-bit mode [2, 3]. > > > > >> > > > > > > >> > > So it might be sufficient to add ', addressSize=ssz' to the > end > > of > > > > the > > > > >> > 'st' > > > > >> > > micro-op. Oddly though I don't see addressSize being set on > any > > > > other > > > > >> > stack > > > > >> > > ops (just dataSize), so I wonder if this is a problem with > other > > > > stack > > > > >> > > instructions or not. If so, it might be useful to have an > > override > > > > >> > > hardwired in at some lower level to check if the segment is ss > > and > > > > >> force > > > > >> > > the address size to be stackSize (rather than adding this > extra > > > > >> parameter > > > > >> > > in a dozen different places). I wouldn't know where best to do > > > that > > > > >> > though, > > > > >> > > so the manual override seems best for now. > > > > >> > > > > > > >> > > Steve > > > > >> > > > > > > >> > > [1] http://grok.gem5.org/xref/gem5/src/arch/x86/isa/ > > > > microasm.isa#107 > > > > >> > > [2] http://grok.gem5.org/xref/gem5/src/arch/x86/isa.cc#91 > > > > >> > > [3] > http://grok.gem5.org/xref/gem5/src/arch/x86/decoder.cc#400 > > > > >> > > > > > > >> > > > > > > >> > > > > > > >> > > > > > > >> > > On Mon, Jan 23, 2017 at 4:04 PM, Jason Lowe-Power < > > > > >> ja...@lowepower.com> > > > > >> > > wrote: > > > > >> > > > > > > >> > > > To those of you with more x86 ISA implementation knowledge > > than > > > I > > > > >> have: > > > > >> > > > > > > > >> > > > I've been working through a bug one of our users found > (thanks > > > > >> > > Sanchayan!). > > > > >> > > > It looks like current versions of ld use the 0x67 > instruction > > > > prefix > > > > >> > > > (address size override) as an optimization instead of using > a > > > nop. > > > > >> See > > > > >> > > > > https://www.sourceware.org/ml/binutils/2016-05/msg00323.html. > > > > >> > > > > > > > >> > > > This causes the call instruction to be decoded with with the > > > > >> "address > > > > >> > > size > > > > >> > > > override prefix", which is correct, in a sense. From what I > > can > > > > >> tell, > > > > >> > > this > > > > >> > > > is passed to the call instruction via "-env.dataSize" (see > > call > > > > >> > > instruction > > > > >> > > > implementation below). > > > > >> > > > > > > > >> > > > def macroop CALL_NEAR_I > > > > >> > > > { > > > > >> > > > # Make the default data size of calls 64 bits in 64 bit > > mode > > > > >> > > > .adjust_env oszIn64Override > > > > >> > > > .function_call > > > > >> > > > > > > > >> > > > limm t1, imm > > > > >> > > > rdip t7 > > > > >> > > > # Check target of call > > > > >> > > > st t7, ss, [0, t0, rsp], "-env.dataSize" > > > > >> > > > subi rsp, rsp, ssz > > > > >> > > > wrip t7, t1 > > > > >> > > > }; > > > > >> > > > > > > > >> > > > Now, the bug is, according to the x86 manual, "For > > instructions > > > > that > > > > >> > > > implicitly access the stack segment (SS), the address size > for > > > > stack > > > > >> > > > accesses is determined by the D (default) bit in the > > > stack-segment > > > > >> > > > descriptor. In 64-bit mode, the D bit is ignored, and all > > stack > > > > >> > > references > > > > >> > > > have a 64-bit address size." See > > > > >> > > > https://support.amd.com/TechDocs/24594.pdf page > > > > >> > > > 9. > > > > >> > > > > > > > >> > > > Thoughts on how to fix this? > > > > >> > > > > > > > >> > > > Thanks, > > > > >> > > > Jason > > > > >> > > > _______________________________________________ > > > > >> > > > gem5-dev mailing list > > > > >> > > > gem5-dev@gem5.org > > > > >> > > > http://m5sim.org/mailman/listinfo/gem5-dev > > > > >> > > > > > > > >> > > _______________________________________________ > > > > >> > > gem5-dev mailing list > > > > >> > > gem5-dev@gem5.org > > > > >> > > http://m5sim.org/mailman/listinfo/gem5-dev > > > > >> > > > > > > >> > > -- > > > > >> > > > > > > >> > > Jason > > > > >> > > _______________________________________________ > > > > >> > > gem5-dev mailing list > > > > >> > > gem5-dev@gem5.org > > > > >> > > http://m5sim.org/mailman/listinfo/gem5-dev > > > > >> > > > > > > >> > _______________________________________________ > > > > >> > gem5-dev mailing list > > > > >> > gem5-dev@gem5.org > > > > >> > http://m5sim.org/mailman/listinfo/gem5-dev > > > > >> > > > > > >> _______________________________________________ > > > > >> gem5-dev mailing list > > > > >> gem5-dev@gem5.org > > > > >> http://m5sim.org/mailman/listinfo/gem5-dev > > > > >> > > > > > > > > > > > > > > _______________________________________________ > > > > gem5-dev mailing list > > > > gem5-dev@gem5.org > > > > http://m5sim.org/mailman/listinfo/gem5-dev > > > > > > > _______________________________________________ > > > gem5-dev mailing list > > > gem5-dev@gem5.org > > > http://m5sim.org/mailman/listinfo/gem5-dev > > > > > _______________________________________________ > > gem5-dev mailing list > > gem5-dev@gem5.org > > http://m5sim.org/mailman/listinfo/gem5-dev > > > _______________________________________________ > gem5-dev mailing list > gem5-dev@gem5.org > http://m5sim.org/mailman/listinfo/gem5-dev > _______________________________________________ gem5-dev mailing list gem5-dev@gem5.org http://m5sim.org/mailman/listinfo/gem5-dev