On 3 December 2015 at 21:21, Richard Henderson <r...@twiddle.net> wrote:
> On 12/03/2015 07:08 AM, Peter Maydell wrote:
>> On 3 December 2015 at 14:58, Laurent Desnogues
>> <laurent.desnog...@gmail.com> wrote:
>>> After quickly looking at the code in softmmu_template.h, I wonder if
>>> MO_ALIGN would correctly handle the ldrexd pair case which requires an
>>> 8-byte alignment but does 2 4-byte loads (even if the code is tweaked
>>> to read 8-byte at once, then checking 16-byte alignment of AArch64
>>> ldxp 64-bit could not be handled correctly).
>>
>> You're right, those are not going to be handled correctly.
>> But I think it would be better to enhance the MO_ALIGN
>> handling somehow to deal with "must be more highly aligned than
>> the datasize" cases as well as the "alignment must match datasize"
>> ones.
>
> What's the full set of features that you'd like here?

As Laurent says, for the "load-pair" instructions we need to be able
to load data into two registers with an alignment corresponding to
the whole thing. We can obviously do 'load 2x 32 bit regs' with a
64-bit load and then split the result into the two values, but
the 'load 2x 64 bit regs' would need either a 128 bit load or a
'load 64 bit reg with 128 bit alignment'.

The other instructions with wider-than-the-type alignment are the
SIMD ones, which can ask for an alignment of up to 256 bits (for
instance VLD1 multiple-single-elements). You could reasonably argue
that there we should just emit code to do the check, since we're
already pre-calculating the address and then emitting a lot of load
or store TCG ops.

>> (As you say we'd need
>> to do the ldrexd as a 64-bit access, but we should do that
>> anyway because it's supposed to be single-copy-atomic,
>> architecturally speaking.)
>
> Something to remember for future is that we're not doing single-copy
> of 64-bit data for 32-bit hosts. I'm not even sure that's generally
> possible without generating awful code.

Since this is in LDREXD we're going to be doing something fairly
heavy-overhead anyway for the exclusive handling, but yes, worth
remembering when we have multithreaded TCG. (Guests do rely on
the single-copy-atomicity of (non-exclusive) 64 bit accesses
for updating page table entries for LPAE CPUs.)

thanks
-- PMM
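
For illustration only, the "alignment wider than the access size" idea
discussed above could be sketched in plain C roughly as follows. This is
not QEMU code: addr_is_aligned() and split_pair32() are hypothetical
names, the alignment is passed as a log2 byte count purely for the
example, and the low-word-first register order assumes little-endian
data.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper (not a QEMU API): return true if addr is a
 * multiple of (1 << align_bits) bytes. The required alignment is
 * independent of the size of the access itself, so a 64-bit access
 * can be checked against a 128-bit (align_bits == 4) requirement.
 * align_bits is assumed to be less than 64.
 */
static bool addr_is_aligned(uint64_t addr, unsigned align_bits)
{
    uint64_t mask = (UINT64_C(1) << align_bits) - 1;
    return (addr & mask) == 0;
}

/*
 * Hypothetical helper: split one 64-bit loaded value into the two
 * 32-bit register values of a 'load 2x 32 bit regs' operation.
 * Assumes little-endian data: the low word goes to the first register.
 */
static void split_pair32(uint64_t val, uint32_t *first, uint32_t *second)
{
    *first = (uint32_t)val;
    *second = (uint32_t)(val >> 32);
}
```

With these, the 32-bit pair case would be one 64-bit load checked with
align_bits == 3 and then split, while the 64-bit pair case would check
align_bits == 4 on an access that is itself only 64 bits wide.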