On 15/07/2015 00:09, Aurelien Jarno wrote:
> > 2) 64-bit processors that have loads with 32-bit addresses.
> >
> > => qemu_ld/qemu_st can use 32-bit addresses to do the
> > truncation
> >
> > aarch64, I think, falls under this group
>
> I don't think that works. We don't want to get a load with a 32-bit
> address. We want a load of (guest_base + address), with guest_base
> possibly being 64-bit, address being 32-bit and the result likely
> being 64-bit.

aarch64, IIUC, has complicated addressing modes with a 64-bit base and a
32-bit sign- or zero-extended index, which is exactly what you need
here. However, the backend is not using them, so right now aarch64 is
the same as x86.

> Well the use of ADDR32 is a bit special, it only works because we can
> use %gs to add the guest base address. When we can't use %gs, ADDR32
> can't work.

Yes. bsd-user would have to sign extend, in particular.

> I don't think the register allocator is at fault at all. The register
> tcg_reg_alloc_mov doesn't check for the register type because a TCG mov
> is by definition only between registers of the same size.

Ok, I see your point. If you put it like this :) the fault definitely
lies in the backends. What I'm proposing would be in a new
tcg_reg_alloc_trunc function, and it would require implementing a
non-noop trunc.

I still believe the register allocator can be improved to do 32-bit
loads, though as an optimization and not as a bugfix:

> > Even if the prefix was added, modifying the register allocator to use
> > 32-bit loads would still be useful as an optimization, since on x86
> > 32-bit loads are smaller than 64-bit loads.
>
> AFAIK, that's already the case. The REXW prefix is only emitted for
> 64-bit ops.

Yes, but a load from a 64-bit register to a 32-bit destination emits
REX.W. From Leon's dump:

    mov_i32 tmp1,w0.d0      => mov 0xe8(%r14),%rbp
    mov_i32 tmp0,tmp1
    mov_i32 t8,tmp0         => mov %ebp,0x60(%r14)

Note %rbp as the load destination and %ebp as the source of the store.

Paolo
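
For illustration only, the address computation discussed above can be
modelled in plain C. This is a minimal sketch, not QEMU code: the
function names are hypothetical, and the values are made up. It assumes
a 32-bit guest address sitting in a 64-bit host register whose upper
half may hold stale bits (the situation a no-op trunc leaves behind),
and shows why the extension has to happen before guest_base is added.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers (not part of QEMU): model how a 32-bit guest
 * address held in a 64-bit host register is combined with guest_base. */

/* Zero-extend the low 32 bits of the register, then add the base. */
static inline uint64_t host_addr_zext(uint64_t guest_base, uint64_t reg)
{
    return guest_base + (uint64_t)(uint32_t)reg;
}

/* Sign-extend the low 32 bits of the register, then add the base.
 * The uint32_t -> int32_t conversion assumes the usual two's
 * complement wraparound. */
static inline uint64_t host_addr_sext(uint64_t guest_base, uint64_t reg)
{
    return guest_base + (uint64_t)(int64_t)(int32_t)(uint32_t)reg;
}

/* No truncation at all: the full 64-bit register is used as the index,
 * so stale upper bits leak into the host address. */
static inline uint64_t host_addr_raw(uint64_t guest_base, uint64_t reg)
{
    return guest_base + reg;
}

/* Self-check with made-up values. */
static inline void check_addressing(void)
{
    const uint64_t base = 0x7f0000000000ULL;
    const uint64_t reg  = 0xdeadbeef00001000ULL; /* stale upper half */

    assert(host_addr_zext(base, reg) == base + 0x1000);
    assert(host_addr_raw(base, reg) != base + 0x1000);
}
```

The zero- and sign-extending variants correspond to what an extended
register index or an explicit non-noop trunc would compute; the raw
variant is the failure mode when the mov is treated as size-agnostic.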