On 2015-07-15 09:31, Paolo Bonzini wrote:
> Ok, I see your point.  If you put it like this :) the fault definitely
> lies in the backends.  What I'm proposing would be in a new
> tcg_reg_alloc_trunc function, and it would require implementing a
> non-noop trunc.

Why not reuse the existing trunc_shr_i64_i32 op? AFAIU, it has been
designed exactly for that.

Actually I think we should implement the following ops as optional but
*real* TCG ops:
- trunc_shr_i64_i32
- extu_i32_i64
- ext_i32_i64

Then each backend can implement the ones it considers necessary. If an op
is not implemented in a backend, it is simply replaced by a mov. This would
also allow us to remove the "remember high bits as garbage" handling in the
optimizer, which I consider a band-aid more than a real fix.

Note that we might have multiple choices, for example on x86:

1) implement trunc_shr_i64_i32 and ext_i32_i64
   This way we make sure that all 32-bit values are always stored
   zero-extended (even if a move has been propagated by the register
   allocator or by the optimizer). The extu_i32_i64 can therefore always
   be considered as a mov op.

2) implement extu_i32_i64 and ext_i32_i64
   We have to guarantee that all 32-bit ops ignore the high part of the
   registers (which is not the case currently for qemu_ld/st in user
   mode) as they might contain garbage. Given that, we have to properly
   zero- or sign-extend the value when converting a 32-bit value into a
   64-bit value.

> I still believe the register allocator can be improved to do 32-bit
> loads, though as an optimization and not as a bugfix:
>
> > > Even if the prefix was added, modifying the register allocator to use
> > > 32-bit loads would still be useful as an optimization, since on x86
> > > 32-bit loads are smaller than 64-bit loads.
> >
> > AFAIK, that's already the case. The REXW prefix is only emitted for
> > 64-bit ops.
>
> Yes, but a load from a 64-bit register to a 32-bit destination emits
> REX.W.  From Leon's dump:
>
>  mov_i32 tmp1,w0.d0           => mov    0xe8(%r14),%rbp
>  mov_i32 tmp0,tmp1
>  mov_i32 t8,tmp0              => mov    %ebp,0x60(%r14)
>
> Note %rbp as the load destination and %ebp as the source of the store.

Indeed, that's something we might want to improve (and is due to the
fact we have just replaced trunc_shr_i64_i32 by a move on x86).
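
As a rough illustration of what narrowing such a 64-bit load to a 32-bit
one involves, here is a minimal sketch in plain C. It is not QEMU code:
low32_offset() and load_low32() are hypothetical names used only for this
example, and the 64-bit slot stands in for a field such as the one at
0xe8(%r14) in the dump above. The sketch reads the low 32 bits of the slot
with a 32-bit access, picking the byte offset from the host byte order.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: byte offset of the low 32 bits inside a 64-bit
 * slot, determined at run time from the host byte order. */
static size_t low32_offset(void)
{
    const uint16_t probe = 1;
    uint8_t first;

    memcpy(&first, &probe, 1);
    return first ? 0 : 4;   /* little endian: 0, big endian: 4 */
}

/* Hypothetical helper: fetch only the low 32 bits of a 64-bit slot,
 * using a 32-bit access instead of a 64-bit load plus truncation. */
static uint32_t load_low32(const uint64_t *slot)
{
    uint32_t v;

    memcpy(&v, (const uint8_t *)slot + low32_offset(), sizeof(v));
    return v;
}
```

With a slot holding 0x1122334455667788, load_low32() yields 0x55667788 on
either byte order, because the offset is chosen to match the host.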

Note however that this simplification might be target specific (it is at
least little-endian specific if we don't adjust the address).

-- 
Aurelien Jarno                          GPG: 4096R/1DDD8C9B
aurel...@aurel32.net                 http://www.aurel32.net