On 30 December 2013 15:03, Richard Henderson <r...@twiddle.net> wrote:
> On 12/28/2013 01:49 PM, Peter Maydell wrote:
>>      if (size < 4) {
>>          switch (size) {
>>          case 0:
>> -            tcg_gen_ld8u_i64(tmp, cpu_env, freg_offs);
>> +            tcg_gen_ld8u_i64(tmp, cpu_env, fp_reg_offset(srcidx, MO_8));
>>              break;
>>          case 1:
>> -            tcg_gen_ld16u_i64(tmp, cpu_env, freg_offs);
>> +            tcg_gen_ld16u_i64(tmp, cpu_env, fp_reg_offset(srcidx, MO_16));
>>              break;
>>          case 2:
>> -            tcg_gen_ld32u_i64(tmp, cpu_env, freg_offs);
>> +            tcg_gen_ld32u_i64(tmp, cpu_env, fp_reg_offset(srcidx, MO_32));
>>              break;
>>          case 3:
>> -            tcg_gen_ld_i64(tmp, cpu_env, freg_offs);
>> +            tcg_gen_ld_i64(tmp, cpu_env, fp_reg_offset(srcidx, MO_64));
>>              break;
>>          }
>>          tcg_gen_qemu_st_i64(tmp, tcg_addr, get_mem_index(s), MO_TE + size);
>
> It occurs to me to wonder whether it wouldn't just be better to load the whole
> 64-bit quantity and store the piece we need, ignoring the entire host-endian 
> issue.

Yeah, we could do that. Will the optimiser optimise away the unnecessary
extra load of the unused high 32 bits for the "32 bit or smaller" case on
a 32-bit host CPU?
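Richard's suggestion can be modelled outside TCG in plain C. This is a minimal sketch, not QEMU code. The register struct `fake_vreg` and the helpers `guest_store` and `store_vreg_low` are hypothetical names for illustration only. The point is that the element is always read as a whole uint64_t integer rather than as a sub-word at a byte offset, so no host-endian offset adjustment is needed. The sized store then keeps only the low (1 << size) bytes in guest byte order. For size <= 2, the high 32 bits are loaded but never reach memory. That is the load a 32-bit host would ideally skip. This model does not say whether the TCG optimiser actually removes it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for one 128-bit FP/SIMD register in the CPU
 * state struct; the name and layout are illustrative only. */
typedef struct {
    uint64_t d[2];   /* d[0] holds the low 64 bits */
} fake_vreg;

/* Store the low (1 << size) bytes of a 64-bit value to guest memory
 * in the requested guest byte order (size 0..3 -> 1, 2, 4, 8 bytes). */
static void guest_store(uint8_t *mem, uint64_t val, int size, bool big_endian)
{
    int n = 1 << size;
    assert(size >= 0 && size <= 3);
    for (int i = 0; i < n; i++) {
        int shift = big_endian ? 8 * (n - 1 - i) : 8 * i;
        mem[i] = (uint8_t)(val >> shift);
    }
}

/* Load the whole 64-bit element as an integer (host endianness is
 * irrelevant for a full-width integer read), then let the sized store
 * pick out the low bits.  For size <= 2 the upper 32 bits of 'whole'
 * are loaded but never written to memory. */
static void store_vreg_low(uint8_t *mem, const fake_vreg *reg, int size,
                           bool big_endian)
{
    uint64_t whole = reg->d[0];
    guest_store(mem, whole, size, big_endian);
}
```

For example, with d[0] = 0x1122334455667788, a little-endian 32-bit store (size 2) writes 88 77 66 55, and a big-endian 16-bit store (size 1) writes 77 88.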

thanks
-- PMM
