On 6.6.19. 18:46, Richard Henderson wrote:
On 6/6/19 5:15 AM, Stefan Brankovic wrote:
+    tcg_gen_addi_i64(result, sh, 7);
+    for (i = 7; i >= 1; i--) {
+        tcg_gen_shli_i64(tmp, sh, i * 8);
+        tcg_gen_or_i64(result, result, tmp);
+        tcg_gen_addi_i64(sh, sh, 1);
+    }
Better to replicate sh into the 8 positions and then use one add.

    tcg_gen_muli_i64(sh, sh, 0x0101010101010101ull);
    tcg_gen_addi_i64(hi_result, sh, 0x0001020304050607ull);
    tcg_gen_addi_i64(lo_result, sh, 0x08090a0b0c0d0e0full);

and

    tcg_gen_subfi_i64(hi_result, 0x1011121314151617ull, sh);
    tcg_gen_subfi_i64(lo_result, 0x18191a1b1c1d1e1full, sh);

for lvsr.
I think you are right; this is definitely a better way of implementing it.
I will adopt your approach in v2.
Kind Regards,
Stefan
r~