On Mon, 19 Apr 2010, Richard Henderson wrote:

> On 04/18/2010 05:13 PM, Aurelien Jarno wrote:
> > On Tue, Apr 13, 2010 at 04:33:59PM -0700, Richard Henderson wrote:
> >> Define OPC_BSWAP. Factor opcode emission to separate functions.
> >> Use bswap+shift to implement 16-bit swap instead of a rolw; this
> >> gets the proper zero-extension required by INDEX_op_bswap16_i32.
> >
> > This is not required by INDEX_op_bswap16_i32. What is needed is that the
> > value in the input register has the 16 upper bits set to 0.
>
> Ah.
> Apparently I'm not the only one who misinterpreted this bit of bswap
> documentation. How about:
>
> diff --git a/tcg/README b/tcg/README
> index 68d27ff..5b39a38 100644
> --- a/tcg/README
> +++ b/tcg/README
> @@ -269,7 +269,7 @@ ext32u_i64 t0, t1
>  * bswap16_i32/i64 t0, t1
>
>  16 bit byte swap on a 32/64 bit value. It assumes that the two/six high order
> -bytes are set to zero.
> +bytes of t1 are set to zero.
>
>  * bswap32_i32/i64 t0, t1
>
> > Considering that, the rolw instruction is faster than bswap + shift.
>
> Well, no, it isn't.
>
> static inline int test_rolw(unsigned short *s)
> {
>     int i, start, end;
>     asm volatile("rdtsc\n\t"
>                  "movl %%eax, %1\n\t"
>                  "movzwl %3,%2\n\t"
>                  "rolw $8, %w2\n\t"
>                  "addl $1,%2\n\t"
>                  "rdtsc"
>                  : "=&a"(end), "=&r"(start), "=r"(i) : "m"(*s) : "edx");
>     return end - start;
> }
>
> static inline int test_bswap(unsigned short *s)
> {
>     int i, start, end;
>     asm volatile("rdtsc\n\t"
>                  "movl %%eax, %1\n\t"
>                  "movzwl %3,%2\n\t"
>                  "bswap %2\n\t"
>                  "shr $16,%2\n\t"
>                  "addl $1,%2\n\t"
>                  "rdtsc"
>                  : "=&a"(end), "=&r"(start), "=r"(i) : "m"(*s) : "edx");
>     return end - start;
> }
>
> model name : Intel(R) Core(TM)2 Duo CPU T7700 @ 2.40GHz
> rolw  60 60 72 60 60 72 60 60 72 60
> bswap 60 60 60 60 60 60 60 60 60 60
>
> model name : Dual-Core AMD Opteron(tm) Processor 1210
> rolw  9 10 9 9 8 8 8 8 8 8
> bswap 9 9 8 8 8 8 8 8 8 8
>
> The rolw sequence is never faster, and it's less stable,
> likely due to the partial register stall I mentioned.
>
> I will grant that the rolw sequence is smaller, and I can
> adjust this patch to use that sequence if you wish.
>
>
> r~

-- 
mailto:av1...@comtv.ru