Patches 1-3: Reviewed-by: Richard Henderson <r...@twiddle.net> That said,
On 06/06/2011 09:25 AM, Paolo Bonzini wrote: > +/* conservative code for little endian unaligned accesses */ > +static inline int lduw_le_p(const void *ptr) > +{ > +#ifdef _ARCH_PPC > + int val; > + __asm__ __volatile__ ("lhbrx %0,0,%1" : "=r" (val) : "r" (ptr)); > + return val; > +#else > + const uint8_t *p = ptr; > + return p[0] | (p[1] << 8); > +#endif > +} Can we add a patch 4/3 that removes this sort of hard-coded assembly stuff in favour of generic gcc code. E.g. static inline int lduw_le_p(const void *ptr) { struct pack { uint16_t x __attribute__((packed)); }; const struct pack *p = ptr; uint16_t ret; ret = p->x; ret = le_bswap(ret, 16); return ret; } One could fairly well macroize all 6 instances. The compiler knows that ppc and i386 can do the unaligned loads. The compiler *should* be able to match the bswap_N patterns, and thus also fold the unaligned load with the bswap for ppc. I can confirm that gcc head does in fact do the right thing for ppc32/64 here, as well as for i386/x86_64. I don't have previous versions of gcc checked out atm... I havn't checked, but this *ought* to enable the load-asi bswap instruction for sparcv9. I also havn't checked what happens for a target like sparcv8 that lacks both unaligned load and bswap, to see that we don't simply double the number of shifts. However, I'd be tempted to file that as a gcc missed optimization bug. r~