On Mon, 6 Jun 2011, Richard Henderson wrote:

> Patches 1-3:
> Reviewed-by: Richard Henderson <r...@twiddle.net>
>
> That said,
>
> On 06/06/2011 09:25 AM, Paolo Bonzini wrote:
> > +/* conservative code for little endian unaligned accesses */
> > +static inline int lduw_le_p(const void *ptr)
> > +{
> > +#ifdef _ARCH_PPC
> > +    int val;
> > +    __asm__ __volatile__ ("lhbrx %0,0,%1" : "=r" (val) : "r" (ptr));
> > +    return val;
> > +#else
> > +    const uint8_t *p = ptr;
> > +    return p[0] | (p[1] << 8);
> > +#endif
> > +}
>
> Can we add a patch 4/3 that removes this sort of hard-coded
> assembly stuff in favour of generic gcc code.  E.g.
>
> static inline int lduw_le_p(const void *ptr)
> {
>     struct pack { uint16_t x __attribute__((packed)); };
>     const struct pack *p = ptr;
>     uint16_t ret;
>
>     ret = p->x;
>     ret = le_bswap(ret, 16);
>
>     return ret;
> }
>
> One could fairly well macroize all 6 instances.
>
> The compiler knows that ppc and i386 can do the unaligned loads.
> The compiler *should* be able to match the bswap_N patterns, and
> thus also fold the unaligned load with the bswap for ppc.
>
> I can confirm that gcc head does in fact do the right thing for
> ppc32/64 here, as well as for i386/x86_64.  I don't have previous
> versions of gcc checked out atm...
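Macroizing all six accessors would presumably look something like
this (untested sketch: DEFINE_UNALIGNED_LD is a made-up name,
le_bswap()/be_bswap() are assumed to behave as in bswap.h, i.e.
identity on a same-endian host and a byte swap otherwise, and the
return types are simplified to the unsigned fixed-width ones):

#include <stdint.h>

/* Generate one unaligned, endian-aware load per (width, endianness)
   pair; the packed struct member tells gcc the pointer may be
   unaligned, so it can emit whatever load the target supports. */
#define DEFINE_UNALIGNED_LD(name, type, bits, endian)               \
    static inline type name(const void *ptr)                        \
    {                                                               \
        struct pack { type x __attribute__((packed)); };            \
        const struct pack *p = ptr;                                 \
        return endian ## _bswap(p->x, bits);                        \
    }

DEFINE_UNALIGNED_LD(lduw_le_p, uint16_t, 16, le)
DEFINE_UNALIGNED_LD(ldl_le_p,  uint32_t, 32, le)
DEFINE_UNALIGNED_LD(ldq_le_p,  uint64_t, 64, le)
DEFINE_UNALIGNED_LD(lduw_be_p, uint16_t, 16, be)
DEFINE_UNALIGNED_LD(ldl_be_p,  uint32_t, 32, be)
DEFINE_UNALIGNED_LD(ldq_be_p,  uint64_t, 64, be)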
Depends on how bswap_16 is defined.  If it is __builtin_bswap16, then
GCC 4.5.0 and 4.6.0 generate byte-reversed loads, but previous
versions lack that builtin, so I don't think this generic code should
go in; see the sketch at the end of this mail for the kind of guard
that would be needed.

> I haven't checked, but this *ought* to enable the load-asi bswap
> instruction for sparcv9.  I also haven't checked what happens for
> a target like sparcv8 that lacks both unaligned load and bswap,
> to see that we don't simply double the number of shifts.  However,
> I'd be tempted to file that as a gcc missed optimization bug.
>
> r~

--
mailto:av1...@comtv.ru
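P.S.  The kind of guard I mean, for the 16-bit case (untested sketch;
the 4.5 cutoff matches what I saw above, but the exact version where
__builtin_bswap16 appeared may well be target-dependent):

#include <stdint.h>

/* Use the builtin where it exists, otherwise fall back to an
   open-coded swap, which older gcc may or may not pattern-match
   into a byte-reversed load. */
#if defined(__GNUC__) && \
    (__GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 5))
#define bswap_16(x) __builtin_bswap16(x)
#else
static inline uint16_t bswap_16(uint16_t x)
{
    return (uint16_t)((x >> 8) | (x << 8));
}
#endif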