On Mon, 6 Jun 2011, Richard Henderson wrote:

> Patches 1-3:
> Reviewed-by: Richard Henderson <r...@twiddle.net>
> 
> That said,
> 
> On 06/06/2011 09:25 AM, Paolo Bonzini wrote:
> > +/* conservative code for little endian unaligned accesses */
> > +static inline int lduw_le_p(const void *ptr)
> > +{
> > +#ifdef _ARCH_PPC
> > +    int val;
> > +    __asm__ __volatile__ ("lhbrx %0,0,%1" : "=r" (val) : "r" (ptr));
> > +    return val;
> > +#else
> > +    const uint8_t *p = ptr;
> > +    return p[0] | (p[1] << 8);
> > +#endif
> > +}
> 
> Can we add a patch 4/3 that removes this sort of hard-coded
> assembly stuff in favour of generic gcc code?  E.g.
> 
> static inline int lduw_le_p(const void *ptr)
> {
>   struct pack { uint16_t x __attribute__((packed)); };
>   const struct pack *p = ptr;
>   uint16_t ret;
> 
>   ret = p->x;
>   ret = le_bswap(ret, 16);
> 
>   return ret;
> }
> 
> One could fairly well macroize all 6 instances.
> 
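For concreteness, a sketch of what that macroization could look like
(DEFINE_LD is a made-up name here, and the return types are guesses
at the existing prototypes, not the actual patch):

/* Sketch only: stamp out all six unaligned accessors from one
   template.  le_bswap/be_bswap are the existing macros from
   bswap.h; everything else is illustrative. */
#define DEFINE_LD(suffix, ret_type, mem_type, bits, end)        \
static inline ret_type ld##suffix##_##end##_p(const void *ptr)  \
{                                                               \
    struct pack { mem_type x __attribute__((packed)); };        \
    const struct pack *p = ptr;                                 \
    mem_type ret = p->x;                                        \
    return end##_bswap(ret, bits);                              \
}

DEFINE_LD(uw, int,      uint16_t, 16, le)  /* lduw_le_p */
DEFINE_LD(l,  int,      uint32_t, 32, le)  /* ldl_le_p  */
DEFINE_LD(q,  uint64_t, uint64_t, 64, le)  /* ldq_le_p  */
DEFINE_LD(uw, int,      uint16_t, 16, be)  /* lduw_be_p */
DEFINE_LD(l,  int,      uint32_t, 32, be)  /* ldl_be_p  */
DEFINE_LD(q,  uint64_t, uint64_t, 64, be)  /* ldq_be_p  */
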
> The compiler knows that ppc and i386 can do the unaligned loads.
> The compiler *should* be able to match the bswap_N patterns, and
> thus also fold the unaligned load with the bswap for ppc.
> 
> I can confirm that gcc head does in fact do the right thing for
> ppc32/64 here, as well as for i386/x86_64.  I don't have previous
> versions of gcc checked out atm...

Depends on how bswap_16 is defined. If it is __builtin_bswap16,
then 4.5.0 and 4.6.0 generate byte-reversed loads, but previous
versions lack that builtin, so I don't think this generic code
should go in.
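
For reference, a minimal sketch of the kind of guard that would be
needed, taking the 4.5 cutoff above at face value (whether the
builtin is really there is exactly the problem):

#include <stdint.h>

/* Sketch only: use the builtin where the versions above suggest
   it exists, otherwise fall back to plain shifts. */
#if defined(__GNUC__) && \
    (__GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 5))
#define bswap_16(x) __builtin_bswap16(x)
#else
static inline uint16_t bswap_16(uint16_t x)
{
    return (x >> 8) | (x << 8);
}
#endif

The fallback branch is what older compilers would see, and then the
question is whether they still match the bswap pattern at all.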

> 
> I haven't checked, but this *ought* to enable the load-asi bswap
> instruction for sparcv9.  I also haven't checked what happens for
> a target like sparcv8 that lacks both unaligned load and bswap,
> to see that we don't simply double the number of shifts.  However,
> I'd be tempted to file that as a gcc missed optimization bug.
> 
> 
> r~
> 

-- 
mailto:av1...@comtv.ru
