Patches 1-3:
Reviewed-by: Richard Henderson <r...@twiddle.net>

That said,

On 06/06/2011 09:25 AM, Paolo Bonzini wrote:
> +/* conservative code for little endian unaligned accesses */
> +static inline int lduw_le_p(const void *ptr)
> +{
> +#ifdef _ARCH_PPC
> +    int val;
> +    __asm__ __volatile__ ("lhbrx %0,0,%1" : "=r" (val) : "r" (ptr));
> +    return val;
> +#else
> +    const uint8_t *p = ptr;
> +    return p[0] | (p[1] << 8);
> +#endif
> +}

Can we add a patch 4/3 that removes this sort of hard-coded
assembly in favour of generic gcc code?  E.g.

static inline int lduw_le_p(const void *ptr)
{
  /* The packed member has alignment 1, so the compiler is free to
     emit whatever unaligned-load sequence the target supports.  */
  struct pack { uint16_t x __attribute__((packed)); };
  const struct pack *p = ptr;
  uint16_t ret;

  ret = p->x;
  ret = le_bswap(ret, 16);   /* no-op on LE hosts, byte swap on BE */

  return ret;
}

One could fairly well macroize all 6 instances.
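
Untested, but something along these lines ought to cover all six
loads (assuming a be_bswap counterpart to the le_bswap above, and
the usual int/uint64_t return types):

/* Assumes le_bswap/be_bswap helpers taking (value, bits).  */
#define DEFINE_LD_P(name, type, bits, end)                          \
static inline type name(const void *ptr)                            \
{                                                                    \
  struct pack { uint##bits##_t x __attribute__((packed)); };         \
  const struct pack *p = ptr;                                         \
  return end##_bswap(p->x, bits);                                     \
}

DEFINE_LD_P(lduw_le_p, int,      16, le)
DEFINE_LD_P(ldl_le_p,  int,      32, le)
DEFINE_LD_P(ldq_le_p,  uint64_t, 64, le)
DEFINE_LD_P(lduw_be_p, int,      16, be)
DEFINE_LD_P(ldl_be_p,  int,      32, be)
DEFINE_LD_P(ldq_be_p,  uint64_t, 64, be)

The stores could get the same treatment with a matching store macro
built around the same packed-struct trick.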

The compiler knows that ppc and i386 can do the unaligned loads.
The compiler *should* be able to match the bswap_N patterns, and
thus also fold the unaligned load with the bswap for ppc.

I can confirm that gcc head does in fact do the right thing for
ppc32/64 here, as well as for i386/x86_64.  I don't have previous
versions of gcc checked out atm...

I haven't checked, but this *ought* to enable the load-asi bswap
instruction for sparcv9.  I also haven't checked what happens for
a target like sparcv8 that lacks both unaligned loads and bswap,
to see that we don't simply double the number of shifts.  However,
I'd be tempted to file that as a gcc missed-optimization bug.


r~
