On 11.04.2016 at 21:05, Matt Turner wrote:
> Replaces four byte loads and four byte stores with a load, bswap,
> rotate, store; or a movbe, rotate, store.
> ---
>  src/mesa/drivers/dri/i965/intel_tiled_memcpy.c | 19 +++++++++++--------
>  1 file changed, 11 insertions(+), 8 deletions(-)
> 
> diff --git a/src/mesa/drivers/dri/i965/intel_tiled_memcpy.c b/src/mesa/drivers/dri/i965/intel_tiled_memcpy.c
> index 0a68751..fa5ec75 100644
> --- a/src/mesa/drivers/dri/i965/intel_tiled_memcpy.c
> +++ b/src/mesa/drivers/dri/i965/intel_tiled_memcpy.c
> @@ -56,24 +56,27 @@ static const uint32_t ytile_width = 128;
>  static const uint32_t ytile_height = 32;
>  static const uint32_t ytile_span = 16;
>  
> +static inline uint32_t
> +ror(uint32_t n, uint32_t d)
> +{
> +   return (n >> d) | (n << (32 - d));
> +}
> +
>  /**
>   * Copy RGBA to BGRA - swap R and B.
>   */
>  static inline void *
>  rgba8_copy(void *dst, const void *src, size_t bytes)
>  {
> -   uint8_t *d = dst;
> -   uint8_t const *s = src;
> +   uint32_t *d = dst;
> +   uint32_t const *s = src;
>  
>     assert(bytes % 4 == 0);
>  
>     while (bytes >= 4) {
> -      d[0] = s[2];
> -      d[1] = s[1];
> -      d[2] = s[0];
> -      d[3] = s[3];
> -      d += 4;
> -      s += 4;
> +      *d = ror(__builtin_bswap32(*s), 8);
> +      d += 1;
> +      s += 1;
>        bytes -= 4;
>     }
>     return dst;
> 

FWIW, Mesa core has truckloads of similar per-byte load/swap/store
functions for format conversions, which could be optimized the same way
(or, hint hint, be done with SSE2...).
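
To make the SSE2 hint a bit more concrete, here is a rough sketch of what
such a variant might look like. The name rgba8_copy_sse2 is hypothetical
and not part of the patch; the sketch assumes an x86 target with SSE2 and
uses unaligned loads/stores so it makes no alignment assumptions. It is
untuned and only meant to illustrate the idea of masking out R/B and
swapping them with two shifts.

```c
#include <assert.h>
#include <emmintrin.h>   /* SSE2 intrinsics */
#include <stddef.h>
#include <stdint.h>

/* Hypothetical SSE2 variant: swap R and B, four pixels per iteration. */
static void *
rgba8_copy_sse2(void *dst, const void *src, size_t bytes)
{
   uint8_t *d = dst;
   const uint8_t *s = src;
   /* Selects bytes 0 and 2 (R and B) of every 32-bit pixel. */
   const __m128i rb_mask = _mm_set1_epi32(0x00ff00ff);

   assert(bytes % 4 == 0);

   while (bytes >= 16) {
      __m128i v  = _mm_loadu_si128((const __m128i *)s);
      __m128i rb = _mm_and_si128(v, rb_mask);      /* keep R and B */
      __m128i ga = _mm_andnot_si128(rb_mask, v);   /* keep G and A */

      /* Exchange the two selected bytes within each 32-bit lane. */
      rb = _mm_or_si128(_mm_slli_epi32(rb, 16), _mm_srli_epi32(rb, 16));

      _mm_storeu_si128((__m128i *)d, _mm_or_si128(ga, rb));
      d += 16;
      s += 16;
      bytes -= 16;
   }

   /* Scalar tail for the remaining (fewer than four) pixels. */
   while (bytes >= 4) {
      d[0] = s[2];
      d[1] = s[1];
      d[2] = s[0];
      d[3] = s[3];
      d += 4;
      s += 4;
      bytes -= 4;
   }
   return dst;
}
```

Since the mask-and-shift form only needs SSE2, it should work on any
x86-64 CPU; whether it actually beats the bswap/rotate version for the
short spans used here would need measuring.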

But nonetheless, looking good here.
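
For anyone wanting to double-check the byte order, a small standalone
sketch comparing the old per-byte copy against the bswap/rotate form is
below. The helper names (rgba8_copy_bytes, rgba8_copy_words,
rgba8_variants_agree) are made up for this illustration, memcpy is used
instead of the patch's direct uint32_t access to sidestep alignment
questions, and a little-endian host plus a compiler providing
__builtin_bswap32 (GCC or Clang) are assumed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Rotate right by d bits; only valid for 0 < d < 32, as in the patch. */
static inline uint32_t
ror(uint32_t n, uint32_t d)
{
   return (n >> d) | (n << (32 - d));
}

/* Reference: the original per-byte RGBA -> BGRA copy. */
static void
rgba8_copy_bytes(uint8_t *d, const uint8_t *s, size_t bytes)
{
   assert(bytes % 4 == 0);
   for (; bytes >= 4; bytes -= 4, d += 4, s += 4) {
      d[0] = s[2];
      d[1] = s[1];
      d[2] = s[0];
      d[3] = s[3];
   }
}

/* Word-wise form: load, bswap, rotate, store (little-endian only). */
static void
rgba8_copy_words(uint8_t *d, const uint8_t *s, size_t bytes)
{
   assert(bytes % 4 == 0);
   for (; bytes >= 4; bytes -= 4, d += 4, s += 4) {
      uint32_t v;
      memcpy(&v, s, sizeof v);           /* R,G,B,A -> 0xAABBGGRR   */
      v = ror(__builtin_bswap32(v), 8);  /* 0xRRGGBBAA -> 0xAARRGGBB */
      memcpy(d, &v, sizeof v);           /* stored as B,G,R,A        */
   }
}

/* Returns 1 if both variants produce the same output for a sample. */
static int
rgba8_variants_agree(void)
{
   const uint8_t src[8] = { 0x11, 0x22, 0x33, 0x44, 0xaa, 0xbb, 0xcc, 0xdd };
   uint8_t a[8], b[8];

   rgba8_copy_bytes(a, src, sizeof src);
   rgba8_copy_words(b, src, sizeof src);
   return memcmp(a, b, sizeof a) == 0;
}
```

On a big-endian host the word-wise form would produce a different byte
order, which should not matter for i965 since it only runs on x86.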

For the series (with an issue mentioned for 2/4, though I don't know if
anybody cares about it):
Reviewed-by: Roland Scheidegger <srol...@vmware.com>
