Re: [Mesa-dev] [PATCH 2/3][RFC v2] mesa/main/x86: Add sse2 streaming clamping

2014-11-06 Thread Juha-Pekka Heikkila
On 05.11.2014 21:21, Ian Romanick wrote:
 On 11/04/2014 01:24 PM, Roland Scheidegger wrote:
 Am 04.11.2014 um 13:05 schrieb Juha-Pekka Heikkila:
 +   for(i = 0; i < n; i++) {
 +  _mesa_clamp_float_rgba(rgba_src[i], temp, min, max);
 +
 +  *operand = _mm_mul_ps(multiplier, *operand);
 +  truncated_integers = _mm_cvttps_epi32(*operand);
 +  mmove = _mm_set_ps(aMap[map_p[ACOMP]], bMap[map_p[BCOMP]],
 + gMap[map_p[GCOMP]], rMap[map_p[RCOMP]] );
 +
 +  _mm_storeu_ps(rgba_dst[i], mmove);
 The sse2 code at the end looks counterproductive to me. Not sure what
 gcc will generate but I'd suspect it involves some simd-int domain
 transition for the table lookups, plus another int-simd transition to
 get the values back into simd domain (alternatively it might use
 stores/load here) just so you can store them again...
 It would probably be better to just store the values directly after the
 table lookups.
 But in any case, I'm actually beginning to suspect no one really cares
 about performance anyway for that path (who the hell uses these
 scale/map features?) so whatever works...
 
 Which raises another question... do we have any piglit tests that
 actually exercise this path?

No, we don't. I made a small test for this to see how it works, and I was
planning to move my test to Piglit later.

/Juha-Pekka
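
For readers who want something concrete, a standalone check of the kind
mentioned above could look roughly like the following. This is only a
sketch, not the test from this thread and not an actual Piglit test: the
names clamp_scalar, clamp_sse2 and count_clamp_mismatches are
hypothetical, the pixels are passed as a flat float array, and NaN
inputs are deliberately not covered.

```c
#include <assert.h>
#include <emmintrin.h>
#include <stddef.h>

/* Scalar reference: clamp one value to [lo, hi]. */
static float
clamp_scalar(float v, float lo, float hi)
{
   return v < lo ? lo : (v > hi ? hi : v);
}

/* SSE2 clamp of n_pixels RGBA pixels stored as n_pixels * 4 floats,
 * mirroring the load / max / min / store sequence of the patch.
 */
static void
clamp_sse2(size_t n_pixels, const float *src, float *dst,
           float lo, float hi)
{
   const __m128 minval = _mm_set1_ps(lo);
   const __m128 maxval = _mm_set1_ps(hi);
   size_t i;

   for (i = 0; i < n_pixels; i++) {
      __m128 v = _mm_loadu_ps(src + 4 * i);
      v = _mm_max_ps(v, minval);
      v = _mm_min_ps(v, maxval);
      _mm_storeu_ps(dst + 4 * i, v);
   }
}

/* Count components where the SSE2 result differs from the scalar one.
 * scratch must have room for n_pixels * 4 floats.
 */
static size_t
count_clamp_mismatches(size_t n_pixels, const float *src, float *scratch,
                       float lo, float hi)
{
   size_t i, bad = 0;

   assert(lo <= hi);
   clamp_sse2(n_pixels, src, scratch, lo, hi);
   for (i = 0; i < n_pixels * 4; i++) {
      if (scratch[i] != clamp_scalar(src[i], lo, hi))
         bad++;
   }
   return bad;
}
```

A caller would fill src with values below, inside and above the range
and expect count_clamp_mismatches() to return 0.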

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 2/3][RFC v2] mesa/main/x86: Add sse2 streaming clamping

2014-11-05 Thread Juha-Pekka Heikkila
Hi,

I relied on gcc's optimizer to move things around for me. This is what
_mesa_streaming_clamp_float_rgba actually looks like when I compile it:

Dump of assembler code for function _mesa_streaming_clamp_float_rgba:
   0x7401a0a0 <+0>:   test   %edi,%edi
   0x7401a0a2 <+2>:   je     0x7401a0d7 <_mesa_streaming_clamp_float_rgba+55>
   0x7401a0a4 <+4>:   sub    $0x1,%edi
   0x7401a0a7 <+7>:   shufps $0x0,%xmm0,%xmm0
   0x7401a0ab <+11>:  shufps $0x0,%xmm1,%xmm1
   0x7401a0af <+15>:  add    $0x1,%rdi
   0x7401a0b3 <+19>:  shl    $0x4,%rdi
   0x7401a0b7 <+23>:  xor    %eax,%eax
   0x7401a0b9 <+25>:  nopl   0x0(%rax)
   0x7401a0c0 <+32>:  movups (%rsi,%rax,1),%xmm2
   0x7401a0c4 <+36>:  maxps  %xmm0,%xmm2
   0x7401a0c7 <+39>:  minps  %xmm1,%xmm2
   0x7401a0ca <+42>:  movups %xmm2,(%rdx,%rax,1)
   0x7401a0ce <+46>:  add    $0x10,%rax
   0x7401a0d2 <+50>:  cmp    %rdi,%rax
   0x7401a0d5 <+53>:  jne    0x7401a0c0 <_mesa_streaming_clamp_float_rgba+32>
   0x7401a0d7 <+55>:  repz retq
End of assembler dump.

After inlining, gcc has moved everything unnecessary out of the loop, so
I can still keep the _mesa_clamp_float_rgba function available for
generic use at the source level. I also trusted gcc with the unrolling.
Looking at the loop, unrolling would save three instructions per
iteration, but I suspect add/cmp/jne are not the expensive instructions
here (I didn't check).

Out-of-order execution might be interesting to try here, though. I need
to check whether I can get gcc to behave properly; I have never
attempted that with intrinsics on gcc before :)

/Juha-Pekka



On 04.11.2014 19:35, Siavash Eliasi wrote:
 Hello. I'd get rid of _mm_set1_ps inside _mesa_clamp_float_rgba by
 passing __m128 versions of min/max directly, so _mm_set1_ps will be
 moved out of the for loop.
 
 I'd also unroll the _mesa_streaming_clamp_float_rgba loop to minimize
 the loop overhead (and utilize out of order execution as a bonus),
 because nothing compute intensive is happening there. You can also use
 prefetching (_mm_prefetch) there to improve performance by reading data
 ahead from memory.
 
 Best regards,
 Siavash Eliasi.


Re: [Mesa-dev] [PATCH 2/3][RFC v2] mesa/main/x86: Add sse2 streaming clamping

2014-11-05 Thread Juha-Pekka Heikkila
On 04.11.2014 21:46, Patrick Baggett wrote:
 
 
 On Tue, Nov 4, 2014 at 6:05 AM, Juha-Pekka Heikkila
 juhapekka.heikk...@gmail.com wrote:
 
 +#ifdef __SSE2__
 +#include "main/macros.h"
 +#include "main/x86/sse2_clamping.h"
 +#include <emmintrin.h>
 +
 +/**
 + * Clamp four float values to [min,max]
 + */
 +static inline void
 +_mesa_clamp_float_rgba(GLfloat src[4], GLfloat result[4], const float min,
 +   const float max)
 +{
 +   __m128  operand, minval, maxval;
 +
 +   operand = _mm_loadu_ps(src);
 +   minval = _mm_set1_ps(min);
 +   maxval = _mm_set1_ps(max);
 +   operand = _mm_max_ps(operand, minval);
 +   operand = _mm_min_ps(operand, maxval);
 +   _mm_storeu_ps(result, operand);
 +}
 +
 +
 +/* Clamp n amount float rgba pixels to [min,max] using SSE2
 
 
 Conceptually, _mesa_streaming_clamp_float_rgba() is clamping a
 contiguous array of floats to some min/max value. The fact that they are
 pixels is somewhat incidental when looking at it from a stream
 perspective. It looks like the code is more or less just operating on
 n*4 floats. Given that, a more efficient implementation would check
 alignment and then use aligned loads and streaming stores. It doesn't
 really matter if you straddle pixel boundaries as long as each float is
 operated on. I'm not sure how much effort you want to put into this
 though. :)
  

I was thinking about aligned versus unaligned loads and stores when Matt
commented on it in my first RFC set, but I haven't done anything about
it yet. I don't know how big a difference it would make in the real
world; I have never tested it. Googling only turned up answers saying
it's substantially faster, with no numbers or real comparisons. It could
combine well with what Siavash suggested about out-of-order execution.
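
To make the aligned-load and streaming-store idea from Patrick's comment
concrete, here is a minimal sketch. It is not code from the patch: the
function name clamp_floats_stream is hypothetical, it treats the pixels
as a flat array of floats as Patrick describes, and it simply falls back
to unaligned accesses when either pointer is not 16-byte aligned rather
than peeling leading elements. Whether it is actually faster would still
have to be measured.

```c
#include <assert.h>
#include <emmintrin.h>
#include <stddef.h>
#include <stdint.h>

/* Clamp n_floats floats to [lo, hi]; n_floats must be a multiple of 4. */
static void
clamp_floats_stream(size_t n_floats, const float *src, float *dst,
                    float lo, float hi)
{
   const __m128 minval = _mm_set1_ps(lo);
   const __m128 maxval = _mm_set1_ps(hi);
   /* Both pointers must be 16-byte aligned for the fast path. */
   const int aligned = (((uintptr_t) src | (uintptr_t) dst) & 15) == 0;
   size_t i;

   assert(n_floats % 4 == 0);

   if (aligned) {
      for (i = 0; i < n_floats; i += 4) {
         __m128 v = _mm_load_ps(src + i);
         v = _mm_min_ps(_mm_max_ps(v, minval), maxval);
         /* Non-temporal store: writes around the cache. */
         _mm_stream_ps(dst + i, v);
      }
      /* Order the streaming stores before any later stores. */
      _mm_sfence();
   } else {
      for (i = 0; i < n_floats; i += 4) {
         __m128 v = _mm_loadu_ps(src + i);
         v = _mm_min_ps(_mm_max_ps(v, minval), maxval);
         _mm_storeu_ps(dst + i, v);
      }
   }
}
```

Streaming stores only tend to pay off when the destination is not read
again soon, so this is a design choice that depends on what the caller
does with the clamped data next.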

Of all this sse stuff this clamping is the thing why I started to write
this 

Re: [Mesa-dev] [PATCH 2/3][RFC v2] mesa/main/x86: Add sse2 streaming clamping

2014-11-05 Thread Juha-Pekka Heikkila
On 04.11.2014 23:24, Roland Scheidegger wrote:
 Am 04.11.2014 um 13:05 schrieb Juha-Pekka Heikkila:

Re: [Mesa-dev] [PATCH 2/3][RFC v2] mesa/main/x86: Add sse2 streaming clamping

2014-11-05 Thread Roland Scheidegger
Am 05.11.2014 um 10:13 schrieb Juha-Pekka Heikkila:
 On 04.11.2014 23:24, Roland Scheidegger wrote:
 Am 04.11.2014 um 13:05 schrieb Juha-Pekka Heikkila:

Re: [Mesa-dev] [PATCH 2/3][RFC v2] mesa/main/x86: Add sse2 streaming clamping

2014-11-05 Thread Ian Romanick
On 11/04/2014 01:24 PM, Roland Scheidegger wrote:
 Am 04.11.2014 um 13:05 schrieb Juha-Pekka Heikkila:
 +   for(i = 0; i < n; i++) {
 +  _mesa_clamp_float_rgba(rgba_src[i], temp, min, max);
 +
 +  *operand = _mm_mul_ps(multiplier, *operand);
 +  truncated_integers = _mm_cvttps_epi32(*operand);
 +  mmove = _mm_set_ps(aMap[map_p[ACOMP]], bMap[map_p[BCOMP]],
 + gMap[map_p[GCOMP]], rMap[map_p[RCOMP]] );
 +
 +  _mm_storeu_ps(rgba_dst[i], mmove);
 The sse2 code at the end looks counterproductive to me. Not sure what
 gcc will generate but I'd suspect it involves some simd-int domain
 transition for the table lookups, plus another int-simd transition to
 get the values back into simd domain (alternatively it might use
 stores/load here) just so you can store them again...
 It would probably be better to just store the values directly after the
 table lookups.
 But in any case, I'm actually beginning to suspect no one really cares
 about performance anyway for that path (who the hell uses these
 scale/map features?) so whatever works...

Which raises another question... do we have any piglit tests that
actually exercise this path?

 Roland
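
As an illustration of Roland's suggestion to store the looked-up values
directly instead of packing them back into an SSE register with
_mm_set_ps, a sketch could look like the following. It is an assumption
about what he meant, not code from the thread: clamp_scale_map is a
hypothetical name, the pixels are passed as flat float arrays, and the
truncating conversion is kept as in the patch.

```c
#include <assert.h>
#include <emmintrin.h>
#include <stddef.h>

/* Clamp, scale and map n RGBA pixels (n * 4 floats in src and dst). */
static void
clamp_scale_map(size_t n, const float *src, float *dst,
                float lo, float hi, const float scale[4],
                const float *r_map, const float *g_map,
                const float *b_map, const float *a_map)
{
   const __m128 minval = _mm_set1_ps(lo);
   const __m128 maxval = _mm_set1_ps(hi);
   const __m128 multiplier = _mm_loadu_ps(scale);
   int idx[4];
   size_t i;

   /* A negative lower bound could produce negative table indices. */
   assert(lo >= 0.0f && lo <= hi);

   for (i = 0; i < n; i++) {
      __m128 v = _mm_loadu_ps(src + 4 * i);
      v = _mm_min_ps(_mm_max_ps(v, minval), maxval);
      v = _mm_mul_ps(v, multiplier);

      /* One store moves all four truncated indices to memory. */
      _mm_storeu_si128((__m128i *) idx, _mm_cvttps_epi32(v));

      /* Scalar table lookups written straight to the destination,
       * with no round trip back through an SSE register.
       */
      dst[4 * i + 0] = r_map[idx[0]];
      dst[4 * i + 1] = g_map[idx[1]];
      dst[4 * i + 2] = b_map[idx[2]];
      dst[4 * i + 3] = a_map[idx[3]];
   }
}
```

The caller is assumed to size each map so that the largest index,
the truncated value of hi * scale[c], stays in bounds.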



[Mesa-dev] [PATCH 2/3][RFC v2] mesa/main/x86: Add sse2 streaming clamping

2014-11-04 Thread Juha-Pekka Heikkila
Signed-off-by: Juha-Pekka Heikkila <juhapekka.heikk...@gmail.com>
---
 src/mesa/Makefile.am  |   8 +++
 src/mesa/main/x86/sse2_clamping.c | 103 ++
 src/mesa/main/x86/sse2_clamping.h |  49 ++
 3 files changed, 160 insertions(+)
 create mode 100644 src/mesa/main/x86/sse2_clamping.c
 create mode 100644 src/mesa/main/x86/sse2_clamping.h

diff --git a/src/mesa/Makefile.am b/src/mesa/Makefile.am
index e71bccb..5d3c6f5 100644
--- a/src/mesa/Makefile.am
+++ b/src/mesa/Makefile.am
@@ -111,6 +111,10 @@ if SSE41_SUPPORTED
 ARCH_LIBS += libmesa_sse41.la
 endif
 
+if SSE2_SUPPORTED
+ARCH_LIBS += libmesa_sse2.la
+endif
+
 MESA_ASM_FILES_FOR_ARCH =
 
 if HAVE_X86_ASM
@@ -154,6 +158,10 @@ libmesa_sse41_la_SOURCES = \
main/streaming-load-memcpy.c
 libmesa_sse41_la_CFLAGS = $(AM_CFLAGS) -msse4.1
 
+libmesa_sse2_la_SOURCES = \
+   main/x86/sse2_clamping.c
+libmesa_sse2_la_CFLAGS = $(AM_CFLAGS) -msse2
+
 pkgconfigdir = $(libdir)/pkgconfig
 pkgconfig_DATA = gl.pc
 
diff --git a/src/mesa/main/x86/sse2_clamping.c 
b/src/mesa/main/x86/sse2_clamping.c
new file mode 100644
index 000..7df1c85
--- /dev/null
+++ b/src/mesa/main/x86/sse2_clamping.c
@@ -0,0 +1,103 @@
+/*
+ * Copyright © 2014 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ *
+ * Authors:
+ *    Juha-Pekka Heikkila <juhapekka.heikk...@gmail.com>
+ *
+ */
+
+#ifdef __SSE2__
+#include "main/macros.h"
+#include "main/x86/sse2_clamping.h"
+#include <emmintrin.h>
+
+/**
+ * Clamp four float values to [min,max]
+ */
+static inline void
+_mesa_clamp_float_rgba(GLfloat src[4], GLfloat result[4], const float min,
+   const float max)
+{
+   __m128  operand, minval, maxval;
+
+   operand = _mm_loadu_ps(src);
+   minval = _mm_set1_ps(min);
+   maxval = _mm_set1_ps(max);
+   operand = _mm_max_ps(operand, minval);
+   operand = _mm_min_ps(operand, maxval);
+   _mm_storeu_ps(result, operand);
+}
+
+
+/* Clamp n amount float rgba pixels to [min,max] using SSE2
+ */
+void
+_mesa_streaming_clamp_float_rgba(const GLuint n, GLfloat rgba_src[][4],
+ GLfloat rgba_dst[][4], const GLfloat min,
+ const GLfloat max)
+{
+   int i;
+
+   for (i = 0; i < n; i++) {
+  _mesa_clamp_float_rgba(rgba_src[i], rgba_dst[i], min, max);
+   }
+}
+
+
+/* Clamp n amount float rgba pixels to [min,max] using SSE2 and apply
+ * scaling and mapping to components.
+ *
+ * this replace handling of [RGBA] channels:
+ * rgba_temp[RCOMP] = CLAMP(rgba[i][RCOMP], 0.0F, 1.0F);
+ * rgba[i][RCOMP] = rMap[F_TO_I(rgba_temp[RCOMP] * scale[RCOMP])];
+ */
+void
+_mesa_clamp_float_rgba_scale_and_map(const GLuint n, GLfloat rgba_src[][4],
+ GLfloat rgba_dst[][4], const GLfloat min,
+ const GLfloat max,
+ const GLfloat scale[4],
+ const GLfloat* rMap, const GLfloat* gMap,
+ const GLfloat* bMap, const GLfloat* aMap)
+{
+   int i;
+   GLfloat __attribute__((aligned(16))) temp[4];
+   __m128  *operand = (__m128*) temp, multiplier, mmove;
+   __m128i truncated_integers;
+
+   const unsigned int* map_p = (const unsigned int*) &truncated_integers;
+
+   multiplier = _mm_loadu_ps(scale);
+
+   for(i = 0; i < n; i++) {
+  _mesa_clamp_float_rgba(rgba_src[i], temp, min, max);
+
+  *operand = _mm_mul_ps(multiplier, *operand);
+  truncated_integers = _mm_cvttps_epi32(*operand);
+  mmove = _mm_set_ps(aMap[map_p[ACOMP]], bMap[map_p[BCOMP]],
+ gMap[map_p[GCOMP]], rMap[map_p[RCOMP]] );
+
+  _mm_storeu_ps(rgba_dst[i], mmove);
+   }
+}
+
+
+#endif /* __SSE2__ */
diff --git a/src/mesa/main/x86/sse2_clamping.h 
b/src/mesa/main/x86/sse2_clamping.h
new file mode 100644
index 000..688fab7
--- /dev/null

Re: [Mesa-dev] [PATCH 2/3][RFC v2] mesa/main/x86: Add sse2 streaming clamping

2014-11-04 Thread Siavash Eliasi
Hello. I'd get rid of _mm_set1_ps inside _mesa_clamp_float_rgba by
passing __m128 versions of min/max directly, so _mm_set1_ps will be
moved out of the for loop.


I'd also unroll the _mesa_streaming_clamp_float_rgba loop to minimize 
the loop overhead (and utilize out of order execution as a bonus), 
because nothing compute intensive is happening there. You can also use 
prefetching (_mm_prefetch) there to improve performance by reading data 
ahead from memory.


Best regards,
Siavash Eliasi.
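
The three suggestions above (broadcast min/max once, unroll the loop,
prefetch ahead) could be combined roughly as in the sketch below. It is
illustrative only: clamp4 and clamp_rgba_unrolled are hypothetical
names, the pixels are passed as a flat float array, and the unroll
factor of four and the prefetch distance of 16 pixels are untuned
guesses that would need benchmarking.

```c
#include <assert.h>
#include <emmintrin.h>
#include <stddef.h>

/* Clamp one pixel (four floats) with min/max already broadcast. */
static inline __m128
clamp4(__m128 v, __m128 minval, __m128 maxval)
{
   return _mm_min_ps(_mm_max_ps(v, minval), maxval);
}

/* Clamp n RGBA pixels; src and dst each hold n * 4 floats. */
static void
clamp_rgba_unrolled(size_t n, const float *src, float *dst,
                    float lo, float hi)
{
   /* Broadcast once, outside the loop. */
   const __m128 minval = _mm_set1_ps(lo);
   const __m128 maxval = _mm_set1_ps(hi);
   size_t i = 0;

   assert(n == 0 || (src != NULL && dst != NULL));

   /* Main loop: four pixels (64 bytes) per iteration. */
   for (; i + 4 <= n; i += 4) {
      const float *s = src + 4 * i;
      float *d = dst + 4 * i;
      __m128 p0, p1, p2, p3;

      /* Prefetch 16 pixels ahead while that is still inside src. */
      if (i + 16 < n)
         _mm_prefetch((const char *) (s + 64), _MM_HINT_T0);

      p0 = _mm_loadu_ps(s);
      p1 = _mm_loadu_ps(s + 4);
      p2 = _mm_loadu_ps(s + 8);
      p3 = _mm_loadu_ps(s + 12);
      _mm_storeu_ps(d, clamp4(p0, minval, maxval));
      _mm_storeu_ps(d + 4, clamp4(p1, minval, maxval));
      _mm_storeu_ps(d + 8, clamp4(p2, minval, maxval));
      _mm_storeu_ps(d + 12, clamp4(p3, minval, maxval));
   }

   /* Remainder: zero to three leftover pixels. */
   for (; i < n; i++) {
      __m128 v = _mm_loadu_ps(src + 4 * i);
      _mm_storeu_ps(dst + 4 * i, clamp4(v, minval, maxval));
   }
}
```

An optimizing compiler may already do the hoisting on its own once the
helper is inlined, so the explicit version mainly documents the intent.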


Re: [Mesa-dev] [PATCH 2/3][RFC v2] mesa/main/x86: Add sse2 streaming clamping

2014-11-04 Thread Patrick Baggett
On Tue, Nov 4, 2014 at 6:05 AM, Juha-Pekka Heikkila 
juhapekka.heikk...@gmail.com wrote:

 +#ifdef __SSE2__
 +#include "main/macros.h"
 +#include "main/x86/sse2_clamping.h"
 +#include <emmintrin.h>
 +
 +/**
 + * Clamp four float values to [min,max]
 + */
 +static inline void
 +_mesa_clamp_float_rgba(GLfloat src[4], GLfloat result[4], const float min,
 +   const float max)
 +{
 +   __m128  operand, minval, maxval;
 +
 +   operand = _mm_loadu_ps(src);
 +   minval = _mm_set1_ps(min);
 +   maxval = _mm_set1_ps(max);
 +   operand = _mm_max_ps(operand, minval);
 +   operand = _mm_min_ps(operand, maxval);
 +   _mm_storeu_ps(result, operand);
 +}
 +
 +
 +/* Clamp n amount float rgba pixels to [min,max] using SSE2


Conceptually, _mesa_streaming_clamp_float_rgba() is clamping a contiguous
array of floats to some min/max value. The fact that they are pixels is
somewhat incidental when looking at it from a stream perspective. It looks
like the code is more or less just operating on n*4 floats. Given that, a
more efficient implementation would check alignment and then use aligned
loads and streaming stores. It doesn't really matter if you straddle pixel
boundaries as long as each float is operated on. I'm not sure how much
effort you want to put into this though. :)


 + */
 +void
 +_mesa_streaming_clamp_float_rgba(const GLuint n, GLfloat rgba_src[][4],
 + GLfloat rgba_dst[][4], const GLfloat min,
 + const GLfloat max)
 +{
 +   int i;
 +
 +   for (i = 0; i < n; i++) {
 +  _mesa_clamp_float_rgba(rgba_src[i], rgba_dst[i], min, max);
 +   }
 +}
 +
 +
 +/* Clamp n amount float rgba pixels to [min,max] using SSE2 and apply
 + * scaling and mapping to components.
 + *
 + * this replace handling of [RGBA] channels:
 + * rgba_temp[RCOMP] = CLAMP(rgba[i][RCOMP], 0.0F, 1.0F);
 + * rgba[i][RCOMP] = rMap[F_TO_I(rgba_temp[RCOMP] * scale[RCOMP])];
 + */
 +void
 +_mesa_clamp_float_rgba_scale_and_map(const GLuint n, GLfloat rgba_src[][4],
 + GLfloat rgba_dst[][4], const GLfloat min,
 + const GLfloat max,
 + const GLfloat scale[4],
 + const GLfloat* rMap, const GLfloat* gMap,
 + const GLfloat* bMap, const GLfloat* aMap)
 +{
 +   int i;
 +   GLfloat 

Re: [Mesa-dev] [PATCH 2/3][RFC v2] mesa/main/x86: Add sse2 streaming clamping

2014-11-04 Thread Roland Scheidegger
Am 04.11.2014 um 13:05 schrieb Juha-Pekka Heikkila: