Re: [Mesa-dev] [PATCH 2/3][RFC v2] mesa/main/x86: Add sse2 streaming clamping
On 05.11.2014 21:21, Ian Romanick wrote:
> On 11/04/2014 01:24 PM, Roland Scheidegger wrote:
>> On 04.11.2014 at 13:05, Juha-Pekka Heikkila wrote:
>>> +   for(i = 0; i < n; i++) {
>>> +      _mesa_clamp_float_rgba(rgba_src[i], temp, min, max);
>>> +
>>> +      *operand = _mm_mul_ps(multiplier, *operand);
>>> +      truncated_integers = _mm_cvttps_epi32(*operand);
>>> +      mmove = _mm_set_ps(aMap[map_p[ACOMP]], bMap[map_p[BCOMP]],
>>> +                         gMap[map_p[GCOMP]], rMap[map_p[RCOMP]] );
>>> +
>>> +      _mm_storeu_ps(rgba_dst[i], mmove);
>>
>> The sse2 code at the end looks counterproductive to me. I'm not sure what gcc will generate, but I'd suspect it involves some simd->int domain transition for the table lookups, plus another int->simd transition to get the values back into the simd domain (alternatively it might use stores/loads here), just so you can store them again... It would probably be better to just store the values directly after the table lookups.
>> But in any case, I'm actually beginning to suspect no one really cares about performance for that path anyway (who the hell uses these scale/map features?), so whatever works...
>
> Which raises another question... do we have any piglit tests that actually exercise this path?

No, we don't. I made a small test for this to see how it works; I was planning to move my test to Piglit later.

/Juha-Pekka
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev
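For illustration only (the small test mentioned above was not posted to the list), a standalone check of the SSE2 clamp against a scalar reference could look roughly like the sketch below. The names clamp_ref, clamp_sse2 and check_clamp are made up for this example; it is not a Piglit test and it does not go through Mesa's pixel-transfer path.

```c
#include <assert.h>
#include <emmintrin.h>

/* Scalar reference: plain clamp of one float to [lo, hi]. */
static float
clamp_ref(float x, float lo, float hi)
{
   return x < lo ? lo : (x > hi ? hi : x);
}

/* SSE2 path under test: the same max-then-min sequence as in the patch. */
static void
clamp_sse2(unsigned n, const float src[][4], float dst[][4],
           float lo, float hi)
{
   const __m128 vlo = _mm_set1_ps(lo);
   const __m128 vhi = _mm_set1_ps(hi);
   unsigned i;

   for (i = 0; i < n; i++) {
      __m128 v = _mm_loadu_ps(src[i]);
      v = _mm_min_ps(_mm_max_ps(v, vlo), vhi);
      _mm_storeu_ps(dst[i], v);
   }
}

/* Returns the number of components where the two paths disagree. */
static int
check_clamp(void)
{
   static const float src[3][4] = {
      { -1.0f, 0.0f, 0.25f, 0.5f },
      { 0.75f, 1.0f, 1.5f, 100.0f },
      { -100.0f, 0.999f, 1.001f, -0.001f },
   };
   float dst[3][4];
   int bad = 0;
   unsigned i, c;

   clamp_sse2(3, src, dst, 0.0f, 1.0f);

   for (i = 0; i < 3; i++)
      for (c = 0; c < 4; c++)
         if (dst[i][c] != clamp_ref(src[i][c], 0.0f, 1.0f))
            bad++;

   return bad;
}
```

A real Piglit test would instead have to reach this code through the GL pixel-transfer state (scale/map), which this sketch deliberately leaves out.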
Re: [Mesa-dev] [PATCH 2/3][RFC v2] mesa/main/x86: Add sse2 streaming clamping
Hi,

I relied on gcc's optimizer to move things around for me. This is what _mesa_streaming_clamp_float_rgba really looks like when I compile it:

Dump of assembler code for function _mesa_streaming_clamp_float_rgba:
   0x7401a0a0 <+0>:     test   %edi,%edi
   0x7401a0a2 <+2>:     je     0x7401a0d7 <_mesa_streaming_clamp_float_rgba+55>
   0x7401a0a4 <+4>:     sub    $0x1,%edi
   0x7401a0a7 <+7>:     shufps $0x0,%xmm0,%xmm0
   0x7401a0ab <+11>:    shufps $0x0,%xmm1,%xmm1
   0x7401a0af <+15>:    add    $0x1,%rdi
   0x7401a0b3 <+19>:    shl    $0x4,%rdi
   0x7401a0b7 <+23>:    xor    %eax,%eax
   0x7401a0b9 <+25>:    nopl   0x0(%rax)
   0x7401a0c0 <+32>:    movups (%rsi,%rax,1),%xmm2
   0x7401a0c4 <+36>:    maxps  %xmm0,%xmm2
   0x7401a0c7 <+39>:    minps  %xmm1,%xmm2
   0x7401a0ca <+42>:    movups %xmm2,(%rdx,%rax,1)
   0x7401a0ce <+46>:    add    $0x10,%rax
   0x7401a0d2 <+50>:    cmp    %rdi,%rax
   0x7401a0d5 <+53>:    jne    0x7401a0c0 <_mesa_streaming_clamp_float_rgba+32>
   0x7401a0d7 <+55>:    repz retq
End of assembler dump.

After inlining, gcc has moved all the unnecessary work outside the loop, but I can still keep the _mesa_clamp_float_rgba function available for generic use at source level. I also trusted gcc with the unrolling. Looking at the loop, unrolling would save three instructions per iteration, but I suspect add/cmp/jne are not the expensive instructions here (I didn't check). Out-of-order execution might be interesting to try here, though. I need to check whether I can get gcc to behave properly; I have never attempted that with intrinsics on gcc before :)

/Juha-Pekka

On 04.11.2014 19:35, Siavash Eliasi wrote:
> Hello.
>
> I'd get rid of _mm_set1_ps inside _mesa_clamp_float_rgba by passing __m128 versions of min/max directly, so that _mm_set1_ps is moved out of the for loop.
>
> I'd also unroll the _mesa_streaming_clamp_float_rgba loop to minimize the loop overhead (and utilize out-of-order execution as a bonus), because nothing compute-intensive is happening there. You can also use prefetching (_mm_prefetch) there to improve performance by reading data ahead from memory.
>
> Best regards,
> Siavash Eliasi.
Re: [Mesa-dev] [PATCH 2/3][RFC v2] mesa/main/x86: Add sse2 streaming clamping
On 04.11.2014 21:46, Patrick Baggett wrote:
> On Tue, Nov 4, 2014 at 6:05 AM, Juha-Pekka Heikkila <juhapekka.heikk...@gmail.com> wrote:
>> Signed-off-by: Juha-Pekka Heikkila <juhapekka.heikk...@gmail.com>
>> [...]
>> +/* Clamp n amount float rgba pixels to [min,max] using SSE2
>
> Conceptually, _mesa_streaming_clamp_float_rgba() is clamping a contiguous array of floats to some min/max value. The fact that they are pixels is somewhat incidental when looking at it from a stream perspective. It looks like the code is more or less just operating on n*4 floats. Given that, a more efficient implementation would check alignment and then use aligned loads and streaming stores. It doesn't really matter if you straddle pixel boundaries as long as each float is operated on. I'm not sure how much effort you want to put into this though. :)

I was thinking about aligned versus unaligned loads and stores when Matt commented on it in my first RFC set, but I haven't done anything about it yet. I don't know how big a difference it would really make in the real world; I never tested it. Google only turned up answers saying it's substantially faster, with no numbers or real comparisons. It could be a good combination with what Siavash suggested about out-of-order execution. Of all this sse stuff this clamping is the thing why I started to write this
Re: [Mesa-dev] [PATCH 2/3][RFC v2] mesa/main/x86: Add sse2 streaming clamping
On 04.11.2014 23:24, Roland Scheidegger wrote:
> On 04.11.2014 at 13:05, Juha-Pekka Heikkila wrote:
>> Signed-off-by: Juha-Pekka Heikkila <juhapekka.heikk...@gmail.com>
>> [...]
>> +   for(i = 0; i < n; i++) {
>> +      _mesa_clamp_float_rgba(rgba_src[i], temp, min, max);
>> +
>> +      *operand = _mm_mul_ps(multiplier, *operand);
>> +      truncated_integers = _mm_cvttps_epi32(*operand);
>> +      mmove = _mm_set_ps(aMap[map_p[ACOMP]], bMap[map_p[BCOMP]],
>> +
Re: [Mesa-dev] [PATCH 2/3][RFC v2] mesa/main/x86: Add sse2 streaming clamping
On 05.11.2014 at 10:13, Juha-Pekka Heikkila wrote:
> On 04.11.2014 23:24, Roland Scheidegger wrote:
>> On 04.11.2014 at 13:05, Juha-Pekka Heikkila wrote:
>>> Signed-off-by: Juha-Pekka Heikkila <juhapekka.heikk...@gmail.com>
>>> [...]
>>> +   for(i = 0; i < n; i++) {
>>> +      _mesa_clamp_float_rgba(rgba_src[i], temp, min, max);
>>> +
>>> +      *operand = _mm_mul_ps(multiplier, *operand);
>>> +      truncated_integers = _mm_cvttps_epi32(*operand);
>>> +      mmove = _mm_set_ps(aMap[map_p[ACOMP]],
Re: [Mesa-dev] [PATCH 2/3][RFC v2] mesa/main/x86: Add sse2 streaming clamping
On 11/04/2014 01:24 PM, Roland Scheidegger wrote:
> On 04.11.2014 at 13:05, Juha-Pekka Heikkila wrote:
>> +   for(i = 0; i < n; i++) {
>> +      _mesa_clamp_float_rgba(rgba_src[i], temp, min, max);
>> +
>> +      *operand = _mm_mul_ps(multiplier, *operand);
>> +      truncated_integers = _mm_cvttps_epi32(*operand);
>> +      mmove = _mm_set_ps(aMap[map_p[ACOMP]], bMap[map_p[BCOMP]],
>> +                         gMap[map_p[GCOMP]], rMap[map_p[RCOMP]] );
>> +
>> +      _mm_storeu_ps(rgba_dst[i], mmove);
>
> The sse2 code at the end looks counterproductive to me. I'm not sure what gcc will generate, but I'd suspect it involves some simd->int domain transition for the table lookups, plus another int->simd transition to get the values back into the simd domain (alternatively it might use stores/loads here), just so you can store them again... It would probably be better to just store the values directly after the table lookups.
> But in any case, I'm actually beginning to suspect no one really cares about performance for that path anyway (who the hell uses these scale/map features?), so whatever works...

Which raises another question... do we have any piglit tests that actually exercise this path?

> Roland
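As a rough sketch of the "store the values directly after the table lookups" idea: the clamp, scale and truncating conversion stay in SSE2, the four indices are spilled to a small int array once, and the looked-up floats are written straight to the destination without rebuilding a vector via _mm_set_ps. The function name clamp_scale_and_map and the R/G/B/A enum are placeholders for this example (assumed to correspond to Mesa's RCOMP..ACOMP = 0..3); whether this is actually faster was not measured in the thread.

```c
#include <assert.h>
#include <emmintrin.h>

/* Component indices; assumed to match RCOMP, GCOMP, BCOMP, ACOMP. */
enum { R = 0, G = 1, B = 2, A = 3 };

static void
clamp_scale_and_map(unsigned n, const float src[][4], float dst[][4],
                    float lo, float hi, const float scale[4],
                    const float *rMap, const float *gMap,
                    const float *bMap, const float *aMap)
{
   const __m128 vlo = _mm_set1_ps(lo);
   const __m128 vhi = _mm_set1_ps(hi);
   const __m128 vscale = _mm_loadu_ps(scale);
   unsigned i;

   for (i = 0; i < n; i++) {
      int idx[4];
      __m128 v = _mm_loadu_ps(src[i]);

      /* Clamp to [lo, hi], then scale to table-index range. */
      v = _mm_min_ps(_mm_max_ps(v, vlo), vhi);
      v = _mm_mul_ps(v, vscale);

      /* Truncating float->int conversion; spill all four indices once. */
      _mm_storeu_si128((__m128i *) idx, _mm_cvttps_epi32(v));

      /* Scalar lookups, written directly to the destination pixel. */
      dst[i][R] = rMap[idx[R]];
      dst[i][G] = gMap[idx[G]];
      dst[i][B] = bMap[idx[B]];
      dst[i][A] = aMap[idx[A]];
   }
}
```

The caller is assumed to guarantee that lo >= 0 and that hi * scale[c] stays within each table, as in the scalar code this replaces.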
[Mesa-dev] [PATCH 2/3][RFC v2] mesa/main/x86: Add sse2 streaming clamping
Signed-off-by: Juha-Pekka Heikkila <juhapekka.heikk...@gmail.com>
---
 src/mesa/Makefile.am              |   8 +++
 src/mesa/main/x86/sse2_clamping.c | 103 ++
 src/mesa/main/x86/sse2_clamping.h |  49 ++
 3 files changed, 160 insertions(+)
 create mode 100644 src/mesa/main/x86/sse2_clamping.c
 create mode 100644 src/mesa/main/x86/sse2_clamping.h

diff --git a/src/mesa/Makefile.am b/src/mesa/Makefile.am
index e71bccb..5d3c6f5 100644
--- a/src/mesa/Makefile.am
+++ b/src/mesa/Makefile.am
@@ -111,6 +111,10 @@ if SSE41_SUPPORTED
 ARCH_LIBS += libmesa_sse41.la
 endif
 
+if SSE2_SUPPORTED
+ARCH_LIBS += libmesa_sse2.la
+endif
+
 MESA_ASM_FILES_FOR_ARCH =
 
 if HAVE_X86_ASM
@@ -154,6 +158,10 @@ libmesa_sse41_la_SOURCES = \
 	main/streaming-load-memcpy.c
 libmesa_sse41_la_CFLAGS = $(AM_CFLAGS) -msse4.1
 
+libmesa_sse2_la_SOURCES = \
+	main/x86/sse2_clamping.c
+libmesa_sse2_la_CFLAGS = $(AM_CFLAGS) -msse2
+
 pkgconfigdir = $(libdir)/pkgconfig
 pkgconfig_DATA = gl.pc
diff --git a/src/mesa/main/x86/sse2_clamping.c b/src/mesa/main/x86/sse2_clamping.c
new file mode 100644
index 0000000..7df1c85
--- /dev/null
+++ b/src/mesa/main/x86/sse2_clamping.c
@@ -0,0 +1,103 @@
+/*
+ * Copyright © 2014 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ *
+ * Authors:
+ *    Juha-Pekka Heikkila <juhapekka.heikk...@gmail.com>
+ *
+ */
+
+#ifdef __SSE2__
+#include "main/macros.h"
+#include "main/x86/sse2_clamping.h"
+#include <emmintrin.h>
+
+/**
+ * Clamp four float values to [min,max]
+ */
+static inline void
+_mesa_clamp_float_rgba(GLfloat src[4], GLfloat result[4], const float min,
+                       const float max)
+{
+   __m128 operand, minval, maxval;
+
+   operand = _mm_loadu_ps(src);
+   minval = _mm_set1_ps(min);
+   maxval = _mm_set1_ps(max);
+   operand = _mm_max_ps(operand, minval);
+   operand = _mm_min_ps(operand, maxval);
+   _mm_storeu_ps(result, operand);
+}
+
+
+/* Clamp n amount float rgba pixels to [min,max] using SSE2
+ */
+void
+_mesa_streaming_clamp_float_rgba(const GLuint n, GLfloat rgba_src[][4],
+                                 GLfloat rgba_dst[][4], const GLfloat min,
+                                 const GLfloat max)
+{
+   int i;
+
+   for (i = 0; i < n; i++) {
+      _mesa_clamp_float_rgba(rgba_src[i], rgba_dst[i], min, max);
+   }
+}
+
+
+/* Clamp n amount float rgba pixels to [min,max] using SSE2 and apply
+ * scaling and mapping to components.
+ *
+ * this replace handling of [RGBA] channels:
+ * rgba_temp[RCOMP] = CLAMP(rgba[i][RCOMP], 0.0F, 1.0F);
+ * rgba[i][RCOMP] = rMap[F_TO_I(rgba_temp[RCOMP] * scale[RCOMP])];
+ */
+void
+_mesa_clamp_float_rgba_scale_and_map(const GLuint n, GLfloat rgba_src[][4],
+                                     GLfloat rgba_dst[][4], const GLfloat min,
+                                     const GLfloat max,
+                                     const GLfloat scale[4],
+                                     const GLfloat* rMap, const GLfloat* gMap,
+                                     const GLfloat* bMap, const GLfloat* aMap)
+{
+   int i;
+   GLfloat __attribute__((aligned(16))) temp[4];
+   __m128 *operand = (__m128*) temp, multiplier, mmove;
+   __m128i truncated_integers;
+
+   const unsigned int* map_p = (const unsigned int*) &truncated_integers;
+
+   multiplier = _mm_loadu_ps(scale);
+
+   for(i = 0; i < n; i++) {
+      _mesa_clamp_float_rgba(rgba_src[i], temp, min, max);
+
+      *operand = _mm_mul_ps(multiplier, *operand);
+      truncated_integers = _mm_cvttps_epi32(*operand);
+      mmove = _mm_set_ps(aMap[map_p[ACOMP]], bMap[map_p[BCOMP]],
+                         gMap[map_p[GCOMP]], rMap[map_p[RCOMP]] );
+
+      _mm_storeu_ps(rgba_dst[i], mmove);
+   }
+}
+
+
+#endif /* __SSE2__ */
diff --git a/src/mesa/main/x86/sse2_clamping.h b/src/mesa/main/x86/sse2_clamping.h
new file mode 100644
index 0000000..688fab7
--- /dev/null
Re: [Mesa-dev] [PATCH 2/3][RFC v2] mesa/main/x86: Add sse2 streaming clamping
Hello.

I'd get rid of _mm_set1_ps inside _mesa_clamp_float_rgba by passing __m128 versions of min/max directly, so that _mm_set1_ps is moved out of the for loop.

I'd also unroll the _mesa_streaming_clamp_float_rgba loop to minimize the loop overhead (and utilize out-of-order execution as a bonus), because nothing compute-intensive is happening there. You can also use prefetching (_mm_prefetch) there to improve performance by reading data ahead from memory.

Best regards,
Siavash Eliasi.
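The three suggestions above could be combined along these lines. This is only a sketch: clamp4, clamp_rgba_unrolled and PREFETCH_PIXELS are names invented for the example, the unroll factor of two and the prefetch distance are untuned guesses, and no benchmark backs any of it.

```c
#include <assert.h>
#include <emmintrin.h>   /* SSE2; also provides _mm_prefetch via xmmintrin.h */

/* How many pixels ahead to prefetch; an arbitrary, untuned value. */
#define PREFETCH_PIXELS 16

/* Helper takes the already-broadcast bounds, so no _mm_set1_ps in here. */
static inline __m128
clamp4(__m128 v, __m128 vlo, __m128 vhi)
{
   return _mm_min_ps(_mm_max_ps(v, vlo), vhi);
}

static void
clamp_rgba_unrolled(unsigned n, const float src[][4], float dst[][4],
                    float lo, float hi)
{
   /* Broadcast min/max once, outside the loop. */
   const __m128 vlo = _mm_set1_ps(lo);
   const __m128 vhi = _mm_set1_ps(hi);
   unsigned i = 0;

   /* Main loop: two independent pixels per iteration. */
   for (; i + 2 <= n; i += 2) {
      __m128 a, b;

      /* Only prefetch addresses that are still inside the source array. */
      if (i + PREFETCH_PIXELS < n)
         _mm_prefetch((const char *) src[i + PREFETCH_PIXELS], _MM_HINT_T0);

      a = _mm_loadu_ps(src[i]);
      b = _mm_loadu_ps(src[i + 1]);
      _mm_storeu_ps(dst[i], clamp4(a, vlo, vhi));
      _mm_storeu_ps(dst[i + 1], clamp4(b, vlo, vhi));
   }

   /* Tail: at most one leftover pixel when n is odd. */
   for (; i < n; i++)
      _mm_storeu_ps(dst[i], clamp4(_mm_loadu_ps(src[i]), vlo, vhi));
}
```

For a purely sequential access pattern like this one, hardware prefetchers may already cover what the explicit _mm_prefetch does, so it would need measuring before being kept.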
Re: [Mesa-dev] [PATCH 2/3][RFC v2] mesa/main/x86: Add sse2 streaming clamping
On Tue, Nov 4, 2014 at 6:05 AM, Juha-Pekka Heikkila <juhapekka.heikk...@gmail.com> wrote:
> Signed-off-by: Juha-Pekka Heikkila <juhapekka.heikk...@gmail.com>
> [...]
> +/* Clamp n amount float rgba pixels to [min,max] using SSE2

Conceptually, _mesa_streaming_clamp_float_rgba() is clamping a contiguous array of floats to some min/max value. The fact that they are pixels is somewhat incidental when looking at it from a stream perspective. It looks like the code is more or less just operating on n*4 floats. Given that, a more efficient implementation would check alignment and then use aligned loads and streaming stores. It doesn't really matter if you straddle pixel boundaries as long as each float is operated on. I'm not sure how much effort you want to put into this though. :)

> + */
> +void
> +_mesa_streaming_clamp_float_rgba(const GLuint n, GLfloat rgba_src[][4],
> +                                 GLfloat rgba_dst[][4], const GLfloat min,
> +                                 const GLfloat max)
> +{
> +   int i;
> +
> +   for (i = 0; i < n; i++) {
> +      _mesa_clamp_float_rgba(rgba_src[i], rgba_dst[i], min, max);
> +   }
> +}
> [...]
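A minimal sketch of the "check alignment, then use aligned loads and streaming stores" idea, treating the pixels as a flat run of floats. The function name clamp_floats is made up for this example, and the all-or-nothing alignment check is a simplification (a fuller version might peel leading elements until the destination is aligned). No performance claim is implied; the thread itself has no measurements.

```c
#include <assert.h>
#include <emmintrin.h>
#include <stddef.h>
#include <stdint.h>

static void
clamp_floats(size_t count, const float *src, float *dst, float lo, float hi)
{
   const __m128 vlo = _mm_set1_ps(lo);
   const __m128 vhi = _mm_set1_ps(hi);
   size_t i = 0;

   if ((((uintptr_t) src | (uintptr_t) dst) & 15) == 0) {
      /* Both pointers are 16-byte aligned: aligned loads and
       * non-temporal (cache-bypassing) stores.
       */
      for (; i + 4 <= count; i += 4) {
         __m128 v = _mm_load_ps(src + i);
         _mm_stream_ps(dst + i, _mm_min_ps(_mm_max_ps(v, vlo), vhi));
      }
      /* Make the streaming stores globally visible before returning. */
      _mm_sfence();
   } else {
      /* Fallback: unaligned loads and ordinary stores. */
      for (; i + 4 <= count; i += 4) {
         __m128 v = _mm_loadu_ps(src + i);
         _mm_storeu_ps(dst + i, _mm_min_ps(_mm_max_ps(v, vlo), vhi));
      }
   }

   /* Scalar tail for counts that are not a multiple of four. */
   for (; i < count; i++) {
      float x = src[i];
      dst[i] = x < lo ? lo : (x > hi ? hi : x);
   }
}
```

For n RGBA pixels the call would be clamp_floats(n * 4, &rgba_src[0][0], &rgba_dst[0][0], min, max). Streaming stores only tend to pay off when the destination is not read again soon, which may not hold if the clamped pixels are consumed immediately afterwards.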
Re: [Mesa-dev] [PATCH 2/3][RFC v2] mesa/main/x86: Add sse2 streaming clamping
On 04.11.2014 at 13:05, Juha-Pekka Heikkila wrote:
> Signed-off-by: Juha-Pekka Heikkila <juhapekka.heikk...@gmail.com>
> [...]
> +   for(i = 0; i < n; i++) {
> +      _mesa_clamp_float_rgba(rgba_src[i], temp, min, max);
> +
> +      *operand = _mm_mul_ps(multiplier, *operand);
> +      truncated_integers = _mm_cvttps_epi32(*operand);
> +      mmove = _mm_set_ps(aMap[map_p[ACOMP]], bMap[map_p[BCOMP]],
> +                         gMap[map_p[GCOMP]], rMap[map_p[RCOMP]] );
> +