Re: [Mesa-dev] [PATCH V5] mesa: add SSE optimisation for glDrawElements
On Thu, 2014-11-06 at 21:00 -0800, Matt Turner wrote:
> On Thu, Nov 6, 2014 at 8:56 PM, Siavash Eliasi siavashser...@gmail.com wrote:
> > Then I do recommend removing the if (cpu_has_sse4_1) from this patch and similar places, because there is no runtime CPU dispatching happening for SSE optimized code paths in action and just adds extra overhead (unnecessary branches) to the generated code.
> No. Sorry, I realize I misread your previous question:
> > I guess checking for cpu_has_sse4_1 is unnecessary if it isn't controllable by user at runtime; because USE_SSE41 is a compile time check and requires the target machine to be SSE 4.1 capable already.
> USE_SSE41 is set if the *compiler* supports SSE 4.1. This allows you to build the code and then use it only on systems that actually support it. All of this could have been pretty easily answered by a few greps though...

I wonder what difference it would make to have an option to compile out the run-time check code to avoid the additional overhead in cases where the builder *knows* at compile time what the run-time system is? (ie Gentoo)

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH V5] mesa: add SSE optimisation for glDrawElements
On Fri, 2014-11-07 at 11:44 +, Steven Newbury wrote:
> On Thu, 2014-11-06 at 21:00 -0800, Matt Turner wrote:
> > On Thu, Nov 6, 2014 at 8:56 PM, Siavash Eliasi siavashser...@gmail.com wrote:
> > > Then I do recommend removing the if (cpu_has_sse4_1) from this patch and similar places, because there is no runtime CPU dispatching happening for SSE optimized code paths in action and just adds extra overhead (unnecessary branches) to the generated code.
> > No. Sorry, I realize I misread your previous question:
> > > I guess checking for cpu_has_sse4_1 is unnecessary if it isn't controllable by user at runtime; because USE_SSE41 is a compile time check and requires the target machine to be SSE 4.1 capable already.
> > USE_SSE41 is set if the *compiler* supports SSE 4.1. This allows you to build the code and then use it only on systems that actually support it. All of this could have been pretty easily answered by a few greps though...
> I wonder what difference it would make to have an option to compile out the run-time check code to avoid the additional overhead in cases where the builder *knows* at compile time what the run-time system is? (ie Gentoo)

As long as the check is placed in the right location, i.e. just outside the hotspot and not inside it, it shouldn't really make a noticeable difference.

What will have more impact is not being able to inline certain code, such as in the latest patchset I sent out. It seems this is another side effect of the way gcc handles intrinsics.
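The point about placing the check outside the hotspot can be sketched roughly as follows. This is only an illustration: have_fast_path, max_generic, max_fast and array_max are hypothetical names, not Mesa functions, and max_fast is a placeholder that simply forwards to the generic loop where a real SIMD variant would go.

```c
#include <stddef.h>

/* Hypothetical flag standing in for a macro such as cpu_has_sse4_1. */
int have_fast_path = 0;

/* Portable reference loop: this is the "hotspot". */
unsigned max_generic(const unsigned *v, size_t n)
{
   unsigned m = 0;
   for (size_t i = 0; i < n; i++)
      if (v[i] > m)
         m = v[i];
   return m;
}

/* Placeholder for an optimised variant; it must return the same result. */
unsigned max_fast(const unsigned *v, size_t n)
{
   return max_generic(v, n);
}

/* The feature test runs once per call, before the loop, never per element. */
unsigned array_max(const unsigned *v, size_t n)
{
   if (have_fast_path)
      return max_fast(v, n);
   return max_generic(v, n);
}
```

With this layout the cost of the check is one predictable branch per call, however many elements the loop then processes.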
Re: [Mesa-dev] [PATCH V5] mesa: add SSE optimisation for glDrawElements
On 11/07/2014 03:14 PM, Steven Newbury wrote:
> On Thu, 2014-11-06 at 21:00 -0800, Matt Turner wrote:
> > On Thu, Nov 6, 2014 at 8:56 PM, Siavash Eliasi siavashser...@gmail.com wrote:
> > > Then I do recommend removing the if (cpu_has_sse4_1) from this patch and similar places, because there is no runtime CPU dispatching happening for SSE optimized code paths in action and just adds extra overhead (unnecessary branches) to the generated code.
> > No. Sorry, I realize I misread your previous question:
> > > I guess checking for cpu_has_sse4_1 is unnecessary if it isn't controllable by user at runtime; because USE_SSE41 is a compile time check and requires the target machine to be SSE 4.1 capable already.
> > USE_SSE41 is set if the *compiler* supports SSE 4.1. This allows you to build the code and then use it only on systems that actually support it. All of this could have been pretty easily answered by a few greps though...
> I wonder what difference it would make to have an option to compile out the run-time check code to avoid the additional overhead in cases where the builder *knows* at compile time what the run-time system is? (ie Gentoo)

I think that's possible. Since cpu_has_sse4_1 and friends are simply macros, one can set them to true or 1 during compile time if it's going to be built for an SSE 4.1 capable target so your smart compiler will totally get rid of the unnecessary runtime check.

I guess common_x86_features.h should be modified to something like this:

#ifdef __SSE4_1__
#define cpu_has_sse4_1 1
#else
#define cpu_has_sse4_1 (_mesa_x86_cpu_features & X86_FEATURE_SSE4_1)
#endif
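A minimal, self-contained sketch of how such a macro lets the compiler drop the branch. The bit value given to X86_FEATURE_SSE4_1 and the pick_path helper are assumptions made for illustration only; just the macro and variable names come from the thread.

```c
/* Stand-ins for the Mesa names discussed in the thread. */
#define X86_FEATURE_SSE4_1 (1u << 0)   /* bit position is an assumption */
unsigned _mesa_x86_cpu_features;       /* filled in by run-time detection */

#ifdef __SSE4_1__
/* Built with -msse4.1 (or an -march that implies it): a constant. */
#define cpu_has_sse4_1 1
#else
/* Otherwise fall back to the run-time feature word. */
#define cpu_has_sse4_1 (_mesa_x86_cpu_features & X86_FEATURE_SSE4_1)
#endif

/* Hypothetical caller: returns 1 for the SSE 4.1 path, 0 for the generic one. */
int pick_path(void)
{
   if (cpu_has_sse4_1)
      return 1;
   return 0;
}
```

When __SSE4_1__ is defined the condition is the constant 1, so an optimising compiler can remove the test and the generic path. Note that __SSE4_1__ reflects the flags of the translation unit being compiled, so this presumably only helps when the file containing the check is itself built for an SSE 4.1 target.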
Re: [Mesa-dev] [PATCH V5] mesa: add SSE optimisation for glDrawElements
On Fri Nov 7 14:09:09 2014 GMT, Siavash Eliasi wrote:
> On 11/07/2014 03:14 PM, Steven Newbury wrote:
> > On Thu, 2014-11-06 at 21:00 -0800, Matt Turner wrote:
> > > On Thu, Nov 6, 2014 at 8:56 PM, Siavash Eliasi siavashser...@gmail.com wrote:
> > > > Then I do recommend removing the if (cpu_has_sse4_1) from this patch and similar places, because there is no runtime CPU dispatching happening for SSE optimized code paths in action and just adds extra overhead (unnecessary branches) to the generated code.
> > > No. Sorry, I realize I misread your previous question:
> > > > I guess checking for cpu_has_sse4_1 is unnecessary if it isn't controllable by user at runtime; because USE_SSE41 is a compile time check and requires the target machine to be SSE 4.1 capable already.
> > > USE_SSE41 is set if the *compiler* supports SSE 4.1. This allows you to build the code and then use it only on systems that actually support it. All of this could have been pretty easily answered by a few greps though...
> > I wonder what difference it would make to have an option to compile out the run-time check code to avoid the additional overhead in cases where the builder *knows* at compile time what the run-time system is? (ie Gentoo)
> I think that's possible. Since cpu_has_sse4_1 and friends are simply macros, one can set them to true or 1 during compile time if it's going to be built for an SSE 4.1 capable target so your smart compiler will totally get rid of the unnecessary runtime check.
> I guess common_x86_features.h should be modified to something like this:
> #ifdef __SSE4_1__
> #define cpu_has_sse4_1 1
> #else
> #define cpu_has_sse4_1 (_mesa_x86_cpu_features & X86_FEATURE_SSE4_1)
> #endif

Yes, this was what I was thinking. Then perhaps an option for disabling run-time detection, with the available cpu features then determined during configuration setting appropriate defines. Whether it's worth it I don't know. I can imagine the compiler having an easier job optimizing the code.
Re: [Mesa-dev] [PATCH V5] mesa: add SSE optimisation for glDrawElements
On 11/07/2014 06:09 AM, Siavash Eliasi wrote:
> On 11/07/2014 03:14 PM, Steven Newbury wrote:
> > On Thu, 2014-11-06 at 21:00 -0800, Matt Turner wrote:
> > > On Thu, Nov 6, 2014 at 8:56 PM, Siavash Eliasi siavashser...@gmail.com wrote:
> > > > Then I do recommend removing the if (cpu_has_sse4_1) from this patch and similar places, because there is no runtime CPU dispatching happening for SSE optimized code paths in action and just adds extra overhead (unnecessary branches) to the generated code.
> > > No. Sorry, I realize I misread your previous question:
> > > > I guess checking for cpu_has_sse4_1 is unnecessary if it isn't controllable by user at runtime; because USE_SSE41 is a compile time check and requires the target machine to be SSE 4.1 capable already.
> > > USE_SSE41 is set if the *compiler* supports SSE 4.1. This allows you to build the code and then use it only on systems that actually support it. All of this could have been pretty easily answered by a few greps though...
> > I wonder what difference it would make to have an option to compile out the run-time check code to avoid the additional overhead in cases where the builder *knows* at compile time what the run-time system is? (ie Gentoo)
> I think that's possible. Since cpu_has_sse4_1 and friends are simply macros, one can set them to true or 1 during compile time if it's going to be built for an SSE 4.1 capable target so your smart compiler will totally get rid of the unnecessary runtime check.
> I guess common_x86_features.h should be modified to something like this:
> #ifdef __SSE4_1__
> #define cpu_has_sse4_1 1
> #else
> #define cpu_has_sse4_1 (_mesa_x86_cpu_features & X86_FEATURE_SSE4_1)
> #endif

I was thinking about doing something similar for cpu_has_xmm and cpu_has_xmm2 for x64. SSE and SSE2 are required parts of that instruction set, so they're always there.
Re: [Mesa-dev] [PATCH V5] mesa: add SSE optimisation for glDrawElements
Then I do recommend removing the if (cpu_has_sse4_1) from this patch and similar places, because there is no runtime CPU dispatching happening for SSE optimized code paths in action and just adds extra overhead (unnecessary branches) to the generated code.

Same must be applied to these patches:

[Mesa-dev] [PATCH 2/2] i965: add runtime check for SSSE3 rgba8_copy
http://lists.freedesktop.org/archives/mesa-dev/2014-November/070256.html

Best regards,
Siavash Eliasi.
Re: [Mesa-dev] [PATCH V5] mesa: add SSE optimisation for glDrawElements
On 11/07/2014 07:31 PM, Ian Romanick wrote:
> On 11/07/2014 06:09 AM, Siavash Eliasi wrote:
> > On 11/07/2014 03:14 PM, Steven Newbury wrote:
> > > On Thu, 2014-11-06 at 21:00 -0800, Matt Turner wrote:
> > > > On Thu, Nov 6, 2014 at 8:56 PM, Siavash Eliasi siavashser...@gmail.com wrote:
> > > > > Then I do recommend removing the if (cpu_has_sse4_1) from this patch and similar places, because there is no runtime CPU dispatching happening for SSE optimized code paths in action and just adds extra overhead (unnecessary branches) to the generated code.
> > > > No. Sorry, I realize I misread your previous question:
> > > > > I guess checking for cpu_has_sse4_1 is unnecessary if it isn't controllable by user at runtime; because USE_SSE41 is a compile time check and requires the target machine to be SSE 4.1 capable already.
> > > > USE_SSE41 is set if the *compiler* supports SSE 4.1. This allows you to build the code and then use it only on systems that actually support it. All of this could have been pretty easily answered by a few greps though...
> > > I wonder what difference it would make to have an option to compile out the run-time check code to avoid the additional overhead in cases where the builder *knows* at compile time what the run-time system is? (ie Gentoo)
> > I think that's possible. Since cpu_has_sse4_1 and friends are simply macros, one can set them to true or 1 during compile time if it's going to be built for an SSE 4.1 capable target so your smart compiler will totally get rid of the unnecessary runtime check.
> > I guess common_x86_features.h should be modified to something like this:
> > #ifdef __SSE4_1__
> > #define cpu_has_sse4_1 1
> > #else
> > #define cpu_has_sse4_1 (_mesa_x86_cpu_features & X86_FEATURE_SSE4_1)
> > #endif
> I was thinking about doing something similar for cpu_has_xmm and cpu_has_xmm2 for x64. SSE and SSE2 are required parts of that instruction set, so they're always there.

I can come up with a patch implementing the same for SSE, SSE2, SSE3 and SSSE3 if the current approach is fine by you.
Re: [Mesa-dev] [PATCH V5] mesa: add SSE optimisation for glDrawElements
On 29.10.2014 14:05, Timothy Arceri wrote:
> Makes use of SSE to speed up compute of min and max elements
>
> Callgrind cpu usage results from pts benchmarks:
>
> Openarena 0.8.8: 3.67% -> 1.03%
> UrbanTerror: 2.36% -> 0.81%
>
> V5:
> - actually make use of the optimisation in android (Emil Velikov)
> - set a better array size limit for using SSE and added TODO
>
> V4:
> - fixed bugs with incrementing pointer and updating counters
>
> V3:
> - Removed sse_minmax.c from Makefile.sources
> - handle the first few values without SSE until the pointer is aligned and use _mm_load_si128 rather than _mm_loadu_si128
> - guard the call to the SSE code better at build time
>
> V2:
> - removed GL* types
> - use _mm_store_si128() rather than _mm_store_ps()
> - add runtime check for SSE
> - use aligned attribute for local min/max
> - bunch of tidyups
>
> Signed-off-by: Timothy Arceri t_arc...@yahoo.com.au
> ---
>  src/mesa/Android.libmesa_dricore.mk |  8 ++-
>  src/mesa/Android.libmesa_st_mesa.mk |  5 ++
>  src/mesa/Makefile.am                |  3 +-
>  src/mesa/main/sse_minmax.c          | 97 +
>  src/mesa/main/sse_minmax.h          | 30
>  src/mesa/vbo/vbo_exec_array.c       | 14 --
>  6 files changed, 152 insertions(+), 5 deletions(-)
>  create mode 100644 src/mesa/main/sse_minmax.c
>  create mode 100644 src/mesa/main/sse_minmax.h
>
> diff --git a/src/mesa/Android.libmesa_dricore.mk b/src/mesa/Android.libmesa_dricore.mk
> index 1e6d948..2ab593d 100644
> --- a/src/mesa/Android.libmesa_dricore.mk
> +++ b/src/mesa/Android.libmesa_dricore.mk
> @@ -51,10 +51,16 @@ endif # MESA_ENABLE_ASM
>  ifeq ($(ARCH_X86_HAVE_SSE4_1),true)
>  LOCAL_SRC_FILES += \
> -    $(SRCDIR)main/streaming-load-memcpy.c
> +    $(SRCDIR)main/streaming-load-memcpy.c \
> +    $(SRCDIR)main/sse_minmax.c
>  LOCAL_CFLAGS := -msse4.1
>  endif
>
> +ifeq ($(ARCH_X86_HAVE_SSE4_1),true)
> +LOCAL_CFLAGS += \
> +    -DUSE_SSE41
> +endif
> +
>  LOCAL_C_INCLUDES := \
>      $(call intermediates-dir-for STATIC_LIBRARIES,libmesa_program,,) \
>      $(MESA_TOP)/src \
> diff --git a/src/mesa/Android.libmesa_st_mesa.mk b/src/mesa/Android.libmesa_st_mesa.mk
> index 8b8d652..618d6bf 100644
> --- a/src/mesa/Android.libmesa_st_mesa.mk
> +++ b/src/mesa/Android.libmesa_st_mesa.mk
> @@ -48,6 +48,11 @@ ifeq ($(TARGET_ARCH),x86)
>  endif # x86
>  endif # MESA_ENABLE_ASM
>
> +ifeq ($(ARCH_X86_HAVE_SSE4_1),true)
> +LOCAL_CFLAGS := \
> +    -DUSE_SSE41
> +endif
> +
>  LOCAL_C_INCLUDES := \
>      $(call intermediates-dir-for STATIC_LIBRARIES,libmesa_program,,) \
>      $(MESA_TOP)/src/gallium/auxiliary \
> diff --git a/src/mesa/Makefile.am b/src/mesa/Makefile.am
> index e71bccb..932db4f 100644
> --- a/src/mesa/Makefile.am
> +++ b/src/mesa/Makefile.am
> @@ -151,7 +151,8 @@ libmesagallium_la_LIBADD = \
>      $(ARCH_LIBS)
>
>  libmesa_sse41_la_SOURCES = \
> -    main/streaming-load-memcpy.c
> +    main/streaming-load-memcpy.c \
> +    main/sse_minmax.c
>  libmesa_sse41_la_CFLAGS = $(AM_CFLAGS) -msse4.1
>
>  pkgconfigdir = $(libdir)/pkgconfig
> diff --git a/src/mesa/main/sse_minmax.c b/src/mesa/main/sse_minmax.c
> new file mode 100644
> index 000..91a55e5
> --- /dev/null
> +++ b/src/mesa/main/sse_minmax.c
> @@ -0,0 +1,97 @@
> +/*
> + * Copyright © 2014 Timothy Arceri
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the "Software"),
> + * to deal in the Software without restriction, including without limitation
> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice (including the next
> + * paragraph) shall be included in all copies or substantial portions of the
> + * Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
> + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
> + * IN THE SOFTWARE.
> + *
> + * Author:
> + *    Timothy Arceri t_arc...@yahoo.com.au
> + *
> + */
> +
> +#ifdef __SSE4_1__
> +#include "main/sse_minmax.h"
> +#include <smmintrin.h>
> +#include <stdint.h>
> +
> +void
> +_mesa_uint_array_min_max(const unsigned *ui_indices, unsigned *min_index,
> +                         unsigned *max_index, const unsigned count)
> +{
> +   unsigned max_ui = 0;
> +   unsigned min_ui = ~0U;
> +   unsigned i = 0;
> +   unsigned aligned_count = count;
> +
> +   /* handle the first few values without SSE until the pointer is aligned */
> +   while (((uintptr_t)ui_indices & 15) && aligned_count) {
> +      if
Re: [Mesa-dev] [PATCH V5] mesa: add SSE optimisation for glDrawElements
On Wed, Nov 5, 2014 at 12:54 PM, Matt Turner matts...@gmail.com wrote:
> On Wed, Nov 5, 2014 at 12:50 PM, Timothy Arceri t_arc...@yahoo.com.au wrote:
> > There have been quite a few eyes over this now but nobody has given it a reviewed by yet. Would be nice to get it in before the code freeze. Any takers?
> Yes, I'll make sure that happens.

I made a couple of trivial changes to the commit message and added some spaces between __m128i and * in casts and pushed it with review.

Thanks!
Re: [Mesa-dev] [PATCH V5] mesa: add SSE optimisation for glDrawElements
How and when is cpu_has_sse4_1 true? Is it controllable at runtime through setting some environmental variable? Or is it set once during startup by detecting CPU features?

I guess checking for cpu_has_sse4_1 is unnecessary if it isn't controllable by user at runtime; because USE_SSE41 is a compile time check and requires the target machine to be SSE 4.1 capable already.

Best regards,
Siavash Eliasi.
Re: [Mesa-dev] [PATCH V5] mesa: add SSE optimisation for glDrawElements
On Thu, Nov 6, 2014 at 1:30 AM, Siavash Eliasi siavashser...@gmail.com wrote:
> How and when is cpu_has_sse4_1 true? Is it controllable at runtime through setting some environmental variable? or is it set once during startup by detecting CPU features?

It's actually a macro, but yes, see the end of src/mesa/x86/common_x86.c. It's set by using the CPUID instruction to detect SSE 4.1 capabilities.

   if (ecx & bit_SSE4_1)
      _mesa_x86_cpu_features |= X86_FEATURE_SSE4_1;

> I guess checking for cpu_has_sse4_1 is unnecessary if it isn't controllable by user at runtime; because USE_SSE41 is a compile time check and requires the target machine to be SSE 4.1 capable already.

Right.
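For readers without the Mesa tree at hand, the detection step can be sketched on its own. This is not the code in common_x86.c; it assumes a GCC or Clang toolchain providing <cpuid.h>, and FEATURE_SSE4_1, CPUID_ECX_SSE4_1 and detect_features are names invented for the example.

```c
#if defined(__i386__) || defined(__x86_64__)
#include <cpuid.h>   /* GCC/Clang helper header providing __get_cpuid */
#endif

#define FEATURE_SSE4_1   (1u << 0)    /* hypothetical flag in our own feature word */
#define CPUID_ECX_SSE4_1 (1u << 19)   /* CPUID leaf 1, ECX bit 19 reports SSE 4.1 */

/* Query the CPU once and return a bitmask of detected features. */
unsigned detect_features(void)
{
   unsigned features = 0;
#if defined(__i386__) || defined(__x86_64__)
   unsigned eax, ebx, ecx, edx;

   /* __get_cpuid returns 0 when the requested leaf is not supported. */
   if (__get_cpuid(1, &eax, &ebx, &ecx, &edx) && (ecx & CPUID_ECX_SSE4_1))
      features |= FEATURE_SSE4_1;
#endif
   return features;
}
```

The result would normally be stored once at start-up and tested through a macro, which is the role cpu_has_sse4_1 plays in the discussion above.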
Re: [Mesa-dev] [PATCH V5] mesa: add SSE optimisation for glDrawElements
On Thu, Nov 6, 2014 at 8:56 PM, Siavash Eliasi siavashser...@gmail.com wrote:
> Then I do recommend removing the if (cpu_has_sse4_1) from this patch and similar places, because there is no runtime CPU dispatching happening for SSE optimized code paths in action and just adds extra overhead (unnecessary branches) to the generated code.

No. Sorry, I realize I misread your previous question:

> I guess checking for cpu_has_sse4_1 is unnecessary if it isn't controllable by user at runtime; because USE_SSE41 is a compile time check and requires the target machine to be SSE 4.1 capable already.

USE_SSE41 is set if the *compiler* supports SSE 4.1. This allows you to build the code and then use it only on systems that actually support it.

All of this could have been pretty easily answered by a few greps though...
Re: [Mesa-dev] [PATCH V5] mesa: add SSE optimisation for glDrawElements
There have been quite a few eyes over this now but nobody has given it a reviewed by yet. Would be nice to get it in before the code freeze. Any takers?

On Wed, 2014-10-29 at 23:05 +1100, Timothy Arceri wrote:
> Makes use of SSE to speed up compute of min and max elements
>
> Callgrind cpu usage results from pts benchmarks:
>
> Openarena 0.8.8: 3.67% -> 1.03%
> UrbanTerror: 2.36% -> 0.81%
>
> V5:
> - actually make use of the optimisation in android (Emil Velikov)
> - set a better array size limit for using SSE and added TODO
>
> V4:
> - fixed bugs with incrementing pointer and updating counters
>
> V3:
> - Removed sse_minmax.c from Makefile.sources
> - handle the first few values without SSE until the pointer is aligned and use _mm_load_si128 rather than _mm_loadu_si128
> - guard the call to the SSE code better at build time
>
> V2:
> - removed GL* types
> - use _mm_store_si128() rather than _mm_store_ps()
> - add runtime check for SSE
> - use aligned attribute for local min/max
> - bunch of tidyups
>
> Signed-off-by: Timothy Arceri t_arc...@yahoo.com.au
> ---
>  src/mesa/Android.libmesa_dricore.mk |  8 ++-
>  src/mesa/Android.libmesa_st_mesa.mk |  5 ++
>  src/mesa/Makefile.am                |  3 +-
>  src/mesa/main/sse_minmax.c          | 97 +
>  src/mesa/main/sse_minmax.h          | 30
>  src/mesa/vbo/vbo_exec_array.c       | 14 --
>  6 files changed, 152 insertions(+), 5 deletions(-)
>  create mode 100644 src/mesa/main/sse_minmax.c
>  create mode 100644 src/mesa/main/sse_minmax.h
Re: [Mesa-dev] [PATCH V5] mesa: add SSE optimisation for glDrawElements
On Wed, Nov 5, 2014 at 12:50 PM, Timothy Arceri t_arc...@yahoo.com.au wrote:
> There have been quite a few eyes over this now but nobody has given it a reviewed by yet. Would be nice to get it in before the code freeze. Any takers?

Yes, I'll make sure that happens.
[Mesa-dev] [PATCH V5] mesa: add SSE optimisation for glDrawElements
Makes use of SSE to speed up compute of min and max elements

Callgrind cpu usage results from pts benchmarks:

Openarena 0.8.8: 3.67% -> 1.03%
UrbanTerror: 2.36% -> 0.81%

V5:
- actually make use of the optimisation in android (Emil Velikov)
- set a better array size limit for using SSE and added TODO

V4:
- fixed bugs with incrementing pointer and updating counters

V3:
- Removed sse_minmax.c from Makefile.sources
- handle the first few values without SSE until the pointer is aligned and use _mm_load_si128 rather than _mm_loadu_si128
- guard the call to the SSE code better at build time

V2:
- removed GL* types
- use _mm_store_si128() rather than _mm_store_ps()
- add runtime check for SSE
- use aligned attribute for local min/max
- bunch of tidyups

Signed-off-by: Timothy Arceri t_arc...@yahoo.com.au
---
 src/mesa/Android.libmesa_dricore.mk |  8 ++-
 src/mesa/Android.libmesa_st_mesa.mk |  5 ++
 src/mesa/Makefile.am                |  3 +-
 src/mesa/main/sse_minmax.c          | 97 +
 src/mesa/main/sse_minmax.h          | 30
 src/mesa/vbo/vbo_exec_array.c       | 14 --
 6 files changed, 152 insertions(+), 5 deletions(-)
 create mode 100644 src/mesa/main/sse_minmax.c
 create mode 100644 src/mesa/main/sse_minmax.h

diff --git a/src/mesa/Android.libmesa_dricore.mk b/src/mesa/Android.libmesa_dricore.mk
index 1e6d948..2ab593d 100644
--- a/src/mesa/Android.libmesa_dricore.mk
+++ b/src/mesa/Android.libmesa_dricore.mk
@@ -51,10 +51,16 @@ endif # MESA_ENABLE_ASM
 ifeq ($(ARCH_X86_HAVE_SSE4_1),true)
 LOCAL_SRC_FILES += \
-    $(SRCDIR)main/streaming-load-memcpy.c
+    $(SRCDIR)main/streaming-load-memcpy.c \
+    $(SRCDIR)main/sse_minmax.c
 LOCAL_CFLAGS := -msse4.1
 endif
 
+ifeq ($(ARCH_X86_HAVE_SSE4_1),true)
+LOCAL_CFLAGS += \
+    -DUSE_SSE41
+endif
+
 LOCAL_C_INCLUDES := \
     $(call intermediates-dir-for STATIC_LIBRARIES,libmesa_program,,) \
     $(MESA_TOP)/src \
diff --git a/src/mesa/Android.libmesa_st_mesa.mk b/src/mesa/Android.libmesa_st_mesa.mk
index 8b8d652..618d6bf 100644
--- a/src/mesa/Android.libmesa_st_mesa.mk
+++ b/src/mesa/Android.libmesa_st_mesa.mk
@@ -48,6 +48,11 @@ ifeq ($(TARGET_ARCH),x86)
 endif # x86
 endif # MESA_ENABLE_ASM
 
+ifeq ($(ARCH_X86_HAVE_SSE4_1),true)
+LOCAL_CFLAGS := \
+    -DUSE_SSE41
+endif
+
 LOCAL_C_INCLUDES := \
     $(call intermediates-dir-for STATIC_LIBRARIES,libmesa_program,,) \
     $(MESA_TOP)/src/gallium/auxiliary \
diff --git a/src/mesa/Makefile.am b/src/mesa/Makefile.am
index e71bccb..932db4f 100644
--- a/src/mesa/Makefile.am
+++ b/src/mesa/Makefile.am
@@ -151,7 +151,8 @@ libmesagallium_la_LIBADD = \
     $(ARCH_LIBS)
 
 libmesa_sse41_la_SOURCES = \
-    main/streaming-load-memcpy.c
+    main/streaming-load-memcpy.c \
+    main/sse_minmax.c
 libmesa_sse41_la_CFLAGS = $(AM_CFLAGS) -msse4.1
 
 pkgconfigdir = $(libdir)/pkgconfig
diff --git a/src/mesa/main/sse_minmax.c b/src/mesa/main/sse_minmax.c
new file mode 100644
index 000..91a55e5
--- /dev/null
+++ b/src/mesa/main/sse_minmax.c
@@ -0,0 +1,97 @@
+/*
+ * Copyright © 2014 Timothy Arceri
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ *
+ * Author:
+ *    Timothy Arceri t_arc...@yahoo.com.au
+ *
+ */
+
+#ifdef __SSE4_1__
+#include "main/sse_minmax.h"
+#include <smmintrin.h>
+#include <stdint.h>
+
+void
+_mesa_uint_array_min_max(const unsigned *ui_indices, unsigned *min_index,
+                         unsigned *max_index, const unsigned count)
+{
+   unsigned max_ui = 0;
+   unsigned min_ui = ~0U;
+   unsigned i = 0;
+   unsigned aligned_count = count;
+
+   /* handle the first few values without SSE until the pointer is aligned */
+   while (((uintptr_t)ui_indices & 15) && aligned_count) {
+      if (*ui_indices > max_ui)
+         max_ui = *ui_indices;
+      if (*ui_indices < min_ui)
+         min_ui = *ui_indices;
+
+      aligned_count--;
+      ui_indices++;
+   }
+
+
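The rest of the patch is cut off in this archive, so the following is only a sketch of the general technique the changelog describes (scalar prologue until the pointer is 16-byte aligned, then aligned 128-bit loads with unsigned min/max), not the code that was committed. The function name uint_array_min_max and its parameter order are made up for the example, and it assumes a 32-bit unsigned. It uses unaligned stores for the final lane reduction for brevity, whereas the changelog mentions aligned locals with _mm_store_si128, and it falls back to a plain loop when the file is not built with SSE 4.1 enabled.

```c
#include <stddef.h>
#include <stdint.h>
#ifdef __SSE4_1__
#include <smmintrin.h>   /* _mm_min_epu32 / _mm_max_epu32 are SSE 4.1 */
#endif

/* Hypothetical helper: writes the smallest and largest element of v[0..count). */
void
uint_array_min_max(const unsigned *v, size_t count,
                   unsigned *min_out, unsigned *max_out)
{
   unsigned min_v = ~0U;
   unsigned max_v = 0;
   size_t i = 0;

#ifdef __SSE4_1__
   /* Scalar prologue until v + i is 16-byte aligned. */
   while (i < count && ((uintptr_t)(v + i) & 15)) {
      if (v[i] > max_v) max_v = v[i];
      if (v[i] < min_v) min_v = v[i];
      i++;
   }

   if (count - i >= 4) {
      __m128i vmin = _mm_set1_epi32((int)min_v);
      __m128i vmax = _mm_set1_epi32((int)max_v);
      unsigned lanes_min[4], lanes_max[4];

      /* Four indices per iteration, using aligned loads. */
      for (; i + 4 <= count; i += 4) {
         __m128i x = _mm_load_si128((const __m128i *)(v + i));
         vmin = _mm_min_epu32(vmin, x);
         vmax = _mm_max_epu32(vmax, x);
      }

      /* Reduce the four lanes back to scalars. */
      _mm_storeu_si128((__m128i *)lanes_min, vmin);
      _mm_storeu_si128((__m128i *)lanes_max, vmax);
      for (int k = 0; k < 4; k++) {
         if (lanes_min[k] < min_v) min_v = lanes_min[k];
         if (lanes_max[k] > max_v) max_v = lanes_max[k];
      }
   }
#endif

   /* Scalar tail, or the whole array when SSE 4.1 is not enabled. */
   for (; i < count; i++) {
      if (v[i] > max_v) max_v = v[i];
      if (v[i] < min_v) min_v = v[i];
   }

   *min_out = min_v;
   *max_out = max_v;
}
```

For an empty array the sketch returns min = ~0U and max = 0, the same initial values the patch uses for min_ui and max_ui.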