On Thu, 2014-10-23 at 09:20 -0700, Matt Turner wrote:
> On Thu, Oct 23, 2014 at 2:13 AM, Timothy Arceri
> <t_arc...@yahoo.com.au> wrote:
> > On Wed, 2014-10-22 at 22:49 -0700, Matt Turner wrote:
> > > On Wed, Oct 22, 2014 at 10:30 PM, Matt Turner
> > > <matts...@gmail.com> wrote:
> > > > On Wed, Oct 22, 2014 at 9:02 PM, Timothy Arceri
> > > > <t_arc...@yahoo.com.au> wrote:
> > > > > I almost wasn't going to bother sending this out since it
> > > > > uses SSE4.1 and it's recommended to use glDrawRangeElements
> > > > > anyway. But since these games are still often used for
> > > > > benchmarking I thought I'd see if anyone is interested in
> > > > > this. I only optimised GL_UNSIGNED_INT as that was the only
> > > > > place these games were hitting, but I guess it wouldn't hurt
> > > > > to optimise the other cases too.
> > > >
> > > > I think it's kind of neat!
> > > >
> > > > It might also be fun to try to do this with OpenMP. OpenMP 3.1
> > > > (supported since gcc-4.7) supports min/max reduction operators.
> >
> > I've never really looked into OpenMP before, but very cool :)
> >
> > It seems simd support wasn't added until 4.0 (gcc-4.9) so using 3.1
> > would require threading. Probably best just to go with 4.0.
>
> Oh, that's unfortunate. I didn't notice because I'm using 4.9.1 and
> was too preoccupied with finding out when min/max reductions had been
> added.
>
> > > I think all you'd need to do for that is to add this pragma
> > > immediately before the for loop in vbo_exec_array.c:
> > >
> > > #if _OPENMP > ... (have to figure out the date for OMP 3.1)
> > > #pragma omp simd reduction(max:max_ui) reduction(min:min_ui)
> > > #endif
> > >
> > > and then change the inner loop to use ternary for min/max:
> > >
> > > max_ui = ui_indices[i] > max_ui ? ui_indices[i] : max_ui;
> > > min_ui = ui_indices[i] < min_ui ? ui_indices[i] : min_ui;
> > >
> > > I tested it with a little function and confirmed that it
> > > generates SSE4.1/AVX2 instructions (and even a bunch of SSE2
> > > instructions when 4.1 isn't available!) depending on the -march=
> > > value I pass.
> >
> > I assume this means there isn't a way to tell OpenMP to build
> > multiple versions and select the best one at runtime, so distros
> > would always just ship SSE2? Anyway I'm going to give the SSE2 code
> > a run on my (6 year old) desktop and see how it performs. I will
> > also compare it to my SSE4.1 code on my laptop; maybe it won't be
> > too big of a difference.
>
> I couldn't find a way. :(
>
> I suspect the SSE 4.1 path you proposed will be the best solution
> since we can use it with runtime detection. We might also simply try
> using OpenMP in the sse_minmax.c file, since it'll be built with
> -msse4.1, and seeing how the generated code compares.
>
> While on x86-64 we can at least assume SSE 2, we can't make any
> assumptions on 32-bit, which most games still are.

It doesn't hurt to have a compile-time option/detection though. Not
everybody runs generic code on random computers, that is, pre-compiled
binary distributions.
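
For anyone who wants to try the pragma-plus-ternary idea outside of Mesa, here is a minimal standalone sketch. The function name minmax_uint and its signature are made up for illustration and are not the code in vbo_exec_array.c. The guard value 201307 is the _OPENMP date for OpenMP 4.0, which is where the simd construct first appeared; when the compiler is not run with OpenMP enabled (for example without -fopenmp on gcc) the pragma is skipped and the loop is ordinary scalar C.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical helper (not the Mesa function): scans an index buffer
 * and reports its smallest and largest element.  With count == 0 it
 * returns the initial values UINT_MAX and 0. */
static void
minmax_uint(const unsigned *indices, size_t count,
            unsigned *min_out, unsigned *max_out)
{
   unsigned min_ui = UINT_MAX;
   unsigned max_ui = 0;
   size_t i;

/* 201307 == OpenMP 4.0, the first version with "omp simd".
 * Without OpenMP the pragma is compiled out entirely. */
#if defined(_OPENMP) && _OPENMP >= 201307
#pragma omp simd reduction(max:max_ui) reduction(min:min_ui)
#endif
   for (i = 0; i < count; i++) {
      /* Ternary form, as suggested in the thread. */
      max_ui = indices[i] > max_ui ? indices[i] : max_ui;
      min_ui = indices[i] < min_ui ? indices[i] : min_ui;
   }

   *min_out = min_ui;
   *max_out = max_ui;
}
```

Building this with different -march= values and inspecting the assembly is one way to see which instructions the compiler picks. Since that choice is fixed at compile time, this alone does not give runtime selection.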
_______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev