Hello. I'd get rid of the "_mm_set1_ps" calls inside "_mesa_clamp_float_rgba" by
passing the __m128 versions of min/max directly, so that "_mm_set1_ps" is
moved out of the for loop.
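A minimal sketch of the idea, assuming RGBA pixels stored as 4 consecutive floats. The names "clamp_rgba_sse" and "clamp_rgba_array" are hypothetical and do not reproduce the real Mesa signatures. The per-pixel helper takes min/max already broadcast into __m128 registers, and the caller does the "_mm_set1_ps" broadcast once, before the loop:

```c
#include <assert.h>
#include <stddef.h>
#include <xmmintrin.h> /* SSE: __m128, _mm_set1_ps, _mm_min_ps, _mm_max_ps */

/* Hypothetical helper: clamps one RGBA pixel to [vmin, vmax].
 * No _mm_set1_ps here; the caller passes pre-broadcast vectors. */
static inline __m128
clamp_rgba_sse(__m128 rgba, __m128 vmin, __m128 vmax)
{
   return _mm_min_ps(_mm_max_ps(rgba, vmin), vmax);
}

/* Hypothetical caller: clamps n RGBA pixels (4 floats each) in place. */
void
clamp_rgba_array(float *rgba, size_t n, float min, float max)
{
   assert(rgba != NULL || n == 0);

   /* Broadcast min/max once, outside the loop. */
   const __m128 vmin = _mm_set1_ps(min);
   const __m128 vmax = _mm_set1_ps(max);

   for (size_t i = 0; i < n; i++) {
      /* Unaligned load/store, since alignment is not assumed here. */
      __m128 px = _mm_loadu_ps(rgba + 4 * i);
      _mm_storeu_ps(rgba + 4 * i, clamp_rgba_sse(px, vmin, vmax));
   }
}
```

Since the helper is inline, a compiler may well hoist the broadcast on its own, but passing __m128 arguments makes the hoisting explicit rather than relying on the optimizer.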
I'd also unroll the loop in "_mesa_streaming_clamp_float_rgba" to minimize
loop overhead (and take better advantage of out-of-order execution as a bonus),
because nothing compute-intensive is happening there. You could also use
prefetching (_mm_prefetch) there to improve performance by requesting data
from memory ahead of time.
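A rough sketch of a 4x unrolled streaming clamp with software prefetching, again assuming 4 floats per RGBA pixel. "streaming_clamp_rgba" and "PREFETCH_PIXELS" are hypothetical; the prefetch distance of 16 pixels is a guess that would need benchmarking on real hardware. Each unrolled iteration handles 4 pixels (64 bytes, one cache line when aligned) as four independent load/clamp/store chains that the CPU can overlap:

```c
#include <assert.h>
#include <stddef.h>
#include <xmmintrin.h> /* SSE arithmetic, _mm_prefetch, _MM_HINT_T0 */

/* Hypothetical prefetch distance, in pixels; needs tuning. */
#define PREFETCH_PIXELS 16

/* Hypothetical streaming clamp of n RGBA pixels in place. */
void
streaming_clamp_rgba(float *rgba, size_t n, float min, float max)
{
   assert(rgba != NULL || n == 0);

   const __m128 vmin = _mm_set1_ps(min);
   const __m128 vmax = _mm_set1_ps(max);
   size_t i = 0;

   /* Main loop, unrolled by 4 pixels: less loop overhead and four
    * independent dependency chains per iteration. */
   for (; i + 4 <= n; i += 4) {
      float *p = rgba + 4 * i;

      /* Only form the prefetch address while it stays inside the array,
       * so no out-of-bounds pointer is computed. */
      if (i + PREFETCH_PIXELS < n)
         _mm_prefetch((const char *)(p + 4 * PREFETCH_PIXELS), _MM_HINT_T0);

      __m128 a = _mm_loadu_ps(p);
      __m128 b = _mm_loadu_ps(p + 4);
      __m128 c = _mm_loadu_ps(p + 8);
      __m128 d = _mm_loadu_ps(p + 12);

      _mm_storeu_ps(p,      _mm_min_ps(_mm_max_ps(a, vmin), vmax));
      _mm_storeu_ps(p + 4,  _mm_min_ps(_mm_max_ps(b, vmin), vmax));
      _mm_storeu_ps(p + 8,  _mm_min_ps(_mm_max_ps(c, vmin), vmax));
      _mm_storeu_ps(p + 12, _mm_min_ps(_mm_max_ps(d, vmin), vmax));
   }

   /* Tail: remaining 0..3 pixels, one at a time. */
   for (; i < n; i++) {
      __m128 v = _mm_loadu_ps(rgba + 4 * i);
      _mm_storeu_ps(rgba + 4 * i, _mm_min_ps(_mm_max_ps(v, vmin), vmax));
   }
}
```

Whether the prefetch actually helps depends on the access pattern; for a purely linear scan the hardware prefetcher may already cover it, so it's worth measuring with and without.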
Best regards,
Siavash Eliasi.
_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev