Hello. I'd get rid of the "_mm_set1_ps" calls inside "_mesa_clamp_float_rgba" by passing __m128 versions of min/max directly, so that "_mm_set1_ps" is moved out of the for loop.
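
Roughly what I have in mind is below. This is only a sketch: "clamp_rgba_ps" and "clamp_rgba_array" are made-up names standing in for the real functions, and I'm assuming unaligned loads/stores and a plain min/max clamp, which may not match the patch exactly.

```c
#include <stddef.h>
#include <xmmintrin.h>

/* Hypothetical stand-in for the per-pixel clamp: the bounds arrive
 * already broadcast to all four lanes, so no _mm_set1_ps is needed here. */
static inline void
clamp_rgba_ps(float *dst, const float *src, __m128 vmin, __m128 vmax)
{
    __m128 v = _mm_loadu_ps(src);               /* load R, G, B, A */
    v = _mm_min_ps(_mm_max_ps(v, vmin), vmax);  /* clamp to [min, max] */
    _mm_storeu_ps(dst, v);
}

/* Hypothetical caller: the broadcast happens once, before the loop. */
static void
clamp_rgba_array(float *dst, const float *src, size_t n,
                 float min, float max)
{
    const __m128 vmin = _mm_set1_ps(min);
    const __m128 vmax = _mm_set1_ps(max);

    for (size_t i = 0; i < n; i++)
        clamp_rgba_ps(dst + 4 * i, src + 4 * i, vmin, vmax);
}
```

A compiler may already hoist the broadcast once the helper is inlined, but passing the __m128 values explicitly doesn't depend on that.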

I'd also unroll the "_mesa_streaming_clamp_float_rgba" loop to reduce the loop overhead (and, as a bonus, give out-of-order execution more independent work), since nothing compute-intensive is happening there. You could also use prefetching (_mm_prefetch) there to improve performance by fetching data from memory ahead of its use.
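
A rough sketch of the unrolled loop with prefetching follows. The function name "clamp_rgba_unrolled", the unroll factor of 4 pixels and the PREFETCH_PIXELS distance are all assumptions on my side; the distance in particular would need tuning on real hardware, and I'm using plain unaligned loads/stores rather than whatever the streaming variant actually uses.

```c
#include <stddef.h>
#include <xmmintrin.h>

/* Hypothetical look-ahead distance in pixels; needs benchmarking. */
#define PREFETCH_PIXELS 16

static void
clamp_rgba_unrolled(float *dst, const float *src, size_t n,
                    float min, float max)
{
    const __m128 vmin = _mm_set1_ps(min);
    const __m128 vmax = _mm_set1_ps(max);
    size_t i = 0;

    /* Main loop: 4 pixels (16 floats, 64 bytes) per iteration. */
    for (; i + 4 <= n; i += 4) {
        const float *s = src + 4 * i;
        float *d = dst + 4 * i;

        /* Ask for data a few iterations ahead, staying inside the array. */
        if (i + PREFETCH_PIXELS < n)
            _mm_prefetch((const char *)(src + 4 * (i + PREFETCH_PIXELS)),
                         _MM_HINT_T0);

        /* Four independent load/clamp chains the CPU can overlap. */
        __m128 p0 = _mm_loadu_ps(s + 0);
        __m128 p1 = _mm_loadu_ps(s + 4);
        __m128 p2 = _mm_loadu_ps(s + 8);
        __m128 p3 = _mm_loadu_ps(s + 12);

        p0 = _mm_min_ps(_mm_max_ps(p0, vmin), vmax);
        p1 = _mm_min_ps(_mm_max_ps(p1, vmin), vmax);
        p2 = _mm_min_ps(_mm_max_ps(p2, vmin), vmax);
        p3 = _mm_min_ps(_mm_max_ps(p3, vmin), vmax);

        _mm_storeu_ps(d + 0, p0);
        _mm_storeu_ps(d + 4, p1);
        _mm_storeu_ps(d + 8, p2);
        _mm_storeu_ps(d + 12, p3);
    }

    /* Remainder: up to 3 leftover pixels, one at a time. */
    for (; i < n; i++) {
        __m128 p = _mm_loadu_ps(src + 4 * i);
        p = _mm_min_ps(_mm_max_ps(p, vmin), vmax);
        _mm_storeu_ps(dst + 4 * i, p);
    }
}
```

Whether the prefetch helps at all depends on the access pattern and on how well the hardware prefetcher already copes, so it is worth measuring with and without it.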

Best regards,
Siavash Eliasi.
_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev
