Since we have SIMD32 support available for fragment shaders, it would be nice to actually enable it. The changes proposed here are by no means meant as the final solution to SIMD32 selection; they are a way to enable SIMD32 in case a customer absolutely needs it for performance before we have a proper heuristic in place. The heuristic is mainly trying to limit regressions.

These heuristics look at a few things to decide whether to use a SIMD32 shader:

1) Number of enabled MRTs
2) Number of grouped texture fetches
3) Instruction count ratio between SIMD16 and SIMD32

The reasons are that multiple writes tend to thrash the render cache and multiple grouped texture fetches tend to thrash the sampler and L3 caches. With these things being equal, SIMD32 usually still performs better than or as well as SIMD16, as long as it can compensate for latency, even if it has a few more instructions than its SIMD16 counterpart.

A proper heuristic would look at whether the shader *actually* can compensate for latency, which requires some integration with the scheduler. At the moment, however, the scheduler reports rather odd numbers for the cycle counts. To alleviate problems with SIMD32, the scheduler should also try to schedule texture fetches in smaller groups in general.

The default values have been tweaked so that most of the time we get benefits and few regressions from enabling SIMD32. In my runs, mostly on BXT, the biggest boost and regression are as follows:

+38.5% in GLBench5 ALU2
-7.1% in GLBenchmark fill test

The results differ depending on the platform: SKL both regresses and gains less, while BSW regresses more and gains less than BXT.

As this is an experimental patch, it is not on by default but has to be enabled via INTEL_DEBUG, just like forcing SIMD32 on. Furthermore, the different mechanisms of the heuristic can be controlled via environment variables/drirc.
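
As a rough illustration of how the three checks described above could be combined, here is a minimal standalone sketch. It is not the code from the patches: the struct, the function names, the field names and the thresholds are all hypothetical, and the real series may order or weight the checks differently.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical tunables; names and values are illustrative only and
 * do not correspond to the actual drirc options in the series. */
struct simd32_heuristics_cfg {
   unsigned max_mrts;            /* reject SIMD32 above this many enabled MRTs */
   unsigned max_grouped_fetches; /* reject SIMD32 above this many grouped texture fetches */
   float max_inst_ratio;         /* reject if SIMD32/SIMD16 instruction ratio exceeds this */
};

/* Count the set bits in a mask (one bit per enabled render target). */
static unsigned
count_set_bits(uint32_t mask)
{
   unsigned n = 0;
   while (mask) {
      mask &= mask - 1; /* clear the lowest set bit */
      n++;
   }
   return n;
}

/* Return true if, under the assumed thresholds, SIMD32 looks acceptable. */
static bool
simd32_looks_ok(const struct simd32_heuristics_cfg *cfg,
                uint32_t mrt_mask,
                unsigned grouped_fetches,
                unsigned simd16_insts,
                unsigned simd32_insts)
{
   /* Many render target writes tend to thrash the render cache. */
   if (count_set_bits(mrt_mask) > cfg->max_mrts)
      return false;

   /* Large groups of texture fetches tend to thrash sampler and L3 caches. */
   if (grouped_fetches > cfg->max_grouped_fetches)
      return false;

   /* Without a SIMD16 instruction count there is nothing to compare against. */
   if (simd16_insts == 0)
      return false;

   /* Tolerate somewhat more instructions in SIMD32, up to the given ratio. */
   float ratio = (float)simd32_insts / (float)simd16_insts;
   return ratio <= cfg->max_inst_ratio;
}
```

Each check only vetoes SIMD32, which matches the stated goal of limiting regressions rather than predicting gains.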

Toni Lönnberg (7):
  i965: SIMD32 heuristics debug flag
  i965: SIMD32 heuristics control data
  i965: SIMD32 heuristics control data from drirc
  mesa: Helper functions for counting set bits in a mask
  i965/fs: Save the instruction count of each dispatch width
  i965/fs: SIMD32 selection heuristic based on grouped texture fetches
  i965/fs: Enable all SIMD32 heuristics

 src/intel/common/gen_debug.c             |  1 +
 src/intel/common/gen_debug.h             |  3 +-
 src/intel/compiler/brw_compiler.h        | 11 ++++++
 src/intel/compiler/brw_fs.cpp            | 63 +++++++++++++++++++++++++++++---
 src/intel/compiler/brw_fs.h              |  4 ++
 src/intel/compiler/brw_fs_generator.cpp  | 12 ++++++
 src/mesa/drivers/dri/i965/brw_context.c  | 13 +++++++
 src/mesa/drivers/dri/i965/intel_screen.c | 27 ++++++++++++++
 src/util/bitscan.h                       | 25 +++++++++++++
 9 files changed, 152 insertions(+), 7 deletions(-)

--
2.7.4

_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev