Am 27.08.2013 23:52, schrieb Vadim Girlin: > On 08/28/2013 12:43 AM, Marek Olšák wrote: >> Shader variants are BAD, BAD, BAD. Have you ever played an AAA game >> with a Mesa driver that likes to compile shader variants on first use? >> It's HORRIBLE. > > I don't think that shader variants are bad, but it's definitely bad when > we are compiling variants that are never used. Currently glxgears > compiles 18 ps/vs shaders. In my branch with initial GS support [1] I > switched handling of the shaders to deferred compilation, that is, > shaders are compiled only before the actual draw. I found later that > it's not really required for GS, but IIRC this change results in only 5 > shaders being compiled for glxgears instead of 18. It seems most of the > useless variants are results of state changes between creation of the > shader state (initial compilation) and actual draw call. > > I had some concerns about increased overhead with those changes, and > it's actually noticeable with drawoverhead demo, but I didn't see any > regressions with a few real apps that I tested, e.g. glxgears even > showed slightly better performance with these changes. Probably I also > implemented it in a not very optimal way (I was mostly concentrated on > GS support) and the overhead can be reduced. > > One more thing is duplicate shaders, I've analyzed shader dumps from > Unigine Heaven 3.0 some time ago and found that from about 320 compiled > shaders, only about 180 (50%) were unique, others were duplicates > (detected by comparing the bytecode dumps for them in an automated way), > maybe they had different shader keys (which still resulted in the same > bytecode), but I suspect duplicate pipe shaders were also involved. > Unfortunately I didn't have a time to investigate it more thoroughly > since then. > > So my point is that we don't really need to eliminate shader variants, > first we need to eliminate compilation of unused variants and duplicate > shaders. Also we might want to consider offloading of the compilation to > separate thread(s) and caching of shader binaries between runs.
Hmm ok that seems a way more complicated problem than I thought :-). Compile early and you might compile variants you will never use, compile late and the delay might be noticeable. I just thought it might be unlikely you'd actually need two variants - e.g. some depth exporting shader is probably unlikely to use alpha test. But ok I guess it shouldn't write color in this case, so even then it might never be worth bothering. Was just a random idea ;-). Roland > > Vadim > > [1] http://cgit.freedesktop.org/~vadimg/mesa/log/?h=r600-geom-shaders > >> >> What the patch does is probably the right solution. At least >> alpha-test state changes don't cause shader recompilation and >> re-binding, which also negatively affects performance. Ideally we >> shouldn't depend on the framebuffer state at all, but we need to >> emulate the TGSI property FS_COLOR0_WRITES_ALL_CBUFS. I think we >> should always be fine with key.nr_cbufs forced to 8 for any shader >> without that property. I expect app developers to do the right thing >> and not write outputs they don't need. >> >> Marek >> >> On Tue, Aug 27, 2013 at 9:00 PM, Roland Scheidegger >> <srol...@vmware.com> wrote: >>> Not that I'm qualified to review r600 code, but couldn't you create >>> different shader variants depending on whether you need alpha test? At >>> least I would assume shader exports aren't free. >>> >>> Roland >>> >>> Am 27.08.2013 19:56, schrieb Vadim Girlin: >>>> We need to export at least one color if the shader writes it, >>>> even when nr_cbufs==0. >>>> >>>> Signed-off-by: Vadim Girlin <vadimgir...@gmail.com> >>>> --- >>>> >>>> Tested on evergreen with multiple combinations of backends - no >>>> regressions, >>>> fixes some tests: >>>> >>>> default - fixes fb-alphatest-nocolor and fb_alphatest-nocolor-ff >>>> default+sb - fixes fb-alphatest-nocolor and fb_alphatest-nocolor-ff >>>> llvm - fixes about 25 tests related to depth/stencil >>>> llvm+sb - fixes about 300 tests (llvm's depth/stencil issues and >>>> regressions cased by reordering of exports in sb) >>>> >>>> With this patch, there are no regressions with default+sb vs default. >>>> There is one regression with llvm+sb vs llvm - >>>> fs-texturegrad-miplevels, >>>> AFAICS it's a problem with llvm backend uncovered by sb - >>>> SET_GRADIENTS_V/H >>>> instructions are not placed in the same TEX clause with >>>> corresponding SAMPLE_G. >>>> >>>> src/gallium/drivers/r600/r600_shader.c | 7 ++++--- >>>> 1 file changed, 4 insertions(+), 3 deletions(-) >>>> >>>> diff --git a/src/gallium/drivers/r600/r600_shader.c >>>> b/src/gallium/drivers/r600/r600_shader.c >>>> index 300b5c4..f7eab76 100644 >>>> --- a/src/gallium/drivers/r600/r600_shader.c >>>> +++ b/src/gallium/drivers/r600/r600_shader.c >>>> @@ -918,6 +918,7 @@ static int r600_shader_from_tgsi(struct >>>> r600_screen *rscreen, >>>> unsigned opcode; >>>> int i, j, k, r = 0; >>>> int next_pos_base = 60, next_param_base = 0; >>>> + int max_color_exports = MAX2(key.nr_cbufs, 1); >>>> /* Declarations used by llvm code */ >>>> bool use_llvm = false; >>>> bool indirect_gprs; >>>> @@ -1130,7 +1131,7 @@ static int r600_shader_from_tgsi(struct >>>> r600_screen *rscreen, >>>> radeon_llvm_ctx.face_gpr = ctx.face_gpr; >>>> radeon_llvm_ctx.r600_inputs = ctx.shader->input; >>>> radeon_llvm_ctx.r600_outputs = ctx.shader->output; >>>> - radeon_llvm_ctx.color_buffer_count = MAX2(key.nr_cbufs >>>> , 1); >>>> + radeon_llvm_ctx.color_buffer_count = max_color_exports; >>>> radeon_llvm_ctx.chip_class = ctx.bc->chip_class; >>>> radeon_llvm_ctx.fs_color_all = shader->fs_write_all >>>> && (rscreen->chip_class >= EVERGREEN); >>>> radeon_llvm_ctx.stream_outputs = &so; >>>> @@ -1440,7 +1441,7 @@ static int r600_shader_from_tgsi(struct >>>> r600_screen *rscreen, >>>> case TGSI_PROCESSOR_FRAGMENT: >>>> if (shader->output[i].name == >>>> TGSI_SEMANTIC_COLOR) { >>>> /* never export more colors than the >>>> number of CBs */ >>>> - if (shader->output[i].sid >= >>>> key.nr_cbufs) { >>>> + if (shader->output[i].sid >= >>>> max_color_exports) { >>>> /* skip export */ >>>> j--; >>>> continue; >>>> @@ -1450,7 +1451,7 @@ static int r600_shader_from_tgsi(struct >>>> r600_screen *rscreen, >>>> output[j].type = >>>> V_SQ_CF_ALLOC_EXPORT_WORD0_SQ_EXPORT_PIXEL; >>>> shader->nr_ps_color_exports++; >>>> if (shader->fs_write_all && >>>> (rscreen->chip_class >= EVERGREEN)) { >>>> - for (k = 1; k < key.nr_cbufs; >>>> k++) { >>>> + for (k = 1; k < >>>> max_color_exports; k++) { >>>> j++; >>>> memset(&output[j], 0, >>>> sizeof(struct r600_bytecode_output)); >>>> output[j].gpr = >>>> shader->output[i].gpr; >>>> >>> _______________________________________________ >>> mesa-dev mailing list >>> mesa-dev@lists.freedesktop.org >>> http://lists.freedesktop.org/mailman/listinfo/mesa-dev >> _______________________________________________ >> mesa-dev mailing list >> mesa-dev@lists.freedesktop.org >> http://lists.freedesktop.org/mailman/listinfo/mesa-dev >> > > _______________________________________________ > mesa-dev mailing list > mesa-dev@lists.freedesktop.org > http://lists.freedesktop.org/mailman/listinfo/mesa-dev _______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev