Yeah, st/mesa also compiles shaders on the first use, so we've got 3 places to fix: Wine, st/mesa, the driver.
Marek On Wed, Aug 28, 2013 at 2:07 AM, Vadim Girlin <vadimgir...@gmail.com> wrote: > On 08/28/2013 02:59 AM, Marek Olšák wrote: >> >> First, you won't really see any significant continual difference in >> frame rate no matter how many shader variants you have unless you are >> very CPU-bound. The problem is shader compilation on the first use, >> that's where you get a big hiccup. Try Skyrim for example: You have to >> first look around and see every object that's around you and get >> unpleasant stuttering before you can actually go on and play the game. >> Yes, this also Wine's fault that it compiles shaders on the first use >> too, but we don't have to be as bad as Wine, do we? Valve also >> reported shader recompilations on the first use being a serious issue >> with open source drivers. > > > I perfectly understand that deferred compilation is exactly the problem that > makes the games freeze due to shader compilation on first use when something > new appears on the screen, but I don't think we can solve this problem in > the *driver* by trying to compile early, because AFAICS currently the > shaders are passed to the driver too late anyway, and this happens not only > with wine. E.g. when I run Heaven in a window with "MESA_GLSL=dump > R600_DEBUG=ps,vs", so that I can see Heaven's window and console output at > the same time, what I see is that most of GL dumps happen while Heaven shows > splash screen with loading progress, but most of the driver's dumps appear > on the first frame and few more times during benchmark. It looks like > compilation is deferred somewhere in the stack before the driver, or am I > missing something? > > Vadim > > > >> >> Marek >> >> On Tue, Aug 27, 2013 at 11:52 PM, Vadim Girlin <vadimgir...@gmail.com> >> wrote: >>> >>> On 08/28/2013 12:43 AM, Marek Olšák wrote: >>>> >>>> >>>> Shader variants are BAD, BAD, BAD. Have you ever played an AAA game >>>> with a Mesa driver that likes to compile shader variants on first use? >>>> It's HORRIBLE. >>> >>> >>> >>> I don't think that shader variants are bad, but it's definitely bad when >>> we >>> are compiling variants that are never used. Currently glxgears compiles >>> 18 >>> ps/vs shaders. In my branch with initial GS support [1] I switched >>> handling >>> of the shaders to deferred compilation, that is, shaders are compiled >>> only >>> before the actual draw. I found later that it's not really required for >>> GS, >>> but IIRC this change results in only 5 shaders being compiled for >>> glxgears >>> instead of 18. It seems most of the useless variants are results of state >>> changes between creation of the shader state (initial compilation) and >>> actual draw call. >>> >>> I had some concerns about increased overhead with those changes, and it's >>> actually noticeable with drawoverhead demo, but I didn't see any >>> regressions >>> with a few real apps that I tested, e.g. glxgears even showed slightly >>> better performance with these changes. Probably I also implemented it in >>> a >>> not very optimal way (I was mostly concentrated on GS support) and the >>> overhead can be reduced. >>> >>> One more thing is duplicate shaders, I've analyzed shader dumps from >>> Unigine >>> Heaven 3.0 some time ago and found that from about 320 compiled shaders, >>> only about 180 (50%) were unique, others were duplicates (detected by >>> comparing the bytecode dumps for them in an automated way), maybe they >>> had >>> different shader keys (which still resulted in the same bytecode), but I >>> suspect duplicate pipe shaders were also involved. Unfortunately I didn't >>> have a time to investigate it more thoroughly since then. >>> >>> So my point is that we don't really need to eliminate shader variants, >>> first >>> we need to eliminate compilation of unused variants and duplicate >>> shaders. >>> Also we might want to consider offloading of the compilation to separate >>> thread(s) and caching of shader binaries between runs. >>> >>> Vadim >>> >>> [1] http://cgit.freedesktop.org/~vadimg/mesa/log/?h=r600-geom-shaders >>> >>> >>>> >>>> What the patch does is probably the right solution. At least >>>> alpha-test state changes don't cause shader recompilation and >>>> re-binding, which also negatively affects performance. Ideally we >>>> shouldn't depend on the framebuffer state at all, but we need to >>>> emulate the TGSI property FS_COLOR0_WRITES_ALL_CBUFS. I think we >>>> should always be fine with key.nr_cbufs forced to 8 for any shader >>>> without that property. I expect app developers to do the right thing >>>> and not write outputs they don't need. >>>> >>>> Marek >>>> >>>> On Tue, Aug 27, 2013 at 9:00 PM, Roland Scheidegger <srol...@vmware.com> >>>> wrote: >>>>> >>>>> >>>>> Not that I'm qualified to review r600 code, but couldn't you create >>>>> different shader variants depending on whether you need alpha test? At >>>>> least I would assume shader exports aren't free. >>>>> >>>>> Roland >>>>> >>>>> Am 27.08.2013 19:56, schrieb Vadim Girlin: >>>>>> >>>>>> >>>>>> We need to export at least one color if the shader writes it, >>>>>> even when nr_cbufs==0. >>>>>> >>>>>> Signed-off-by: Vadim Girlin <vadimgir...@gmail.com> >>>>>> --- >>>>>> >>>>>> Tested on evergreen with multiple combinations of backends - no >>>>>> regressions, >>>>>> fixes some tests: >>>>>> >>>>>> default - fixes fb-alphatest-nocolor and >>>>>> fb_alphatest-nocolor-ff >>>>>> default+sb - fixes fb-alphatest-nocolor and >>>>>> fb_alphatest-nocolor-ff >>>>>> llvm - fixes about 25 tests related to depth/stencil >>>>>> llvm+sb - fixes about 300 tests (llvm's depth/stencil issues >>>>>> and >>>>>> regressions cased by reordering of exports in sb) >>>>>> >>>>>> With this patch, there are no regressions with default+sb vs default. >>>>>> There is one regression with llvm+sb vs llvm - >>>>>> fs-texturegrad-miplevels, >>>>>> AFAICS it's a problem with llvm backend uncovered by sb - >>>>>> SET_GRADIENTS_V/H >>>>>> instructions are not placed in the same TEX clause with corresponding >>>>>> SAMPLE_G. >>>>>> >>>>>> src/gallium/drivers/r600/r600_shader.c | 7 ++++--- >>>>>> 1 file changed, 4 insertions(+), 3 deletions(-) >>>>>> >>>>>> diff --git a/src/gallium/drivers/r600/r600_shader.c >>>>>> b/src/gallium/drivers/r600/r600_shader.c >>>>>> index 300b5c4..f7eab76 100644 >>>>>> --- a/src/gallium/drivers/r600/r600_shader.c >>>>>> +++ b/src/gallium/drivers/r600/r600_shader.c >>>>>> @@ -918,6 +918,7 @@ static int r600_shader_from_tgsi(struct >>>>>> r600_screen >>>>>> *rscreen, >>>>>> unsigned opcode; >>>>>> int i, j, k, r = 0; >>>>>> int next_pos_base = 60, next_param_base = 0; >>>>>> + int max_color_exports = MAX2(key.nr_cbufs, 1); >>>>>> /* Declarations used by llvm code */ >>>>>> bool use_llvm = false; >>>>>> bool indirect_gprs; >>>>>> @@ -1130,7 +1131,7 @@ static int r600_shader_from_tgsi(struct >>>>>> r600_screen *rscreen, >>>>>> radeon_llvm_ctx.face_gpr = ctx.face_gpr; >>>>>> radeon_llvm_ctx.r600_inputs = ctx.shader->input; >>>>>> radeon_llvm_ctx.r600_outputs = ctx.shader->output; >>>>>> - radeon_llvm_ctx.color_buffer_count = MAX2(key.nr_cbufs , >>>>>> 1); >>>>>> + radeon_llvm_ctx.color_buffer_count = max_color_exports; >>>>>> radeon_llvm_ctx.chip_class = ctx.bc->chip_class; >>>>>> radeon_llvm_ctx.fs_color_all = shader->fs_write_all && >>>>>> (rscreen->chip_class >= EVERGREEN); >>>>>> radeon_llvm_ctx.stream_outputs = &so; >>>>>> @@ -1440,7 +1441,7 @@ static int r600_shader_from_tgsi(struct >>>>>> r600_screen *rscreen, >>>>>> case TGSI_PROCESSOR_FRAGMENT: >>>>>> if (shader->output[i].name == >>>>>> TGSI_SEMANTIC_COLOR) { >>>>>> /* never export more colors than the >>>>>> number of CBs */ >>>>>> - if (shader->output[i].sid >= >>>>>> key.nr_cbufs) >>>>>> { >>>>>> + if (shader->output[i].sid >= >>>>>> max_color_exports) { >>>>>> /* skip export */ >>>>>> j--; >>>>>> continue; >>>>>> @@ -1450,7 +1451,7 @@ static int r600_shader_from_tgsi(struct >>>>>> r600_screen *rscreen, >>>>>> output[j].type = >>>>>> V_SQ_CF_ALLOC_EXPORT_WORD0_SQ_EXPORT_PIXEL; >>>>>> shader->nr_ps_color_exports++; >>>>>> if (shader->fs_write_all && >>>>>> (rscreen->chip_class >= EVERGREEN)) { >>>>>> - for (k = 1; k < key.nr_cbufs; >>>>>> k++) >>>>>> { >>>>>> + for (k = 1; k < >>>>>> max_color_exports; >>>>>> k++) { >>>>>> j++; >>>>>> memset(&output[j], 0, >>>>>> sizeof(struct r600_bytecode_output)); >>>>>> output[j].gpr = >>>>>> shader->output[i].gpr; >>>>>> >>>>> _______________________________________________ >>>>> mesa-dev mailing list >>>>> mesa-dev@lists.freedesktop.org >>>>> http://lists.freedesktop.org/mailman/listinfo/mesa-dev >>>> >>>> >>>> _______________________________________________ >>>> mesa-dev mailing list >>>> mesa-dev@lists.freedesktop.org >>>> http://lists.freedesktop.org/mailman/listinfo/mesa-dev >>>> >>> >>> _______________________________________________ >>> mesa-dev mailing list >>> mesa-dev@lists.freedesktop.org >>> http://lists.freedesktop.org/mailman/listinfo/mesa-dev > > _______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev