Am 19.02.2016 um 11:57 schrieb Iago Toral: > On Thu, 2016-02-18 at 16:13 -0800, Ian Romanick wrote: >> On 02/18/2016 01:47 PM, Ian Romanick wrote: >>> On 02/18/2016 08:44 AM, Neil Roberts wrote: >>>> I made a pathological test case (attached) which repeatedly renders into >>>> an MSAA FBO and then blits it to the screen and measures the framerate. >>>> It checks it with a range of different sample counts. The rendering is >>>> done either by rendering two triangles to fill the framebuffer or by >>>> calling glClear. >>>> >>>> The percentage increase in framerate after applying the patch is like >>>> this: >>>> >>>> With triangles to fill buffer: >>>> >>>> 16 62,27% >>>> 8 48,59% >>>> 4 27,72% >>>> 2 5,34% >>>> 0 0,58% >>>> >>>> With glClear: >>>> >>>> 16 -5,20% >>>> 8 -7,08% >>>> 4 -2,45% >>>> 2 -20,76% >>>> 0 3,71% >>> >>> I find this result interesting. In the samples=0 case, the before and >>> after shaders should be identical. I dug into this a bit, and I think >>> the problem is the previous patch. Using do { ... } while (false) in >>> the macro makes the compiler generate some really, really bad code. >>> Just deleting those two lines makes a huge difference. >>> >>> With do { ... } while (false): >>> >>> SIMD8 shader: 228 instructions. 4 loops. 1554 cycles. 0:0 spills:fills. >>> Promoted 0 constants. Compacted 3648 to 2192 bytes (40%) >>> >>> Without: >>> >>> SIMD8 shader: 159 instructions. 0 loops. 380 cycles. 0:0 spills:fills. >>> Promoted 0 constants. Compacted 2544 to 1600 bytes (37%) >>> SIMD16 shader: 159 instructions. 0 loops. 898 cycles. 0:0 spills:fills. >>> Promoted 0 constants. Compacted 2544 to 1600 bytes (37%) >>> >>> This is for samples=16 and sampler2DMS. The cycle counts are bogus >>> because the 4 loops have their cycle counts statically multiplied by >>> 10... even though the loop will only execute once. >>> >>> So... I'll try some more experiments. I also wonder about changing the >>> SAMPLES > 1 to SAMPLES > 2. >> >> This is weird, and I'm not sure what to make of it. I ran your test at >> a bunch of commits on my branch. Getting rid of gl_FragColor was the >> big win. The GL_EXT_shader_samples_identical patch is still the big >> loss. I'm assuming that causes enough register pressure over the >> previous commit to lose SIMD16. The following patch just removes the >> "do {" and "} while (false)". That gets back some of the losses, >> but not all of it. > > I don't know much about this, but using shader_samples_identical should > only give a benefit if we actually get identical samples, otherwise it > means more work, right? I noticed that the test renders a quad with > random colors for each vertex that will be interpolated across the > region, so could it be that we are hitting few cases of identical > samples? Shouldn't the results be expected to be better if we rendered a > flat color instead?
Unless you run that shader drawing random colors at per-sample frequency (using per-sample interpolation), you still get identical samples (for each pixel). Except at the edges of the quad. (I didn't actually look at the test...) Roland > >> The raw per-commit data is below. >> >> 5c7f974 Android: disable unused-parameter warning >> With triangles to fill buffer: >> >> 8,419.603882 >> 4,633.834045 >> 2,595.628113 >> 0,795.798157 >> >> With glClear: >> >> 8,450.937958 >> 4,594.000610 >> 2,641.354553 >> 0,767.400818 >> >> 89b8c0a meta/blit: Replace the gl_FragColor write >> With triangles to fill buffer: >> >> 8,597.943054 >> 4,803.083862 >> 2,942.862549 >> 0,1129.432983 >> >> With glClear: >> >> 8,643.500671 >> 4,980.392151 >> 2,982.704407 >> 0,1116.569946 >> >> 7025d43 meta/blit: Don't dynamically select the MSAA "merge" function >> With triangles to fill buffer: >> >> 8,599.412598 >> 4,954.745056 >> 2,933.184021 >> 0,1129.305420 >> >> With glClear: >> >> 8,582.106079 >> 4,965.810303 >> 2,975.419434 >> 0,1169.180420 >> >> b895a4d meta/blit: Use a single, master shader template for MSAA-SS blits >> With triangles to fill buffer: >> >> 8,591.786011 >> 4,932.835815 >> 2,910.498047 >> 0,1072.501099 >> >> With glClear: >> >> 8,623.441406 >> 4,952.562378 >> 2,941.442261 >> 0,1081.314819 >> >> 197e4be meta/blit: Use GL_EXT_shader_samples_identical in MSAA-SS resolve >> blit >> With triangles to fill buffer: >> >> 8,559.252869 >> 4,795.355103 >> 2,781.494202 >> 0,941.619568 >> >> With glClear: >> >> 8,577.400513 >> 4,806.321533 >> 2,811.030029 >> 0,957.579224 >> >> ec01613 fix loop nonsense derp >> With triangles to fill buffer: >> >> 8,559.221558 >> 4,788.767944 >> 2,786.287170 >> 0,1078.632324 >> >> With glClear: >> >> 8,575.771545 >> 4,956.205750 >> 2,813.206482 >> 0,1111.111084 >> >>> I will probably also make a patch to fix the damage done by the do { ... >>> } while (false). That's just comedy. >>> >>>> It seems like a pretty convincing win for the triangle case but the >>>> clear case makes it slightly worse. Presumably this is because we don't >>>> do anything to detect the value stored in the MCS buffer when doing a >>>> fast clear so the fast path isn't taken and the shader being more >>>> complicated makes it slower. >>>> >>>> Not sure if we want to try and do anything about that because >>>> potentially the cleared pixels aren't very common in a framebuffer from >>>> a real use case so it might not really matter. >>>> >>>> Currently we don't use SIMD16 for 16x MSAA because we can't allocate the >>>> registers well enough to make it worthwhile. This patch makes that >>>> problem a bit more interesting because even if we end up spilling a lot >>>> it might still be worth doing SIMD16 because the cases where the spilled >>>> instructions are hit would be much less common. >>>> >>>> - Neil >>>> >>>> Ian Romanick <i...@freedesktop.org> writes: >>>> >>>>> From: Ian Romanick <ian.d.roman...@intel.com> >>>>> >>>>> Somewhat surprisingly, this didn't have any affect on performance in the >>>>> benchmarks that Martin tried for me. >>>>> >>>>> Signed-off-by: Ian Romanick <ian.d.roman...@intel.com> >>>>> --- >>>>> src/mesa/drivers/common/meta_blit.c | 10 +++++++++- >>>>> 1 file changed, 9 insertions(+), 1 deletion(-) >>>>> >>>>> diff --git a/src/mesa/drivers/common/meta_blit.c >>>>> b/src/mesa/drivers/common/meta_blit.c >>>>> index 28aabd3..c0ec51f 100644 >>>>> --- a/src/mesa/drivers/common/meta_blit.c >>>>> +++ b/src/mesa/drivers/common/meta_blit.c >>>>> @@ -530,6 +530,7 @@ setup_glsl_msaa_blit_shader(struct gl_context *ctx, >>>>> fs_source = ralloc_asprintf(mem_ctx, >>>>> "#version 130\n" >>>>> "#extension >>>>> GL_ARB_texture_multisample: require\n" >>>>> + "#extension >>>>> GL_EXT_shader_samples_identical: enable\n" >>>>> "#define gvec4 %svec4\n" >>>>> "uniform %ssampler2DMS%s >>>>> texSampler;\n" >>>>> "in %s texCoords;\n" >>>>> @@ -569,7 +570,14 @@ setup_glsl_msaa_blit_shader(struct gl_context *ctx, >>>>> " i%s tc = i%s(texCoords);\n" >>>>> " int i;\n" >>>>> "\n" >>>>> - " for (i = 0; i < SAMPLES; i++)\n" >>>>> + " S[0] = texelFetch(texSampler, >>>>> tc, 0);\n" >>>>> + "#if >>>>> defined(GL_EXT_shader_samples_identical) && SAMPLES > 1\n" >>>>> + " if >>>>> (textureSamplesIdenticalEXT(texSampler, tc)) {\n" >>>>> + " emit2(S[0]);\n" >>>>> + " return;\n" >>>>> + " }\n" >>>>> + "#endif\n" >>>>> + " for (i = 1; i < SAMPLES; i++)\n" >>>>> " S[i] = >>>>> texelFetch(texSampler, tc, i);\n" >>>>> "\n" >>>>> " REDUCE(s16, s32);\n" >>>>> -- >>>>> 2.5.0 >>> _______________________________________________ >>> mesa-dev mailing list >>> mesa-dev@lists.freedesktop.org >>> https://lists.freedesktop.org/mailman/listinfo/mesa-dev >> >> _______________________________________________ >> mesa-dev mailing list >> mesa-dev@lists.freedesktop.org >> https://lists.freedesktop.org/mailman/listinfo/mesa-dev > > > _______________________________________________ > mesa-dev mailing list > mesa-dev@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/mesa-dev > _______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev