On 02/18/2016 08:44 AM, Neil Roberts wrote:
> I made a pathological test case (attached) which repeatedly renders into
> an MSAA FBO and then blits it to the screen and measures the framerate.
> It checks it with a range of different sample counts. The rendering is
> done either by rendering two triangles to fill the framebuffer or by
> calling glClear.
>
> The percentage increase in framerate after applying the patch is like
> this:
>
> With triangles to fill buffer:
>
> 16    62,27%
>  8    48,59%
>  4    27,72%
>  2     5,34%
>  0     0,58%
>
> With glClear:
>
> 16    -5,20%
>  8    -7,08%
>  4    -2,45%
>  2   -20,76%
>  0     3,71%

I find this result interesting. In the samples=0 case, the before and
after shaders should be identical. I dug into this a bit, and I think
the problem is the previous patch. Using do { ... } while (false) in the
macro makes the compiler generate some really, really bad code. Just
deleting those two lines makes a huge difference.

With do { ... } while (false):

SIMD8 shader: 228 instructions. 4 loops. 1554 cycles. 0:0 spills:fills. Promoted 0 constants. Compacted 3648 to 2192 bytes (40%)

Without:

SIMD8 shader: 159 instructions. 0 loops. 380 cycles. 0:0 spills:fills. Promoted 0 constants. Compacted 2544 to 1600 bytes (37%)
SIMD16 shader: 159 instructions. 0 loops. 898 cycles. 0:0 spills:fills. Promoted 0 constants. Compacted 2544 to 1600 bytes (37%)

This is for samples=16 and sampler2DMS. The cycle counts are bogus
because the 4 loops have their cycle counts statically multiplied by
10... even though each loop will only execute once.

So... I'll try some more experiments. I also wonder about changing the
SAMPLES > 1 to SAMPLES > 2. I will probably also make a patch to fix
the damage done by the do { ... } while (false). That's just comedy.

> It seems like a pretty convincing win for the triangle case but the
> clear case makes it slightly worse. Presumably this is because we don't
> do anything to detect the value stored in the MCS buffer when doing a
> fast clear so the fast path isn't taken and the shader being more
> complicated makes it slower.
>
> Not sure if we want to try and do anything about that because
> potentially the cleared pixels aren't very common in a framebuffer from
> a real use case so it might not really matter.
>
> Currently we don't use SIMD16 for 16x MSAA because we can't allocate the
> registers well enough to make it worthwhile. This patch makes that
> problem a bit more interesting because even if we end up spilling a lot
> it might still be worth doing SIMD16 because the cases where the spilled
> instructions are hit would be much less common.
>
> - Neil
>
>
> Ian Romanick <i...@freedesktop.org> writes:
>
>> From: Ian Romanick <ian.d.roman...@intel.com>
>>
>> Somewhat surprisingly, this didn't have any effect on performance in the
>> benchmarks that Martin tried for me.
>>
>> Signed-off-by: Ian Romanick <ian.d.roman...@intel.com>
>> ---
>>  src/mesa/drivers/common/meta_blit.c | 10 +++++++++-
>>  1 file changed, 9 insertions(+), 1 deletion(-)
>>
>> diff --git a/src/mesa/drivers/common/meta_blit.c b/src/mesa/drivers/common/meta_blit.c
>> index 28aabd3..c0ec51f 100644
>> --- a/src/mesa/drivers/common/meta_blit.c
>> +++ b/src/mesa/drivers/common/meta_blit.c
>> @@ -530,6 +530,7 @@ setup_glsl_msaa_blit_shader(struct gl_context *ctx,
>>        fs_source = ralloc_asprintf(mem_ctx,
>>                                    "#version 130\n"
>>                                    "#extension GL_ARB_texture_multisample: require\n"
>> +                                  "#extension GL_EXT_shader_samples_identical: enable\n"
>>                                    "#define gvec4 %svec4\n"
>>                                    "uniform %ssampler2DMS%s texSampler;\n"
>>                                    "in %s texCoords;\n"
>> @@ -569,7 +570,14 @@ setup_glsl_msaa_blit_shader(struct gl_context *ctx,
>>                                    "   i%s tc = i%s(texCoords);\n"
>>                                    "   int i;\n"
>>                                    "\n"
>> -                                  "   for (i = 0; i < SAMPLES; i++)\n"
>> +                                  "   S[0] = texelFetch(texSampler, tc, 0);\n"
>> +                                  "#if defined(GL_EXT_shader_samples_identical) && SAMPLES > 1\n"
>> +                                  "   if (textureSamplesIdenticalEXT(texSampler, tc)) {\n"
>> +                                  "      emit2(S[0]);\n"
>> +                                  "      return;\n"
>> +                                  "   }\n"
>> +                                  "#endif\n"
>> +                                  "   for (i = 1; i < SAMPLES; i++)\n"
>>                                    "      S[i] = texelFetch(texSampler, tc, i);\n"
>>                                    "\n"
>>                                    "   REDUCE(s16, s32);\n"
>> --
>> 2.5.0
>>
>> _______________________________________________
>> mesa-dev mailing list
>> mesa-dev@lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
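
For reference, the control flow the patch adds to the blit shader can be
sketched on the CPU in plain C. This is only an illustration under stated
assumptions, not the Mesa code: the texel struct, its identical flag (a
stand-in for the result of textureSamplesIdenticalEXT), the stats counter
and the fetch()/resolve() helpers are all hypothetical names, and a plain
average replaces the shader's REDUCE tree.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct { float r, g, b, a; } vec4;

/* Hypothetical texel. 'identical' stands in for textureSamplesIdenticalEXT():
 * true means every sample is known to hold the same value; false only means
 * "unknown", not "the samples differ". */
typedef struct {
    const vec4 *samples;
    size_t      count;
    bool        identical;
} texel;

/* Counts sample fetches so the saving from the early return is observable. */
typedef struct { size_t fetches; } stats;

/* Hypothetical stand-in for texelFetch(texSampler, tc, i). */
vec4 fetch(const texel *t, size_t i, stats *st)
{
    st->fetches++;
    return t->samples[i];
}

/* Resolve one texel: fetch sample 0, return early if all samples are known
 * to be identical, otherwise fetch the rest and average them. */
vec4 resolve(const texel *t, stats *st)
{
    assert(t->count > 0);

    vec4 sum = fetch(t, 0, st);

    /* Fast path, mirroring the "SAMPLES > 1" guard in the patch. */
    if (t->count > 1 && t->identical)
        return sum;

    /* Slow path: the loop now starts at 1 because sample 0 is already read. */
    for (size_t i = 1; i < t->count; i++) {
        vec4 s = fetch(t, i, st);
        sum.r += s.r;
        sum.g += s.g;
        sum.b += s.b;
        sum.a += s.a;
    }

    float inv = 1.0f / (float)t->count;
    sum.r *= inv;
    sum.g *= inv;
    sum.b *= inv;
    sum.a *= inv;
    return sum;
}
```

In this model a 4-sample texel flagged as identical costs one fetch instead
of four. With the flag clear, resolve() still fetches every sample and pays
for the extra branch, which is presumably the situation Neil's glClear
numbers reflect.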