Re: [Mesa-dev] [PATCH 22/22] meta/blit: Use GL_EXT_shader_samples_identical in MSAA-SS resolve blit

Ian Romanick Thu, 18 Feb 2016 13:47:47 -0800

On 02/18/2016 08:44 AM, Neil Roberts wrote:
> I made a pathological test case (attached) which repeatedly renders into
> an MSAA FBO and then blits it to the screen and measures the framerate.
> It checks it with a range of different sample counts. The rendering is
> done either by rendering two triangles to fill the framebuffer or by
> calling glClear.
> 
> The percentage increase in framerate after applying the patch is like
> this:
> 
> With triangles to fill buffer:
>
> Samples  Framerate change
> 16       +62.27%
> 8        +48.59%
> 4        +27.72%
> 2        +5.34%
> 0        +0.58%
>
> With glClear:
>
> Samples  Framerate change
> 16       -5.20%
> 8        -7.08%
> 4        -2.45%
> 2        -20.76%
> 0        +3.71%


I find this result interesting.  In the samples=0 case, the before and
after shaders should be identical.  I dug into this a bit, and I think
the problem is the previous patch.  Using do { ... } while (false) in
the macro makes the compiler generate some really, really bad code.
Just deleting those two lines makes a huge difference.

With do { ... } while (false):

SIMD8 shader: 228 instructions. 4 loops. 1554 cycles. 0:0 spills:fills.
Promoted 0 constants. Compacted 3648 to 2192 bytes (40%)

Without:

SIMD8 shader: 159 instructions. 0 loops. 380 cycles. 0:0 spills:fills.
Promoted 0 constants. Compacted 2544 to 1600 bytes (37%)
SIMD16 shader: 159 instructions. 0 loops. 898 cycles. 0:0 spills:fills.
Promoted 0 constants. Compacted 2544 to 1600 bytes (37%)

This is for samples=16 and sampler2DMS.  The cycle counts are bogus
because each of the 4 loops has its cycle count statically multiplied by
10... even though each loop will only execute once.
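
To illustrate why a static estimate like this overshoots, here is a toy
sketch in C.  The function names and the input numbers are made up for
illustration; this is not the compiler's actual cost model, only the
"multiply every loop body by 10" rule described above.

```c
/* Hypothetical weight: assume every loop body runs this many times. */
enum { LOOP_WEIGHT = 10 };

/* Static estimate: straight-line cost plus the weighted loop-body cost. */
unsigned estimate_cycles(unsigned straight_line_cycles,
                         unsigned loop_body_cycles)
{
    return straight_line_cycles + LOOP_WEIGHT * loop_body_cycles;
}

/* What a do { ... } while (false) loop really costs: one trip only. */
unsigned single_trip_cycles(unsigned straight_line_cycles,
                            unsigned loop_body_cycles)
{
    return straight_line_cycles + loop_body_cycles;
}
```

With 100 straight-line cycles and 30 cycles of loop body, the estimate
is 400 while a single trip costs 130, so the reported figure says
little about the real run time.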

So... I'll try some more experiments.  I also wonder about changing the
SAMPLES > 1 to SAMPLES > 2.

I will probably also make a patch to fix the damage done by the do { ...
} while (false).  That's just comedy.
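
For readers unfamiliar with the idiom, a minimal sketch in plain C
follows.  The macro names and the accumulator are hypothetical and are
not the macros from the previous patch.  The do { ... } while (0) form
exists so that a multi-statement macro behaves as one statement (for
example under an unbraced if/else), but it does put a loop construct in
front of the compiler.  A single comma expression is one way to keep
the statement-safety without any loop, assuming the body consists only
of expressions.

```c
/* Hypothetical accumulator; stands in for whatever state a real macro updates. */
static int total;

/* Statement-safe via a one-trip loop. */
#define ADD_TWICE_LOOP(x) do { total += (x); total += (x); } while (0)

/* Statement-safe via a single comma expression; no loop involved. */
#define ADD_TWICE_EXPR(x) ((void)(total += (x), total += (x)))

int run_loop_form(int flag)
{
    total = 0;
    if (flag)
        ADD_TWICE_LOOP(3);   /* expands to exactly one statement */
    else
        ADD_TWICE_LOOP(1);
    return total;
}

int run_expr_form(int flag)
{
    total = 0;
    if (flag)
        ADD_TWICE_EXPR(3);   /* same behavior, no loop construct */
    else
        ADD_TWICE_EXPR(1);
    return total;
}
```

Both forms compute the same result; the difference is only in what the
compiler has to see through.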

> It seems like a pretty convincing win for the triangle case but the
> clear case makes it slightly worse. Presumably this is because we don't
> do anything to detect the value stored in the MCS buffer when doing a
> fast clear so the fast path isn't taken and the shader being more
> complicated makes it slower.
> 
> Not sure if we want to try and do anything about that because
> potentially the cleared pixels aren't very common in a framebuffer from
> a real use case so it might not really matter.
> 
> Currently we don't use SIMD16 for 16x MSAA because we can't allocate the
> registers well enough to make it worthwhile. This patch makes that
> problem a bit more interesting because even if we end up spilling a lot
> it might still be worth doing SIMD16 because the cases where the spilled
> instructions are hit would be much less common.
> 
> - Neil
> 
> 
> 
> 
> Ian Romanick <i...@freedesktop.org> writes:
> 
>> From: Ian Romanick <ian.d.roman...@intel.com>
>>
>> Somewhat surprisingly, this didn't have any effect on performance in the
>> benchmarks that Martin tried for me.
>>
>> Signed-off-by: Ian Romanick <ian.d.roman...@intel.com>
>> ---
>>  src/mesa/drivers/common/meta_blit.c | 10 +++++++++-
>>  1 file changed, 9 insertions(+), 1 deletion(-)
>>
>> diff --git a/src/mesa/drivers/common/meta_blit.c b/src/mesa/drivers/common/meta_blit.c
>> index 28aabd3..c0ec51f 100644
>> --- a/src/mesa/drivers/common/meta_blit.c
>> +++ b/src/mesa/drivers/common/meta_blit.c
>> @@ -530,6 +530,7 @@ setup_glsl_msaa_blit_shader(struct gl_context *ctx,
>>           fs_source = ralloc_asprintf(mem_ctx,
>>                                       "#version 130\n"
>>                                       "#extension GL_ARB_texture_multisample: require\n"
>> +                                     "#extension GL_EXT_shader_samples_identical: enable\n"
>>                                       "#define gvec4 %svec4\n"
>>                                       "uniform %ssampler2DMS%s texSampler;\n"
>>                                       "in %s texCoords;\n"
>> @@ -569,7 +570,14 @@ setup_glsl_msaa_blit_shader(struct gl_context *ctx,
>>                                       "   i%s tc = i%s(texCoords);\n"
>>                                       "   int i;\n"
>>                                       "\n"
>> -                                     "   for (i = 0; i < SAMPLES; i++)\n"
>> +                                     "   S[0] = texelFetch(texSampler, tc, 0);\n"
>> +                                     "#if defined(GL_EXT_shader_samples_identical) && SAMPLES > 1\n"
>> +                                     "   if (textureSamplesIdenticalEXT(texSampler, tc)) {\n"
>> +                                     "      emit2(S[0]);\n"
>> +                                     "      return;\n"
>> +                                     "   }\n"
>> +                                     "#endif\n"
>> +                                     "   for (i = 1; i < SAMPLES; i++)\n"
>>                                       "      S[i] = texelFetch(texSampler, tc, i);\n"
>>                                       "\n"
>>                                       "   REDUCE(s16, s32);\n"
>> -- 
>> 2.5.0
>>
>> _______________________________________________
>> mesa-dev mailing list
>> mesa-dev@lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
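
The control flow that the patch adds to the resolve shader can be
sketched on the CPU in C as follows.  This is an illustration under
stated assumptions, not the shader itself: samples_identical() is a
hypothetical stand-in for textureSamplesIdenticalEXT(), and it has to
compare the samples one by one, whereas in the shader the point of the
query is to avoid fetching them at all.  Only a single float channel is
handled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the shader query: true when every sample
 * of the pixel equals sample 0. */
static bool samples_identical(const float *s, size_t n)
{
    for (size_t i = 1; i < n; i++) {
        if (s[i] != s[0])
            return false;
    }
    return true;
}

/* Resolve one channel of one pixel from n samples. */
float resolve_pixel(const float *s, size_t n)
{
    assert(n > 0);

    /* Fast path, mirroring the "SAMPLES > 1" guard in the patch:
     * if all samples match, sample 0 already is the resolved value. */
    if (n > 1 && samples_identical(s, n))
        return s[0];

    /* Slow path: average all samples. */
    float sum = 0.0f;
    for (size_t i = 0; i < n; i++)
        sum += s[i];
    return sum / (float)n;
}
```

A pixel whose samples are all 0.5 takes the fast path and returns 0.5
directly; a pixel with samples {0, 1, 0, 1} falls through to the
average and also returns 0.5.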
