Re: [Mesa-dev] [PATCH] i965/fs: Disable opt_sampler_eot for more message types

2015-10-21 Thread Ben Widawsky
On Tue, Oct 20, 2015 at 02:48:41PM -0700, Matt Turner wrote:
> On Tue, Oct 20, 2015 at 2:41 PM, Ben Widawsky  wrote:
> > On Tue, Oct 20, 2015 at 11:56:15AM +0200, Neil Roberts wrote:
> >> In bfdae9149e0 I disabled the opt_sampler_eot optimisation for TG4
> >> message types because I found by experimentation that it doesn't work.
> >> I wrote in the comment that I couldn't find any documentation for this
> >> problem. However I've now found the documentation and it has
> >> additional restrictions on further message types so this patch updates
> >> the comment and adds the others.
> >> ---
> >>
> >> That paragraph in the spec also mentions further restrictions that we
> >> should probably worry about like that the shader shouldn't combine
> >> this optimisation with any other render target data port read/writes.
> >>
> >> It also has a fairly pessimistic note saying the optimisation is only
> >> really good for large polygons in a GUI-like workload. I wonder
> >> whether we should be doing some more benchmarking to decide whether
> >> it's really a good idea to enable this as a general optimisation even
> >> for games.
> >
> > I remember seeing this before, but I cannot find it now. All I am seeing
> > regarding performance implications are the bits about requiring a header, 
> > and
> > writing to the same pixel from multiple threads. The latter one I assume is 
> > only
> > going to happen with MSAA?
> 
> No, I don't think so. As I understand it, the EUs can be executing
> fragment shaders for multiple primitives at the same time, and those
> primitives might overlap. The c in sendc means that it does some extra
> tracking to ensure that the render target writes land in the correct
> order.
> 
> Presumably by using sendc to texture directly to the render target, it
> adds some extra synchronization (before the texturing is done... or
> something?) that especially hurts when there's a lot of overlapping
> primitives (as in the case of lots of small primitives).

Ah, Neil pointed me to the blurb. Putting this here to remind myself... I think
a cheap way to measure things is to turn the sendc into a send. Things will
probably render wrong, but it should eliminate the bottleneck. If we can see
measurable perf difference with send it certainly would indicate we need to
spend time optimizing the optimization.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] i965/fs: Disable opt_sampler_eot for more message types

2015-10-21 Thread Neil Roberts
Ilia Mirkin  writes:

>> -   if (tex_inst->opcode == SHADER_OPCODE_TG4 ||
>> +   if (tex_inst->opcode == SHADER_OPCODE_TXS ||
>> +   tex_inst->opcode == SHADER_OPCODE_LOD ||
>> +   tex_inst->opcode == SHADER_OPCODE_TG4 ||
>> tex_inst->opcode == SHADER_OPCODE_TG4_OFFSET)
>
> Do you also need to include SHADER_OPCODE_SAMPLEINFO?

Oops, yes, thanks! I've gone ahead and pushed it with this additional
change.

Regards,
- Neil
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH] i965/fs: Disable opt_sampler_eot for more message types

2015-10-20 Thread Neil Roberts
In bfdae9149e0 I disabled the opt_sampler_eot optimisation for TG4
message types because I found by experimentation that it doesn't work.
I wrote in the comment that I couldn't find any documentation for this
problem. However I've now found the documentation and it has
additional restrictions on further message types so this patch updates
the comment and adds the others.
---

That paragraph in the spec also mentions further restrictions that we
should probably worry about like that the shader shouldn't combine
this optimisation with any other render target data port read/writes.

It also has a fairly pessimistic note saying the optimisation is only
really good for large polygons in a GUI-like workload. I wonder
whether we should be doing some more benchmarking to decide whether
it's really a good idea to enable this as a general optimisation even
for games.

 src/mesa/drivers/dri/i965/brw_fs.cpp | 13 +++--
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp 
b/src/mesa/drivers/dri/i965/brw_fs.cpp
index 49323eb..bf9ff84 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -2238,13 +2238,14 @@ fs_visitor::opt_sampler_eot()
if (unlikely(tex_inst->is_head_sentinel()) || !tex_inst->is_tex())
   return false;
 
-   /* This optimisation doesn't seem to work for textureGather for some
-* reason. I can't find any documentation or known workarounds to indicate
-* that this is expected, but considering that it is probably pretty
-* unlikely that a shader would directly write out the results from
-* textureGather we might as well just disable it.
+   /* 3D Sampler » Messages » Message Format
+*
+* “Response Length of zero is allowed on all SIMD8* and SIMD16* sampler
+*  messages except sample+killpix, resinfo, sampleinfo, LOD, and gather4*”
 */
-   if (tex_inst->opcode == SHADER_OPCODE_TG4 ||
+   if (tex_inst->opcode == SHADER_OPCODE_TXS ||
+   tex_inst->opcode == SHADER_OPCODE_LOD ||
+   tex_inst->opcode == SHADER_OPCODE_TG4 ||
tex_inst->opcode == SHADER_OPCODE_TG4_OFFSET)
   return false;
 
-- 
1.9.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] i965/fs: Disable opt_sampler_eot for more message types

2015-10-20 Thread Matt Turner
On Tue, Oct 20, 2015 at 2:56 AM, Neil Roberts  wrote:
> In bfdae9149e0 I disabled the opt_sampler_eot optimisation for TG4
> message types because I found by experimentation that it doesn't work.
> I wrote in the comment that I couldn't find any documentation for this
> problem. However I've now found the documentation and it has
> additional restrictions on further message types so this patch updates
> the comment and adds the others.
> ---
>
> That paragraph in the spec also mentions further restrictions that we
> should probably worry about like that the shader shouldn't combine
> this optimisation with any other render target data port read/writes.
>
> It also has a fairly pessimistic note saying the optimisation is only
> really good for large polygons in a GUI-like workload. I wonder
> whether we should be doing some more benchmarking to decide whether
> it's really a good idea to enable this as a general optimisation even
> for games.

That's frustrating. I wish the documentation would contain notes like
this from the beginning, but maybe I should just be thankful that it
includes that text at all.

Both patches are

Reviewed-by: Matt Turner 
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] i965/fs: Disable opt_sampler_eot for more message types

2015-10-20 Thread Matt Turner
On Tue, Oct 20, 2015 at 11:57 AM, Matt Turner  wrote:
> On Tue, Oct 20, 2015 at 2:56 AM, Neil Roberts  wrote:
>> In bfdae9149e0 I disabled the opt_sampler_eot optimisation for TG4
>> message types because I found by experimentation that it doesn't work.
>> I wrote in the comment that I couldn't find any documentation for this
>> problem. However I've now found the documentation and it has
>> additional restrictions on further message types so this patch updates
>> the comment and adds the others.
>> ---
>>
>> That paragraph in the spec also mentions further restrictions that we
>> should probably worry about like that the shader shouldn't combine
>> this optimisation with any other render target data port read/writes.
>>
>> It also has a fairly pessimistic note saying the optimisation is only
>> really good for large polygons in a GUI-like workload. I wonder
>> whether we should be doing some more benchmarking to decide whether
>> it's really a good idea to enable this as a general optimisation even
>> for games.
>
> That's frustrating. I wish the documentation would contain notes like
> this from the beginning, but maybe I should just be thankful that it
> includes that text at all.
>
> Both patches are
>
> Reviewed-by: Matt Turner 

Just this one actually. I thought it was part of a series.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] i965/fs: Disable opt_sampler_eot for more message types

2015-10-20 Thread Ilia Mirkin
On Tue, Oct 20, 2015 at 5:56 AM, Neil Roberts  wrote:
> In bfdae9149e0 I disabled the opt_sampler_eot optimisation for TG4
> message types because I found by experimentation that it doesn't work.
> I wrote in the comment that I couldn't find any documentation for this
> problem. However I've now found the documentation and it has
> additional restrictions on further message types so this patch updates
> the comment and adds the others.
> ---
>
> That paragraph in the spec also mentions further restrictions that we
> should probably worry about like that the shader shouldn't combine
> this optimisation with any other render target data port read/writes.
>
> It also has a fairly pessimistic note saying the optimisation is only
> really good for large polygons in a GUI-like workload. I wonder
> whether we should be doing some more benchmarking to decide whether
> it's really a good idea to enable this as a general optimisation even
> for games.
>
>  src/mesa/drivers/dri/i965/brw_fs.cpp | 13 +++--
>  1 file changed, 7 insertions(+), 6 deletions(-)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp 
> b/src/mesa/drivers/dri/i965/brw_fs.cpp
> index 49323eb..bf9ff84 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
> @@ -2238,13 +2238,14 @@ fs_visitor::opt_sampler_eot()
> if (unlikely(tex_inst->is_head_sentinel()) || !tex_inst->is_tex())
>return false;
>
> -   /* This optimisation doesn't seem to work for textureGather for some
> -* reason. I can't find any documentation or known workarounds to indicate
> -* that this is expected, but considering that it is probably pretty
> -* unlikely that a shader would directly write out the results from
> -* textureGather we might as well just disable it.
> +   /* 3D Sampler » Messages » Message Format
> +*
> +* “Response Length of zero is allowed on all SIMD8* and SIMD16* sampler
> +*  messages except sample+killpix, resinfo, sampleinfo, LOD, and 
> gather4*”
>  */
> -   if (tex_inst->opcode == SHADER_OPCODE_TG4 ||
> +   if (tex_inst->opcode == SHADER_OPCODE_TXS ||
> +   tex_inst->opcode == SHADER_OPCODE_LOD ||
> +   tex_inst->opcode == SHADER_OPCODE_TG4 ||
> tex_inst->opcode == SHADER_OPCODE_TG4_OFFSET)

Do you also need to include SHADER_OPCODE_SAMPLEINFO?

>return false;
>
> --
> 1.9.3
>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] i965/fs: Disable opt_sampler_eot for more message types

2015-10-20 Thread Matt Turner
On Tue, Oct 20, 2015 at 2:41 PM, Ben Widawsky  wrote:
> On Tue, Oct 20, 2015 at 11:56:15AM +0200, Neil Roberts wrote:
>> In bfdae9149e0 I disabled the opt_sampler_eot optimisation for TG4
>> message types because I found by experimentation that it doesn't work.
>> I wrote in the comment that I couldn't find any documentation for this
>> problem. However I've now found the documentation and it has
>> additional restrictions on further message types so this patch updates
>> the comment and adds the others.
>> ---
>>
>> That paragraph in the spec also mentions further restrictions that we
>> should probably worry about like that the shader shouldn't combine
>> this optimisation with any other render target data port read/writes.
>>
>> It also has a fairly pessimistic note saying the optimisation is only
>> really good for large polygons in a GUI-like workload. I wonder
>> whether we should be doing some more benchmarking to decide whether
>> it's really a good idea to enable this as a general optimisation even
>> for games.
>
> I remember seeing this before, but I cannot find it now. All I am seeing
> regarding performance implications are the bits about requiring a header, and
> writing to the same pixel from multiple threads. The latter one I assume is 
> only
> going to happen with MSAA?

No, I don't think so. As I understand it, the EUs can be executing
fragment shaders for multiple primitives at the same time, and those
primitives might overlap. The c in sendc means that it does some extra
tracking to ensure that the render target writes land in the correct
order.

Presumably by using sendc to texture directly to the render target, it
adds some extra synchronization (before the texturing is done... or
something?) that especially hurts when there's a lot of overlapping
primitives (as in the case of lots of small primitives).
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] i965/fs: Disable opt_sampler_eot for more message types

2015-10-20 Thread Ben Widawsky
On Tue, Oct 20, 2015 at 11:56:15AM +0200, Neil Roberts wrote:
> In bfdae9149e0 I disabled the opt_sampler_eot optimisation for TG4
> message types because I found by experimentation that it doesn't work.
> I wrote in the comment that I couldn't find any documentation for this
> problem. However I've now found the documentation and it has
> additional restrictions on further message types so this patch updates
> the comment and adds the others.
> ---
> 
> That paragraph in the spec also mentions further restrictions that we
> should probably worry about like that the shader shouldn't combine
> this optimisation with any other render target data port read/writes.
> 
> It also has a fairly pessimistic note saying the optimisation is only
> really good for large polygons in a GUI-like workload. I wonder
> whether we should be doing some more benchmarking to decide whether
> it's really a good idea to enable this as a general optimisation even
> for games.

I remember seeing this before, but I cannot find it now. All I am seeing
regarding performance implications are the bits about requiring a header, and
writing to the same pixel from multiple threads. The latter one I assume is only
going to happen with MSAA?


> 
>  src/mesa/drivers/dri/i965/brw_fs.cpp | 13 +++--
>  1 file changed, 7 insertions(+), 6 deletions(-)
> 
> diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp 
> b/src/mesa/drivers/dri/i965/brw_fs.cpp
> index 49323eb..bf9ff84 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
> @@ -2238,13 +2238,14 @@ fs_visitor::opt_sampler_eot()
> if (unlikely(tex_inst->is_head_sentinel()) || !tex_inst->is_tex())
>return false;
>  
> -   /* This optimisation doesn't seem to work for textureGather for some
> -* reason. I can't find any documentation or known workarounds to indicate
> -* that this is expected, but considering that it is probably pretty
> -* unlikely that a shader would directly write out the results from
> -* textureGather we might as well just disable it.
> +   /* 3D Sampler » Messages » Message Format
> +*
> +* “Response Length of zero is allowed on all SIMD8* and SIMD16* sampler
> +*  messages except sample+killpix, resinfo, sampleinfo, LOD, and 
> gather4*”
>  */
> -   if (tex_inst->opcode == SHADER_OPCODE_TG4 ||
> +   if (tex_inst->opcode == SHADER_OPCODE_TXS ||
> +   tex_inst->opcode == SHADER_OPCODE_LOD ||
> +   tex_inst->opcode == SHADER_OPCODE_TG4 ||
> tex_inst->opcode == SHADER_OPCODE_TG4_OFFSET)
>return false;
>  
> -- 
> 1.9.3
> 
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev