Re: [Mesa-dev] [PATCH] i965/fs: Set pixel/sample mask for compute shaders atomic ops

2015-02-20 Thread Francisco Jerez
Jordan Justen jordan.l.jus...@intel.com writes:

 On 2015-02-19 21:40:37, Ben Widawsky wrote:
 On Thu, Feb 19, 2015 at 03:42:05PM -0800, Jordan Justen wrote:
  For fragment programs, we pull this mask from the payload header. The same
  mask doesn't exist for compute shaders, so we set all bits to enabled.
  
  Note: this mask is ANDed with the execution mask, so some channels may not 
  end
  up issuing the atomic operation.
  
  Signed-off-by: Jordan Justen jordan.l.jus...@intel.com
  Cc: Ben Widawsky b...@bwidawsk.net
  Cc: Francisco Jerez curroje...@riseup.net
 
 Just add to the commit message that this is needed specifically because 
 compute
 is invoked as SIMD16 (and perhaps reference the other commits?) and it's:
 Reviewed-by: Ben Widawsky b...@bwidawsk.net

 Good idea. I'll add those.

 Sorry it advance...
 we may as well just go for 0x in case we ever support SIMD32.

 I had been setting all 32-bits previously. I mentioned to you that I
 thought this was needed for SIMD32. I wanted to double check it before
 sending the patch out. I think I found the field for IVB in the PRM:

 IVB Vol 4 Part 1 3.9.9.9 Message Header
 Pixel/Sample Mask

 ...and it looks like it is only 16-bits. Maybe Francisco can confirm
 that I got it right.

 I couldn't find this same information in the HSW PRMs.

 I'm not sure what that means for SIMD32.

Yeah, it's only 16 bits.  For SIMD32 it means that, *sigh*, we'll have
to split up the message in several SIMD16 chunks (my image load store
branch has to do something similar to do typed surface operations in
SIMD16 mode because there are only 8-wide variants of those messages).
To initialize the header we would just copy the first or second 16-bit
half of the sample mask, and set the quarter control bits appropriately
for each half so the execution mask is taken into account correctly.

With Ben's and Matt's suggestions this patch is:
Reviewed-by: Francisco Jerez curroje...@riseup.net


 -Jordan

  ---
   While it's fresh in our minds. :)
  
   This seems to work for gen7  gen8 CS. For CS simd16, we need the
   0x change, but it seems to work fine for simd8 as well.
  
   I also tested gen8 (simd8vs), and there were no piglit regressions.
  
   src/mesa/drivers/dri/i965/brw_fs_visitor.cpp | 8 
   1 file changed, 4 insertions(+), 4 deletions(-)
  
  diff --git a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp 
  b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
  index 24cc118..960a0aa 100644
  --- a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
  +++ b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
  @@ -2998,9 +2998,9 @@ fs_visitor::emit_untyped_atomic(unsigned atomic_op, 
  unsigned surf_index,
  * mask sent in the header to compute the actual set of channels 
  that execute
  * the atomic operation.
  */
  -  assert(stage == MESA_SHADER_VERTEX);
  +  assert(stage == MESA_SHADER_VERTEX || stage == MESA_SHADER_COMPUTE);
 emit(MOV(component(sources[0], 7),
  -   brw_imm_ud(0xff)))-force_writemask_all = true;
  +   brw_imm_ud(0x)))-force_writemask_all = true;
  }
  length++;
   
  @@ -3061,9 +3061,9 @@ fs_visitor::emit_untyped_surface_read(unsigned 
  surf_index, fs_reg dst,
  * mask sent in the header to compute the actual set of channels 
  that execute
  * the atomic operation.
  */
  -  assert(stage == MESA_SHADER_VERTEX);
  +  assert(stage == MESA_SHADER_VERTEX || stage == MESA_SHADER_COMPUTE);
 emit(MOV(component(sources[0], 7),
  -   brw_imm_ud(0xff)))-force_writemask_all = true;
  +   brw_imm_ud(0x)))-force_writemask_all = true;
  }
   
  /* Set the surface read offset. */
  -- 
  2.1.4
  


pgpppOi6R6AAi.pgp
Description: PGP signature
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] i965/fs: Set pixel/sample mask for compute shaders atomic ops

2015-02-19 Thread Ben Widawsky
On Thu, Feb 19, 2015 at 11:25:56PM -0800, Jordan Justen wrote:
 On 2015-02-19 21:40:37, Ben Widawsky wrote:
  On Thu, Feb 19, 2015 at 03:42:05PM -0800, Jordan Justen wrote:
   For fragment programs, we pull this mask from the payload header. The same
   mask doesn't exist for compute shaders, so we set all bits to enabled.
   
   Note: this mask is ANDed with the execution mask, so some channels may 
   not end
   up issuing the atomic operation.
   
   Signed-off-by: Jordan Justen jordan.l.jus...@intel.com
   Cc: Ben Widawsky b...@bwidawsk.net
   Cc: Francisco Jerez curroje...@riseup.net
  
  Just add to the commit message that this is needed specifically because 
  compute
  is invoked as SIMD16 (and perhaps reference the other commits?) and it's:
  Reviewed-by: Ben Widawsky b...@bwidawsk.net
 
 Good idea. I'll add those.
 
  Sorry it advance...
  we may as well just go for 0x in case we ever support SIMD32.
 
 I had been setting all 32-bits previously. I mentioned to you that I
 thought this was needed for SIMD32. I wanted to double check it before
 sending the patch out. I think I found the field for IVB in the PRM:
 
 IVB Vol 4 Part 1 3.9.9.9 Message Header
 Pixel/Sample Mask
 
 ...and it looks like it is only 16-bits. Maybe Francisco can confirm
 that I got it right.
 
 I couldn't find this same information in the HSW PRMs.
 
 I'm not sure what that means for SIMD32.
 
 -Jordan

I suspect it's because the docs are super suck wrt SIMD32... but if you looked
and see nothing, just leave it be. Whomever enables SIMD32 can deal with it :-)

[snip]

 
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] i965/fs: Set pixel/sample mask for compute shaders atomic ops

2015-02-19 Thread Ben Widawsky
On Thu, Feb 19, 2015 at 03:42:05PM -0800, Jordan Justen wrote:
 For fragment programs, we pull this mask from the payload header. The same
 mask doesn't exist for compute shaders, so we set all bits to enabled.
 
 Note: this mask is ANDed with the execution mask, so some channels may not end
 up issuing the atomic operation.
 
 Signed-off-by: Jordan Justen jordan.l.jus...@intel.com
 Cc: Ben Widawsky b...@bwidawsk.net
 Cc: Francisco Jerez curroje...@riseup.net

Just add to the commit message that this is needed specifically because compute
is invoked as SIMD16 (and perhaps reference the other commits?) and it's:
Reviewed-by: Ben Widawsky b...@bwidawsk.net

Sorry it advance...
we may as well just go for 0x in case we ever support SIMD32.

 ---
  While it's fresh in our minds. :)
 
  This seems to work for gen7  gen8 CS. For CS simd16, we need the
  0x change, but it seems to work fine for simd8 as well.
 
  I also tested gen8 (simd8vs), and there were no piglit regressions.
 
  src/mesa/drivers/dri/i965/brw_fs_visitor.cpp | 8 
  1 file changed, 4 insertions(+), 4 deletions(-)
 
 diff --git a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp 
 b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
 index 24cc118..960a0aa 100644
 --- a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
 +++ b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
 @@ -2998,9 +2998,9 @@ fs_visitor::emit_untyped_atomic(unsigned atomic_op, 
 unsigned surf_index,
 * mask sent in the header to compute the actual set of channels that 
 execute
 * the atomic operation.
 */
 -  assert(stage == MESA_SHADER_VERTEX);
 +  assert(stage == MESA_SHADER_VERTEX || stage == MESA_SHADER_COMPUTE);
emit(MOV(component(sources[0], 7),
 -   brw_imm_ud(0xff)))-force_writemask_all = true;
 +   brw_imm_ud(0x)))-force_writemask_all = true;
 }
 length++;
  
 @@ -3061,9 +3061,9 @@ fs_visitor::emit_untyped_surface_read(unsigned 
 surf_index, fs_reg dst,
 * mask sent in the header to compute the actual set of channels that 
 execute
 * the atomic operation.
 */
 -  assert(stage == MESA_SHADER_VERTEX);
 +  assert(stage == MESA_SHADER_VERTEX || stage == MESA_SHADER_COMPUTE);
emit(MOV(component(sources[0], 7),
 -   brw_imm_ud(0xff)))-force_writemask_all = true;
 +   brw_imm_ud(0x)))-force_writemask_all = true;
 }
  
 /* Set the surface read offset. */
 -- 
 2.1.4
 
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] i965/fs: Set pixel/sample mask for compute shaders atomic ops

2015-02-19 Thread Jordan Justen
On 2015-02-19 21:40:37, Ben Widawsky wrote:
 On Thu, Feb 19, 2015 at 03:42:05PM -0800, Jordan Justen wrote:
  For fragment programs, we pull this mask from the payload header. The same
  mask doesn't exist for compute shaders, so we set all bits to enabled.
  
  Note: this mask is ANDed with the execution mask, so some channels may not 
  end
  up issuing the atomic operation.
  
  Signed-off-by: Jordan Justen jordan.l.jus...@intel.com
  Cc: Ben Widawsky b...@bwidawsk.net
  Cc: Francisco Jerez curroje...@riseup.net
 
 Just add to the commit message that this is needed specifically because 
 compute
 is invoked as SIMD16 (and perhaps reference the other commits?) and it's:
 Reviewed-by: Ben Widawsky b...@bwidawsk.net

Good idea. I'll add those.

 Sorry it advance...
 we may as well just go for 0x in case we ever support SIMD32.

I had been setting all 32-bits previously. I mentioned to you that I
thought this was needed for SIMD32. I wanted to double check it before
sending the patch out. I think I found the field for IVB in the PRM:

IVB Vol 4 Part 1 3.9.9.9 Message Header
Pixel/Sample Mask

...and it looks like it is only 16-bits. Maybe Francisco can confirm
that I got it right.

I couldn't find this same information in the HSW PRMs.

I'm not sure what that means for SIMD32.

-Jordan

  ---
   While it's fresh in our minds. :)
  
   This seems to work for gen7  gen8 CS. For CS simd16, we need the
   0x change, but it seems to work fine for simd8 as well.
  
   I also tested gen8 (simd8vs), and there were no piglit regressions.
  
   src/mesa/drivers/dri/i965/brw_fs_visitor.cpp | 8 
   1 file changed, 4 insertions(+), 4 deletions(-)
  
  diff --git a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp 
  b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
  index 24cc118..960a0aa 100644
  --- a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
  +++ b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
  @@ -2998,9 +2998,9 @@ fs_visitor::emit_untyped_atomic(unsigned atomic_op, 
  unsigned surf_index,
  * mask sent in the header to compute the actual set of channels 
  that execute
  * the atomic operation.
  */
  -  assert(stage == MESA_SHADER_VERTEX);
  +  assert(stage == MESA_SHADER_VERTEX || stage == MESA_SHADER_COMPUTE);
 emit(MOV(component(sources[0], 7),
  -   brw_imm_ud(0xff)))-force_writemask_all = true;
  +   brw_imm_ud(0x)))-force_writemask_all = true;
  }
  length++;
   
  @@ -3061,9 +3061,9 @@ fs_visitor::emit_untyped_surface_read(unsigned 
  surf_index, fs_reg dst,
  * mask sent in the header to compute the actual set of channels 
  that execute
  * the atomic operation.
  */
  -  assert(stage == MESA_SHADER_VERTEX);
  +  assert(stage == MESA_SHADER_VERTEX || stage == MESA_SHADER_COMPUTE);
 emit(MOV(component(sources[0], 7),
  -   brw_imm_ud(0xff)))-force_writemask_all = true;
  +   brw_imm_ud(0x)))-force_writemask_all = true;
  }
   
  /* Set the surface read offset. */
  -- 
  2.1.4
  
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] i965/fs: Set pixel/sample mask for compute shaders atomic ops

2015-02-19 Thread Matt Turner
On Thu, Feb 19, 2015 at 3:42 PM, Jordan Justen
jordan.l.jus...@intel.com wrote:
 For fragment programs, we pull this mask from the payload header. The same
 mask doesn't exist for compute shaders, so we set all bits to enabled.

 Note: this mask is ANDed with the execution mask, so some channels may not end
 up issuing the atomic operation.

 Signed-off-by: Jordan Justen jordan.l.jus...@intel.com
 Cc: Ben Widawsky b...@bwidawsk.net
 Cc: Francisco Jerez curroje...@riseup.net
 ---
  While it's fresh in our minds. :)

  This seems to work for gen7  gen8 CS. For CS simd16, we need the
  0x change, but it seems to work fine for simd8 as well.

  I also tested gen8 (simd8vs), and there were no piglit regressions.

  src/mesa/drivers/dri/i965/brw_fs_visitor.cpp | 8 
  1 file changed, 4 insertions(+), 4 deletions(-)

 diff --git a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp 
 b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
 index 24cc118..960a0aa 100644
 --- a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
 +++ b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
 @@ -2998,9 +2998,9 @@ fs_visitor::emit_untyped_atomic(unsigned atomic_op, 
 unsigned surf_index,
 * mask sent in the header to compute the actual set of channels that 
 execute
 * the atomic operation.
 */
 -  assert(stage == MESA_SHADER_VERTEX);
 +  assert(stage == MESA_SHADER_VERTEX || stage == MESA_SHADER_COMPUTE);
emit(MOV(component(sources[0], 7),
 -   brw_imm_ud(0xff)))-force_writemask_all = true;
 +   brw_imm_ud(0x)))-force_writemask_all = true;

brw_imm_ud returns a brw_reg, which gets converted to an fs_reg with
type = HW_REG. That's not what we want (we consider HW_REGs to be
barriers to instruction scheduling, for instance). Change it to
fs_reg(0x) while you're modifying it.

 }
 length++;

 @@ -3061,9 +3061,9 @@ fs_visitor::emit_untyped_surface_read(unsigned 
 surf_index, fs_reg dst,
 * mask sent in the header to compute the actual set of channels that 
 execute
 * the atomic operation.
 */
 -  assert(stage == MESA_SHADER_VERTEX);
 +  assert(stage == MESA_SHADER_VERTEX || stage == MESA_SHADER_COMPUTE);
emit(MOV(component(sources[0], 7),
 -   brw_imm_ud(0xff)))-force_writemask_all = true;
 +   brw_imm_ud(0x)))-force_writemask_all = true;

Same thing here.

 }

 /* Set the surface read offset. */
 --
 2.1.4

 ___
 mesa-dev mailing list
 mesa-dev@lists.freedesktop.org
 http://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH] i965/fs: Set pixel/sample mask for compute shaders atomic ops

2015-02-19 Thread Jordan Justen
For fragment programs, we pull this mask from the payload header. The same
mask doesn't exist for compute shaders, so we set all bits to enabled.

Note: this mask is ANDed with the execution mask, so some channels may not end
up issuing the atomic operation.

Signed-off-by: Jordan Justen jordan.l.jus...@intel.com
Cc: Ben Widawsky b...@bwidawsk.net
Cc: Francisco Jerez curroje...@riseup.net
---
 While it's fresh in our minds. :)

 This seems to work for gen7  gen8 CS. For CS simd16, we need the
 0x change, but it seems to work fine for simd8 as well.

 I also tested gen8 (simd8vs), and there were no piglit regressions.

 src/mesa/drivers/dri/i965/brw_fs_visitor.cpp | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp 
b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
index 24cc118..960a0aa 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
@@ -2998,9 +2998,9 @@ fs_visitor::emit_untyped_atomic(unsigned atomic_op, 
unsigned surf_index,
* mask sent in the header to compute the actual set of channels that 
execute
* the atomic operation.
*/
-  assert(stage == MESA_SHADER_VERTEX);
+  assert(stage == MESA_SHADER_VERTEX || stage == MESA_SHADER_COMPUTE);
   emit(MOV(component(sources[0], 7),
-   brw_imm_ud(0xff)))-force_writemask_all = true;
+   brw_imm_ud(0x)))-force_writemask_all = true;
}
length++;
 
@@ -3061,9 +3061,9 @@ fs_visitor::emit_untyped_surface_read(unsigned 
surf_index, fs_reg dst,
* mask sent in the header to compute the actual set of channels that 
execute
* the atomic operation.
*/
-  assert(stage == MESA_SHADER_VERTEX);
+  assert(stage == MESA_SHADER_VERTEX || stage == MESA_SHADER_COMPUTE);
   emit(MOV(component(sources[0], 7),
-   brw_imm_ud(0xff)))-force_writemask_all = true;
+   brw_imm_ud(0x)))-force_writemask_all = true;
}
 
/* Set the surface read offset. */
-- 
2.1.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev