Re: [Mesa-dev] [PATCH] i965/fs: Set pixel/sample mask for compute shaders atomic ops
Jordan Justen jordan.l.jus...@intel.com writes: On 2015-02-19 21:40:37, Ben Widawsky wrote: On Thu, Feb 19, 2015 at 03:42:05PM -0800, Jordan Justen wrote: For fragment programs, we pull this mask from the payload header. The same mask doesn't exist for compute shaders, so we set all bits to enabled. Note: this mask is ANDed with the execution mask, so some channels may not end up issuing the atomic operation. Signed-off-by: Jordan Justen jordan.l.jus...@intel.com Cc: Ben Widawsky b...@bwidawsk.net Cc: Francisco Jerez curroje...@riseup.net Just add to the commit message that this is needed specifically because compute is invoked as SIMD16 (and perhaps reference the other commits?) and it's: Reviewed-by: Ben Widawsky b...@bwidawsk.net Good idea. I'll add those. Sorry it advance... we may as well just go for 0x in case we ever support SIMD32. I had been setting all 32-bits previously. I mentioned to you that I thought this was needed for SIMD32. I wanted to double check it before sending the patch out. I think I found the field for IVB in the PRM: IVB Vol 4 Part 1 3.9.9.9 Message Header Pixel/Sample Mask ...and it looks like it is only 16-bits. Maybe Francisco can confirm that I got it right. I couldn't find this same information in the HSW PRMs. I'm not sure what that means for SIMD32. Yeah, it's only 16 bits. For SIMD32 it means that, *sigh*, we'll have to split up the message in several SIMD16 chunks (my image load store branch has to do something similar to do typed surface operations in SIMD16 mode because there are only 8-wide variants of those messages). To initialize the header we would just copy the first or second 16-bit half of the sample mask, and set the quarter control bits appropriately for each half so the execution mask is taken into account correctly. With Ben's and Matt's suggestions this patch is: Reviewed-by: Francisco Jerez curroje...@riseup.net -Jordan --- While it's fresh in our minds. :) This seems to work for gen7 gen8 CS. For CS simd16, we need the 0x change, but it seems to work fine for simd8 as well. I also tested gen8 (simd8vs), and there were no piglit regressions. src/mesa/drivers/dri/i965/brw_fs_visitor.cpp | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp index 24cc118..960a0aa 100644 --- a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp @@ -2998,9 +2998,9 @@ fs_visitor::emit_untyped_atomic(unsigned atomic_op, unsigned surf_index, * mask sent in the header to compute the actual set of channels that execute * the atomic operation. */ - assert(stage == MESA_SHADER_VERTEX); + assert(stage == MESA_SHADER_VERTEX || stage == MESA_SHADER_COMPUTE); emit(MOV(component(sources[0], 7), - brw_imm_ud(0xff)))-force_writemask_all = true; + brw_imm_ud(0x)))-force_writemask_all = true; } length++; @@ -3061,9 +3061,9 @@ fs_visitor::emit_untyped_surface_read(unsigned surf_index, fs_reg dst, * mask sent in the header to compute the actual set of channels that execute * the atomic operation. */ - assert(stage == MESA_SHADER_VERTEX); + assert(stage == MESA_SHADER_VERTEX || stage == MESA_SHADER_COMPUTE); emit(MOV(component(sources[0], 7), - brw_imm_ud(0xff)))-force_writemask_all = true; + brw_imm_ud(0x)))-force_writemask_all = true; } /* Set the surface read offset. */ -- 2.1.4 pgpppOi6R6AAi.pgp Description: PGP signature ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] i965/fs: Set pixel/sample mask for compute shaders atomic ops
On Thu, Feb 19, 2015 at 11:25:56PM -0800, Jordan Justen wrote: On 2015-02-19 21:40:37, Ben Widawsky wrote: On Thu, Feb 19, 2015 at 03:42:05PM -0800, Jordan Justen wrote: For fragment programs, we pull this mask from the payload header. The same mask doesn't exist for compute shaders, so we set all bits to enabled. Note: this mask is ANDed with the execution mask, so some channels may not end up issuing the atomic operation. Signed-off-by: Jordan Justen jordan.l.jus...@intel.com Cc: Ben Widawsky b...@bwidawsk.net Cc: Francisco Jerez curroje...@riseup.net Just add to the commit message that this is needed specifically because compute is invoked as SIMD16 (and perhaps reference the other commits?) and it's: Reviewed-by: Ben Widawsky b...@bwidawsk.net Good idea. I'll add those. Sorry it advance... we may as well just go for 0x in case we ever support SIMD32. I had been setting all 32-bits previously. I mentioned to you that I thought this was needed for SIMD32. I wanted to double check it before sending the patch out. I think I found the field for IVB in the PRM: IVB Vol 4 Part 1 3.9.9.9 Message Header Pixel/Sample Mask ...and it looks like it is only 16-bits. Maybe Francisco can confirm that I got it right. I couldn't find this same information in the HSW PRMs. I'm not sure what that means for SIMD32. -Jordan I suspect it's because the docs are super suck wrt SIMD32... but if you looked and see nothing, just leave it be. Whomever enables SIMD32 can deal with it :-) [snip] ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] i965/fs: Set pixel/sample mask for compute shaders atomic ops
On Thu, Feb 19, 2015 at 03:42:05PM -0800, Jordan Justen wrote: For fragment programs, we pull this mask from the payload header. The same mask doesn't exist for compute shaders, so we set all bits to enabled. Note: this mask is ANDed with the execution mask, so some channels may not end up issuing the atomic operation. Signed-off-by: Jordan Justen jordan.l.jus...@intel.com Cc: Ben Widawsky b...@bwidawsk.net Cc: Francisco Jerez curroje...@riseup.net Just add to the commit message that this is needed specifically because compute is invoked as SIMD16 (and perhaps reference the other commits?) and it's: Reviewed-by: Ben Widawsky b...@bwidawsk.net Sorry it advance... we may as well just go for 0x in case we ever support SIMD32. --- While it's fresh in our minds. :) This seems to work for gen7 gen8 CS. For CS simd16, we need the 0x change, but it seems to work fine for simd8 as well. I also tested gen8 (simd8vs), and there were no piglit regressions. src/mesa/drivers/dri/i965/brw_fs_visitor.cpp | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp index 24cc118..960a0aa 100644 --- a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp @@ -2998,9 +2998,9 @@ fs_visitor::emit_untyped_atomic(unsigned atomic_op, unsigned surf_index, * mask sent in the header to compute the actual set of channels that execute * the atomic operation. */ - assert(stage == MESA_SHADER_VERTEX); + assert(stage == MESA_SHADER_VERTEX || stage == MESA_SHADER_COMPUTE); emit(MOV(component(sources[0], 7), - brw_imm_ud(0xff)))-force_writemask_all = true; + brw_imm_ud(0x)))-force_writemask_all = true; } length++; @@ -3061,9 +3061,9 @@ fs_visitor::emit_untyped_surface_read(unsigned surf_index, fs_reg dst, * mask sent in the header to compute the actual set of channels that execute * the atomic operation. */ - assert(stage == MESA_SHADER_VERTEX); + assert(stage == MESA_SHADER_VERTEX || stage == MESA_SHADER_COMPUTE); emit(MOV(component(sources[0], 7), - brw_imm_ud(0xff)))-force_writemask_all = true; + brw_imm_ud(0x)))-force_writemask_all = true; } /* Set the surface read offset. */ -- 2.1.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] i965/fs: Set pixel/sample mask for compute shaders atomic ops
On 2015-02-19 21:40:37, Ben Widawsky wrote: On Thu, Feb 19, 2015 at 03:42:05PM -0800, Jordan Justen wrote: For fragment programs, we pull this mask from the payload header. The same mask doesn't exist for compute shaders, so we set all bits to enabled. Note: this mask is ANDed with the execution mask, so some channels may not end up issuing the atomic operation. Signed-off-by: Jordan Justen jordan.l.jus...@intel.com Cc: Ben Widawsky b...@bwidawsk.net Cc: Francisco Jerez curroje...@riseup.net Just add to the commit message that this is needed specifically because compute is invoked as SIMD16 (and perhaps reference the other commits?) and it's: Reviewed-by: Ben Widawsky b...@bwidawsk.net Good idea. I'll add those. Sorry it advance... we may as well just go for 0x in case we ever support SIMD32. I had been setting all 32-bits previously. I mentioned to you that I thought this was needed for SIMD32. I wanted to double check it before sending the patch out. I think I found the field for IVB in the PRM: IVB Vol 4 Part 1 3.9.9.9 Message Header Pixel/Sample Mask ...and it looks like it is only 16-bits. Maybe Francisco can confirm that I got it right. I couldn't find this same information in the HSW PRMs. I'm not sure what that means for SIMD32. -Jordan --- While it's fresh in our minds. :) This seems to work for gen7 gen8 CS. For CS simd16, we need the 0x change, but it seems to work fine for simd8 as well. I also tested gen8 (simd8vs), and there were no piglit regressions. src/mesa/drivers/dri/i965/brw_fs_visitor.cpp | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp index 24cc118..960a0aa 100644 --- a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp @@ -2998,9 +2998,9 @@ fs_visitor::emit_untyped_atomic(unsigned atomic_op, unsigned surf_index, * mask sent in the header to compute the actual set of channels that execute * the atomic operation. */ - assert(stage == MESA_SHADER_VERTEX); + assert(stage == MESA_SHADER_VERTEX || stage == MESA_SHADER_COMPUTE); emit(MOV(component(sources[0], 7), - brw_imm_ud(0xff)))-force_writemask_all = true; + brw_imm_ud(0x)))-force_writemask_all = true; } length++; @@ -3061,9 +3061,9 @@ fs_visitor::emit_untyped_surface_read(unsigned surf_index, fs_reg dst, * mask sent in the header to compute the actual set of channels that execute * the atomic operation. */ - assert(stage == MESA_SHADER_VERTEX); + assert(stage == MESA_SHADER_VERTEX || stage == MESA_SHADER_COMPUTE); emit(MOV(component(sources[0], 7), - brw_imm_ud(0xff)))-force_writemask_all = true; + brw_imm_ud(0x)))-force_writemask_all = true; } /* Set the surface read offset. */ -- 2.1.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] i965/fs: Set pixel/sample mask for compute shaders atomic ops
On Thu, Feb 19, 2015 at 3:42 PM, Jordan Justen jordan.l.jus...@intel.com wrote: For fragment programs, we pull this mask from the payload header. The same mask doesn't exist for compute shaders, so we set all bits to enabled. Note: this mask is ANDed with the execution mask, so some channels may not end up issuing the atomic operation. Signed-off-by: Jordan Justen jordan.l.jus...@intel.com Cc: Ben Widawsky b...@bwidawsk.net Cc: Francisco Jerez curroje...@riseup.net --- While it's fresh in our minds. :) This seems to work for gen7 gen8 CS. For CS simd16, we need the 0x change, but it seems to work fine for simd8 as well. I also tested gen8 (simd8vs), and there were no piglit regressions. src/mesa/drivers/dri/i965/brw_fs_visitor.cpp | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp index 24cc118..960a0aa 100644 --- a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp @@ -2998,9 +2998,9 @@ fs_visitor::emit_untyped_atomic(unsigned atomic_op, unsigned surf_index, * mask sent in the header to compute the actual set of channels that execute * the atomic operation. */ - assert(stage == MESA_SHADER_VERTEX); + assert(stage == MESA_SHADER_VERTEX || stage == MESA_SHADER_COMPUTE); emit(MOV(component(sources[0], 7), - brw_imm_ud(0xff)))-force_writemask_all = true; + brw_imm_ud(0x)))-force_writemask_all = true; brw_imm_ud returns a brw_reg, which gets converted to an fs_reg with type = HW_REG. That's not what we want (we consider HW_REGs to be barriers to instruction scheduling, for instance). Change it to fs_reg(0x) while you're modifying it. } length++; @@ -3061,9 +3061,9 @@ fs_visitor::emit_untyped_surface_read(unsigned surf_index, fs_reg dst, * mask sent in the header to compute the actual set of channels that execute * the atomic operation. */ - assert(stage == MESA_SHADER_VERTEX); + assert(stage == MESA_SHADER_VERTEX || stage == MESA_SHADER_COMPUTE); emit(MOV(component(sources[0], 7), - brw_imm_ud(0xff)))-force_writemask_all = true; + brw_imm_ud(0x)))-force_writemask_all = true; Same thing here. } /* Set the surface read offset. */ -- 2.1.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] i965/fs: Set pixel/sample mask for compute shaders atomic ops
For fragment programs, we pull this mask from the payload header. The same mask doesn't exist for compute shaders, so we set all bits to enabled. Note: this mask is ANDed with the execution mask, so some channels may not end up issuing the atomic operation. Signed-off-by: Jordan Justen jordan.l.jus...@intel.com Cc: Ben Widawsky b...@bwidawsk.net Cc: Francisco Jerez curroje...@riseup.net --- While it's fresh in our minds. :) This seems to work for gen7 gen8 CS. For CS simd16, we need the 0x change, but it seems to work fine for simd8 as well. I also tested gen8 (simd8vs), and there were no piglit regressions. src/mesa/drivers/dri/i965/brw_fs_visitor.cpp | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp index 24cc118..960a0aa 100644 --- a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp @@ -2998,9 +2998,9 @@ fs_visitor::emit_untyped_atomic(unsigned atomic_op, unsigned surf_index, * mask sent in the header to compute the actual set of channels that execute * the atomic operation. */ - assert(stage == MESA_SHADER_VERTEX); + assert(stage == MESA_SHADER_VERTEX || stage == MESA_SHADER_COMPUTE); emit(MOV(component(sources[0], 7), - brw_imm_ud(0xff)))-force_writemask_all = true; + brw_imm_ud(0x)))-force_writemask_all = true; } length++; @@ -3061,9 +3061,9 @@ fs_visitor::emit_untyped_surface_read(unsigned surf_index, fs_reg dst, * mask sent in the header to compute the actual set of channels that execute * the atomic operation. */ - assert(stage == MESA_SHADER_VERTEX); + assert(stage == MESA_SHADER_VERTEX || stage == MESA_SHADER_COMPUTE); emit(MOV(component(sources[0], 7), - brw_imm_ud(0xff)))-force_writemask_all = true; + brw_imm_ud(0x)))-force_writemask_all = true; } /* Set the surface read offset. */ -- 2.1.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev