Re: [Mesa-dev] [PATCH 2/2] i965: Implement ARB_compute_variable_group_size.

2018-06-04 Thread Plamena Manolova
On Mon, 4 Jun 2018 at 21:36, Ilia Mirkin  wrote:

> On Mon, Jun 4, 2018 at 4:30 PM, Plamena Manolova
>  wrote:
> > Thank you for the review Ilia!
> >
> > On Fri, 1 Jun 2018 at 23:44, Ilia Mirkin  wrote:
> >>
> >> On Fri, Jun 1, 2018 at 6:21 PM, Plamena Manolova
> >>  wrote:
> >> > +  /*
> >> > +   * If the local work group size is variable we have to use a
> >> > dispatch
> >> > +   * width of 32 here, since at this point we don't know the
> actual
> >> > size of
> >> > +   * the workload.
> >> > +   */
> >> > +  min_dispatch_width = 32;
> >>
> >> Is that a good idea? You are able to specify a different maximum when
> >> using a variable size (MAX_COMPUTE_VARIABLE_GROUP_INVOCATIONS_ARB)
> >> s.t. this is 16 (or even 8, although that may be too few for practical
> >> use) -- that way you would just set the max to 768 or whatever on
> >> gen8+.
> >
> >
> > That's a good point, MAX_COMPUTE_VARIABLE_GROUP_INVOCATIONS_ARB is the
> > same on all platforms, so it makes sense to use simd16 instead. Thank you
> > for noticing that.
>
> Well IIRC gen7/gen7.5 can do something like 1536 invocations with
> SIMD16, while gen8 are down to 768. Either way, not forcing SIMD32 may
> be nice -- but that's something for you Intel folk to decide. I just
> wanted to point out that you could have different max's for "regular"
> compute (where the min max is 1024) and variable-size groups.
>

Ah, I see what you mean now. I took a closer look at
MAX_COMPUTE_VARIABLE_GROUP_INVOCATIONS_ARB and it's
related to the maximum number of CS threads (which of course varies).


>   -ilia
>
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 2/2] i965: Implement ARB_compute_variable_group_size.

2018-06-04 Thread Ilia Mirkin
On Mon, Jun 4, 2018 at 4:30 PM, Plamena Manolova
 wrote:
> Thank you for the review Ilia!
>
> On Fri, 1 Jun 2018 at 23:44, Ilia Mirkin  wrote:
>>
>> On Fri, Jun 1, 2018 at 6:21 PM, Plamena Manolova
>>  wrote:
>> > +  /*
>> > +   * If the local work group size is variable we have to use a
>> > dispatch
>> > +   * width of 32 here, since at this point we don't know the actual
>> > size of
>> > +   * the workload.
>> > +   */
>> > +  min_dispatch_width = 32;
>>
>> Is that a good idea? You are able to specify a different maximum when
>> using a variable size (MAX_COMPUTE_VARIABLE_GROUP_INVOCATIONS_ARB)
>> s.t. this is 16 (or even 8, although that may be too few for practical
>> use) -- that way you would just set the max to 768 or whatever on
>> gen8+.
>
>
> That's a good point, MAX_COMPUTE_VARIABLE_GROUP_INVOCATIONS_ARB is the
> same on all platforms, so it makes sense to use simd16 instead. Thank you
> for noticing that.

Well IIRC gen7/gen7.5 can do something like 1536 invocations with
SIMD16, while gen8 are down to 768. Either way, not forcing SIMD32 may
be nice -- but that's something for you Intel folk to decide. I just
wanted to point out that you could have different max's for "regular"
compute (where the min max is 1024) and variable-size groups.

  -ilia
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 2/2] i965: Implement ARB_compute_variable_group_size.

2018-06-04 Thread Plamena Manolova
Thank you for the review Ilia!

On Fri, 1 Jun 2018 at 23:44, Ilia Mirkin  wrote:

> On Fri, Jun 1, 2018 at 6:21 PM, Plamena Manolova
>  wrote:
> > This patch adds the implentation of ARB_compute_variable_group_size
> > for i965. We do this by storing the group size in a buffer surface,
> > similarly to the work group number.
> >
> > Signed-off-by: Plamena Manolova 
> > ---
> >  docs/features.txt|  2 +-
> >  docs/relnotes/18.2.0.html|  1 +
> >  src/compiler/nir/nir_lower_system_values.c   | 14 
> >  src/intel/compiler/brw_compiler.h|  2 +
> >  src/intel/compiler/brw_fs.cpp| 45 
> >  src/intel/compiler/brw_fs_nir.cpp| 20 ++
> >  src/intel/compiler/brw_nir_lower_cs_intrinsics.c | 87
> +---
> >  src/mesa/drivers/dri/i965/brw_compute.c  | 25 ++-
> >  src/mesa/drivers/dri/i965/brw_context.h  |  1 +
> >  src/mesa/drivers/dri/i965/brw_cs.c   |  4 ++
> >  src/mesa/drivers/dri/i965/brw_wm_surface_state.c | 27 +++-
> >  src/mesa/drivers/dri/i965/intel_extensions.c |  1 +
> >  12 files changed, 187 insertions(+), 42 deletions(-)
> >
> > diff --git a/docs/features.txt b/docs/features.txt
> > index ed4050cf98..7c3c856d73 100644
> > --- a/docs/features.txt
> > +++ b/docs/features.txt
> > @@ -298,7 +298,7 @@ Khronos, ARB, and OES extensions that are not part
> of any OpenGL or OpenGL ES ve
> >
> >GL_ARB_bindless_texture   DONE (nvc0,
> radeonsi)
> >GL_ARB_cl_event   not started
> > -  GL_ARB_compute_variable_group_sizeDONE (nvc0,
> radeonsi)
> > +  GL_ARB_compute_variable_group_sizeDONE (nvc0,
> radeonsi, i965)
> >GL_ARB_ES3_2_compatibilityDONE
> (i965/gen8+)
> >GL_ARB_fragment_shader_interlock  DONE (i965)
> >GL_ARB_gpu_shader_int64   DONE
> (i965/gen8+, nvc0, radeonsi, softpipe, llvmpipe)
> > diff --git a/docs/relnotes/18.2.0.html b/docs/relnotes/18.2.0.html
> > index a3f44a29dc..4ceeb7471f 100644
> > --- a/docs/relnotes/18.2.0.html
> > +++ b/docs/relnotes/18.2.0.html
> > @@ -45,6 +45,7 @@ Note: some of the new features are only available with
> certain drivers.
> >
> >  
> >  GL_ARB_fragment_shader_interlock on i965
> > +GL_ARB_compute_variable_group_size on i965
> >  
> >
> >  Bug fixes
> > diff --git a/src/compiler/nir/nir_lower_system_values.c
> b/src/compiler/nir/nir_lower_system_values.c
> > index 487da04262..0af6d69426 100644
> > --- a/src/compiler/nir/nir_lower_system_values.c
> > +++ b/src/compiler/nir/nir_lower_system_values.c
> > @@ -57,6 +57,15 @@ convert_block(nir_block *block, nir_builder *b)
> >*gl_WorkGroupID * gl_WorkGroupSize + gl_LocalInvocationID"
> >*/
> >
> > +
> > +  /*
> > +   * If the local work group size is variable we can't lower
> the global
> > +   * invocation id here.
> > +   */
> > +  if (b->shader->info.cs.local_size_variable) {
> > + break;
> > +  }
> > +
>
> There appears to be some tabs vs spaces thing here.
>
> >   nir_const_value local_size;
> >   memset(&local_size, 0, sizeof(local_size));
> >   local_size.u32[0] = b->shader->info.cs.local_size[0];
> > @@ -102,6 +111,11 @@ convert_block(nir_block *block, nir_builder *b)
> >}
> >
> >case SYSTEM_VALUE_LOCAL_GROUP_SIZE: {
> > + /* If the local work group size is variable we can't lower it
> here */
> > + if (b->shader->info.cs.local_size_variable) {
> > +break;
> > + }
> > +
> >   nir_const_value local_size;
> >   memset(&local_size, 0, sizeof(local_size));
> >   local_size.u32[0] = b->shader->info.cs.local_size[0];
> > diff --git a/src/intel/compiler/brw_compiler.h
> b/src/intel/compiler/brw_compiler.h
> > index 8b4e6fe2e2..f54952c28f 100644
> > --- a/src/intel/compiler/brw_compiler.h
> > +++ b/src/intel/compiler/brw_compiler.h
> > @@ -759,6 +759,7 @@ struct brw_cs_prog_data {
> > unsigned threads;
> > bool uses_barrier;
> > bool uses_num_work_groups;
> > +   bool uses_variable_group_size;
> >
> > struct {
> >struct brw_push_const_block cross_thread;
> > @@ -771,6 +772,7 @@ struct brw_cs_prog_data {
> > * surface indices the CS-specific surfaces
> > */
> >uint32_t work_groups_start;
> > +  uint32_t work_group_size_start;
> >/** @} */
> > } binding_table;
> >  };
> > diff --git a/src/intel/compiler/brw_fs.cpp
> b/src/intel/compiler/brw_fs.cpp
> > index d67c0a4192..28730af47b 100644
> > --- a/src/intel/compiler/brw_fs.cpp
> > +++ b/src/intel/compiler/brw_fs.cpp
> > @@ -7228,18 +7228,32 @@ brw_compile_cs(const struct brw_compiler
> *compiler, void *log_data,
> > int shader_time_index,
> >

Re: [Mesa-dev] [PATCH 2/2] i965: Implement ARB_compute_variable_group_size.

2018-06-04 Thread Plamena Manolova
Thank you for reviewing this Jordan!

On Fri, 1 Jun 2018 at 23:45, Jordan Justen 
wrote:

> On 2018-06-01 15:21:34, Plamena Manolova wrote:
> > This patch adds the implentation of ARB_compute_variable_group_size
> > for i965. We do this by storing the group size in a buffer surface,
> > similarly to the work group number.
> >
> > Signed-off-by: Plamena Manolova 
> > ---
> >  docs/features.txt|  2 +-
> >  docs/relnotes/18.2.0.html|  1 +
> >  src/compiler/nir/nir_lower_system_values.c   | 14 
> >  src/intel/compiler/brw_compiler.h|  2 +
> >  src/intel/compiler/brw_fs.cpp| 45 
> >  src/intel/compiler/brw_fs_nir.cpp| 20 ++
> >  src/intel/compiler/brw_nir_lower_cs_intrinsics.c | 87
> +---
> >  src/mesa/drivers/dri/i965/brw_compute.c  | 25 ++-
> >  src/mesa/drivers/dri/i965/brw_context.h  |  1 +
> >  src/mesa/drivers/dri/i965/brw_cs.c   |  4 ++
> >  src/mesa/drivers/dri/i965/brw_wm_surface_state.c | 27 +++-
> >  src/mesa/drivers/dri/i965/intel_extensions.c |  1 +
> >  12 files changed, 187 insertions(+), 42 deletions(-)
> >
> > diff --git a/docs/features.txt b/docs/features.txt
> > index ed4050cf98..7c3c856d73 100644
> > --- a/docs/features.txt
> > +++ b/docs/features.txt
> > @@ -298,7 +298,7 @@ Khronos, ARB, and OES extensions that are not part
> of any OpenGL or OpenGL ES ve
> >
> >GL_ARB_bindless_texture   DONE (nvc0,
> radeonsi)
> >GL_ARB_cl_event   not started
> > -  GL_ARB_compute_variable_group_sizeDONE (nvc0,
> radeonsi)
> > +  GL_ARB_compute_variable_group_sizeDONE (nvc0,
> radeonsi, i965)
> >GL_ARB_ES3_2_compatibilityDONE
> (i965/gen8+)
> >GL_ARB_fragment_shader_interlock  DONE (i965)
> >GL_ARB_gpu_shader_int64   DONE
> (i965/gen8+, nvc0, radeonsi, softpipe, llvmpipe)
> > diff --git a/docs/relnotes/18.2.0.html b/docs/relnotes/18.2.0.html
> > index a3f44a29dc..4ceeb7471f 100644
> > --- a/docs/relnotes/18.2.0.html
> > +++ b/docs/relnotes/18.2.0.html
> > @@ -45,6 +45,7 @@ Note: some of the new features are only available with
> certain drivers.
> >
> >  
> >  GL_ARB_fragment_shader_interlock on i965
> > +GL_ARB_compute_variable_group_size on i965
> >  
> >
> >  Bug fixes
> > diff --git a/src/compiler/nir/nir_lower_system_values.c
> b/src/compiler/nir/nir_lower_system_values.c
> > index 487da04262..0af6d69426 100644
> > --- a/src/compiler/nir/nir_lower_system_values.c
> > +++ b/src/compiler/nir/nir_lower_system_values.c
> > @@ -57,6 +57,15 @@ convert_block(nir_block *block, nir_builder *b)
> >*gl_WorkGroupID * gl_WorkGroupSize + gl_LocalInvocationID"
> >*/
> >
> > +
>
> Extra line.
>
> > +  /*
> > +   * If the local work group size is variable we can't lower
> the global
> > +   * invocation id here.
> > +   */
> > +  if (b->shader->info.cs.local_size_variable) {
> > + break;
> > +  }
> > +
>
> The indent looks off here. One extra space?
>
> >   nir_const_value local_size;
> >   memset(&local_size, 0, sizeof(local_size));
> >   local_size.u32[0] = b->shader->info.cs.local_size[0];
> > @@ -102,6 +111,11 @@ convert_block(nir_block *block, nir_builder *b)
> >}
> >
> >case SYSTEM_VALUE_LOCAL_GROUP_SIZE: {
> > + /* If the local work group size is variable we can't lower it
> here */
> > + if (b->shader->info.cs.local_size_variable) {
> > +break;
> > + }
> > +
> >   nir_const_value local_size;
> >   memset(&local_size, 0, sizeof(local_size));
> >   local_size.u32[0] = b->shader->info.cs.local_size[0];
> > diff --git a/src/intel/compiler/brw_compiler.h
> b/src/intel/compiler/brw_compiler.h
> > index 8b4e6fe2e2..f54952c28f 100644
> > --- a/src/intel/compiler/brw_compiler.h
> > +++ b/src/intel/compiler/brw_compiler.h
> > @@ -759,6 +759,7 @@ struct brw_cs_prog_data {
> > unsigned threads;
> > bool uses_barrier;
> > bool uses_num_work_groups;
> > +   bool uses_variable_group_size;
> >
> > struct {
> >struct brw_push_const_block cross_thread;
> > @@ -771,6 +772,7 @@ struct brw_cs_prog_data {
> > * surface indices the CS-specific surfaces
> > */
> >uint32_t work_groups_start;
> > +  uint32_t work_group_size_start;
> >/** @} */
> > } binding_table;
> >  };
> > diff --git a/src/intel/compiler/brw_fs.cpp
> b/src/intel/compiler/brw_fs.cpp
> > index d67c0a4192..28730af47b 100644
> > --- a/src/intel/compiler/brw_fs.cpp
> > +++ b/src/intel/compiler/brw_fs.cpp
> > @@ -7228,18 +7228,32 @@ brw_compile_cs(const struct brw_compiler
> *compiler, void *log_data,
> > int shader_time_ind

Re: [Mesa-dev] [PATCH 2/2] i965: Implement ARB_compute_variable_group_size.

2018-06-04 Thread Plamena Manolova
Ah, that's true, I'll switch them around. Thank you for reviewing Matt!

On Sat, 2 Jun 2018 at 04:58, Matt Turner  wrote:

> On Fri, Jun 1, 2018 at 3:21 PM, Plamena Manolova
>  wrote:
> > This patch adds the implentation of ARB_compute_variable_group_size
> > for i965. We do this by storing the group size in a buffer surface,
> > similarly to the work group number.
> >
> > Signed-off-by: Plamena Manolova 
> > ---
> >  docs/features.txt|  2 +-
> >  docs/relnotes/18.2.0.html|  1 +
> >  src/compiler/nir/nir_lower_system_values.c   | 14 
> >  src/intel/compiler/brw_compiler.h|  2 +
> >  src/intel/compiler/brw_fs.cpp| 45 
> >  src/intel/compiler/brw_fs_nir.cpp| 20 ++
> >  src/intel/compiler/brw_nir_lower_cs_intrinsics.c | 87
> +---
> >  src/mesa/drivers/dri/i965/brw_compute.c  | 25 ++-
> >  src/mesa/drivers/dri/i965/brw_context.h  |  1 +
> >  src/mesa/drivers/dri/i965/brw_cs.c   |  4 ++
> >  src/mesa/drivers/dri/i965/brw_wm_surface_state.c | 27 +++-
> >  src/mesa/drivers/dri/i965/intel_extensions.c |  1 +
> >  12 files changed, 187 insertions(+), 42 deletions(-)
> >
> > diff --git a/docs/features.txt b/docs/features.txt
> > index ed4050cf98..7c3c856d73 100644
> > --- a/docs/features.txt
> > +++ b/docs/features.txt
> > @@ -298,7 +298,7 @@ Khronos, ARB, and OES extensions that are not part
> of any OpenGL or OpenGL ES ve
> >
> >GL_ARB_bindless_texture   DONE (nvc0,
> radeonsi)
> >GL_ARB_cl_event   not started
> > -  GL_ARB_compute_variable_group_sizeDONE (nvc0,
> radeonsi)
> > +  GL_ARB_compute_variable_group_sizeDONE (nvc0,
> radeonsi, i965)
>
> I think these are typically in alphabetical order, so i965 goes first.
>
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 2/2] i965: Implement ARB_compute_variable_group_size.

2018-06-01 Thread Matt Turner
On Fri, Jun 1, 2018 at 3:21 PM, Plamena Manolova
 wrote:
> This patch adds the implentation of ARB_compute_variable_group_size
> for i965. We do this by storing the group size in a buffer surface,
> similarly to the work group number.
>
> Signed-off-by: Plamena Manolova 
> ---
>  docs/features.txt|  2 +-
>  docs/relnotes/18.2.0.html|  1 +
>  src/compiler/nir/nir_lower_system_values.c   | 14 
>  src/intel/compiler/brw_compiler.h|  2 +
>  src/intel/compiler/brw_fs.cpp| 45 
>  src/intel/compiler/brw_fs_nir.cpp| 20 ++
>  src/intel/compiler/brw_nir_lower_cs_intrinsics.c | 87 
> +---
>  src/mesa/drivers/dri/i965/brw_compute.c  | 25 ++-
>  src/mesa/drivers/dri/i965/brw_context.h  |  1 +
>  src/mesa/drivers/dri/i965/brw_cs.c   |  4 ++
>  src/mesa/drivers/dri/i965/brw_wm_surface_state.c | 27 +++-
>  src/mesa/drivers/dri/i965/intel_extensions.c |  1 +
>  12 files changed, 187 insertions(+), 42 deletions(-)
>
> diff --git a/docs/features.txt b/docs/features.txt
> index ed4050cf98..7c3c856d73 100644
> --- a/docs/features.txt
> +++ b/docs/features.txt
> @@ -298,7 +298,7 @@ Khronos, ARB, and OES extensions that are not part of any 
> OpenGL or OpenGL ES ve
>
>GL_ARB_bindless_texture   DONE (nvc0, radeonsi)
>GL_ARB_cl_event   not started
> -  GL_ARB_compute_variable_group_sizeDONE (nvc0, radeonsi)
> +  GL_ARB_compute_variable_group_sizeDONE (nvc0, 
> radeonsi, i965)

I think these are typically in alphabetical order, so i965 goes first.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 2/2] i965: Implement ARB_compute_variable_group_size.

2018-06-01 Thread Jordan Justen
On 2018-06-01 15:21:34, Plamena Manolova wrote:
> This patch adds the implentation of ARB_compute_variable_group_size
> for i965. We do this by storing the group size in a buffer surface,
> similarly to the work group number.
> 
> Signed-off-by: Plamena Manolova 
> ---
>  docs/features.txt|  2 +-
>  docs/relnotes/18.2.0.html|  1 +
>  src/compiler/nir/nir_lower_system_values.c   | 14 
>  src/intel/compiler/brw_compiler.h|  2 +
>  src/intel/compiler/brw_fs.cpp| 45 
>  src/intel/compiler/brw_fs_nir.cpp| 20 ++
>  src/intel/compiler/brw_nir_lower_cs_intrinsics.c | 87 
> +---
>  src/mesa/drivers/dri/i965/brw_compute.c  | 25 ++-
>  src/mesa/drivers/dri/i965/brw_context.h  |  1 +
>  src/mesa/drivers/dri/i965/brw_cs.c   |  4 ++
>  src/mesa/drivers/dri/i965/brw_wm_surface_state.c | 27 +++-
>  src/mesa/drivers/dri/i965/intel_extensions.c |  1 +
>  12 files changed, 187 insertions(+), 42 deletions(-)
> 
> diff --git a/docs/features.txt b/docs/features.txt
> index ed4050cf98..7c3c856d73 100644
> --- a/docs/features.txt
> +++ b/docs/features.txt
> @@ -298,7 +298,7 @@ Khronos, ARB, and OES extensions that are not part of any 
> OpenGL or OpenGL ES ve
>  
>GL_ARB_bindless_texture   DONE (nvc0, radeonsi)
>GL_ARB_cl_event   not started
> -  GL_ARB_compute_variable_group_sizeDONE (nvc0, radeonsi)
> +  GL_ARB_compute_variable_group_sizeDONE (nvc0, 
> radeonsi, i965)
>GL_ARB_ES3_2_compatibilityDONE (i965/gen8+)
>GL_ARB_fragment_shader_interlock  DONE (i965)
>GL_ARB_gpu_shader_int64   DONE (i965/gen8+, 
> nvc0, radeonsi, softpipe, llvmpipe)
> diff --git a/docs/relnotes/18.2.0.html b/docs/relnotes/18.2.0.html
> index a3f44a29dc..4ceeb7471f 100644
> --- a/docs/relnotes/18.2.0.html
> +++ b/docs/relnotes/18.2.0.html
> @@ -45,6 +45,7 @@ Note: some of the new features are only available with 
> certain drivers.
>  
>  
>  GL_ARB_fragment_shader_interlock on i965
> +GL_ARB_compute_variable_group_size on i965
>  
>  
>  Bug fixes
> diff --git a/src/compiler/nir/nir_lower_system_values.c 
> b/src/compiler/nir/nir_lower_system_values.c
> index 487da04262..0af6d69426 100644
> --- a/src/compiler/nir/nir_lower_system_values.c
> +++ b/src/compiler/nir/nir_lower_system_values.c
> @@ -57,6 +57,15 @@ convert_block(nir_block *block, nir_builder *b)
>*gl_WorkGroupID * gl_WorkGroupSize + gl_LocalInvocationID"
>*/
>  
> +

Extra line.

> +  /*
> +   * If the local work group size is variable we can't lower the 
> global
> +   * invocation id here.
> +   */
> +  if (b->shader->info.cs.local_size_variable) {
> + break;
> +  }
> +

The indent looks off here. One extra space?

>   nir_const_value local_size;
>   memset(&local_size, 0, sizeof(local_size));
>   local_size.u32[0] = b->shader->info.cs.local_size[0];
> @@ -102,6 +111,11 @@ convert_block(nir_block *block, nir_builder *b)
>}
>  
>case SYSTEM_VALUE_LOCAL_GROUP_SIZE: {
> + /* If the local work group size is variable we can't lower it here 
> */
> + if (b->shader->info.cs.local_size_variable) {
> +break;
> + }
> +
>   nir_const_value local_size;
>   memset(&local_size, 0, sizeof(local_size));
>   local_size.u32[0] = b->shader->info.cs.local_size[0];
> diff --git a/src/intel/compiler/brw_compiler.h 
> b/src/intel/compiler/brw_compiler.h
> index 8b4e6fe2e2..f54952c28f 100644
> --- a/src/intel/compiler/brw_compiler.h
> +++ b/src/intel/compiler/brw_compiler.h
> @@ -759,6 +759,7 @@ struct brw_cs_prog_data {
> unsigned threads;
> bool uses_barrier;
> bool uses_num_work_groups;
> +   bool uses_variable_group_size;
>  
> struct {
>struct brw_push_const_block cross_thread;
> @@ -771,6 +772,7 @@ struct brw_cs_prog_data {
> * surface indices the CS-specific surfaces
> */
>uint32_t work_groups_start;
> +  uint32_t work_group_size_start;
>/** @} */
> } binding_table;
>  };
> diff --git a/src/intel/compiler/brw_fs.cpp b/src/intel/compiler/brw_fs.cpp
> index d67c0a4192..28730af47b 100644
> --- a/src/intel/compiler/brw_fs.cpp
> +++ b/src/intel/compiler/brw_fs.cpp
> @@ -7228,18 +7228,32 @@ brw_compile_cs(const struct brw_compiler *compiler, 
> void *log_data,
> int shader_time_index,
> char **error_str)
>  {
> -   prog_data->local_size[0] = src_shader->info.cs.local_size[0];
> -   prog_data->local_size[1] = src_shader->info.cs.local_size[1];
> -   prog_data->local_size[2] = src_shader->info.cs.local_size[2];
> -   unsigned local_workgroup_size =
> -

Re: [Mesa-dev] [PATCH 2/2] i965: Implement ARB_compute_variable_group_size.

2018-06-01 Thread Ilia Mirkin
On Fri, Jun 1, 2018 at 6:21 PM, Plamena Manolova
 wrote:
> This patch adds the implentation of ARB_compute_variable_group_size
> for i965. We do this by storing the group size in a buffer surface,
> similarly to the work group number.
>
> Signed-off-by: Plamena Manolova 
> ---
>  docs/features.txt|  2 +-
>  docs/relnotes/18.2.0.html|  1 +
>  src/compiler/nir/nir_lower_system_values.c   | 14 
>  src/intel/compiler/brw_compiler.h|  2 +
>  src/intel/compiler/brw_fs.cpp| 45 
>  src/intel/compiler/brw_fs_nir.cpp| 20 ++
>  src/intel/compiler/brw_nir_lower_cs_intrinsics.c | 87 
> +---
>  src/mesa/drivers/dri/i965/brw_compute.c  | 25 ++-
>  src/mesa/drivers/dri/i965/brw_context.h  |  1 +
>  src/mesa/drivers/dri/i965/brw_cs.c   |  4 ++
>  src/mesa/drivers/dri/i965/brw_wm_surface_state.c | 27 +++-
>  src/mesa/drivers/dri/i965/intel_extensions.c |  1 +
>  12 files changed, 187 insertions(+), 42 deletions(-)
>
> diff --git a/docs/features.txt b/docs/features.txt
> index ed4050cf98..7c3c856d73 100644
> --- a/docs/features.txt
> +++ b/docs/features.txt
> @@ -298,7 +298,7 @@ Khronos, ARB, and OES extensions that are not part of any 
> OpenGL or OpenGL ES ve
>
>GL_ARB_bindless_texture   DONE (nvc0, radeonsi)
>GL_ARB_cl_event   not started
> -  GL_ARB_compute_variable_group_sizeDONE (nvc0, radeonsi)
> +  GL_ARB_compute_variable_group_sizeDONE (nvc0, 
> radeonsi, i965)
>GL_ARB_ES3_2_compatibilityDONE (i965/gen8+)
>GL_ARB_fragment_shader_interlock  DONE (i965)
>GL_ARB_gpu_shader_int64   DONE (i965/gen8+, 
> nvc0, radeonsi, softpipe, llvmpipe)
> diff --git a/docs/relnotes/18.2.0.html b/docs/relnotes/18.2.0.html
> index a3f44a29dc..4ceeb7471f 100644
> --- a/docs/relnotes/18.2.0.html
> +++ b/docs/relnotes/18.2.0.html
> @@ -45,6 +45,7 @@ Note: some of the new features are only available with 
> certain drivers.
>
>  
>  GL_ARB_fragment_shader_interlock on i965
> +GL_ARB_compute_variable_group_size on i965
>  
>
>  Bug fixes
> diff --git a/src/compiler/nir/nir_lower_system_values.c 
> b/src/compiler/nir/nir_lower_system_values.c
> index 487da04262..0af6d69426 100644
> --- a/src/compiler/nir/nir_lower_system_values.c
> +++ b/src/compiler/nir/nir_lower_system_values.c
> @@ -57,6 +57,15 @@ convert_block(nir_block *block, nir_builder *b)
>*gl_WorkGroupID * gl_WorkGroupSize + gl_LocalInvocationID"
>*/
>
> +
> +  /*
> +   * If the local work group size is variable we can't lower the 
> global
> +   * invocation id here.
> +   */
> +  if (b->shader->info.cs.local_size_variable) {
> + break;
> +  }
> +

There appears to be some tabs vs spaces thing here.

>   nir_const_value local_size;
>   memset(&local_size, 0, sizeof(local_size));
>   local_size.u32[0] = b->shader->info.cs.local_size[0];
> @@ -102,6 +111,11 @@ convert_block(nir_block *block, nir_builder *b)
>}
>
>case SYSTEM_VALUE_LOCAL_GROUP_SIZE: {
> + /* If the local work group size is variable we can't lower it here 
> */
> + if (b->shader->info.cs.local_size_variable) {
> +break;
> + }
> +
>   nir_const_value local_size;
>   memset(&local_size, 0, sizeof(local_size));
>   local_size.u32[0] = b->shader->info.cs.local_size[0];
> diff --git a/src/intel/compiler/brw_compiler.h 
> b/src/intel/compiler/brw_compiler.h
> index 8b4e6fe2e2..f54952c28f 100644
> --- a/src/intel/compiler/brw_compiler.h
> +++ b/src/intel/compiler/brw_compiler.h
> @@ -759,6 +759,7 @@ struct brw_cs_prog_data {
> unsigned threads;
> bool uses_barrier;
> bool uses_num_work_groups;
> +   bool uses_variable_group_size;
>
> struct {
>struct brw_push_const_block cross_thread;
> @@ -771,6 +772,7 @@ struct brw_cs_prog_data {
> * surface indices the CS-specific surfaces
> */
>uint32_t work_groups_start;
> +  uint32_t work_group_size_start;
>/** @} */
> } binding_table;
>  };
> diff --git a/src/intel/compiler/brw_fs.cpp b/src/intel/compiler/brw_fs.cpp
> index d67c0a4192..28730af47b 100644
> --- a/src/intel/compiler/brw_fs.cpp
> +++ b/src/intel/compiler/brw_fs.cpp
> @@ -7228,18 +7228,32 @@ brw_compile_cs(const struct brw_compiler *compiler, 
> void *log_data,
> int shader_time_index,
> char **error_str)
>  {
> -   prog_data->local_size[0] = src_shader->info.cs.local_size[0];
> -   prog_data->local_size[1] = src_shader->info.cs.local_size[1];
> -   prog_data->local_size[2] = src_shader->info.cs.local_size[2];
> -   unsigned local_workgroup_size =
> -  src_s