Re: [Mesa-dev] [PATCH 2/2] i965: Implement ARB_compute_variable_group_size.
On Mon, 4 Jun 2018 at 21:36, Ilia Mirkin wrote: > On Mon, Jun 4, 2018 at 4:30 PM, Plamena Manolova > wrote: > > Thank you for the review Ilia! > > > > On Fri, 1 Jun 2018 at 23:44, Ilia Mirkin wrote: > >> > >> On Fri, Jun 1, 2018 at 6:21 PM, Plamena Manolova > >> wrote: > >> > + /* > >> > + * If the local work group size is variable we have to use a > >> > dispatch > >> > + * width of 32 here, since at this point we don't know the > actual > >> > size of > >> > + * the workload. > >> > + */ > >> > + min_dispatch_width = 32; > >> > >> Is that a good idea? You are able to specify a different maximum when > >> using a variable size (MAX_COMPUTE_VARIABLE_GROUP_INVOCATIONS_ARB) > >> s.t. this is 16 (or even 8, although that may be too few for practical > >> use) -- that way you would just set the max to 768 or whatever on > >> gen8+. > > > > > > That's a good point, MAX_COMPUTE_VARIABLE_GROUP_INVOCATIONS_ARB is the > > same on all platforms, so it makes sense to use simd16 instead. Thank you > > for noticing that. > > Well IIRC gen7/gen7.5 can do something like 1536 invocations with > SIMD16, while gen8 are down to 768. Either way, not forcing SIMD32 may > be nice -- but that's something for you Intel folk to decide. I just > wanted to point out that you could have different max's for "regular" > compute (where the min max is 1024) and variable-size groups. > Ah, I see what you mean now. I took a closer look at MAX_COMPUTE_VARIABLE_GROUP_INVOCATIONS_ARB and it's related to the maximum number of CS threads (which of course varies). > -ilia > ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 2/2] i965: Implement ARB_compute_variable_group_size.
On Mon, Jun 4, 2018 at 4:30 PM, Plamena Manolova wrote: > Thank you for the review Ilia! > > On Fri, 1 Jun 2018 at 23:44, Ilia Mirkin wrote: >> >> On Fri, Jun 1, 2018 at 6:21 PM, Plamena Manolova >> wrote: >> > + /* >> > + * If the local work group size is variable we have to use a >> > dispatch >> > + * width of 32 here, since at this point we don't know the actual >> > size of >> > + * the workload. >> > + */ >> > + min_dispatch_width = 32; >> >> Is that a good idea? You are able to specify a different maximum when >> using a variable size (MAX_COMPUTE_VARIABLE_GROUP_INVOCATIONS_ARB) >> s.t. this is 16 (or even 8, although that may be too few for practical >> use) -- that way you would just set the max to 768 or whatever on >> gen8+. > > > That's a good point, MAX_COMPUTE_VARIABLE_GROUP_INVOCATIONS_ARB is the > same on all platforms, so it makes sense to use simd16 instead. Thank you > for noticing that. Well IIRC gen7/gen7.5 can do something like 1536 invocations with SIMD16, while gen8 are down to 768. Either way, not forcing SIMD32 may be nice -- but that's something for you Intel folk to decide. I just wanted to point out that you could have different max's for "regular" compute (where the min max is 1024) and variable-size groups. -ilia ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 2/2] i965: Implement ARB_compute_variable_group_size.
Thank you for the review Ilia! On Fri, 1 Jun 2018 at 23:44, Ilia Mirkin wrote: > On Fri, Jun 1, 2018 at 6:21 PM, Plamena Manolova > wrote: > > This patch adds the implentation of ARB_compute_variable_group_size > > for i965. We do this by storing the group size in a buffer surface, > > similarly to the work group number. > > > > Signed-off-by: Plamena Manolova > > --- > > docs/features.txt| 2 +- > > docs/relnotes/18.2.0.html| 1 + > > src/compiler/nir/nir_lower_system_values.c | 14 > > src/intel/compiler/brw_compiler.h| 2 + > > src/intel/compiler/brw_fs.cpp| 45 > > src/intel/compiler/brw_fs_nir.cpp| 20 ++ > > src/intel/compiler/brw_nir_lower_cs_intrinsics.c | 87 > +--- > > src/mesa/drivers/dri/i965/brw_compute.c | 25 ++- > > src/mesa/drivers/dri/i965/brw_context.h | 1 + > > src/mesa/drivers/dri/i965/brw_cs.c | 4 ++ > > src/mesa/drivers/dri/i965/brw_wm_surface_state.c | 27 +++- > > src/mesa/drivers/dri/i965/intel_extensions.c | 1 + > > 12 files changed, 187 insertions(+), 42 deletions(-) > > > > diff --git a/docs/features.txt b/docs/features.txt > > index ed4050cf98..7c3c856d73 100644 > > --- a/docs/features.txt > > +++ b/docs/features.txt > > @@ -298,7 +298,7 @@ Khronos, ARB, and OES extensions that are not part > of any OpenGL or OpenGL ES ve > > > >GL_ARB_bindless_texture DONE (nvc0, > radeonsi) > >GL_ARB_cl_event not started > > - GL_ARB_compute_variable_group_sizeDONE (nvc0, > radeonsi) > > + GL_ARB_compute_variable_group_sizeDONE (nvc0, > radeonsi, i965) > >GL_ARB_ES3_2_compatibilityDONE > (i965/gen8+) > >GL_ARB_fragment_shader_interlock DONE (i965) > >GL_ARB_gpu_shader_int64 DONE > (i965/gen8+, nvc0, radeonsi, softpipe, llvmpipe) > > diff --git a/docs/relnotes/18.2.0.html b/docs/relnotes/18.2.0.html > > index a3f44a29dc..4ceeb7471f 100644 > > --- a/docs/relnotes/18.2.0.html > > +++ b/docs/relnotes/18.2.0.html > > @@ -45,6 +45,7 @@ Note: some of the new features are only available with > certain drivers. > > > > > > GL_ARB_fragment_shader_interlock on i965 > > +GL_ARB_compute_variable_group_size on i965 > > > > > > Bug fixes > > diff --git a/src/compiler/nir/nir_lower_system_values.c > b/src/compiler/nir/nir_lower_system_values.c > > index 487da04262..0af6d69426 100644 > > --- a/src/compiler/nir/nir_lower_system_values.c > > +++ b/src/compiler/nir/nir_lower_system_values.c > > @@ -57,6 +57,15 @@ convert_block(nir_block *block, nir_builder *b) > >*gl_WorkGroupID * gl_WorkGroupSize + gl_LocalInvocationID" > >*/ > > > > + > > + /* > > + * If the local work group size is variable we can't lower > the global > > + * invocation id here. > > + */ > > + if (b->shader->info.cs.local_size_variable) { > > + break; > > + } > > + > > There appears to be some tabs vs spaces thing here. > > > nir_const_value local_size; > > memset(&local_size, 0, sizeof(local_size)); > > local_size.u32[0] = b->shader->info.cs.local_size[0]; > > @@ -102,6 +111,11 @@ convert_block(nir_block *block, nir_builder *b) > >} > > > >case SYSTEM_VALUE_LOCAL_GROUP_SIZE: { > > + /* If the local work group size is variable we can't lower it > here */ > > + if (b->shader->info.cs.local_size_variable) { > > +break; > > + } > > + > > nir_const_value local_size; > > memset(&local_size, 0, sizeof(local_size)); > > local_size.u32[0] = b->shader->info.cs.local_size[0]; > > diff --git a/src/intel/compiler/brw_compiler.h > b/src/intel/compiler/brw_compiler.h > > index 8b4e6fe2e2..f54952c28f 100644 > > --- a/src/intel/compiler/brw_compiler.h > > +++ b/src/intel/compiler/brw_compiler.h > > @@ -759,6 +759,7 @@ struct brw_cs_prog_data { > > unsigned threads; > > bool uses_barrier; > > bool uses_num_work_groups; > > + bool uses_variable_group_size; > > > > struct { > >struct brw_push_const_block cross_thread; > > @@ -771,6 +772,7 @@ struct brw_cs_prog_data { > > * surface indices the CS-specific surfaces > > */ > >uint32_t work_groups_start; > > + uint32_t work_group_size_start; > >/** @} */ > > } binding_table; > > }; > > diff --git a/src/intel/compiler/brw_fs.cpp > b/src/intel/compiler/brw_fs.cpp > > index d67c0a4192..28730af47b 100644 > > --- a/src/intel/compiler/brw_fs.cpp > > +++ b/src/intel/compiler/brw_fs.cpp > > @@ -7228,18 +7228,32 @@ brw_compile_cs(const struct brw_compiler > *compiler, void *log_data, > > int shader_time_index, > >
Re: [Mesa-dev] [PATCH 2/2] i965: Implement ARB_compute_variable_group_size.
Thank you for reviewing this Jordan! On Fri, 1 Jun 2018 at 23:45, Jordan Justen wrote: > On 2018-06-01 15:21:34, Plamena Manolova wrote: > > This patch adds the implentation of ARB_compute_variable_group_size > > for i965. We do this by storing the group size in a buffer surface, > > similarly to the work group number. > > > > Signed-off-by: Plamena Manolova > > --- > > docs/features.txt| 2 +- > > docs/relnotes/18.2.0.html| 1 + > > src/compiler/nir/nir_lower_system_values.c | 14 > > src/intel/compiler/brw_compiler.h| 2 + > > src/intel/compiler/brw_fs.cpp| 45 > > src/intel/compiler/brw_fs_nir.cpp| 20 ++ > > src/intel/compiler/brw_nir_lower_cs_intrinsics.c | 87 > +--- > > src/mesa/drivers/dri/i965/brw_compute.c | 25 ++- > > src/mesa/drivers/dri/i965/brw_context.h | 1 + > > src/mesa/drivers/dri/i965/brw_cs.c | 4 ++ > > src/mesa/drivers/dri/i965/brw_wm_surface_state.c | 27 +++- > > src/mesa/drivers/dri/i965/intel_extensions.c | 1 + > > 12 files changed, 187 insertions(+), 42 deletions(-) > > > > diff --git a/docs/features.txt b/docs/features.txt > > index ed4050cf98..7c3c856d73 100644 > > --- a/docs/features.txt > > +++ b/docs/features.txt > > @@ -298,7 +298,7 @@ Khronos, ARB, and OES extensions that are not part > of any OpenGL or OpenGL ES ve > > > >GL_ARB_bindless_texture DONE (nvc0, > radeonsi) > >GL_ARB_cl_event not started > > - GL_ARB_compute_variable_group_sizeDONE (nvc0, > radeonsi) > > + GL_ARB_compute_variable_group_sizeDONE (nvc0, > radeonsi, i965) > >GL_ARB_ES3_2_compatibilityDONE > (i965/gen8+) > >GL_ARB_fragment_shader_interlock DONE (i965) > >GL_ARB_gpu_shader_int64 DONE > (i965/gen8+, nvc0, radeonsi, softpipe, llvmpipe) > > diff --git a/docs/relnotes/18.2.0.html b/docs/relnotes/18.2.0.html > > index a3f44a29dc..4ceeb7471f 100644 > > --- a/docs/relnotes/18.2.0.html > > +++ b/docs/relnotes/18.2.0.html > > @@ -45,6 +45,7 @@ Note: some of the new features are only available with > certain drivers. > > > > > > GL_ARB_fragment_shader_interlock on i965 > > +GL_ARB_compute_variable_group_size on i965 > > > > > > Bug fixes > > diff --git a/src/compiler/nir/nir_lower_system_values.c > b/src/compiler/nir/nir_lower_system_values.c > > index 487da04262..0af6d69426 100644 > > --- a/src/compiler/nir/nir_lower_system_values.c > > +++ b/src/compiler/nir/nir_lower_system_values.c > > @@ -57,6 +57,15 @@ convert_block(nir_block *block, nir_builder *b) > >*gl_WorkGroupID * gl_WorkGroupSize + gl_LocalInvocationID" > >*/ > > > > + > > Extra line. > > > + /* > > + * If the local work group size is variable we can't lower > the global > > + * invocation id here. > > + */ > > + if (b->shader->info.cs.local_size_variable) { > > + break; > > + } > > + > > The indent looks off here. One extra space? > > > nir_const_value local_size; > > memset(&local_size, 0, sizeof(local_size)); > > local_size.u32[0] = b->shader->info.cs.local_size[0]; > > @@ -102,6 +111,11 @@ convert_block(nir_block *block, nir_builder *b) > >} > > > >case SYSTEM_VALUE_LOCAL_GROUP_SIZE: { > > + /* If the local work group size is variable we can't lower it > here */ > > + if (b->shader->info.cs.local_size_variable) { > > +break; > > + } > > + > > nir_const_value local_size; > > memset(&local_size, 0, sizeof(local_size)); > > local_size.u32[0] = b->shader->info.cs.local_size[0]; > > diff --git a/src/intel/compiler/brw_compiler.h > b/src/intel/compiler/brw_compiler.h > > index 8b4e6fe2e2..f54952c28f 100644 > > --- a/src/intel/compiler/brw_compiler.h > > +++ b/src/intel/compiler/brw_compiler.h > > @@ -759,6 +759,7 @@ struct brw_cs_prog_data { > > unsigned threads; > > bool uses_barrier; > > bool uses_num_work_groups; > > + bool uses_variable_group_size; > > > > struct { > >struct brw_push_const_block cross_thread; > > @@ -771,6 +772,7 @@ struct brw_cs_prog_data { > > * surface indices the CS-specific surfaces > > */ > >uint32_t work_groups_start; > > + uint32_t work_group_size_start; > >/** @} */ > > } binding_table; > > }; > > diff --git a/src/intel/compiler/brw_fs.cpp > b/src/intel/compiler/brw_fs.cpp > > index d67c0a4192..28730af47b 100644 > > --- a/src/intel/compiler/brw_fs.cpp > > +++ b/src/intel/compiler/brw_fs.cpp > > @@ -7228,18 +7228,32 @@ brw_compile_cs(const struct brw_compiler > *compiler, void *log_data, > > int shader_time_ind
Re: [Mesa-dev] [PATCH 2/2] i965: Implement ARB_compute_variable_group_size.
Ah, that's true, I'll switch them around. Thank you for reviewing Matt! On Sat, 2 Jun 2018 at 04:58, Matt Turner wrote: > On Fri, Jun 1, 2018 at 3:21 PM, Plamena Manolova > wrote: > > This patch adds the implentation of ARB_compute_variable_group_size > > for i965. We do this by storing the group size in a buffer surface, > > similarly to the work group number. > > > > Signed-off-by: Plamena Manolova > > --- > > docs/features.txt| 2 +- > > docs/relnotes/18.2.0.html| 1 + > > src/compiler/nir/nir_lower_system_values.c | 14 > > src/intel/compiler/brw_compiler.h| 2 + > > src/intel/compiler/brw_fs.cpp| 45 > > src/intel/compiler/brw_fs_nir.cpp| 20 ++ > > src/intel/compiler/brw_nir_lower_cs_intrinsics.c | 87 > +--- > > src/mesa/drivers/dri/i965/brw_compute.c | 25 ++- > > src/mesa/drivers/dri/i965/brw_context.h | 1 + > > src/mesa/drivers/dri/i965/brw_cs.c | 4 ++ > > src/mesa/drivers/dri/i965/brw_wm_surface_state.c | 27 +++- > > src/mesa/drivers/dri/i965/intel_extensions.c | 1 + > > 12 files changed, 187 insertions(+), 42 deletions(-) > > > > diff --git a/docs/features.txt b/docs/features.txt > > index ed4050cf98..7c3c856d73 100644 > > --- a/docs/features.txt > > +++ b/docs/features.txt > > @@ -298,7 +298,7 @@ Khronos, ARB, and OES extensions that are not part > of any OpenGL or OpenGL ES ve > > > >GL_ARB_bindless_texture DONE (nvc0, > radeonsi) > >GL_ARB_cl_event not started > > - GL_ARB_compute_variable_group_sizeDONE (nvc0, > radeonsi) > > + GL_ARB_compute_variable_group_sizeDONE (nvc0, > radeonsi, i965) > > I think these are typically in alphabetical order, so i965 goes first. > ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 2/2] i965: Implement ARB_compute_variable_group_size.
On Fri, Jun 1, 2018 at 3:21 PM, Plamena Manolova wrote: > This patch adds the implentation of ARB_compute_variable_group_size > for i965. We do this by storing the group size in a buffer surface, > similarly to the work group number. > > Signed-off-by: Plamena Manolova > --- > docs/features.txt| 2 +- > docs/relnotes/18.2.0.html| 1 + > src/compiler/nir/nir_lower_system_values.c | 14 > src/intel/compiler/brw_compiler.h| 2 + > src/intel/compiler/brw_fs.cpp| 45 > src/intel/compiler/brw_fs_nir.cpp| 20 ++ > src/intel/compiler/brw_nir_lower_cs_intrinsics.c | 87 > +--- > src/mesa/drivers/dri/i965/brw_compute.c | 25 ++- > src/mesa/drivers/dri/i965/brw_context.h | 1 + > src/mesa/drivers/dri/i965/brw_cs.c | 4 ++ > src/mesa/drivers/dri/i965/brw_wm_surface_state.c | 27 +++- > src/mesa/drivers/dri/i965/intel_extensions.c | 1 + > 12 files changed, 187 insertions(+), 42 deletions(-) > > diff --git a/docs/features.txt b/docs/features.txt > index ed4050cf98..7c3c856d73 100644 > --- a/docs/features.txt > +++ b/docs/features.txt > @@ -298,7 +298,7 @@ Khronos, ARB, and OES extensions that are not part of any > OpenGL or OpenGL ES ve > >GL_ARB_bindless_texture DONE (nvc0, radeonsi) >GL_ARB_cl_event not started > - GL_ARB_compute_variable_group_sizeDONE (nvc0, radeonsi) > + GL_ARB_compute_variable_group_sizeDONE (nvc0, > radeonsi, i965) I think these are typically in alphabetical order, so i965 goes first. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 2/2] i965: Implement ARB_compute_variable_group_size.
On 2018-06-01 15:21:34, Plamena Manolova wrote: > This patch adds the implentation of ARB_compute_variable_group_size > for i965. We do this by storing the group size in a buffer surface, > similarly to the work group number. > > Signed-off-by: Plamena Manolova > --- > docs/features.txt| 2 +- > docs/relnotes/18.2.0.html| 1 + > src/compiler/nir/nir_lower_system_values.c | 14 > src/intel/compiler/brw_compiler.h| 2 + > src/intel/compiler/brw_fs.cpp| 45 > src/intel/compiler/brw_fs_nir.cpp| 20 ++ > src/intel/compiler/brw_nir_lower_cs_intrinsics.c | 87 > +--- > src/mesa/drivers/dri/i965/brw_compute.c | 25 ++- > src/mesa/drivers/dri/i965/brw_context.h | 1 + > src/mesa/drivers/dri/i965/brw_cs.c | 4 ++ > src/mesa/drivers/dri/i965/brw_wm_surface_state.c | 27 +++- > src/mesa/drivers/dri/i965/intel_extensions.c | 1 + > 12 files changed, 187 insertions(+), 42 deletions(-) > > diff --git a/docs/features.txt b/docs/features.txt > index ed4050cf98..7c3c856d73 100644 > --- a/docs/features.txt > +++ b/docs/features.txt > @@ -298,7 +298,7 @@ Khronos, ARB, and OES extensions that are not part of any > OpenGL or OpenGL ES ve > >GL_ARB_bindless_texture DONE (nvc0, radeonsi) >GL_ARB_cl_event not started > - GL_ARB_compute_variable_group_sizeDONE (nvc0, radeonsi) > + GL_ARB_compute_variable_group_sizeDONE (nvc0, > radeonsi, i965) >GL_ARB_ES3_2_compatibilityDONE (i965/gen8+) >GL_ARB_fragment_shader_interlock DONE (i965) >GL_ARB_gpu_shader_int64 DONE (i965/gen8+, > nvc0, radeonsi, softpipe, llvmpipe) > diff --git a/docs/relnotes/18.2.0.html b/docs/relnotes/18.2.0.html > index a3f44a29dc..4ceeb7471f 100644 > --- a/docs/relnotes/18.2.0.html > +++ b/docs/relnotes/18.2.0.html > @@ -45,6 +45,7 @@ Note: some of the new features are only available with > certain drivers. > > > GL_ARB_fragment_shader_interlock on i965 > +GL_ARB_compute_variable_group_size on i965 > > > Bug fixes > diff --git a/src/compiler/nir/nir_lower_system_values.c > b/src/compiler/nir/nir_lower_system_values.c > index 487da04262..0af6d69426 100644 > --- a/src/compiler/nir/nir_lower_system_values.c > +++ b/src/compiler/nir/nir_lower_system_values.c > @@ -57,6 +57,15 @@ convert_block(nir_block *block, nir_builder *b) >*gl_WorkGroupID * gl_WorkGroupSize + gl_LocalInvocationID" >*/ > > + Extra line. > + /* > + * If the local work group size is variable we can't lower the > global > + * invocation id here. > + */ > + if (b->shader->info.cs.local_size_variable) { > + break; > + } > + The indent looks off here. One extra space? > nir_const_value local_size; > memset(&local_size, 0, sizeof(local_size)); > local_size.u32[0] = b->shader->info.cs.local_size[0]; > @@ -102,6 +111,11 @@ convert_block(nir_block *block, nir_builder *b) >} > >case SYSTEM_VALUE_LOCAL_GROUP_SIZE: { > + /* If the local work group size is variable we can't lower it here > */ > + if (b->shader->info.cs.local_size_variable) { > +break; > + } > + > nir_const_value local_size; > memset(&local_size, 0, sizeof(local_size)); > local_size.u32[0] = b->shader->info.cs.local_size[0]; > diff --git a/src/intel/compiler/brw_compiler.h > b/src/intel/compiler/brw_compiler.h > index 8b4e6fe2e2..f54952c28f 100644 > --- a/src/intel/compiler/brw_compiler.h > +++ b/src/intel/compiler/brw_compiler.h > @@ -759,6 +759,7 @@ struct brw_cs_prog_data { > unsigned threads; > bool uses_barrier; > bool uses_num_work_groups; > + bool uses_variable_group_size; > > struct { >struct brw_push_const_block cross_thread; > @@ -771,6 +772,7 @@ struct brw_cs_prog_data { > * surface indices the CS-specific surfaces > */ >uint32_t work_groups_start; > + uint32_t work_group_size_start; >/** @} */ > } binding_table; > }; > diff --git a/src/intel/compiler/brw_fs.cpp b/src/intel/compiler/brw_fs.cpp > index d67c0a4192..28730af47b 100644 > --- a/src/intel/compiler/brw_fs.cpp > +++ b/src/intel/compiler/brw_fs.cpp > @@ -7228,18 +7228,32 @@ brw_compile_cs(const struct brw_compiler *compiler, > void *log_data, > int shader_time_index, > char **error_str) > { > - prog_data->local_size[0] = src_shader->info.cs.local_size[0]; > - prog_data->local_size[1] = src_shader->info.cs.local_size[1]; > - prog_data->local_size[2] = src_shader->info.cs.local_size[2]; > - unsigned local_workgroup_size = > -
Re: [Mesa-dev] [PATCH 2/2] i965: Implement ARB_compute_variable_group_size.
On Fri, Jun 1, 2018 at 6:21 PM, Plamena Manolova wrote: > This patch adds the implentation of ARB_compute_variable_group_size > for i965. We do this by storing the group size in a buffer surface, > similarly to the work group number. > > Signed-off-by: Plamena Manolova > --- > docs/features.txt| 2 +- > docs/relnotes/18.2.0.html| 1 + > src/compiler/nir/nir_lower_system_values.c | 14 > src/intel/compiler/brw_compiler.h| 2 + > src/intel/compiler/brw_fs.cpp| 45 > src/intel/compiler/brw_fs_nir.cpp| 20 ++ > src/intel/compiler/brw_nir_lower_cs_intrinsics.c | 87 > +--- > src/mesa/drivers/dri/i965/brw_compute.c | 25 ++- > src/mesa/drivers/dri/i965/brw_context.h | 1 + > src/mesa/drivers/dri/i965/brw_cs.c | 4 ++ > src/mesa/drivers/dri/i965/brw_wm_surface_state.c | 27 +++- > src/mesa/drivers/dri/i965/intel_extensions.c | 1 + > 12 files changed, 187 insertions(+), 42 deletions(-) > > diff --git a/docs/features.txt b/docs/features.txt > index ed4050cf98..7c3c856d73 100644 > --- a/docs/features.txt > +++ b/docs/features.txt > @@ -298,7 +298,7 @@ Khronos, ARB, and OES extensions that are not part of any > OpenGL or OpenGL ES ve > >GL_ARB_bindless_texture DONE (nvc0, radeonsi) >GL_ARB_cl_event not started > - GL_ARB_compute_variable_group_sizeDONE (nvc0, radeonsi) > + GL_ARB_compute_variable_group_sizeDONE (nvc0, > radeonsi, i965) >GL_ARB_ES3_2_compatibilityDONE (i965/gen8+) >GL_ARB_fragment_shader_interlock DONE (i965) >GL_ARB_gpu_shader_int64 DONE (i965/gen8+, > nvc0, radeonsi, softpipe, llvmpipe) > diff --git a/docs/relnotes/18.2.0.html b/docs/relnotes/18.2.0.html > index a3f44a29dc..4ceeb7471f 100644 > --- a/docs/relnotes/18.2.0.html > +++ b/docs/relnotes/18.2.0.html > @@ -45,6 +45,7 @@ Note: some of the new features are only available with > certain drivers. > > > GL_ARB_fragment_shader_interlock on i965 > +GL_ARB_compute_variable_group_size on i965 > > > Bug fixes > diff --git a/src/compiler/nir/nir_lower_system_values.c > b/src/compiler/nir/nir_lower_system_values.c > index 487da04262..0af6d69426 100644 > --- a/src/compiler/nir/nir_lower_system_values.c > +++ b/src/compiler/nir/nir_lower_system_values.c > @@ -57,6 +57,15 @@ convert_block(nir_block *block, nir_builder *b) >*gl_WorkGroupID * gl_WorkGroupSize + gl_LocalInvocationID" >*/ > > + > + /* > + * If the local work group size is variable we can't lower the > global > + * invocation id here. > + */ > + if (b->shader->info.cs.local_size_variable) { > + break; > + } > + There appears to be some tabs vs spaces thing here. > nir_const_value local_size; > memset(&local_size, 0, sizeof(local_size)); > local_size.u32[0] = b->shader->info.cs.local_size[0]; > @@ -102,6 +111,11 @@ convert_block(nir_block *block, nir_builder *b) >} > >case SYSTEM_VALUE_LOCAL_GROUP_SIZE: { > + /* If the local work group size is variable we can't lower it here > */ > + if (b->shader->info.cs.local_size_variable) { > +break; > + } > + > nir_const_value local_size; > memset(&local_size, 0, sizeof(local_size)); > local_size.u32[0] = b->shader->info.cs.local_size[0]; > diff --git a/src/intel/compiler/brw_compiler.h > b/src/intel/compiler/brw_compiler.h > index 8b4e6fe2e2..f54952c28f 100644 > --- a/src/intel/compiler/brw_compiler.h > +++ b/src/intel/compiler/brw_compiler.h > @@ -759,6 +759,7 @@ struct brw_cs_prog_data { > unsigned threads; > bool uses_barrier; > bool uses_num_work_groups; > + bool uses_variable_group_size; > > struct { >struct brw_push_const_block cross_thread; > @@ -771,6 +772,7 @@ struct brw_cs_prog_data { > * surface indices the CS-specific surfaces > */ >uint32_t work_groups_start; > + uint32_t work_group_size_start; >/** @} */ > } binding_table; > }; > diff --git a/src/intel/compiler/brw_fs.cpp b/src/intel/compiler/brw_fs.cpp > index d67c0a4192..28730af47b 100644 > --- a/src/intel/compiler/brw_fs.cpp > +++ b/src/intel/compiler/brw_fs.cpp > @@ -7228,18 +7228,32 @@ brw_compile_cs(const struct brw_compiler *compiler, > void *log_data, > int shader_time_index, > char **error_str) > { > - prog_data->local_size[0] = src_shader->info.cs.local_size[0]; > - prog_data->local_size[1] = src_shader->info.cs.local_size[1]; > - prog_data->local_size[2] = src_shader->info.cs.local_size[2]; > - unsigned local_workgroup_size = > - src_s