[Mesa-dev] [PATCH 8/8] i965: Use the correct number of threads for compute shaders.

2016-06-10 Thread Kenneth Graunke
We were programming the number of threads per subslice, when we should
have been programming the total number of threads on the GPU as a whole.
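
As a minimal sketch of the arithmetic only (not the driver code), the
helper below computes the value the patch places in the maximum-threads
field. The function name and parameter names are hypothetical and do not
exist in Mesa; the "count minus one" encoding and the clamp of the
subslice count to at least 1 mirror what the diff does.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not part of Mesa: returns the value for a
 * "maximum threads" field that is programmed as (thread count - 1). */
static uint32_t
vfe_max_threads_field(uint32_t threads_per_subslice, uint32_t subslice_total)
{
   /* The field holds count - 1, so a zero thread count is not valid here. */
   assert(threads_per_subslice > 0);

   /* Treat an unreported (zero) subslice count as 1, like MAX2(..., 1). */
   const uint32_t subslices = subslice_total > 1 ? subslice_total : 1;

   /* Total threads across the whole GPU, minus one. */
   return threads_per_subslice * subslices - 1;
}
```

With purely illustrative inputs of 56 threads per subslice and 3
subslices, this returns 167, whereas programming the per-subslice count
alone would have given 55.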

Thanks to Curro and Jordan for helping track this down!

On Skylake GT3e:
- Improves performance in Unreal's Elemental Demo by roughly 1.5-1.7x.
- Improves performance in Synmark's Gl43CSDof by roughly 3.7x.
- Improves performance in Synmark's Gl43CSCloth by roughly 1.18x.

On Broadwell GT2:
- Improves performance in Unreal's Elemental Demo by roughly 1.2-1.5x.
- Improves performance in Synmark's Gl43CSDof by roughly 2.0x.
- Improves performance in Synmark's Gl43CSCloth by 1.47035% +/-
  0.255654% (n=25).

On Haswell GT3e:
- Improves performance in Unreal's Elemental Demo (in GL 4.3 mode)
  by roughly 1.10x.
- Improves performance in Synmark's Gl43CSDof by roughly 1.18x.
- Decreases performance in Synmark's Gl43CSCloth by 1.99484% +/-
  0.432771% (n=64).

On Ivybridge GT2:
- Improves performance in Unreal's Elemental Demo (in GL 4.2 mode)
  by roughly 1.03x.
- Improves performance in Synmark's Gl43CSDof by roughly 1.25x.
- No change in Synmark's Gl43CSCloth (n=28).

Cc: "12.0" 
Signed-off-by: Kenneth Graunke 
---
 src/mesa/drivers/dri/i965/gen7_cs_state.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/src/mesa/drivers/dri/i965/gen7_cs_state.c b/src/mesa/drivers/dri/i965/gen7_cs_state.c
index 9d83837..ba558a6 100644
--- a/src/mesa/drivers/dri/i965/gen7_cs_state.c
+++ b/src/mesa/drivers/dri/i965/gen7_cs_state.c
@@ -95,7 +95,9 @@ brw_upload_cs_state(struct brw_context *brw)
    const uint32_t vfe_num_urb_entries = brw->gen >= 8 ? 2 : 0;
    const uint32_t vfe_gpgpu_mode =
       brw->gen == 7 ? SET_FIELD(1, GEN7_MEDIA_VFE_STATE_GPGPU_MODE) : 0;
-   OUT_BATCH(SET_FIELD(brw->max_cs_threads - 1, MEDIA_VFE_STATE_MAX_THREADS) |
+   const uint32_t subslices = MAX2(brw->intelScreen->subslice_total, 1);
+   OUT_BATCH(SET_FIELD(brw->max_cs_threads * subslices - 1,
+                       MEDIA_VFE_STATE_MAX_THREADS) |
              SET_FIELD(vfe_num_urb_entries, MEDIA_VFE_STATE_URB_ENTRIES) |
              SET_FIELD(1, MEDIA_VFE_STATE_RESET_GTW_TIMER) |
              SET_FIELD(1, MEDIA_VFE_STATE_BYPASS_GTW) |
-- 
2.8.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 8/8] i965: Use the correct number of threads for compute shaders.

2016-06-11 Thread Jordan Justen
Series Reviewed-by: Jordan Justen 

On 2016-06-10 13:05:20, Kenneth Graunke wrote:
> We were programming the number of threads per subslice, when we should
> have been programming the total number of threads on the GPU as a whole.
> 
> Thanks to Curro and Jordan for helping track this down!
> 
> On Skylake GT3e:
> - Improves performance in Unreal's Elemental Demo by roughly 1.5-1.7x.
> - Improves performance in Synmark's Gl43CSDof by roughly 3.7x.
> - Improves performance in Synmark's Gl43CSCloth by roughly 1.18x.
> 
> On Broadwell GT2:
> - Improves performance in Unreal's Elemental Demo by roughly 1.2-1.5x.
> - Improves performance in Synmark's Gl43CSDof by roughly 2.0x.
> - Improves performance in Synmark's Gl43CSCloth by 1.47035% +/-
>   0.255654% (n=25).
> 
> On Haswell GT3e:
> - Improves performance in Unreal's Elemental Demo (in GL 4.3 mode)
>   by roughly 1.10x.
> - Improves performance in Synmark's Gl43CSDof by roughly 1.18x.
> - Decreases performance in Synmark's Gl43CSCloth by 1.99484% +/-
>   0.432771% (n=64).
> 
> On Ivybridge GT2:
> - Improves performance in Unreal's Elemental Demo (in GL 4.2 mode)
>   by roughly 1.03x.
> - Improves performance in Synmark's Gl43CSDof by roughly 1.25x.
> - No change in Synmark's Gl43CSCloth (n=28).
> 
> Cc: "12.0" 
> Signed-off-by: Kenneth Graunke 
> ---
>  src/mesa/drivers/dri/i965/gen7_cs_state.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/src/mesa/drivers/dri/i965/gen7_cs_state.c b/src/mesa/drivers/dri/i965/gen7_cs_state.c
> index 9d83837..ba558a6 100644
> --- a/src/mesa/drivers/dri/i965/gen7_cs_state.c
> +++ b/src/mesa/drivers/dri/i965/gen7_cs_state.c
> @@ -95,7 +95,9 @@ brw_upload_cs_state(struct brw_context *brw)
>     const uint32_t vfe_num_urb_entries = brw->gen >= 8 ? 2 : 0;
>     const uint32_t vfe_gpgpu_mode =
>        brw->gen == 7 ? SET_FIELD(1, GEN7_MEDIA_VFE_STATE_GPGPU_MODE) : 0;
> -   OUT_BATCH(SET_FIELD(brw->max_cs_threads - 1, MEDIA_VFE_STATE_MAX_THREADS) |
> +   const uint32_t subslices = MAX2(brw->intelScreen->subslice_total, 1);
> +   OUT_BATCH(SET_FIELD(brw->max_cs_threads * subslices - 1,
> +                       MEDIA_VFE_STATE_MAX_THREADS) |
>               SET_FIELD(vfe_num_urb_entries, MEDIA_VFE_STATE_URB_ENTRIES) |
>               SET_FIELD(1, MEDIA_VFE_STATE_RESET_GTW_TIMER) |
>               SET_FIELD(1, MEDIA_VFE_STATE_BYPASS_GTW) |
> -- 
> 2.8.3