Re: [Mesa-dev] [PATCH 4/8] i965: Account for poor address calculations in Haswell CS scratch size.

2016-06-11 Thread Jordan Justen
On 2016-06-10 13:05:16, Kenneth Graunke wrote:
> Curro figured this out by investigating the simulator.  Apparently
> there's also a workaround in the Windows driver.  I'm not sure it's
> actually documented anywhere.
> 
> We were underallocating the scratch buffer by a factor of 128/70.
> 
> Cc: "12.0" 
> Signed-off-by: Kenneth Graunke 
> ---
>  src/mesa/drivers/dri/i965/brw_cs.c | 21 ++++++++++++++++++++-
>  1 file changed, 20 insertions(+), 1 deletion(-)
> 
> diff --git a/src/mesa/drivers/dri/i965/brw_cs.c b/src/mesa/drivers/dri/i965/brw_cs.c
> index c8598d6..329adff 100644
> --- a/src/mesa/drivers/dri/i965/brw_cs.c
> +++ b/src/mesa/drivers/dri/i965/brw_cs.c
> @@ -150,9 +150,28 @@ brw_codegen_cs_prog(struct brw_context *brw,
>  
>     if (prog_data.base.total_scratch) {
>        const unsigned subslices = MAX2(brw->intelScreen->subslice_total, 1);
> +
> +      /* WaCSScratchSize:hsw
> +       *
> +       * Haswell's scratch space address calculation appears to be sparse
> +       * rather than tightly packed.  The Thread ID has bits indicating
> +       * which subslice, EU within a subslice, and thread within an EU
> +       * it is.  There's a maximum of two slices and two subslices, so these
> +       * can be stored with a single bit.  Even though there are only 10 EUs
> +       * per subslice, this is stored in 4 bits, so there's an effective
> +       * maximum value of 16 EUs.  Similarly, although there are only 7
> +       * threads per EU, this is stored in a 3 bit number, giving an effective
> +       * maximum value of 8 threads per EU.
> +       *
> +       * This means that we need to use 16 * 8 instead of 10 * 7 for the
> +       * number of threads per subslice.
> +       */
> +      const unsigned threads_per_subslice =

How about naming the variable something like scratch_ids_per_subslice?

-Jordan

> +         brw->is_haswell ? 16 * 8 : brw->max_cs_threads;
> +
>        brw_get_scratch_bo(brw, &brw->cs.base.scratch_bo,
>                           prog_data.base.total_scratch *
> -                         brw->max_cs_threads * subslices);
> +                         threads_per_subslice * subslices);
>     }
>  
>     if (unlikely(INTEL_DEBUG & DEBUG_CS))
> -- 
> 2.8.3
> 
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 4/8] i965: Account for poor address calculations in Haswell CS scratch size.

2016-06-10 Thread Kenneth Graunke
Curro figured this out by investigating the simulator.  Apparently
there's also a workaround in the Windows driver.  I'm not sure it's
actually documented anywhere.

We were underallocating the scratch buffer by a factor of 128/70.

Cc: "12.0" 
Signed-off-by: Kenneth Graunke 
---
 src/mesa/drivers/dri/i965/brw_cs.c | 21 ++++++++++++++++++++-
 1 file changed, 20 insertions(+), 1 deletion(-)

diff --git a/src/mesa/drivers/dri/i965/brw_cs.c b/src/mesa/drivers/dri/i965/brw_cs.c
index c8598d6..329adff 100644
--- a/src/mesa/drivers/dri/i965/brw_cs.c
+++ b/src/mesa/drivers/dri/i965/brw_cs.c
@@ -150,9 +150,28 @@ brw_codegen_cs_prog(struct brw_context *brw,
 
    if (prog_data.base.total_scratch) {
       const unsigned subslices = MAX2(brw->intelScreen->subslice_total, 1);
+
+      /* WaCSScratchSize:hsw
+       *
+       * Haswell's scratch space address calculation appears to be sparse
+       * rather than tightly packed.  The Thread ID has bits indicating
+       * which subslice, EU within a subslice, and thread within an EU
+       * it is.  There's a maximum of two slices and two subslices, so these
+       * can be stored with a single bit.  Even though there are only 10 EUs
+       * per subslice, this is stored in 4 bits, so there's an effective
+       * maximum value of 16 EUs.  Similarly, although there are only 7
+       * threads per EU, this is stored in a 3 bit number, giving an effective
+       * maximum value of 8 threads per EU.
+       *
+       * This means that we need to use 16 * 8 instead of 10 * 7 for the
+       * number of threads per subslice.
+       */
+      const unsigned threads_per_subslice =
+         brw->is_haswell ? 16 * 8 : brw->max_cs_threads;
+
       brw_get_scratch_bo(brw, &brw->cs.base.scratch_bo,
                          prog_data.base.total_scratch *
-                         brw->max_cs_threads * subslices);
+                         threads_per_subslice * subslices);
    }
 
    if (unlikely(INTEL_DEBUG & DEBUG_CS))
-- 
2.8.3
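
For anyone following the arithmetic in the WaCSScratchSize comment, here is
a rough standalone sketch of why 10 * 7 slots are not enough.  It is not
code from the driver: the helper names are hypothetical, and the bit order
(EU number in the upper 4 bits, thread number in the lower 3) is only an
assumption made for illustration.  The patch comment states the field
widths but not how the hardware lays them out.

```c
#include <assert.h>

/* Field widths taken from the patch comment. */
#define HSW_EU_ID_BITS     4   /* 10 EUs per subslice, stored in 4 bits */
#define HSW_THREAD_ID_BITS 3   /* 7 threads per EU, stored in 3 bits */

/* Hypothetical helper: per-subslice scratch slot when the EU and thread
 * fields are concatenated.  ASSUMPTION: EU bits sit above thread bits.
 */
static unsigned
sparse_scratch_id(unsigned eu, unsigned thread)
{
   assert(eu < (1u << HSW_EU_ID_BITS));
   assert(thread < (1u << HSW_THREAD_ID_BITS));
   return (eu << HSW_THREAD_ID_BITS) | thread;
}

/* Hypothetical helper: the slot the same thread would get if slots were
 * tightly packed, which is what the old size calculation implied.
 */
static unsigned
packed_scratch_id(unsigned eu, unsigned thread, unsigned threads_per_eu)
{
   return eu * threads_per_eu + thread;
}

/* Number of slots needed to cover every value the two fields can hold. */
static unsigned
scratch_ids_per_subslice(void)
{
   return 1u << (HSW_EU_ID_BITS + HSW_THREAD_ID_BITS);
}
```

Under that assumed layout, the last real thread (EU 9, thread 6) lands in
slot 78 via sparse_scratch_id(9, 6), while packed_scratch_id(9, 6, 7) gives
69.  A buffer sized for 70 slots therefore cannot hold slot 78, and
scratch_ids_per_subslice() returns the 16 * 8 = 128 used in the patch.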
