Re: [Mesa-dev] [RFC 2/2] i965: add support for image AoA

2015-08-17 Thread Francisco Jerez
Timothy Arceri t_arc...@yahoo.com.au writes:

 On Sun, 2015-08-16 at 13:15 +0300, Francisco Jerez wrote:
 Timothy Arceri t_arc...@yahoo.com.au writes:
 
  On Sat, 2015-08-15 at 17:33 +0300, Francisco Jerez wrote:
   Timothy Arceri t_arc...@yahoo.com.au writes:
   
On Wed, 2015-08-12 at 19:39 +1000, Timothy Arceri wrote:
 Cc: Francisco Jerez curroje...@riseup.net
 ---
  This patch works for direct indexing of images, but I'm having some 
 trouble
  getting indirect to work.
 
  For example for:
  tests/spec/arb_arrays_of_arrays/execution/image_store/basic
 -imageStore
  -non-const-uniform-index.shader_test
 
  Which has and image writeonly uniform image2D tex[2][2]
 
  Indirect indexing will work for tex[0][0] and text[0][1] but not 
 for
  tex[1][0] and tex[1][1] they seem to always end up refering to the
  image in 0.

Just to add some more to this, I'm pretty sure my code is generating 
the
correct offsets. If I hardcode the img_offset offset to 72 to get the 
uniform
value of tex[1][0] I get the value I expected, but if I set 
image.reladdr 
to a
register that contains 72 I don't what I expect.

If I change the array to be a single dimension e.g. tex[4] and I 
hardcode 
the
offset as described above then it works as expected for both 
scenarios, it
also works if I split the offset across img_offset and image.reladdr, 
there is
something going on with image.reladdr for multi-dimensional arrays 
that I 
can'
t quite put my finger on.

Any hints appreciated.

   Odd, can you attach an assembly dump?
   
   Thanks.
  
  I wasn't sure what would be the most helpful so I've attached a few 
  different
  dumps.
  
  image_dump = 1D array indirect piglit test, without this patch 
  (Result=pass)
  image_dump2 = 2D array indirect piglit test, with this patch (Result=fail)
  image_dump3 = 1D array indirect piglit test, with this patch (Result=pass)
  
  image_dump4 = 1D array indirect piglit test, hardcoded register with 72 
  offset
(Result=pass)
  image_dump5 = 2D array indirect piglit test, hardcoded register with 72 
  offset
  (Result=fail)
  
  image_dump4 vs image_dump5 is interesting because the output matches which 
  is
  what I would have expected, but the result differs. Then with the offset 
  below
  it seems to work as expected suggesting everything else is setup 
  correctly.
  
  image_dump6 = 1D array indirect piglit test, hardcoded 72 offset in 
  img_offset
 (Result=pass)
  image_dump7 = 2D array indirect piglit test, hardcoded 72 offset in 
  img_offset
 (Result=pass)
   
 
 Yeah... The assembly output looks correct to me in the cases that fail.
 AFAICT what all the failing cases have in common is that the array of
 image uniforms is demoted to pull constants (see demote_pull_constants()
 and move_uniform_array_access_to_pull_constants()),
  so most likely
 something goes wrong while doing that for multidimensional arrays.  I
 have the suspicion that this is the source of the problem:
 
  --- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
  +++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
  @@ -226,6 +226,7 @@ fs_visitor::nir_setup_uniform(nir_variable *var)
 * our name.
 */
  unsigned index = var-data.driver_location;
  +   bool set_image_location = true;
  for (unsigned u = 0; u  shader_prog-NumUniformStorage; u++) {
 struct gl_uniform_storage *storage = shader_prog
  -UniformStorage[u];
  
  @@ -244,7 +245,13 @@ fs_visitor::nir_setup_uniform(nir_variable *var)
 * because their size is driver-specific, so we need to allocate
 * space for them here at the end of the parameter array.
 */
  - var-data.driver_location = uniforms;
  + if (set_image_location) {
  +/* For arrays of arrays we only want to set this once at the 
  base
  + * location.
  + */
  +var-data.driver_location = uniforms;
  +set_image_location = false;
  + }
param_size[uniforms] =
   BRW_IMAGE_PARAM_SIZE * MAX2(storage-array_elements, 1);
 
 The assumption that this code was previously making was that a given
 image array would be stored in a single gl_uniform_storage entry,
 otherwise param_size[var-data.driver_location] won't match the real
 size of the array and move_uniform_array_access_to_pull_constants() will
 have no idea how large the array really is, so it will only be able to
 move part of the array to the pull constant buffer and later on the
 indirect array access will read past the end of the initialized area of
 the constant buffer.

 Thanks for the reply. I'm still getting to know the i965 backend it would have
 taken me a while to get to this point, make a lot more sense now. The code
 below does the trick all of the indirect tests now pass.

 Thanks so much for 

Re: [Mesa-dev] [RFC 2/2] i965: add support for image AoA

2015-08-17 Thread Timothy Arceri
On Sun, 2015-08-16 at 13:15 +0300, Francisco Jerez wrote:
 Timothy Arceri t_arc...@yahoo.com.au writes:
 
  On Sat, 2015-08-15 at 17:33 +0300, Francisco Jerez wrote:
   Timothy Arceri t_arc...@yahoo.com.au writes:
   
On Wed, 2015-08-12 at 19:39 +1000, Timothy Arceri wrote:
 Cc: Francisco Jerez curroje...@riseup.net
 ---
  This patch works for direct indexing of images, but I'm having some 
 trouble
  getting indirect to work.
 
  For example for:
  tests/spec/arb_arrays_of_arrays/execution/image_store/basic
 -imageStore
  -non-const-uniform-index.shader_test
 
  Which has and image writeonly uniform image2D tex[2][2]
 
  Indirect indexing will work for tex[0][0] and text[0][1] but not 
 for
  tex[1][0] and tex[1][1] they seem to always end up refering to the
  image in 0.

Just to add some more to this, I'm pretty sure my code is generating 
the
correct offsets. If I hardcode the img_offset offset to 72 to get the 
uniform
value of tex[1][0] I get the value I expected, but if I set 
image.reladdr 
to a
register that contains 72 I don't what I expect.

If I change the array to be a single dimension e.g. tex[4] and I 
hardcode 
the
offset as described above then it works as expected for both 
scenarios, it
also works if I split the offset across img_offset and image.reladdr, 
there is
something going on with image.reladdr for multi-dimensional arrays 
that I 
can'
t quite put my finger on.

Any hints appreciated.

   Odd, can you attach an assembly dump?
   
   Thanks.
  
  I wasn't sure what would be the most helpful so I've attached a few 
  different
  dumps.
  
  image_dump = 1D array indirect piglit test, without this patch 
  (Result=pass)
  image_dump2 = 2D array indirect piglit test, with this patch (Result=fail)
  image_dump3 = 1D array indirect piglit test, with this patch (Result=pass)
  
  image_dump4 = 1D array indirect piglit test, hardcoded register with 72 
  offset
(Result=pass)
  image_dump5 = 2D array indirect piglit test, hardcoded register with 72 
  offset
  (Result=fail)
  
  image_dump4 vs image_dump5 is interesting because the output matches which 
  is
  what I would have expected, but the result differs. Then with the offset 
  below
  it seems to work as expected suggesting everything else is setup 
  correctly.
  
  image_dump6 = 1D array indirect piglit test, hardcoded 72 offset in 
  img_offset
 (Result=pass)
  image_dump7 = 2D array indirect piglit test, hardcoded 72 offset in 
  img_offset
 (Result=pass)
   
 
 Yeah... The assembly output looks correct to me in the cases that fail.
 AFAICT what all the failing cases have in common is that the array of
 image uniforms is demoted to pull constants (see demote_pull_constants()
 and move_uniform_array_access_to_pull_constants()),
  so most likely
 something goes wrong while doing that for multidimensional arrays.  I
 have the suspicion that this is the source of the problem:
 
  --- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
  +++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
  @@ -226,6 +226,7 @@ fs_visitor::nir_setup_uniform(nir_variable *var)
 * our name.
 */
  unsigned index = var-data.driver_location;
  +   bool set_image_location = true;
  for (unsigned u = 0; u  shader_prog-NumUniformStorage; u++) {
 struct gl_uniform_storage *storage = shader_prog
  -UniformStorage[u];
  
  @@ -244,7 +245,13 @@ fs_visitor::nir_setup_uniform(nir_variable *var)
 * because their size is driver-specific, so we need to allocate
 * space for them here at the end of the parameter array.
 */
  - var-data.driver_location = uniforms;
  + if (set_image_location) {
  +/* For arrays of arrays we only want to set this once at the 
  base
  + * location.
  + */
  +var-data.driver_location = uniforms;
  +set_image_location = false;
  + }
param_size[uniforms] =
   BRW_IMAGE_PARAM_SIZE * MAX2(storage-array_elements, 1);
 
 The assumption that this code was previously making was that a given
 image array would be stored in a single gl_uniform_storage entry,
 otherwise param_size[var-data.driver_location] won't match the real
 size of the array and move_uniform_array_access_to_pull_constants() will
 have no idea how large the array really is, so it will only be able to
 move part of the array to the pull constant buffer and later on the
 indirect array access will read past the end of the initialized area of
 the constant buffer.

Thanks for the reply. I'm still getting to know the i965 backend it would have
taken me a while to get to this point, make a lot more sense now. The code
below does the trick all of the indirect tests now pass.

Thanks so much for your help :)

 
 I think something like this would do 

Re: [Mesa-dev] [RFC 2/2] i965: add support for image AoA

2015-08-16 Thread Francisco Jerez
Timothy Arceri t_arc...@yahoo.com.au writes:

 On Sat, 2015-08-15 at 17:33 +0300, Francisco Jerez wrote:
 Timothy Arceri t_arc...@yahoo.com.au writes:
 
  On Wed, 2015-08-12 at 19:39 +1000, Timothy Arceri wrote:
   Cc: Francisco Jerez curroje...@riseup.net
   ---
This patch works for direct indexing of images, but I'm having some 
   trouble
getting indirect to work.
   
For example for:
tests/spec/arb_arrays_of_arrays/execution/image_store/basic-imageStore
-non-const-uniform-index.shader_test
   
Which has and image writeonly uniform image2D tex[2][2]
   
Indirect indexing will work for tex[0][0] and text[0][1] but not for
tex[1][0] and tex[1][1] they seem to always end up refering to the
image in 0.
  
  Just to add some more to this, I'm pretty sure my code is generating the
  correct offsets. If I hardcode the img_offset offset to 72 to get the 
  uniform
  value of tex[1][0] I get the value I expected, but if I set image.reladdr 
  to a
  register that contains 72 I don't what I expect.
  
  If I change the array to be a single dimension e.g. tex[4] and I hardcode 
  the
  offset as described above then it works as expected for both scenarios, it
  also works if I split the offset across img_offset and image.reladdr, 
  there is
  something going on with image.reladdr for multi-dimensional arrays that I 
  can'
  t quite put my finger on.
  
  Any hints appreciated.
  
 Odd, can you attach an assembly dump?
 
 Thanks.

 I wasn't sure what would be the most helpful so I've attached a few different
 dumps.

 image_dump = 1D array indirect piglit test, without this patch (Result=pass)
 image_dump2 = 2D array indirect piglit test, with this patch (Result=fail)
 image_dump3 = 1D array indirect piglit test, with this patch (Result=pass)

 image_dump4 = 1D array indirect piglit test, hardcoded register with 72 offset
   (Result=pass)
 image_dump5 = 2D array indirect piglit test, hardcoded register with 72 offset
 (Result=fail)

 image_dump4 vs image_dump5 is interesting because the output matches which is
 what I would have expected, but the result differs. Then with the offset below
 it seems to work as expected suggesting everything else is setup correctly.

 image_dump6 = 1D array indirect piglit test, hardcoded 72 offset in img_offset
(Result=pass)
 image_dump7 = 2D array indirect piglit test, hardcoded 72 offset in img_offset
(Result=pass)
  

Yeah... The assembly output looks correct to me in the cases that fail.
AFAICT what all the failing cases have in common is that the array of
image uniforms is demoted to pull constants (see demote_pull_constants()
and move_uniform_array_access_to_pull_constants()), so most likely
something goes wrong while doing that for multidimensional arrays.  I
have the suspicion that this is the source of the problem:

| --- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
| +++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
| @@ -226,6 +226,7 @@ fs_visitor::nir_setup_uniform(nir_variable *var)
|* our name.
|*/
| unsigned index = var-data.driver_location;
| +   bool set_image_location = true;
| for (unsigned u = 0; u  shader_prog-NumUniformStorage; u++) {
|struct gl_uniform_storage *storage = shader_prog-UniformStorage[u];
|
| @@ -244,7 +245,13 @@ fs_visitor::nir_setup_uniform(nir_variable *var)
|* because their size is driver-specific, so we need to allocate
|* space for them here at the end of the parameter array.
|*/
| - var-data.driver_location = uniforms;
| + if (set_image_location) {
| +/* For arrays of arrays we only want to set this once at the base
| + * location.
| + */
| +var-data.driver_location = uniforms;
| +set_image_location = false;
| + }
|   param_size[uniforms] =
|  BRW_IMAGE_PARAM_SIZE * MAX2(storage-array_elements, 1);

The assumption that this code was previously making was that a given
image array would be stored in a single gl_uniform_storage entry,
otherwise param_size[var-data.driver_location] won't match the real
size of the array and move_uniform_array_access_to_pull_constants() will
have no idea how large the array really is, so it will only be able to
move part of the array to the pull constant buffer and later on the
indirect array access will read past the end of the initialized area of
the constant buffer.

I think something like this would do what you want:

| if (var-type-without_array()-is_image()) {
|/* Images don't get a valid location assigned by nir_lower_io()
| * because their size is driver-specific, so we need to allocate
| * space for them here at the end of the parameter array.
| */
|var-data.driver_location = uniforms;
| }
|[...]
| for (unsigned u = 0; u  shader_prog-NumUniformStorage; u++) {
|if (storage-type-is_image()) {
|   

Re: [Mesa-dev] [RFC 2/2] i965: add support for image AoA

2015-08-15 Thread Timothy Arceri
On Wed, 2015-08-12 at 19:39 +1000, Timothy Arceri wrote:
 Cc: Francisco Jerez curroje...@riseup.net
 ---
  This patch works for direct indexing of images, but I'm having some trouble
  getting indirect to work.
 
  For example for:
  tests/spec/arb_arrays_of_arrays/execution/image_store/basic-imageStore
  -non-const-uniform-index.shader_test
 
  Which has and image writeonly uniform image2D tex[2][2]
 
  Indirect indexing will work for tex[0][0] and text[0][1] but not for
  tex[1][0] and tex[1][1] they seem to always end up refering to the
  image in 0.

Just to add some more to this, I'm pretty sure my code is generating the
correct offsets. If I hardcode the img_offset offset to 72 to get the uniform
value of tex[1][0] I get the value I expected, but if I set image.reladdr to a
register that contains 72 I don't what I expect.

If I change the array to be a single dimension e.g. tex[4] and I hardcode the
offset as described above then it works as expected for both scenarios, it
also works if I split the offset across img_offset and image.reladdr, there is
something going on with image.reladdr for multi-dimensional arrays that I can'
t quite put my finger on.

Any hints appreciated.


 
  I can't quite seem to see either my mistake or what I'm missing so I 
 thought
  I'd send this out, and see if anyone has any ideas. I've also sent some
  tests with mixed direct/indirect indexing which seem to calculate the 
 correct
  offest for the direct but the indirect indexing is not working there 
 either. 
 
  src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 53 ++---
 ---
  1 file changed, 36 insertions(+), 17 deletions(-)
 
 diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp 
 b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
 index d7a2500..a49c230 100644
 --- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
 +++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
 @@ -226,6 +226,7 @@ fs_visitor::nir_setup_uniform(nir_variable *var)
* our name.
*/
 unsigned index = var-data.driver_location;
 +   bool set_image_location = true;
 for (unsigned u = 0; u  shader_prog-NumUniformStorage; u++) {
struct gl_uniform_storage *storage = shader_prog-UniformStorage[u];
  
 @@ -244,7 +245,13 @@ fs_visitor::nir_setup_uniform(nir_variable *var)
* because their size is driver-specific, so we need to allocate
* space for them here at the end of the parameter array.
*/
 - var-data.driver_location = uniforms;
 + if (set_image_location) {
 +/* For arrays of arrays we only want to set this once at the 
 base
 + * location.
 + */
 +var-data.driver_location = uniforms;
 +set_image_location = false;
 + }
   param_size[uniforms] =
  BRW_IMAGE_PARAM_SIZE * MAX2(storage-array_elements, 1);
  
 @@ -1165,19 +1172,27 @@ fs_visitor::get_nir_image_deref(const nir_deref_var 
 *deref)
  {
 fs_reg image(UNIFORM, deref-var-data.driver_location,
  BRW_REGISTER_TYPE_UD);
 -
 -   if (deref-deref.child) {
 -  const nir_deref_array *deref_array =
 - nir_deref_as_array(deref-deref.child);
 -  assert(deref-deref.child-deref_type == nir_deref_type_array 
 - deref_array-deref.child == NULL);
 -  const unsigned size = glsl_get_length(deref-var-type);
 +   fs_reg *indirect_offset = NULL;
 +
 +   unsigned img_offset = 0;
 +   const nir_deref *tail = deref-deref;
 +   while (tail-child) {
 +  const nir_deref_array *deref_array = nir_deref_as_array(tail-child);
 +  assert(tail-child-deref_type == nir_deref_type_array);
 +  tail = tail-child;
 +  const unsigned size = glsl_get_length(tail-type);
 +  const unsigned child_array_elements = tail-child != NULL ?
 + glsl_get_aoa_size(tail-type) : 1;
const unsigned base = MIN2(deref_array-base_offset, size - 1);
 -
 -  image = offset(image, bld, base * BRW_IMAGE_PARAM_SIZE);
 +  const unsigned aoa_size = child_array_elements * 
 BRW_IMAGE_PARAM_SIZE;
 +  img_offset += base * aoa_size;
  
if (deref_array-deref_array_type == nir_deref_array_type_indirect) {
 - fs_reg *tmp = new(mem_ctx) fs_reg(vgrf(glsl_type::int_type));
 + fs_reg tmp = vgrf(glsl_type::int_type);
 + if (indirect_offset == NULL) {
 +indirect_offset = new(mem_ctx) 
 fs_reg(vgrf(glsl_type::int_type));
 +bld.MOV(*indirect_offset, fs_reg(0));
 + }
  
   if (devinfo-gen == 7  !devinfo-is_haswell) {
  /* IVB hangs when trying to access an invalid surface index 
 with
 @@ -1188,18 +1203,22 @@ fs_visitor::get_nir_image_deref(const nir_deref_var 
 *deref)
   * of the possible outcomes of the hang.  Clamp the index to
   * prevent access outside of the array bounds.
   */
 -bld.emit_minmax(*tmp, retype(get_nir_src(deref_array
 -indirect),
 -  

Re: [Mesa-dev] [RFC 2/2] i965: add support for image AoA

2015-08-15 Thread Francisco Jerez
Timothy Arceri t_arc...@yahoo.com.au writes:

 On Wed, 2015-08-12 at 19:39 +1000, Timothy Arceri wrote:
 Cc: Francisco Jerez curroje...@riseup.net
 ---
  This patch works for direct indexing of images, but I'm having some trouble
  getting indirect to work.
 
  For example for:
  tests/spec/arb_arrays_of_arrays/execution/image_store/basic-imageStore
  -non-const-uniform-index.shader_test
 
  Which has and image writeonly uniform image2D tex[2][2]
 
  Indirect indexing will work for tex[0][0] and text[0][1] but not for
  tex[1][0] and tex[1][1] they seem to always end up refering to the
  image in 0.

 Just to add some more to this, I'm pretty sure my code is generating the
 correct offsets. If I hardcode the img_offset offset to 72 to get the uniform
 value of tex[1][0] I get the value I expected, but if I set image.reladdr to a
 register that contains 72 I don't what I expect.

 If I change the array to be a single dimension e.g. tex[4] and I hardcode the
 offset as described above then it works as expected for both scenarios, it
 also works if I split the offset across img_offset and image.reladdr, there is
 something going on with image.reladdr for multi-dimensional arrays that I can'
 t quite put my finger on.

 Any hints appreciated.

Odd, can you attach an assembly dump?

Thanks.


 
  I can't quite seem to see either my mistake or what I'm missing so I 
 thought
  I'd send this out, and see if anyone has any ideas. I've also sent some
  tests with mixed direct/indirect indexing which seem to calculate the 
 correct
  offest for the direct but the indirect indexing is not working there 
 either. 
 
  src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 53 ++---
 ---
  1 file changed, 36 insertions(+), 17 deletions(-)
 
 diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp 
 b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
 index d7a2500..a49c230 100644
 --- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
 +++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
 @@ -226,6 +226,7 @@ fs_visitor::nir_setup_uniform(nir_variable *var)
* our name.
*/
 unsigned index = var-data.driver_location;
 +   bool set_image_location = true;
 for (unsigned u = 0; u  shader_prog-NumUniformStorage; u++) {
struct gl_uniform_storage *storage = shader_prog-UniformStorage[u];
  
 @@ -244,7 +245,13 @@ fs_visitor::nir_setup_uniform(nir_variable *var)
* because their size is driver-specific, so we need to allocate
* space for them here at the end of the parameter array.
*/
 - var-data.driver_location = uniforms;
 + if (set_image_location) {
 +/* For arrays of arrays we only want to set this once at the 
 base
 + * location.
 + */
 +var-data.driver_location = uniforms;
 +set_image_location = false;
 + }
   param_size[uniforms] =
  BRW_IMAGE_PARAM_SIZE * MAX2(storage-array_elements, 1);
  
 @@ -1165,19 +1172,27 @@ fs_visitor::get_nir_image_deref(const nir_deref_var 
 *deref)
  {
 fs_reg image(UNIFORM, deref-var-data.driver_location,
  BRW_REGISTER_TYPE_UD);
 -
 -   if (deref-deref.child) {
 -  const nir_deref_array *deref_array =
 - nir_deref_as_array(deref-deref.child);
 -  assert(deref-deref.child-deref_type == nir_deref_type_array 
 - deref_array-deref.child == NULL);
 -  const unsigned size = glsl_get_length(deref-var-type);
 +   fs_reg *indirect_offset = NULL;
 +
 +   unsigned img_offset = 0;
 +   const nir_deref *tail = deref-deref;
 +   while (tail-child) {
 +  const nir_deref_array *deref_array = nir_deref_as_array(tail-child);
 +  assert(tail-child-deref_type == nir_deref_type_array);
 +  tail = tail-child;
 +  const unsigned size = glsl_get_length(tail-type);
 +  const unsigned child_array_elements = tail-child != NULL ?
 + glsl_get_aoa_size(tail-type) : 1;
const unsigned base = MIN2(deref_array-base_offset, size - 1);
 -
 -  image = offset(image, bld, base * BRW_IMAGE_PARAM_SIZE);
 +  const unsigned aoa_size = child_array_elements * 
 BRW_IMAGE_PARAM_SIZE;
 +  img_offset += base * aoa_size;
  
if (deref_array-deref_array_type == nir_deref_array_type_indirect) {
 - fs_reg *tmp = new(mem_ctx) fs_reg(vgrf(glsl_type::int_type));
 + fs_reg tmp = vgrf(glsl_type::int_type);
 + if (indirect_offset == NULL) {
 +indirect_offset = new(mem_ctx) 
 fs_reg(vgrf(glsl_type::int_type));
 +bld.MOV(*indirect_offset, fs_reg(0));
 + }
  
   if (devinfo-gen == 7  !devinfo-is_haswell) {
  /* IVB hangs when trying to access an invalid surface index 
 with
 @@ -1188,18 +1203,22 @@ fs_visitor::get_nir_image_deref(const nir_deref_var 
 *deref)
   * of the possible outcomes of the hang.  Clamp the index to
   * prevent access outside of the array bounds.
   */
 - 

[Mesa-dev] [RFC 2/2] i965: add support for image AoA

2015-08-12 Thread Timothy Arceri
Cc: Francisco Jerez curroje...@riseup.net
---
 This patch works for direct indexing of images, but I'm having some trouble
 getting indirect to work.

 For example for:
 tests/spec/arb_arrays_of_arrays/execution/image_store/basic-imageStore
 -non-const-uniform-index.shader_test

 Which has and image writeonly uniform image2D tex[2][2]

 Indirect indexing will work for tex[0][0] and text[0][1] but not for
 tex[1][0] and tex[1][1] they seem to always end up refering to the
 image in 0.

 I can't quite seem to see either my mistake or what I'm missing so I thought
 I'd send this out, and see if anyone has any ideas. I've also sent some
 tests with mixed direct/indirect indexing which seem to calculate the correct
 offest for the direct but the indirect indexing is not working there either. 

 src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 53 ++--
 1 file changed, 36 insertions(+), 17 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp 
b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
index d7a2500..a49c230 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
@@ -226,6 +226,7 @@ fs_visitor::nir_setup_uniform(nir_variable *var)
   * our name.
   */
unsigned index = var-data.driver_location;
+   bool set_image_location = true;
for (unsigned u = 0; u  shader_prog-NumUniformStorage; u++) {
   struct gl_uniform_storage *storage = shader_prog-UniformStorage[u];
 
@@ -244,7 +245,13 @@ fs_visitor::nir_setup_uniform(nir_variable *var)
   * because their size is driver-specific, so we need to allocate
   * space for them here at the end of the parameter array.
   */
- var-data.driver_location = uniforms;
+ if (set_image_location) {
+/* For arrays of arrays we only want to set this once at the base
+ * location.
+ */
+var-data.driver_location = uniforms;
+set_image_location = false;
+ }
  param_size[uniforms] =
 BRW_IMAGE_PARAM_SIZE * MAX2(storage-array_elements, 1);
 
@@ -1165,19 +1172,27 @@ fs_visitor::get_nir_image_deref(const nir_deref_var 
*deref)
 {
fs_reg image(UNIFORM, deref-var-data.driver_location,
 BRW_REGISTER_TYPE_UD);
-
-   if (deref-deref.child) {
-  const nir_deref_array *deref_array =
- nir_deref_as_array(deref-deref.child);
-  assert(deref-deref.child-deref_type == nir_deref_type_array 
- deref_array-deref.child == NULL);
-  const unsigned size = glsl_get_length(deref-var-type);
+   fs_reg *indirect_offset = NULL;
+
+   unsigned img_offset = 0;
+   const nir_deref *tail = deref-deref;
+   while (tail-child) {
+  const nir_deref_array *deref_array = nir_deref_as_array(tail-child);
+  assert(tail-child-deref_type == nir_deref_type_array);
+  tail = tail-child;
+  const unsigned size = glsl_get_length(tail-type);
+  const unsigned child_array_elements = tail-child != NULL ?
+ glsl_get_aoa_size(tail-type) : 1;
   const unsigned base = MIN2(deref_array-base_offset, size - 1);
-
-  image = offset(image, bld, base * BRW_IMAGE_PARAM_SIZE);
+  const unsigned aoa_size = child_array_elements * BRW_IMAGE_PARAM_SIZE;
+  img_offset += base * aoa_size;
 
   if (deref_array-deref_array_type == nir_deref_array_type_indirect) {
- fs_reg *tmp = new(mem_ctx) fs_reg(vgrf(glsl_type::int_type));
+ fs_reg tmp = vgrf(glsl_type::int_type);
+ if (indirect_offset == NULL) {
+indirect_offset = new(mem_ctx) fs_reg(vgrf(glsl_type::int_type));
+bld.MOV(*indirect_offset, fs_reg(0));
+ }
 
  if (devinfo-gen == 7  !devinfo-is_haswell) {
 /* IVB hangs when trying to access an invalid surface index with
@@ -1188,18 +1203,22 @@ fs_visitor::get_nir_image_deref(const nir_deref_var 
*deref)
  * of the possible outcomes of the hang.  Clamp the index to
  * prevent access outside of the array bounds.
  */
-bld.emit_minmax(*tmp, retype(get_nir_src(deref_array-indirect),
- BRW_REGISTER_TYPE_UD),
+bld.emit_minmax(tmp, retype(get_nir_src(deref_array-indirect),
+BRW_REGISTER_TYPE_UD),
 fs_reg(size - base - 1), BRW_CONDITIONAL_L);
  } else {
-bld.MOV(*tmp, get_nir_src(deref_array-indirect));
+bld.MOV(tmp, get_nir_src(deref_array-indirect));
  }
-
- bld.MUL(*tmp, *tmp, fs_reg(BRW_IMAGE_PARAM_SIZE));
- image.reladdr = tmp;
+ bld.MUL(tmp, tmp, fs_reg(aoa_size));
+ bld.ADD(*indirect_offset, *indirect_offset, tmp);
   }
}
 
+   if (indirect_offset) {
+  image.reladdr = indirect_offset;
+   }
+   image = offset(image, bld, img_offset);
+
return image;
 }
 
-- 
2.4.3