Re: [Mesa-dev] [RFC 2/2] i965: add support for image AoA
Timothy Arceri t_arc...@yahoo.com.au writes: On Sun, 2015-08-16 at 13:15 +0300, Francisco Jerez wrote: Timothy Arceri t_arc...@yahoo.com.au writes: On Sat, 2015-08-15 at 17:33 +0300, Francisco Jerez wrote: Timothy Arceri t_arc...@yahoo.com.au writes: On Wed, 2015-08-12 at 19:39 +1000, Timothy Arceri wrote: Cc: Francisco Jerez curroje...@riseup.net --- This patch works for direct indexing of images, but I'm having some trouble getting indirect to work. For example for: tests/spec/arb_arrays_of_arrays/execution/image_store/basic -imageStore -non-const-uniform-index.shader_test Which has and image writeonly uniform image2D tex[2][2] Indirect indexing will work for tex[0][0] and text[0][1] but not for tex[1][0] and tex[1][1] they seem to always end up refering to the image in 0. Just to add some more to this, I'm pretty sure my code is generating the correct offsets. If I hardcode the img_offset offset to 72 to get the uniform value of tex[1][0] I get the value I expected, but if I set image.reladdr to a register that contains 72 I don't what I expect. If I change the array to be a single dimension e.g. tex[4] and I hardcode the offset as described above then it works as expected for both scenarios, it also works if I split the offset across img_offset and image.reladdr, there is something going on with image.reladdr for multi-dimensional arrays that I can' t quite put my finger on. Any hints appreciated. Odd, can you attach an assembly dump? Thanks. I wasn't sure what would be the most helpful so I've attached a few different dumps. image_dump = 1D array indirect piglit test, without this patch (Result=pass) image_dump2 = 2D array indirect piglit test, with this patch (Result=fail) image_dump3 = 1D array indirect piglit test, with this patch (Result=pass) image_dump4 = 1D array indirect piglit test, hardcoded register with 72 offset (Result=pass) image_dump5 = 2D array indirect piglit test, hardcoded register with 72 offset (Result=fail) image_dump4 vs image_dump5 is interesting because the output matches which is what I would have expected, but the result differs. Then with the offset below it seems to work as expected suggesting everything else is setup correctly. image_dump6 = 1D array indirect piglit test, hardcoded 72 offset in img_offset (Result=pass) image_dump7 = 2D array indirect piglit test, hardcoded 72 offset in img_offset (Result=pass) Yeah... The assembly output looks correct to me in the cases that fail. AFAICT what all the failing cases have in common is that the array of image uniforms is demoted to pull constants (see demote_pull_constants() and move_uniform_array_access_to_pull_constants()), so most likely something goes wrong while doing that for multidimensional arrays. I have the suspicion that this is the source of the problem: --- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp @@ -226,6 +226,7 @@ fs_visitor::nir_setup_uniform(nir_variable *var) * our name. */ unsigned index = var-data.driver_location; + bool set_image_location = true; for (unsigned u = 0; u shader_prog-NumUniformStorage; u++) { struct gl_uniform_storage *storage = shader_prog -UniformStorage[u]; @@ -244,7 +245,13 @@ fs_visitor::nir_setup_uniform(nir_variable *var) * because their size is driver-specific, so we need to allocate * space for them here at the end of the parameter array. */ - var-data.driver_location = uniforms; + if (set_image_location) { +/* For arrays of arrays we only want to set this once at the base + * location. + */ +var-data.driver_location = uniforms; +set_image_location = false; + } param_size[uniforms] = BRW_IMAGE_PARAM_SIZE * MAX2(storage-array_elements, 1); The assumption that this code was previously making was that a given image array would be stored in a single gl_uniform_storage entry, otherwise param_size[var-data.driver_location] won't match the real size of the array and move_uniform_array_access_to_pull_constants() will have no idea how large the array really is, so it will only be able to move part of the array to the pull constant buffer and later on the indirect array access will read past the end of the initialized area of the constant buffer. Thanks for the reply. I'm still getting to know the i965 backend it would have taken me a while to get to this point, make a lot more sense now. The code below does the trick all of the indirect tests now pass. Thanks so much for
Re: [Mesa-dev] [RFC 2/2] i965: add support for image AoA
On Sun, 2015-08-16 at 13:15 +0300, Francisco Jerez wrote: Timothy Arceri t_arc...@yahoo.com.au writes: On Sat, 2015-08-15 at 17:33 +0300, Francisco Jerez wrote: Timothy Arceri t_arc...@yahoo.com.au writes: On Wed, 2015-08-12 at 19:39 +1000, Timothy Arceri wrote: Cc: Francisco Jerez curroje...@riseup.net --- This patch works for direct indexing of images, but I'm having some trouble getting indirect to work. For example for: tests/spec/arb_arrays_of_arrays/execution/image_store/basic -imageStore -non-const-uniform-index.shader_test Which has and image writeonly uniform image2D tex[2][2] Indirect indexing will work for tex[0][0] and text[0][1] but not for tex[1][0] and tex[1][1] they seem to always end up refering to the image in 0. Just to add some more to this, I'm pretty sure my code is generating the correct offsets. If I hardcode the img_offset offset to 72 to get the uniform value of tex[1][0] I get the value I expected, but if I set image.reladdr to a register that contains 72 I don't what I expect. If I change the array to be a single dimension e.g. tex[4] and I hardcode the offset as described above then it works as expected for both scenarios, it also works if I split the offset across img_offset and image.reladdr, there is something going on with image.reladdr for multi-dimensional arrays that I can' t quite put my finger on. Any hints appreciated. Odd, can you attach an assembly dump? Thanks. I wasn't sure what would be the most helpful so I've attached a few different dumps. image_dump = 1D array indirect piglit test, without this patch (Result=pass) image_dump2 = 2D array indirect piglit test, with this patch (Result=fail) image_dump3 = 1D array indirect piglit test, with this patch (Result=pass) image_dump4 = 1D array indirect piglit test, hardcoded register with 72 offset (Result=pass) image_dump5 = 2D array indirect piglit test, hardcoded register with 72 offset (Result=fail) image_dump4 vs image_dump5 is interesting because the output matches which is what I would have expected, but the result differs. Then with the offset below it seems to work as expected suggesting everything else is setup correctly. image_dump6 = 1D array indirect piglit test, hardcoded 72 offset in img_offset (Result=pass) image_dump7 = 2D array indirect piglit test, hardcoded 72 offset in img_offset (Result=pass) Yeah... The assembly output looks correct to me in the cases that fail. AFAICT what all the failing cases have in common is that the array of image uniforms is demoted to pull constants (see demote_pull_constants() and move_uniform_array_access_to_pull_constants()), so most likely something goes wrong while doing that for multidimensional arrays. I have the suspicion that this is the source of the problem: --- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp @@ -226,6 +226,7 @@ fs_visitor::nir_setup_uniform(nir_variable *var) * our name. */ unsigned index = var-data.driver_location; + bool set_image_location = true; for (unsigned u = 0; u shader_prog-NumUniformStorage; u++) { struct gl_uniform_storage *storage = shader_prog -UniformStorage[u]; @@ -244,7 +245,13 @@ fs_visitor::nir_setup_uniform(nir_variable *var) * because their size is driver-specific, so we need to allocate * space for them here at the end of the parameter array. */ - var-data.driver_location = uniforms; + if (set_image_location) { +/* For arrays of arrays we only want to set this once at the base + * location. + */ +var-data.driver_location = uniforms; +set_image_location = false; + } param_size[uniforms] = BRW_IMAGE_PARAM_SIZE * MAX2(storage-array_elements, 1); The assumption that this code was previously making was that a given image array would be stored in a single gl_uniform_storage entry, otherwise param_size[var-data.driver_location] won't match the real size of the array and move_uniform_array_access_to_pull_constants() will have no idea how large the array really is, so it will only be able to move part of the array to the pull constant buffer and later on the indirect array access will read past the end of the initialized area of the constant buffer. Thanks for the reply. I'm still getting to know the i965 backend it would have taken me a while to get to this point, make a lot more sense now. The code below does the trick all of the indirect tests now pass. Thanks so much for your help :) I think something like this would do
Re: [Mesa-dev] [RFC 2/2] i965: add support for image AoA
Timothy Arceri t_arc...@yahoo.com.au writes: On Sat, 2015-08-15 at 17:33 +0300, Francisco Jerez wrote: Timothy Arceri t_arc...@yahoo.com.au writes: On Wed, 2015-08-12 at 19:39 +1000, Timothy Arceri wrote: Cc: Francisco Jerez curroje...@riseup.net --- This patch works for direct indexing of images, but I'm having some trouble getting indirect to work. For example for: tests/spec/arb_arrays_of_arrays/execution/image_store/basic-imageStore -non-const-uniform-index.shader_test Which has and image writeonly uniform image2D tex[2][2] Indirect indexing will work for tex[0][0] and text[0][1] but not for tex[1][0] and tex[1][1] they seem to always end up refering to the image in 0. Just to add some more to this, I'm pretty sure my code is generating the correct offsets. If I hardcode the img_offset offset to 72 to get the uniform value of tex[1][0] I get the value I expected, but if I set image.reladdr to a register that contains 72 I don't what I expect. If I change the array to be a single dimension e.g. tex[4] and I hardcode the offset as described above then it works as expected for both scenarios, it also works if I split the offset across img_offset and image.reladdr, there is something going on with image.reladdr for multi-dimensional arrays that I can' t quite put my finger on. Any hints appreciated. Odd, can you attach an assembly dump? Thanks. I wasn't sure what would be the most helpful so I've attached a few different dumps. image_dump = 1D array indirect piglit test, without this patch (Result=pass) image_dump2 = 2D array indirect piglit test, with this patch (Result=fail) image_dump3 = 1D array indirect piglit test, with this patch (Result=pass) image_dump4 = 1D array indirect piglit test, hardcoded register with 72 offset (Result=pass) image_dump5 = 2D array indirect piglit test, hardcoded register with 72 offset (Result=fail) image_dump4 vs image_dump5 is interesting because the output matches which is what I would have expected, but the result differs. Then with the offset below it seems to work as expected suggesting everything else is setup correctly. image_dump6 = 1D array indirect piglit test, hardcoded 72 offset in img_offset (Result=pass) image_dump7 = 2D array indirect piglit test, hardcoded 72 offset in img_offset (Result=pass) Yeah... The assembly output looks correct to me in the cases that fail. AFAICT what all the failing cases have in common is that the array of image uniforms is demoted to pull constants (see demote_pull_constants() and move_uniform_array_access_to_pull_constants()), so most likely something goes wrong while doing that for multidimensional arrays. I have the suspicion that this is the source of the problem: | --- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp | +++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp | @@ -226,6 +226,7 @@ fs_visitor::nir_setup_uniform(nir_variable *var) |* our name. |*/ | unsigned index = var-data.driver_location; | + bool set_image_location = true; | for (unsigned u = 0; u shader_prog-NumUniformStorage; u++) { |struct gl_uniform_storage *storage = shader_prog-UniformStorage[u]; | | @@ -244,7 +245,13 @@ fs_visitor::nir_setup_uniform(nir_variable *var) |* because their size is driver-specific, so we need to allocate |* space for them here at the end of the parameter array. |*/ | - var-data.driver_location = uniforms; | + if (set_image_location) { | +/* For arrays of arrays we only want to set this once at the base | + * location. | + */ | +var-data.driver_location = uniforms; | +set_image_location = false; | + } | param_size[uniforms] = | BRW_IMAGE_PARAM_SIZE * MAX2(storage-array_elements, 1); The assumption that this code was previously making was that a given image array would be stored in a single gl_uniform_storage entry, otherwise param_size[var-data.driver_location] won't match the real size of the array and move_uniform_array_access_to_pull_constants() will have no idea how large the array really is, so it will only be able to move part of the array to the pull constant buffer and later on the indirect array access will read past the end of the initialized area of the constant buffer. I think something like this would do what you want: | if (var-type-without_array()-is_image()) { |/* Images don't get a valid location assigned by nir_lower_io() | * because their size is driver-specific, so we need to allocate | * space for them here at the end of the parameter array. | */ |var-data.driver_location = uniforms; | } |[...] | for (unsigned u = 0; u shader_prog-NumUniformStorage; u++) { |if (storage-type-is_image()) { |
Re: [Mesa-dev] [RFC 2/2] i965: add support for image AoA
On Wed, 2015-08-12 at 19:39 +1000, Timothy Arceri wrote: Cc: Francisco Jerez curroje...@riseup.net --- This patch works for direct indexing of images, but I'm having some trouble getting indirect to work. For example for: tests/spec/arb_arrays_of_arrays/execution/image_store/basic-imageStore -non-const-uniform-index.shader_test Which has and image writeonly uniform image2D tex[2][2] Indirect indexing will work for tex[0][0] and text[0][1] but not for tex[1][0] and tex[1][1] they seem to always end up refering to the image in 0. Just to add some more to this, I'm pretty sure my code is generating the correct offsets. If I hardcode the img_offset offset to 72 to get the uniform value of tex[1][0] I get the value I expected, but if I set image.reladdr to a register that contains 72 I don't what I expect. If I change the array to be a single dimension e.g. tex[4] and I hardcode the offset as described above then it works as expected for both scenarios, it also works if I split the offset across img_offset and image.reladdr, there is something going on with image.reladdr for multi-dimensional arrays that I can' t quite put my finger on. Any hints appreciated. I can't quite seem to see either my mistake or what I'm missing so I thought I'd send this out, and see if anyone has any ideas. I've also sent some tests with mixed direct/indirect indexing which seem to calculate the correct offest for the direct but the indirect indexing is not working there either. src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 53 ++--- --- 1 file changed, 36 insertions(+), 17 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp index d7a2500..a49c230 100644 --- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp @@ -226,6 +226,7 @@ fs_visitor::nir_setup_uniform(nir_variable *var) * our name. */ unsigned index = var-data.driver_location; + bool set_image_location = true; for (unsigned u = 0; u shader_prog-NumUniformStorage; u++) { struct gl_uniform_storage *storage = shader_prog-UniformStorage[u]; @@ -244,7 +245,13 @@ fs_visitor::nir_setup_uniform(nir_variable *var) * because their size is driver-specific, so we need to allocate * space for them here at the end of the parameter array. */ - var-data.driver_location = uniforms; + if (set_image_location) { +/* For arrays of arrays we only want to set this once at the base + * location. + */ +var-data.driver_location = uniforms; +set_image_location = false; + } param_size[uniforms] = BRW_IMAGE_PARAM_SIZE * MAX2(storage-array_elements, 1); @@ -1165,19 +1172,27 @@ fs_visitor::get_nir_image_deref(const nir_deref_var *deref) { fs_reg image(UNIFORM, deref-var-data.driver_location, BRW_REGISTER_TYPE_UD); - - if (deref-deref.child) { - const nir_deref_array *deref_array = - nir_deref_as_array(deref-deref.child); - assert(deref-deref.child-deref_type == nir_deref_type_array - deref_array-deref.child == NULL); - const unsigned size = glsl_get_length(deref-var-type); + fs_reg *indirect_offset = NULL; + + unsigned img_offset = 0; + const nir_deref *tail = deref-deref; + while (tail-child) { + const nir_deref_array *deref_array = nir_deref_as_array(tail-child); + assert(tail-child-deref_type == nir_deref_type_array); + tail = tail-child; + const unsigned size = glsl_get_length(tail-type); + const unsigned child_array_elements = tail-child != NULL ? + glsl_get_aoa_size(tail-type) : 1; const unsigned base = MIN2(deref_array-base_offset, size - 1); - - image = offset(image, bld, base * BRW_IMAGE_PARAM_SIZE); + const unsigned aoa_size = child_array_elements * BRW_IMAGE_PARAM_SIZE; + img_offset += base * aoa_size; if (deref_array-deref_array_type == nir_deref_array_type_indirect) { - fs_reg *tmp = new(mem_ctx) fs_reg(vgrf(glsl_type::int_type)); + fs_reg tmp = vgrf(glsl_type::int_type); + if (indirect_offset == NULL) { +indirect_offset = new(mem_ctx) fs_reg(vgrf(glsl_type::int_type)); +bld.MOV(*indirect_offset, fs_reg(0)); + } if (devinfo-gen == 7 !devinfo-is_haswell) { /* IVB hangs when trying to access an invalid surface index with @@ -1188,18 +1203,22 @@ fs_visitor::get_nir_image_deref(const nir_deref_var *deref) * of the possible outcomes of the hang. Clamp the index to * prevent access outside of the array bounds. */ -bld.emit_minmax(*tmp, retype(get_nir_src(deref_array -indirect), -
Re: [Mesa-dev] [RFC 2/2] i965: add support for image AoA
Timothy Arceri t_arc...@yahoo.com.au writes: On Wed, 2015-08-12 at 19:39 +1000, Timothy Arceri wrote: Cc: Francisco Jerez curroje...@riseup.net --- This patch works for direct indexing of images, but I'm having some trouble getting indirect to work. For example for: tests/spec/arb_arrays_of_arrays/execution/image_store/basic-imageStore -non-const-uniform-index.shader_test Which has and image writeonly uniform image2D tex[2][2] Indirect indexing will work for tex[0][0] and text[0][1] but not for tex[1][0] and tex[1][1] they seem to always end up refering to the image in 0. Just to add some more to this, I'm pretty sure my code is generating the correct offsets. If I hardcode the img_offset offset to 72 to get the uniform value of tex[1][0] I get the value I expected, but if I set image.reladdr to a register that contains 72 I don't what I expect. If I change the array to be a single dimension e.g. tex[4] and I hardcode the offset as described above then it works as expected for both scenarios, it also works if I split the offset across img_offset and image.reladdr, there is something going on with image.reladdr for multi-dimensional arrays that I can' t quite put my finger on. Any hints appreciated. Odd, can you attach an assembly dump? Thanks. I can't quite seem to see either my mistake or what I'm missing so I thought I'd send this out, and see if anyone has any ideas. I've also sent some tests with mixed direct/indirect indexing which seem to calculate the correct offest for the direct but the indirect indexing is not working there either. src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 53 ++--- --- 1 file changed, 36 insertions(+), 17 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp index d7a2500..a49c230 100644 --- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp @@ -226,6 +226,7 @@ fs_visitor::nir_setup_uniform(nir_variable *var) * our name. */ unsigned index = var-data.driver_location; + bool set_image_location = true; for (unsigned u = 0; u shader_prog-NumUniformStorage; u++) { struct gl_uniform_storage *storage = shader_prog-UniformStorage[u]; @@ -244,7 +245,13 @@ fs_visitor::nir_setup_uniform(nir_variable *var) * because their size is driver-specific, so we need to allocate * space for them here at the end of the parameter array. */ - var-data.driver_location = uniforms; + if (set_image_location) { +/* For arrays of arrays we only want to set this once at the base + * location. + */ +var-data.driver_location = uniforms; +set_image_location = false; + } param_size[uniforms] = BRW_IMAGE_PARAM_SIZE * MAX2(storage-array_elements, 1); @@ -1165,19 +1172,27 @@ fs_visitor::get_nir_image_deref(const nir_deref_var *deref) { fs_reg image(UNIFORM, deref-var-data.driver_location, BRW_REGISTER_TYPE_UD); - - if (deref-deref.child) { - const nir_deref_array *deref_array = - nir_deref_as_array(deref-deref.child); - assert(deref-deref.child-deref_type == nir_deref_type_array - deref_array-deref.child == NULL); - const unsigned size = glsl_get_length(deref-var-type); + fs_reg *indirect_offset = NULL; + + unsigned img_offset = 0; + const nir_deref *tail = deref-deref; + while (tail-child) { + const nir_deref_array *deref_array = nir_deref_as_array(tail-child); + assert(tail-child-deref_type == nir_deref_type_array); + tail = tail-child; + const unsigned size = glsl_get_length(tail-type); + const unsigned child_array_elements = tail-child != NULL ? + glsl_get_aoa_size(tail-type) : 1; const unsigned base = MIN2(deref_array-base_offset, size - 1); - - image = offset(image, bld, base * BRW_IMAGE_PARAM_SIZE); + const unsigned aoa_size = child_array_elements * BRW_IMAGE_PARAM_SIZE; + img_offset += base * aoa_size; if (deref_array-deref_array_type == nir_deref_array_type_indirect) { - fs_reg *tmp = new(mem_ctx) fs_reg(vgrf(glsl_type::int_type)); + fs_reg tmp = vgrf(glsl_type::int_type); + if (indirect_offset == NULL) { +indirect_offset = new(mem_ctx) fs_reg(vgrf(glsl_type::int_type)); +bld.MOV(*indirect_offset, fs_reg(0)); + } if (devinfo-gen == 7 !devinfo-is_haswell) { /* IVB hangs when trying to access an invalid surface index with @@ -1188,18 +1203,22 @@ fs_visitor::get_nir_image_deref(const nir_deref_var *deref) * of the possible outcomes of the hang. Clamp the index to * prevent access outside of the array bounds. */ -
[Mesa-dev] [RFC 2/2] i965: add support for image AoA
Cc: Francisco Jerez curroje...@riseup.net --- This patch works for direct indexing of images, but I'm having some trouble getting indirect to work. For example for: tests/spec/arb_arrays_of_arrays/execution/image_store/basic-imageStore -non-const-uniform-index.shader_test Which has and image writeonly uniform image2D tex[2][2] Indirect indexing will work for tex[0][0] and text[0][1] but not for tex[1][0] and tex[1][1] they seem to always end up refering to the image in 0. I can't quite seem to see either my mistake or what I'm missing so I thought I'd send this out, and see if anyone has any ideas. I've also sent some tests with mixed direct/indirect indexing which seem to calculate the correct offest for the direct but the indirect indexing is not working there either. src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 53 ++-- 1 file changed, 36 insertions(+), 17 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp index d7a2500..a49c230 100644 --- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp @@ -226,6 +226,7 @@ fs_visitor::nir_setup_uniform(nir_variable *var) * our name. */ unsigned index = var-data.driver_location; + bool set_image_location = true; for (unsigned u = 0; u shader_prog-NumUniformStorage; u++) { struct gl_uniform_storage *storage = shader_prog-UniformStorage[u]; @@ -244,7 +245,13 @@ fs_visitor::nir_setup_uniform(nir_variable *var) * because their size is driver-specific, so we need to allocate * space for them here at the end of the parameter array. */ - var-data.driver_location = uniforms; + if (set_image_location) { +/* For arrays of arrays we only want to set this once at the base + * location. + */ +var-data.driver_location = uniforms; +set_image_location = false; + } param_size[uniforms] = BRW_IMAGE_PARAM_SIZE * MAX2(storage-array_elements, 1); @@ -1165,19 +1172,27 @@ fs_visitor::get_nir_image_deref(const nir_deref_var *deref) { fs_reg image(UNIFORM, deref-var-data.driver_location, BRW_REGISTER_TYPE_UD); - - if (deref-deref.child) { - const nir_deref_array *deref_array = - nir_deref_as_array(deref-deref.child); - assert(deref-deref.child-deref_type == nir_deref_type_array - deref_array-deref.child == NULL); - const unsigned size = glsl_get_length(deref-var-type); + fs_reg *indirect_offset = NULL; + + unsigned img_offset = 0; + const nir_deref *tail = deref-deref; + while (tail-child) { + const nir_deref_array *deref_array = nir_deref_as_array(tail-child); + assert(tail-child-deref_type == nir_deref_type_array); + tail = tail-child; + const unsigned size = glsl_get_length(tail-type); + const unsigned child_array_elements = tail-child != NULL ? + glsl_get_aoa_size(tail-type) : 1; const unsigned base = MIN2(deref_array-base_offset, size - 1); - - image = offset(image, bld, base * BRW_IMAGE_PARAM_SIZE); + const unsigned aoa_size = child_array_elements * BRW_IMAGE_PARAM_SIZE; + img_offset += base * aoa_size; if (deref_array-deref_array_type == nir_deref_array_type_indirect) { - fs_reg *tmp = new(mem_ctx) fs_reg(vgrf(glsl_type::int_type)); + fs_reg tmp = vgrf(glsl_type::int_type); + if (indirect_offset == NULL) { +indirect_offset = new(mem_ctx) fs_reg(vgrf(glsl_type::int_type)); +bld.MOV(*indirect_offset, fs_reg(0)); + } if (devinfo-gen == 7 !devinfo-is_haswell) { /* IVB hangs when trying to access an invalid surface index with @@ -1188,18 +1203,22 @@ fs_visitor::get_nir_image_deref(const nir_deref_var *deref) * of the possible outcomes of the hang. Clamp the index to * prevent access outside of the array bounds. */ -bld.emit_minmax(*tmp, retype(get_nir_src(deref_array-indirect), - BRW_REGISTER_TYPE_UD), +bld.emit_minmax(tmp, retype(get_nir_src(deref_array-indirect), +BRW_REGISTER_TYPE_UD), fs_reg(size - base - 1), BRW_CONDITIONAL_L); } else { -bld.MOV(*tmp, get_nir_src(deref_array-indirect)); +bld.MOV(tmp, get_nir_src(deref_array-indirect)); } - - bld.MUL(*tmp, *tmp, fs_reg(BRW_IMAGE_PARAM_SIZE)); - image.reladdr = tmp; + bld.MUL(tmp, tmp, fs_reg(aoa_size)); + bld.ADD(*indirect_offset, *indirect_offset, tmp); } } + if (indirect_offset) { + image.reladdr = indirect_offset; + } + image = offset(image, bld, img_offset); + return image; } -- 2.4.3