Re: [Mesa-dev] [PATCH v2 1/8] isl/i965/fs: SSBO/UBO buffers need size padding if not multiple of 32-bit

2018-02-28 Thread Jason Ekstrand
On Wed, Feb 28, 2018 at 2:24 AM, Chema Casanova 
wrote:

> On 27/02/18 19:53, Jason Ekstrand wrote:
> > On Tue, Feb 27, 2018 at 5:27 AM, Jose Maria Casanova Crespo
> > > wrote:
> >
> > The surfaces that backup the GPU buffers have a boundary check that
> > considers that access to partial dwords are considered out-of-bounds.
> > For example, buffers with 1,3 16-bit elements has size 2 or 6 and the
> > last two bytes would always be read as 0 or its writting ignored.
> >
> > The introduction of 16-bit types implies that we need to align the
> size
> > to 4-bytew multiples so that partial dwords could be read/written.
> > Adding an inconditional +2 size to buffers not being multiple of 2
> > solves this issue for the general cases of UBO or SSBO.
> >
> > But, when unsized arrays of 16-bit elements are used it is not
> possible
> > to know if the size was padded or not. To solve this issue the
> > implementation calculates the needed size of the buffer surfaces,
> > as suggested by Jason:
> >
> > surface_size = isl_align(buffer_size, 4) +
> >(isl_align(buffer_size, 4) - buffer_size)
> >
> > So when we calculate backwards the buffer_size in the backend we
> > update the resinfo return value with:
> >
> > buffer_size = (surface_size & ~3) - (surface_size & 3)
> >
> > It is also exposed this buffer requirements when robust buffer access
> > is enabled so these buffer sizes recommend being multiple of 4.
> >
> > v2: (Jason Ekstrand)
> > Move padding logic fron anv to isl_surface_state
> > Move calculus of original size from spirv to driver backend
> > v3: (Jason Ekstrand)
> > Rename some variables and use a similar expresion when
> calculating
> > padding than when obtaining the original buffer size.
> > Avoid use of unnecesary component call at brw_fs_nir.
> >
> > Reviewed-by: Jason Ekstrand  > >
> > ---
> >  src/intel/compiler/brw_fs_nir.cpp | 27 ++-
> >  src/intel/isl/isl_surface_state.c | 22 +-
> >  src/intel/vulkan/anv_device.c | 11 +++
> >  3 files changed, 58 insertions(+), 2 deletions(-)
> >
> > diff --git a/src/intel/compiler/brw_fs_nir.cpp
> > b/src/intel/compiler/brw_fs_nir.cpp
> > index 8efec34cc9d..4aa411d149f 100644
> > --- a/src/intel/compiler/brw_fs_nir.cpp
> > +++ b/src/intel/compiler/brw_fs_nir.cpp
> > @@ -4290,7 +4290,32 @@ fs_visitor::nir_emit_intrinsic(const
> > fs_builder , nir_intrinsic_instr *instr
> >inst->mlen = 1;
> >inst->size_written = 4 * REG_SIZE;
> >
> > -  bld.MOV(retype(dest, ret_payload.type),
> > component(ret_payload, 0));
> > +  /* SKL PRM, vol07, 3D Media GPGPU Engine, Bounds Checking and
> > Faulting:
> > +   *
> > +   * "Out-of-bounds checking is always performed at a DWord
> > granularity. If
> > +   * any part of the DWord is out-of-bounds then the whole
> DWord is
> > +   * considered out-of-bounds."
> > +   *
> > +   * This implies that types with size smaller than 4-bytes
> > need to be
> > +   * padded if they don't complete the last dword of the
> > buffer. But as we
> > +   * need to maintain the original size we need to reverse the
> > padding
> > +   * calculation to return the correct size to know the number
> > of elements
> > +   * of an unsized array. As we stored in the last two bits of
> > the size of
> > +   * the buffer the needed padding we calculate here:
> > +   *
> > +   * buffer_size = resinfo_size & ~3 - resinfo_size & 3
> >
> >
> > Mind putting both calculations in this comment like you do in the one
> below?
>
> Locally changed as:
>
>  * As we stored in the last two bits of the surface
>  * size the needed padding for the buffer, we calculate here the
>  * original buffer_size reversing the surface_size calculation:
>  *
>  * surface_size = isl_align(buffer_size, 4) +
>  *(isl_align(buffer_size) - buffer_size)
>  *
>  * buffer_size = surface_size & ~3 - surface_size & 3
>
> I used the same names as in isl comment, so surface_size instead of
> resinfo_size.
>

Sounds great!


> > rb still applies
>
> Thanks.
>
> > +   */
> > +
> > +  fs_reg size_aligned4 = ubld.vgrf(BRW_REGISTER_TYPE_UD);
> > +  fs_reg size_padding = ubld.vgrf(BRW_REGISTER_TYPE_UD);
> > +  fs_reg buffer_size = ubld.vgrf(BRW_REGISTER_TYPE_UD);
> > +
> > +  ubld.AND(size_padding, ret_payload, brw_imm_ud(3));
> > +  ubld.AND(size_aligned4, ret_payload, brw_imm_ud(~3));
> > +  ubld.ADD(buffer_size, size_aligned4, negate(size_padding));
> > +
> >

Re: [Mesa-dev] [PATCH v2 1/8] isl/i965/fs: SSBO/UBO buffers need size padding if not multiple of 32-bit

2018-02-28 Thread Chema Casanova
On 27/02/18 19:53, Jason Ekstrand wrote:
> On Tue, Feb 27, 2018 at 5:27 AM, Jose Maria Casanova Crespo
> > wrote:
> 
> The surfaces that backup the GPU buffers have a boundary check that
> considers that access to partial dwords are considered out-of-bounds.
> For example, buffers with 1,3 16-bit elements has size 2 or 6 and the
> last two bytes would always be read as 0 or its writting ignored.
> 
> The introduction of 16-bit types implies that we need to align the size
> to 4-bytew multiples so that partial dwords could be read/written.
> Adding an inconditional +2 size to buffers not being multiple of 2
> solves this issue for the general cases of UBO or SSBO.
> 
> But, when unsized arrays of 16-bit elements are used it is not possible
> to know if the size was padded or not. To solve this issue the
> implementation calculates the needed size of the buffer surfaces,
> as suggested by Jason:
> 
> surface_size = isl_align(buffer_size, 4) +
>                (isl_align(buffer_size, 4) - buffer_size)
> 
> So when we calculate backwards the buffer_size in the backend we
> update the resinfo return value with:
> 
> buffer_size = (surface_size & ~3) - (surface_size & 3)
> 
> It is also exposed this buffer requirements when robust buffer access
> is enabled so these buffer sizes recommend being multiple of 4.
> 
> v2: (Jason Ekstrand)
>     Move padding logic fron anv to isl_surface_state
>     Move calculus of original size from spirv to driver backend
> v3: (Jason Ekstrand)
>     Rename some variables and use a similar expresion when calculating
>     padding than when obtaining the original buffer size.
>     Avoid use of unnecesary component call at brw_fs_nir.
> 
> Reviewed-by: Jason Ekstrand  >
> ---
>  src/intel/compiler/brw_fs_nir.cpp | 27 ++-
>  src/intel/isl/isl_surface_state.c | 22 +-
>  src/intel/vulkan/anv_device.c     | 11 +++
>  3 files changed, 58 insertions(+), 2 deletions(-)
> 
> diff --git a/src/intel/compiler/brw_fs_nir.cpp
> b/src/intel/compiler/brw_fs_nir.cpp
> index 8efec34cc9d..4aa411d149f 100644
> --- a/src/intel/compiler/brw_fs_nir.cpp
> +++ b/src/intel/compiler/brw_fs_nir.cpp
> @@ -4290,7 +4290,32 @@ fs_visitor::nir_emit_intrinsic(const
> fs_builder , nir_intrinsic_instr *instr
>        inst->mlen = 1;
>        inst->size_written = 4 * REG_SIZE;
> 
> -      bld.MOV(retype(dest, ret_payload.type),
> component(ret_payload, 0));
> +      /* SKL PRM, vol07, 3D Media GPGPU Engine, Bounds Checking and
> Faulting:
> +       *
> +       * "Out-of-bounds checking is always performed at a DWord
> granularity. If
> +       * any part of the DWord is out-of-bounds then the whole DWord is
> +       * considered out-of-bounds."
> +       *
> +       * This implies that types with size smaller than 4-bytes
> need to be
> +       * padded if they don't complete the last dword of the
> buffer. But as we
> +       * need to maintain the original size we need to reverse the
> padding
> +       * calculation to return the correct size to know the number
> of elements
> +       * of an unsized array. As we stored in the last two bits of
> the size of
> +       * the buffer the needed padding we calculate here:
> +       *
> +       * buffer_size = resinfo_size & ~3 - resinfo_size & 3
> 
> 
> Mind putting both calculations in this comment like you do in the one below?

Locally changed as:

 * As we stored in the last two bits of the surface
 * size the needed padding for the buffer, we calculate here the
 * original buffer_size reversing the surface_size calculation:
 *
 * surface_size = isl_align(buffer_size, 4) +
 *(isl_align(buffer_size) - buffer_size)
 *
 * buffer_size = surface_size & ~3 - surface_size & 3

I used the same names as in isl comment, so surface_size instead of
resinfo_size.

> rb still applies

Thanks.

> +       */
> +
> +      fs_reg size_aligned4 = ubld.vgrf(BRW_REGISTER_TYPE_UD);
> +      fs_reg size_padding = ubld.vgrf(BRW_REGISTER_TYPE_UD);
> +      fs_reg buffer_size = ubld.vgrf(BRW_REGISTER_TYPE_UD);
> +
> +      ubld.AND(size_padding, ret_payload, brw_imm_ud(3));
> +      ubld.AND(size_aligned4, ret_payload, brw_imm_ud(~3));
> +      ubld.ADD(buffer_size, size_aligned4, negate(size_padding));
> +
> +      bld.MOV(retype(dest, ret_payload.type),
> component(buffer_size, 0));
> +
>        brw_mark_surface_used(prog_data, index);
>        break;
>     }
> diff --git a/src/intel/isl/isl_surface_state.c
> b/src/intel/isl/isl_surface_state.c
> index bfb27fa4a44..c205b3d2c0b 

Re: [Mesa-dev] [PATCH v2 1/8] isl/i965/fs: SSBO/UBO buffers need size padding if not multiple of 32-bit

2018-02-27 Thread Jason Ekstrand
On Tue, Feb 27, 2018 at 5:27 AM, Jose Maria Casanova Crespo <
jmcasan...@igalia.com> wrote:

> The surfaces that backup the GPU buffers have a boundary check that
> considers that access to partial dwords are considered out-of-bounds.
> For example, buffers with 1,3 16-bit elements has size 2 or 6 and the
> last two bytes would always be read as 0 or its writting ignored.
>
> The introduction of 16-bit types implies that we need to align the size
> to 4-bytew multiples so that partial dwords could be read/written.
> Adding an inconditional +2 size to buffers not being multiple of 2
> solves this issue for the general cases of UBO or SSBO.
>
> But, when unsized arrays of 16-bit elements are used it is not possible
> to know if the size was padded or not. To solve this issue the
> implementation calculates the needed size of the buffer surfaces,
> as suggested by Jason:
>
> surface_size = isl_align(buffer_size, 4) +
>(isl_align(buffer_size, 4) - buffer_size)
>
> So when we calculate backwards the buffer_size in the backend we
> update the resinfo return value with:
>
> buffer_size = (surface_size & ~3) - (surface_size & 3)
>
> It is also exposed this buffer requirements when robust buffer access
> is enabled so these buffer sizes recommend being multiple of 4.
>
> v2: (Jason Ekstrand)
> Move padding logic fron anv to isl_surface_state
> Move calculus of original size from spirv to driver backend
> v3: (Jason Ekstrand)
> Rename some variables and use a similar expresion when calculating
> padding than when obtaining the original buffer size.
> Avoid use of unnecesary component call at brw_fs_nir.
>
> Reviewed-by: Jason Ekstrand 
> ---
>  src/intel/compiler/brw_fs_nir.cpp | 27 ++-
>  src/intel/isl/isl_surface_state.c | 22 +-
>  src/intel/vulkan/anv_device.c | 11 +++
>  3 files changed, 58 insertions(+), 2 deletions(-)
>
> diff --git a/src/intel/compiler/brw_fs_nir.cpp
> b/src/intel/compiler/brw_fs_nir.cpp
> index 8efec34cc9d..4aa411d149f 100644
> --- a/src/intel/compiler/brw_fs_nir.cpp
> +++ b/src/intel/compiler/brw_fs_nir.cpp
> @@ -4290,7 +4290,32 @@ fs_visitor::nir_emit_intrinsic(const fs_builder
> , nir_intrinsic_instr *instr
>inst->mlen = 1;
>inst->size_written = 4 * REG_SIZE;
>
> -  bld.MOV(retype(dest, ret_payload.type), component(ret_payload, 0));
> +  /* SKL PRM, vol07, 3D Media GPGPU Engine, Bounds Checking and
> Faulting:
> +   *
> +   * "Out-of-bounds checking is always performed at a DWord
> granularity. If
> +   * any part of the DWord is out-of-bounds then the whole DWord is
> +   * considered out-of-bounds."
> +   *
> +   * This implies that types with size smaller than 4-bytes need to be
> +   * padded if they don't complete the last dword of the buffer. But
> as we
> +   * need to maintain the original size we need to reverse the padding
> +   * calculation to return the correct size to know the number of
> elements
> +   * of an unsized array. As we stored in the last two bits of the
> size of
> +   * the buffer the needed padding we calculate here:
> +   *
> +   * buffer_size = resinfo_size & ~3 - resinfo_size & 3
>

Mind putting both calculations in this comment like you do in the one below?

rb still applies


> +   */
> +
> +  fs_reg size_aligned4 = ubld.vgrf(BRW_REGISTER_TYPE_UD);
> +  fs_reg size_padding = ubld.vgrf(BRW_REGISTER_TYPE_UD);
> +  fs_reg buffer_size = ubld.vgrf(BRW_REGISTER_TYPE_UD);
> +
> +  ubld.AND(size_padding, ret_payload, brw_imm_ud(3));
> +  ubld.AND(size_aligned4, ret_payload, brw_imm_ud(~3));
> +  ubld.ADD(buffer_size, size_aligned4, negate(size_padding));
> +
> +  bld.MOV(retype(dest, ret_payload.type), component(buffer_size, 0));
> +
>brw_mark_surface_used(prog_data, index);
>break;
> }
> diff --git a/src/intel/isl/isl_surface_state.c
> b/src/intel/isl/isl_surface_state.c
> index bfb27fa4a44..c205b3d2c0b 100644
> --- a/src/intel/isl/isl_surface_state.c
> +++ b/src/intel/isl/isl_surface_state.c
> @@ -673,7 +673,27 @@ void
>  isl_genX(buffer_fill_state_s)(void *state,
>const struct isl_buffer_fill_state_info
> *restrict info)
>  {
> -   uint32_t num_elements = info->size / info->stride;
> +   uint64_t buffer_size = info->size;
> +
> +   /* Uniform and Storage buffers need to have surface size not less that
> the
> +* aligned 32-bit size of the buffer. To calculate the array lenght on
> +* unsized arrays in StorageBuffer the last 2 bits store the padding
> size
> +* added to the surface, so we can calculate latter the original buffer
> +* size to know the number of elements.
> +*
> +*  surface_size = isl_align(buffer_size, 4) +
> +* (isl_align(buffer_size) - buffer_size)
> +*
> +*  buffer_size = (surface_size & ~3) - (surface_size & 3)
> +*/

[Mesa-dev] [PATCH v2 1/8] isl/i965/fs: SSBO/UBO buffers need size padding if not multiple of 32-bit

2018-02-27 Thread Jose Maria Casanova Crespo
The surfaces that backup the GPU buffers have a boundary check that
considers that access to partial dwords are considered out-of-bounds.
For example, buffers with 1,3 16-bit elements has size 2 or 6 and the
last two bytes would always be read as 0 or its writting ignored.

The introduction of 16-bit types implies that we need to align the size
to 4-bytew multiples so that partial dwords could be read/written.
Adding an inconditional +2 size to buffers not being multiple of 2
solves this issue for the general cases of UBO or SSBO.

But, when unsized arrays of 16-bit elements are used it is not possible
to know if the size was padded or not. To solve this issue the
implementation calculates the needed size of the buffer surfaces,
as suggested by Jason:

surface_size = isl_align(buffer_size, 4) +
   (isl_align(buffer_size, 4) - buffer_size)

So when we calculate backwards the buffer_size in the backend we
update the resinfo return value with:

buffer_size = (surface_size & ~3) - (surface_size & 3)

It is also exposed this buffer requirements when robust buffer access
is enabled so these buffer sizes recommend being multiple of 4.

v2: (Jason Ekstrand)
Move padding logic fron anv to isl_surface_state
Move calculus of original size from spirv to driver backend
v3: (Jason Ekstrand)
Rename some variables and use a similar expresion when calculating
padding than when obtaining the original buffer size.
Avoid use of unnecesary component call at brw_fs_nir.

Reviewed-by: Jason Ekstrand 
---
 src/intel/compiler/brw_fs_nir.cpp | 27 ++-
 src/intel/isl/isl_surface_state.c | 22 +-
 src/intel/vulkan/anv_device.c | 11 +++
 3 files changed, 58 insertions(+), 2 deletions(-)

diff --git a/src/intel/compiler/brw_fs_nir.cpp 
b/src/intel/compiler/brw_fs_nir.cpp
index 8efec34cc9d..4aa411d149f 100644
--- a/src/intel/compiler/brw_fs_nir.cpp
+++ b/src/intel/compiler/brw_fs_nir.cpp
@@ -4290,7 +4290,32 @@ fs_visitor::nir_emit_intrinsic(const fs_builder , 
nir_intrinsic_instr *instr
   inst->mlen = 1;
   inst->size_written = 4 * REG_SIZE;
 
-  bld.MOV(retype(dest, ret_payload.type), component(ret_payload, 0));
+  /* SKL PRM, vol07, 3D Media GPGPU Engine, Bounds Checking and Faulting:
+   *
+   * "Out-of-bounds checking is always performed at a DWord granularity. If
+   * any part of the DWord is out-of-bounds then the whole DWord is
+   * considered out-of-bounds."
+   *
+   * This implies that types with size smaller than 4-bytes need to be
+   * padded if they don't complete the last dword of the buffer. But as we
+   * need to maintain the original size we need to reverse the padding
+   * calculation to return the correct size to know the number of elements
+   * of an unsized array. As we stored in the last two bits of the size of
+   * the buffer the needed padding we calculate here:
+   *
+   * buffer_size = resinfo_size & ~3 - resinfo_size & 3
+   */
+
+  fs_reg size_aligned4 = ubld.vgrf(BRW_REGISTER_TYPE_UD);
+  fs_reg size_padding = ubld.vgrf(BRW_REGISTER_TYPE_UD);
+  fs_reg buffer_size = ubld.vgrf(BRW_REGISTER_TYPE_UD);
+
+  ubld.AND(size_padding, ret_payload, brw_imm_ud(3));
+  ubld.AND(size_aligned4, ret_payload, brw_imm_ud(~3));
+  ubld.ADD(buffer_size, size_aligned4, negate(size_padding));
+
+  bld.MOV(retype(dest, ret_payload.type), component(buffer_size, 0));
+
   brw_mark_surface_used(prog_data, index);
   break;
}
diff --git a/src/intel/isl/isl_surface_state.c 
b/src/intel/isl/isl_surface_state.c
index bfb27fa4a44..c205b3d2c0b 100644
--- a/src/intel/isl/isl_surface_state.c
+++ b/src/intel/isl/isl_surface_state.c
@@ -673,7 +673,27 @@ void
 isl_genX(buffer_fill_state_s)(void *state,
   const struct isl_buffer_fill_state_info 
*restrict info)
 {
-   uint32_t num_elements = info->size / info->stride;
+   uint64_t buffer_size = info->size;
+
+   /* Uniform and Storage buffers need to have surface size not less that the
+* aligned 32-bit size of the buffer. To calculate the array lenght on
+* unsized arrays in StorageBuffer the last 2 bits store the padding size
+* added to the surface, so we can calculate latter the original buffer
+* size to know the number of elements.
+*
+*  surface_size = isl_align(buffer_size, 4) +
+* (isl_align(buffer_size) - buffer_size)
+*
+*  buffer_size = (surface_size & ~3) - (surface_size & 3)
+*/
+   if (info->format == ISL_FORMAT_RAW  ||
+   info->stride < isl_format_get_layout(info->format)->bpb / 8) {
+  assert(info->stride == 1);
+  uint64_t aligned_size = isl_align(buffer_size, 4);
+  buffer_size = aligned_size + (aligned_size - buffer_size);
+   }
+
+   uint32_t num_elements = buffer_size / info->stride;
 
if (GEN_GEN >= 7) {
   /* From the IVB PRM,