UBO buffers need size padding if not multiple of 32-bit

Chema Casanova Wed, 28 Feb 2018 02:25:19 -0800

On 27/02/18 19:53, Jason Ekstrand wrote:
> On Tue, Feb 27, 2018 at 5:27 AM, Jose Maria Casanova Crespo
> <jmcasan...@igalia.com <mailto:jmcasan...@igalia.com>> wrote:
> 
>     The surfaces that backup the GPU buffers have a boundary check that
>     considers that access to partial dwords are considered out-of-bounds.
>     For example, buffers with 1,3 16-bit elements has size 2 or 6 and the
>     last two bytes would always be read as 0 or its writting ignored.
> 
>     The introduction of 16-bit types implies that we need to align the size
>     to 4-bytew multiples so that partial dwords could be read/written.
>     Adding an inconditional +2 size to buffers not being multiple of 2
>     solves this issue for the general cases of UBO or SSBO.
> 
>     But, when unsized arrays of 16-bit elements are used it is not possible
>     to know if the size was padded or not. To solve this issue the
>     implementation calculates the needed size of the buffer surfaces,
>     as suggested by Jason:
> 
>     surface_size = isl_align(buffer_size, 4) +
>                    (isl_align(buffer_size, 4) - buffer_size)
> 
>     So when we calculate backwards the buffer_size in the backend we
>     update the resinfo return value with:
> 
>     buffer_size = (surface_size & ~3) - (surface_size & 3)
> 
>     It is also exposed this buffer requirements when robust buffer access
>     is enabled so these buffer sizes recommend being multiple of 4.
> 
>     v2: (Jason Ekstrand)
>         Move padding logic fron anv to isl_surface_state
>         Move calculus of original size from spirv to driver backend
>     v3: (Jason Ekstrand)
>         Rename some variables and use a similar expresion when calculating
>         padding than when obtaining the original buffer size.
>         Avoid use of unnecesary component call at brw_fs_nir.
> 
>     Reviewed-by: Jason Ekstrand <ja...@jlekstrand.net
>     <mailto:ja...@jlekstrand.net>>
>     ---
>      src/intel/compiler/brw_fs_nir.cpp | 27 ++++++++++++++++++++++++++-
>      src/intel/isl/isl_surface_state.c | 22 +++++++++++++++++++++-
>      src/intel/vulkan/anv_device.c     | 11 +++++++++++
>      3 files changed, 58 insertions(+), 2 deletions(-)
> 
>     diff --git a/src/intel/compiler/brw_fs_nir.cpp
>     b/src/intel/compiler/brw_fs_nir.cpp
>     index 8efec34cc9d..4aa411d149f 100644
>     --- a/src/intel/compiler/brw_fs_nir.cpp
>     +++ b/src/intel/compiler/brw_fs_nir.cpp
>     @@ -4290,7 +4290,32 @@ fs_visitor::nir_emit_intrinsic(const
>     fs_builder &bld, nir_intrinsic_instr *instr
>            inst->mlen = 1;
>            inst->size_written = 4 * REG_SIZE;
> 
>     -      bld.MOV(retype(dest, ret_payload.type),
>     component(ret_payload, 0));
>     +      /* SKL PRM, vol07, 3D Media GPGPU Engine, Bounds Checking and
>     Faulting:
>     +       *
>     +       * "Out-of-bounds checking is always performed at a DWord
>     granularity. If
>     +       * any part of the DWord is out-of-bounds then the whole DWord is
>     +       * considered out-of-bounds."
>     +       *
>     +       * This implies that types with size smaller than 4-bytes
>     need to be
>     +       * padded if they don't complete the last dword of the
>     buffer. But as we
>     +       * need to maintain the original size we need to reverse the
>     padding
>     +       * calculation to return the correct size to know the number
>     of elements
>     +       * of an unsized array. As we stored in the last two bits of
>     the size of
>     +       * the buffer the needed padding we calculate here:
>     +       *
>     +       * buffer_size = resinfo_size & ~3 - resinfo_size & 3
> 
> 
> Mind putting both calculations in this comment like you do in the one below?


Locally changed as:

 * ....    As we stored in the last two bits of the surface
 * size the needed padding for the buffer, we calculate here the
 * original buffer_size reversing the surface_size calculation:
 *
 * surface_size = isl_align(buffer_size, 4) +
 *                (isl_align(buffer_size) - buffer_size)
 *
 * buffer_size = surface_size & ~3 - surface_size & 3

I used the same names as in isl comment, so surface_size instead of
resinfo_size.

> rb still applies

Thanks.

>     +       */
>     +
>     +      fs_reg size_aligned4 = ubld.vgrf(BRW_REGISTER_TYPE_UD);
>     +      fs_reg size_padding = ubld.vgrf(BRW_REGISTER_TYPE_UD);
>     +      fs_reg buffer_size = ubld.vgrf(BRW_REGISTER_TYPE_UD);
>     +
>     +      ubld.AND(size_padding, ret_payload, brw_imm_ud(3));
>     +      ubld.AND(size_aligned4, ret_payload, brw_imm_ud(~3));
>     +      ubld.ADD(buffer_size, size_aligned4, negate(size_padding));
>     +
>     +      bld.MOV(retype(dest, ret_payload.type),
>     component(buffer_size, 0));
>     +
>            brw_mark_surface_used(prog_data, index);
>            break;
>         }
>     diff --git a/src/intel/isl/isl_surface_state.c
>     b/src/intel/isl/isl_surface_state.c
>     index bfb27fa4a44..c205b3d2c0b 100644
>     --- a/src/intel/isl/isl_surface_state.c
>     +++ b/src/intel/isl/isl_surface_state.c
>     @@ -673,7 +673,27 @@ void
>      isl_genX(buffer_fill_state_s)(void *state,
>                                    const struct
>     isl_buffer_fill_state_info *restrict info)
>      {
>     -   uint32_t num_elements = info->size / info->stride;
>     +   uint64_t buffer_size = info->size;
>     +
>     +   /* Uniform and Storage buffers need to have surface size not
>     less that the
>     +    * aligned 32-bit size of the buffer. To calculate the array
>     lenght on
>     +    * unsized arrays in StorageBuffer the last 2 bits store the
>     padding size
>     +    * added to the surface, so we can calculate latter the original
>     buffer
>     +    * size to know the number of elements.
>     +    *
>     +    *  surface_size = isl_align(buffer_size, 4) +
>     +    *                 (isl_align(buffer_size) - buffer_size)
>     +    *
>     +    *  buffer_size = (surface_size & ~3) - (surface_size & 3)
>     +    */
>     +   if (info->format == ISL_FORMAT_RAW  ||
>     +       info->stride < isl_format_get_layout(info->format)->bpb / 8) {
>     +      assert(info->stride == 1);
>     +      uint64_t aligned_size = isl_align(buffer_size, 4);
>     +      buffer_size = aligned_size + (aligned_size - buffer_size);
>     +   }
>     +
>     +   uint32_t num_elements = buffer_size / info->stride;
> 
>         if (GEN_GEN >= 7) {
>            /* From the IVB PRM, SURFACE_STATE::Height,
>     diff --git a/src/intel/vulkan/anv_device.c
>     b/src/intel/vulkan/anv_device.c
>     index a83b7a39f6a..cedeed56219 100644
>     --- a/src/intel/vulkan/anv_device.c
>     +++ b/src/intel/vulkan/anv_device.c
>     @@ -2103,6 +2103,17 @@ void anv_GetBufferMemoryRequirements(
> 
>         pMemoryRequirements->size = buffer->size;
>         pMemoryRequirements->alignment = alignment;
>     +
>     +   /* Storage and Uniform buffers should have their size aligned to
>     +    * 32-bits to avoid boundary checks when last DWord is not complete.
>     +    * This would ensure that not internal padding would be needed for
>     +    * 16-bit types.
>     +    */
>     +   if (device->robust_buffer_access &&
>     +       (buffer->usage & VK_BUFFER_USAGE_UNIFORM_BUFFER_BIT ||
>     +        buffer->usage & VK_BUFFER_USAGE_STORAGE_BUFFER_BIT))
>     +      pMemoryRequirements->size = align_u64(buffer->size, 4);
>     +
>         pMemoryRequirements->memoryTypeBits = memory_types;
>      }
> 
>     --
>     2.14.3
> 
>     _______________________________________________
>     mesa-dev mailing list
>     mesa-dev@lists.freedesktop.org <mailto:mesa-dev@lists.freedesktop.org>
>     https://lists.freedesktop.org/mailman/listinfo/mesa-dev
>     <https://lists.freedesktop.org/mailman/listinfo/mesa-dev>
> 
> 
_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH v2 1/8] isl/i965/fs: SSBO/UBO buffers need size padding if not multiple of 32-bit

Reply via email to