Re: [Mesa-dev] [PATCH 1/7] isl/i965/fs: SSBO/UBO buffers need size padding if not multiple of 32-bit (v2)

2018-02-26 Thread Jason Ekstrand
On Mon, Feb 26, 2018 at 10:09 AM, Chema Casanova 
wrote:

> On 26/02/18 18:20, Jason Ekstrand wrote:
> > I've lost track of what's reviewed and what's not.  Could you either
> > just send a status list or do a resend once all the current comments are
> > handled?
>
> Reviewed-by and all feedback addressed
> --
> [1/7] anv/spirv: SSBO/UBO buffers needs padding size is not multiple of
> 32-bits.
> [3/7] i965/fs: Support 16-bit store_ssbo with VK_KHR_relaxed_block_layout
> [5/7] spirv: Calculate properly 16-bit vector sizes
> [6/7] spirv/i965/anv: Relax push constant offset assertions being 32-bit
> aligned
>
> Feedback received but pending to finish v2
> --
> [2/7] i965/fs: Support 16-bit do_read_vector with
> VK_KHR_relaxed_block_layout
>
> To review when everything is ready
> --
> [4/7] anv: Enable VK_KHR_16bit_storage for SSBO and UBO
> [7/7] anv: Enable VK_KHR_16bit_storage for PushConstant
>
> I'll resend this series when [2/7] is ready.
>

Sounds good.  4 and 7 will basically be auto-reviewed once everything else
is in shape.

--Jason


>
> Chema
>
> > --Jason
> >
> >
> > On February 26, 2018 09:08:01 Chema Casanova 
> wrote:
> >
> >> On 26/02/18 16:54, Jason Ekstrand wrote:
> >>> On Mon, Feb 26, 2018 at 6:14 AM, Jose Maria Casanova Crespo
> >>> > wrote:
> >>>
> >>> The surfaces that backup the GPU buffers have a boundary check that
> >>> considers that access to partial dwords are considered
> >>> out-of-bounds.
> >>> For example, buffers with 1/3 16-bit elemnts has size 2 or 6 and
> the
> >>> last two bytes would always be read as 0 or its writting ignored.
> >>>
> >>> The introduction of 16-bit types implies that we need to align
> >>> the size
> >>> to 4-bytes multiples so that partial dwords could be read/written.
> >>> Adding an inconditional +2 size to buffers not being multiple of 2
> >>> solves this issue for the general cases of UBO or SSBO.
> >>>
> >>> But, when unsized arrays of 16-bit elements are used it is not
> >>> possible
> >>> to know if the size was padded or not. To solve this issue the
> >>> implementation calculates the needed size of the buffer surfaces,
> >>> as suggested by Jason:
> >>>
> >>> surface_size = 2 * aling_u64(buffer_size, 4)  - buffer_size
> >>
> >>
> >> Changed also the commit log according the the comments below.
> >>
> >>>
> >>> So when we calculate backwards the buffer_size in the backend we
> >>> update the resinfo return value with:
> >>>
> >>> buffer_size = (surface_size & ~3) - (surface_size & 3)
> >>>
> >>> It is also exposed this buffer requirements when robust buffer
> >>> access
> >>> is enabled so these buffer sizes recommend being multiple of 4.
> >>>
> >>> v2: (Jason Ekstrand)
> >>> Move padding logic fron anv to isl_surface_state
> >>> Move calculus of original size from spirv to driver backend
> >>> ---
> >>>  src/intel/compiler/brw_fs_nir.cpp | 27
> ++-
> >>>  src/intel/isl/isl_surface_state.c | 21 -
> >>>  src/intel/vulkan/anv_device.c | 11 +++
> >>>  3 files changed, 57 insertions(+), 2 deletions(-)
> >>>
> >>> diff --git a/src/intel/compiler/brw_fs_nir.cpp
> >>> b/src/intel/compiler/brw_fs_nir.cpp
> >>> index 8efec34cc9d..d017af040b4 100644
> >>> --- a/src/intel/compiler/brw_fs_nir.cpp
> >>> +++ b/src/intel/compiler/brw_fs_nir.cpp
> >>> @@ -4290,7 +4290,32 @@ fs_visitor::nir_emit_intrinsic(const
> >>> fs_builder , nir_intrinsic_instr *instr
> >>>inst->mlen = 1;
> >>>inst->size_written = 4 * REG_SIZE;
> >>>
> >>> -  bld.MOV(retype(dest, ret_payload.type),
> >>> component(ret_payload, 0));
> >>> +  /* SKL PRM, vol07, 3D Media GPGPU Engine, Bounds Checking
> and
> >>> Faulting:
> >>> +   *
> >>> +   * "Out-of-bounds checking is always performed at a DWord
> >>> granularity. If
> >>> +   * any part of the DWord is out-of-bounds then the whole
> >>> DWord is
> >>> +   * considered out-of-bounds."
> >>> +   *
> >>> +   * This implies that types with size smaller than 4-bytes
> >>> (16-bits) need
> >>>
> >>>
> >>> Yeah, 4-bytes (16-bits) is weird.  I'd just drop the "(16-bits)".
> >>
> >> Completely agree.
> >>
> >>> +   * to be padded if they don't complete the last dword of the
> >>> buffer. But
> >>> +   * as we need to maintain the original size we need to
> >>> reverse the padding
> >>> +   * calculation to return the correct size to know the
> number
> >>> of elements
> >>> +   * of an unsized array. As we stored in the last two bits of
> >>> the size
> >>> +   * of the buffer the needed 

Re: [Mesa-dev] [PATCH 1/7] isl/i965/fs: SSBO/UBO buffers need size padding if not multiple of 32-bit (v2)

2018-02-26 Thread Chema Casanova
On 26/02/18 18:20, Jason Ekstrand wrote:
> I've lost track of what's reviewed and what's not.  Could you either
> just send a status list or do a resend once all the current comments are
> handled?

Reviewed-by and all feedback addressed
--
[1/7] anv/spirv: SSBO/UBO buffers needs padding size is not multiple of
32-bits.
[3/7] i965/fs: Support 16-bit store_ssbo with VK_KHR_relaxed_block_layout
[5/7] spirv: Calculate properly 16-bit vector sizes
[6/7] spirv/i965/anv: Relax push constant offset assertions being 32-bit
aligned

Feedback received but pending to finish v2
--
[2/7] i965/fs: Support 16-bit do_read_vector with
VK_KHR_relaxed_block_layout

To review when everything is ready
--
[4/7] anv: Enable VK_KHR_16bit_storage for SSBO and UBO
[7/7] anv: Enable VK_KHR_16bit_storage for PushConstant

I'll resend this series when [2/7] is ready.

Chema

> --Jason
> 
> 
> On February 26, 2018 09:08:01 Chema Casanova  wrote:
> 
>> On 26/02/18 16:54, Jason Ekstrand wrote:
>>> On Mon, Feb 26, 2018 at 6:14 AM, Jose Maria Casanova Crespo
>>> > wrote:
>>>
>>>     The surfaces that backup the GPU buffers have a boundary check that
>>>     considers that access to partial dwords are considered
>>> out-of-bounds.
>>>     For example, buffers with 1/3 16-bit elemnts has size 2 or 6 and the
>>>     last two bytes would always be read as 0 or its writting ignored.
>>>
>>>     The introduction of 16-bit types implies that we need to align
>>> the size
>>>     to 4-bytes multiples so that partial dwords could be read/written.
>>>     Adding an inconditional +2 size to buffers not being multiple of 2
>>>     solves this issue for the general cases of UBO or SSBO.
>>>
>>>     But, when unsized arrays of 16-bit elements are used it is not
>>> possible
>>>     to know if the size was padded or not. To solve this issue the
>>>     implementation calculates the needed size of the buffer surfaces,
>>>     as suggested by Jason:
>>>
>>>     surface_size = 2 * aling_u64(buffer_size, 4)  - buffer_size
>>
>>
>> Changed also the commit log according the the comments below.
>>
>>>
>>>     So when we calculate backwards the buffer_size in the backend we
>>>     update the resinfo return value with:
>>>
>>>     buffer_size = (surface_size & ~3) - (surface_size & 3)
>>>
>>>     It is also exposed this buffer requirements when robust buffer
>>> access
>>>     is enabled so these buffer sizes recommend being multiple of 4.
>>>
>>>     v2: (Jason Ekstrand)
>>>         Move padding logic fron anv to isl_surface_state
>>>         Move calculus of original size from spirv to driver backend
>>>     ---
>>>      src/intel/compiler/brw_fs_nir.cpp | 27 ++-
>>>      src/intel/isl/isl_surface_state.c | 21 -
>>>      src/intel/vulkan/anv_device.c     | 11 +++
>>>      3 files changed, 57 insertions(+), 2 deletions(-)
>>>
>>>     diff --git a/src/intel/compiler/brw_fs_nir.cpp
>>>     b/src/intel/compiler/brw_fs_nir.cpp
>>>     index 8efec34cc9d..d017af040b4 100644
>>>     --- a/src/intel/compiler/brw_fs_nir.cpp
>>>     +++ b/src/intel/compiler/brw_fs_nir.cpp
>>>     @@ -4290,7 +4290,32 @@ fs_visitor::nir_emit_intrinsic(const
>>>     fs_builder , nir_intrinsic_instr *instr
>>>            inst->mlen = 1;
>>>            inst->size_written = 4 * REG_SIZE;
>>>
>>>     -      bld.MOV(retype(dest, ret_payload.type),
>>>     component(ret_payload, 0));
>>>     +      /* SKL PRM, vol07, 3D Media GPGPU Engine, Bounds Checking and
>>>     Faulting:
>>>     +       *
>>>     +       * "Out-of-bounds checking is always performed at a DWord
>>>     granularity. If
>>>     +       * any part of the DWord is out-of-bounds then the whole
>>> DWord is
>>>     +       * considered out-of-bounds."
>>>     +       *
>>>     +       * This implies that types with size smaller than 4-bytes
>>>     (16-bits) need
>>>
>>>
>>> Yeah, 4-bytes (16-bits) is weird.  I'd just drop the "(16-bits)".
>>
>> Completely agree.
>>
>>>     +       * to be padded if they don't complete the last dword of the
>>>     buffer. But
>>>     +       * as we need to maintain the original size we need to
>>>     reverse the padding
>>>     +       * calculation to return the correct size to know the  number
>>>     of elements
>>>     +       * of an unsized array. As we stored in the last two bits of
>>>     the size
>>>     +       * of the buffer the needed padding we calculate here:
>>>     +       *
>>>     +       * buffer_size = resinfo_size & ~3 - resinfo_size & 3
>>>     +       */
>>>     +
>>>     +      fs_reg size_aligned32 = ubld.vgrf(BRW_REGISTER_TYPE_UD);
>>>
>>>
>>> I'd call this aligned4 because it's in units of bytes.
>>
>> Locally changed.
>>
>>>     +      fs_reg size_padding = ubld.vgrf(BRW_REGISTER_TYPE_UD);
>>>     +      fs_reg buffer_size = 

Re: [Mesa-dev] [PATCH 1/7] isl/i965/fs: SSBO/UBO buffers need size padding if not multiple of 32-bit (v2)

2018-02-26 Thread Jason Ekstrand
I've lost track of what's reviewed and what's not.  Could you either just 
send a status list or do a resend once all the current comments are handled?


--Jason


On February 26, 2018 09:08:01 Chema Casanova  wrote:


On 26/02/18 16:54, Jason Ekstrand wrote:

On Mon, Feb 26, 2018 at 6:14 AM, Jose Maria Casanova Crespo
> wrote:

The surfaces that backup the GPU buffers have a boundary check that
considers that access to partial dwords are considered out-of-bounds.
For example, buffers with 1/3 16-bit elemnts has size 2 or 6 and the
last two bytes would always be read as 0 or its writting ignored.

The introduction of 16-bit types implies that we need to align the size
to 4-bytes multiples so that partial dwords could be read/written.
Adding an inconditional +2 size to buffers not being multiple of 2
solves this issue for the general cases of UBO or SSBO.

But, when unsized arrays of 16-bit elements are used it is not possible
to know if the size was padded or not. To solve this issue the
implementation calculates the needed size of the buffer surfaces,
as suggested by Jason:

surface_size = 2 * aling_u64(buffer_size, 4)  - buffer_size



Changed also the commit log according the the comments below.



So when we calculate backwards the buffer_size in the backend we
update the resinfo return value with:

buffer_size = (surface_size & ~3) - (surface_size & 3)

It is also exposed this buffer requirements when robust buffer access
is enabled so these buffer sizes recommend being multiple of 4.

v2: (Jason Ekstrand)
    Move padding logic fron anv to isl_surface_state
    Move calculus of original size from spirv to driver backend
---
 src/intel/compiler/brw_fs_nir.cpp | 27 ++-
 src/intel/isl/isl_surface_state.c | 21 -
 src/intel/vulkan/anv_device.c     | 11 +++
 3 files changed, 57 insertions(+), 2 deletions(-)

diff --git a/src/intel/compiler/brw_fs_nir.cpp
b/src/intel/compiler/brw_fs_nir.cpp
index 8efec34cc9d..d017af040b4 100644
--- a/src/intel/compiler/brw_fs_nir.cpp
+++ b/src/intel/compiler/brw_fs_nir.cpp
@@ -4290,7 +4290,32 @@ fs_visitor::nir_emit_intrinsic(const
fs_builder , nir_intrinsic_instr *instr
       inst->mlen = 1;
       inst->size_written = 4 * REG_SIZE;

-      bld.MOV(retype(dest, ret_payload.type),
component(ret_payload, 0));
+      /* SKL PRM, vol07, 3D Media GPGPU Engine, Bounds Checking and
Faulting:
+       *
+       * "Out-of-bounds checking is always performed at a DWord
granularity. If
+       * any part of the DWord is out-of-bounds then the whole DWord is
+       * considered out-of-bounds."
+       *
+       * This implies that types with size smaller than 4-bytes
(16-bits) need


Yeah, 4-bytes (16-bits) is weird.  I'd just drop the "(16-bits)".


Completely agree.


+       * to be padded if they don't complete the last dword of the
buffer. But
+       * as we need to maintain the original size we need to
reverse the padding
+       * calculation to return the correct size to know the  number
of elements
+       * of an unsized array. As we stored in the last two bits of
the size
+       * of the buffer the needed padding we calculate here:
+       *
+       * buffer_size = resinfo_size & ~3 - resinfo_size & 3
+       */
+
+      fs_reg size_aligned32 = ubld.vgrf(BRW_REGISTER_TYPE_UD);


I'd call this aligned4 because it's in units of bytes.


Locally changed.


+      fs_reg size_padding = ubld.vgrf(BRW_REGISTER_TYPE_UD);
+      fs_reg buffer_size = ubld.vgrf(BRW_REGISTER_TYPE_UD);
+
+      ubld.AND(size_padding, component(ret_payload, 0), brw_imm_ud(3));
+      ubld.AND(size_aligned32, component(ret_payload, 0),
brw_imm_ud(~3));


You don't really need the component() here.


Removed.


+      ubld.ADD(buffer_size, size_aligned32, negate (size_padding));


Removed space after negate


+
+      bld.MOV(retype(dest, ret_payload.type),
component(buffer_size, 0));
+
       brw_mark_surface_used(prog_data, index);
       break;
    }
diff --git a/src/intel/isl/isl_surface_state.c
b/src/intel/isl/isl_surface_state.c
index bfb27fa4a44..ddc9eb53c96 100644
--- a/src/intel/isl/isl_surface_state.c
+++ b/src/intel/isl/isl_surface_state.c
@@ -673,7 +673,26 @@ void
 isl_genX(buffer_fill_state_s)(void *state,
                               const struct
isl_buffer_fill_state_info *restrict info)
 {
-   uint32_t num_elements = info->size / info->stride;
+   uint64_t buffer_size = info->size;
+
+   /* Uniform and Storage buffers need to have surface size not
less that the
+    * aligned 32-bit size of the 

Re: [Mesa-dev] [PATCH 1/7] isl/i965/fs: SSBO/UBO buffers need size padding if not multiple of 32-bit (v2)

2018-02-26 Thread Chema Casanova
On 26/02/18 16:54, Jason Ekstrand wrote:
> On Mon, Feb 26, 2018 at 6:14 AM, Jose Maria Casanova Crespo
> > wrote:
> 
> The surfaces that backup the GPU buffers have a boundary check that
> considers that access to partial dwords are considered out-of-bounds.
> For example, buffers with 1/3 16-bit elemnts has size 2 or 6 and the
> last two bytes would always be read as 0 or its writting ignored.
> 
> The introduction of 16-bit types implies that we need to align the size
> to 4-bytes multiples so that partial dwords could be read/written.
> Adding an inconditional +2 size to buffers not being multiple of 2
> solves this issue for the general cases of UBO or SSBO.
> 
> But, when unsized arrays of 16-bit elements are used it is not possible
> to know if the size was padded or not. To solve this issue the
> implementation calculates the needed size of the buffer surfaces,
> as suggested by Jason:
> 
> surface_size = 2 * aling_u64(buffer_size, 4)  - buffer_size


Changed also the commit log according the the comments below.

> 
> So when we calculate backwards the buffer_size in the backend we
> update the resinfo return value with:
> 
> buffer_size = (surface_size & ~3) - (surface_size & 3)
> 
> It is also exposed this buffer requirements when robust buffer access
> is enabled so these buffer sizes recommend being multiple of 4.
> 
> v2: (Jason Ekstrand)
>     Move padding logic fron anv to isl_surface_state
>     Move calculus of original size from spirv to driver backend
> ---
>  src/intel/compiler/brw_fs_nir.cpp | 27 ++-
>  src/intel/isl/isl_surface_state.c | 21 -
>  src/intel/vulkan/anv_device.c     | 11 +++
>  3 files changed, 57 insertions(+), 2 deletions(-)
> 
> diff --git a/src/intel/compiler/brw_fs_nir.cpp
> b/src/intel/compiler/brw_fs_nir.cpp
> index 8efec34cc9d..d017af040b4 100644
> --- a/src/intel/compiler/brw_fs_nir.cpp
> +++ b/src/intel/compiler/brw_fs_nir.cpp
> @@ -4290,7 +4290,32 @@ fs_visitor::nir_emit_intrinsic(const
> fs_builder , nir_intrinsic_instr *instr
>        inst->mlen = 1;
>        inst->size_written = 4 * REG_SIZE;
> 
> -      bld.MOV(retype(dest, ret_payload.type),
> component(ret_payload, 0));
> +      /* SKL PRM, vol07, 3D Media GPGPU Engine, Bounds Checking and
> Faulting:
> +       *
> +       * "Out-of-bounds checking is always performed at a DWord
> granularity. If
> +       * any part of the DWord is out-of-bounds then the whole DWord is
> +       * considered out-of-bounds."
> +       *
> +       * This implies that types with size smaller than 4-bytes
> (16-bits) need
> 
> 
> Yeah, 4-bytes (16-bits) is weird.  I'd just drop the "(16-bits)".

Completely agree.

> +       * to be padded if they don't complete the last dword of the
> buffer. But
> +       * as we need to maintain the original size we need to
> reverse the padding
> +       * calculation to return the correct size to know the  number
> of elements
> +       * of an unsized array. As we stored in the last two bits of
> the size
> +       * of the buffer the needed padding we calculate here:
> +       *
> +       * buffer_size = resinfo_size & ~3 - resinfo_size & 3
> +       */
> +
> +      fs_reg size_aligned32 = ubld.vgrf(BRW_REGISTER_TYPE_UD);
> 
> 
> I'd call this aligned4 because it's in units of bytes.

Locally changed.

> +      fs_reg size_padding = ubld.vgrf(BRW_REGISTER_TYPE_UD);
> +      fs_reg buffer_size = ubld.vgrf(BRW_REGISTER_TYPE_UD);
> +
> +      ubld.AND(size_padding, component(ret_payload, 0), brw_imm_ud(3));
> +      ubld.AND(size_aligned32, component(ret_payload, 0),
> brw_imm_ud(~3));
> 
> 
> You don't really need the component() here.

Removed.

> +      ubld.ADD(buffer_size, size_aligned32, negate (size_padding));

Removed space after negate

> +
> +      bld.MOV(retype(dest, ret_payload.type),
> component(buffer_size, 0));
> +
>        brw_mark_surface_used(prog_data, index);
>        break;
>     }
> diff --git a/src/intel/isl/isl_surface_state.c
> b/src/intel/isl/isl_surface_state.c
> index bfb27fa4a44..ddc9eb53c96 100644
> --- a/src/intel/isl/isl_surface_state.c
> +++ b/src/intel/isl/isl_surface_state.c
> @@ -673,7 +673,26 @@ void
>  isl_genX(buffer_fill_state_s)(void *state,
>                                const struct
> isl_buffer_fill_state_info *restrict info)
>  {
> -   uint32_t num_elements = info->size / info->stride;
> +   uint64_t buffer_size = info->size;
> +
> +   /* Uniform and Storage buffers need to have surface size not
> less that the
> +    * aligned 32-bit size of the buffer. To calculate the array
>

Re: [Mesa-dev] [PATCH 1/7] isl/i965/fs: SSBO/UBO buffers need size padding if not multiple of 32-bit (v2)

2018-02-26 Thread Chema Casanova
On 26/02/18 15:40, Ilia Mirkin wrote:
> On Mon, Feb 26, 2018 at 9:14 AM, Jose Maria Casanova Crespo
>  wrote:
>> The surfaces that backup the GPU buffers have a boundary check that
>> considers that access to partial dwords are considered out-of-bounds.
>> For example, buffers with 1/3 16-bit elemnts has size 2 or 6 and the
>> last two bytes would always be read as 0 or its writting ignored.
>>
>> The introduction of 16-bit types implies that we need to align the size
>> to 4-bytes multiples so that partial dwords could be read/written.
>> Adding an inconditional +2 size to buffers not being multiple of 2
>> solves this issue for the general cases of UBO or SSBO.
>>
>> But, when unsized arrays of 16-bit elements are used it is not possible
>> to know if the size was padded or not. To solve this issue the
>> implementation calculates the needed size of the buffer surfaces,
>> as suggested by Jason:
>>
>> surface_size = 2 * aling_u64(buffer_size, 4)  - buffer_size
>>
>> So when we calculate backwards the buffer_size in the backend we
>> update the resinfo return value with:
>>
>> buffer_size = (surface_size & ~3) - (surface_size & 3)
>>
>> It is also exposed this buffer requirements when robust buffer access
>> is enabled so these buffer sizes recommend being multiple of 4.
>>
>> v2: (Jason Ekstrand)
>> Move padding logic fron anv to isl_surface_state
>> Move calculus of original size from spirv to driver backend
>> ---
>>  src/intel/compiler/brw_fs_nir.cpp | 27 ++-
>>  src/intel/isl/isl_surface_state.c | 21 -
>>  src/intel/vulkan/anv_device.c | 11 +++
>>  3 files changed, 57 insertions(+), 2 deletions(-)
>>
>> diff --git a/src/intel/compiler/brw_fs_nir.cpp 
>> b/src/intel/compiler/brw_fs_nir.cpp
>> index 8efec34cc9d..d017af040b4 100644
>> --- a/src/intel/compiler/brw_fs_nir.cpp
>> +++ b/src/intel/compiler/brw_fs_nir.cpp
>> @@ -4290,7 +4290,32 @@ fs_visitor::nir_emit_intrinsic(const fs_builder , 
>> nir_intrinsic_instr *instr
>>inst->mlen = 1;
>>inst->size_written = 4 * REG_SIZE;
>>
>> -  bld.MOV(retype(dest, ret_payload.type), component(ret_payload, 0));
>> +  /* SKL PRM, vol07, 3D Media GPGPU Engine, Bounds Checking and 
>> Faulting:
>> +   *
>> +   * "Out-of-bounds checking is always performed at a DWord 
>> granularity. If
>> +   * any part of the DWord is out-of-bounds then the whole DWord is
>> +   * considered out-of-bounds."
>> +   *
>> +   * This implies that types with size smaller than 4-bytes (16-bits) 
>> need
> 
> 32 bits?

The 16-bits was a kind of example, of type with size less than 4-bytes,
so better remove it.

Thanks

Chema

>> +   * to be padded if they don't complete the last dword of the buffer. 
>> But
>> +   * as we need to maintain the original size we need to reverse the 
>> padding
>> +   * calculation to return the correct size to know the  number of 
>> elements
>> +   * of an unsized array. As we stored in the last two bits of the size
>> +   * of the buffer the needed padding we calculate here:
>> +   *
>> +   * buffer_size = resinfo_size & ~3 - resinfo_size & 3
>> +   */
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
> 
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 1/7] isl/i965/fs: SSBO/UBO buffers need size padding if not multiple of 32-bit (v2)

2018-02-26 Thread Jason Ekstrand
On Mon, Feb 26, 2018 at 6:14 AM, Jose Maria Casanova Crespo <
jmcasan...@igalia.com> wrote:

> The surfaces that backup the GPU buffers have a boundary check that
> considers that access to partial dwords are considered out-of-bounds.
> For example, buffers with 1/3 16-bit elemnts has size 2 or 6 and the
> last two bytes would always be read as 0 or its writting ignored.
>
> The introduction of 16-bit types implies that we need to align the size
> to 4-bytes multiples so that partial dwords could be read/written.
> Adding an inconditional +2 size to buffers not being multiple of 2
> solves this issue for the general cases of UBO or SSBO.
>
> But, when unsized arrays of 16-bit elements are used it is not possible
> to know if the size was padded or not. To solve this issue the
> implementation calculates the needed size of the buffer surfaces,
> as suggested by Jason:
>
> surface_size = 2 * aling_u64(buffer_size, 4)  - buffer_size
>
> So when we calculate backwards the buffer_size in the backend we
> update the resinfo return value with:
>
> buffer_size = (surface_size & ~3) - (surface_size & 3)
>
> It is also exposed this buffer requirements when robust buffer access
> is enabled so these buffer sizes recommend being multiple of 4.
>
> v2: (Jason Ekstrand)
> Move padding logic fron anv to isl_surface_state
> Move calculus of original size from spirv to driver backend
> ---
>  src/intel/compiler/brw_fs_nir.cpp | 27 ++-
>  src/intel/isl/isl_surface_state.c | 21 -
>  src/intel/vulkan/anv_device.c | 11 +++
>  3 files changed, 57 insertions(+), 2 deletions(-)
>
> diff --git a/src/intel/compiler/brw_fs_nir.cpp
> b/src/intel/compiler/brw_fs_nir.cpp
> index 8efec34cc9d..d017af040b4 100644
> --- a/src/intel/compiler/brw_fs_nir.cpp
> +++ b/src/intel/compiler/brw_fs_nir.cpp
> @@ -4290,7 +4290,32 @@ fs_visitor::nir_emit_intrinsic(const fs_builder
> , nir_intrinsic_instr *instr
>inst->mlen = 1;
>inst->size_written = 4 * REG_SIZE;
>
> -  bld.MOV(retype(dest, ret_payload.type), component(ret_payload, 0));
> +  /* SKL PRM, vol07, 3D Media GPGPU Engine, Bounds Checking and
> Faulting:
> +   *
> +   * "Out-of-bounds checking is always performed at a DWord
> granularity. If
> +   * any part of the DWord is out-of-bounds then the whole DWord is
> +   * considered out-of-bounds."
> +   *
> +   * This implies that types with size smaller than 4-bytes (16-bits)
> need
>

Yeah, 4-bytes (16-bits) is weird.  I'd just drop the "(16-bits)".


> +   * to be padded if they don't complete the last dword of the
> buffer. But
> +   * as we need to maintain the original size we need to reverse the
> padding
> +   * calculation to return the correct size to know the  number of
> elements
> +   * of an unsized array. As we stored in the last two bits of the
> size
> +   * of the buffer the needed padding we calculate here:
> +   *
> +   * buffer_size = resinfo_size & ~3 - resinfo_size & 3
> +   */
> +
> +  fs_reg size_aligned32 = ubld.vgrf(BRW_REGISTER_TYPE_UD);
>

I'd call this aligned4 because it's in units of bytes.


> +  fs_reg size_padding = ubld.vgrf(BRW_REGISTER_TYPE_UD);
> +  fs_reg buffer_size = ubld.vgrf(BRW_REGISTER_TYPE_UD);
> +
> +  ubld.AND(size_padding, component(ret_payload, 0), brw_imm_ud(3));
> +  ubld.AND(size_aligned32, component(ret_payload, 0), brw_imm_ud(~3));
>

You don't really need the component() here.


> +  ubld.ADD(buffer_size, size_aligned32, negate (size_padding));
> +
> +  bld.MOV(retype(dest, ret_payload.type), component(buffer_size, 0));
> +
>brw_mark_surface_used(prog_data, index);
>break;
> }
> diff --git a/src/intel/isl/isl_surface_state.c
> b/src/intel/isl/isl_surface_state.c
> index bfb27fa4a44..ddc9eb53c96 100644
> --- a/src/intel/isl/isl_surface_state.c
> +++ b/src/intel/isl/isl_surface_state.c
> @@ -673,7 +673,26 @@ void
>  isl_genX(buffer_fill_state_s)(void *state,
>const struct isl_buffer_fill_state_info
> *restrict info)
>  {
> -   uint32_t num_elements = info->size / info->stride;
> +   uint64_t buffer_size = info->size;
> +
> +   /* Uniform and Storage buffers need to have surface size not less that
> the
> +* aligned 32-bit size of the buffer. To calculate the array lenght on
> +* unsized arrays in StorageBuffer the last 2 bits store the padding
> size
> +* added to the surface, so we can calculate latter the original buffer
> +* size to know the number of elements.
> +*
> +*  surface_size = 2 * aling_u64(buffer_size, 4)  - buffer_size
>

isl_align


> +*
> +*  array_size = (surface_size & ~3) - (surface_size & 3)
> +*/
> +   if (buffer_size % 4 &&
>

You don't need this bit as the below calculation will be a no-op in that
case.


> +   (info->format == ISL_FORMAT_RAW  ||
> +info->stride < 

Re: [Mesa-dev] [PATCH 1/7] isl/i965/fs: SSBO/UBO buffers need size padding if not multiple of 32-bit (v2)

2018-02-26 Thread Ilia Mirkin
On Mon, Feb 26, 2018 at 9:14 AM, Jose Maria Casanova Crespo
 wrote:
> The surfaces that backup the GPU buffers have a boundary check that
> considers that access to partial dwords are considered out-of-bounds.
> For example, buffers with 1/3 16-bit elemnts has size 2 or 6 and the
> last two bytes would always be read as 0 or its writting ignored.
>
> The introduction of 16-bit types implies that we need to align the size
> to 4-bytes multiples so that partial dwords could be read/written.
> Adding an inconditional +2 size to buffers not being multiple of 2
> solves this issue for the general cases of UBO or SSBO.
>
> But, when unsized arrays of 16-bit elements are used it is not possible
> to know if the size was padded or not. To solve this issue the
> implementation calculates the needed size of the buffer surfaces,
> as suggested by Jason:
>
> surface_size = 2 * aling_u64(buffer_size, 4)  - buffer_size
>
> So when we calculate backwards the buffer_size in the backend we
> update the resinfo return value with:
>
> buffer_size = (surface_size & ~3) - (surface_size & 3)
>
> It is also exposed this buffer requirements when robust buffer access
> is enabled so these buffer sizes recommend being multiple of 4.
>
> v2: (Jason Ekstrand)
> Move padding logic fron anv to isl_surface_state
> Move calculus of original size from spirv to driver backend
> ---
>  src/intel/compiler/brw_fs_nir.cpp | 27 ++-
>  src/intel/isl/isl_surface_state.c | 21 -
>  src/intel/vulkan/anv_device.c | 11 +++
>  3 files changed, 57 insertions(+), 2 deletions(-)
>
> diff --git a/src/intel/compiler/brw_fs_nir.cpp 
> b/src/intel/compiler/brw_fs_nir.cpp
> index 8efec34cc9d..d017af040b4 100644
> --- a/src/intel/compiler/brw_fs_nir.cpp
> +++ b/src/intel/compiler/brw_fs_nir.cpp
> @@ -4290,7 +4290,32 @@ fs_visitor::nir_emit_intrinsic(const fs_builder , 
> nir_intrinsic_instr *instr
>inst->mlen = 1;
>inst->size_written = 4 * REG_SIZE;
>
> -  bld.MOV(retype(dest, ret_payload.type), component(ret_payload, 0));
> +  /* SKL PRM, vol07, 3D Media GPGPU Engine, Bounds Checking and Faulting:
> +   *
> +   * "Out-of-bounds checking is always performed at a DWord granularity. 
> If
> +   * any part of the DWord is out-of-bounds then the whole DWord is
> +   * considered out-of-bounds."
> +   *
> +   * This implies that types with size smaller than 4-bytes (16-bits) 
> need

32 bits?

> +   * to be padded if they don't complete the last dword of the buffer. 
> But
> +   * as we need to maintain the original size we need to reverse the 
> padding
> +   * calculation to return the correct size to know the  number of 
> elements
> +   * of an unsized array. As we stored in the last two bits of the size
> +   * of the buffer the needed padding we calculate here:
> +   *
> +   * buffer_size = resinfo_size & ~3 - resinfo_size & 3
> +   */
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 1/7] isl/i965/fs: SSBO/UBO buffers need size padding if not multiple of 32-bit (v2)

2018-02-26 Thread Jose Maria Casanova Crespo
The surfaces that backup the GPU buffers have a boundary check that
considers that access to partial dwords are considered out-of-bounds.
For example, buffers with 1/3 16-bit elemnts has size 2 or 6 and the
last two bytes would always be read as 0 or its writting ignored.

The introduction of 16-bit types implies that we need to align the size
to 4-bytes multiples so that partial dwords could be read/written.
Adding an inconditional +2 size to buffers not being multiple of 2
solves this issue for the general cases of UBO or SSBO.

But, when unsized arrays of 16-bit elements are used it is not possible
to know if the size was padded or not. To solve this issue the
implementation calculates the needed size of the buffer surfaces,
as suggested by Jason:

surface_size = 2 * aling_u64(buffer_size, 4)  - buffer_size

So when we calculate backwards the buffer_size in the backend we
update the resinfo return value with:

buffer_size = (surface_size & ~3) - (surface_size & 3)

It is also exposed this buffer requirements when robust buffer access
is enabled so these buffer sizes recommend being multiple of 4.

v2: (Jason Ekstrand)
Move padding logic fron anv to isl_surface_state
Move calculus of original size from spirv to driver backend
---
 src/intel/compiler/brw_fs_nir.cpp | 27 ++-
 src/intel/isl/isl_surface_state.c | 21 -
 src/intel/vulkan/anv_device.c | 11 +++
 3 files changed, 57 insertions(+), 2 deletions(-)

diff --git a/src/intel/compiler/brw_fs_nir.cpp 
b/src/intel/compiler/brw_fs_nir.cpp
index 8efec34cc9d..d017af040b4 100644
--- a/src/intel/compiler/brw_fs_nir.cpp
+++ b/src/intel/compiler/brw_fs_nir.cpp
@@ -4290,7 +4290,32 @@ fs_visitor::nir_emit_intrinsic(const fs_builder , 
nir_intrinsic_instr *instr
   inst->mlen = 1;
   inst->size_written = 4 * REG_SIZE;
 
-  bld.MOV(retype(dest, ret_payload.type), component(ret_payload, 0));
+  /* SKL PRM, vol07, 3D Media GPGPU Engine, Bounds Checking and Faulting:
+   *
+   * "Out-of-bounds checking is always performed at a DWord granularity. If
+   * any part of the DWord is out-of-bounds then the whole DWord is
+   * considered out-of-bounds."
+   *
+   * This implies that types with size smaller than 4-bytes (16-bits) need
+   * to be padded if they don't complete the last dword of the buffer. But
+   * as we need to maintain the original size we need to reverse the 
padding
+   * calculation to return the correct size to know the  number of elements
+   * of an unsized array. As we stored in the last two bits of the size
+   * of the buffer the needed padding we calculate here:
+   *
+   * buffer_size = resinfo_size & ~3 - resinfo_size & 3
+   */
+
+  fs_reg size_aligned32 = ubld.vgrf(BRW_REGISTER_TYPE_UD);
+  fs_reg size_padding = ubld.vgrf(BRW_REGISTER_TYPE_UD);
+  fs_reg buffer_size = ubld.vgrf(BRW_REGISTER_TYPE_UD);
+
+  ubld.AND(size_padding, component(ret_payload, 0), brw_imm_ud(3));
+  ubld.AND(size_aligned32, component(ret_payload, 0), brw_imm_ud(~3));
+  ubld.ADD(buffer_size, size_aligned32, negate (size_padding));
+
+  bld.MOV(retype(dest, ret_payload.type), component(buffer_size, 0));
+
   brw_mark_surface_used(prog_data, index);
   break;
}
diff --git a/src/intel/isl/isl_surface_state.c 
b/src/intel/isl/isl_surface_state.c
index bfb27fa4a44..ddc9eb53c96 100644
--- a/src/intel/isl/isl_surface_state.c
+++ b/src/intel/isl/isl_surface_state.c
@@ -673,7 +673,26 @@ void
 isl_genX(buffer_fill_state_s)(void *state,
   const struct isl_buffer_fill_state_info 
*restrict info)
 {
-   uint32_t num_elements = info->size / info->stride;
+   uint64_t buffer_size = info->size;
+
+   /* Uniform and Storage buffers need to have surface size not less that the
+* aligned 32-bit size of the buffer. To calculate the array lenght on
+* unsized arrays in StorageBuffer the last 2 bits store the padding size
+* added to the surface, so we can calculate latter the original buffer
+* size to know the number of elements.
+*
+*  surface_size = 2 * aling_u64(buffer_size, 4)  - buffer_size
+*
+*  array_size = (surface_size & ~3) - (surface_size & 3)
+*/
+   if (buffer_size % 4 &&
+   (info->format == ISL_FORMAT_RAW  ||
+info->stride < isl_format_get_layout(info->format)->bpb / 8)) {
+  assert(info->stride == 1);
+  buffer_size = 2 * isl_align(buffer_size, 4) - buffer_size;
+   }
+
+   uint32_t num_elements = buffer_size / info->stride;
 
if (GEN_GEN >= 7) {
   /* From the IVB PRM, SURFACE_STATE::Height,
diff --git a/src/intel/vulkan/anv_device.c b/src/intel/vulkan/anv_device.c
index a83b7a39f6a..cedeed56219 100644
--- a/src/intel/vulkan/anv_device.c
+++ b/src/intel/vulkan/anv_device.c
@@ -2103,6 +2103,17 @@ void anv_GetBufferMemoryRequirements(
 
pMemoryRequirements->size = buffer->size;