Re: [Mesa-dev] [PATCH] radv: set cb base tile swizzles for MRT speedups (v3)

2017-07-13 Thread Alex Smith
On 13 July 2017 at 04:35, Dave Airlie  wrote:
> On 11 July 2017 at 23:49, Alex Smith  wrote:
>> On 11 July 2017 at 14:27, Alex Smith  wrote:
>>>
>>> On 10 July 2017 at 05:59, Dave Airlie  wrote:

 From: Dave Airlie 

 This patch uses addrlib to workout the tile swizzles according
 to the surface index. It seems to produce the same values as
 amdgpu-pro for the deferred test.

 v2: don't apply swizzle to CMASK. the eg docs don't mention
 it, and we clearly don't align cmask for that.
 v3: disable surf index for dedicated images, as these will
 most likely be shared, and I don't think the metadata has
 space for this info in it yet.
>>>
>>>
>>> FWIW, disabling this for images marked as dedicated means this won't get
>>> any improvements for render targets on our games. We create all render
>>> targets as dedicated when NV_dedicated_allocation is available since this
>>> gets us significant perf improvement on NVIDIA.
>>>
>>> If it's not currently possible to have this enabled for dedicated images
>>> we could avoid using it on AMD, though I'm curious if there's likely to be
>>> any other perf benefits to marking RTs as dedicated we'd then be missing out
>>> on? I've not done any testing to see if there's any benefit from using it.
>>
>>
>> Realised this possibly didn't sound clear - what I'm asking is does using
>> NV_dedicated_allocation give any perf benefit on RADV at all like it does
>> for NV? If not we could avoid it to get the benefits from this patch.
>
> No on radv we really only want dedicated allocation for shared resources
> i.e. external memory ones. I don't think you'd get any speed up from using it.
>
> I'd like to figure out a way to make it work with this feature, but I
> think it means
> enhancing the metadata a bit.

Fair enough, in that case I'll disable our use of the extension on RADV.

Thanks,
Alex
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] radv: set cb base tile swizzles for MRT speedups (v3)

2017-07-12 Thread Dave Airlie
On 11 July 2017 at 23:49, Alex Smith  wrote:
> On 11 July 2017 at 14:27, Alex Smith  wrote:
>>
>> On 10 July 2017 at 05:59, Dave Airlie  wrote:
>>>
>>> From: Dave Airlie 
>>>
>>> This patch uses addrlib to workout the tile swizzles according
>>> to the surface index. It seems to produce the same values as
>>> amdgpu-pro for the deferred test.
>>>
>>> v2: don't apply swizzle to CMASK. the eg docs don't mention
>>> it, and we clearly don't align cmask for that.
>>> v3: disable surf index for dedicated images, as these will
>>> most likely be shared, and I don't think the metadata has
>>> space for this info in it yet.
>>
>>
>> FWIW, disabling this for images marked as dedicated means this won't get
>> any improvements for render targets on our games. We create all render
>> targets as dedicated when NV_dedicated_allocation is available since this
>> gets us significant perf improvement on NVIDIA.
>>
>> If it's not currently possible to have this enabled for dedicated images
>> we could avoid using it on AMD, though I'm curious if there's likely to be
>> any other perf benefits to marking RTs as dedicated we'd then be missing out
>> on? I've not done any testing to see if there's any benefit from using it.
>
>
> Realised this possibly didn't sound clear - what I'm asking is does using
> NV_dedicated_allocation give any perf benefit on RADV at all like it does
> for NV? If not we could avoid it to get the benefits from this patch.

No on radv we really only want dedicated allocation for shared resources
i.e. external memory ones. I don't think you'd get any speed up from using it.

I'd like to figure out a way to make it work with this feature, but I
think it means
enhancing the metadata a bit.

Dave.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] radv: set cb base tile swizzles for MRT speedups (v3)

2017-07-11 Thread Alex Smith
On 11 July 2017 at 14:27, Alex Smith  wrote:

> On 10 July 2017 at 05:59, Dave Airlie  wrote:
>
>> From: Dave Airlie 
>>
>> This patch uses addrlib to workout the tile swizzles according
>> to the surface index. It seems to produce the same values as
>> amdgpu-pro for the deferred test.
>>
>> v2: don't apply swizzle to CMASK. the eg docs don't mention
>> it, and we clearly don't align cmask for that.
>> v3: disable surf index for dedicated images, as these will
>> most likely be shared, and I don't think the metadata has
>> space for this info in it yet.
>>
>
> FWIW, disabling this for images marked as dedicated means this won't get
> any improvements for render targets on our games. We create all render
> targets as dedicated when NV_dedicated_allocation is available since this
> gets us significant perf improvement on NVIDIA.
>
> If it's not currently possible to have this enabled for dedicated images
> we could avoid using it on AMD, though I'm curious if there's likely to be
> any other perf benefits to marking RTs as dedicated we'd then be missing
> out on? I've not done any testing to see if there's any benefit from using
> it.
>

Realised this possibly didn't sound clear - what I'm asking is does using
NV_dedicated_allocation give any perf benefit on RADV at all like it does
for NV? If not we could avoid it to get the benefits from this patch.

Alex


>
> Thanks,
> Alex
>
>
>> This gets the deferred demo from 730->950fps on my rx480.
>> (dcc cmask elim predication patches get it further)
>> I'm also seeing some improvements in Mad Max at 4K
>>
>> Signed-off-by: Dave Airlie 
>>
>> fixup for dedicate
>> ---
>>  src/amd/common/ac_surface.c   | 14 ++
>>  src/amd/common/ac_surface.h   |  2 ++
>>  src/amd/vulkan/radv_device.c  |  7 ++-
>>  src/amd/vulkan/radv_image.c   | 19 ++-
>>  src/amd/vulkan/radv_private.h |  2 ++
>>  5 files changed, 42 insertions(+), 2 deletions(-)
>>
>> diff --git a/src/amd/common/ac_surface.c b/src/amd/common/ac_surface.c
>> index 23fb66b..0aebacc 100644
>> --- a/src/amd/common/ac_surface.c
>> +++ b/src/amd/common/ac_surface.c
>> @@ -692,6 +692,20 @@ static int gfx6_compute_surface(ADDR_HANDLE addrlib,
>> surf->htile_size *= 2;
>>
>> surf->is_linear = surf->u.legacy.level[0].mode ==
>> RADEON_SURF_MODE_LINEAR_ALIGNED;
>> +
>> +   /* workout base swizzle */
>> +   if (!(surf->flags & RADEON_SURF_Z_OR_SBUFFER)) {
>> +   ADDR_COMPUTE_BASE_SWIZZLE_INPUT AddrBaseSwizzleIn = {0};
>> +   ADDR_COMPUTE_BASE_SWIZZLE_OUTPUT AddrBaseSwizzleOut =
>> {0};
>> +
>> +   AddrBaseSwizzleIn.surfIndex = config->info.surf_index;
>> +   AddrBaseSwizzleIn.tileIndex = AddrSurfInfoIn.tileIndex;
>> +   AddrBaseSwizzleIn.macroModeIndex =
>> AddrSurfInfoOut.macroModeIndex;
>> +   AddrBaseSwizzleIn.pTileInfo = AddrSurfInfoOut.pTileInfo;
>> +   AddrBaseSwizzleIn.tileMode = AddrSurfInfoOut.tileMode;
>> +   AddrComputeBaseSwizzle(addrlib, &AddrBaseSwizzleIn,
>> &AddrBaseSwizzleOut);
>> +   surf->u.legacy.combined_swizzle =
>> AddrBaseSwizzleOut.tileSwizzle;
>> +   }
>> return 0;
>>  }
>>
>> diff --git a/src/amd/common/ac_surface.h b/src/amd/common/ac_surface.h
>> index 4d893ff..7901b86 100644
>> --- a/src/amd/common/ac_surface.h
>> +++ b/src/amd/common/ac_surface.h
>> @@ -97,6 +97,7 @@ struct legacy_surf_layout {
>>  unsigneddepth_adjusted:1;
>>  unsignedstencil_adjusted:1;
>>
>> +uint8_t combined_swizzle;
>>  struct legacy_surf_levellevel[RADEON_SURF_MAX_LEVELS];
>>  struct legacy_surf_levelstencil_level[RADEON_SURF_MAX_LEVELS];
>>  uint8_t tiling_index[RADEON_SURF_MAX_LEVELS];
>> @@ -194,6 +195,7 @@ struct ac_surf_info {
>> uint32_t width;
>> uint32_t height;
>> uint32_t depth;
>> +   uint32_t surf_index;
>> uint8_t samples;
>> uint8_t levels;
>> uint16_t array_size;
>> diff --git a/src/amd/vulkan/radv_device.c b/src/amd/vulkan/radv_device.c
>> index 789c90d..eb77914 100644
>> --- a/src/amd/vulkan/radv_device.c
>> +++ b/src/amd/vulkan/radv_device.c
>> @@ -2757,7 +2757,8 @@ radv_initialise_color_surface(struct radv_device
>> *device,
>> }
>>
>> cb->cb_color_base = va >> 8;
>> -
>> +   if (device->physical_device->rad_info.chip_class < GFX9)
>> +   cb->cb_color_base |= iview->image->surface.u.legacy
>> .combined_swizzle;
>> /* CMASK variables */
>> va = device->ws->buffer_get_va(iview->bo) + iview->image->offset;
>> va += iview->image->cmask.offset;
>> @@ -2766,6 +2767,8 @@ radv_initialise_color_surface(struct radv_device
>> *device,
>> va = device->ws->buffer_get_va(iview->bo) + iview->image->offset;
>> va += iview->image->dcc_offset;
>> cb->cb_dcc_base = va >> 8;
>> +   if (device->phys

Re: [Mesa-dev] [PATCH] radv: set cb base tile swizzles for MRT speedups (v3)

2017-07-11 Thread Alex Smith
On 10 July 2017 at 05:59, Dave Airlie  wrote:

> From: Dave Airlie 
>
> This patch uses addrlib to workout the tile swizzles according
> to the surface index. It seems to produce the same values as
> amdgpu-pro for the deferred test.
>
> v2: don't apply swizzle to CMASK. the eg docs don't mention
> it, and we clearly don't align cmask for that.
> v3: disable surf index for dedicated images, as these will
> most likely be shared, and I don't think the metadata has
> space for this info in it yet.
>

FWIW, disabling this for images marked as dedicated means this won't get
any improvements for render targets on our games. We create all render
targets as dedicated when NV_dedicated_allocation is available since this
gets us significant perf improvement on NVIDIA.

If it's not currently possible to have this enabled for dedicated images we
could avoid using it on AMD, though I'm curious if there's likely to be any
other perf benefits to marking RTs as dedicated we'd then be missing out
on? I've not done any testing to see if there's any benefit from using it.

Thanks,
Alex


> This gets the deferred demo from 730->950fps on my rx480.
> (dcc cmask elim predication patches get it further)
> I'm also seeing some improvements in Mad Max at 4K
>
> Signed-off-by: Dave Airlie 
>
> fixup for dedicate
> ---
>  src/amd/common/ac_surface.c   | 14 ++
>  src/amd/common/ac_surface.h   |  2 ++
>  src/amd/vulkan/radv_device.c  |  7 ++-
>  src/amd/vulkan/radv_image.c   | 19 ++-
>  src/amd/vulkan/radv_private.h |  2 ++
>  5 files changed, 42 insertions(+), 2 deletions(-)
>
> diff --git a/src/amd/common/ac_surface.c b/src/amd/common/ac_surface.c
> index 23fb66b..0aebacc 100644
> --- a/src/amd/common/ac_surface.c
> +++ b/src/amd/common/ac_surface.c
> @@ -692,6 +692,20 @@ static int gfx6_compute_surface(ADDR_HANDLE addrlib,
> surf->htile_size *= 2;
>
> surf->is_linear = surf->u.legacy.level[0].mode ==
> RADEON_SURF_MODE_LINEAR_ALIGNED;
> +
> +   /* workout base swizzle */
> +   if (!(surf->flags & RADEON_SURF_Z_OR_SBUFFER)) {
> +   ADDR_COMPUTE_BASE_SWIZZLE_INPUT AddrBaseSwizzleIn = {0};
> +   ADDR_COMPUTE_BASE_SWIZZLE_OUTPUT AddrBaseSwizzleOut = {0};
> +
> +   AddrBaseSwizzleIn.surfIndex = config->info.surf_index;
> +   AddrBaseSwizzleIn.tileIndex = AddrSurfInfoIn.tileIndex;
> +   AddrBaseSwizzleIn.macroModeIndex = AddrSurfInfoOut.
> macroModeIndex;
> +   AddrBaseSwizzleIn.pTileInfo = AddrSurfInfoOut.pTileInfo;
> +   AddrBaseSwizzleIn.tileMode = AddrSurfInfoOut.tileMode;
> +   AddrComputeBaseSwizzle(addrlib, &AddrBaseSwizzleIn,
> &AddrBaseSwizzleOut);
> +   surf->u.legacy.combined_swizzle = AddrBaseSwizzleOut.
> tileSwizzle;
> +   }
> return 0;
>  }
>
> diff --git a/src/amd/common/ac_surface.h b/src/amd/common/ac_surface.h
> index 4d893ff..7901b86 100644
> --- a/src/amd/common/ac_surface.h
> +++ b/src/amd/common/ac_surface.h
> @@ -97,6 +97,7 @@ struct legacy_surf_layout {
>  unsigneddepth_adjusted:1;
>  unsignedstencil_adjusted:1;
>
> +uint8_t combined_swizzle;
>  struct legacy_surf_levellevel[RADEON_SURF_MAX_LEVELS];
>  struct legacy_surf_levelstencil_level[RADEON_SURF_MAX_LEVELS];
>  uint8_t tiling_index[RADEON_SURF_MAX_LEVELS];
> @@ -194,6 +195,7 @@ struct ac_surf_info {
> uint32_t width;
> uint32_t height;
> uint32_t depth;
> +   uint32_t surf_index;
> uint8_t samples;
> uint8_t levels;
> uint16_t array_size;
> diff --git a/src/amd/vulkan/radv_device.c b/src/amd/vulkan/radv_device.c
> index 789c90d..eb77914 100644
> --- a/src/amd/vulkan/radv_device.c
> +++ b/src/amd/vulkan/radv_device.c
> @@ -2757,7 +2757,8 @@ radv_initialise_color_surface(struct radv_device
> *device,
> }
>
> cb->cb_color_base = va >> 8;
> -
> +   if (device->physical_device->rad_info.chip_class < GFX9)
> +   cb->cb_color_base |= iview->image->surface.u.
> legacy.combined_swizzle;
> /* CMASK variables */
> va = device->ws->buffer_get_va(iview->bo) + iview->image->offset;
> va += iview->image->cmask.offset;
> @@ -2766,6 +2767,8 @@ radv_initialise_color_surface(struct radv_device
> *device,
> va = device->ws->buffer_get_va(iview->bo) + iview->image->offset;
> va += iview->image->dcc_offset;
> cb->cb_dcc_base = va >> 8;
> +   if (device->physical_device->rad_info.chip_class < GFX9)
> +   cb->cb_dcc_base |= iview->image->surface.u.
> legacy.combined_swizzle;
>
> uint32_t max_slice = radv_surface_layer_count(iview);
> cb->cb_color_view = S_028C6C_SLICE_START(iview->base_layer) |
> @@ -2781,6 +2784,8 @@ radv_initialise_color_surface(struct radv_device
> *device,
> if (iview->image->fmask.size) {
>