Re: [Mesa-dev] [PATCH] radv: set cb base tile swizzles for MRT speedups (v3)
On 13 July 2017 at 04:35, Dave Airlie wrote: > On 11 July 2017 at 23:49, Alex Smith wrote: >> On 11 July 2017 at 14:27, Alex Smith wrote: >>> >>> On 10 July 2017 at 05:59, Dave Airlie wrote: From: Dave Airlie This patch uses addrlib to workout the tile swizzles according to the surface index. It seems to produce the same values as amdgpu-pro for the deferred test. v2: don't apply swizzle to CMASK. the eg docs don't mention it, and we clearly don't align cmask for that. v3: disable surf index for dedicated images, as these will most likely be shared, and I don't think the metadata has space for this info in it yet. >>> >>> >>> FWIW, disabling this for images marked as dedicated means this won't get >>> any improvements for render targets on our games. We create all render >>> targets as dedicated when NV_dedicated_allocation is available since this >>> gets us significant perf improvement on NVIDIA. >>> >>> If it's not currently possible to have this enabled for dedicated images >>> we could avoid using it on AMD, though I'm curious if there's likely to be >>> any other perf benefits to marking RTs as dedicated we'd then be missing out >>> on? I've not done any testing to see if there's any benefit from using it. >> >> >> Realised this possibly didn't sound clear - what I'm asking is does using >> NV_dedicated_allocation give any perf benefit on RADV at all like it does >> for NV? If not we could avoid it to get the benefits from this patch. > > No on radv we really only want dedicated allocation for shared resources > i.e. external memory ones. I don't think you'd get any speed up from using it. > > I'd like to figure out a way to make it work with this feature, but I > think it means > enhancing the metadata a bit. Fair enough, in that case I'll disable our use of the extension on RADV. Thanks, Alex ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] radv: set cb base tile swizzles for MRT speedups (v3)
On 11 July 2017 at 23:49, Alex Smith wrote: > On 11 July 2017 at 14:27, Alex Smith wrote: >> >> On 10 July 2017 at 05:59, Dave Airlie wrote: >>> >>> From: Dave Airlie >>> >>> This patch uses addrlib to workout the tile swizzles according >>> to the surface index. It seems to produce the same values as >>> amdgpu-pro for the deferred test. >>> >>> v2: don't apply swizzle to CMASK. the eg docs don't mention >>> it, and we clearly don't align cmask for that. >>> v3: disable surf index for dedicated images, as these will >>> most likely be shared, and I don't think the metadata has >>> space for this info in it yet. >> >> >> FWIW, disabling this for images marked as dedicated means this won't get >> any improvements for render targets on our games. We create all render >> targets as dedicated when NV_dedicated_allocation is available since this >> gets us significant perf improvement on NVIDIA. >> >> If it's not currently possible to have this enabled for dedicated images >> we could avoid using it on AMD, though I'm curious if there's likely to be >> any other perf benefits to marking RTs as dedicated we'd then be missing out >> on? I've not done any testing to see if there's any benefit from using it. > > > Realised this possibly didn't sound clear - what I'm asking is does using > NV_dedicated_allocation give any perf benefit on RADV at all like it does > for NV? If not we could avoid it to get the benefits from this patch. No on radv we really only want dedicated allocation for shared resources i.e. external memory ones. I don't think you'd get any speed up from using it. I'd like to figure out a way to make it work with this feature, but I think it means enhancing the metadata a bit. Dave. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] radv: set cb base tile swizzles for MRT speedups (v3)
On 11 July 2017 at 14:27, Alex Smith wrote: > On 10 July 2017 at 05:59, Dave Airlie wrote: > >> From: Dave Airlie >> >> This patch uses addrlib to workout the tile swizzles according >> to the surface index. It seems to produce the same values as >> amdgpu-pro for the deferred test. >> >> v2: don't apply swizzle to CMASK. the eg docs don't mention >> it, and we clearly don't align cmask for that. >> v3: disable surf index for dedicated images, as these will >> most likely be shared, and I don't think the metadata has >> space for this info in it yet. >> > > FWIW, disabling this for images marked as dedicated means this won't get > any improvements for render targets on our games. We create all render > targets as dedicated when NV_dedicated_allocation is available since this > gets us significant perf improvement on NVIDIA. > > If it's not currently possible to have this enabled for dedicated images > we could avoid using it on AMD, though I'm curious if there's likely to be > any other perf benefits to marking RTs as dedicated we'd then be missing > out on? I've not done any testing to see if there's any benefit from using > it. > Realised this possibly didn't sound clear - what I'm asking is does using NV_dedicated_allocation give any perf benefit on RADV at all like it does for NV? If not we could avoid it to get the benefits from this patch. Alex > > Thanks, > Alex > > >> This gets the deferred demo from 730->950fps on my rx480. >> (dcc cmask elim predication patches get it further) >> I'm also seeing some improvements in Mad Max at 4K >> >> Signed-off-by: Dave Airlie >> >> fixup for dedicate >> --- >> src/amd/common/ac_surface.c | 14 ++ >> src/amd/common/ac_surface.h | 2 ++ >> src/amd/vulkan/radv_device.c | 7 ++- >> src/amd/vulkan/radv_image.c | 19 ++- >> src/amd/vulkan/radv_private.h | 2 ++ >> 5 files changed, 42 insertions(+), 2 deletions(-) >> >> diff --git a/src/amd/common/ac_surface.c b/src/amd/common/ac_surface.c >> index 23fb66b..0aebacc 100644 >> --- a/src/amd/common/ac_surface.c >> +++ b/src/amd/common/ac_surface.c >> @@ -692,6 +692,20 @@ static int gfx6_compute_surface(ADDR_HANDLE addrlib, >> surf->htile_size *= 2; >> >> surf->is_linear = surf->u.legacy.level[0].mode == >> RADEON_SURF_MODE_LINEAR_ALIGNED; >> + >> + /* workout base swizzle */ >> + if (!(surf->flags & RADEON_SURF_Z_OR_SBUFFER)) { >> + ADDR_COMPUTE_BASE_SWIZZLE_INPUT AddrBaseSwizzleIn = {0}; >> + ADDR_COMPUTE_BASE_SWIZZLE_OUTPUT AddrBaseSwizzleOut = >> {0}; >> + >> + AddrBaseSwizzleIn.surfIndex = config->info.surf_index; >> + AddrBaseSwizzleIn.tileIndex = AddrSurfInfoIn.tileIndex; >> + AddrBaseSwizzleIn.macroModeIndex = >> AddrSurfInfoOut.macroModeIndex; >> + AddrBaseSwizzleIn.pTileInfo = AddrSurfInfoOut.pTileInfo; >> + AddrBaseSwizzleIn.tileMode = AddrSurfInfoOut.tileMode; >> + AddrComputeBaseSwizzle(addrlib, &AddrBaseSwizzleIn, >> &AddrBaseSwizzleOut); >> + surf->u.legacy.combined_swizzle = >> AddrBaseSwizzleOut.tileSwizzle; >> + } >> return 0; >> } >> >> diff --git a/src/amd/common/ac_surface.h b/src/amd/common/ac_surface.h >> index 4d893ff..7901b86 100644 >> --- a/src/amd/common/ac_surface.h >> +++ b/src/amd/common/ac_surface.h >> @@ -97,6 +97,7 @@ struct legacy_surf_layout { >> unsigneddepth_adjusted:1; >> unsignedstencil_adjusted:1; >> >> +uint8_t combined_swizzle; >> struct legacy_surf_levellevel[RADEON_SURF_MAX_LEVELS]; >> struct legacy_surf_levelstencil_level[RADEON_SURF_MAX_LEVELS]; >> uint8_t tiling_index[RADEON_SURF_MAX_LEVELS]; >> @@ -194,6 +195,7 @@ struct ac_surf_info { >> uint32_t width; >> uint32_t height; >> uint32_t depth; >> + uint32_t surf_index; >> uint8_t samples; >> uint8_t levels; >> uint16_t array_size; >> diff --git a/src/amd/vulkan/radv_device.c b/src/amd/vulkan/radv_device.c >> index 789c90d..eb77914 100644 >> --- a/src/amd/vulkan/radv_device.c >> +++ b/src/amd/vulkan/radv_device.c >> @@ -2757,7 +2757,8 @@ radv_initialise_color_surface(struct radv_device >> *device, >> } >> >> cb->cb_color_base = va >> 8; >> - >> + if (device->physical_device->rad_info.chip_class < GFX9) >> + cb->cb_color_base |= iview->image->surface.u.legacy >> .combined_swizzle; >> /* CMASK variables */ >> va = device->ws->buffer_get_va(iview->bo) + iview->image->offset; >> va += iview->image->cmask.offset; >> @@ -2766,6 +2767,8 @@ radv_initialise_color_surface(struct radv_device >> *device, >> va = device->ws->buffer_get_va(iview->bo) + iview->image->offset; >> va += iview->image->dcc_offset; >> cb->cb_dcc_base = va >> 8; >> + if (device->phys
Re: [Mesa-dev] [PATCH] radv: set cb base tile swizzles for MRT speedups (v3)
On 10 July 2017 at 05:59, Dave Airlie wrote: > From: Dave Airlie > > This patch uses addrlib to workout the tile swizzles according > to the surface index. It seems to produce the same values as > amdgpu-pro for the deferred test. > > v2: don't apply swizzle to CMASK. the eg docs don't mention > it, and we clearly don't align cmask for that. > v3: disable surf index for dedicated images, as these will > most likely be shared, and I don't think the metadata has > space for this info in it yet. > FWIW, disabling this for images marked as dedicated means this won't get any improvements for render targets on our games. We create all render targets as dedicated when NV_dedicated_allocation is available since this gets us significant perf improvement on NVIDIA. If it's not currently possible to have this enabled for dedicated images we could avoid using it on AMD, though I'm curious if there's likely to be any other perf benefits to marking RTs as dedicated we'd then be missing out on? I've not done any testing to see if there's any benefit from using it. Thanks, Alex > This gets the deferred demo from 730->950fps on my rx480. > (dcc cmask elim predication patches get it further) > I'm also seeing some improvements in Mad Max at 4K > > Signed-off-by: Dave Airlie > > fixup for dedicate > --- > src/amd/common/ac_surface.c | 14 ++ > src/amd/common/ac_surface.h | 2 ++ > src/amd/vulkan/radv_device.c | 7 ++- > src/amd/vulkan/radv_image.c | 19 ++- > src/amd/vulkan/radv_private.h | 2 ++ > 5 files changed, 42 insertions(+), 2 deletions(-) > > diff --git a/src/amd/common/ac_surface.c b/src/amd/common/ac_surface.c > index 23fb66b..0aebacc 100644 > --- a/src/amd/common/ac_surface.c > +++ b/src/amd/common/ac_surface.c > @@ -692,6 +692,20 @@ static int gfx6_compute_surface(ADDR_HANDLE addrlib, > surf->htile_size *= 2; > > surf->is_linear = surf->u.legacy.level[0].mode == > RADEON_SURF_MODE_LINEAR_ALIGNED; > + > + /* workout base swizzle */ > + if (!(surf->flags & RADEON_SURF_Z_OR_SBUFFER)) { > + ADDR_COMPUTE_BASE_SWIZZLE_INPUT AddrBaseSwizzleIn = {0}; > + ADDR_COMPUTE_BASE_SWIZZLE_OUTPUT AddrBaseSwizzleOut = {0}; > + > + AddrBaseSwizzleIn.surfIndex = config->info.surf_index; > + AddrBaseSwizzleIn.tileIndex = AddrSurfInfoIn.tileIndex; > + AddrBaseSwizzleIn.macroModeIndex = AddrSurfInfoOut. > macroModeIndex; > + AddrBaseSwizzleIn.pTileInfo = AddrSurfInfoOut.pTileInfo; > + AddrBaseSwizzleIn.tileMode = AddrSurfInfoOut.tileMode; > + AddrComputeBaseSwizzle(addrlib, &AddrBaseSwizzleIn, > &AddrBaseSwizzleOut); > + surf->u.legacy.combined_swizzle = AddrBaseSwizzleOut. > tileSwizzle; > + } > return 0; > } > > diff --git a/src/amd/common/ac_surface.h b/src/amd/common/ac_surface.h > index 4d893ff..7901b86 100644 > --- a/src/amd/common/ac_surface.h > +++ b/src/amd/common/ac_surface.h > @@ -97,6 +97,7 @@ struct legacy_surf_layout { > unsigneddepth_adjusted:1; > unsignedstencil_adjusted:1; > > +uint8_t combined_swizzle; > struct legacy_surf_levellevel[RADEON_SURF_MAX_LEVELS]; > struct legacy_surf_levelstencil_level[RADEON_SURF_MAX_LEVELS]; > uint8_t tiling_index[RADEON_SURF_MAX_LEVELS]; > @@ -194,6 +195,7 @@ struct ac_surf_info { > uint32_t width; > uint32_t height; > uint32_t depth; > + uint32_t surf_index; > uint8_t samples; > uint8_t levels; > uint16_t array_size; > diff --git a/src/amd/vulkan/radv_device.c b/src/amd/vulkan/radv_device.c > index 789c90d..eb77914 100644 > --- a/src/amd/vulkan/radv_device.c > +++ b/src/amd/vulkan/radv_device.c > @@ -2757,7 +2757,8 @@ radv_initialise_color_surface(struct radv_device > *device, > } > > cb->cb_color_base = va >> 8; > - > + if (device->physical_device->rad_info.chip_class < GFX9) > + cb->cb_color_base |= iview->image->surface.u. > legacy.combined_swizzle; > /* CMASK variables */ > va = device->ws->buffer_get_va(iview->bo) + iview->image->offset; > va += iview->image->cmask.offset; > @@ -2766,6 +2767,8 @@ radv_initialise_color_surface(struct radv_device > *device, > va = device->ws->buffer_get_va(iview->bo) + iview->image->offset; > va += iview->image->dcc_offset; > cb->cb_dcc_base = va >> 8; > + if (device->physical_device->rad_info.chip_class < GFX9) > + cb->cb_dcc_base |= iview->image->surface.u. > legacy.combined_swizzle; > > uint32_t max_slice = radv_surface_layer_count(iview); > cb->cb_color_view = S_028C6C_SLICE_START(iview->base_layer) | > @@ -2781,6 +2784,8 @@ radv_initialise_color_surface(struct radv_device > *device, > if (iview->image->fmask.size) { >