Re: [Mesa-dev] [PATCH] [rfc] radv: offset images by a differing amount.
On Jul 8, 2017 1:59 PM, "Christian König" wrote:
> On 08.07.2017 at 00:27, Marek Olšák wrote:
>> On Fri, Jul 7, 2017 at 9:37 PM, Dave Airlie wrote:
>>> On 8 July 2017 at 04:07, Christian König wrote:
>>>> On 07.07.2017 at 18:51, Marek Olšák wrote:
>>>>> On Fri, Jul 7, 2017 at 11:18 AM, Christian König wrote:
>>>>>> What tiling format do the destination textures have?
>>>>>>
>>>>>> Sounds like the offset is just added so that we distribute memory
>>>>>> accesses more equally over memory channels.
>>>>>
>>>>> You can't set an offset that is not aligned. The hardware ignores the
>>>>> low unaligned bits, so they have a different meaning. They specify
>>>>> pipe and bank rotation for macro tiling. It's like a state. It
>>>>> basically rotates the tile pattern.
>>>>
>>>> Yeah, I know. That's what I meant by distributing memory accesses more
>>>> equally over all channels. The lower bits select a memory bank swizzle,
>>>> IIRC.
>>>>
>>>> Years ago I tried on R600 whether shuffling them randomly could improve
>>>> performance, but MRT wasn't widely used and/or supported at that time.
>>>
>>> I'd known this and forgotten; the public CIK docs say bits 0..7 must be
>>> zero, but I have older docs which had more info. It would be nice if we
>>> could get proper docs released for the bottom bits, considering AMD are
>>> using them in their drivers.
>
> I'm pretty sure AMD released that stuff years ago because I knew of it
> before starting to work for AMD.

I think it was first released as addrlib source code. Some people might
have had access to docs under NDA, but it wasn't known publicly. I didn't
know it when I started at AMD.

Marek

>> The low 8 bits of the address are unused and can't be set, because
>> CB_COLOR0_BASE is shifted by 8 bits. We are really talking about bits
>> starting from 8 going higher. E.g. 8K alignment gives you 5 bits that
>> can be used to express the rotation.
>>
>>> It would be good to know what registers have the bits that matter
>>> (i.e. BASE, FMASK, CMASK, DCC, and resource descriptors).
>
> The feature to select the memory pipe/bank to start with is implemented
> in the MC. So AFAIK all blocks are programmed the same way regarding
> this. E.g. you can use it for UVD/VCE as well.
>
>>> Then I suppose we'd need to know the algorithm for programming them,
>>> and if we need to make any allocations bigger in order to do so.
>
> As far as I understand it you don't need to make anything bigger. Addrlib
> makes sure anyway that all pipe/banks are covered by a texture allocation
> as soon as you select some tiling mode (linear is obviously an exception).
>
> Regards,
> Christian.
>
>>> I expect this only starts to matter when we hit memory bandwidth limits.
>>> The deferred demo does 3 MRT, one depth at 2kx2k, then samples from
>>> those down to 1280x720 displayed. This combined with a 3 instanced 57k
>>> vertex draw seemed to be enough to see the pain. (Maybe a GL example
>>> doing something similar might show the problem for radeonsi.)
>>
>> Addrlib contains the encoding code for the base address pipe/bank bits.
>>
>>> The other open question I have is whether this just matters for MRT or
>>> whether texture sampling also gets some boost from it; my hack patch
>>> does it only for surfaces which will end up attached to the CB.
>>
>> Yes, it should be done for read-only textures too.
>>
>>> I'll update the patch to not call it an offset but name them the tile
>>> rotation bits.
>>
>> The proper name is "tile swizzle" or "pipe/bank swizzle". On gfx9,
>> it's called "pipe/bank xor".
>>
>> Marek
_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] [rfc] radv: offset images by a differing amount.
On 08.07.2017 at 00:27, Marek Olšák wrote:
> On Fri, Jul 7, 2017 at 9:37 PM, Dave Airlie wrote:
>> On 8 July 2017 at 04:07, Christian König wrote:
>>> On 07.07.2017 at 18:51, Marek Olšák wrote:
>>>> On Fri, Jul 7, 2017 at 11:18 AM, Christian König wrote:
>>>>> What tiling format do the destination textures have?
>>>>>
>>>>> Sounds like the offset is just added so that we distribute memory
>>>>> accesses more equally over memory channels.
>>>>
>>>> You can't set an offset that is not aligned. The hardware ignores the
>>>> low unaligned bits, so they have a different meaning. They specify
>>>> pipe and bank rotation for macro tiling. It's like a state. It
>>>> basically rotates the tile pattern.
>>>
>>> Yeah, I know. That's what I meant by distributing memory accesses more
>>> equally over all channels. The lower bits select a memory bank swizzle,
>>> IIRC.
>>>
>>> Years ago I tried on R600 whether shuffling them randomly could improve
>>> performance, but MRT wasn't widely used and/or supported at that time.
>>
>> I'd known this and forgotten; the public CIK docs say bits 0..7 must be
>> zero, but I have older docs which had more info. It would be nice if we
>> could get proper docs released for the bottom bits, considering AMD are
>> using them in their drivers.

I'm pretty sure AMD released that stuff years ago because I knew of it
before starting to work for AMD.

> The low 8 bits of the address are unused and can't be set, because
> CB_COLOR0_BASE is shifted by 8 bits. We are really talking about bits
> starting from 8 going higher. E.g. 8K alignment gives you 5 bits that
> can be used to express the rotation.
>
>> It would be good to know what registers have the bits that matter
>> (i.e. BASE, FMASK, CMASK, DCC, and resource descriptors).

The feature to select the memory pipe/bank to start with is implemented
in the MC. So AFAIK all blocks are programmed the same way regarding this.
E.g. you can use it for UVD/VCE as well.

>> Then I suppose we'd need to know the algorithm for programming them,
>> and if we need to make any allocations bigger in order to do so.

As far as I understand it you don't need to make anything bigger. Addrlib
makes sure anyway that all pipe/banks are covered by a texture allocation
as soon as you select some tiling mode (linear is obviously an exception).

Regards,
Christian.

>> I expect this only starts to matter when we hit memory bandwidth limits.
>> The deferred demo does 3 MRT, one depth at 2kx2k, then samples from
>> those down to 1280x720 displayed. This combined with a 3 instanced 57k
>> vertex draw seemed to be enough to see the pain. (Maybe a GL example
>> doing something similar might show the problem for radeonsi.)
>
> Addrlib contains the encoding code for the base address pipe/bank bits.
>
>> The other open question I have is whether this just matters for MRT or
>> whether texture sampling also gets some boost from it; my hack patch
>> does it only for surfaces which will end up attached to the CB.
>
> Yes, it should be done for read-only textures too.
>
>> I'll update the patch to not call it an offset but name them the tile
>> rotation bits.
>
> The proper name is "tile swizzle" or "pipe/bank swizzle". On gfx9,
> it's called "pipe/bank xor".
>
> Marek
Re: [Mesa-dev] [PATCH] [rfc] radv: offset images by a differing amount.
On Fri, Jul 7, 2017 at 9:37 PM, Dave Airlie wrote:
> On 8 July 2017 at 04:07, Christian König wrote:
>> On 07.07.2017 at 18:51, Marek Olšák wrote:
>>> On Fri, Jul 7, 2017 at 11:18 AM, Christian König wrote:
>>>> What tiling format do the destination textures have?
>>>>
>>>> Sounds like the offset is just added so that we distribute memory
>>>> accesses more equally over memory channels.
>>>
>>> You can't set an offset that is not aligned. The hardware ignores the
>>> low unaligned bits, so they have a different meaning. They specify
>>> pipe and bank rotation for macro tiling. It's like a state. It
>>> basically rotates the tile pattern.
>>
>> Yeah, I know. That's what I meant by distributing memory accesses more
>> equally over all channels. The lower bits select a memory bank swizzle,
>> IIRC.
>>
>> Years ago I tried on R600 whether shuffling them randomly could improve
>> performance, but MRT wasn't widely used and/or supported at that time.
>
> I'd known this and forgotten; the public CIK docs say bits 0..7 must be
> zero, but I have older docs which had more info. It would be nice if we
> could get proper docs released for the bottom bits, considering AMD are
> using them in their drivers.

The low 8 bits of the address are unused and can't be set, because
CB_COLOR0_BASE is shifted by 8 bits. We are really talking about bits
starting from 8 going higher. E.g. 8K alignment gives you 5 bits that
can be used to express the rotation.

> It would be good to know what registers have the bits that matter
> (i.e. BASE, FMASK, CMASK, DCC, and resource descriptors).
>
> Then I suppose we'd need to know the algorithm for programming them, and
> if we need to make any allocations bigger in order to do so.
>
> I expect this only starts to matter when we hit memory bandwidth limits.
> The deferred demo does 3 MRT, one depth at 2kx2k, then samples from those
> down to 1280x720 displayed. This combined with a 3 instanced 57k vertex
> draw seemed to be enough to see the pain. (Maybe a GL example doing
> something similar might show the problem for radeonsi.)

Addrlib contains the encoding code for the base address pipe/bank bits.

> The other open question I have is whether this just matters for MRT or
> whether texture sampling also gets some boost from it; my hack patch does
> it only for surfaces which will end up attached to the CB.

Yes, it should be done for read-only textures too.

> I'll update the patch to not call it an offset but name them the tile
> rotation bits.

The proper name is "tile swizzle" or "pipe/bank swizzle". On gfx9,
it's called "pipe/bank xor".

Marek
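A note on the arithmetic in the message above: because the base register holds the address shifted right by 8, a surface aligned to 2^n bytes leaves n - 8 low register bits that are always zero in the plain address and are therefore free to carry the swizzle. The following is a minimal sketch of that calculation only. `swizzle_bits_for_alignment` and `swizzle_mask_for_alignment` are hypothetical names, not radv or addrlib functions, and how many of the free bits the hardware actually interprets depends on the pipe/bank configuration, which this sketch does not model.

```c
#include <assert.h>
#include <stdint.h>

/* Per the thread, CB_COLOR0_BASE stores the address shifted right by 8. */
#define BASE_ADDR_SHIFT 8u

/* Hypothetical helper: number of low base-register bits that a
 * power-of-two byte alignment leaves at zero. */
static unsigned swizzle_bits_for_alignment(uint64_t alignment)
{
	unsigned log2_align = 0;

	/* alignment must be a non-zero power of two */
	assert(alignment != 0 && (alignment & (alignment - 1)) == 0);

	while ((alignment >> log2_align) != 1)
		log2_align++;

	return log2_align > BASE_ADDR_SHIFT ? log2_align - BASE_ADDR_SHIFT : 0;
}

/* Hypothetical helper: mask of those free bits, in register units
 * (that is, already shifted right by 8). */
static uint64_t swizzle_mask_for_alignment(uint64_t alignment)
{
	return ((uint64_t)1 << swizzle_bits_for_alignment(alignment)) - 1;
}
```

With an 8K (8192-byte) alignment this yields 13 - 8 = 5 bits, matching the figure quoted in the message; a 256-byte alignment leaves no free bits at all.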
[Mesa-dev] [PATCH] [rfc] radv: offset images by a differing amount. (v2)
From: Dave Airlie

(this patch doesn't seem to work fully, hopefully AMD can tell us
more info on the rules, and how to calculate the magic).

It appears that to get full access to memory bandwidth with MRT
rendering the pro vulkan driver seems to offset each image by 0x3800.
I'm not sure how that value is calculated.

Glenn came up with the idea (probably what -pro does also) of just
offsetting every image in round robin order, in the hope that apps
would create mrt images in sequence anyways.

This attempts to do that using an atomic counter in the device.

This gets the deferred demo from 800fps->1150fps on my rx480.

(I've tested dota2 and talos still run at least after this)

v2: acknowledge it isn't an offset but a tile rotation pattern.
add a quote from evergreen docs
---
 src/amd/vulkan/radv_device.c  |  8
 src/amd/vulkan/radv_image.c   | 22 ++
 src/amd/vulkan/radv_private.h |  3 +++
 3 files changed, 25 insertions(+), 8 deletions(-)

diff --git a/src/amd/vulkan/radv_device.c b/src/amd/vulkan/radv_device.c
index 59efccf..fb15ed6 100644
--- a/src/amd/vulkan/radv_device.c
+++ b/src/amd/vulkan/radv_device.c
@@ -2756,16 +2756,16 @@ radv_initialise_color_surface(struct radv_device *device,
 		}
 	}
 
-	cb->cb_color_base = va >> 8;
+	cb->cb_color_base = (va >> 8) | iview->image->tile_rotate_bits;
 
 	/* CMASK variables */
 	va = device->ws->buffer_get_va(iview->bo) + iview->image->offset;
 	va += iview->image->cmask.offset;
-	cb->cb_color_cmask = va >> 8;
+	cb->cb_color_cmask = (va >> 8) | iview->image->tile_rotate_bits;
 
 	va = device->ws->buffer_get_va(iview->bo) + iview->image->offset;
 	va += iview->image->dcc_offset;
-	cb->cb_dcc_base = va >> 8;
+	cb->cb_dcc_base = (va >> 8) | iview->image->tile_rotate_bits;
 
 	uint32_t max_slice = radv_surface_layer_count(iview);
 	cb->cb_color_view = S_028C6C_SLICE_START(iview->base_layer) |
@@ -2780,7 +2780,7 @@ radv_initialise_color_surface(struct radv_device *device,
 
 	if (iview->image->fmask.size) {
 		va = device->ws->buffer_get_va(iview->bo) + iview->image->offset + iview->image->fmask.offset;
-		cb->cb_color_fmask = va >> 8;
+		cb->cb_color_fmask = (va >> 8) | iview->image->tile_rotate_bits;
 	} else {
 		cb->cb_color_fmask = cb->cb_color_base;
 	}
diff --git a/src/amd/vulkan/radv_image.c b/src/amd/vulkan/radv_image.c
index b3a223b..b57a7d1 100644
--- a/src/amd/vulkan/radv_image.c
+++ b/src/amd/vulkan/radv_image.c
@@ -31,6 +31,7 @@
 #include "sid.h"
 #include "gfx9d.h"
 #include "util/debug.h"
+#include "util/u_atomic.h"
 static unsigned
 radv_choose_tiling(struct radv_device *Device,
 		   const struct radv_image_create_info *create_info)
@@ -208,7 +209,7 @@ si_set_mutable_tex_desc_fields(struct radv_device *device,
 	} else
 		va += base_level_info->offset;
 
-	state[0] = va >> 8;
+	state[0] = (va >> 8) | image->tile_rotate_bits;
 	state[1] &= C_008F14_BASE_ADDRESS_HI;
 	state[1] |= S_008F14_BASE_ADDRESS_HI(va >> 40);
 	state[3] |= S_008F1C_TILING_INDEX(si_tile_mode_index(image, base_level,
@@ -223,8 +224,7 @@ si_set_mutable_tex_desc_fields(struct radv_device *device,
 			if (chip_class <= VI)
 				meta_va += base_level_info->dcc_offset;
 			state[6] |= S_008F28_COMPRESSION_EN(1);
-			state[7] = meta_va >> 8;
-
+			state[7] = (meta_va >> 8) | image->tile_rotate_bits;
 		}
 	}
 
@@ -471,7 +471,7 @@ si_make_texture_descriptor(struct radv_device *device,
 			num_format = V_008F14_IMG_NUM_FORMAT_UINT;
 		}
 
-		fmask_state[0] = va >> 8;
+		fmask_state[0] = (va >> 8) | image->tile_rotate_bits;
 		fmask_state[1] = S_008F14_BASE_ADDRESS_HI(va >> 40) |
 			S_008F14_DATA_FORMAT_GFX6(fmask_format) |
 			S_008F14_NUM_FORMAT_GFX6(num_format);
@@ -801,6 +801,20 @@ radv_image_create(VkDevice _device,
 	image->size = image->surface.surf_size;
 	image->alignment = image->surface.surf_alignment;
 
+	if ((pCreateInfo->usage & VK_IMAGE_USAGE_COLOR_ATTACHMENT_BIT) && !create_info->scanout) {
+		/*
+		 * from the evergreen docs -
+		 * Bits [p-1:0] of this field, where p =
+		 * log2(numPipes), specifiy the pipe swizzle. Bits [p+b-
+		 * 1:p], where b = log2(numBanks) specify the bank
+		 * swizzle.
+		 * this may not be correct for GCN gpus.
+		 */
+		uint32_t mrt_idx = p_atomic_inc_return(&device->image_mrt_offset_counter) - 1;
+		mrt_idx %= 4;
+		image->tile_rotate_bits = 0x38 * mrt_idx;
+	}
+
 	if (image->exc
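The Evergreen wording quoted in the patch comment can be read as a plain bit layout: the pipe swizzle occupies the low log2(numPipes) bits and the bank swizzle sits directly above it. The sketch below packs a value that way and hands out values in round-robin order. Everything named here is hypothetical (`pack_tile_swizzle`, `next_tile_swizzle`, `log2_pot`), C11 atomics stand in for Mesa's `p_atomic_inc_return`, and mapping the counter straight onto pipe and bank indices is an assumption made for illustration. As the patch comment itself warns, this layout may not be correct for GCN.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* log2 of a non-zero power-of-two value */
static unsigned log2_pot(uint32_t v)
{
	unsigned n = 0;

	assert(v != 0 && (v & (v - 1)) == 0);
	while ((v >> n) != 1)
		n++;
	return n;
}

/*
 * Evergreen-style layout (may not hold for GCN):
 *   bits [p-1:0]    pipe swizzle, p = log2(num_pipes)
 *   bits [p+b-1:p]  bank swizzle, b = log2(num_banks)
 */
static uint32_t pack_tile_swizzle(uint32_t pipe_swizzle, uint32_t bank_swizzle,
				  uint32_t num_pipes, uint32_t num_banks)
{
	unsigned p = log2_pot(num_pipes);
	unsigned b = log2_pot(num_banks);
	uint32_t pipe_mask = (1u << p) - 1u;
	uint32_t bank_mask = (1u << b) - 1u;

	return (pipe_swizzle & pipe_mask) | ((bank_swizzle & bank_mask) << p);
}

/*
 * Hypothetical round-robin policy: every new image takes the next
 * counter value, split into a pipe index and a bank index.
 */
static uint32_t next_tile_swizzle(atomic_uint *counter,
				  uint32_t num_pipes, uint32_t num_banks)
{
	uint32_t idx = atomic_fetch_add(counter, 1);

	return pack_tile_swizzle(idx % num_pipes, (idx / num_pipes) % num_banks,
				 num_pipes, num_banks);
}
```

With power-of-two counts the policy simply cycles through every pipe/bank combination before repeating, which is the same intent as the `mrt_idx %= 4` step in the patch but without a hard-coded multiplier.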
Re: [Mesa-dev] [PATCH] [rfc] radv: offset images by a differing amount.
On 8 July 2017 at 04:07, Christian König wrote:
> On 07.07.2017 at 18:51, Marek Olšák wrote:
>> On Fri, Jul 7, 2017 at 11:18 AM, Christian König wrote:
>>> What tiling format do the destination textures have?
>>>
>>> Sounds like the offset is just added so that we distribute memory
>>> accesses more equally over memory channels.
>>
>> You can't set an offset that is not aligned. The hardware ignores the
>> low unaligned bits, so they have a different meaning. They specify
>> pipe and bank rotation for macro tiling. It's like a state. It
>> basically rotates the tile pattern.
>
> Yeah, I know. That's what I meant by distributing memory accesses more
> equally over all channels. The lower bits select a memory bank swizzle,
> IIRC.
>
> Years ago I tried on R600 whether shuffling them randomly could improve
> performance, but MRT wasn't widely used and/or supported at that time.

I'd known this and forgotten; the public CIK docs say bits 0..7 must be
zero, but I have older docs which had more info. It would be nice if we
could get proper docs released for the bottom bits, considering AMD are
using them in their drivers.

It would be good to know what registers have the bits that matter
(i.e. BASE, FMASK, CMASK, DCC, and resource descriptors).

Then I suppose we'd need to know the algorithm for programming them, and
if we need to make any allocations bigger in order to do so.

I expect this only starts to matter when we hit memory bandwidth limits.
The deferred demo does 3 MRT, one depth at 2kx2k, then samples from those
down to 1280x720 displayed. This combined with a 3 instanced 57k vertex
draw seemed to be enough to see the pain. (Maybe a GL example doing
something similar might show the problem for radeonsi.)

The other open question I have is whether this just matters for MRT or
whether texture sampling also gets some boost from it; my hack patch does
it only for surfaces which will end up attached to the CB.

I'll update the patch to not call it an offset but name them the tile
rotation bits.

Thanks,
Dave.
Re: [Mesa-dev] [PATCH] [rfc] radv: offset images by a differing amount.
On 07.07.2017 at 18:51, Marek Olšák wrote:
> On Fri, Jul 7, 2017 at 11:18 AM, Christian König wrote:
>> What tiling format do the destination textures have?
>>
>> Sounds like the offset is just added so that we distribute memory
>> accesses more equally over memory channels.
>
> You can't set an offset that is not aligned. The hardware ignores the
> low unaligned bits, so they have a different meaning. They specify
> pipe and bank rotation for macro tiling. It's like a state. It
> basically rotates the tile pattern.
>
> Marek

Yeah, I know. That's what I meant by distributing memory accesses more
equally over all channels. The lower bits select a memory bank swizzle,
IIRC.

Years ago I tried on R600 whether shuffling them randomly could improve
performance, but MRT wasn't widely used and/or supported at that time.

Christian.
Re: [Mesa-dev] [PATCH] [rfc] radv: offset images by a differing amount.
On Fri, Jul 7, 2017 at 11:18 AM, Christian König wrote:
> What tiling format do the destination textures have?
>
> Sounds like the offset is just added so that we distribute memory
> accesses more equally over memory channels.

You can't set an offset that is not aligned. The hardware ignores the
low unaligned bits, so they have a different meaning. They specify
pipe and bank rotation for macro tiling. It's like a state. It
basically rotates the tile pattern.

Marek
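The point about the low bits having "a different meaning" can be checked with plain integer arithmetic. Since the register stores the address shifted right by 8, adding the 0x3800 byte offset mentioned in the patch description to a sufficiently aligned address produces the same register value as OR-ing 0x38 into the shifted address. The sketch below illustrates only that equivalence. The function names are hypothetical, and the 16 KiB alignment mentioned afterwards is an assumption chosen so that the swizzle bits do not overlap real address bits.

```c
#include <stdint.h>

/* Offset view: add a byte offset to the address, then shift. */
static uint32_t base_reg_from_offset(uint64_t va, uint64_t byte_offset)
{
	return (uint32_t)((va + byte_offset) >> 8);
}

/* Swizzle view: shift the aligned address, then OR the swizzle bits in. */
static uint32_t base_reg_from_swizzle(uint64_t va, uint32_t swizzle)
{
	return (uint32_t)(va >> 8) | swizzle;
}

/*
 * True when the swizzle only touches register bits that a power-of-two
 * byte alignment guarantees to be zero in the plain address.
 */
static int swizzle_fits_alignment(uint32_t swizzle, uint64_t alignment)
{
	return ((uint64_t)swizzle << 8) < alignment;
}
```

For an address aligned to 0x4000 both functions return the same value for the pair (0x3800, 0x38). With only 0x2000 alignment the value 0x38 no longer fits, and the "offset" would start to collide with genuine address bits.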
Re: [Mesa-dev] [PATCH] [rfc] radv: offset images by a differing amount.
On 7 Jul. 2017 19:29, "Christian König" wrote:
> What tiling format do the destination textures have?
>
> Sounds like the offset is just added so that we distribute memory
> accesses more equally over memory channels.

From the traces I think tile index mode was 10.

Dave.

> Regards,
> Christian.
>
> On 07.07.2017 at 09:18, Dave Airlie wrote:
>> From: Dave Airlie
>>
>> (this patch doesn't seem to work fully, hopefully AMD can tell us
>> more info on the rules, and how to calculate the magic).
>>
>> It appears that to get full access to memory bandwidth with MRT
>> rendering the pro vulkan driver seems to offset each image by 0x3800.
>> I'm not sure how that value is calculated.
>>
>> Glenn came up with the idea (probably what -pro does also) of just
>> offsetting every image in round robin order, in the hope that apps
>> would create mrt images in sequence anyways.
>>
>> This attempts to do that using an atomic counter in the device.
>>
>> This gets the deferred demo from 800fps->1150fps on my rx480.
>>
>> (I've tested dota2 and talos still run at least after this)
>> ---
>>  src/amd/vulkan/radv_device.c  |  7 ---
>>  src/amd/vulkan/radv_image.c   | 16 +++-
>>  src/amd/vulkan/radv_private.h |  3 +++
>>  3 files changed, 22 insertions(+), 4 deletions(-)
Re: [Mesa-dev] [PATCH] [rfc] radv: offset images by a differing amount.
What tiling format do the destination textures have?

Sounds like the offset is just added so that we distribute memory
accesses more equally over memory channels.

Regards,
Christian.

On 07.07.2017 at 09:18, Dave Airlie wrote:
> From: Dave Airlie
>
> (this patch doesn't seem to work fully, hopefully AMD can tell us
> more info on the rules, and how to calculate the magic).
>
> It appears that to get full access to memory bandwidth with MRT
> rendering the pro vulkan driver seems to offset each image by 0x3800.
> I'm not sure how that value is calculated.
>
> Glenn came up with the idea (probably what -pro does also) of just
> offsetting every image in round robin order, in the hope that apps
> would create mrt images in sequence anyways.
>
> This attempts to do that using an atomic counter in the device.
>
> This gets the deferred demo from 800fps->1150fps on my rx480.
>
> (I've tested dota2 and talos still run at least after this)
> ---
>  src/amd/vulkan/radv_device.c  |  7 ---
>  src/amd/vulkan/radv_image.c   | 16 +++-
>  src/amd/vulkan/radv_private.h |  3 +++
>  3 files changed, 22 insertions(+), 4 deletions(-)
[Mesa-dev] [PATCH] [rfc] radv: offset images by a differing amount.
From: Dave Airlie

(this patch doesn't seem to work fully, hopefully AMD can tell us
more info on the rules, and how to calculate the magic).

It appears that to get full access to memory bandwidth with MRT
rendering the pro vulkan driver seems to offset each image by 0x3800.
I'm not sure how that value is calculated.

Glenn came up with the idea (probably what -pro does also) of just
offsetting every image in round robin order, in the hope that apps
would create mrt images in sequence anyways.

This attempts to do that using an atomic counter in the device.

This gets the deferred demo from 800fps->1150fps on my rx480.

(I've tested dota2 and talos still run at least after this)
---
 src/amd/vulkan/radv_device.c  |  7 ---
 src/amd/vulkan/radv_image.c   | 16 +++-
 src/amd/vulkan/radv_private.h |  3 +++
 3 files changed, 22 insertions(+), 4 deletions(-)

diff --git a/src/amd/vulkan/radv_device.c b/src/amd/vulkan/radv_device.c
index d1c519a..f39526d 100644
--- a/src/amd/vulkan/radv_device.c
+++ b/src/amd/vulkan/radv_device.c
@@ -2706,7 +2706,7 @@ radv_initialise_color_surface(struct radv_device *device,
 	/* Intensity is implemented as Red, so treat it that way. */
 	cb->cb_color_attrib = S_028C74_FORCE_DST_ALPHA_1(desc->swizzle[3] == VK_SWIZZLE_1);
 
-	va = device->ws->buffer_get_va(iview->bo) + iview->image->offset;
+	va = device->ws->buffer_get_va(iview->bo) + iview->image->offset + iview->image->mrt_offset;
 
 	if (device->physical_device->rad_info.chip_class >= GFX9) {
 		struct gfx9_surf_meta_flags meta;
@@ -2756,11 +2756,11 @@ radv_initialise_color_surface(struct radv_device *device,
 
 	/* CMASK variables */
 	va = device->ws->buffer_get_va(iview->bo) + iview->image->offset;
-	va += iview->image->cmask.offset;
+	va += iview->image->cmask.offset + iview->image->mrt_offset;
 	cb->cb_color_cmask = va >> 8;
 
 	va = device->ws->buffer_get_va(iview->bo) + iview->image->offset;
-	va += iview->image->dcc_offset;
+	va += iview->image->dcc_offset + iview->image->mrt_offset;
 	cb->cb_dcc_base = va >> 8;
 
 	uint32_t max_slice = radv_surface_layer_count(iview);
@@ -2776,6 +2776,7 @@ radv_initialise_color_surface(struct radv_device *device,
 
 	if (iview->image->fmask.size) {
 		va = device->ws->buffer_get_va(iview->bo) + iview->image->offset + iview->image->fmask.offset;
+		va += iview->image->mrt_offset;
 		cb->cb_color_fmask = va >> 8;
 	} else {
 		cb->cb_color_fmask = cb->cb_color_base;
diff --git a/src/amd/vulkan/radv_image.c b/src/amd/vulkan/radv_image.c
index b3a223b..bc20a53 100644
--- a/src/amd/vulkan/radv_image.c
+++ b/src/amd/vulkan/radv_image.c
@@ -31,6 +31,7 @@
 #include "sid.h"
 #include "gfx9d.h"
 #include "util/debug.h"
+#include "util/u_atomic.h"
 static unsigned
 radv_choose_tiling(struct radv_device *Device,
 		   const struct radv_image_create_info *create_info)
@@ -208,6 +209,7 @@ si_set_mutable_tex_desc_fields(struct radv_device *device,
 	} else
 		va += base_level_info->offset;
 
+	va += image->mrt_offset;
 	state[0] = va >> 8;
 	state[1] &= C_008F14_BASE_ADDRESS_HI;
 	state[1] |= S_008F14_BASE_ADDRESS_HI(va >> 40);
@@ -220,6 +222,7 @@ si_set_mutable_tex_desc_fields(struct radv_device *device,
 		state[7] = 0;
 		if (image->surface.dcc_size && first_level < image->surface.num_dcc_levels) {
 			uint64_t meta_va = gpu_address + image->dcc_offset;
+			meta_va += image->mrt_offset;
 			if (chip_class <= VI)
 				meta_va += base_level_info->dcc_offset;
 			state[6] |= S_008F28_COMPRESSION_EN(1);
@@ -436,7 +439,7 @@ si_make_texture_descriptor(struct radv_device *device,
 		uint64_t gpu_address = device->ws->buffer_get_va(image->bo);
 		uint64_t va;
 
-		va = gpu_address + image->offset + image->fmask.offset;
+		va = gpu_address + image->offset + image->mrt_offset + image->fmask.offset;
 
 		if (device->physical_device->rad_info.chip_class >= GFX9) {
 			fmask_format = V_008F14_IMG_DATA_FORMAT_FMASK;
@@ -642,6 +645,7 @@ radv_image_alloc_fmask(struct radv_device *device,
 	radv_image_get_fmask_info(device, image, image->info.samples, &image->fmask);
 
 	image->fmask.offset = align64(image->size, image->fmask.alignment);
+	image->fmask.size += image->mrt_offset;
 	image->size = image->fmask.offset + image->fmask.size;
 	image->alignment = MAX2(image->alignment, image->fmask.alignment);
 }
@@ -709,6 +713,7 @@ radv_image_alloc_cmask(struct radv_device *device,
 	radv_image_get_cmask_info(device, image, &image->cmask);
 
 	image->cmask.offset = align64(image->size, image->cmask.alignment);
+	image