Re: [Mesa-dev] [PATCH 12/18] radeonsi: use a clever alignment for descriptor uploads
On 16.02.2017 23:16, Marek Olšák wrote: On Thu, Feb 16, 2017 at 4:17 PM, Nicolai Hähnle wrote: On 16.02.2017 13:53, Marek Olšák wrote: From: Marek Olšák Non-VBO descriptors won't be smaller than the cache line, so simply use the cache line size. What about SSBOs? Those are just 16 bytes. Also, shader images are just 32 bytes (though we may have to bump this to 64 We always upload the whole list for non-VBO descriptors, which is num_slot * slot_size. That's a lot more than a cache line. We could certainly optimize this for both CE and non-CE paths. The CE path evicts more cache lines needlessly, while the non-CE path has to upload more data. Since only the necessary number of VBO descriptors is uploaded, we can hang the hardware if the vertex shader is using more inputs than the vertex element state, which luckily can't happen with st/mesa. Ah, thanks for the reminder. This patch has my R-b as well. bytes for multisample image support -- except that it's unclear how to write to a multisample shader image while keeping the FMASK). I wouldn't like to support MSAA image stores. We may not have a choice if games expect to be able to use shader image functionality with MSAA. But I agree that it's ugly. Nicolai ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 12/18] radeonsi: use a clever alignment for descriptor uploads
On Thu, Feb 16, 2017 at 4:17 PM, Nicolai Hähnle wrote: > On 16.02.2017 13:53, Marek Olšák wrote: >> >> From: Marek Olšák >> >> Non-VBO descriptors won't be smaller than the cache line, so simply use >> the cache line size. > > > What about SSBOs? Those are just 16 bytes. > > Also, shader images are just 32 bytes (though we may have to bump this to 64 We always upload the whole list for non-VBO descriptors, which is num_slot * slot_size. That's a lot more than a cache line. We could certainly optimize this for both CE and non-CE paths. The CE path evicts more cache lines needlessly, while the non-CE path has to upload more data. Since only the necessary number of VBO descriptors is uploaded, we can hang the hardware if the vertex shader is using more inputs than the vertex element state, which luckily can't happen with st/mesa. > bytes for multisample image support -- except that it's unclear how to write > to a multisample shader image while keeping the FMASK). I wouldn't like to support MSAA image stores. Marek ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 12/18] radeonsi: use a clever alignment for descriptor uploads
On 16.02.2017 13:53, Marek Olšák wrote: From: Marek Olšák Non-VBO descriptors won't be smaller than the cache line, so simply use the cache line size. What about SSBOs? Those are just 16 bytes. Also, shader images are just 32 bytes (though we may have to bump this to 64 bytes for multisample image support -- except that it's unclear how to write to a multisample shader image while keeping the FMASK). Thanks, Nicolai --- src/gallium/drivers/radeonsi/si_descriptors.c | 11 +++ 1 file changed, 7 insertions(+), 4 deletions(-) diff --git a/src/gallium/drivers/radeonsi/si_descriptors.c b/src/gallium/drivers/radeonsi/si_descriptors.c index 72b33f3..4f2dbbb 100644 --- a/src/gallium/drivers/radeonsi/si_descriptors.c +++ b/src/gallium/drivers/radeonsi/si_descriptors.c @@ -227,23 +227,24 @@ static bool si_upload_descriptors(struct si_context *sctx, radeon_emit(sctx->ce_ib, desc->ce_offset + begin * 4); radeon_emit_array(sctx->ce_ib, list + begin, count); } if (!si_ce_upload(sctx, desc->ce_offset, list_size, &desc->buffer_offset, &desc->buffer)) return false; } else { void *ptr; - u_upload_alloc(sctx->b.b.stream_uploader, 0, list_size, 256, - &desc->buffer_offset, - (struct pipe_resource**)&desc->buffer, &ptr); + u_upload_alloc(sctx->b.b.stream_uploader, 0, list_size, + sctx->screen->b.info.tcc_cache_line_size, + &desc->buffer_offset, + (struct pipe_resource**)&desc->buffer, &ptr); if (!desc->buffer) return false; /* skip the draw call */ util_memcpy_cpu_to_le32(ptr, desc->list, list_size); desc->gpu_list = ptr; radeon_add_to_buffer_list(&sctx->b, &sctx->b.gfx, desc->buffer, RADEON_USAGE_READ, RADEON_PRIO_DESCRIPTORS); } desc->dirty_mask = 0; @@ -941,34 +942,36 @@ static void si_vertex_buffers_begin_new_cs(struct si_context *sctx) radeon_add_to_buffer_list(&sctx->b, &sctx->b.gfx, desc->buffer, RADEON_USAGE_READ, RADEON_PRIO_DESCRIPTORS); } bool si_upload_vertex_buffer_descriptors(struct si_context *sctx) { struct si_vertex_element *velems = sctx->vertex_elements; struct si_descriptors *desc = &sctx->vertex_buffers; unsigned i, count = velems->count; + unsigned desc_list_byte_size = velems->desc_list_byte_size; uint64_t va; uint32_t *ptr; if (!sctx->vertex_buffers_dirty || !count || !velems) return true; unsigned first_vb_use_mask = velems->first_vb_use_mask; /* Vertex buffer descriptors are the only ones which are uploaded * directly through a staging buffer and don't go through * the fine-grained upload path. */ u_upload_alloc(sctx->b.b.stream_uploader, 0, - velems->desc_list_byte_size, 256, + desc_list_byte_size, + si_optimal_tcc_alignment(sctx, desc_list_byte_size), &desc->buffer_offset, (struct pipe_resource**)&desc->buffer, (void**)&ptr); if (!desc->buffer) return false; radeon_add_to_buffer_list(&sctx->b, &sctx->b.gfx, desc->buffer, RADEON_USAGE_READ, RADEON_PRIO_DESCRIPTORS); assert(count <= SI_MAX_ATTRIBS); ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 12/18] radeonsi: use a clever alignment for descriptor uploads
From: Marek Olšák Non-VBO descriptors won't be smaller than the cache line, so simply use the cache line size. --- src/gallium/drivers/radeonsi/si_descriptors.c | 11 +++ 1 file changed, 7 insertions(+), 4 deletions(-) diff --git a/src/gallium/drivers/radeonsi/si_descriptors.c b/src/gallium/drivers/radeonsi/si_descriptors.c index 72b33f3..4f2dbbb 100644 --- a/src/gallium/drivers/radeonsi/si_descriptors.c +++ b/src/gallium/drivers/radeonsi/si_descriptors.c @@ -227,23 +227,24 @@ static bool si_upload_descriptors(struct si_context *sctx, radeon_emit(sctx->ce_ib, desc->ce_offset + begin * 4); radeon_emit_array(sctx->ce_ib, list + begin, count); } if (!si_ce_upload(sctx, desc->ce_offset, list_size, &desc->buffer_offset, &desc->buffer)) return false; } else { void *ptr; - u_upload_alloc(sctx->b.b.stream_uploader, 0, list_size, 256, - &desc->buffer_offset, - (struct pipe_resource**)&desc->buffer, &ptr); + u_upload_alloc(sctx->b.b.stream_uploader, 0, list_size, + sctx->screen->b.info.tcc_cache_line_size, + &desc->buffer_offset, + (struct pipe_resource**)&desc->buffer, &ptr); if (!desc->buffer) return false; /* skip the draw call */ util_memcpy_cpu_to_le32(ptr, desc->list, list_size); desc->gpu_list = ptr; radeon_add_to_buffer_list(&sctx->b, &sctx->b.gfx, desc->buffer, RADEON_USAGE_READ, RADEON_PRIO_DESCRIPTORS); } desc->dirty_mask = 0; @@ -941,34 +942,36 @@ static void si_vertex_buffers_begin_new_cs(struct si_context *sctx) radeon_add_to_buffer_list(&sctx->b, &sctx->b.gfx, desc->buffer, RADEON_USAGE_READ, RADEON_PRIO_DESCRIPTORS); } bool si_upload_vertex_buffer_descriptors(struct si_context *sctx) { struct si_vertex_element *velems = sctx->vertex_elements; struct si_descriptors *desc = &sctx->vertex_buffers; unsigned i, count = velems->count; + unsigned desc_list_byte_size = velems->desc_list_byte_size; uint64_t va; uint32_t *ptr; if (!sctx->vertex_buffers_dirty || !count || !velems) return true; unsigned first_vb_use_mask = velems->first_vb_use_mask; /* Vertex buffer descriptors are the only ones which are uploaded * directly through a staging buffer and don't go through * the fine-grained upload path. */ u_upload_alloc(sctx->b.b.stream_uploader, 0, - velems->desc_list_byte_size, 256, + desc_list_byte_size, + si_optimal_tcc_alignment(sctx, desc_list_byte_size), &desc->buffer_offset, (struct pipe_resource**)&desc->buffer, (void**)&ptr); if (!desc->buffer) return false; radeon_add_to_buffer_list(&sctx->b, &sctx->b.gfx, desc->buffer, RADEON_USAGE_READ, RADEON_PRIO_DESCRIPTORS); assert(count <= SI_MAX_ATTRIBS); -- 2.7.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev