Re: [Mesa-dev] [PATCH 12/18] radeonsi: use a clever alignment for descriptor uploads

2017-02-17 Thread Nicolai Hähnle

On 16.02.2017 23:16, Marek Olšák wrote:

On Thu, Feb 16, 2017 at 4:17 PM, Nicolai Hähnle  wrote:

On 16.02.2017 13:53, Marek Olšák wrote:


From: Marek Olšák 

Non-VBO descriptors won't be smaller than the cache line, so simply use
the cache line size.



What about SSBOs? Those are just 16 bytes.

Also, shader images are just 32 bytes (though we may have to bump this to 64


We always upload the whole list for non-VBO descriptors, which is
num_slot * slot_size. That's a lot more than a cache line. We could
certainly optimize this for both CE and non-CE paths. The CE path
evicts more cache lines needlessly, while the non-CE path has to
upload more data.

Since only the necessary number of VBO descriptors is uploaded, we can
hang the hardware if the vertex shader is using more inputs than the
vertex element state, which luckily can't happen with st/mesa.


Ah, thanks for the reminder. This patch has my R-b as well.



bytes for multisample image support -- except that it's unclear how to write
to a multisample shader image while keeping the FMASK).


I wouldn't like to support MSAA image stores.


We may not have a choice if games expect to be able to use shader image 
functionality with MSAA. But I agree that it's ugly.


Nicolai
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 12/18] radeonsi: use a clever alignment for descriptor uploads

2017-02-16 Thread Marek Olšák
On Thu, Feb 16, 2017 at 4:17 PM, Nicolai Hähnle  wrote:
> On 16.02.2017 13:53, Marek Olšák wrote:
>>
>> From: Marek Olšák 
>>
>> Non-VBO descriptors won't be smaller than the cache line, so simply use
>> the cache line size.
>
>
> What about SSBOs? Those are just 16 bytes.
>
> Also, shader images are just 32 bytes (though we may have to bump this to 64

We always upload the whole list for non-VBO descriptors, which is
num_slot * slot_size. That's a lot more than a cache line. We could
certainly optimize this for both CE and non-CE paths. The CE path
evicts more cache lines needlessly, while the non-CE path has to
upload more data.

Since only the necessary number of VBO descriptors is uploaded, we can
hang the hardware if the vertex shader is using more inputs than the
vertex element state, which luckily can't happen with st/mesa.

> bytes for multisample image support -- except that it's unclear how to write
> to a multisample shader image while keeping the FMASK).

I wouldn't like to support MSAA image stores.

Marek
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 12/18] radeonsi: use a clever alignment for descriptor uploads

2017-02-16 Thread Nicolai Hähnle

On 16.02.2017 13:53, Marek Olšák wrote:

From: Marek Olšák 

Non-VBO descriptors won't be smaller than the cache line, so simply use
the cache line size.


What about SSBOs? Those are just 16 bytes.

Also, shader images are just 32 bytes (though we may have to bump this 
to 64 bytes for multisample image support -- except that it's unclear 
how to write to a multisample shader image while keeping the FMASK).


Thanks,
Nicolai


---
 src/gallium/drivers/radeonsi/si_descriptors.c | 11 +++
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/si_descriptors.c 
b/src/gallium/drivers/radeonsi/si_descriptors.c
index 72b33f3..4f2dbbb 100644
--- a/src/gallium/drivers/radeonsi/si_descriptors.c
+++ b/src/gallium/drivers/radeonsi/si_descriptors.c
@@ -227,23 +227,24 @@ static bool si_upload_descriptors(struct si_context *sctx,
radeon_emit(sctx->ce_ib, desc->ce_offset + begin * 4);
radeon_emit_array(sctx->ce_ib, list + begin, count);
}

if (!si_ce_upload(sctx, desc->ce_offset, list_size,
   >buffer_offset, >buffer))
return false;
} else {
void *ptr;

-   u_upload_alloc(sctx->b.b.stream_uploader, 0, list_size, 256,
-   >buffer_offset,
-   (struct pipe_resource**)>buffer, );
+   u_upload_alloc(sctx->b.b.stream_uploader, 0, list_size,
+  sctx->screen->b.info.tcc_cache_line_size,
+  >buffer_offset,
+  (struct pipe_resource**)>buffer, );
if (!desc->buffer)
return false; /* skip the draw call */

util_memcpy_cpu_to_le32(ptr, desc->list, list_size);
desc->gpu_list = ptr;

radeon_add_to_buffer_list(>b, >b.gfx, desc->buffer,
RADEON_USAGE_READ, RADEON_PRIO_DESCRIPTORS);
}
desc->dirty_mask = 0;
@@ -941,34 +942,36 @@ static void si_vertex_buffers_begin_new_cs(struct 
si_context *sctx)
radeon_add_to_buffer_list(>b, >b.gfx,
  desc->buffer, RADEON_USAGE_READ,
  RADEON_PRIO_DESCRIPTORS);
 }

 bool si_upload_vertex_buffer_descriptors(struct si_context *sctx)
 {
struct si_vertex_element *velems = sctx->vertex_elements;
struct si_descriptors *desc = >vertex_buffers;
unsigned i, count = velems->count;
+   unsigned desc_list_byte_size = velems->desc_list_byte_size;
uint64_t va;
uint32_t *ptr;

if (!sctx->vertex_buffers_dirty || !count || !velems)
return true;

unsigned first_vb_use_mask = velems->first_vb_use_mask;

/* Vertex buffer descriptors are the only ones which are uploaded
 * directly through a staging buffer and don't go through
 * the fine-grained upload path.
 */
u_upload_alloc(sctx->b.b.stream_uploader, 0,
-  velems->desc_list_byte_size, 256,
+  desc_list_byte_size,
+  si_optimal_tcc_alignment(sctx, desc_list_byte_size),
   >buffer_offset,
   (struct pipe_resource**)>buffer, (void**));
if (!desc->buffer)
return false;

radeon_add_to_buffer_list(>b, >b.gfx,
  desc->buffer, RADEON_USAGE_READ,
  RADEON_PRIO_DESCRIPTORS);

assert(count <= SI_MAX_ATTRIBS);



___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 12/18] radeonsi: use a clever alignment for descriptor uploads

2017-02-16 Thread Marek Olšák
From: Marek Olšák 

Non-VBO descriptors won't be smaller than the cache line, so simply use
the cache line size.
---
 src/gallium/drivers/radeonsi/si_descriptors.c | 11 +++
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/si_descriptors.c 
b/src/gallium/drivers/radeonsi/si_descriptors.c
index 72b33f3..4f2dbbb 100644
--- a/src/gallium/drivers/radeonsi/si_descriptors.c
+++ b/src/gallium/drivers/radeonsi/si_descriptors.c
@@ -227,23 +227,24 @@ static bool si_upload_descriptors(struct si_context *sctx,
radeon_emit(sctx->ce_ib, desc->ce_offset + begin * 4);
radeon_emit_array(sctx->ce_ib, list + begin, count);
}
 
if (!si_ce_upload(sctx, desc->ce_offset, list_size,
   >buffer_offset, >buffer))
return false;
} else {
void *ptr;
 
-   u_upload_alloc(sctx->b.b.stream_uploader, 0, list_size, 256,
-   >buffer_offset,
-   (struct pipe_resource**)>buffer, );
+   u_upload_alloc(sctx->b.b.stream_uploader, 0, list_size,
+  sctx->screen->b.info.tcc_cache_line_size,
+  >buffer_offset,
+  (struct pipe_resource**)>buffer, );
if (!desc->buffer)
return false; /* skip the draw call */
 
util_memcpy_cpu_to_le32(ptr, desc->list, list_size);
desc->gpu_list = ptr;
 
radeon_add_to_buffer_list(>b, >b.gfx, desc->buffer,
RADEON_USAGE_READ, RADEON_PRIO_DESCRIPTORS);
}
desc->dirty_mask = 0;
@@ -941,34 +942,36 @@ static void si_vertex_buffers_begin_new_cs(struct 
si_context *sctx)
radeon_add_to_buffer_list(>b, >b.gfx,
  desc->buffer, RADEON_USAGE_READ,
  RADEON_PRIO_DESCRIPTORS);
 }
 
 bool si_upload_vertex_buffer_descriptors(struct si_context *sctx)
 {
struct si_vertex_element *velems = sctx->vertex_elements;
struct si_descriptors *desc = >vertex_buffers;
unsigned i, count = velems->count;
+   unsigned desc_list_byte_size = velems->desc_list_byte_size;
uint64_t va;
uint32_t *ptr;
 
if (!sctx->vertex_buffers_dirty || !count || !velems)
return true;
 
unsigned first_vb_use_mask = velems->first_vb_use_mask;
 
/* Vertex buffer descriptors are the only ones which are uploaded
 * directly through a staging buffer and don't go through
 * the fine-grained upload path.
 */
u_upload_alloc(sctx->b.b.stream_uploader, 0,
-  velems->desc_list_byte_size, 256,
+  desc_list_byte_size,
+  si_optimal_tcc_alignment(sctx, desc_list_byte_size),
   >buffer_offset,
   (struct pipe_resource**)>buffer, (void**));
if (!desc->buffer)
return false;
 
radeon_add_to_buffer_list(>b, >b.gfx,
  desc->buffer, RADEON_USAGE_READ,
  RADEON_PRIO_DESCRIPTORS);
 
assert(count <= SI_MAX_ATTRIBS);
-- 
2.7.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev