Re: [Mesa-dev] [PATCH 1/4] radeonsi: use CE suballocator for CP DMA realignment.

2016-04-20 Thread Marek Olšák
On Wed, Apr 20, 2016 at 1:29 AM, Bas Nieuwenhuizen
 wrote:
> I retract patch 1 and 2. Large scratch buffers are nice, but the
> hardware only supports a 32-bit offset into it.

You can still allocate a smaller scratch buffer. This should limit the
number of waves in hw. TMPRING_SIZE.WAVES should be adjusted
accordingly.

We can also decrease the size of scratch based on the max number of
waves with the given register and LDS usage. si_shader_dump_stats
calculates the max number of waves.

You just need to:
- set TMPRING_SIZE.WAVES = MIN2(32, max_simd_waves * 4)
- allocate scratch for TMPRING_SIZE.WAVES waves per CU (instead of 32)
- si_context::scratch_waves can be moved to si_shader in some form
(e.g. max_scratch_bytes_per_cu)
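
As a rough sketch of the sizing above (not actual driver code: the helper
names and the bytes_per_wave and num_cus parameters are hypothetical, and
the factor of 4 assumes 4 SIMDs per CU):

```c
#include <stdint.h>

#define MIN2(a, b) ((a) < (b) ? (a) : (b))

/* Hypothetical helper: value to program into TMPRING_SIZE.WAVES.
 * max_simd_waves is the per-SIMD wave limit derived from register and
 * LDS usage; "* 4" assumes 4 SIMDs per CU. The result is capped at 32. */
static unsigned scratch_waves_per_cu(unsigned max_simd_waves)
{
	return MIN2(32u, max_simd_waves * 4);
}

/* Hypothetical helper: total scratch bytes when allocating for
 * scratch_waves_per_cu() waves per CU instead of a fixed 32.
 * Computed in 64 bits so the product itself cannot wrap. */
static uint64_t scratch_size_bytes(unsigned max_simd_waves,
				   unsigned bytes_per_wave,
				   unsigned num_cus)
{
	return (uint64_t)scratch_waves_per_cu(max_simd_waves) *
	       bytes_per_wave * num_cus;
}
```

With max_simd_waves = 5 this gives 20 waves per CU instead of 32, so the
scratch allocation shrinks in the same proportion.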

Marek
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 1/4] radeonsi: use CE suballocator for CP DMA realignment.

2016-04-19 Thread eocallaghan

On 2016-04-20 09:29, Bas Nieuwenhuizen wrote:

I retract patch 1 and 2. Large scratch buffers are nice, but the
hardware only supports a 32-bit offset into it.

- Bas

On Wed, Apr 20, 2016 at 12:50 AM, Bas Nieuwenhuizen
 wrote:

Use the CE suballocator instead of the normal one as the usage
is most similar to the CE, i.e. only read and written on GPU
and not mapped to CPU.

Signed-off-by: Bas Nieuwenhuizen 
---
 src/gallium/drivers/radeonsi/si_cp_dma.c | 27 ++++++++++-----------------
 1 file changed, 10 insertions(+), 17 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/si_cp_dma.c 
b/src/gallium/drivers/radeonsi/si_cp_dma.c
index 38e0ee6..264789d 100644
--- a/src/gallium/drivers/radeonsi/si_cp_dma.c
+++ b/src/gallium/drivers/radeonsi/si_cp_dma.c
@@ -222,31 +222,24 @@ static void si_clear_buffer(struct pipe_context *ctx, 
struct pipe_resource *dst,
  */
 static void si_cp_dma_realign_engine(struct si_context *sctx, unsigned size)
 {
+


trivial spurious '\n'


uint64_t va;
unsigned dma_flags = 0;
unsigned scratch_size = CP_DMA_ALIGNMENT * 2;
+   unsigned offset;
+   struct r600_resource *tmp_buf;

assert(size < CP_DMA_ALIGNMENT);

-   /* Use the scratch buffer as the dummy buffer. The 3D engine should be
-* idle at this point.
-*/
-   if (!sctx->scratch_buffer ||
-   sctx->scratch_buffer->b.b.width0 < scratch_size) {
-   r600_resource_reference(&sctx->scratch_buffer, NULL);
-   sctx->scratch_buffer =
-   si_resource_create_custom(&sctx->screen->b.b,
- PIPE_USAGE_DEFAULT,
- scratch_size);
-   if (!sctx->scratch_buffer)
-   return;
-   sctx->emit_scratch_reloc = true;
-   }
+   u_suballocator_alloc(sctx->ce_suballocator, scratch_size, &offset,
+(struct pipe_resource**)&tmp_buf);
+   if (!tmp_buf)
+   return;

-   si_cp_dma_prepare(sctx, &sctx->scratch_buffer->b.b,
- &sctx->scratch_buffer->b.b, size, size, &dma_flags);
+   si_cp_dma_prepare(sctx, &tmp_buf->b.b,
+ &tmp_buf->b.b, size, size, &dma_flags);

-   va = sctx->scratch_buffer->gpu_address;
+   va = tmp_buf->gpu_address + offset;
si_emit_cp_dma_copy_buffer(sctx, va, va + CP_DMA_ALIGNMENT, size,
   dma_flags);
 }
--
2.8.0




Re: [Mesa-dev] [PATCH 1/4] radeonsi: use CE suballocator for CP DMA realignment.

2016-04-19 Thread Bas Nieuwenhuizen
On Wed, Apr 20, 2016 at 2:13 AM, Nicolai Hähnle  wrote:
> On 19.04.2016 18:29, Bas Nieuwenhuizen wrote:
>>
>> I retract patch 1 and 2. Large scratch buffers are nice, but the
>> hardware only supports a 32-bit offset into it.
>
>
> Do you mean patch 2 and 3? Do you plan alternative patches to error out when
> there is an integer overflow? That's still kind of important...
>
> Cheers,
> Nicolai

Really, patch 1 and 2. I did patch 1 only so I did not need to make
the whole cp_dma work with pb_buffer.

Although I guess patch 3 can best be merged with the to-be-written
patch that checks that the resulting size fits in 32 bits.
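
Something along these lines, as a sketch only (the helper name and its
parameters are hypothetical, not existing radeonsi code):

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: compute the total scratch size in 64 bits and
 * accept it only if it still fits in a 32-bit offset. Both inputs are
 * 32-bit, so the 64-bit product cannot overflow. */
static bool scratch_size_fits_32bit(uint32_t bytes_per_wave,
				    uint32_t num_waves,
				    uint32_t *out_size)
{
	uint64_t total = (uint64_t)bytes_per_wave * num_waves;

	if (total > UINT32_MAX)
		return false; /* caller should error out instead of wrapping */

	*out_size = (uint32_t)total;
	return true;
}
```

The caller would fail the scratch setup when this returns false rather
than silently truncating the size.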

- Bas

>
>>
>> - Bas
>>
>> On Wed, Apr 20, 2016 at 12:50 AM, Bas Nieuwenhuizen
>>  wrote:
>>>
>>> Use the CE suballocator instead of the normal one as the usage
>>> is most similar to the CE, i.e. only read and written on GPU
>>> and not mapped to CPU.
>>>
>>> Signed-off-by: Bas Nieuwenhuizen 
>>> ---
>>>   src/gallium/drivers/radeonsi/si_cp_dma.c | 27 ++++++++++-----------------
>>>   1 file changed, 10 insertions(+), 17 deletions(-)
>>>
>>> diff --git a/src/gallium/drivers/radeonsi/si_cp_dma.c
>>> b/src/gallium/drivers/radeonsi/si_cp_dma.c
>>> index 38e0ee6..264789d 100644
>>> --- a/src/gallium/drivers/radeonsi/si_cp_dma.c
>>> +++ b/src/gallium/drivers/radeonsi/si_cp_dma.c
>>> @@ -222,31 +222,24 @@ static void si_clear_buffer(struct pipe_context
>>> *ctx, struct pipe_resource *dst,
>>>*/
>>>   static void si_cp_dma_realign_engine(struct si_context *sctx, unsigned
>>> size)
>>>   {
>>> +
>>>  uint64_t va;
>>>  unsigned dma_flags = 0;
>>>  unsigned scratch_size = CP_DMA_ALIGNMENT * 2;
>>> +   unsigned offset;
>>> +   struct r600_resource *tmp_buf;
>>>
>>>  assert(size < CP_DMA_ALIGNMENT);
>>>
>>> -   /* Use the scratch buffer as the dummy buffer. The 3D engine
>>> should be
>>> -* idle at this point.
>>> -*/
>>> -   if (!sctx->scratch_buffer ||
>>> -   sctx->scratch_buffer->b.b.width0 < scratch_size) {
>>> -   r600_resource_reference(&sctx->scratch_buffer, NULL);
>>> -   sctx->scratch_buffer =
>>> -   si_resource_create_custom(&sctx->screen->b.b,
>>> - PIPE_USAGE_DEFAULT,
>>> - scratch_size);
>>> -   if (!sctx->scratch_buffer)
>>> -   return;
>>> -   sctx->emit_scratch_reloc = true;
>>> -   }
>>> +   u_suballocator_alloc(sctx->ce_suballocator, scratch_size, &offset,
>>> +(struct pipe_resource**)&tmp_buf);
>>> +   if (!tmp_buf)
>>> +   return;
>>>
>>> -   si_cp_dma_prepare(sctx, &sctx->scratch_buffer->b.b,
>>> - &sctx->scratch_buffer->b.b, size, size, &dma_flags);
>>> +   si_cp_dma_prepare(sctx, &tmp_buf->b.b,
>>> + &tmp_buf->b.b, size, size, &dma_flags);
>>>
>>> -   va = sctx->scratch_buffer->gpu_address;
>>> +   va = tmp_buf->gpu_address + offset;
>>>  si_emit_cp_dma_copy_buffer(sctx, va, va + CP_DMA_ALIGNMENT,
>>> size,
>>> dma_flags);
>>>   }
>>> --
>>> 2.8.0
>>>
>>
>


Re: [Mesa-dev] [PATCH 1/4] radeonsi: use CE suballocator for CP DMA realignment.

2016-04-19 Thread Nicolai Hähnle

On 19.04.2016 17:50, Bas Nieuwenhuizen wrote:

Use the CE suballocator instead of the normal one as the usage
is most similar to the CE, i.e. only read and written on GPU
and not mapped to CPU.


The scratch buffer is also only read and written by the GPU...



Signed-off-by: Bas Nieuwenhuizen 
---
  src/gallium/drivers/radeonsi/si_cp_dma.c | 27 ++++++++++-----------------
  1 file changed, 10 insertions(+), 17 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/si_cp_dma.c 
b/src/gallium/drivers/radeonsi/si_cp_dma.c
index 38e0ee6..264789d 100644
--- a/src/gallium/drivers/radeonsi/si_cp_dma.c
+++ b/src/gallium/drivers/radeonsi/si_cp_dma.c
@@ -222,31 +222,24 @@ static void si_clear_buffer(struct pipe_context *ctx, 
struct pipe_resource *dst,
   */
  static void si_cp_dma_realign_engine(struct si_context *sctx, unsigned size)
  {
+
uint64_t va;
unsigned dma_flags = 0;
unsigned scratch_size = CP_DMA_ALIGNMENT * 2;
+   unsigned offset;
+   struct r600_resource *tmp_buf;

assert(size < CP_DMA_ALIGNMENT);

-   /* Use the scratch buffer as the dummy buffer. The 3D engine should be
-* idle at this point.
-*/
-   if (!sctx->scratch_buffer ||
-   sctx->scratch_buffer->b.b.width0 < scratch_size) {
-   r600_resource_reference(&sctx->scratch_buffer, NULL);
-   sctx->scratch_buffer =
-   si_resource_create_custom(&sctx->screen->b.b,
- PIPE_USAGE_DEFAULT,
- scratch_size);
-   if (!sctx->scratch_buffer)
-   return;
-   sctx->emit_scratch_reloc = true;
-   }
+   u_suballocator_alloc(sctx->ce_suballocator, scratch_size, &offset,
+(struct pipe_resource**)&tmp_buf);
+   if (!tmp_buf)
+   return;

-   si_cp_dma_prepare(sctx, &sctx->scratch_buffer->b.b,
- &sctx->scratch_buffer->b.b, size, size, &dma_flags);
+   si_cp_dma_prepare(sctx, &tmp_buf->b.b,
+ &tmp_buf->b.b, size, size, &dma_flags);

-   va = sctx->scratch_buffer->gpu_address;
+   va = tmp_buf->gpu_address + offset;
si_emit_cp_dma_copy_buffer(sctx, va, va + CP_DMA_ALIGNMENT, size,
   dma_flags);
  }




Re: [Mesa-dev] [PATCH 1/4] radeonsi: use CE suballocator for CP DMA realignment.

2016-04-19 Thread Nicolai Hähnle

On 19.04.2016 18:29, Bas Nieuwenhuizen wrote:

I retract patch 1 and 2. Large scratch buffers are nice, but the
hardware only supports a 32-bit offset into it.


Do you mean patch 2 and 3? Do you plan alternative patches to error out 
when there is an integer overflow? That's still kind of important...


Cheers,
Nicolai



- Bas

On Wed, Apr 20, 2016 at 12:50 AM, Bas Nieuwenhuizen
 wrote:

Use the CE suballocator instead of the normal one as the usage
is most similar to the CE, i.e. only read and written on GPU
and not mapped to CPU.

Signed-off-by: Bas Nieuwenhuizen 
---
  src/gallium/drivers/radeonsi/si_cp_dma.c | 27 ++++++++++-----------------
  1 file changed, 10 insertions(+), 17 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/si_cp_dma.c 
b/src/gallium/drivers/radeonsi/si_cp_dma.c
index 38e0ee6..264789d 100644
--- a/src/gallium/drivers/radeonsi/si_cp_dma.c
+++ b/src/gallium/drivers/radeonsi/si_cp_dma.c
@@ -222,31 +222,24 @@ static void si_clear_buffer(struct pipe_context *ctx, 
struct pipe_resource *dst,
   */
  static void si_cp_dma_realign_engine(struct si_context *sctx, unsigned size)
  {
+
 uint64_t va;
 unsigned dma_flags = 0;
 unsigned scratch_size = CP_DMA_ALIGNMENT * 2;
+   unsigned offset;
+   struct r600_resource *tmp_buf;

 assert(size < CP_DMA_ALIGNMENT);

-   /* Use the scratch buffer as the dummy buffer. The 3D engine should be
-* idle at this point.
-*/
-   if (!sctx->scratch_buffer ||
-   sctx->scratch_buffer->b.b.width0 < scratch_size) {
-   r600_resource_reference(&sctx->scratch_buffer, NULL);
-   sctx->scratch_buffer =
-   si_resource_create_custom(&sctx->screen->b.b,
- PIPE_USAGE_DEFAULT,
- scratch_size);
-   if (!sctx->scratch_buffer)
-   return;
-   sctx->emit_scratch_reloc = true;
-   }
+   u_suballocator_alloc(sctx->ce_suballocator, scratch_size, &offset,
+(struct pipe_resource**)&tmp_buf);
+   if (!tmp_buf)
+   return;

-   si_cp_dma_prepare(sctx, &sctx->scratch_buffer->b.b,
- &sctx->scratch_buffer->b.b, size, size, &dma_flags);
+   si_cp_dma_prepare(sctx, &tmp_buf->b.b,
+ &tmp_buf->b.b, size, size, &dma_flags);

-   va = sctx->scratch_buffer->gpu_address;
+   va = tmp_buf->gpu_address + offset;
 si_emit_cp_dma_copy_buffer(sctx, va, va + CP_DMA_ALIGNMENT, size,
dma_flags);
  }
--
2.8.0




Re: [Mesa-dev] [PATCH 1/4] radeonsi: use CE suballocator for CP DMA realignment.

2016-04-19 Thread Bas Nieuwenhuizen
I retract patch 1 and 2. Large scratch buffers are nice, but the
hardware only supports a 32-bit offset into it.

- Bas

On Wed, Apr 20, 2016 at 12:50 AM, Bas Nieuwenhuizen
 wrote:
> Use the CE suballocator instead of the normal one as the usage
> is most similar to the CE, i.e. only read and written on GPU
> and not mapped to CPU.
>
> Signed-off-by: Bas Nieuwenhuizen 
> ---
>  src/gallium/drivers/radeonsi/si_cp_dma.c | 27 ++++++++++-----------------
>  1 file changed, 10 insertions(+), 17 deletions(-)
>
> diff --git a/src/gallium/drivers/radeonsi/si_cp_dma.c 
> b/src/gallium/drivers/radeonsi/si_cp_dma.c
> index 38e0ee6..264789d 100644
> --- a/src/gallium/drivers/radeonsi/si_cp_dma.c
> +++ b/src/gallium/drivers/radeonsi/si_cp_dma.c
> @@ -222,31 +222,24 @@ static void si_clear_buffer(struct pipe_context *ctx, 
> struct pipe_resource *dst,
>   */
>  static void si_cp_dma_realign_engine(struct si_context *sctx, unsigned size)
>  {
> +
> uint64_t va;
> unsigned dma_flags = 0;
> unsigned scratch_size = CP_DMA_ALIGNMENT * 2;
> +   unsigned offset;
> +   struct r600_resource *tmp_buf;
>
> assert(size < CP_DMA_ALIGNMENT);
>
> -   /* Use the scratch buffer as the dummy buffer. The 3D engine should be
> -* idle at this point.
> -*/
> -   if (!sctx->scratch_buffer ||
> -   sctx->scratch_buffer->b.b.width0 < scratch_size) {
> -   r600_resource_reference(&sctx->scratch_buffer, NULL);
> -   sctx->scratch_buffer =
> -   si_resource_create_custom(&sctx->screen->b.b,
> - PIPE_USAGE_DEFAULT,
> - scratch_size);
> -   if (!sctx->scratch_buffer)
> -   return;
> -   sctx->emit_scratch_reloc = true;
> -   }
> +   u_suballocator_alloc(sctx->ce_suballocator, scratch_size, &offset,
> +(struct pipe_resource**)&tmp_buf);
> +   if (!tmp_buf)
> +   return;
>
> -   si_cp_dma_prepare(sctx, &sctx->scratch_buffer->b.b,
> - &sctx->scratch_buffer->b.b, size, size, &dma_flags);
> +   si_cp_dma_prepare(sctx, &tmp_buf->b.b,
> + &tmp_buf->b.b, size, size, &dma_flags);
>
> -   va = sctx->scratch_buffer->gpu_address;
> +   va = tmp_buf->gpu_address + offset;
> si_emit_cp_dma_copy_buffer(sctx, va, va + CP_DMA_ALIGNMENT, size,
>dma_flags);
>  }
> --
> 2.8.0
>


[Mesa-dev] [PATCH 1/4] radeonsi: use CE suballocator for CP DMA realignment.

2016-04-19 Thread Bas Nieuwenhuizen
Use the CE suballocator instead of the normal one as the usage
is most similar to the CE, i.e. only read and written on GPU
and not mapped to CPU.

Signed-off-by: Bas Nieuwenhuizen 
---
 src/gallium/drivers/radeonsi/si_cp_dma.c | 27 ++++++++++-----------------
 1 file changed, 10 insertions(+), 17 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/si_cp_dma.c 
b/src/gallium/drivers/radeonsi/si_cp_dma.c
index 38e0ee6..264789d 100644
--- a/src/gallium/drivers/radeonsi/si_cp_dma.c
+++ b/src/gallium/drivers/radeonsi/si_cp_dma.c
@@ -222,31 +222,24 @@ static void si_clear_buffer(struct pipe_context *ctx, 
struct pipe_resource *dst,
  */
 static void si_cp_dma_realign_engine(struct si_context *sctx, unsigned size)
 {
+
uint64_t va;
unsigned dma_flags = 0;
unsigned scratch_size = CP_DMA_ALIGNMENT * 2;
+   unsigned offset;
+   struct r600_resource *tmp_buf;
 
assert(size < CP_DMA_ALIGNMENT);
 
-   /* Use the scratch buffer as the dummy buffer. The 3D engine should be
-* idle at this point.
-*/
-   if (!sctx->scratch_buffer ||
-   sctx->scratch_buffer->b.b.width0 < scratch_size) {
-   r600_resource_reference(&sctx->scratch_buffer, NULL);
-   sctx->scratch_buffer =
-   si_resource_create_custom(&sctx->screen->b.b,
- PIPE_USAGE_DEFAULT,
- scratch_size);
-   if (!sctx->scratch_buffer)
-   return;
-   sctx->emit_scratch_reloc = true;
-   }
+   u_suballocator_alloc(sctx->ce_suballocator, scratch_size, &offset,
+(struct pipe_resource**)&tmp_buf);
+   if (!tmp_buf)
+   return;
 
-   si_cp_dma_prepare(sctx, &sctx->scratch_buffer->b.b,
- &sctx->scratch_buffer->b.b, size, size, &dma_flags);
+   si_cp_dma_prepare(sctx, &tmp_buf->b.b,
+ &tmp_buf->b.b, size, size, &dma_flags);
 
-   va = sctx->scratch_buffer->gpu_address;
+   va = tmp_buf->gpu_address + offset;
si_emit_cp_dma_copy_buffer(sctx, va, va + CP_DMA_ALIGNMENT, size,
   dma_flags);
 }
-- 
2.8.0
