Re: [Mesa-dev] [PATCH 1/2] r600g: rework flusing and synchronization pattern v4

2012-12-08 Thread Jerome Glisse
On Sat, Dec 8, 2012 at 7:27 PM, Marek Olšák  wrote:
> Hi Jerome,
>
> I'm okay with the simplification of r600_flush_emit, I'm not so okay
> with some other things. There's also some cruft unrelated to flushing.
>
> 1) R600_CONTEXT_FLUSH could have a better name, because it's not clear
> what it does. (it looks like it only flushed read-only bindings)

GPU_FLUSH ?

> 2) Don't use magic numbers when setting cp_coher_cntl unless you want
> to hide something from us / obfuscating the code. :)
>
> 3) The definition of R600_MAX_FLUSH_CS_DWORDS should be updated.

Yes i haven't recomputed worst case

> 4) SURFACE_BASE_UPDATE is emitted twice in emit_framebuffer_state. I
> don't think splitting one packet into two packets doing the same thing
> is needed.

It's need couple r6xx/r7xx gpu will lockup after couple hour of
stressing, wasn't seeing lockup with it.

> 5) RS780 and RS880 don't need SURFACE_BASE_UPDATE for streamout. Their
> streamout hardware was actually copied from R700. Doing "< CHIP_RS780"
> instead of "< CHIP_RV770" was correct. The same for r600_flush_emit.

fglrx mostly do the same on r7xx and r6xx for streamout as i am not
sure i have any stressing test for that i side on fglrx side.

> 6) In r600_context_flush, don't remove the comment about flushing
> framebuffer caches, because it's still done there.
>
> 7) Masking out R600_CONTEXT_FLUSH in r600_context_emit_fence is not
> correct. We should still flush the caches later if they're dirty and
> even if the fence was emitted. You can't see this regression in
> piglit, because we don't have a test for that.
True
> 8) There's some inconsistent flushing between graphics and compute
> colorbuffer bindings. For graphics, you use (WAIT_IDLE |
> FLUSH_AND_INV), which makes sense. For compute, you use
> R600_CONTEXT_FLUSH (which is used for vertex buffers and the like
> elsewhere, but not colorbuffers).

I haven't paid much attention to compute side, i should probably look at it.

> And one question:
>
> Why do you use set both FLUSH_AND_INV and STREAMOUT_FLUSH on
> Evergreen, while r600 only gets FLUSH_AND_INV? Did you overlook this?

No, just matching fglrx pattern, i don't think i tested without that
change, but it definitly match fglrx.

Cheers,
Jerome

> Marek
>
> On Thu, Dec 6, 2012 at 8:51 PM,   wrote:
>> From: Jerome Glisse 
>>
>> This bring r600g allmost inline with closed source driver when
>> it comes to flushing and synchronization pattern.
>>
>> Signed-off-by: Jerome Glisse 
>> ---
>>  src/gallium/drivers/r600/evergreen_compute.c   |   8 +-
>>  .../drivers/r600/evergreen_compute_internal.c  |   4 +-
>>  src/gallium/drivers/r600/evergreen_state.c |   4 +-
>>  src/gallium/drivers/r600/r600.h|  16 +--
>>  src/gallium/drivers/r600/r600_hw_context.c | 154 
>> -
>>  src/gallium/drivers/r600/r600_state.c  |  18 ++-
>>  src/gallium/drivers/r600/r600_state_common.c   |  19 ++-
>>  7 files changed, 61 insertions(+), 162 deletions(-)
>>
>> diff --git a/src/gallium/drivers/r600/evergreen_compute.c 
>> b/src/gallium/drivers/r600/evergreen_compute.c
>> index 44831a7..33a5910 100644
>> --- a/src/gallium/drivers/r600/evergreen_compute.c
>> +++ b/src/gallium/drivers/r600/evergreen_compute.c
>> @@ -98,7 +98,7 @@ static void evergreen_cs_set_vertex_buffer(
>>
>> /* The vertex instructions in the compute shaders use the texture 
>> cache,
>>  * so we need to invalidate it. */
>> -   rctx->flags |= R600_CONTEXT_TEX_FLUSH;
>> +   rctx->flags |= R600_CONTEXT_FLUSH;
>> state->enabled_mask |= 1 << vb_index;
>> state->dirty_mask |= 1 << vb_index;
>> state->atom.dirty = true;
>> @@ -329,7 +329,7 @@ static void compute_emit_cs(struct r600_context *ctx, 
>> const uint *block_layout,
>>  */
>> r600_emit_command_buffer(ctx->cs, &ctx->start_compute_cs_cmd);
>>
>> -   ctx->flags |= R600_CONTEXT_CB_FLUSH;
>> +   ctx->flags |= R600_CONTEXT_FLUSH;
>> r600_flush_emit(ctx);
>>
>> /* Emit colorbuffers. */
>> @@ -409,7 +409,7 @@ static void compute_emit_cs(struct r600_context *ctx, 
>> const uint *block_layout,
>>
>> /* XXX evergreen_flush_emit() hardcodes the CP_COHER_SIZE to 
>> 0x
>>  */
>> -   ctx->flags |= R600_CONTEXT_CB_FLUSH;
>> +   ctx->flags |= R600_CONTEXT_FLUSH;
>> r600_flush_emit(ctx);
>>
>>  #if 0
>> @@ -468,7 +468,7 @@ void evergreen_emit_cs_shader(
>> r600_write_value(cs, r600_context_bo_reloc(rctx, kernel->code_bo,
>> RADEON_USAGE_READ));
>>
>> -   rctx->flags |= R600_CONTEXT_SHADERCONST_FLUSH;
>> +   rctx->flags |= R600_CONTEXT_FLUSH;
>>  }
>>
>>  static void evergreen_launch_grid(
>> diff --git a/src/gallium/drivers/r600/evergreen_compute_internal.c 
>> b/src/gallium/drivers/r600/evergreen_compute_internal.c
>> index 7bc7fb4..187bcf1 100644
>> --- a/src/gallium/drivers/r600/evergreen_com

Re: [Mesa-dev] [PATCH 1/2] r600g: rework flusing and synchronization pattern v4

2012-12-08 Thread Marek Olšák
Hi Jerome,

I'm okay with the simplification of r600_flush_emit, I'm not so okay
with some other things. There's also some cruft unrelated to flushing.

1) R600_CONTEXT_FLUSH could have a better name, because it's not clear
what it does. (it looks like it only flushed read-only bindings)

2) Don't use magic numbers when setting cp_coher_cntl unless you want
to hide something from us / obfuscating the code. :)

3) The definition of R600_MAX_FLUSH_CS_DWORDS should be updated.

4) SURFACE_BASE_UPDATE is emitted twice in emit_framebuffer_state. I
don't think splitting one packet into two packets doing the same thing
is needed.

5) RS780 and RS880 don't need SURFACE_BASE_UPDATE for streamout. Their
streamout hardware was actually copied from R700. Doing "< CHIP_RS780"
instead of "< CHIP_RV770" was correct. The same for r600_flush_emit.

6) In r600_context_flush, don't remove the comment about flushing
framebuffer caches, because it's still done there.

7) Masking out R600_CONTEXT_FLUSH in r600_context_emit_fence is not
correct. We should still flush the caches later if they're dirty and
even if the fence was emitted. You can't see this regression in
piglit, because we don't have a test for that.

8) There's some inconsistent flushing between graphics and compute
colorbuffer bindings. For graphics, you use (WAIT_IDLE |
FLUSH_AND_INV), which makes sense. For compute, you use
R600_CONTEXT_FLUSH (which is used for vertex buffers and the like
elsewhere, but not colorbuffers).

And one question:

Why do you use set both FLUSH_AND_INV and STREAMOUT_FLUSH on
Evergreen, while r600 only gets FLUSH_AND_INV? Did you overlook this?

Marek

On Thu, Dec 6, 2012 at 8:51 PM,   wrote:
> From: Jerome Glisse 
>
> This bring r600g allmost inline with closed source driver when
> it comes to flushing and synchronization pattern.
>
> Signed-off-by: Jerome Glisse 
> ---
>  src/gallium/drivers/r600/evergreen_compute.c   |   8 +-
>  .../drivers/r600/evergreen_compute_internal.c  |   4 +-
>  src/gallium/drivers/r600/evergreen_state.c |   4 +-
>  src/gallium/drivers/r600/r600.h|  16 +--
>  src/gallium/drivers/r600/r600_hw_context.c | 154 
> -
>  src/gallium/drivers/r600/r600_state.c  |  18 ++-
>  src/gallium/drivers/r600/r600_state_common.c   |  19 ++-
>  7 files changed, 61 insertions(+), 162 deletions(-)
>
> diff --git a/src/gallium/drivers/r600/evergreen_compute.c 
> b/src/gallium/drivers/r600/evergreen_compute.c
> index 44831a7..33a5910 100644
> --- a/src/gallium/drivers/r600/evergreen_compute.c
> +++ b/src/gallium/drivers/r600/evergreen_compute.c
> @@ -98,7 +98,7 @@ static void evergreen_cs_set_vertex_buffer(
>
> /* The vertex instructions in the compute shaders use the texture 
> cache,
>  * so we need to invalidate it. */
> -   rctx->flags |= R600_CONTEXT_TEX_FLUSH;
> +   rctx->flags |= R600_CONTEXT_FLUSH;
> state->enabled_mask |= 1 << vb_index;
> state->dirty_mask |= 1 << vb_index;
> state->atom.dirty = true;
> @@ -329,7 +329,7 @@ static void compute_emit_cs(struct r600_context *ctx, 
> const uint *block_layout,
>  */
> r600_emit_command_buffer(ctx->cs, &ctx->start_compute_cs_cmd);
>
> -   ctx->flags |= R600_CONTEXT_CB_FLUSH;
> +   ctx->flags |= R600_CONTEXT_FLUSH;
> r600_flush_emit(ctx);
>
> /* Emit colorbuffers. */
> @@ -409,7 +409,7 @@ static void compute_emit_cs(struct r600_context *ctx, 
> const uint *block_layout,
>
> /* XXX evergreen_flush_emit() hardcodes the CP_COHER_SIZE to 
> 0x
>  */
> -   ctx->flags |= R600_CONTEXT_CB_FLUSH;
> +   ctx->flags |= R600_CONTEXT_FLUSH;
> r600_flush_emit(ctx);
>
>  #if 0
> @@ -468,7 +468,7 @@ void evergreen_emit_cs_shader(
> r600_write_value(cs, r600_context_bo_reloc(rctx, kernel->code_bo,
> RADEON_USAGE_READ));
>
> -   rctx->flags |= R600_CONTEXT_SHADERCONST_FLUSH;
> +   rctx->flags |= R600_CONTEXT_FLUSH;
>  }
>
>  static void evergreen_launch_grid(
> diff --git a/src/gallium/drivers/r600/evergreen_compute_internal.c 
> b/src/gallium/drivers/r600/evergreen_compute_internal.c
> index 7bc7fb4..187bcf1 100644
> --- a/src/gallium/drivers/r600/evergreen_compute_internal.c
> +++ b/src/gallium/drivers/r600/evergreen_compute_internal.c
> @@ -538,7 +538,7 @@ void evergreen_set_tex_resource(
>  
> util_format_get_blockwidth(tmp->resource.b.b.format) *
>  view->base.texture->width0*height*depth;
>
> -   pipe->ctx->flags |= R600_CONTEXT_TEX_FLUSH;
> +   pipe->ctx->flags |= R600_CONTEXT_FLUSH;
>
> evergreen_emit_force_reloc(res);
> evergreen_emit_force_reloc(res);
> @@ -597,7 +597,7 @@ void evergreen_set_const_cache(
> res->usage = RADEON_USAGE_READ;
> res->coher_bo_size = size;
>
> -   pipe->ctx->flags |= R600_CONTEXT_SHADERCONST

[Mesa-dev] [PATCH 1/2] r600g: rework flusing and synchronization pattern v4

2012-12-06 Thread j . glisse
From: Jerome Glisse 

This bring r600g allmost inline with closed source driver when
it comes to flushing and synchronization pattern.

Signed-off-by: Jerome Glisse 
---
 src/gallium/drivers/r600/evergreen_compute.c   |   8 +-
 .../drivers/r600/evergreen_compute_internal.c  |   4 +-
 src/gallium/drivers/r600/evergreen_state.c |   4 +-
 src/gallium/drivers/r600/r600.h|  16 +--
 src/gallium/drivers/r600/r600_hw_context.c | 154 -
 src/gallium/drivers/r600/r600_state.c  |  18 ++-
 src/gallium/drivers/r600/r600_state_common.c   |  19 ++-
 7 files changed, 61 insertions(+), 162 deletions(-)

diff --git a/src/gallium/drivers/r600/evergreen_compute.c 
b/src/gallium/drivers/r600/evergreen_compute.c
index 44831a7..33a5910 100644
--- a/src/gallium/drivers/r600/evergreen_compute.c
+++ b/src/gallium/drivers/r600/evergreen_compute.c
@@ -98,7 +98,7 @@ static void evergreen_cs_set_vertex_buffer(
 
/* The vertex instructions in the compute shaders use the texture cache,
 * so we need to invalidate it. */
-   rctx->flags |= R600_CONTEXT_TEX_FLUSH;
+   rctx->flags |= R600_CONTEXT_FLUSH;
state->enabled_mask |= 1 << vb_index;
state->dirty_mask |= 1 << vb_index;
state->atom.dirty = true;
@@ -329,7 +329,7 @@ static void compute_emit_cs(struct r600_context *ctx, const 
uint *block_layout,
 */
r600_emit_command_buffer(ctx->cs, &ctx->start_compute_cs_cmd);
 
-   ctx->flags |= R600_CONTEXT_CB_FLUSH;
+   ctx->flags |= R600_CONTEXT_FLUSH;
r600_flush_emit(ctx);
 
/* Emit colorbuffers. */
@@ -409,7 +409,7 @@ static void compute_emit_cs(struct r600_context *ctx, const 
uint *block_layout,
 
/* XXX evergreen_flush_emit() hardcodes the CP_COHER_SIZE to 0x
 */
-   ctx->flags |= R600_CONTEXT_CB_FLUSH;
+   ctx->flags |= R600_CONTEXT_FLUSH;
r600_flush_emit(ctx);
 
 #if 0
@@ -468,7 +468,7 @@ void evergreen_emit_cs_shader(
r600_write_value(cs, r600_context_bo_reloc(rctx, kernel->code_bo,
RADEON_USAGE_READ));
 
-   rctx->flags |= R600_CONTEXT_SHADERCONST_FLUSH;
+   rctx->flags |= R600_CONTEXT_FLUSH;
 }
 
 static void evergreen_launch_grid(
diff --git a/src/gallium/drivers/r600/evergreen_compute_internal.c 
b/src/gallium/drivers/r600/evergreen_compute_internal.c
index 7bc7fb4..187bcf1 100644
--- a/src/gallium/drivers/r600/evergreen_compute_internal.c
+++ b/src/gallium/drivers/r600/evergreen_compute_internal.c
@@ -538,7 +538,7 @@ void evergreen_set_tex_resource(
 
util_format_get_blockwidth(tmp->resource.b.b.format) *
 view->base.texture->width0*height*depth;
 
-   pipe->ctx->flags |= R600_CONTEXT_TEX_FLUSH;
+   pipe->ctx->flags |= R600_CONTEXT_FLUSH;
 
evergreen_emit_force_reloc(res);
evergreen_emit_force_reloc(res);
@@ -597,7 +597,7 @@ void evergreen_set_const_cache(
res->usage = RADEON_USAGE_READ;
res->coher_bo_size = size;
 
-   pipe->ctx->flags |= R600_CONTEXT_SHADERCONST_FLUSH;
+   pipe->ctx->flags |= R600_CONTEXT_FLUSH;
 }
 
 struct r600_resource* r600_compute_buffer_alloc_vram(
diff --git a/src/gallium/drivers/r600/evergreen_state.c 
b/src/gallium/drivers/r600/evergreen_state.c
index 9b898cb..7bc4772 100644
--- a/src/gallium/drivers/r600/evergreen_state.c
+++ b/src/gallium/drivers/r600/evergreen_state.c
@@ -1557,14 +1557,14 @@ static void evergreen_set_framebuffer_state(struct 
pipe_context *ctx,
uint32_t i, log_samples;
 
if (rctx->framebuffer.state.nr_cbufs) {
-   rctx->flags |= R600_CONTEXT_CB_FLUSH;
+   rctx->flags |= R600_CONTEXT_WAIT_IDLE | 
R600_CONTEXT_FLUSH_AND_INV;
 
if (rctx->framebuffer.state.cbufs[0]->texture->nr_samples > 1) {
rctx->flags |= R600_CONTEXT_FLUSH_AND_INV_CB_META;
}
}
if (rctx->framebuffer.state.zsbuf) {
-   rctx->flags |= R600_CONTEXT_DB_FLUSH;
+   rctx->flags |= R600_CONTEXT_WAIT_IDLE | 
R600_CONTEXT_FLUSH_AND_INV;
}
 
util_copy_framebuffer_state(&rctx->framebuffer.state, state);
diff --git a/src/gallium/drivers/r600/r600.h b/src/gallium/drivers/r600/r600.h
index 7d43416..4060672 100644
--- a/src/gallium/drivers/r600/r600.h
+++ b/src/gallium/drivers/r600/r600.h
@@ -180,17 +180,11 @@ struct r600_so_target {
unsignedso_index;
 };
 
-#define R600_CONTEXT_PS_PARTIAL_FLUSH  (1 << 0)
-#define R600_CONTEXT_CB_FLUSH  (1 << 1)
-#define R600_CONTEXT_DB_FLUSH  (1 << 2)
-#define R600_CONTEXT_SHADERCONST_FLUSH (1 << 3)
-#define R600_CONTEXT_TEX_FLUSH (1 << 4)
-#define R600_CONTEXT_VTX_FLUSH (1 << 5)
-#define R600_CONTEXT_STREAMOUT_FLUSH   (1 << 6)
-#define R600_CONTEXT_WAIT_IDLE