[Mesa-dev] [PATCH] r600g: Use a fake reloc to sleep for fences

2012-02-02 Thread Simon Farnsworth
r300g is able to sleep until a fence completes rather than busywait because
it creates a special buffer object and relocation that stays busy until the
CS containing the fence is finished.

Copy the idea into r600g, and use it to sleep if the user asked for an
infinite wait, falling back to busywaiting if the user provided a timeout.

Signed-off-by: Simon Farnsworth simon.farnswo...@onelan.co.uk
---

v2: Address Michel's review comments.

The fake bo is now unconditional, and is unreferenced when the fence is
moved to the fence pool by fence_reference. This entailed shifting to
r600_resource, which cleaned the code up a little to boot.

 src/gallium/drivers/r600/r600.h|2 +-
 src/gallium/drivers/r600/r600_hw_context.c |9 -
 src/gallium/drivers/r600/r600_pipe.c   |9 -
 src/gallium/drivers/r600/r600_pipe.h   |1 +
 4 files changed, 18 insertions(+), 3 deletions(-)

diff --git a/src/gallium/drivers/r600/r600.h b/src/gallium/drivers/r600/r600.h
index baf09c1..c5813c2 100644
--- a/src/gallium/drivers/r600/r600.h
+++ b/src/gallium/drivers/r600/r600.h
@@ -284,7 +284,7 @@ void r600_context_queries_resume(struct r600_context *ctx);
 void r600_query_predication(struct r600_context *ctx, struct r600_query 
*query, int operation,
int flag_wait);
 void r600_context_emit_fence(struct r600_context *ctx, struct r600_resource 
*fence,
- unsigned offset, unsigned value);
+ unsigned offset, unsigned value, struct 
r600_resource **sleep_bo);
 void r600_context_flush_all(struct r600_context *ctx, unsigned flush_flags);
 void r600_context_flush_dest_caches(struct r600_context *ctx);
 
diff --git a/src/gallium/drivers/r600/r600_hw_context.c 
b/src/gallium/drivers/r600/r600_hw_context.c
index 8eb8e6d..acfa494 100644
--- a/src/gallium/drivers/r600/r600_hw_context.c
+++ b/src/gallium/drivers/r600/r600_hw_context.c
@@ -1603,7 +1603,7 @@ void r600_context_flush(struct r600_context *ctx, 
unsigned flags)
}
 }
 
-void r600_context_emit_fence(struct r600_context *ctx, struct r600_resource 
*fence_bo, unsigned offset, unsigned value)
+void r600_context_emit_fence(struct r600_context *ctx, struct r600_resource 
*fence_bo, unsigned offset, unsigned value, struct r600_resource **sleep_bo)
 {
uint64_t va;
 
@@ -1623,6 +1623,13 @@ void r600_context_emit_fence(struct r600_context *ctx, 
struct r600_resource *fen
ctx-pm4[ctx-pm4_cdwords++] = 0;   /* DATA_HI */
ctx-pm4[ctx-pm4_cdwords++] = PKT3(PKT3_NOP, 0, 0);
ctx-pm4[ctx-pm4_cdwords++] = r600_context_bo_reloc(ctx, fence_bo, 
RADEON_USAGE_WRITE);
+
+   /* Create a dummy BO so that fence_finish without a timeout can sleep 
waiting for completion */
+   *sleep_bo = (struct r600_resource*)
+   pipe_buffer_create(ctx-screen-screen, 
PIPE_BIND_CUSTOM,
+  PIPE_USAGE_STAGING, 1);
+   /* Add the fence as a dummy relocation. */
+   r600_context_bo_reloc(ctx, *sleep_bo, RADEON_USAGE_READWRITE);
 }
 
 static unsigned r600_query_read_result(char *map, unsigned start_index, 
unsigned end_index,
diff --git a/src/gallium/drivers/r600/r600_pipe.c 
b/src/gallium/drivers/r600/r600_pipe.c
index c38fbc5..748869a 100644
--- a/src/gallium/drivers/r600/r600_pipe.c
+++ b/src/gallium/drivers/r600/r600_pipe.c
@@ -115,7 +115,7 @@ static struct r600_fence *r600_create_fence(struct 
r600_pipe_context *ctx)
pipe_reference_init(fence-reference, 1);
 
rscreen-fences.data[fence-index] = 0;
-   r600_context_emit_fence(ctx-ctx, rscreen-fences.bo, fence-index, 1);
+   r600_context_emit_fence(ctx-ctx, rscreen-fences.bo, fence-index, 1, 
fence-sleep_bo);
 out:
pipe_mutex_unlock(rscreen-fences.mutex);
return fence;
@@ -572,6 +572,7 @@ static void r600_fence_reference(struct pipe_screen 
*pscreen,
if (pipe_reference((*oldf)-reference, newf-reference)) {
struct r600_screen *rscreen = (struct r600_screen *)pscreen;
pipe_mutex_lock(rscreen-fences.mutex);
+   pipe_resource_reference((struct 
pipe_resource**)(*oldf)-sleep_bo, NULL);
LIST_ADDTAIL((*oldf)-head, rscreen-fences.pool);
pipe_mutex_unlock(rscreen-fences.mutex);
}
@@ -605,6 +606,12 @@ static boolean r600_fence_finish(struct pipe_screen 
*pscreen,
}
 
while (rscreen-fences.data[rfence-index] == 0) {
+   /* Special-case infinite timeout */
+   if (timeout == PIPE_TIMEOUT_INFINITE) {
+   rscreen-ws-buffer_wait(rfence-sleep_bo-buf, 
RADEON_USAGE_READWRITE);
+   continue;
+   }
+
if (++spins % 256)
continue;
 #ifdef PIPE_OS_UNIX
diff --git a/src/gallium/drivers/r600/r600_pipe.h 
b/src/gallium/drivers/r600/r600_pipe.h
index 9ba60c8..df56560 100644
--- 

[Mesa-dev] [PATCH] r600g: Use a fake reloc to sleep for fences

2012-02-01 Thread Simon Farnsworth
r300g is able to sleep until a fence completes rather than busywait because
it creates a special buffer object and relocation that stays busy until the
CS containing the fence is finished.

Copy the idea into r600g, and use it to sleep if the user asked for an
infinite wait, falling back to busywaiting if the user provided a timeout.

Signed-off-by: Simon Farnsworth simon.farnswo...@onelan.co.uk
---
This is a userspace only fix for my problem, as suggested by Mark Olšák. It
doesn't fix the case with a timeout (which doesn't matter to me), but works
on existing kernels.

Given that my previous suggested fix opened a can of worms (mostly because I
don't fully understand modern Radeon hardware), this is probably enough of a
fix to apply for now, leaving a proper fix for someone with more
understanding of the way multiple rings interact.

 src/gallium/drivers/r600/r600.h|2 +-
 src/gallium/drivers/r600/r600_hw_context.c |   16 +++-
 src/gallium/drivers/r600/r600_pipe.c   |   10 +-
 src/gallium/drivers/r600/r600_pipe.h   |1 +
 4 files changed, 26 insertions(+), 3 deletions(-)

diff --git a/src/gallium/drivers/r600/r600.h b/src/gallium/drivers/r600/r600.h
index baf09c1..bac4890 100644
--- a/src/gallium/drivers/r600/r600.h
+++ b/src/gallium/drivers/r600/r600.h
@@ -284,7 +284,7 @@ void r600_context_queries_resume(struct r600_context *ctx);
 void r600_query_predication(struct r600_context *ctx, struct r600_query 
*query, int operation,
int flag_wait);
 void r600_context_emit_fence(struct r600_context *ctx, struct r600_resource 
*fence,
- unsigned offset, unsigned value);
+ unsigned offset, unsigned value, struct pb_buffer 
**sleep_bo);
 void r600_context_flush_all(struct r600_context *ctx, unsigned flush_flags);
 void r600_context_flush_dest_caches(struct r600_context *ctx);
 
diff --git a/src/gallium/drivers/r600/r600_hw_context.c 
b/src/gallium/drivers/r600/r600_hw_context.c
index 8eb8e6d..cc81c48 100644
--- a/src/gallium/drivers/r600/r600_hw_context.c
+++ b/src/gallium/drivers/r600/r600_hw_context.c
@@ -1603,7 +1603,7 @@ void r600_context_flush(struct r600_context *ctx, 
unsigned flags)
}
 }
 
-void r600_context_emit_fence(struct r600_context *ctx, struct r600_resource 
*fence_bo, unsigned offset, unsigned value)
+void r600_context_emit_fence(struct r600_context *ctx, struct r600_resource 
*fence_bo, unsigned offset, unsigned value, struct pb_buffer **sleep_bo)
 {
uint64_t va;
 
@@ -1623,6 +1623,20 @@ void r600_context_emit_fence(struct r600_context *ctx, 
struct r600_resource *fen
ctx-pm4[ctx-pm4_cdwords++] = 0;   /* DATA_HI */
ctx-pm4[ctx-pm4_cdwords++] = PKT3(PKT3_NOP, 0, 0);
ctx-pm4[ctx-pm4_cdwords++] = r600_context_bo_reloc(ctx, fence_bo, 
RADEON_USAGE_WRITE);
+
+   if (sleep_bo) {
+   unsigned reloc_index;
+   /* Create a dummy BO so that fence_finish without a timeout can 
sleep waiting for completion */
+   *sleep_bo = ctx-ws-buffer_create(ctx-ws, 1, 1,
+  PIPE_BIND_CUSTOM,
+  RADEON_DOMAIN_GTT);
+   /* Add the fence as a dummy relocation. */
+   reloc_index = ctx-ws-cs_add_reloc(ctx-cs,
+   
ctx-ws-buffer_get_cs_handle(*sleep_bo),
+   RADEON_USAGE_READWRITE, 
RADEON_DOMAIN_GTT);
+   if (reloc_index = ctx-creloc)
+   ctx-creloc = reloc_index+1;
+   }
 }
 
 static unsigned r600_query_read_result(char *map, unsigned start_index, 
unsigned end_index,
diff --git a/src/gallium/drivers/r600/r600_pipe.c 
b/src/gallium/drivers/r600/r600_pipe.c
index c38fbc5..71e31b1 100644
--- a/src/gallium/drivers/r600/r600_pipe.c
+++ b/src/gallium/drivers/r600/r600_pipe.c
@@ -115,7 +115,7 @@ static struct r600_fence *r600_create_fence(struct 
r600_pipe_context *ctx)
pipe_reference_init(fence-reference, 1);
 
rscreen-fences.data[fence-index] = 0;
-   r600_context_emit_fence(ctx-ctx, rscreen-fences.bo, fence-index, 1);
+   r600_context_emit_fence(ctx-ctx, rscreen-fences.bo, fence-index, 1, 
fence-sleep_bo);
 out:
pipe_mutex_unlock(rscreen-fences.mutex);
return fence;
@@ -605,6 +605,14 @@ static boolean r600_fence_finish(struct pipe_screen 
*pscreen,
}
 
while (rscreen-fences.data[rfence-index] == 0) {
+   /* Special-case infinite timeout */
+   if (timeout == PIPE_TIMEOUT_INFINITE 
+   rfence-sleep_bo) {
+   rscreen-ws-buffer_wait(rfence-sleep_bo, 
RADEON_USAGE_READWRITE);
+   pb_reference(rfence-sleep_bo, NULL);
+   continue;
+   }
+
if (++spins % 256)
 

Re: [Mesa-dev] [PATCH] r600g: Use a fake reloc to sleep for fences

2012-02-01 Thread Simon Farnsworth
On Wednesday 1 February 2012, Simon Farnsworth simon.farnswo...@onelan.co.uk 
wrote:
 This is a userspace only fix for my problem, as suggested by Mark Olšák. It

My apologies, Marek; my typing appears to have failed me, and renamed you to
Mark. I shall try not to make that mistake again.
-- 
Simon Farnsworth
Software Engineer
ONELAN Limited
http://www.onelan.com/


signature.asc
Description: This is a digitally signed message part.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] r600g: Use a fake reloc to sleep for fences

2012-02-01 Thread Michel Dänzer
[ Dropping dri-devel list ]

On Mit, 2012-02-01 at 15:01 +, Simon Farnsworth wrote: 
 r300g is able to sleep until a fence completes rather than busywait because
 it creates a special buffer object and relocation that stays busy until the
 CS containing the fence is finished.
 
 Copy the idea into r600g, and use it to sleep if the user asked for an
 infinite wait, falling back to busywaiting if the user provided a timeout.
 
 Signed-off-by: Simon Farnsworth simon.farnswo...@onelan.co.uk
 ---
 This is a userspace only fix for my problem, as suggested by Mark Olšák. It
 doesn't fix the case with a timeout (which doesn't matter to me), but works
 on existing kernels.
 
 Given that my previous suggested fix opened a can of worms (mostly because I
 don't fully understand modern Radeon hardware), this is probably enough of a
 fix to apply for now, leaving a proper fix for someone with more
 understanding of the way multiple rings interact.
[...] 
 diff --git a/src/gallium/drivers/r600/r600_hw_context.c 
 b/src/gallium/drivers/r600/r600_hw_context.c
 index 8eb8e6d..cc81c48 100644
 --- a/src/gallium/drivers/r600/r600_hw_context.c
 +++ b/src/gallium/drivers/r600/r600_hw_context.c
 @@ -1623,6 +1623,20 @@ void r600_context_emit_fence(struct r600_context *ctx, 
 struct r600_resource *fen
   ctx-pm4[ctx-pm4_cdwords++] = 0;   /* DATA_HI */
   ctx-pm4[ctx-pm4_cdwords++] = PKT3(PKT3_NOP, 0, 0);
   ctx-pm4[ctx-pm4_cdwords++] = r600_context_bo_reloc(ctx, fence_bo, 
 RADEON_USAGE_WRITE);
 +
 + if (sleep_bo) {
 + unsigned reloc_index;
 + /* Create a dummy BO so that fence_finish without a timeout can 
 sleep waiting for completion */
 + *sleep_bo = ctx-ws-buffer_create(ctx-ws, 1, 1,
 +PIPE_BIND_CUSTOM,
 +RADEON_DOMAIN_GTT);
 + /* Add the fence as a dummy relocation. */
 + reloc_index = ctx-ws-cs_add_reloc(ctx-cs,
 + 
 ctx-ws-buffer_get_cs_handle(*sleep_bo),
 + RADEON_USAGE_READWRITE, 
 RADEON_DOMAIN_GTT);
 + if (reloc_index = ctx-creloc)
 + ctx-creloc = reloc_index+1;
 + }

Is there a point in making sleep_bo optional?


 diff --git a/src/gallium/drivers/r600/r600_pipe.c 
 b/src/gallium/drivers/r600/r600_pipe.c
 index c38fbc5..71e31b1 100644
 --- a/src/gallium/drivers/r600/r600_pipe.c
 +++ b/src/gallium/drivers/r600/r600_pipe.c
 @@ -605,6 +605,14 @@ static boolean r600_fence_finish(struct pipe_screen 
 *pscreen,
   }
  
   while (rscreen-fences.data[rfence-index] == 0) {
 + /* Special-case infinite timeout */
 + if (timeout == PIPE_TIMEOUT_INFINITE 
 + rfence-sleep_bo) {
 + rscreen-ws-buffer_wait(rfence-sleep_bo, 
 RADEON_USAGE_READWRITE);
 + pb_reference(rfence-sleep_bo, NULL);
 + continue;
 + }

I think rfence-sleep_bo should only be unreferenced in
r600_fence_reference() when the fence is recycled, otherwise it'll be
leaked if r600_fence_finish() is never called for some reason.

If r600_fence_finish() only ever called os_time_sleep(), never
sched_yield() (like r300_fence_finish()), would that avoid your problem
even with a finite timeout?


-- 
Earthling Michel Dänzer   |   http://www.amd.com
Libre software enthusiast |  Debian, X and DRI developer
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] r600g: Use a fake reloc to sleep for fences

2012-02-01 Thread Simon Farnsworth
On Wednesday 1 February 2012, Michel Dänzer mic...@daenzer.net wrote:
 On Mit, 2012-02-01 at 15:01 +, Simon Farnsworth wrote: 
  +   if (sleep_bo) {
  +   unsigned reloc_index;
  +   /* Create a dummy BO so that fence_finish without a timeout can 
  sleep waiting for completion */
  +   *sleep_bo = ctx-ws-buffer_create(ctx-ws, 1, 1,
  +  PIPE_BIND_CUSTOM,
  +  RADEON_DOMAIN_GTT);
  +   /* Add the fence as a dummy relocation. */
  +   reloc_index = ctx-ws-cs_add_reloc(ctx-cs,
  +   
  ctx-ws-buffer_get_cs_handle(*sleep_bo),
  +   RADEON_USAGE_READWRITE, 
  RADEON_DOMAIN_GTT);
  +   if (reloc_index = ctx-creloc)
  +   ctx-creloc = reloc_index+1;
  +   }
 
 Is there a point in making sleep_bo optional?

I can't think of a reason to make it optional; I'll remove that in v2.
 
  diff --git a/src/gallium/drivers/r600/r600_pipe.c 
  b/src/gallium/drivers/r600/r600_pipe.c
  index c38fbc5..71e31b1 100644
  --- a/src/gallium/drivers/r600/r600_pipe.c
  +++ b/src/gallium/drivers/r600/r600_pipe.c
  @@ -605,6 +605,14 @@ static boolean r600_fence_finish(struct pipe_screen 
  *pscreen,
  }
   
  while (rscreen-fences.data[rfence-index] == 0) {
  +   /* Special-case infinite timeout */
  +   if (timeout == PIPE_TIMEOUT_INFINITE 
  +   rfence-sleep_bo) {
  +   rscreen-ws-buffer_wait(rfence-sleep_bo, 
  RADEON_USAGE_READWRITE);
  +   pb_reference(rfence-sleep_bo, NULL);
  +   continue;
  +   }
 
 I think rfence-sleep_bo should only be unreferenced in
 r600_fence_reference() when the fence is recycled, otherwise it'll be
 leaked if r600_fence_finish() is never called for some reason.

I'll fix this in v2.

 If r600_fence_finish() only ever called os_time_sleep(), never
 sched_yield() (like r300_fence_finish()), would that avoid your problem
 even with a finite timeout?

I experimented with that - depending on the specific workload, I need the
timeout to vary, otherwise I can see the impact of the loop in terms of bad
latency behaviour (resulting in occasional dropped frames). For the
workloads I tried, I needed the sleep to vary between 1 usec (for
low-complexity workloads) and 100 usec (for high complexity
workloads). Recompiling Mesa for each workload is obviously not an option.

I did try an adaptive spin - essentially removing the if (spins++ % 256)
continue, and adding:

if (spins  40)
os_sleep_time(1);
else if (spins  100)
os_sleep_time(10);
else
os_sleep_time(100);

But I felt this was ugly, when the core problem is that I want to sleep
until completion, the hardware has support for sleeping until completion,
and the only reason I can't is deficiencies in the driver stack.

Fundamentally, I suspect that the reason I'm seeing pain from this and other
people aren't is that I'm comparing an AMD E-350 to an Intel Atom D510, and
I've tuned my software stack on the D510 to within an inch of its life.

My expectation is that the better GPU in the E-350 will make my 2D
graphics-intensive workload (OpenGL compositing of 2D movies) perform about
as well as it did on the D510 - sleep-based waiting for fence completion
gets in the way, as the D510 has slightly more CPU power than the E-350, and
I'm not (yet) fully exploiting the E-350's GPU.
-- 
Simon Farnsworth
Software Engineer
ONELAN Limited
http://www.onelan.com/


signature.asc
Description: This is a digitally signed message part.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev