[Mesa-dev] Problem getting a 32-bit X11 visual when porting from GLX to EGL
Hello all, I'm attempting to port our GLX based code to EGL/X11, as a first step towards being ready for Wayland, and I've hit a snag. We use an X11 compositor to get translucent windows blending one atop the other, and I carefully select a 32 bit RGBA visual (ID 0x64 on my system) instead of a 24 bit RGB visual (IDs 0x21 and 0x22 on my system) to make this work. I've got the code working on EGL/X11, but I can't find an EGLConfig that's based on visual 0x64 - they're all based on 0x21 (TrueColor) and 0x22 (DirectColor). Looking at http://cgit.freedesktop.org/mesa/mesa/tree/src/egl/drivers/dri2/platform_x11.c#n671 suggests this is deliberate. I've tried the stupid patch below, but that makes all the TrueColor EGLConfigs into the RGBA visual 0x64, which is clearly not wanted based on the comment. How should I proceed here? I can obviously go back to GLX, and migrate to EGL when I migrate to Wayland, but I'd prefer to move to EGL now while I'm still X11 based. diff --git a/src/egl/drivers/dri2/platform_x11.c b/src/egl/drivers/dri2/platform_x11.c index a518db1..18fb81d 100644 --- a/src/egl/drivers/dri2/platform_x11.c +++ b/src/egl/drivers/dri2/platform_x11.c @@ -649,6 +649,37 @@ dri2_add_configs_for_visuals(struct dri2_egl_display *dri2_dpy, EGL_SWAP_BEHAVIOR_PRESERVED_BIT; while (d.rem > 0) { + if (d.data->depth != 32) { + xcb_depth_next(&d); + continue; + } + EGLBoolean class_added[6] = { 0, }; + + visuals = xcb_depth_visuals(d.data); + for (i = 0; i < xcb_depth_visuals_length(d.data); i++) { +if (class_added[visuals[i]._class]) + continue; + +class_added[visuals[i]._class] = EGL_TRUE; +for (j = 0; dri2_dpy->driver_configs[j]; j++) { +config_attrs[1] = visuals[i].visual_id; +config_attrs[3] = visuals[i]._class; + +rgba_masks[0] = visuals[i].red_mask; +rgba_masks[1] = visuals[i].green_mask; +rgba_masks[2] = visuals[i].blue_mask; +rgba_masks[3] = + ~(rgba_masks[0] | rgba_masks[1] | rgba_masks[2]); + dri2_add_config(disp, dri2_dpy->driver_configs[j], id++, + surface_type, config_attrs, rgba_masks); +} + } + + xcb_depth_next(&d); + } + + d = xcb_screen_allowed_depths_iterator(s.data); + while (d.rem > 0) { EGLBoolean class_added[6] = { 0, }; visuals = xcb_depth_visuals(d.data); -- Simon Farnsworth Software Engineer ONELAN Ltd http://www.onelan.com signature.asc Description: This is a digitally signed message part. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCHv4] r600g: Use a fake reloc to sleep for fences
r300g is able to sleep until a fence completes rather than busywait because it creates a special buffer object and relocation that stays busy until the CS containing the fence is finished. Copy the idea into r600g, and use it to sleep if the user asked for an infinite wait, falling back to busywaiting if the user provided a timeout. Signed-off-by: Simon Farnsworth --- This is now ready to apply. We might also want to backport it to the 8.0 branch, as without this or an equivalent, a GPU reset can result in applications busywaiting forever. v4: Rebase against master - cleanups to r600g had stopped it applying. At Marek's suggestion, move the sleep BO into r600_pipe.c, instead of r600_hw_context.c v3: At Dave Airlie's suggestion on dri-devel, make use of the BO to give us a fighting chance of recovering after a GPU reset. We know that the fence will never be signalled by hardware if the dummy BO has gone idle - we use that information to escape the loop. This might be a useful addition to the 8.0 branch. v2: Address Michel's review comments. The fake bo is now unconditional, and is unreferenced when the fence is moved to the fence pool by fence_reference. This entailed shifting to r600_resource, which cleaned the code up a little to boot. src/gallium/drivers/r600/r600_pipe.c | 25 +++-- src/gallium/drivers/r600/r600_pipe.h |1 + 2 files changed, 24 insertions(+), 2 deletions(-) diff --git a/src/gallium/drivers/r600/r600_pipe.c b/src/gallium/drivers/r600/r600_pipe.c index bfb01f5..7d2dd20 100644 --- a/src/gallium/drivers/r600/r600_pipe.c +++ b/src/gallium/drivers/r600/r600_pipe.c @@ -47,6 +47,7 @@ #include "r600_resource.h" #include "r600_shader.h" #include "r600_pipe.h" +#include "r600_hw_context_priv.h" /* * pipe_context @@ -116,6 +117,14 @@ static struct r600_fence *r600_create_fence(struct r600_context *rctx) rscreen->fences.data[fence->index] = 0; r600_context_emit_fence(rctx, rscreen->fences.bo, fence->index, 1); + + /* Create a dummy BO so that fence_finish without a timeout can sleep waiting for completion */ + fence->sleep_bo = (struct r600_resource*) + pipe_buffer_create(&rctx->screen->screen, PIPE_BIND_CUSTOM, + PIPE_USAGE_STAGING, 1); + /* Add the fence as a dummy relocation. */ + r600_context_bo_reloc(rctx, fence->sleep_bo, RADEON_USAGE_READWRITE); + out: pipe_mutex_unlock(rscreen->fences.mutex); return fence; @@ -584,6 +593,7 @@ static void r600_fence_reference(struct pipe_screen *pscreen, if (pipe_reference(&(*oldf)->reference, &newf->reference)) { struct r600_screen *rscreen = (struct r600_screen *)pscreen; pipe_mutex_lock(rscreen->fences.mutex); + pipe_resource_reference((struct pipe_resource**)&(*oldf)->sleep_bo, NULL); LIST_ADDTAIL(&(*oldf)->head, &rscreen->fences.pool); pipe_mutex_unlock(rscreen->fences.mutex); } @@ -617,6 +627,17 @@ static boolean r600_fence_finish(struct pipe_screen *pscreen, } while (rscreen->fences.data[rfence->index] == 0) { + /* Special-case infinite timeout - wait for the dummy BO to become idle */ + if (timeout == PIPE_TIMEOUT_INFINITE) { + rscreen->ws->buffer_wait(rfence->sleep_bo->buf, RADEON_USAGE_READWRITE); + break; + } + + /* The dummy BO will be busy until the CS including the fence has completed, or +* the GPU is reset. Don't bother continuing to spin when the BO is idle. */ + if (!rscreen->ws->buffer_is_busy(rfence->sleep_bo->buf, RADEON_USAGE_READWRITE)) + break; + if (++spins % 256) continue; #ifdef PIPE_OS_UNIX @@ -626,11 +647,11 @@ static boolean r600_fence_finish(struct pipe_screen *pscreen, #endif if (timeout != PIPE_TIMEOUT_INFINITE && os_time_get() - start_time >= timeout) { - return FALSE; + break; } } - return TRUE; + return rscreen->fences.data[rfence->index] != 0; } static int r600_interpret_tiling(struct r600_screen *rscreen, uint32_t tiling_config) diff --git a/src/gallium/drivers/r600/r600_pipe.h b/src/gallium/drivers/r600/r600_pipe.h index f130617..d8e396a 100644 --- a/src/gallium/drivers/r600/r600_pipe.h +++ b/src/gallium/drivers/r600/r600_pipe.h @@ -204,6 +204,7 @@ struct r600_textures_info { struct r600_fence { struct pipe_reference reference; unsignedindex; /* in the shared bo */ + struct
Re: [Mesa-dev] [PATCHv3] r600g: Use a fake reloc to sleep for fences
On Thursday 9 February 2012 20:31:18 Marek Olšák wrote: > 2012/2/9 Simon Farnsworth : > > On Wednesday 8 February 2012 18:28:05 Michel Dänzer wrote: > >> On Fre, 2012-02-03 at 17:32 +, Simon Farnsworth wrote: > >> > --- a/src/gallium/drivers/r600/r600_hw_context.c > >> > +++ b/src/gallium/drivers/r600/r600_hw_context.c > >> > @@ -1623,6 +1623,13 @@ void r600_context_emit_fence(struct > >> > + > >> > + /* Create a dummy BO so that fence_finish without a timeout > >> > can sleep waiting for completion */ + *sleep_bo = (struct > >> > r600_resource*) + > >> > pipe_buffer_create(&ctx->screen->screen, PIPE_BIND_CUSTOM, + > >> > PIPE_USAGE_STAGING, 1); + /* > >> > Add the fence as a dummy relocation. */ > >> > + r600_context_bo_reloc(ctx, *sleep_bo, RADEON_USAGE_READWRITE); > >> > >> Sorry for only thinking of this now, but what's the advantage of doing > >> this here, rather than in r600_create_fence()? Seems like that would > >> be > >> simpler and cleaner. > > > > I've done it here because this is the bit of code that deals with all > > the > > hardware-related stuff, and it already knows about writing out > > relocations. I have no particular opinion either way, though, and > > wouldn't get upset about moving it. Thoughts? > > FWIW, the "hw_context" files are there for historical reasons only. There is > nothing wrong with adding relocations and emitting commands from the other > places of the driver. Eventually, r600.h, r600_pipe.h, and > r600_hw_context_priv.h will be merged together. > OK, so when I rebase this onto master (which will be v4), I should move the sleep_bo into r600_create_fence. Is there anything else that needs to be done before this can be considered for merging to master? -- Simon Farnsworth Software Engineer ONELAN Ltd http://www.onelan.com signature.asc Description: This is a digitally signed message part. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCHv3] r600g: Use a fake reloc to sleep for fences
On Wednesday 8 February 2012 18:28:05 Michel Dänzer wrote: > On Fre, 2012-02-03 at 17:32 +0000, Simon Farnsworth wrote: > > r300g is able to sleep until a fence completes rather than busywait > > because it creates a special buffer object and relocation that stays > > busy until the CS containing the fence is finished. > > > > Copy the idea into r600g, and use it to sleep if the user asked for an > > infinite wait, falling back to busywaiting if the user provided a > > timeout. > > > > Signed-off-by: Simon Farnsworth > > --- > > > > v3: At Dave Airlie's suggestion on dri-devel, make use of the BO to give > > us a fighting chance of recovering after a GPU reset. > > > > We know that the fence will never be signalled by hardware if the dummy > > BO has gone idle - we use that information to escape the loop. This > > might be a useful addition to the 8.0 branch. > > That's a nifty trick. :) > It's just a shame that the rest of recovering from a GPU reset isn't as simple - at least this stops us busywaiting eternally after the kernel's detected a CP stall. > > diff --git a/src/gallium/drivers/r600/r600_hw_context.c > > b/src/gallium/drivers/r600/r600_hw_context.c index 8eb8e6d..acfa494 > > 100644 > > --- a/src/gallium/drivers/r600/r600_hw_context.c > > +++ b/src/gallium/drivers/r600/r600_hw_context.c > > @@ -1623,6 +1623,13 @@ void r600_context_emit_fence(struct r600_context > > *ctx, struct r600_resource *fen> > > ctx->pm4[ctx->pm4_cdwords++] = 0; /* DATA_HI > > */ > > ctx->pm4[ctx->pm4_cdwords++] = PKT3(PKT3_NOP, 0, 0); > > ctx->pm4[ctx->pm4_cdwords++] = r600_context_bo_reloc(ctx, fence_bo, > > RADEON_USAGE_WRITE);> > > + > > + /* Create a dummy BO so that fence_finish without a timeout can sleep > > waiting for completion */ + *sleep_bo = (struct r600_resource*) > > + pipe_buffer_create(&ctx->screen->screen, > > PIPE_BIND_CUSTOM, > > + PIPE_USAGE_STAGING, 1); > > + /* Add the fence as a dummy relocation. */ > > + r600_context_bo_reloc(ctx, *sleep_bo, RADEON_USAGE_READWRITE); > > Sorry for only thinking of this now, but what's the advantage of doing > this here, rather than in r600_create_fence()? Seems like that would be > simpler and cleaner. I've done it here because this is the bit of code that deals with all the hardware-related stuff, and it already knows about writing out relocations. I have no particular opinion either way, though, and wouldn't get upset about moving it. Thoughts? -- Simon Farnsworth Software Engineer ONELAN Ltd http://www.onelan.com signature.asc Description: This is a digitally signed message part. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCHv3] r600g: Use a fake reloc to sleep for fences
r300g is able to sleep until a fence completes rather than busywait because it creates a special buffer object and relocation that stays busy until the CS containing the fence is finished. Copy the idea into r600g, and use it to sleep if the user asked for an infinite wait, falling back to busywaiting if the user provided a timeout. Signed-off-by: Simon Farnsworth --- v3: At Dave Airlie's suggestion on dri-devel, make use of the BO to give us a fighting chance of recovering after a GPU reset. We know that the fence will never be signalled by hardware if the dummy BO has gone idle - we use that information to escape the loop. This might be a useful addition to the 8.0 branch. v2: Address Michel's review comments. The fake bo is now unconditional, and is unreferenced when the fence is moved to the fence pool by fence_reference. This entailed shifting to r600_resource, which cleaned the code up a little to boot. src/gallium/drivers/r600/r600.h|2 +- src/gallium/drivers/r600/r600_hw_context.c |9 - src/gallium/drivers/r600/r600_pipe.c | 18 +++--- src/gallium/drivers/r600/r600_pipe.h |1 + 4 files changed, 25 insertions(+), 5 deletions(-) diff --git a/src/gallium/drivers/r600/r600.h b/src/gallium/drivers/r600/r600.h index baf09c1..c5813c2 100644 --- a/src/gallium/drivers/r600/r600.h +++ b/src/gallium/drivers/r600/r600.h @@ -284,7 +284,7 @@ void r600_context_queries_resume(struct r600_context *ctx); void r600_query_predication(struct r600_context *ctx, struct r600_query *query, int operation, int flag_wait); void r600_context_emit_fence(struct r600_context *ctx, struct r600_resource *fence, - unsigned offset, unsigned value); + unsigned offset, unsigned value, struct r600_resource **sleep_bo); void r600_context_flush_all(struct r600_context *ctx, unsigned flush_flags); void r600_context_flush_dest_caches(struct r600_context *ctx); diff --git a/src/gallium/drivers/r600/r600_hw_context.c b/src/gallium/drivers/r600/r600_hw_context.c index 8eb8e6d..acfa494 100644 --- a/src/gallium/drivers/r600/r600_hw_context.c +++ b/src/gallium/drivers/r600/r600_hw_context.c @@ -1603,7 +1603,7 @@ void r600_context_flush(struct r600_context *ctx, unsigned flags) } } -void r600_context_emit_fence(struct r600_context *ctx, struct r600_resource *fence_bo, unsigned offset, unsigned value) +void r600_context_emit_fence(struct r600_context *ctx, struct r600_resource *fence_bo, unsigned offset, unsigned value, struct r600_resource **sleep_bo) { uint64_t va; @@ -1623,6 +1623,13 @@ void r600_context_emit_fence(struct r600_context *ctx, struct r600_resource *fen ctx->pm4[ctx->pm4_cdwords++] = 0; /* DATA_HI */ ctx->pm4[ctx->pm4_cdwords++] = PKT3(PKT3_NOP, 0, 0); ctx->pm4[ctx->pm4_cdwords++] = r600_context_bo_reloc(ctx, fence_bo, RADEON_USAGE_WRITE); + + /* Create a dummy BO so that fence_finish without a timeout can sleep waiting for completion */ + *sleep_bo = (struct r600_resource*) + pipe_buffer_create(&ctx->screen->screen, PIPE_BIND_CUSTOM, + PIPE_USAGE_STAGING, 1); + /* Add the fence as a dummy relocation. */ + r600_context_bo_reloc(ctx, *sleep_bo, RADEON_USAGE_READWRITE); } static unsigned r600_query_read_result(char *map, unsigned start_index, unsigned end_index, diff --git a/src/gallium/drivers/r600/r600_pipe.c b/src/gallium/drivers/r600/r600_pipe.c index c38fbc5..9b6015c 100644 --- a/src/gallium/drivers/r600/r600_pipe.c +++ b/src/gallium/drivers/r600/r600_pipe.c @@ -115,7 +115,7 @@ static struct r600_fence *r600_create_fence(struct r600_pipe_context *ctx) pipe_reference_init(&fence->reference, 1); rscreen->fences.data[fence->index] = 0; - r600_context_emit_fence(&ctx->ctx, rscreen->fences.bo, fence->index, 1); + r600_context_emit_fence(&ctx->ctx, rscreen->fences.bo, fence->index, 1, &fence->sleep_bo); out: pipe_mutex_unlock(rscreen->fences.mutex); return fence; @@ -572,6 +572,7 @@ static void r600_fence_reference(struct pipe_screen *pscreen, if (pipe_reference(&(*oldf)->reference, &newf->reference)) { struct r600_screen *rscreen = (struct r600_screen *)pscreen; pipe_mutex_lock(rscreen->fences.mutex); + pipe_resource_reference((struct pipe_resource**)&(*oldf)->sleep_bo, NULL); LIST_ADDTAIL(&(*oldf)->head, &rscreen->fences.pool); pipe_mutex_unlock(rscreen->fences.mutex); } @@ -605,6 +606,17 @@ static boolean r600_fence_finish(struct pipe_screen *pscreen, } while (rscreen->fences.data[rfence->index] == 0) { +
[Mesa-dev] [PATCH] r600g: Use a fake reloc to sleep for fences
r300g is able to sleep until a fence completes rather than busywait because it creates a special buffer object and relocation that stays busy until the CS containing the fence is finished. Copy the idea into r600g, and use it to sleep if the user asked for an infinite wait, falling back to busywaiting if the user provided a timeout. Signed-off-by: Simon Farnsworth --- v2: Address Michel's review comments. The fake bo is now unconditional, and is unreferenced when the fence is moved to the fence pool by fence_reference. This entailed shifting to r600_resource, which cleaned the code up a little to boot. src/gallium/drivers/r600/r600.h|2 +- src/gallium/drivers/r600/r600_hw_context.c |9 - src/gallium/drivers/r600/r600_pipe.c |9 - src/gallium/drivers/r600/r600_pipe.h |1 + 4 files changed, 18 insertions(+), 3 deletions(-) diff --git a/src/gallium/drivers/r600/r600.h b/src/gallium/drivers/r600/r600.h index baf09c1..c5813c2 100644 --- a/src/gallium/drivers/r600/r600.h +++ b/src/gallium/drivers/r600/r600.h @@ -284,7 +284,7 @@ void r600_context_queries_resume(struct r600_context *ctx); void r600_query_predication(struct r600_context *ctx, struct r600_query *query, int operation, int flag_wait); void r600_context_emit_fence(struct r600_context *ctx, struct r600_resource *fence, - unsigned offset, unsigned value); + unsigned offset, unsigned value, struct r600_resource **sleep_bo); void r600_context_flush_all(struct r600_context *ctx, unsigned flush_flags); void r600_context_flush_dest_caches(struct r600_context *ctx); diff --git a/src/gallium/drivers/r600/r600_hw_context.c b/src/gallium/drivers/r600/r600_hw_context.c index 8eb8e6d..acfa494 100644 --- a/src/gallium/drivers/r600/r600_hw_context.c +++ b/src/gallium/drivers/r600/r600_hw_context.c @@ -1603,7 +1603,7 @@ void r600_context_flush(struct r600_context *ctx, unsigned flags) } } -void r600_context_emit_fence(struct r600_context *ctx, struct r600_resource *fence_bo, unsigned offset, unsigned value) +void r600_context_emit_fence(struct r600_context *ctx, struct r600_resource *fence_bo, unsigned offset, unsigned value, struct r600_resource **sleep_bo) { uint64_t va; @@ -1623,6 +1623,13 @@ void r600_context_emit_fence(struct r600_context *ctx, struct r600_resource *fen ctx->pm4[ctx->pm4_cdwords++] = 0; /* DATA_HI */ ctx->pm4[ctx->pm4_cdwords++] = PKT3(PKT3_NOP, 0, 0); ctx->pm4[ctx->pm4_cdwords++] = r600_context_bo_reloc(ctx, fence_bo, RADEON_USAGE_WRITE); + + /* Create a dummy BO so that fence_finish without a timeout can sleep waiting for completion */ + *sleep_bo = (struct r600_resource*) + pipe_buffer_create(&ctx->screen->screen, PIPE_BIND_CUSTOM, + PIPE_USAGE_STAGING, 1); + /* Add the fence as a dummy relocation. */ + r600_context_bo_reloc(ctx, *sleep_bo, RADEON_USAGE_READWRITE); } static unsigned r600_query_read_result(char *map, unsigned start_index, unsigned end_index, diff --git a/src/gallium/drivers/r600/r600_pipe.c b/src/gallium/drivers/r600/r600_pipe.c index c38fbc5..748869a 100644 --- a/src/gallium/drivers/r600/r600_pipe.c +++ b/src/gallium/drivers/r600/r600_pipe.c @@ -115,7 +115,7 @@ static struct r600_fence *r600_create_fence(struct r600_pipe_context *ctx) pipe_reference_init(&fence->reference, 1); rscreen->fences.data[fence->index] = 0; - r600_context_emit_fence(&ctx->ctx, rscreen->fences.bo, fence->index, 1); + r600_context_emit_fence(&ctx->ctx, rscreen->fences.bo, fence->index, 1, &fence->sleep_bo); out: pipe_mutex_unlock(rscreen->fences.mutex); return fence; @@ -572,6 +572,7 @@ static void r600_fence_reference(struct pipe_screen *pscreen, if (pipe_reference(&(*oldf)->reference, &newf->reference)) { struct r600_screen *rscreen = (struct r600_screen *)pscreen; pipe_mutex_lock(rscreen->fences.mutex); + pipe_resource_reference((struct pipe_resource**)&(*oldf)->sleep_bo, NULL); LIST_ADDTAIL(&(*oldf)->head, &rscreen->fences.pool); pipe_mutex_unlock(rscreen->fences.mutex); } @@ -605,6 +606,12 @@ static boolean r600_fence_finish(struct pipe_screen *pscreen, } while (rscreen->fences.data[rfence->index] == 0) { + /* Special-case infinite timeout */ + if (timeout == PIPE_TIMEOUT_INFINITE) { + rscreen->ws->buffer_wait(rfence->sleep_bo->buf, RADEON_USAGE_READWRITE); + continue; + } + if (++spins % 256) continue;
Re: [Mesa-dev] [PATCH] r600g: Use a fake reloc to sleep for fences
On Wednesday 1 February 2012, Michel Dänzer wrote: > On Mit, 2012-02-01 at 15:01 +0000, Simon Farnsworth wrote: > > + if (sleep_bo) { > > + unsigned reloc_index; > > + /* Create a dummy BO so that fence_finish without a timeout can > > sleep waiting for completion */ > > + *sleep_bo = ctx->ws->buffer_create(ctx->ws, 1, 1, > > + PIPE_BIND_CUSTOM, > > + RADEON_DOMAIN_GTT); > > + /* Add the fence as a dummy relocation. */ > > + reloc_index = ctx->ws->cs_add_reloc(ctx->cs, > > + > > ctx->ws->buffer_get_cs_handle(*sleep_bo), > > + RADEON_USAGE_READWRITE, > > RADEON_DOMAIN_GTT); > > + if (reloc_index >= ctx->creloc) > > + ctx->creloc = reloc_index+1; > > + } > > Is there a point in making sleep_bo optional? > I can't think of a reason to make it optional; I'll remove that in v2. > > > diff --git a/src/gallium/drivers/r600/r600_pipe.c > > b/src/gallium/drivers/r600/r600_pipe.c > > index c38fbc5..71e31b1 100644 > > --- a/src/gallium/drivers/r600/r600_pipe.c > > +++ b/src/gallium/drivers/r600/r600_pipe.c > > @@ -605,6 +605,14 @@ static boolean r600_fence_finish(struct pipe_screen > > *pscreen, > > } > > > > while (rscreen->fences.data[rfence->index] == 0) { > > + /* Special-case infinite timeout */ > > + if (timeout == PIPE_TIMEOUT_INFINITE && > > + rfence->sleep_bo) { > > + rscreen->ws->buffer_wait(rfence->sleep_bo, > > RADEON_USAGE_READWRITE); > > + pb_reference(&rfence->sleep_bo, NULL); > > + continue; > > + } > > I think rfence->sleep_bo should only be unreferenced in > r600_fence_reference() when the fence is recycled, otherwise it'll be > leaked if r600_fence_finish() is never called for some reason. > I'll fix this in v2. > If r600_fence_finish() only ever called os_time_sleep(), never > sched_yield() (like r300_fence_finish()), would that avoid your problem > even with a finite timeout? > I experimented with that - depending on the specific workload, I need the timeout to vary, otherwise I can see the impact of the loop in terms of bad latency behaviour (resulting in occasional dropped frames). For the workloads I tried, I needed the sleep to vary between 1 usec (for low-complexity workloads) and 100 usec (for high complexity workloads). Recompiling Mesa for each workload is obviously not an option. I did try an adaptive spin - essentially removing the "if (spins++ % 256) continue", and adding: if (spins < 40) os_sleep_time(1); else if (spins < 100) os_sleep_time(10); else os_sleep_time(100); But I felt this was ugly, when the core problem is that I want to sleep until completion, the hardware has support for sleeping until completion, and the only reason I can't is deficiencies in the driver stack. Fundamentally, I suspect that the reason I'm seeing pain from this and other people aren't is that I'm comparing an AMD E-350 to an Intel Atom D510, and I've tuned my software stack on the D510 to within an inch of its life. My expectation is that the better GPU in the E-350 will make my 2D graphics-intensive workload (OpenGL compositing of 2D movies) perform about as well as it did on the D510 - sleep-based waiting for fence completion gets in the way, as the D510 has slightly more CPU power than the E-350, and I'm not (yet) fully exploiting the E-350's GPU. -- Simon Farnsworth Software Engineer ONELAN Limited http://www.onelan.com/ signature.asc Description: This is a digitally signed message part. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] r600g: Use a fake reloc to sleep for fences
On Wednesday 1 February 2012, Simon Farnsworth wrote: > This is a userspace only fix for my problem, as suggested by Mark Olšák. It My apologies, Marek; my typing appears to have failed me, and renamed you to Mark. I shall try not to make that mistake again. -- Simon Farnsworth Software Engineer ONELAN Limited http://www.onelan.com/ signature.asc Description: This is a digitally signed message part. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] r600g: Use a fake reloc to sleep for fences
r300g is able to sleep until a fence completes rather than busywait because it creates a special buffer object and relocation that stays busy until the CS containing the fence is finished. Copy the idea into r600g, and use it to sleep if the user asked for an infinite wait, falling back to busywaiting if the user provided a timeout. Signed-off-by: Simon Farnsworth --- This is a userspace only fix for my problem, as suggested by Mark Olšák. It doesn't fix the case with a timeout (which doesn't matter to me), but works on existing kernels. Given that my previous suggested fix opened a can of worms (mostly because I don't fully understand modern Radeon hardware), this is probably enough of a fix to apply for now, leaving a proper fix for someone with more understanding of the way multiple rings interact. src/gallium/drivers/r600/r600.h|2 +- src/gallium/drivers/r600/r600_hw_context.c | 16 +++- src/gallium/drivers/r600/r600_pipe.c | 10 +- src/gallium/drivers/r600/r600_pipe.h |1 + 4 files changed, 26 insertions(+), 3 deletions(-) diff --git a/src/gallium/drivers/r600/r600.h b/src/gallium/drivers/r600/r600.h index baf09c1..bac4890 100644 --- a/src/gallium/drivers/r600/r600.h +++ b/src/gallium/drivers/r600/r600.h @@ -284,7 +284,7 @@ void r600_context_queries_resume(struct r600_context *ctx); void r600_query_predication(struct r600_context *ctx, struct r600_query *query, int operation, int flag_wait); void r600_context_emit_fence(struct r600_context *ctx, struct r600_resource *fence, - unsigned offset, unsigned value); + unsigned offset, unsigned value, struct pb_buffer **sleep_bo); void r600_context_flush_all(struct r600_context *ctx, unsigned flush_flags); void r600_context_flush_dest_caches(struct r600_context *ctx); diff --git a/src/gallium/drivers/r600/r600_hw_context.c b/src/gallium/drivers/r600/r600_hw_context.c index 8eb8e6d..cc81c48 100644 --- a/src/gallium/drivers/r600/r600_hw_context.c +++ b/src/gallium/drivers/r600/r600_hw_context.c @@ -1603,7 +1603,7 @@ void r600_context_flush(struct r600_context *ctx, unsigned flags) } } -void r600_context_emit_fence(struct r600_context *ctx, struct r600_resource *fence_bo, unsigned offset, unsigned value) +void r600_context_emit_fence(struct r600_context *ctx, struct r600_resource *fence_bo, unsigned offset, unsigned value, struct pb_buffer **sleep_bo) { uint64_t va; @@ -1623,6 +1623,20 @@ void r600_context_emit_fence(struct r600_context *ctx, struct r600_resource *fen ctx->pm4[ctx->pm4_cdwords++] = 0; /* DATA_HI */ ctx->pm4[ctx->pm4_cdwords++] = PKT3(PKT3_NOP, 0, 0); ctx->pm4[ctx->pm4_cdwords++] = r600_context_bo_reloc(ctx, fence_bo, RADEON_USAGE_WRITE); + + if (sleep_bo) { + unsigned reloc_index; + /* Create a dummy BO so that fence_finish without a timeout can sleep waiting for completion */ + *sleep_bo = ctx->ws->buffer_create(ctx->ws, 1, 1, + PIPE_BIND_CUSTOM, + RADEON_DOMAIN_GTT); + /* Add the fence as a dummy relocation. */ + reloc_index = ctx->ws->cs_add_reloc(ctx->cs, + ctx->ws->buffer_get_cs_handle(*sleep_bo), + RADEON_USAGE_READWRITE, RADEON_DOMAIN_GTT); + if (reloc_index >= ctx->creloc) + ctx->creloc = reloc_index+1; + } } static unsigned r600_query_read_result(char *map, unsigned start_index, unsigned end_index, diff --git a/src/gallium/drivers/r600/r600_pipe.c b/src/gallium/drivers/r600/r600_pipe.c index c38fbc5..71e31b1 100644 --- a/src/gallium/drivers/r600/r600_pipe.c +++ b/src/gallium/drivers/r600/r600_pipe.c @@ -115,7 +115,7 @@ static struct r600_fence *r600_create_fence(struct r600_pipe_context *ctx) pipe_reference_init(&fence->reference, 1); rscreen->fences.data[fence->index] = 0; - r600_context_emit_fence(&ctx->ctx, rscreen->fences.bo, fence->index, 1); + r600_context_emit_fence(&ctx->ctx, rscreen->fences.bo, fence->index, 1, &fence->sleep_bo); out: pipe_mutex_unlock(rscreen->fences.mutex); return fence; @@ -605,6 +605,14 @@ static boolean r600_fence_finish(struct pipe_screen *pscreen, } while (rscreen->fences.data[rfence->index] == 0) { + /* Special-case infinite timeout */ + if (timeout == PIPE_TIMEOUT_INFINITE && + rfence->sleep_bo) { + rscreen->ws->buffer_wait(rfence->sleep_bo, RADEON_USAGE_READWRITE); +
[Mesa-dev] [PATCHv2] drm/radeon: Add support for userspace fence waits
Userspace currently busywaits for fences to complete; on my workload, this busywait consumes 10% of the available CPU time. Provide an ioctl so that userspace can wait for an EOP interrupt that corresponds to a previous EVENT_WRITE_EOP. Signed-off-by: Simon Farnsworth --- There's debate ongoing about whether this is a sensible interface; I've cleaned up in accordance with Dave Airlie's review comments, so that there's a nicer starting point if the debate ends up agreeing that this is actually the sort of interface we want. In the meantime, I'm going to leave this interface alone, and work on porting r300g's fence mechanism to r600g, using it to sleep instead of spinning when fences are issued with an infinite timeout, but using the existing busywait mechanism if a timeout is provided. drivers/gpu/drm/radeon/radeon.h |3 ++ drivers/gpu/drm/radeon/radeon_drv.c |3 +- drivers/gpu/drm/radeon/radeon_fence.c |2 + drivers/gpu/drm/radeon/radeon_gem.c | 63 + drivers/gpu/drm/radeon/radeon_kms.c |1 + include/drm/radeon_drm.h | 31 6 files changed, 102 insertions(+), 1 deletions(-) diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h index 2859406..00c187b 100644 --- a/drivers/gpu/drm/radeon/radeon.h +++ b/drivers/gpu/drm/radeon/radeon.h @@ -217,6 +217,7 @@ struct radeon_fence_driver { unsigned long last_jiffies; unsigned long last_timeout; wait_queue_head_t queue; + wait_queue_head_t userspace_queue; struct list_headcreated; struct list_heademitted; struct list_headsignaled; @@ -1348,6 +1349,8 @@ int radeon_gem_set_tiling_ioctl(struct drm_device *dev, void *data, struct drm_file *filp); int radeon_gem_get_tiling_ioctl(struct drm_device *dev, void *data, struct drm_file *filp); +int radeon_gem_wait_user_fence_ioctl(struct drm_device *dev, void *data, +struct drm_file *filp); /* VRAM scratch page for HDP bug, default vram page */ struct r600_vram_scratch { diff --git a/drivers/gpu/drm/radeon/radeon_drv.c b/drivers/gpu/drm/radeon/radeon_drv.c index 4ae2e1d..9f82fa9 100644 --- a/drivers/gpu/drm/radeon/radeon_drv.c +++ b/drivers/gpu/drm/radeon/radeon_drv.c @@ -56,9 +56,10 @@ * 2.12.0 - RADEON_CS_KEEP_TILING_FLAGS * 2.13.0 - virtual memory support * 2.14.0 - add evergreen tiling informations + * 2.15.0 - gem_wait_user_fence ioctl */ #define KMS_DRIVER_MAJOR 2 -#define KMS_DRIVER_MINOR 14 +#define KMS_DRIVER_MINOR 15 #define KMS_DRIVER_PATCHLEVEL 0 int radeon_driver_load_kms(struct drm_device *dev, unsigned long flags); int radeon_driver_unload_kms(struct drm_device *dev); diff --git a/drivers/gpu/drm/radeon/radeon_fence.c b/drivers/gpu/drm/radeon/radeon_fence.c index 64ea3dd..d86bc28 100644 --- a/drivers/gpu/drm/radeon/radeon_fence.c +++ b/drivers/gpu/drm/radeon/radeon_fence.c @@ -356,6 +356,7 @@ void radeon_fence_process(struct radeon_device *rdev, int ring) if (wake) { wake_up_all(&rdev->fence_drv[ring].queue); } + wake_up_interruptible_all(&rdev->fence_drv[ring].userspace_queue); } int radeon_fence_count_emitted(struct radeon_device *rdev, int ring) @@ -421,6 +422,7 @@ static void radeon_fence_driver_init_ring(struct radeon_device *rdev, int ring) INIT_LIST_HEAD(&rdev->fence_drv[ring].emitted); INIT_LIST_HEAD(&rdev->fence_drv[ring].signaled); init_waitqueue_head(&rdev->fence_drv[ring].queue); + init_waitqueue_head(&rdev->fence_drv[ring].userspace_queue); rdev->fence_drv[ring].initialized = false; } diff --git a/drivers/gpu/drm/radeon/radeon_gem.c b/drivers/gpu/drm/radeon/radeon_gem.c index 7337850..765fc6d 100644 --- a/drivers/gpu/drm/radeon/radeon_gem.c +++ b/drivers/gpu/drm/radeon/radeon_gem.c @@ -531,3 +531,66 @@ int radeon_mode_dumb_destroy(struct drm_file *file_priv, { return drm_gem_handle_delete(file_priv, handle); } + +int radeon_gem_wait_user_fence_ioctl(struct drm_device *dev, void *data, + struct drm_file *filp) +{ + struct drm_radeon_gem_wait_user_fence *args = data; + struct radeon_device *rdev = dev->dev_private; + struct drm_gem_object *gobj; + struct radeon_bo *robj; + void *buffer_data; + uint32_t *fence_data; + int r = 0; + long timeout; + int ring = RADEON_RING_TYPE_GFX_INDEX; + + /* If you're implementing this for other rings, you'll want to share + code with radeon_cs_get_ring in radeon_cs.c */ + if (args->ring != RADEON_CS_RING_GFX) { + return -EINVAL; +
[Mesa-dev] [PATCHv2] r600g: Use new kernel interface to wait for fences
Instead of busywaiting for the GPU to finish a fence, use the new kernel interface to wait for fence completion. If the new kernel interface is unavailable, fall back to busywaiting. Signed-off-by: Simon Farnsworth --- This is simply addressing Michel's review comments against the v1 patch. There is ongoing debate on the dri-devel list about the required kernel interface to make this patch useful; until that discussion is resolved, this patch should probably not be applied. src/gallium/drivers/r600/r600_hw_context.c|2 +- src/gallium/drivers/r600/r600_pipe.c | 14 +++-- src/gallium/winsys/radeon/drm/radeon_drm_bo.c | 34 + src/gallium/winsys/radeon/drm/radeon_winsys.h | 16 +++ 4 files changed, 56 insertions(+), 10 deletions(-) diff --git a/src/gallium/drivers/r600/r600_hw_context.c b/src/gallium/drivers/r600/r600_hw_context.c index 8eb8e6d..35a57a7 100644 --- a/src/gallium/drivers/r600/r600_hw_context.c +++ b/src/gallium/drivers/r600/r600_hw_context.c @@ -1618,7 +1618,7 @@ void r600_context_emit_fence(struct r600_context *ctx, struct r600_resource *fen ctx->pm4[ctx->pm4_cdwords++] = EVENT_TYPE(EVENT_TYPE_CACHE_FLUSH_AND_INV_TS_EVENT) | EVENT_INDEX(5); ctx->pm4[ctx->pm4_cdwords++] = va & 0xUL; /* ADDRESS_LO */ /* DATA_SEL | INT_EN | ADDRESS_HI */ - ctx->pm4[ctx->pm4_cdwords++] = (1 << 29) | (0 << 24) | ((va >> 32UL) & 0xFF); + ctx->pm4[ctx->pm4_cdwords++] = (1 << 29) | (2 << 24) | ((va >> 32UL) & 0xFF); ctx->pm4[ctx->pm4_cdwords++] = value; /* DATA_LO */ ctx->pm4[ctx->pm4_cdwords++] = 0; /* DATA_HI */ ctx->pm4[ctx->pm4_cdwords++] = PKT3(PKT3_NOP, 0, 0); diff --git a/src/gallium/drivers/r600/r600_pipe.c b/src/gallium/drivers/r600/r600_pipe.c index c38fbc5..c03a176 100644 --- a/src/gallium/drivers/r600/r600_pipe.c +++ b/src/gallium/drivers/r600/r600_pipe.c @@ -595,7 +595,6 @@ static boolean r600_fence_finish(struct pipe_screen *pscreen, struct r600_screen *rscreen = (struct r600_screen *)pscreen; struct r600_fence *rfence = (struct r600_fence*)fence; int64_t start_time = 0; - unsigned spins = 0; if (timeout != PIPE_TIMEOUT_INFINITE) { start_time = os_time_get(); @@ -605,16 +604,13 @@ static boolean r600_fence_finish(struct pipe_screen *pscreen, } while (rscreen->fences.data[rfence->index] == 0) { - if (++spins % 256) - continue; -#ifdef PIPE_OS_UNIX - sched_yield(); -#else - os_time_sleep(10); -#endif + rscreen->ws->buffer_wait_fence(rscreen->fences.bo->buf, + rfence->index << 2, + 0, + timeout); if (timeout != PIPE_TIMEOUT_INFINITE && os_time_get() - start_time >= timeout) { - return FALSE; + return rscreen->fences.data[rfence->index] == 0 ? FALSE : TRUE; } } diff --git a/src/gallium/winsys/radeon/drm/radeon_drm_bo.c b/src/gallium/winsys/radeon/drm/radeon_drm_bo.c index 143dcf9..b83791d 100644 --- a/src/gallium/winsys/radeon/drm/radeon_drm_bo.c +++ b/src/gallium/winsys/radeon/drm/radeon_drm_bo.c @@ -879,6 +879,36 @@ static uint64_t radeon_winsys_bo_va(struct pb_buffer *buffer) return bo->va; } +/* No kernel support for doing this faster - just spin */ +static void radeon_winsys_bo_wait_fence_nokernel(struct pb_buffer *buf, +unsigned offset, +uint32_t value, +uint64_t timeout) +{ +#ifdef PIPE_OS_UNIX +sched_yield(); +#else +os_time_sleep(10); +#endif +} + +static void radeon_winsys_bo_wait_fence(struct pb_buffer *_buf, + unsigned offset, + uint32_t value, + uint64_t timeout) +{ +struct radeon_bo *bo = get_radeon_bo(_buf); +struct drm_radeon_gem_wait_user_fence args; +memset(&args, 0, sizeof(args)); +args.handle = bo->handle; +args.ring = RADEON_CS_RING_GFX; +args.offset = offset; +args.value = value; +args.timeout_usec = timeout; +while (drmCommandWrite(bo->rws->fd, DRM_RADEON_GEM_WAIT_USER_FENCE, + &args, sizeof(args)) == -EBUSY); +} + void radeon_bomgr_init_functions(struct radeon_drm_winsys *ws) { ws->base.buffer_get_cs_handle = radeon_drm_get_cs_handle; @@ -892,4 +922,8 @@ void radeon_bomgr_init_functions(stru
Re: [Mesa-dev] [PATCH] r600g: Use new kernel interface to wait for fences
On Tuesday 31 January 2012, Michel Dänzer wrote: > On Die, 2012-01-31 at 17:02 +0000, Simon Farnsworth wrote: > > Instead of busywaiting for the GPU to finish a fence, use the new kernel > > interface to wait for fence completion. > > > > If the new kernel interface is unavailable, fall back to busywaiting. > > > > if (timeout != PIPE_TIMEOUT_INFINITE && > > os_time_get() - start_time >= timeout) { > > return FALSE; > > Maybe add something like > > if (rscreen->fences.data[rfence->index]) > return TRUE; > > before the timeout check? Otherwise there may be a false negative if the > fence signalled just before the timeout. I'll fix this - I think I'd prefer to use a ternary operator in the return. > > diff --git a/src/gallium/winsys/radeon/drm/radeon_drm_bo.c > > b/src/gallium/winsys/radeon/drm/radeon_drm_bo.c > > index 143dcf9..487fc58 100644 > > --- a/src/gallium/winsys/radeon/drm/radeon_drm_bo.c > > +++ b/src/gallium/winsys/radeon/drm/radeon_drm_bo.c > > @@ -892,4 +922,9 @@ void radeon_bomgr_init_functions(struct > > radeon_drm_winsys *ws) > > ws->base.buffer_from_handle = radeon_winsys_bo_from_handle; > > ws->base.buffer_get_handle = radeon_winsys_bo_get_handle; > > ws->base.buffer_get_virtual_address = radeon_winsys_bo_va; > > +if (ws->info.drm_major > 2 || > > +(ws->info.drm_major == 2 && ws->info.drm_minor >= 15)) > > +ws->base.buffer_wait_fence = radeon_winsys_bo_wait_fence; > > +else > > +ws->base.buffer_wait_fence = radeon_winsys_bo_wait_fence_nokernel; > > } > > We have no idea what kind of API a hypothetical major version > 2 might > have, so I think it's better to only check for (ws->info.drm_minor >= > 15) here. > I'll make that change as well and respin. -- Simon Farnsworth Software Engineer ONELAN Limited http://www.onelan.com/ signature.asc Description: This is a digitally signed message part. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] Do not use sched_yield() on Linux
On Tuesday 31 January 2012, Alan Swanson wrote: > As discussed back in 2003, sched_yield() on Linux is no longer > equivalent to other POSIX variations. From a LWN article; "This call > used to simply move the process to the end of the run queue; now it > moves the process to the "expired" queue, effectively cancelling the > rest of the process's time slice. So a process calling sched_yield() now > must wait until all other runnable processes in the system have used up > their time slices before it will get the processor again." > > However its use on Linux has sneaked back in causing suboptimal > performance such as reported by Simon Farnsworth on r600g. Use sleep on > Linux instead. > > Signed-off-by: Alan Swanson > I should note that I'm running my 3D app real-time (SCHED_RR) - the performance pain I was facing is that I got to hog one CPU core waiting for the GPU, while having several (three or four, typically) other processes (SCHED_OTHER priority) waiting for free CPU time. As the E-350 is not overblessed with CPU time in the first place, this hurt. I've fixed my problem by a different route (in other patches sent to the list); I no longer spin at all when waiting for fences to complete, instead waiting for an interrupt from the GPU. Having said that, I'll rebase your patch on top of my changes, and confirm that it doesn't break things for me. -- Simon Farnsworth Software Engineer ONELAN Limited http://www.onelan.com/ signature.asc Description: This is a digitally signed message part. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] r600g: Use new kernel interface to wait for fences
Instead of busywaiting for the GPU to finish a fence, use the new kernel interface to wait for fence completion. If the new kernel interface is unavailable, fall back to busywaiting. Signed-off-by: Simon Farnsworth --- This builds on top of the kernel interface I add in Message-Id: <1328029169-12729-1-git-send-email-simon.farnswo...@onelan.co.uk> and depends on libdrm's copy of radeon_drm.h being synced with the changes made in that patch. As with the kernel patch, I'm working atop Jerome's 2D tiling work, so this may not apply to any current tree. I'm happy to rebase against another tree on request, though. src/gallium/drivers/r600/r600_hw_context.c|2 +- src/gallium/drivers/r600/r600_pipe.c | 12 +++- src/gallium/winsys/radeon/drm/radeon_drm_bo.c | 35 + src/gallium/winsys/radeon/drm/radeon_winsys.h | 16 +++ 4 files changed, 56 insertions(+), 9 deletions(-) diff --git a/src/gallium/drivers/r600/r600_hw_context.c b/src/gallium/drivers/r600/r600_hw_context.c index 8eb8e6d..35a57a7 100644 --- a/src/gallium/drivers/r600/r600_hw_context.c +++ b/src/gallium/drivers/r600/r600_hw_context.c @@ -1618,7 +1618,7 @@ void r600_context_emit_fence(struct r600_context *ctx, struct r600_resource *fen ctx->pm4[ctx->pm4_cdwords++] = EVENT_TYPE(EVENT_TYPE_CACHE_FLUSH_AND_INV_TS_EVENT) | EVENT_INDEX(5); ctx->pm4[ctx->pm4_cdwords++] = va & 0xUL; /* ADDRESS_LO */ /* DATA_SEL | INT_EN | ADDRESS_HI */ - ctx->pm4[ctx->pm4_cdwords++] = (1 << 29) | (0 << 24) | ((va >> 32UL) & 0xFF); + ctx->pm4[ctx->pm4_cdwords++] = (1 << 29) | (2 << 24) | ((va >> 32UL) & 0xFF); ctx->pm4[ctx->pm4_cdwords++] = value; /* DATA_LO */ ctx->pm4[ctx->pm4_cdwords++] = 0; /* DATA_HI */ ctx->pm4[ctx->pm4_cdwords++] = PKT3(PKT3_NOP, 0, 0); diff --git a/src/gallium/drivers/r600/r600_pipe.c b/src/gallium/drivers/r600/r600_pipe.c index c38fbc5..12c5bf5 100644 --- a/src/gallium/drivers/r600/r600_pipe.c +++ b/src/gallium/drivers/r600/r600_pipe.c @@ -595,7 +595,6 @@ static boolean r600_fence_finish(struct pipe_screen *pscreen, struct r600_screen *rscreen = (struct r600_screen *)pscreen; struct r600_fence *rfence = (struct r600_fence*)fence; int64_t start_time = 0; - unsigned spins = 0; if (timeout != PIPE_TIMEOUT_INFINITE) { start_time = os_time_get(); @@ -605,13 +604,10 @@ static boolean r600_fence_finish(struct pipe_screen *pscreen, } while (rscreen->fences.data[rfence->index] == 0) { - if (++spins % 256) - continue; -#ifdef PIPE_OS_UNIX - sched_yield(); -#else - os_time_sleep(10); -#endif + rscreen->ws->buffer_wait_fence(rscreen->fences.bo->buf, + rfence->index << 2, + 0, + timeout); if (timeout != PIPE_TIMEOUT_INFINITE && os_time_get() - start_time >= timeout) { return FALSE; diff --git a/src/gallium/winsys/radeon/drm/radeon_drm_bo.c b/src/gallium/winsys/radeon/drm/radeon_drm_bo.c index 143dcf9..487fc58 100644 --- a/src/gallium/winsys/radeon/drm/radeon_drm_bo.c +++ b/src/gallium/winsys/radeon/drm/radeon_drm_bo.c @@ -879,6 +879,36 @@ static uint64_t radeon_winsys_bo_va(struct pb_buffer *buffer) return bo->va; } +/* No kernel support for doing this faster - just spin */ +static void radeon_winsys_bo_wait_fence_nokernel(struct pb_buffer *buf, +unsigned offset, +uint32_t value, +uint64_t timeout) +{ +#ifdef PIPE_OS_UNIX +sched_yield(); +#else +os_time_sleep(10); +#endif +} + +static void radeon_winsys_bo_wait_fence(struct pb_buffer *_buf, + unsigned offset, + uint32_t value, + uint64_t timeout) +{ +struct radeon_bo *bo = get_radeon_bo(_buf); +struct drm_radeon_gem_wait_user_fence args; +memset(&args, 0, sizeof(args)); +args.handle = bo->handle; +args.ring = RADEON_CS_RING_GFX; +args.offset = offset; +args.value = value; +args.timeout_usec = timeout; +while (drmCommandWrite(bo->rws->fd, DRM_RADEON_GEM_WAIT_USER_FENCE, + &args, sizeof(args)) == -EBUSY); +} + void radeon_bomgr_init_functions(struct radeon_drm_winsys *ws) { ws->base.buffer_get_cs_handle = radeon_drm_get_cs_handle; @@ -892,4 +922,9 @@ void radeon_bomgr_init
Re: [Mesa-dev] [RFC] r600-r800 2D tiling
(resending due to my inability to work my e-mail client - I neither cc'd Jerome, nor used the correct identity, so the original appears to be held in moderation). On Thursday 12 January 2012, Jerome Glisse wrote: > Hi, > > I don't cross post as i am pretty sure all interested people are reading > this mailing-list. > > Attached is kernel, libdrm, ddx, mesa/r600g patches to enable 2D tiling > on r600 to cayman. I haven't yet done a full regression testing but 2D > tiling seems to work ok. I would like to get feedback on 2 things : > > - the kernel API I notice that you don't expose all the available Evergreen parameters to user control (TILE_SPLIT_BYTES, NUM_BANKS are both currently fixed by the kernel). Is this deliberate? It looks like it's leftovers from a previous attempt to force Evergreen's flexible 2D tiling to behave like R600's fixed-by-hardware 2D tiling. > - using libdrm/radeon as common place for surface allocation > > The second question especialy impact the layering/abstraction of gallium > btw winsys as it make libdrm/radeon_surface API a part of the winsys. > The ddx doesn't need as much knowledge as mesa (pretty much the whole > mipmap tree is pointless to the ddx). So anyone have strong feeling > about moving the whole mipmap tree computation to this common code ? > I'm in favour - it means that all the code relating to the details of how modern Radeons tile surfaces is in one place. I've looked at the API you introduce to handle this, and it should be very easy to port to a non-libdrm platform - the only element of the API that's currently tied to libdrm is radeon_surface_manager_new, so a new platform shouldn't struggle to adapt it. I do have one question; how are you intending to handle passing the tiling parameters from the DDX to Mesa for GLX_EXT_texture_from_pixmap? Right now, it works because the DDX uses the surface manager's defaults for tiling, as does Mesa; I would expect Mesa to read out the parameters as set in the kernel and use those. At a future date, I can envisage the DDX wanting to choose a different tiling layout for DRI2 buffers, or XComposite backing pixmaps (e.g. because someone's benchmarked it and found that choosing something beyond the bare minimum that meets constraints improves performance); it would be a shame if we can't do this because Mesa's not flexible enough. -- Simon Farnsworth Software Engineer ONELAN Limited http://www.onelan.com/ ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] r600g: Fix tiling alignment to match docs
On Tuesday 13 December 2011, Simon Farnsworth wrote: > R600 and Evergreen have different tiling requirements. Fix Mesa to match the > documented requirements. > > Signed-off-by: Simon Farnsworth > --- > > This doesn't fix my problems with enabling macro tiling, but it does help > somewhat. I also need to fix tile shape in the kernel and alignment in the > DDX to avoid misrendering, at which point I see CP lockups instead. > > I'm therefore sending this, the DDX patch, and the kernel patch out in order > to avoid getting stuck in a world where I have an 80% fix, someone else has > an 80% fix, and if only we talked, we'd have a 100% fix. > Jerome has told me that he's got closer to fixing 2D tiling than I have - I'm going to collaborate with him on fixing his remaining bugs rather than continue with these patches. -- Simon Farnsworth Software Engineer ONELAN Limited http://www.onelan.com/ ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] r600g: Fix tiling alignment to match docs
R600 and Evergreen have different tiling requirements. Fix Mesa to match the documented requirements. Signed-off-by: Simon Farnsworth --- This doesn't fix my problems with enabling macro tiling, but it does help somewhat. I also need to fix tile shape in the kernel and alignment in the DDX to avoid misrendering, at which point I see CP lockups instead. I'm therefore sending this, the DDX patch, and the kernel patch out in order to avoid getting stuck in a world where I have an 80% fix, someone else has an 80% fix, and if only we talked, we'd have a 100% fix. src/gallium/drivers/r600/evergreen_state.c |2 +- src/gallium/drivers/r600/r600_texture.c| 140 2 files changed, 103 insertions(+), 39 deletions(-) diff --git a/src/gallium/drivers/r600/evergreen_state.c b/src/gallium/drivers/r600/evergreen_state.c index d0c02d5..d6b97da 100644 --- a/src/gallium/drivers/r600/evergreen_state.c +++ b/src/gallium/drivers/r600/evergreen_state.c @@ -1447,7 +1447,7 @@ static void evergreen_cb(struct r600_pipe_context *rctx, struct r600_pipe_state offset >> 8, 0x, &rtex->resource, RADEON_USAGE_READWRITE); r600_pipe_state_add_reg(rstate, R_028C78_CB_COLOR0_DIM + cb * 0x3C, - 0x0, 0x, NULL, 0); + S_028C78_WIDTH_MAX(surf->base.width) | S_028C78_HEIGHT_MAX(surf->base.height), 0x, NULL, 0); r600_pipe_state_add_reg(rstate, R_028C70_CB_COLOR0_INFO + cb * 0x3C, color_info, 0x, &rtex->resource, RADEON_USAGE_READWRITE); diff --git a/src/gallium/drivers/r600/r600_texture.c b/src/gallium/drivers/r600/r600_texture.c index 6143133..16ffe23 100644 --- a/src/gallium/drivers/r600/r600_texture.c +++ b/src/gallium/drivers/r600/r600_texture.c @@ -88,23 +88,42 @@ static unsigned r600_get_block_alignment(struct pipe_screen *screen, unsigned pixsize = util_format_get_blocksize(format); int p_align; - switch(array_mode) { - case V_038000_ARRAY_2D_TILED_THIN1: - p_align = MAX2(rscreen->tiling_info.num_banks, - (((rscreen->tiling_info.group_bytes / 8) / pixsize) * rscreen->tiling_info.num_banks)) * 8; - /* further restrictions for scanout */ - p_align = MAX2(rscreen->tiling_info.num_banks * 8, p_align); - break; - case V_038000_ARRAY_1D_TILED_THIN1: - p_align = MAX2(8, (rscreen->tiling_info.group_bytes / (8 * pixsize))); - /* further restrictions for scanout */ - p_align = MAX2((rscreen->tiling_info.group_bytes / pixsize), p_align); - break; - case V_038000_ARRAY_LINEAR_ALIGNED: - case V_038000_ARRAY_LINEAR_GENERAL: - default: - p_align = MAX2(64, rscreen->tiling_info.group_bytes / pixsize); - break; + if (rscreen->chip_class >= EVERGREEN) + { + switch(array_mode) { + case V_038000_ARRAY_2D_TILED_THIN1: + /* Kernel uses bank width of 8, macro tile aspect of 1 */ + p_align = rscreen->tiling_info.num_channels * 8 * 1 * 8; + break; + case V_038000_ARRAY_1D_TILED_THIN1: + p_align = MAX2(8, rscreen->tiling_info.group_bytes / + (8 * 1 * pixsize * 1)); + break; + case V_038000_ARRAY_LINEAR_ALIGNED: + p_align = MAX2(64, rscreen->tiling_info.group_bytes / pixsize); + break; + case V_038000_ARRAY_LINEAR_GENERAL: + default: + p_align = 8; + } + } + else + { + switch(array_mode) { + case V_038000_ARRAY_2D_TILED_THIN1: + p_align = MAX2(rscreen->tiling_info.num_banks * 8, + ((rscreen->tiling_info.group_bytes / 8) / pixsize) * rscreen->tiling_info.num_banks); + break; + case V_038000_ARRAY_1D_TILED_THIN1: + p_align = MAX2(8, (rscreen->tiling_info.group_bytes / (8 * pixsize))); + break; + case V_038000_ARRAY_LINEAR_ALIGNED: + p_align = MAX2(64, rscreen->tiling_info.group_bytes / pixsize); + break; + case V_038000_ARRAY_LINEAR_GENERAL: + default: + p_align = 1; + } } return p_align; } @@ -115,16 +134,36 @@ static unsigned r600_get_height_alignment(struct pipe_screen *screen, struct r600_screen* rscreen = (st
[Mesa-dev] [PATCH 2/2] r600g: Set tile_type correctly when textures are created
>From discussion on IRC, tile_type should be 1 for depth or stencil textures. On Evergreen, it should also be 1 for some color buffers, but that's handled in the evergreen_cb function when required. Signed-off-by: Simon Farnsworth --- Doesn't fix my tiled scanout fun, but is needed for correctness. src/gallium/drivers/r600/r600_texture.c |1 + 1 files changed, 1 insertions(+), 0 deletions(-) diff --git a/src/gallium/drivers/r600/r600_texture.c b/src/gallium/drivers/r600/r600_texture.c index 5ac39aa..6143133 100644 --- a/src/gallium/drivers/r600/r600_texture.c +++ b/src/gallium/drivers/r600/r600_texture.c @@ -391,6 +391,7 @@ r600_texture_create_object(struct pipe_screen *screen, resource->b.b.b.screen = screen; rtex->pitch_override = pitch_in_bytes_override; rtex->real_format = base->format; + rtex->tile_type = util_format_is_depth_or_stencil(base->format) ? 1 : 0; /* We must split depth and stencil into two separate buffers on Evergreen. */ if (!(base->flags & R600_RESOURCE_FLAG_TRANSFER) && -- 1.7.6 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 1/2] r600g: Make tiling alignment restrictions match the DDX
It's unclear exactly what the alignment requirements are for Radeon tiled surfaces, but they differ between the DDX and Mesa. Make Mesa match the DDX. Signed-off-by: Simon Farnsworth --- This basically copies across the DDX versions of the restrictions. They differ, and I'm not sure which one's right. Someone from AMD should review this before it's applied. src/gallium/drivers/r600/r600_texture.c | 22 ++ 1 files changed, 10 insertions(+), 12 deletions(-) diff --git a/src/gallium/drivers/r600/r600_texture.c b/src/gallium/drivers/r600/r600_texture.c index 2d041b0..5ac39aa 100644 --- a/src/gallium/drivers/r600/r600_texture.c +++ b/src/gallium/drivers/r600/r600_texture.c @@ -89,21 +89,21 @@ static unsigned r600_get_block_alignment(struct pipe_screen *screen, int p_align; switch(array_mode) { - case V_038000_ARRAY_1D_TILED_THIN1: - p_align = MAX2(8, - ((rscreen->tiling_info.group_bytes / 8 / pixsize))); - break; case V_038000_ARRAY_2D_TILED_THIN1: p_align = MAX2(rscreen->tiling_info.num_banks, - (((rscreen->tiling_info.group_bytes / 8 / pixsize)) * - rscreen->tiling_info.num_banks)) * 8; + (((rscreen->tiling_info.group_bytes / 8) / pixsize) * rscreen->tiling_info.num_banks)) * 8; + /* further restrictions for scanout */ + p_align = MAX2(rscreen->tiling_info.num_banks * 8, p_align); break; - case V_038000_ARRAY_LINEAR_ALIGNED: - p_align = MAX2(64, rscreen->tiling_info.group_bytes / pixsize); + case V_038000_ARRAY_1D_TILED_THIN1: + p_align = MAX2(8, (rscreen->tiling_info.group_bytes / (8 * pixsize))); + /* further restrictions for scanout */ + p_align = MAX2((rscreen->tiling_info.group_bytes / pixsize), p_align); break; + case V_038000_ARRAY_LINEAR_ALIGNED: case V_038000_ARRAY_LINEAR_GENERAL: default: - p_align = rscreen->tiling_info.group_bytes / pixsize; + p_align = MAX2(64, rscreen->tiling_info.group_bytes / pixsize); break; } return p_align; @@ -121,11 +121,9 @@ static unsigned r600_get_height_alignment(struct pipe_screen *screen, break; case V_038000_ARRAY_1D_TILED_THIN1: case V_038000_ARRAY_LINEAR_ALIGNED: - h_align = 8; - break; case V_038000_ARRAY_LINEAR_GENERAL: default: - h_align = 1; + h_align = 8; break; } return h_align; -- 1.7.6 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] gallium: Set renderbuffer's InternalFormat when rendering to texture
When an FBO is rendering to a texture (rather than a renderbuffer), Gallium sets up an internal renderbuffer to handle the rendering, and copies over enough texture state to make this work. InternalFormat was missed out, causing glTexCopyImage to take a slow path unnecessarily. Fixes https://bugs.freedesktop.org/show_bug.cgi?id=41263 Signed-off-by: Simon Farnsworth --- src/mesa/state_tracker/st_cb_fbo.c |1 + 1 files changed, 1 insertions(+), 0 deletions(-) diff --git a/src/mesa/state_tracker/st_cb_fbo.c b/src/mesa/state_tracker/st_cb_fbo.c index 05139ec..4d32158 100644 --- a/src/mesa/state_tracker/st_cb_fbo.c +++ b/src/mesa/state_tracker/st_cb_fbo.c @@ -381,6 +381,7 @@ st_render_texture(struct gl_context *ctx, rb->Width = texImage->Width2; rb->Height = texImage->Height2; rb->_BaseFormat = texImage->_BaseFormat; + rb->InternalFormat = texImage->InternalFormat; /*printf("* render to texture level %d: %d x %d\n", att->TextureLevel, rb->Width, rb->Height);*/ /*printf("* pipe texture %d x %d\n", pt->width0, pt->height0);*/ -- 1.7.6 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] Regression in i965 Mesa driver caused by commit c7c64d97836c71eaf2ee3fc6d384877170b8c844
Hello Kristian, Your commit: commit c7c64d97836c71eaf2ee3fc6d384877170b8c844 Author: Kristian Høgsberg Date: Tue Jun 1 14:33:43 2010 -0400 intel: Fallback to meta if we're asked to CopyTexImage2D from RGB to RGBA The pixel transfer rules state that we must set alpha to 1.0 in this case which we can't easily do with the blitter. We can do to passes: one that sets the alpha to 0xff and one that copies the RGB bits or we can just use the 3D engine. Neither approach seems worth it for this case. has broken my use of Mesa to snapshot the display using an FBO. In my code, I do (where width and height are the size of the screen, snapshot_width and snapshot_height are the target size of my snapshot): glBindFramebufferEXT( GL_FRAMEBUFFER_EXT, framebuffer_object ); glBindRenderbufferEXT( GL_RENDERBUFFER_EXT, renderbuffer ); glRenderbufferStorageEXT( GL_RENDERBUFFER_EXT, GL_RGB8, snapshot_width, snapshot_height ); glFramebufferRenderbufferEXT( GL_FRAMEBUFFER_EXT, GL_COLOR_ATTACHMENT0_EXT, GL_RENDERBUFFER_EXT, renderbuffer ); glBindFramebufferEXT( GL_READ_FRAMEBUFFER_EXT, 0 ); glBlitFramebufferEXT( 0, 0, width, height, 0, snapshot_height, snapshot_width, 0, GL_COLOR_BUFFER_BIT, GL_LINEAR ); glBindFramebufferEXT( GL_FRAMEBUFFER_EXT, 0 ); to get a snapshot of the current screen contents in my FBO. I then have a separate thread (using its own GL context) get the pixel contents of the FBO with glReadPixels() and turn it into a PNG file. With current Mesa master (eb7ef433bbbeabda963e74adf0ef61c47883f292), the glBlitFramebufferEXT() call causes my fullscreen application to get stuck - the front buffer and back buffer at the time of the glBlitFramebufferEXT() call swap back and forth as I call glXSwapBuffers(), but no further changes to screen contents occur. The FBO appears to have the correct contents the first time I do this, but not on future attempts to snapshot the screen. git bisect fingered this commit as the cause - reverting just this commit makes things work as they used to. Is there a better fix (either to my code or to Mesa) that I should work on; I can't glReadPixels() directly from the main framebuffer, as (not entirely surprisingly) this causes my rendering thread to stall for a short but visible period - and while I can stall the snapshotting thread indefinitely, I'm not permitted to stall the rendering thread. I've tried using both GL_RGBA8 and GL_RGB8 renderbuffers, which hasn't helped - but it looks from the code like it's the format of the source buffer (the system framebuffer) that matters, not the format of the destination buffer. -- Thanks in advance for any advice you can offer, Simon Farnsworth ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev